Thursday, July 10, 2008

The Great Horn of Bullshit

Sometimes you read something and it just sticks in your craw. Like this:
"It used to be...All Metacritics were higher once upon a time because it was ten professionals rating them. Now, sort of anybody with a pen can rate them and it ends up with a bit of a wider track some times.

"EA doesn't usually get the benefit of the cult - 'everybody has to rate it a hundred' thing going on - that happens sometimes even when they may not, based on the review, have played more than the first fifteen minutes of the game. But that's a separate issue."

That was none other than Electronic Arts CEO John Riccitiello.

Let's see: John Riccitiello's base salary is $750,000 a year. His discretionary bonus target is $750,000 a year. When he was hired by Electronic Arts, he was given the opportunity to purchase 850,000 shares of stock via non-qualified stock options.

He's complaining about people who review games for little or no money, who do it because they love gaming.

Two words: poor sportsmanship.

While it's a bit of whining assery, though, what really matters here is that when Riccitiello says something like this, it becomes a meme. Analysts will start mentioning this when they discuss Electronic Arts and the quality of their games.

So quality does matter, but if people of quality aren't reviewing the games, how can they determine the quality of the game? Tap that magic wand three times and suddenly, EA's games are better than they ever were. It's just reviewers that have dropped in quality.

That's a nice bit of sleight-of-mouth.

I don't disagree with him when he says that some reviewers barely even play the game--I've written for years that the reviewer should always state how much actual time he spent with the game. But why is he assuming that only the bad reviews are produced that way?

I think we should take a look at Metacritic and check on John. And since EA makes a ton of team sports games that come out every year (which makes it easier to track trends), let's use them as an example. Are the commoners really review buzzkill for EA Sports?

Here's what I did. I looked at review scores in the Metacritic Database for the following games:
FIFA Soccer (2001-2007)
Madden NFL (2000-2007)
NBA Live (2000-2007)
NCAA Football (2001-2007)
NCAA March Madness (2000-2007)
NHL (2000-2007)
Tiger Woods PGA Tour (2000-2007)

The years refer to the calendar year of release, not the year listed on the game cover. I looked at releases on all console/PC platforms, because quality can vary radically between platforms. If a game didn't have at least four reviews, it wouldn't receive a score in Metacritic and wasn't counted (I think that happened once).

What we wind up with are seven series with a total of 193 releases (again, all PC/console platforms are included) since 2000.

Total number of reviews for those 193 releases? 3,859.

I selected four "professional" websites that have been around the longest: IGN, Gamespot, Electronic Gaming Monthly, and Gamespy. Of the 3,859 reviews, those four sites had 573 of them. Let's compare their reviews to the reviews of the unwashed masses.

There are two ways to calculate an average with this data. First, you could just average the Metacritic review score for each game, regardless of the number of reviews. Second, you could take the average Metacritic score and multiply it by the number of reviews for that game. Do that for every game, then divide the total by the number of reviews. That way, games with more reviews count more toward the average.
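For anyone who wants to see the difference between those two methods, here's a minimal sketch in Python. The three games in it are made-up placeholder numbers, not my actual data, and the function names are just ones I picked for illustration.

```python
# Each game is (metacritic_score, number_of_reviews).
# These three entries are invented placeholders, NOT real Metacritic data.
games = [(90, 30), (80, 20), (70, 10)]


def simple_average(games):
    """Method one: average the per-game scores, ignoring review counts."""
    return sum(score for score, _ in games) / len(games)


def review_weighted_average(games):
    """Method two: weight each game's score by how many reviews it got."""
    total_reviews = sum(count for _, count in games)
    weighted_sum = sum(score * count for score, count in games)
    return weighted_sum / total_reviews


print(round(simple_average(games), 2))
print(round(review_weighted_average(games), 2))
```

In this toy example the heavily reviewed game pulls the second average up. With the real data, the two methods land very close together.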

One note. Metacritic gives more weight to certain websites (the "professional" ones), but the weighting is proprietary, so the average score for a game is not exactly the average of all its review scores.

Why don't I just take the individual review scores and average them? Because manually compiling them would take weeks--it's just too labor-intensive to be feasible. It would tell me how far a true average is from the Metacritic average, but I don't care.

Okay, let's take a look.

The average Metacritic rating for all these games, across all console/PC platforms, was 80.67. If I used the second method, and gave more weight to games with more reviews, it made almost no difference: 81.06.

The professional sites are included in those numbers, by the way.

So what was the average review score of just the four professional sites I selected?

81.58.

No bias there, seemingly.

Let's look at it in two eras: 2000-2003 and 2004-2007 ("AR" is All Reviews, and "FP" is Four Professionals).
2000-2003:
AR 84.11
FP 84.30

2004-2007:
AR 78.23
FP 79.78

Those "all review" averages were calculated using the "first" method I explained earlier. The second method gave AR averages of 84.88 and 79.25, so again, either method gives nearly identical results.

Now let's look at it by year, with the AR average score first on each line and the FP average second.
2007: 74.63, 75.37
2006: 77.35, 77.70
2005: 79.07, 81.55
2004: 83.04, 84.10
2003: 86.52, 85.79
2002: 84.22, 83.65
2001: 82.70, 83.39
2000: 80.90, 83.50
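If you'd rather not do the subtraction in your head, here's a quick sketch that takes the yearly numbers above and works out the gap between the four professional sites and everybody (FP minus AR), plus the drop from 2003 to 2007. The variable names are mine, and the only data is what's listed in this post.

```python
# year: (AR average, FP average), copied from the yearly list in this post.
scores = {
    2007: (74.63, 75.37),
    2006: (77.35, 77.70),
    2005: (79.07, 81.55),
    2004: (83.04, 84.10),
    2003: (86.52, 85.79),
    2002: (84.22, 83.65),
    2001: (82.70, 83.39),
    2000: (80.90, 83.50),
}

# Positive gap means the four professional sites scored HIGHER than all reviews.
gaps = {year: round(fp - ar, 2) for year, (ar, fp) in scores.items()}

# Years where the professionals were actually the tougher graders.
pros_lower = sorted(year for year, gap in gaps.items() if gap < 0)

# How far each average fell between 2003 and 2007.
ar_drop = round(scores[2003][0] - scores[2007][0], 2)
fp_drop = round(scores[2003][1] - scores[2007][1], 2)

for year in sorted(gaps, reverse=True):
    print(year, gaps[year])
print("Pros scored lower in:", pros_lower)
print("AR drop 2003-2007:", ar_drop)
print("FP drop 2003-2007:", fp_drop)
```

The professional sites came in higher than the overall average in six of the eight years, and both averages fell by double digits between 2003 and 2007.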

Time to sound The Great Horn of Bullshit.

Riccitiello is right that there are more reviews now: in 2000, there were 14.2 reviews per game, which rose to 20.42 by 2007. But he is absolutely incorrect that "anybody with a pen" is dragging down the average review score.

John, your average review score is down twelve points in four years because EA treats sports games like fashion design: every version has to focus on "new features" instead of improving the foundation. That's why Sony's MLB: The Show is the best sports game franchise, by far, right now: they fix what doesn't work. It's a priority. That may not be sexy in a marketing sense, but it makes for a much better game over time.

And please don't give us any crap about the difficulty of an annual release cycle. You created that cycle, not us.

When we can play a game and find glaring issues within hours, issues that no one would even dispute, then something is wrong with your development process. Maybe you should look at that instead of the quality of today's reviewers.

Speaking as one of the unwashed masses, obviously.
