Tobold's Blog
Monday, August 13, 2012
Another example on the evils of review scores

I am a firm opponent of review scores. Imagine somebody was wondering whether he should rather take up chess or basketball: How could a site which gives a review score to chess and basketball be of any possible help? Different games appeal to different people, and pretending that you can put a number on a game which is true for everybody is just plain crazy. And it can have negative consequences on innovation.

Case in point: The Secret World. It got a "low" metacritic score of 72, causing Funcom's shares to tank, and the company to announce layoffs. But the metacritic score was just an average of some people absolutely loving the game, and others not being impressed with the unusual setting, progression system, or pure technical performance. The relevant number for a subscription MMORPG is obviously the number of subscribers and the time it manages to hold onto them, not the review score.
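To put some numbers on how an average can hide a split audience, here is a minimal Python sketch. The scores are invented purely for illustration (they are not the actual Metacritic reviews of The Secret World), and the names `polarized` and `lukewarm` are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical review scores, invented for illustration only.
# Half the reviewers love the game, half are unimpressed.
polarized = [95, 92, 90, 88, 85, 60, 55, 50, 45, 60]
# Everybody finds the game roughly "okay".
lukewarm = [70, 72, 74, 71, 73, 72, 72, 70, 74, 72]

for name, scores in (("polarized", polarized), ("lukewarm", lukewarm)):
    # Same mean, very different spread around it.
    print(f"{name}: mean={mean(scores)}, spread={pstdev(scores):.1f}")
```

Both lists average out to the same 72, but only the spread tells you whether that means "everybody found it okay" or "half of the reviewers loved it".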

One major reason why review scores have no information value is that nobody can agree what exactly should factor into them. Games have received low scores for having DRM, while obviously many legit players wouldn't think that being prevented from stealing the game should be reason for a downgrade. The Secret World got a lot of flak for its bugs, but different players are bothered to very different degrees by a game having bugs, and have very different acceptability standards for bugginess. The Secret World was "less buggy" than previous Funcom games like Anarchy Online on release. How important is polygon count for a game? How much do you add or subtract to the score of an MMORPG because it is NOT a generic fantasy setting? You'll never get a group of people to agree on any of these questions, so how can a review score represent anything?

Reviews should simply tell people what they consider good or bad about a game, and describe it as well as possible. Explaining how The Secret World uses a skill wheel instead of levels does give valid information to both people who love levels and to those who hate them. Rating that feature on some 1 to 10 scale and then producing an aggregate score over many different features just tells the reader nothing much.

I'm not a fan of review scores either - even if I've been working as a professional games journalist for eight years - but this isn't so much the fault of the scores as it is investors and players (way too many of them) staring themselves blind on the aggregated Metacritic score instead of the actual reviews. A single bad review can make that score tumble, and then we get situations like this.

So I'd say this is more of a discussion about Metacritic (which I hate with a vengeance) than scores. I've had employers that have come under fire because a low score dragged down the Metacritic score. Publishers go crazy, but the vitriol coming from players can be even more scary. That's a bigger enemy to proper and honest reviews than anything else.
I don't look that much at what the reviewers say on Metacritic; it's much more interesting to look at the user score and read what the users think about a game (IMO).
I choose my games mainly based on blogs. If a lot of people start talking about an MMO (as they have done recently with GW2) I'm inclined to consider buying it.

I don't think I've ever even looked at Metacritic. It would be like choosing a game based on which raindrop runs down the window first. Without the credibility that comes from interesting writing (such as this blog) the internet is just random noise. Why would you make your spending and leisure decisions based on random noise?
Tobold, read the low scoring professional review and see whether you think it's fair or not.
Do you think that a game can be objectively good or bad?
I only look at the user scores, and whether there are enough votes to get a "valid" score. After that, there are types of games (RPGs, turn-based strategy) that I will play or try out even if the score is 2/10, and there are game types where I need a 9/10 to even think about trying them.

It's like the movies: when you see on IMDB a movie with a score of 7+ from 20.000 users, it is more likely that the movie is very good. Still, you have preferences... If I see a historic movie (like Braveheart) with a 4+ I will see it, but if it is a drama, I need 8+ to watch it :P

Generally, over the years, I think everyone has made his personal formula for how to judge the critics and scores, and by experience he already knows how to treat them.
So many problems with the whole Metacritic concept.

For me, a review only has meaning when I know enough about the reviewer's tastes, likes, dislikes and biases to parse it. I can use a Mark Kermode or Peter Bradshaw movie review to make meaningful decisions because I've read, watched and listened to them reviewing films for over a decade. I have sufficient context there to parse a negative review as positive-for-me and vice versa.

An aggregation of scores from sources about which I know nothing is utterly useless. How do I know whether some or all of the reviewers giving low scores have tastes that match or conflict with my own? I'd have to read the reviews even to make a start on finding that out and that completely defeats the purpose of a site that aggregates just the scores.

Then there's the issue of "good" and "bad" games. We have no objective standard for this and we never will have one. The review Spinks links is interesting in that it gives a low score for technical flaws. These, at least, are mostly going to be accepted as A Bad Thing in a way that an aesthetic choice that the reviewer dislikes couldn't be.

In an MMO, however, technical issues can and do get fixed. The problems with Missions that review focuses on did indeed cause an annoyance in The Secret World. For a couple of weeks. Then they got fixed, or most of them did. I haven't run into one for ages. But while the problems may go away, the bad score doesn't.

At the most a reviewer might re-visit an MMO once, maybe six months or a year down the line, to see if it's improved. Usually it will be one visit right after launch, one score then forget about it. That's bad enough for a review that sits there on the internet forever, but as a component of a supposedly current aggregate score it's poisonous.

Increasingly reviews of MMOs intended to suggest buying decisions are pointless anyway. Every MMO has something as near as dammit to an open beta at some point. If you want to know whether to buy one, just try it and see for yourself. Or wait until it's released then try the free trial. Why bother looking at a scorecard?
You all have super-excellent and valid misgivings over the whole Metacritic thing.

One thing, though: the average scores over there are pretty good at reflecting the quality of most games. I can't recall a single excellent-but-misunderstood game that has tanked on Metacritic.

I'm sure you all will remind me of that gem though!
Tobold, read the low scoring professional review and see whether you think it's fair or not.

Are the words fair? I think they are. Is the "2 out of 5" rating fair, just because there are a few broken puzzle quests? I don't think so.

Do you think that a game can be objectively good or bad?

I think a game can set out to achieve a certain goal, to challenge certain skills, and then either succeed or fail to reach that goal. Thus I would find it perfectly possible to play through a dozen Bejeweled clones and say afterwards which of them was better or worse than the others, because of some UI problems or gameplay additions that aren't fun. Thus you can have a good or a bad Bejeweled.

But if somebody hates puzzle games, he'll hate even the best of the Bejeweled games. It would be impossible to say whether this Bejeweled game is better or worse than that first-person shooter, because the two games set out to achieve very different things, and will appeal to very different people.
@zeno unfortunately user scores have become even more unreliable than critic scores these days. It has become almost a sport for semi organised groups to pollute user review scores over some perceived slight or another.
Tobold, read the low scoring professional review and see whether you think it's fair or not.

I have played this game since day one (actually before, in a couple of beta events close to release), and I have run into THREE quests that I could not complete the first time I tried them. One of those worked for me the next day. The other two I did not re-try.

I do EVERY quest I can find before leaving a zone. That is my play style across every game I have ever played. Three broken quests out of HUNDREDS. I simply did not see all of these broken quests the guy is complaining about. And I did them ALL (at least every one I could find - a few are in out-of-the-way places, to be stumbled upon while exploring. I did a lot of exploring).

So this guy could not have played the game for long. Why should his opinion matter, when he is not familiar with the subject?

One valid point he brings up is lack of documentation on the game mechanics, though an experienced gamer can figure it out easily enough. My personal peeve is the limited character customization - as compared to other more recent games. Not bad, but could be much better.

Other than that the game has been rock-solid for me. Enjoying it.
The journalist described how a single broken quest broke his faith that the puzzle quests are doable. It only takes one broken quest to achieve that. And his opinion that this is very immersion-breaking is totally valid.

How many broken quests one will accept obviously differs much from player to player. That journalist appears to have an unusually low tolerance level.

Come on you really can't believe this "give us the positives and negatives" bs.

There are positives to any piece of dog poop game. What the heck does it matter if the "positives" hide the fact it's a bad game?

Any heuristic no matter how arbitrary is better than no metric at all. What we do is compare game scores to other games within the sub-type.

Getting a 72 is only "bad" when compared to other MMOs. In life there are winners and losers. Obviously Secret World was a loser game (at least initially).

To suggest that we should somehow go back to qualitative land, and just sit in a circle and 'feel' our evaluations of the world around us, is pure silliness. Your niceness flies in the face of a modern world that was based on empiricism. Why go all soft science on us?
Actually @Angrygamer, I rather think he does. And I do as well.

Rock Paper Shotgun is the only review site I bother to check anymore, simply because they don't give a number score.
The thing with critics is many of them have never actually tried to create the thing they are criticising.

Sure you can compare similar things but if you aren't a movie director or actor then I don't give a shit what you think about a movie. If you aren't a chef, then you don't have the right to judge food. If you aren't a game designer then your opinion on other games is irrelevant.

Sure you are entitled to your opinion, as is everyone. I just don't give a flying turd about it.
Thank you, Tobold. You put it exactly as I would: Tom Chick's written review was honest, fair, and, I think, correct. The number value associated with it was ridiculously low.

Worse, it's now forever associated with the game and it's had a demonstrably negative impact on the game's success with investors. Even if he revises his review and scores the game higher (which he's unlikely to do, as he's moved on already to different games), that original 2 out of 5 will continue to have striking impacts for the game's lifespan.

Same thing with the PC Gamer review: the text was quite positive and the review explicitly stated that he enjoyed the game. Then he scored it a 69/100.

This from a Website that gave Diablo 3 a 93/100 even while players were literally prevented from logging in, and Mass Effect 3 a 93/100 even though there were investigations into whether it was a finished product upon release (which, of course, it turns out that it was not).

Do I wish that Metacritic were less important, seemingly, in the eyes of investors? Of course. But if that's going to be the way things go, then I think reviewers (and the editors who select what reviews are aggregated at Metacritic) have a responsibility to score games using some objective criteria, and not give a game that the reviewer likes a sub-70 score and a clearly broken game a 90+.
The whole deal with 'broken puzzle quests' is a crock of shit anyway.

I've been playing TSW the last couple weeks now. On the handful of occasions this is known to happen, if you ask about the bugged mission in /general or /missionhints you will frequently get an invite to a working instance. If you don't, you can ask a GM to advance the mission for you. And they do do it. Cheerfully and quickly. I've never had to wait longer than 10min to get a GM to advance a quest that was broken, at all hours of day or night. I've only had to do it four times, over four zones.

Every week or so they patch these broken quests, and they use lateral thinking to do it in some instances. Sometimes, if the scripting is too difficult to figure out why it's breaking, or if it's something that a player can do to accidentally break the instance for everyone, they've actually found solid work-arounds that involve avoiding those mechanics altogether. That doesn't just bypass programmer stubbornness; it's smart.

On MORE than one occasion, people have cried and stamped and jumped up and down about a quest that was bugged... but on closer investigation, they were just doing it wrong. (Think of the Raven-related missions in Kingsmouth specifically.) And the nice thing about the community is that if you actually have the fucking humility to ask if it's bugged or if you're doing it wrong, people will incredibly patiently try to help you and stay as spoiler-free as possible if it's you who's wrong.

That's probably what's wrong. When's the last time you saw a reviewer admit they were shit at something?

People who don't like that kind of game can say it's not for them, and that's fine. But it doesn't deserve a lower rating, because I love the hell out of that kind of absence-of-hand-holding. To actually be in the wrong, because the game was actually right, AND gave you all the clues you needed, but failed to see. But to call them game-breaking bugs is downright lazy and maliciously dishonest, disingenuous at best.
For me, the missing piece is context. I.e., I would like movie reviewers to rate say Star Wars, Die Hard, Annie Hall, Blue Velvet, Citizen Kane, Das Boot. I can predict much better my experience if I know they are a "no plots or sad endings" or a "no good movies since color" person.

Similarly, if I know someone rated CoD & SC2 highly and WoW/SWTOR and turn-based games low, then their review is not that relevant to me.

Hmm, could you add factor analysis to metacritic? Ask each reviewer 2-5 ratings of other things and ask me those questions and show me the metacritic rating of reviewers clustered around me.
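A rough sketch of what that could look like, in Python. Note that this is plain nearest-neighbour matching rather than real factor analysis, and every name, reviewer, and rating in it is made up for illustration: `personalized_score` is a hypothetical helper, not anything Metacritic offers.

```python
from math import sqrt

def distance(a, b):
    """Euclidean distance between two rating dicts over the titles both rated."""
    shared = a.keys() & b.keys()
    return sqrt(sum((a[t] - b[t]) ** 2 for t in shared))

def personalized_score(my_ratings, reviewers, k=2):
    """Average the game score of the k reviewers whose reference ratings are closest to mine."""
    ranked = sorted(reviewers, key=lambda r: distance(my_ratings, r["reference"]))
    nearest = ranked[:k]
    return sum(r["score"] for r in nearest) / len(nearest)

# Invented data: each reviewer rated a few reference titles, plus the game in question.
me = {"Star Wars": 9, "Annie Hall": 4, "Die Hard": 8}
reviewers = [
    {"name": "A", "reference": {"Star Wars": 9, "Annie Hall": 3, "Die Hard": 8}, "score": 85},
    {"name": "B", "reference": {"Star Wars": 8, "Annie Hall": 5, "Die Hard": 9}, "score": 80},
    {"name": "C", "reference": {"Star Wars": 3, "Annie Hall": 10, "Die Hard": 2}, "score": 40},
    {"name": "D", "reference": {"Star Wars": 4, "Annie Hall": 9, "Die Hard": 3}, "score": 50},
]

overall = sum(r["score"] for r in reviewers) / len(reviewers)
print("plain average:", overall)
print("average of reviewers like me:", personalized_score(me, reviewers))
```

With these invented numbers the plain average comes out at 63.75, while the two reviewers whose tastes resemble mine average 82.5 for the same game.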

In the real world, this allows people to view and read "news" and opinion that rarely disagrees with them.

Even better would be if the review picked a few stereotypical customers (e.g. Timmy the Twitch, Konsole Kid, MMO Mama, ... ) and rated the game for each of them.

Y'know... metacritic used to serve a fairly valuable function.

It helped to prevent us from getting completely whitewashed by publisher-paid IGN/Gamespot scores.

The older user reviews had balanced scores that reflected their overall opinion of the game, as opposed to 0 for dislike and 10 for like. We hadn't yet seen developer pay tied to score results, or off-site forum threads announcing which feature of a new game any given little community hates and signing up for score-bombing in protest.

Go have a look at the 'top PC games' section of Metacritic right now. Compare it to the 'top xbox360' scores. Note that nothing in either has been able to breach 88 as the top score. Note the difference between user reviews and critic reviews. Note the relative lack of AAA titles.

Someone could do a Statistics PhD on Metacritic.
cam, I think you left the filter on 'last 90 days' and not 'all time'

still your point stands...

and wow, how is it that a site that claims that 'out of the park baseball 2007' is the second best reviewed PC game since the mid 90's has a huge financial impact on game developers?
An 80+ Metacritic score will tell you that a game was released polished, finished and relatively bug free...80% of the time.

For that fact alone it's extremely useful, and it's for that fact that the gaming industry depends on it. It's the best, most objective method for tracking bugs/polish they have.

I will never buy a game with a sub 80 metacritic btw. I don't have the time nor money to gamble on games that haven't figured out you only get one launch. -shrug-
oh, and at Galleon: you have to look at the total number of reviews. How many does Out have? 6? c'mon man, that's 200 lvl statistics.
@jimr9999us: That is EXACTLY the point. You think the publishers are looking at sample size? No. They're saying: 80 or die. If you only had 800 reviews and 100 of them were pissed-off fanboys that you portrayed a non-canon version of a beloved character, then I guess you're fucked.

And you can't go hire some shady PR firm to go flood the site with positive reviews, because we've seen people get caught out with the Shady PR Firm dealings, and the fallout was biblical.
Well, duh. The fact remains though that all of the games are held up to the same Metacritic scoring mechanism. Regardless of how the ultimate number is reached the process is the same. Secret World got a 72, which I feel personally is a fair score. It's a good game hobbled by an uninspired combat system and endless rounds of gofer quests, just like every other MMO out there. The 72 reflects not just the quality of a game in general but the overall genre fatigue that the market is experiencing.
The 72 reflects not just the quality of a game in general but the overall genre fatigue that the market is experiencing.

So how could that score be equally valid for one player who has played every MMORPG since Ultima Online and another player who is new to the genre?
@Gooneybird: Also, how do you account for the fact that your opinion is wrong? (my opinion. Just proving a point, see. But no, seriously. You're wrong. I'm a 13yr industry connoisseur of unverifiable qualifications/experience. I should know.)

The answer can't always be, "That's because [opinion which doesn't match mine] is the result of [fanboyism/hatas]."
@Gooneybird: I'm sorry but I thought TSW got a 72 because it launched with broken chat and no raids.

How the hell can you launch an mmorpg in 2012 with broken chat and no raids?

Sorry, I wanted to play TSW, I really did, and just personally I was more excited about it than GW2, but after that launch FunCom might be the first of the big mmorpg devs to go completely under.
@Joseph Skyrim

"If you aren't a game designer then your opinion on other games is irrelevant."

I would say the opposite.

I couldn't give a monkey's what John Carmack, Notch or Ghostcrawler think makes a good game. They can reflect their opinions in their own creations.

A good game critic should understand his/her audience and relentlessly play every title to recommend what the audience will enjoy the most.

Game designers with an axe to grind need not apply.


I have a question. First, though, note that my starting point is that a 7 (or 70, or B or whatever) is a pretty good score. Not great, but certainly a game worth buying if you like the genre etc.

With that in mind, can you list some great games that were treated manifestly unfairly over at Metacritic? I'm not saying this to try to demonstrate a point. I'm genuinely interested in knowing if the system really is as out of whack as most of you are saying.

My own feeling is that review scores are pretty useful. They give a handy summary of the reviewer's feeling towards the topic. Not an exclusive, singular metric, but a general idea. Oh, and Cam: you are never wrong.

And obviously publishers love metrics. And obviously they will make some really poor decisions based on those metrics from time to time. But does anyone really know that publishers demand certain Metacritic scores from developers? What happens if they don't achieve that score? Does the developer have to repay the development money? What if this lousy game sells millions of copies (I'm sure Cars 2 the game sold well, and I'm pretty sure critics didn't cling to it like bees to honey)?

And Galleon: I haven't played it, but I understand that OOTPB was a pretty awesome game. In its genre, of course.

First, though, note that my starting point is that a 7 (or 70, or B or whatever) is a pretty good score.

That actually touches another weirdness in review scores: A really, really bad game, one where actually everybody agrees that it is really, really bad, gets a score of around 5 out of 10. The scale does not go from 1 to 10, or 0 to 100, as one might think. Thus 7 or 70 is considered already rather bad.

With that in mind, can you list some great games that were treated manifestly unfairly over at Metacritic?

That would require me giving a score to a game and comparing it to the metacritic score. My point is that the best I could possibly do is a totally personal score, but not one which holds any universal truth.

Having said that, look at the difference in score between the critic reviews and user reviews of Diablo III and tell me which of the two is "right".

I'm not really sure what I'm saying here, but I think my point is that I agree that all reviewer scores are completely subjective. It's just that I draw a different conclusion from what happens when you aggregate many such scores.

For me, the Metacritic averages tend to fit my general opinion quite fairly. Yes, I thought Diablo III was a great game (it gave me plenty of hours of fun compared to most games and no, I'm not playing it right now. Will probably return to it though). Conversely, I have no illusions that "The Expendables 2 Videogame" (Metacritic score: 36) would be much fun to play.

The reason I asked for examples was really not to try to get you to attribute a specific point score to any given game. Rather, I'm interested in seeing whether you (and others) are critical of the Metacritic system out of principle or from practical experience (neither rationale is in any way wrong, of course!).

And yeah, the review scores thing is interesting. When Edge had one of their articles about this many years back, they pointed out that they wouldn't review most really bad games, so there would be a propensity for the average to hover around 6.5-7, rather than the more natural 5. I think that was a fair argument. But still, many review sources clearly have a much higher average than that.
Reviewers and Metacritic have no ties to developers. Why should they be thinking about the publisher's financials when writing reviews? If they are giving their honest opinion, then why has anyone got an issue? No one has to listen to that review. I tend to read a good, bad and average Metacritic review if I'm making a decision. Everything should be taken in context. If publishers are basing decisions purely on Metacritic, why is that the reviewers' problem?
I don't think that the stock tanking was directly caused by bad reviews. The bad reviews correlate to bad sales of the game. Bad sales led to Funcom not meeting their sales forecasts, and I'd wager that the missed forecasts are what caused the stock to tank, not the reviews. The reviews are just a convenient excuse.

Correlation is not causation.

So how could that score be equally valid for one player who has played every MMORPG since Ultima Online and another player who is new to the genre?

Obviously because the score is not representing the judgement of any particular player, rather an aggregation of review scores.

One could argue the validity of the process, but it is the same process for every game.

In other words, your objection would be valid if the whole review process existed only for TSW. Metacritic scores are derived from a whole chorus of voices and not just one.

Review scores exist for the purpose of quantification for several reasons, including (but not limited to) the fact that games as a medium have relatively short shelf lives, and that quality can (and does) vary wildly from one title to the next. And of course games carry a price tag that for most people limits consumption to a title or two a month for standard games.

How else would you review a game? How else could you compare it to other similar products? There are enough competing products available that one can compare them relatively objectively. So why not do it?
Scores must be reproducible/verifiable, and require a context. Without a context, they're meaningless. The context IS the review.

Scores are all too often taken out of context. It isn't only the reviewers to blame here. For example, the reader puts the score into context comparing the product with competitors which the reviewers for whatever reason did or did not use (ie. play), or the other way around. Your examples are solid, but don't go far enough. To understand a review you must understand the reviewer.

I believe scores are a lazy way to define a conclusion because the author lacks the ability to define such in words, because the words themselves are not convincing enough.

Scores can work, if the procedure to define the score is reproducible/verifiable and if done by professionals. This is why you want goal line technology in football (soccer for you Americans), and this is also why gymnastics score definition is a median of various experts, and this is why the student can verify how the teacher reached their score due to feedback.

I remember reading gaming mags back in the 80s and 90s which would review games. Back then even the scores weren't verifiable. One gaming mag steadily refused to use scores though, instead referring to their reviews themselves as proof of the pudding. They were also a PC-only gaming mag. I ditched all the other ones who reviewed games on platforms I'd never own and moved on. I wrote above "To understand a review you must understand the reviewer". This is why I liked reviews from magazines. As a reader you got to know the reviewer.

"Sure you can compare similar things but if you aren't a movie director or actor then I don't give a shit what you think about a movie. If you aren't a chef, then you don't have the right to judge food. If you aren't a game designer then your opinion on other games is irrelevant."

You don't have to "have done it" to criticize it. You don't have to sniff cocaine in order to understand how cocaine works and what it does to you. In WoW, as a raid leader you do not have to have played every class in every patch to understand how good a person playing class X role Y is performing.

Of the other comments I think Hagu hits the nail on the head: if Google presents you the stuff you likely need, why can't there be some kind of algorithm to find you the reviewer you need?
I guess it really comes down to how much the game developers sell their game to "critics" (I quote) such as Metacritic and IGN. It sucks, but hey it's economics.