It’s a given that I like video games. Even as an amateur journalist, writing reviews for content-driven-turned-forum game sites, I had specific guidelines and requirements for writing reviews. I could, for the most part, do whatever I wanted in the review, but I had to summarize my points with a set of scores, which would later be included in a list of games. If anyone cared to know why I gave a particular score to a game, they could click through and find out more, and, given that I reviewed games like Dragon Quest (especially the SNES entries and remakes), I could have gotten a handful of people interested in trying those games, especially fans of previous entries. Scores were, and still are, commonplace and expected, and even Controller Nation uses some sort of scoring system. But maybe this approach has become a bit dangerous and uninformative. Scores are arbitrary, so there’s no cohesive rule to determine whether a given score is “valid.” Especially with AAA titles, there’s a very real possibility that pre-existing bias or influence is ultimately the reason for a particular score, and the end result is an inability to succinctly itemize complaints and praise about a game in a way that’s consistent with what a review actually is: an opinion and an experience.
This wasn’t something people really concerned themselves with when I first wrote reviews, however. It wasn’t as much of an issue in the late 1990s, because publishers relied on fewer outlets to advertise a game, and being a gamer wasn’t “cool” then. Fast-forward to today: I don’t write as many reviews, but I still read and watch plenty of them on a weekly basis. The basic structure of a review hasn’t changed. Yet reviews feel increasingly less genuine. They started feeling more and more like commercials, which is entirely plausible when sites start building pages with clickable backgrounds advertising some new game I don’t want to play, and when more of them think it’s pretty clever to throw bumper ads into the middle of the web browsing experience. And, although it’s not the focal point of this article, amateur reviewers have at least one franchise that turns them into instant fanboys (e.g., Final Fantasy), where any title in that franchise, or any title from the publisher of said franchise, gets immediate bonus points.
It won’t take much time on the Internet before you read at least one comment or article from someone crying foul because professional reviewers carry too heavy a bias and give undeserving games high scores. More elaborate commentaries will claim that reviewers are clearly lobbied to give a favorable review, and that a small comparison of average scores will prove it. To most people, just looking at the scores is proof enough, but either because I’m compensating for having a small penis or because someone will inevitably try to say that, at the end of the day, there’s no significant difference, I decided to test the scores to see 1) whether there is a significant difference and 2) what the “true” score likely is. Since my conclusion is to ditch scores altogether and write a coherent review, I clearly have the testicular fortitude required to make such an assertive statement, so we can assume my penis size requires no compensation beyond the vaginal massage I already get. Still, whether here or somewhere else, some unemployed part-time liberal arts major with a Tumblr account consisting entirely of pro-cannabis-legislation reposts will try to argue bias, or will somehow insist that, in the end, there is no difference.
Now, let’s consider three games, starting with Call of Duty: Modern Warfare 3. A quick glance at Metacritic shows a professional score of 92/100, while its user score is 7.6/10. (Note: the PC and Wii versions also have ratings, but given that the PC version’s score is similar to the 360’s and PS3’s, I’m not going to waste time proving the obvious for every single edition of MW3.) The disparity between Metascore and user score is so huge that if we were to measure congruence by ground subsidence, this would be the New Orleans of video game scores. Not only do the scores for each platform version of MW3 fail to match each other, the range of values where the “true” Metascore could plausibly lie spans 16 points, compared to a “true” user score range of about 0.3 (or 3 when adjusted to the Metascore scale)[*]. What does this mean? The current (Jan 2012) user score for MW3 is pretty close to what the score would be if all users reviewed MW3, while the professional score range is as stable as a PS1 Final Fantasy villain. No standard error in the world could make it at all feasible that the Metascore reflects the views of those who bought and played the game with money that didn’t come from Activision themselves.
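To sketch where ranges like those come from: a 95% confidence interval for the “true” mean score narrows with the square root of the number of reviews, so a handful of critic reviews produces a far wider range than thousands of user ratings. The review counts and standard deviations below are invented purely to illustrate the mechanics (they are not the actual Metacritic data):

```python
import math

def score_ci(mean, sd, n, z=1.96):
    """95% confidence interval for the 'true' average score,
    given the sample mean, standard deviation, and review count."""
    half_width = z * sd / math.sqrt(n)
    return (mean - half_width, mean + half_width)

# Invented numbers, NOT the real Metacritic data: a Metascore built
# from a handful of critic reviews vs. a user score (0-10 scale)
# built from over a thousand user ratings.
critic_ci = score_ci(mean=88.0, sd=10.0, n=6)   # few critics -> wide interval
user_ci = score_ci(mean=3.0, sd=2.5, n=1100)    # many users -> narrow interval

print(critic_ci)  # spans roughly 16 points
print(user_ci)    # spans roughly 0.3 points
```

The point isn’t the particular numbers, but the shape of the math: the critic-side interval only shrinks when more outlets weigh in, while the user-side interval is already pinned down to a few tenths of a point.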
Of course, I decided to see how a lesser-known title (in the West) fares under a basic hypothesis test, so I went to a game I trust: Dragon Quest IX. DQ9 carries a Metascore of 87 alongside a user score of 8.8.

[*] In full disclosure, this is because the number of reviews used to determine the Metascore is relatively small compared to user reviews. This would require more scrutiny if the ranges for the Metascore and user score were proportionately closer, but the hypothesis test of these two proportions clearly illustrates that, with significant confidence, they are not.
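The two-proportion hypothesis test mentioned above can be sketched as follows: rescale both scores to the 0–1 range, then ask whether, given their respective review counts, they could plausibly be the same underlying proportion. The review counts here are made up for illustration; only the technique, a pooled two-proportion z-test, matches what’s described:

```python
import math

def two_prop_z(p1, n1, p2, n2):
    """z statistic for testing whether two proportions are equal,
    using the pooled standard error."""
    p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Scores rescaled to 0-1 proportions; review counts are invented.
# An MW3-style gap: critics at 0.88 (n=30) vs. users at 0.30 (n=5000).
z = two_prop_z(p1=0.88, n1=30, p2=0.30, n2=5000)

# |z| > 1.96 means the two scores differ at the 5% significance level.
print(abs(z) > 1.96)  # True: the gap is far too large to be noise
```

With a gap that wide, no plausible review counts bring the z statistic back under the 1.96 threshold, which is exactly the “with significant confidence, they are not” conclusion.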