|
Introduction
The more angry words and bitterness assigned to this debate over
numbers, the more it seems as though all parties involved have lost
sight of the full scope of complexity inherent in the scoring system.
Having followed Points of View since its inception, first as a reader and
now as a staff member, I feel it important to remind everyone involved in
this debate why it is an issue worth discussing. And, more
importantly, why it is a debate that will never be resolved by current
measures.
Scoring: Why Bother?
Scoring systems, by themselves, have no objective value. If I give
you a review score of 8 on a 10-point scale, but with no context,
there is no way you can accurately determine what that score is
supposed to represent. Is it a high score? Low? Well, if I submit nine
more reviews with scores of all 10, that 8 will seem quite low by
comparison. Conversely, if those nine reviews were all 2, the 8 will
seem exceedingly great. In effect, the value of a scoring system is
based entirely on its ability to differentiate between two items in a
numerical fashion. Of course, a scoring system can also be given
context simply by defining what each number is supposed to represent.
Points of View has done this already, and it is clear that a 5 was
average, 10 was perfect, and so on.
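To put rough numbers on the point about context, here is a minimal Python sketch. The function name relative_standing is purely illustrative, and measuring a score against the plain mean of the other scores is my own assumption about what "seeming high or low" means, not anything Points of View defines.

```python
from statistics import mean

def relative_standing(score, other_scores):
    """How far `score` sits above (+) or below (-) the mean of the other scores.

    Assumption: 'context' is approximated by the simple average of the
    other reviews on the same scale.
    """
    return score - mean(other_scores)

# The same 8 looks low next to nine 10s and high next to nine 2s.
low_context = relative_standing(8, [10] * 9)   # 8 - 10 = -2
high_context = relative_standing(8, [2] * 9)   # 8 - 2  =  6
print(low_context, high_context)
```

The 8 never changes; only the scores around it do, which is the whole argument for why a score needs either company or a definition.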
A subtle problem of logic arises, though, when such a system lacks
variety. Put simply, a casual glance through the review archives will
reveal that scores of 1-5 are almost completely absent, at least compared
to the oceans of 7's and 8's. Roku mentioned in his editorial "[...] the
majority of the reviewers used 5/10 as average." And yet, if this were
true, then why would a change in scale be necessary? Where were all
the 5's when such a score was supposed to represent a clear majority
of all games? I also distinctly remember a sense of disgust
throughout my review career as a reader at how 7 became the unofficial
standard for average - that very disgust was the foundation of my own
attempts at an objective reviewing style that ensured even the games I
felt were the best on the market (i.e. my favorites) were scored no
higher than 8 if they had any flaws whatsoever. This leads into one of
the first problems with review scores.
Objective versus Sentimental
The overwhelming majority of RPGs are not afternoon affairs, to be
played between instant messages and commercial breaks. They are
investments of sorts, consuming significant amounts of time and money.
Quite honestly, we expect a return on our investments greater than
what we funneled into them - if we are merely breaking even, we have in
fact lost the opportunity to have invested that time/money into
something that could have been better. No one is angrier than the
person walking away from an investment that actually wasted time and
money with little to no benefits.
But I know firsthand the psychologically comfortable feeling that
comes from hasty rationalizations. "The battle system wasn't
that bad." "The script was terrible, but there were some funny
bits." "Music wasn't important in the game anyway." We have a
psychological motivation to inflate our review scores, as by doing so
we affirm to ourselves that the game we just completed was not as
significant a waste of time and money. This is more than just a
presumption - it is backed up by the very reviews whose scores are
inflated. The text the numbers are supposed to be based on actually
runs counter to the numbers usually assigned. Moreover, I often noticed
that while a reviewer might be quite candid in the numbering system
inside the review (i.e. often using low scores for things such as
Interaction, Music, etc.), the overall score would mysteriously jump up
several ranks in the final analysis.
This "mysterious" variable is supposed to be Fun Factor, and Points
of View encouraged its use in the final analysis under the old system
since it lacked its own explicit category. My main concern with Fun
Factor is that by making it a variable that applies to everything, it
reduces the usefulness of all scores. For example, you may hold the
opinion that watching paint dry is exceedingly fun, and as such your
scores for the hypothetical game "Final Paint Dry Fantasy IV" will
inexplicably be inflated. By the Points of View guidelines, you do not
have to explain that you think drying paint is fun, so there is no
real way to determine the source of the inflated scores. I do not
enjoy watching paint dry, and yet it is pretty difficult to argue with
a 9/10 score by itself. Some might reason I should be basing my
decisions on the text of the review instead of on the final score, and I
agree. However, the point is moot - if scoring were not an issue worth
discussing, we would not be using numbers to begin with.
One of the least lamented changes under the current system was the
amalgamation of Localization, Interface, and Fun Factor into the
catch-all category of Interaction. I have hitherto remained silent on
my distaste for this category primarily because Fun Factor was never a
real issue for my own reviews - I strive to remain as objective as
possible, allowing the reader to project their own sense of Fun Factor
onto my scores rather than the other way around. I began to discover,
however, that many readers apparently did not get the memo that Fun
Factor was no longer supposed to influence the final score more than a
single category can. I also played a series of games
that had terrible interfaces and questionable localizations but
remained fun (similar to Xenogears), and others that had acceptable
interfaces and great localizations but were otherwise completely
boring (similar to Vanguard Bandits). It makes little sense to take
what makes games fun for most people in the first place - the battle
system or plot - and smash it into two other things that are usually
no more noteworthy than replay value.
The solution that I endorse the most would be giving Fun Factor its
own category. Generally speaking, the final score will still be
inflated (as Fun Factor will almost always be scored as a 5), but at
least now we can tell why simply by looking at the numbers. This
allows us to essentially remove the majority of a reviewer's bias from
the overall score in cases where we fundamentally disagree with that
person, as in the paint dry example. Having this new category will
also necessitate an explanation from the reviewer of what he or she
thinks is so fun about the game, which, quite honestly, is the reason
most reviewers write in the first place. Under the current system, we
are supposed to sort of divine from the text and prior reviews what
the reviewer enjoys in an RPG - this hypothetical system makes it
explicit, and allows readers to understand the reviewer's bias from a
single context.
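As a rough illustration of how a separate Fun Factor category would let a reader back the bias out, consider the following Python sketch. It rests on assumptions of my own: that the overall score is a plain average of the category scores (Points of View does not require this), that categories run from 1 to 5, and that the category names and numbers below are invented for the example.

```python
from statistics import mean

def overall(categories, exclude=()):
    """Average the category scores, optionally leaving some categories out.

    Assumption: the overall score is a simple mean of the categories.
    """
    kept = [value for name, value in categories.items() if name not in exclude]
    return mean(kept)

# Hypothetical review of "Final Paint Dry Fantasy IV" by a paint-drying fan.
review = {
    "Battle System": 2,
    "Story": 2,
    "Music": 3,
    "Interaction": 2,
    "Fun Factor": 5,
}

with_fun = overall(review)                              # 14 / 5 = 2.8
without_fun = overall(review, exclude=("Fun Factor",))  # 9 / 4  = 2.25
print(with_fun, without_fun)
```

A reader who does not share the reviewer's taste can simply read the second number, which is not possible when Fun Factor is folded invisibly into the final score.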
One Step Forward, Two Steps Back
I was recently made aware that Roku's campaign to bring back the
10-point scale has resulted in a compromise of sorts with a 5.0 scale.
Although I suppose a compromise in and of itself is praiseworthy, it
should be clear by now that it does not fix the scoring problem, and
really just obscures it under a false sense of progress. In practice,
you can be sure that a score of 1-2.5 will rarely be used just as 1-2
and 1-5 were rarely used before. I will also make the prediction now
that we will see sentimental scoring raise the overall score to a 3.5
in most reader reviews, just as it did to a 7 before. This only prolongs our
current predicament with guessing biases while attempting to
differentiate between "good," "great," "fantastic," and other such
synonymous words.
Furthermore, while this change seems to make a wider range of
scores available, this is a fairly moot point - the number of reviews
with an overall score of 1-2.5 will not see a measurable increase, due
to the basic fact that there is little incentive to finish a game that
you do not enjoy playing and then take the time to write a review
about it. Since the 5.0 scale does not allow fractions in the
individual categories, the one place where more options would actually
illustrate a point, the supposed victory is a hollow one.
Conclusion
Since we are unwilling to do away with the numbers altogether, the
most sensible choice given the circumstances would be to create a new
category labeled as Fun Factor and use that as a hedge against the
natural instinct to inflate scores. A return to the 10-point scale is
not entirely necessary, but it would be useful for the one numerical
section of the review that actually matters: the categories
themselves. In the worst-case scenario of no further action taking
place, hopefully this editorial has made everyone just a bit more
aware of the difficulties of scoring, and why starving Points of View
out of some perceived slight is never going to change or even
address the underlying problem which caused the animosity. And for
those who do take the time to review, perhaps it has made them just a
bit more inclined to score objectively so as to help give the scoring
system itself a sense of actual value.
|