Sunday, February 26, 2012

The inflation of review scores, the cursed 7/10, and a few ways to overcome them [Rating Editorial Part 1]

[NOTE: As of late 2013, I've removed numeric ratings from all my posts, so there will be some references in this post to a rating system that no longer exists. My apologies for the confusion.]

So, I'm about three quarters of the way through Warcraft III, but I'm in desperate need of a break, and, in addition, I've got my first big English paper (nearly 20% of my grade!) due in four days. What this means for you is that I won't have a review up for a little while yet, so I decided to post a short (it's me, so you know that's a lie and that this is going to be really long) editorial to help tide you over, in case my Jing: King of Bandits review wasn't enough. This editorial is about the inflation of review scores, the "cursed 7/10," and a few ways to deal with them.

I feel like this sometimes...

To start off, let's talk about rating inflation. I don't know exactly when the 10 out of 10 rating system was applied to games and anime, but since its introduction, things have changed a little. Much like the U.S. Dollar (as many of us are all too keenly aware), review scores have become inflated over the years. I actually attribute some of this to the rise of bloggers and review sites that are not manned by so-called "professional critics." This is because review sites and blogs (especially blogs) are run on people's opinions and likes. Here's a hypothetical (but often true) example: a blogger, let's call him Jim, really likes an anime which we'll call (for example purposes) "ABC," and then he posts a review for it on his blog, JohnJacobsJingleheimenschmidt (give yourself a high five if you get that reference). His review is probably fairly brief and says "I liked this, this, and this about ABC," and he gives it a 10/10 because he likes it so much. What Jim didn't realize is that despite all the good things about ABC, it has (just as an example) terrible plot construction, which makes it unworthy of the 10/10, according to the "non-inflated criteria." What Jim just did was create score inflation; he rated something higher, perhaps far higher, than it should have been. Now, even if Jim is not a very prominent blogger, he has just lowered the standards for what makes something a 10/10. They were only lowered a tiny bit, but they were lowered nonetheless. Perhaps a more relevant example would be if Jim had reviewed ABC and given it a 7/10, saying that he recognized the plot construction was poor but that the good outweighed the bad. So, just for example's sake, let's say that the plot construction was so terrible that Jim was wrong, and that ABC deserved to be a 5/10. This is a much more common type of inflation, but it is inflation nonetheless. Now, I'd like to point out that there is nothing, nothing, wrong with Jim giving ABC a 7/10, or even a 10/10. 
He is a blogger, and the whole point of blogs is to express opinion. Jim didn't do anything wrong, per se, he was just expressing his opinion. The problem, though, is that Jim inflated review scores for everyone, not just himself. Now, let's say blogger Mark posts a review of the show XYZ on his blog, MarkyMark44, and he gives that show a 7/10. Let's suppose that XYZ is a show truly deserving the 7/10 score. However, people who see Mark's 7/10 score will say "Oh, it's just a 7/10, that's the same score that stupid show ABC got. It must not be any better." And that is the problem with Jim posting that 7/10 review. Now, XYZ is considered just as bad as ABC, despite Mark wanting to communicate that the show was actually pretty good. It's Jim's fault, but you can't blame him because he didn't do anything wrong. He simply shared his opinions, and who can blame him for that? This example showcases the main reason why score inflation has grown so much (to the point where a 7/10, instead of a 5/10, is considered average). A non-professional critic posted a review with a score that was too high, and as a result, he made all scores worth a little less, at least in the minds of the readers. Now, I'm not trying to say that a "professional" critic would do any better. As a matter of fact, whenever I hear the word "professional" applied to critics, I tend to get an ugly look of disgust on my face (that, however, is another post for another time). What I am trying to say is that a "well-rounded critic who rates based on criteria," like blogger Mark, probably wouldn't give ratings that caused inflation. The only reason I chose bloggers for the example was because they, generally speaking, write reviews based more on opinion and not criteria. Other reviewers are certainly capable of making inflated ratings as well, I just chose blogs because they more commonly give such reviews. Onto another section, which discusses one of the reasons why a 7/10 is considered "average."

My reaction when I see a hugely inflated score.
As I was saying, score inflation has progressed to the point where a 7/10 is considered "average." However, there's another reason why a 7/10 (and not a 5/10) is considered normal, and it has nothing to do with inflation. Instead, it has to do with schooling and the grade system. As approximately everyone who's ever going to read this blog knows, the grading system is set up like this:

100 = A+
90-99 = A
80-89 = B
70-79 = C
60-69 = D
1-59 = F

This is a problem for reviewers, because people will naturally try to apply the same system to any rating scale of 10 or 100 (because the difference between a 10/10 scale and a 100/100 scale is paper thin). The result is something like this:

10/10 = Masterpiece, perfect, amazing, go watch/play it now (in other words, A+)
9/10 = Great show/game (A)
8/10 = Good show/game (B)
7/10 = Average show/game (C)
6/10 = Poor show/game (D)
1-5/10 = Failure of a show/game (F)

I have a big problem with this. Why? Well, I'm not going to challenge the grading system of schools, but a rating scale that ignores literally half the scale as "failure" is, at least in my opinion, both foolish and wasteful. You may as well have a 6/6 scale, because everything from 5 down is a failure anyways. However, I'll hold off on my opinions for a moment yet. What all this means is that people start associating grades with review scores. This results in the "average," the "passing grade," the "C," being a 7/10. This is the other reason that a 7/10 is average. Surprisingly, though, there is one last reason that 7/10's are average.
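To put the "6/6 scale" remark in concrete terms, here is a minimal Python sketch. The function name and the label strings are just my own shorthand for the grade-style list above, not any kind of standard; all it does is count how many distinct verdicts the ten possible scores collapse into.

```python
def grade_style_label(score: int) -> str:
    """Map a 1-10 score to the school-grade reading listed above."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score == 10:
        return "Masterpiece (A+)"
    if score == 9:
        return "Great (A)"
    if score == 8:
        return "Good (B)"
    if score == 7:
        return "Average (C)"
    if score == 6:
        return "Poor (D)"
    return "Failure (F)"  # everything from 1 through 5


labels = [grade_style_label(s) for s in range(1, 11)]
distinct_verdicts = len(set(labels))           # how many different verdicts exist
failing_scores = labels.count("Failure (F)")   # how many scores share the "F" verdict
print(distinct_verdicts, failing_scores)
```

Ten possible scores, but only six different verdicts, with half of the scale sharing a single one.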

I shouldn't need to tell you what I reserve this face for.
This final reason is that we, the great connoisseurs, the wise, better-than-mainstream-masses experts, in our infinite wisdom, refuse to get any game or watch any show that is only "average." Of course, this particular example only applies to some people, but I'm one of those people, so I want to talk about it. There are those of us who still view 10/10 scales the way they should be viewed, with 5/10 as the perfect average in the middle. However, we make the mistake of ignoring anything that isn't a 7/10 or above, because we only appreciate those really good games or shows that are "better than average." This gets us in trouble fast. You see, we still view the 5/10 as average, while everyone else views average as 7/10. In other words, what we consider above average, everyone else thinks of as average. As a result, we are disappointed when we find that what we thought was average was terrible, and what we thought was good was only okay. Thus, we adjust our scores, so that only an 8/10 or 9/10 is good. The problem is, reviewers (many of whom are paid through advertisements by game/anime companies, who want their products to get good scores. Conflict of interest, anyone?) pick up on stuff like this, so they inflate their scores some more so they can still appeal (and sell media) to the "intelligent" types. This cycle continues on and on, until we get a warped scale like this: a 7/10 is okay, maybe a little less than average, an 8/10 is just "good," a 9/10 is a 7/10 on the old scale, and a 10/10 is everything that's great. Since there really are a number of good shows and games out there, the result is a lot of 10/10's. We haven't quite reached this point yet, but we get a little closer all the time. In fact, a number of review sites have already started doing this, even if we (or the reviewers themselves) have yet to realize it.

The desolate, rocky landscape that those who only go for higher-than-average will find themselves in.
     The result of all this is what I call the "cursed 7/10." It has become the threshold between good and bad, between worth it and not worth it. Any score below a 7/10 means that the game or show is unworthy of your attention. That's why I call it cursed. If something gets anywhere below it, then it's failed. Even a 6.5/10 means failure. To me, using a 10/10 scale with only 5 real ratings (<7, 7, 8, 9, and 10) is just strange.

So now, we can finally talk about how all this relates to me. All this score inflation has left those like me, the people who dislike this score inflation, in quite a sticky situation. We want the 10/10 scale to look something like this:

10/10 = Masterpiece, perfect, go watch/play this right now
9/10 = Excellent product
8/10 = Great product
7/10 = Good product
6/10 = Above average product
5/10 = Average product
4/10 = Below average product
3/10 = Bad product
2/10 = Really bad product
1/10 = Terrible product, awful, the worst, avoid this show/game at all costs.

However, due to the inflation of scores across the internet, using this is extremely difficult. Here are a few solutions to this problem.

 - Use a non-inflated 10/10 scale, and make your rating system very clear.

 - Use a 10/10 scale, but rate using different criteria, so the ratings don't mean the same thing (e.g. rate based on how much you personally liked it).

 - Don't use any kind of "overall" rating at all, but instead just let your review speak for itself.

 - Don't use a 10/10 scale, but instead use something like Pros/Cons.

I'll go through the ups and downs each of these styles offers, starting with my own.

The abandoned village that is the unadjusted 10/10 scale.
     It's one thing to try to use a non-inflated, balanced scale, but it's another thing entirely to communicate this to your readers. If this blog ever becomes super popular (hey, I can dream, can't I?) I'm sure I'm going to have to spend lots, and lots, and lots of time explaining and defending my scores. People will think that a 5/10 means it's bad, or that a 7/10 means it's only average. That's why my method requires lots of reinforcement and explanation of my rating scale. Even then, however, I doubt I'll ever be able to totally banish the idea that 7/10 is average from my readers' minds. On the plus side, though, it is fairly straightforward to give ratings, and additionally it is easy to convey whether or not I liked something. Lastly, a straightforward number rating gives those people who don't have time to read long reviews an easy and, more importantly, quick way of knowing what I thought about something. If they trust me and my reviews (and if they agree with me the majority of the time), then they can just see the rating and know they'll probably like or dislike it.

The alternative 10/10 scale is like being alone at sea; survival is difficult, but at least you'll never be crowded by others.
     As for the alternative 10/10 scale, this one is tricky, and often only works with a dedicated reader base. Let's look first at the example I gave, a rating based on how much you, personally, enjoyed the show/game. This one poses a bit of a problem, because you either have to do what the above method requires (make it clear what your rating system is), or you have to put in lots of qualifiers such as "in my opinion." That single phrase is one of the most dangerous to use in a review, because "in my opinion" instantly translates in the reader's brain to "only in my opinion." Of course, if you don't care what kind of reader base you keep, then that's not a problem, but otherwise...
     Let's change examples. Let's say you use a 10/10 based rating system that rates shows on how effective they are [at appealing to their audience, tugging the viewer's heart strings, whatever]. In this case, you still have to make very clear what kind of a system you're using. If someone unfamiliar with your blog sees a low rating for a good show that you reviewed, they will assume that your low rating was used to talk about the quality of the show and not about how poorly the show used symbolism (or whatever). That reader will then go "WTF?" and either bash you about it, leave unsatisfied, or actually read the review and see what you're talking about. The first and second options are the much more common ones. This is why you absolutely have to make your review system clear, even more so than with my method. Of course, there are advantages to this system. For one, if it works, then you've forever freed yourself from the shackles of other people's rating systems. Your reviews will always be unique, and uniqueness on the internet is difficult to attain, indeed. This also lets you appeal to a larger reader base, because you're different from everyone else and have something only you can offer.

A style that uses no ratings is like climbing a mountain. Uphill until you reach the peak (i.e. get a dedicated reader base), then everything is downhill (easier) after that.
     Next is the no rating system. This system is the most risky, because you have to rely on your reviews to draw readers in, keep them interested, and clearly state your opinions. In this method, ratings almost look like crutches that others can rely on, a crutch you no longer have. Your reviews have to be interesting enough in concept that your readers will enjoy them. Sometimes this isn't necessary, especially if you have a pleasant writing style, but the fact remains that something to lure your readers in (other than the promise of an explicit rating at the end) will always be helpful. Also, you have to keep your readers interested. I, for instance, would never be able to use this style, because my posts are basically just walls of text. That's not what people want to see. That's the kind of thing that intimidates them into not wanting to read anymore. A steady flow of nice-to-look-at pictures, comparatively brief posts, or a new angle on something are all good ways to keep your readers interested, but you'll probably need something. Lastly, you need to make your position clear through your words. You no longer have the nice bow to tie everything up that is a rating; you have to find an alternative. This method does have its advantages, though. For one, you'll probably only get the people who actually read your reviews. If we could see everyone who looks at the reviews we worked so hard on, only to skip down to the bottom to find our rating, then we would probably cry. All that hard work was simply ignored. With this system, however, people either read it or they don't. You might cry at the sight of all the people who turn away, but at least all those who remained can enjoy your work and appreciate the effort you put into it. Even if your reader base isn't as large as some others, at least it's dedicated. This style also frees you from the manacles that are other people's reviews. 
Your review will always be evaluated for its own worth; your reviews and ratings won't be compared to others.

An alternative-to-ratings system is a hybrid: impossible to find a picture for, so just enjoy this pleasant screencap.
     The last style, the one that uses something like Pros and Cons, is a bit of a hybrid. It has the disadvantages of all of the other styles, but at the same time, holds nearly all the advantages. You have to make sure that people understand your system, and you have to keep your readers interested with something other than that straight-up rating at the end. At the same time, though, you get nearly all the advantages. Your style will be at least rare, if not totally unique, your reader base will be mostly made up of those people who actually read your review, and you can appeal to the readers who need a brief summary. Granted, neither the good nor the bad is as "extreme" as in the other cases. You don't need to try as hard to make them understand and you already have some kind of wrap-up to keep them interested. At the same time, your style is not quite as unique, you'll still get people who just skip to the end, and readers may avoid your summaries because they don't contain explicit ratings. A difficult method to pull off, but a very rewarding one in the end.

     In the end, though, there's no easy way to make an alternative rating system work. Overcoming the inflated scores problem is a difficult matter, and there's no easy answer. So there, that's part one of my first editorial (and it may very well be my last, but we'll see). I hope you enjoyed it and will enjoy the rest of the stuff on my blog! Thanks for reading!

Read part two here

What do you think is the best way to overcome score inflation? Do you think it's a problem in the first place? Share your thoughts!


  1. (Darn my comment got lost. Here it is again, but less than perfect from before).

    I find scoring anime sort of a waste, and for good reason. The reason I never used a scoring system is that numbers are just numbers and do not really say anything that I cannot say in words. Even if I were to discover a method that is concise and clear, it would still be inferior to expressing what I mean, and oftentimes people can get the wrong idea. I just think it is too confusing, just like the one from MAL, but that is just my amateurish, know-nothing two cents.

    1. The way I see it, there are three reasons to use number scores. The first is because people are used to them. A familiar system is always a plus, especially with people unfamiliar with the site. The second is that it makes the opinions of the reviewer very clear. There are people who have a hard time making their stances clear, and an explicit number can help clear confusion. As you say, though, there are people who can make their opinions very clear through their writing alone, so this isn't that big an advantage to everyone. And the final reason is that, provided the reviewer can avoid being arbitrary, it can help them summarize (and therefore conclude) their review quite well, and as anyone who's ever done format writing before knows, a good conclusion is one of the hardest things to attain.

      I don't use MAL (though I'm considering it), but I think I understand what you're communicating just from hearsay. In answer to that, I would never consider trying to *convince* other people of my views without having a written review, no matter what method I'm using.

      All that said, I do understand and partially agree with your viewpoint. If the reviewer _is_ arbitrary, then number ratings can simply confuse, and in the worst case will probably weaken their reviews overall. Indeed, an alternative system like yours may be best.

      And hey, your opinion is just as good as, if not better than, mine. Thanks for commenting, I appreciate all the different views I can get.

  2. I think there are just way too many different rating scales across the sphere to attempt conformity. What is important, though, is to encourage consistency. In a sense, this would prevent "inflation" of scores, because as long as we all are consistent within our own reviews, then our differences in ratings become a mere labeling issue. i.e. What I call a 5, another may call a 7, and yet another call 3 stars.

    The alternative and no number approaches interest me too. Personally, I tend to have a number scale but only for myself (to remind myself for later on), and I try not to use numbers for reviews that I post publicly, precisely because of the concerns you posted about numbers.

    Anyway, great post!

    1. That seems to be the general consensus. As you say, it's simply not feasible for everyone to use the same rating scale, but as long as people are consistent with themselves, inflation can be at least partially avoided. The problems brought up mostly arise from those who aren't consistent, or who don't write enough reviews for consistency to matter (if someone writes three reviews and they're all consistent, that's still not quite enough for people to get a feel for their scale).

      I only started thinking about using alternative approaches a little before I wrote this. If I hadn't already established my style as one that uses number ratings, I probably would have tried an alternative too.

      Thanks, I'm glad you liked it! And thanks for commenting!

  3. I wanted to make my own numerical rating system, and found your articles while 'researching'. A very interesting read, and influenced my own system, which is here:

    Each of the sub-boxes have their own weighting. The first 4 headings are broken down into sub-sections, which you rate, then add up with your "Enjoyment" score at the end for a score out of 100.

    NB: You can use decimal places, but I really wouldn't suggest more than 1 d.p. - 0.5 intervals should be fine in most cases.

    For rating the sub-sections use the following guide:
    0% = non-existent
    25% = lacking
    35% = bits missing
    50% = acceptable/average
    75% = Great
    85% = really solid
    100% = god-tier. absolutely amazing

    for each sub-section start at 50% and move up or down as you see fit. Add together and you have a rating! Remember to judge your final score using the system above, not the "cursed 7/10" as average.
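A rough Python sketch of how the percentage guide above could be turned into points. The sub-section names and maximum points below are placeholders invented purely for illustration (the real weightings are not reproduced in this comment), and rounding to the nearest 0.5 simply follows the NB about 0.5 intervals.

```python
def points_for(percent: float, max_points: float) -> float:
    """Convert a sub-section percentage into points, rounded to the nearest 0.5."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    raw = max_points * percent / 100
    return round(raw * 2) / 2


# Hypothetical sub-sections: name -> (maximum points, percentage awarded).
example_subsections = {
    "sub_a": (10, 50),   # acceptable/average, i.e. the 50% starting point
    "sub_b": (8, 75),    # great
    "sub_c": (4, 100),   # god-tier
}

example_points = {
    name: points_for(percent, max_points)
    for name, (max_points, percent) in example_subsections.items()
}
example_total = sum(example_points.values())
print(example_points, example_total)
```

Starting every sub-section at 50% and nudging it up or down, as described, then amounts to changing the percentage passed in for each entry.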

    1. Hey, how cool is that? Happy if I helped!

      Wow, that's a very in-depth guide! I may have to try this sometime and see what I think.

      Thanks for dropping by and providing your own system!

    2. Thanks for the quick reply! I wasn't sure I'd get any, seeing as this is a 3 month old post...

      Anyway, please let me know how you find it! This is only the first version of this system, if you think there's anything I missed or things weighted incorrectly, please let me know.

      I found the whole "start at 50%" is a lot easier if the sections are rated out of something apart from 10. That said, for my personal weighting I had to rate enjoyment and plot out of 10, but I am still toying with the idea of changing those around slightly. I'm open to any suggestions.

      It's in-depth because of me :/ I wanted a more consistent scoring system, so broke down the categories into sub-categories, then thought: "why not weight them too..." - you get the picture.

      It makes 90+ a real gem. Marking shouldn't be harsh, it should be fair with a touch of generosity. Madoka gets 96.2/100 from me, Steins;Gate gets 84.6/100. I may write up a guide for marking within each sub-section - I'll post it here if I do. In the meantime there's - my first attempt at a scoring system, though this was done without any research. It should give a good "feel" of how I rate things.

    3. Not at all! I love it when people check out my older stuff, so thank *you*!

      All in all, I'd say it's a good rating guide. I mean, I'm no rating expert, but I used it for a couple shows and it seems very straightforward. If I had to pick some weaknesses in it, I'd have two. First off, I feel like it might be counting the characters as too important. It depends on the anime, but take something like, say, Mushishi. That's an amazing show, something I'd definitely give a 10/10 or place in the 90th percentile overall. And yet, it's not really a character-centric show. They aren't as important to the kind of story being told, and as such they won't fulfill the requirements you set as well, which automatically places this show below a 90. Definitely not true of all series (like character-centric ones), but the system could be weighing characters in a little too much. The second thing is that it doesn't really incorporate the "style" of a show, which, as you know if you've read my style vs. substance post, matters quite a bit (to me). It definitely could overlap with enjoyment, but if I had to pick an oversight in the system, it'd be this. Of course, neither of these are especially major problems, and for a general rating guide (as in, one for every anime as opposed to specific types), yours works pretty well.

      I also agree that starting from the middle makes rating a lot easier.

      Thanks again for visiting and commenting!

    4. Ah, you spotted the problems :)
      I agree weighting has to be different between genres, and some (Mushishi for example, though I haven't seen it) don't even have a genre to define. I tried to get around this using "in-between" weightings, which has worked somewhat, but unfortunately isn't pin-point perfect.

      Each sub-section is rated really subjectively. Basically:
      -Empathy - how much you liked the characters
      -Change - do they change (have to somewhat)
      -Interactions - characters interact with each other, animals, the environment. How much did you like these?
      -Individuality - Characters aren't "textbook"

      I suppose it's easier for me to use since I made it. I hope the above example gives an idea of how rating is to be done.

      For Mushishi at least:
      You say it tells a story. The detail in the stories, how they vary, how well they hold together as an overall story - that's your Plot mark.
      Do you like the premise? Find it new, different, interesting? There's premise.
      Were you left with a great Final Impression?
      Did you like the atmosphere created, was it correctly paced? - Execution

      Mushishi looks like it'd score highly for those.

      As for character - Mushishi "tells a story". For 26 eps, by the end you have to understand the character(s) - and the side ones should feel genuine - Empathy.
      A large part of how they are developed is their interactions.
      Change is their interaction with themselves - e.g. making hard decisions.
      Individuality is pretty straightforward.

      Mushishi is an "amazing show". If you didn't like the characters enough to rate this highly, I'd be amazed.

      A good story, and good storytelling, builds up good characters. The two are very closely linked, though in some genres Plot is more important. Perhaps a distinction between plot and story would be good.

      Style: as I said, everything is subjective, so incorporates your own overall impression - largely derived from style. That said I think style definitely needs to be included.
      One thing I was considering was the use of multipliers. Unfortunately they're not very neat (as in giving nice rational decimals) but using just 1 may work - perhaps by turning "Enjoyment" into a "Style multiplier".

      I fully acknowledge that 90+ is next to impossible to attain. It's a very rare anime that would be that good. As such, another consideration of mine was to "compensate", i.e. each sub-section is worth 1 or 2 more than the current weighting. That way being slightly worse in one area can be compensated in another that is top-notch. My only problem with this method is that those getting above 100 would all be considered "the same" - which I don't like.

      I have a feeling any comments on this will be really long, sorry about that. Using what I described above, in the next comment could you post your sub-scores for Mushishi? That'd be interesting to see. On review, this scale is designed for you to be somewhat generous in giving out marks.

      tl;dr - read the bits on Mushishi, and please give opinions on 1)multipliers, 2)"compensating"

    5. I don't think it's possible for a single guide to be pinpoint perfect. The different kinds of stories anime can tell are, practically, limitless. So coming up with a single, all-encompassing review guide would be pretty much impossible.
      Mushishi did do really well on the story part. Got 27.5/30 (4/4, 9/10, 6.5/8, 8/8).
      It's hard to explain. Basically, the characters' strongest point is how well they're constructed. You have empathy for them, and that's the subsection that scored highest. As for interactions and development; well, that's where it gets tricky. The only recurring character in the show is the main character, Ginko. So he's the only one that can significantly change or develop over the course of the show, really. Only, he doesn't, and it'd be bad if he did. He's meant to have a bit of mystery and sameness to his character. I mean, we certainly see some different sides of him over the 26 episodes, but none are all that different from each other, and he doesn't actually "change." And it works. It'd be weird if he changed, because he's something like a constant, something you can count on to be familiar. That's important in fictional, unfamiliar settings. The reason it works is because the stories are mainly about identifying the cause of the problem, then finding a cure. So basically, the characters aren't really as important, which is why it can still be a "10/10" show despite them not being as "good."
      I guess I'd say it has less to do with not liking the characters and more admitting that they have faults, which is perfectly forgivable because of the kind of series it is. Considering the small amount of time most characters get, the show does a very good job of developing them. But that still doesn't make them "good." Like I said, it's hard to explain. My ratings were: Empathy: 10/12, Change: 1.5/5 (they don't really change. Some of the "victims" do have some slight ones, but these are usually not that big a deal. Let me put it this way: there aren't character "arcs." Though others might disagree.), Interactions: 4.5/7 (character interactions feel "real," and there are some complex and interesting conflicts between characters' desires. Thus, above average.), Individuality: 4/6 (the characters never really feel the same, but chances are you won't remember many of them after finishing it. You won't confuse them with each other either, but...).
      Add my ratings and you get 87.5/100 (it got full marks in the other sections). This is very close to 90, which as you said is difficult to get. But if I had to pick one anime series that deserves a 10/10 rating, it'd be Mushishi. That's why I feel like this guide may consider characters too much, for MUSHISHI. A guide is just that, a guide. It's meant to lead you to a conclusion, not give you one outright. I added that emphasis 'cause I believe different shows weigh things uniquely. Usually, it's small enough that it doesn't matter (thus why rating guides actually work), but they do exist, and in certain cases can make the difference.
      Your guide, then, does a good job. Mushishi got close to a 90, and another show I tried, Baccano, got 93/100, which it rightly deserves. It gets ratings in the general area. Since that's the most I'd expect from a breakdown guide, I feel like this is pretty close to the best you could get. That may just be my opinion on guides, though.
      As for multipliers, they might work, but I feel that they wouldn't be worth the effort. It'd be very complicated, and each anime has its own standards (or at least, each genre does), and so trying to find something that works perfectly with each would be nigh-impossible. If you do try them out, I think I'd be interested to see what you come up with, but...
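For anyone who wants to check the Mushishi arithmetic in the comment above, here is a small Python sketch. The sub-scores are the ones quoted there; the only assumption is that the remaining sections are worth whatever is left of 100 after the two 30-point sections, since their individual weightings are not spelled out here.

```python
# Sub-scores quoted in the comment above, as (awarded, maximum).
story = [(4, 4), (9, 10), (6.5, 8), (8, 8)]
character = {
    "Empathy": (10, 12),
    "Change": (1.5, 5),
    "Interactions": (4.5, 7),
    "Individuality": (4, 6),
}

story_total = sum(got for got, _ in story)
character_total = sum(got for got, _ in character.values())

# Assumption: the other sections make up the rest of the 100 points and,
# as stated in the comment, all received full marks.
story_max = sum(maximum for _, maximum in story)
character_max = sum(maximum for _, maximum in character.values())
other_total = 100 - story_max - character_max

overall = story_total + character_total + other_total
print(story_total, character_total, other_total, overall)
```

The character section accounts for 10 of the 12.5 points lost, which is the gap being discussed.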

    6. Also, compensating could be the key to it. That might move ratings from the "general area" I mentioned to a more specific one. The problem would be avoiding arbitrary use; compensation might need its own guides.

      And Raggers, don't worry about tl;dr. As far as I'm concerned, there is no such thing on my blog. Do make sure you don't go over 4000 characters or so, though. I just spent over an hour editing that comment above down. I personally love long comments though, so as long as you don't tick off Blogspot... ;)

  4. 4000 characters... I'll try ;)

    I haven't seen it, so I can't really say much about Mushishi, but it seems like this to me: a snapshot into another world. It takes you there, gives you some time with it, and that's it. (This description makes me want to watch it so much right now, you wouldn't believe it. It sounds so beautiful and sad - I'm a sucker for gut-wrenching longing.)

    Character is the one place it falls down, and I can see why. A snapshot wouldn't really be expected to show change, and something with Mushishi's setup has only 1 recurring character.

    I'm surprised Ending/Final Impression didn't get 8, or very close.

    The problem is "defining". In breaking up the big 5 sections into smaller sections, I limit each of them to their sub-sections.

    My thoughts for improvement:
    Split "Plot" into:
    -Plot/4+1 - detail, plotholes
    -Story/6 - how the whole thing hangs together, is it correctly balanced, does it work.

    ^-----Plot can, if exceptional, score 5/4 (e.g. Steins;Gate)

    Add a fifth section to Character:
    -Character Study/8 - Use for anime heavily focused around a single character, with possibly a main side character. Score the depth to which this character is shown - this is far beyond normal anime; the main character should be hugely complex. Many different sides to this character should be shown, with varying reactions to different situations. Monologues reveal a lot about the character.

    ^-------Very complicated, but this 5th section also grades from 0 upwards, so very few will be able to get much on this. "Character studies" were at a disadvantage on the original system, so this corrects that - add this to Mushishi, and I think you'll find the score much better, without really affecting the other scores.

    Any section total greater than the max score for that section will be lowered to the max score - e.g. if "Character" totals 33, only 30 could count towards the final score (or maybe 31).
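
    To make the cap concrete, here's a rough Python sketch. The function name and the sample sub-scores are made up purely for illustration (they just happen to add up to 33), and it uses the plain cap of 30 rather than the "maybe 31" variant.

```python
def capped_section_total(sub_scores, section_max):
    """Add up a section's sub-section scores, counting at most section_max."""
    return min(sum(sub_scores), section_max)


# Hypothetical Character sub-scores that total 33 against a section max of 30.
character = [12, 6, 8, 7]
print(capped_section_total(character, 30))  # 30
```

    Anything at or under the cap passes through unchanged, so only the shows that overflow a section are affected.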

    That's all I got for now, so what do you think of those changes? Horror, Psychological, and Mystery may also need some tweaking/compensation for Atmosphere. I'll play around with the idea of a multiplier - I was thinking 1.05 in the following formula:
    FinalMark=(Total-50)*Multiplier +50
    That is, only the difference between Average and the Total is changed. This has the added benefit of pulling bad scores slightly further down, which corrects for likely generosity, and pulling the top end further up, as it recognises 100/100 is unattainable. A show would need a raw mark of 97.6 to reach 100/100 - at that point I think the distinction is moot, so it'll work. I should say: with this added complication a spreadsheet is necessary for saving time and reducing errors.
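
    For anyone who'd rather check the numbers in code than in a spreadsheet, here's a minimal Python sketch of that formula. The function names are my own invention, and the average of 50 and multiplier of 1.05 are just the values floated above, not settled ones.

```python
AVERAGE = 50.0  # the "nothing special" midpoint the formula pivots around


def final_mark(total, multiplier=1.05, average=AVERAGE):
    """Stretch the gap between a raw total and the average by the multiplier."""
    return (total - average) * multiplier + average


def raw_needed(final, multiplier=1.05, average=AVERAGE):
    """Invert final_mark: the raw total required to land on a given final mark."""
    return (final - average) / multiplier + average


print(f"{final_mark(30):.1f}")   # 29.0  (a below-average score is pulled down)
print(f"{final_mark(50):.1f}")   # 50.0  (the average is left alone)
print(f"{final_mark(90):.1f}")   # 92.0  (an above-average score is pulled up)
print(f"{raw_needed(100):.1f}")  # 97.6  (raw total needed to reach 100)
```

    Note that the sketch doesn't clamp anything, so a raw total above roughly 97.6 comes out over 100. Whether that should be capped is a separate choice.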

    1. Sorry for the late reply, I was working on some other stuff and wanted to give this my full attention.

      I've never heard Mushishi described that way...but that might just fit it perfectly. I certainly hope that you do check it out and enjoy it though - if you read my review, then you'll know that I think very very highly of it.

      Hmm...I think I like that. It lets certain areas of a show make up for others (which is important, as we figured out), but with your "cap" system (over a certain amount doesn't count) shows still can't get "unfairly" far ahead of others.

      Ooh! That's a very interesting subsection indeed! I honestly have no idea how much I would want that to count for...but it would certainly make up for other areas that are neglected because they aren't focused on. That could work very well, especially with the cap system.

      Ah yes, atmosphere. I honestly feel like they would be extensions of style, so you'd only need to figure out style. That's just my opinion, but...
      As for multipliers...I am so out of my league. :) Just plugging in your formula, it seems very workable. If you ever go down that route and find a multiplier you really like (though this may be good enough anyways), then you should totally shoot me an e-mail or comment about it. I doubt I'd be any good for giving advice, but it'd be cool to see what you come up with.

    2. Don't worry, Mushishi is pretty high on my to-watch list.

      As for style:
      While I wholeheartedly agree that style is very important, as a concept it is kinda vague. I mean, after breaking down those sections into sub-sections, "style" doesn't seem to mean much. That's why I have atmosphere, pacing, art - they are all parts of "style". They target your thinking to the different aspects of style.

      Some more suggestions (I know I keep coming up with them, sorry):
      1)Uniqueness - This is an idea I really like. You use rewatchability, which works for you, but for me rewatching something is difficult at best. Maybe in the distant future...
      Anyway, this is what I think style is trying to do. It's to set the show apart from the crowd - e.g. a different art style (think Madoka, Lupin III, Katanagatari). If it's memorable it's for uniqueness.
      Now, this is really good because even a fanservice anime can get good marks if good enough. If your score for uniqueness affected the multiplier, that'd be even better - it recognises that style accents the substance, but the substance has to be there.

      2) A redo of the Character section. It doesn't consider the differences between main, major side, and minor side characters. Also, glancing down my anime list, change seems limited to certain genres of show. Putting it in the original "general" system was wrong. Interactions and Individuality are quite important though. Instead of empathy it should be "how much you enjoy watching the characters" - a bit wordy but would not exclude/penalise the OTT comedic characters.

      3)Execution could be split into Pacing and Atmosphere, with a possible +2 for atmosphere.

      Do you have MS Excel? And if so, what version? I was thinking I could make a spreadsheet and email it to you to try out (it makes mass-scoring infinitely easier). Or is there another program you would recommend?

      As for late replies: this is just an idea I'm running with to see where it goes. Any input you (or anyone else for that matter) gives is greatly appreciated, but this is pretty low-priority imho - I mean, it's hardly got a deadline.

    3. Oh, definitely. Plus, its definition within anime isn't even clear. It's the uncharted territory of rating guides for good reason. :)

      Don't apologize! My intent was to create discussion, and you've done that on a level I hadn't even dreamed of. I'm loving it.

      I really like that idea! And honestly, with a few basic guidelines ("Is the plot unique?" "Are the visuals different? In what way?" kind of thing), it wouldn't be too hard to avoid being arbitrary (which is the bane of all good ratings).

      I don't think there were any critical errors in the original system, but those changes seem pretty logical.

      As for Execution...I honestly don't know. I feel like I'd need to see the entire completed system before I could pass judgement on it. At a cursory glance, though, it certainly seems workable. Since you have the caps, it wouldn't get out of hand, so...

      I'm afraid I don't have Excel. My laptop is pretty barebones in terms of software...I do have OpenOffice, though (currently at 3.3, though I think there's an update. I don't know if it even has what is needed, but...). Like I said, I honestly don't think I'll be much help here. I'm a total amateur when it comes to spreadsheets, multipliers and the like. I mean, I'm happy to try, I just won't be much help.

    4. All right, after all that discussion I've finally got a v2. It's in the email I sent to

      I've attached a visual representation (JPEG) and 2 spreadsheets. The first is blank, the second has the series I've watched with some of their scores filled in.

      The "Insert" section: leave blank if not relevant. -2 means it detracts from the viewing experience, 0 means no difference, 2 means really good.

      The same basic guidelines still apply:
      0 - non-existent
      30 - lacking
      40 - missing something
      50 - nothing special
      60 - above average
      70 - great
      80 - excellent
      90 - amazing
      100 - A real gem
      >100 - If a section is done extremely well, add the bonus. For all (except inserts) this is only worth 1 point per section.
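
      If it helps when filling in a spreadsheet, here's a small Python sketch that turns a percentage into the nearest guideline label at or below it. Rounding down to the lower label is my own assumption (the list above only names the anchor points), and the names in the code are hypothetical.

```python
from bisect import bisect_right

# Anchor points copied from the guideline list above.
GUIDELINES = [
    (0, "non-existent"),
    (30, "lacking"),
    (40, "missing something"),
    (50, "nothing special"),
    (60, "above average"),
    (70, "great"),
    (80, "excellent"),
    (90, "amazing"),
    (100, "a real gem"),
]
THRESHOLDS = [threshold for threshold, _ in GUIDELINES]


def describe(score):
    """Return the label of the highest anchor that the score reaches."""
    if score < 0:
        raise ValueError("score must be 0 or higher")
    index = bisect_right(THRESHOLDS, score) - 1
    return GUIDELINES[index][1]


print(describe(87.5))  # excellent
print(describe(93))    # amazing
```

      Scores over 100 (from bonuses) simply keep the top label in this sketch.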

      I suppose what'd be most useful would be if you tried out a range of anime (genres as well as ratings), and said how close they are to what you think they should be (be honest with yourself here) and if there are any things over/under-weighted.

      Also: I gave up on the multiplier idea, and originality is a consideration for most sub-sections. Sorry I haven't further elaborated on what the sub-sections mean (time), but if any are not obvious reply and I'll elaborate.
      -I changed the name of sub-section "Story" to "Overall Coherence" in the spreadsheet. There may be some more minor changes, please inform me of any inconsistencies.

      The Character Section:
      --This means if the characters aren't just overused stereotypes, they seem to be their own person - character quirks and facial expressions are important here.
      --How rounded a character is - even OTT/comedic relief characters can be somewhat complex.
      --Do they react realistically (not out-of-character) to events? Do they grow/mature/change (if applicable)?
      --Side characters are supporting characters - are their positions and actions believable?

      Main chars are the ones who appear in most eps, and have large roles
      Major side chars are supporting chars, with their own roles, but appear less often and are less important. Usually recurrent, perhaps with diminished roles.
      Minor side chars are "situation chars" - i.e. they set up situations into which the other chars are pushed. Very rarely recurrent, but still can be their own person.

      --Character design, backgrounds, settings, colour choices - detail
      -Own Art Style
      --0 is inappropriate, 1 is normal anime art, 2 is experimenting on occasion with own style, 3 is completely different art (e.g. Katanagatari, Shaft)
      -Character Expression
      --how well the characters are animated - their facial expressions, body language, character quirks (e.g. nervous tics)

      Memorability is if you'll remember the anime for the right reasons, and how much it stands out from the crowd.

      P.S. I started this comment thinking it'd be quite short. Don't think it turned out quite as planned...

    5. Just wanted to let you know I got your e-mail. I'll check it out sometime soon, after I make a little headway on some posts I'm working on.

      And hey, long comments are par for the course at this point.