Friday, October 2, 2009

What’s the Best Way to Collect Ratings?

YouTube published (h/t TechCrunch) an interesting graph of its video ratings earlier in the week.

YouTube uses a five-star scale for rating videos, and according to them, rating a video one star means you “loathe” it, while rating a video five stars means you love it.

The data show that an overwhelming majority of the total ratings are five-star, with one-star coming in a very distant second. Use of two-, three- and four-star ratings was negligible, little more than a rounding error.

These data expose a few problems with using a five-star system for rating artifacts. First off, it seems counter-intuitive to give something any stars at all if I “loathe” it, and more broadly, YouTube’s users seem motivated only to reward excellence, which makes the rating system essentially worthless for differentiating videos.

These behaviors suggest a Digg/Bury (vote up/down) model would work better, likely because there is a fair amount of overlap between heavy YouTube users and heavy Digg users. I wonder if YouTube will change models, or try to motivate users to be less binary.

We’ve had similar frustrations with voting models in Connect.

The IdeaFactory initially launched with a five-star system for rating ideas, and we noticed issues with the ratings. However, the experience was the opposite of what YouTube reports; we saw the majority of ideas getting four and five stars, which seemed to indicate that people wanted to show support for colleagues.

Very few ideas were rated lower than three, I suppose for fear of insulting the creator, even though ratings were anonymous.

So, what we had was a virtual heap of scads of ideas, nearly all rated between four and five stars.

I have this same issue with Netflix. They use a five-star system, and I constantly run into problems choosing between three and four stars.

I don’t think these problems would be solved by adding more values, e.g. more stars or half-stars. Maybe the star model itself is broken.

We considered this when we built Oracle Mix back in late 2007 and decided to go with a Digg-style voting model, minus the buries. So, people could agree (+1) with any number of ideas.

When we moved Connect to the Mix codeline, we converted the star ratings to votes. Basically, if you rated an idea, we turned that into a +1 vote. Not very scientific, but the stars were pretty arbitrary anyway, so we figured it would be fine.
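To illustrate, here’s a minimal sketch of what that kind of conversion might look like. The Rating and Vote structures are hypothetical, made up for this example; they aren’t Connect’s actual data model.

```python
# Hypothetical sketch of a star-rating to +1 vote migration.
# These classes are illustrative only, not Connect's real schema.

from dataclasses import dataclass

@dataclass
class Rating:
    user_id: int
    idea_id: int
    stars: int  # 1-5

@dataclass
class Vote:
    user_id: int
    idea_id: int
    value: int  # always +1 in this model

def convert_ratings_to_votes(ratings):
    """Collapse every star rating into a single +1 vote.

    Any rating, whether one star or five, becomes one vote for the idea,
    which mirrors the 'if you rated it, that's a +1' rule described above.
    """
    return [Vote(r.user_id, r.idea_id, value=1) for r in ratings]
```

So a five-star rating and a one-star rating on the same idea would each come out the other side as a single +1, which is exactly why the conversion was blunt but good enough.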

We decided to avoid buries (or down votes) to promote harmony among voters. A vote against something seems pretty harsh; not voting for something already signals that you don’t think it’s a good idea, while voting against it carries some extra payload, like you actively don’t want it.

I’m not sure if it mattered, and a few people on Mix had legitimate use cases for down voting ideas. Still, it seemed like the best course of action.

Incidentally, this model was in the news this week too, when the Google Reader team opened up Product Ideas for Google Reader, where you can submit product ideas and vote for or against them.

I love Reader and embrace this opportunity to contribute to its direction. So, I submitted a couple of ideas myself (open source the engine and add a proxy setting), but so far, both have more votes against them than for them. I find myself wondering why people would take the time to vote against an idea.

What’s the harm in open sourcing the Reader engine so I can install it behind my firewall and read internal feeds? Similarly, what’s the harm in adding a proxy setting to read feeds behind a firewall?

If anything, this makes me want to vote competitively against other ideas.

This is exactly the behavior we wanted to avoid.

Still, the voting model is diluted by the fact that there is no limit on the number of votes a person can cast. With no scarcity of votes, everything decent gets voted up, causing essentially the same problem the star system has.

In the latest revision to Connect, we converted all votes to likes. Likes are more in line with what voting had become, but there was some blowback around the conversion. Nothing major though; I suspect everyone felt that voting was pretty diluted anyway.

Obviously, likes don’t really rank artifacts. So, the question remains: if you really want to rate artifacts, what’s the best method?

My experience tells me that star and +/-1 systems don’t work well enough.

What might work? I think you need something scarce, like a currency, to create a market and prevent excessive spending on any artifact (idea) that looks decent. I’m not convinced that down voting is necessary in a market, although I suppose there are cases where you’d want to stop an idea that might do harm. Those cases are rare though.
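To make that concrete, here’s a rough sketch of what a scarcity-based voting market could look like, assuming each user gets a fixed budget of vote credits to spend across ideas. The budget size, class name, and methods are my own illustration, not anything we’ve built.

```python
# A minimal sketch of a scarcity-based voting market. Each user gets a
# fixed budget of credits; spending them on ideas is what ranks the ideas.
# Names and numbers here are assumptions for illustration only.

class VoteMarket:
    def __init__(self, budget_per_user=10):
        self.budget_per_user = budget_per_user
        self.spent = {}    # user_id -> credits spent so far
        self.totals = {}   # idea_id -> credits received

    def remaining(self, user_id):
        return self.budget_per_user - self.spent.get(user_id, 0)

    def vote(self, user_id, idea_id, credits=1):
        """Spend credits on an idea; fails if the user's budget is exhausted."""
        if credits <= 0:
            raise ValueError("credits must be positive")
        if credits > self.remaining(user_id):
            raise ValueError("not enough credits left")
        self.spent[user_id] = self.spent.get(user_id, 0) + credits
        self.totals[idea_id] = self.totals.get(idea_id, 0) + credits

    def ranking(self):
        """Ideas ordered by total credits spent on them."""
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
```

The point of the budget is that backing one idea heavily means having less to spend elsewhere, so the totals actually separate the ideas people care about from the ones that merely look decent.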