Why Online Ratings Don’t Work

Recently I came across an article in the Wall Street Journal about online ratings: “On the Internet, Everyone’s a Critic But They’re Not Very Critical.” The article, which surveys a number of online properties, cites the tendency of average ratings to cluster around 4.3 stars out of 5. Its authors pretty much capture what many of us grasp intuitively about why online ratings don’t work, but I thought I’d break this down from a social interaction design perspective to get at some of the causes. First and foremost is the fact that most online systems built to capture user tastes, preferences, and interests engender bias. And online media amplify that bias, for a number of reasons.

This bias originates in the user’s intention, which goes unknown and is not captured by the rating system itself. The reasons a user may have for rating something are many: a mood, an attitude, a personal interest, a habit of use, interest in getting attention, building a profile, promoting a product, and so on.

Amplified distortions

Social media, because they provide indirect visibility in front of a mediated public, amplify any distortion baked into the selection itself (a selection being the act of rating something). This amplification is explained in part by the de-coupling of selective acts (rating) from consequences and outcomes.

Selections are de-coupled from personal consequences, which excuses a certain lack of accountability and responsibility. Selections are de-coupled from their contexts of use, which range from personal utility to social promotion. And selections are de-coupled from social implications, which removes the user from his or her contribution to a social outcome (e.g., highly rated items look popular).

Consider the reasons a user may have for making a selection (rating something). They include:

  • personal recollection (like favoriting)
  • to inform a recommendation engine (so that it can make better personal recommendations)
  • because the item is a favorite (sharing favorites)
  • because the social system has no accountability
  • because it always creates the possibility of recognition for the user
  • because it promotes the item
  • because it’s nice (socially; possibly karmic)
  • because it’s a gesture about how the user felt

Social selections are thus encumbered by ambiguity: of intent, of meaning, of relevance, and of use.

Can these be addressed and resolved by better system design? Or can they only be resolved by social means?

Considerations

It might be possible to couple ratings to outcomes. This would involve new sets of selections and activities, made available to other users and used to create consequences. Users would then weigh those consequences when making a rating selection.
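
To make this concrete, here is a minimal Python sketch of what such coupling might look like in the data model. All names here (Rating, Outcome, record_outcome, notify_rater) are hypothetical illustrations, not an existing API: the idea is simply that a rating keeps references to the consequences it helped produce, and those consequences are surfaced back to the rater.

```python
from dataclasses import dataclass, field

@dataclass
class Outcome:
    """A downstream consequence a rating contributed to (hypothetical)."""
    description: str     # e.g. "item entered the top-10 list"
    affected_users: int  # how many users saw or acted on the result

@dataclass
class Rating:
    user_id: str
    item_id: str
    stars: int
    outcomes: list[Outcome] = field(default_factory=list)

def notify_rater(user_id: str, outcome: Outcome) -> None:
    """Assumed notification hook; a real system would message the user."""
    print(f"[to {user_id}] your rating helped: {outcome.description}")

def record_outcome(rating: Rating, outcome: Outcome) -> None:
    """Attach a consequence to the rating and surface it to the rater,
    so that future ratings are made with consequences in view."""
    rating.outcomes.append(outcome)
    notify_rater(rating.user_id, outcome)
```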

Contexts of use could be distinguished, so that users rate with greater purpose. This would involve creating new views of rated content, such as “rate your favorite item this week,” “rate your favorite genre,” “rate your personal favorite,” “rate which you think is the best,” and so on. Each of these distinctions, if followed by users (!), would specify the selection by means of a different social purpose.
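
One hedged sketch of how such contexts might travel with each rating, so that aggregation can keep “best” judgments separate from personal-favorite gestures. The enum values mirror the examples above; submit_rating and its record shape are assumptions.

```python
from enum import Enum

class RatingContext(Enum):
    FAVORITE_THIS_WEEK = "favorite item this week"
    FAVORITE_GENRE = "favorite genre"
    PERSONAL_FAVORITE = "personal favorite"
    BEST_IN_CLASS = "which you think is the best"

def submit_rating(user_id: str, item_id: str, stars: int,
                  context: RatingContext) -> dict:
    """Tag every rating with its declared purpose so downstream views
    can aggregate each context separately."""
    return {"user": user_id, "item": item_id,
            "stars": stars, "context": context.value}

# Example: the same item rated under two different purposes.
print(submit_rating("u1", "movie42", 5, RatingContext.PERSONAL_FAVORITE))
print(submit_rating("u2", "movie42", 3, RatingContext.BEST_IN_CLASS))
```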

Reduce ambiguity

It might be possible to reduce ambiguity by means of cross-referencing, achieved through algorithms and relationships set up in the data structure. Without detailing these here, such mechanisms would probably include means by which to distinguish:

  • the user’s own bias, measured in terms of personal tastes
  • the domain expertise of the user, as demonstrated by the ratings he or she has provided on other items, and the categories/genres/domains they fall in
  • the social communication and signaling style of the user, which would reveal some of his/her relation to the social space
  • use by other users and the public, as a measure of relevance

Cross-references could then be applied when aggregating ratings, filtering and sorting the ratings sourced for averaged results. Theoretically, the system would be able to identify experts, promoters, favoriters, and others by their practices.
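
Without presuming what the actual algorithms would be, here is one minimal sketch of folding such signals into an aggregate score. The signal scales and the blend below are illustrative assumptions: each rater gets a weight derived from bias, expertise, and usage signals, and the item’s score becomes a weighted mean rather than a raw average.

```python
def rater_weight(bias: float, expertise: float, usage: float) -> float:
    """Combine per-user signals into a single weight (assumed 0-1 scales).
    bias:      0 = balanced tastes, 1 = strongly skewed
    expertise: 0 = none in this domain, 1 = demonstrated expert
    usage:     0 = ratings never referenced by others, 1 = widely used
    """
    # Down-weight biased raters; up-weight experts whose ratings get used.
    weight = (1.0 - 0.5 * bias) * (0.5 + 0.5 * expertise) * (0.5 + 0.5 * usage)
    return max(0.05, weight)  # floor so no rater is silenced entirely

def aggregate(ratings: list[tuple[int, float]]) -> float:
    """Weighted mean over (stars, weight) pairs."""
    total = sum(w for _, w in ratings)
    return sum(s * w for s, w in ratings) / total if total else 0.0

# Example: an expert's 2-star rating outweighs a drive-by 5-star,
# pulling the score toward 2 rather than the raw mean of 3.5.
scores = [(5, rater_weight(bias=0.9, expertise=0.1, usage=0.1)),
          (2, rater_weight(bias=0.1, expertise=0.9, usage=0.8))]
print(round(aggregate(scores), 2))  # ~2.51
```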

Social solutions

Social solutions might be created to supply distinctions among the different kinds of social capital involved in ratings, such as the following (a sketch of how these might combine appears after the list):

  • the user’s expertise (domain knowledge)
  • trust capital, or the user’s standing within his/her social graph
  • credibility capital, or the user’s believability, as measured in loyalty perhaps
  • reputation capital, or the tendency of the user’s ratings to be referred to and cited beyond his/her immediate social graph
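
If each form of capital were scored, the scores could be blended into a weight applied to the user’s ratings. A minimal sketch, assuming each kind of capital has already been normalized to a 0-1 score; the blend coefficients are purely illustrative:

```python
def social_weight(expertise: float, trust: float,
                  credibility: float, reputation: float) -> float:
    """Blend the four kinds of capital into one rating weight.
    Inputs are assumed to be normalized 0-1 scores; the coefficients
    are an illustrative choice, not a recommendation."""
    return 0.4 * expertise + 0.2 * trust + 0.2 * credibility + 0.2 * reputation

# Example: a domain expert with modest reach still earns a solid weight.
print(social_weight(expertise=0.8, trust=0.5, credibility=0.6, reputation=0.3))  # ~0.6
```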

Finally, rating systems can diversify the possibilities for making selections, and separate communication from rating selections, so that ratings are used less for visibility and attention-seeking (e.g., by users who rate a lot).

There are too many kinds of socially-themed activities and practices in which ratings play a part for me to delve into them all here. But each theme could be examined for the social benefits of ratings: for how they attribute value to the user, add value to content, and distinguish social content items so as to produce shared social and cultural resources. Those distinctions could then be used to isolate different rating and qualification systems so that they are tighter and less biased.

Adrian Chan

Adrian Chan is a social media expert and social interaction theorist at Gravity7. You can follow him on twitter at /gravity7

7 comments on this article

  1. I’m a sucker for the one-star product reviews on Amazon.

  2. Cory on

    I wish Google would read this and open up some of its data to either support or refute what you’ve said. From the search side of things, I think the last bullet in the Reduce Ambiguity section plays a role in ranking sites for geo-targeted queries, but it gets incredibly interesting when user content comes into play as well: assigning numerical values to terms like “good,” “best,” “dark,” and “slow.” Nice post!

  3. maureen on

    While doing my own research on rating systems I found this great little article (with graphs!) about the ratings in different Yahoo communities: http://buildingreputation.com/writings/2009/08/ratings_bias_effects.html

    In the article the authors write about the one community that *did* have a healthy mix of ratings, Autos Custom.

    “Looking more closely at how Autos Custom ratings worked and the content was being evaluated showed why 1-stars were given out so often: users were providing feedback to other users in order to get them to change their behavior. Specifically, you would get one star if you 1) Didn’t upload a picture of your ride, or 2) uploaded a dealer stock photo of your ride. The site is Autos Custom, after all! The 5-star ratings were reserved for the best-of-the-best. Two through Four stars were actually used to evaluate quality and completeness of the car’s profile. Unlike all the sites graphed here, the 5-star scale truly represented a broad sentiment and people worked to improve their scores.”

  4. Thanks all for your comments!

    I think the behaviors we see today indicate how well, or how poorly, these systems are working in their current social context and within existing constraints on implementation. Some contexts (e.g., Autos Custom) may have turned into gestural and signaling systems (as you note, maureen). Some could become richer (Scott) with better filtering and pre-qualification. I don’t know if Google will open up its data, or Facebook, but I do think that in the distributed conversation space (activity streams), attempts are being made to capture and share more metadata. Analytics engines and metrics providers will jump on this as it becomes available. It’ll take time, but it seems inevitable. And I think it’s worth the social design community thinking about, for there are certainly design issues involved, both of UI and of social interaction models.

  5. Tyesha on

    I’ve found that when business needs trump user needs in the design of rating systems, the ratings lose value and fall victim to what you’re discussing here.

    Many of these systems are designed for easy entry in order to collect as much input as possible, so that the site (business) appears successful and supported by an engaged community. Adding the functionality necessary to garner better ratings would cut into the volume of ratings.

    But I do see this changing: businesses are starting to understand the real value of good UGC.

    Thanks for the article; it articulates a lot of what I’ve been trying to discuss with my clients. I’ll get a lot of use out of it!
