Why You Shouldn’t Trust Automated Sentiment Scoring
by Jason Falls

Automated sentiment scoring has become the feature du jour of social media monitoring platforms of late. So many services were offering it that even big players like Radian6 had to bring it to the table for fear of losing prospects. It’s no longer enough to tell brands how many conversations are being had. Social media monitoring services must now report on whether or not people like us.

A few months ago, we discussed automated sentiment scoring with Jeff Catlin, CEO of Lexalytics, a leading natural language processing firm whose engine powers many monitoring services’ offerings. His assertion was that while natural language processing can get you close to scoring the sentiment and tone of a given piece of content, teaching computers to recognize sarcasm, false positives and the like is a significant challenge. Looking at sentiment scoring across a large data set gives you a more accurate view, but it is still an estimate, not an exact measurement.
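To see why that’s hard, consider the simplest possible approach: a lexicon-based scorer that counts positive and negative words and compares the tallies. The Python sketch below is a deliberately minimal toy, not how Lexalytics or any commercial engine works, and its word lists and example sentence are invented for illustration. It shows how sarcasm produces a confident false positive.

```python
import re

# Tiny, invented word lists -- purely illustrative.
POSITIVE = {"love", "great", "brilliant", "fantastic"}
NEGATIVE = {"hate", "awful", "boring", "terrible"}

def score(text: str) -> str:
    """Score text positive/negative/neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

# A sarcastic pan scores "positive" because the scorer only counts words:
print(score("Oh great, another brilliant rerun. Just what I needed."))
# -> "positive" -- a human reader would call this negative
```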

So many hold the belief that sentiment and tone scores from social media monitoring services are pretty good, but not perfect. Still, they’re the accepted mechanism for knowing whether people like you or not. Human analysis is better, but more expensive and potentially cost-prohibitive, particularly for small and medium businesses. So automated scoring is accepted as an industry feature and we’re all happy, right?

Not so fast. Scott Marticke of Sentiment360 reached out to me after my assertion earlier this month that human deciphering of information is something the monitoring services don’t offer. He had some surprising comparison data on automated sentiment scoring vs. human analysis he thought I’d like to see. Sure, he wanted to ensure I knew about Sentiment360, which adds a layer of human analysis to social media monitoring, but it was the disparity in the analysis that struck me in our chat.

In a recent comparison for CBS Television on conversations around the show NCIS, Sentiment360 found as much as a 50 percent swing in sentiment, or lack thereof, between machine and human analysis. Essentially, once you let humans analyze the data, the machine-produced results are crap.

Sentiment360's human vs. machine sentiment analysis

Here’s a run-down of their comparison:

  • Sentiment360 used an unnamed social media monitoring service to collect the data. (They’re tool agnostic and say they use several different ones depending upon the client need. Companies they report as part of their arsenal include Radian6 and ScoutLabs.)
  • The results showed 50,000 conversations around the show in a given month with the search performed in March of this year.
  • According to the service’s automated scoring, of the 50,000 conversations, 84 percent were neutral or passive mentions, 11 percent were positive and 5 percent were negative. NCIS is talked about a lot, and more positively than negatively, but the vast majority of the conversations don’t hold an identifiable opinion.
  • Sentiment360 pulled a sample of 3,000 of those conversations and had their analysts go to work. The results show just the human analysis of the sample, but at 6 percent of the total data set, that sample is far larger than most market research firms offer.

And here’s what Sentiment360’s analysis found. Some of these numbers astounded me:

  • 23 percent of the entries were irrelevant. They mentioned NCIS or linked to the show but contained no other qualifying information, or they were spam sites.
  • Once the irrelevant entries were removed, only 30 percent of the entries reviewed were found to be neutral or passive, a 54-percentage-point drop from the machine’s score (the arithmetic is sketched in code after this list).
  • Human analysis found that 63 percent of the online conversation around NCIS was positive, not 11 percent as the machine asserted.
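To make the swing concrete, here’s a quick back-of-the-envelope sketch in Python using the figures quoted above. One caveat: the negative share in the human analysis isn’t reported in the comparison, so the roughly 7 percent below is simply what remains after the 30 percent neutral and 63 percent positive; treat it as implied, not stated.

```python
# Back-of-the-envelope reproduction of the figures quoted above.
total = 50_000                          # conversations in the March search
machine = {"neutral": 0.84, "positive": 0.11, "negative": 0.05}

sample = 3_000                          # entries the human analysts reviewed
print(f"sample share: {sample / total:.0%}")         # 6% of the data set

irrelevant = 0.23                       # spam or no qualifying mention
relevant = round(sample * (1 - irrelevant))          # 2,310 entries remain
print(f"relevant entries reviewed: {relevant}")

human = {"neutral": 0.30, "positive": 0.63}
human["negative"] = 1 - human["neutral"] - human["positive"]  # ~7%, implied

for label in ("neutral", "positive", "negative"):
    swing = (human[label] - machine[label]) * 100
    print(f"{label:>8}: machine {machine[label]:.0%} vs "
          f"human {human[label]:.0%} ({swing:+.0f} points)")
#  neutral: machine 84% vs human 30% (-54 points)
# positive: machine 11% vs human 63% (+52 points)
# negative: machine  5% vs human  7% (+2 points)
```

Even granting some rounding, the neutral and positive buckets essentially trade places once humans look at the data.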

Looking at the comparison, a couple of thoughts came to mind. In using several monitoring solutions, I’ve noticed a great deal of the automated scores I see are passive or neutral, which makes them all but useless to a brand. I’ve also noticed an awful lot of irrelevant posts appearing in searches, almost regardless of how precisely you tune your keyword queries. If human analysis shows that 23 percent of the results are irrelevant and that more than half of the results the machine calls passive or neutral can actually be scored, then automated scoring needs to get a LOT better.
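For what it’s worth, even a crude filter illustrates what a keyword search alone misses. The Python sketch below is purely illustrative and is not Sentiment360’s method: it assumes a hypothetical list of show-related context terms (characters, the network, viewing language) and flags NCIS mentions that carry none of them, which is where link-only spam tends to live.

```python
import re

# Hypothetical show-related context terms -- invented for illustration.
CONTEXT = re.compile(r"\b(episode|season|gibbs|abby|cbs|watch(ed|ing)?)\b", re.I)

def looks_relevant(post: str) -> bool:
    """Keep a mention only if it carries some show-related context."""
    if "ncis" not in post.lower():
        return False
    # Link dumps and keyword-stuffed spam rarely carry any context terms.
    return bool(CONTEXT.search(post))

print(looks_relevant("Watched last night's NCIS episode, Gibbs was great"))  # True
print(looks_relevant("NCIS ringtones!!! click here http://spam.example"))    # False
```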

Catlin was right. Automated sentiment scoring can only get you so far. And if this experiment is representative of what would happen with your brand, I’d say automated sentiment scoring doesn’t get us very far at all.

Don’t get me wrong: I’m not saying the people behind automated scoring aren’t working hard or making a difficult task easier. I am saying, however, that we need to be clear that letting a machine supply this particular piece of marketing intelligence is flawed. It’s not that we shouldn’t do it, but that if we do, we must understand the limitations and prioritize the intelligence accordingly.

You can certainly question the human analysis. Sentiment360 uses analysts in the Philippines. Outsourcing overseas is cheaper and lets them offer human analysis at a much lower price point than the big research firms. But they say all their analysts are graduate or post-graduate level, and they point out that the Philippines is the third-largest English-speaking country in the world. It was also once a U.S. colony, so there’s a faint cultural commonality, too.

But what you can’t really question is who believes the results. Saatchi & Saatchi’s New York operation just named Sentiment360 as its preferred social media listening provider, with VP for Digital Strategy Shel Kimen saying, “Sentiment360 demonstrated that their combination of machine listening and human analysis provided us with excellent intelligence. We had looked at a number of their competitors and Sentiment360 excelled in quality of the analysis, ROI and delivery time.”

Not bad for a firm that only opened its doors in December.

Sentiment360 is going to run you around $7,000 per month, so it’s still cost-prohibitive for small to medium businesses. But more importantly, they’re helping us all see that the machines are good but not great (or, depending upon your perspective, not really all that good at all) and that natural language processing has a long way to go.

I don’t see Sentiment360 as a competitor for many social media monitoring services because of their price point. They’ll hobnob with the major brands and do well, but for companies that need to keep monitoring costs under $2,000 per month (which is the majority of companies), Sentiment360 doesn’t fit.

What I do see them doing, however, is forcing companies like Lexalytics, and even the core social media monitoring services, to either improve their algorithms faster or add a layer of human analysis on top of what they offer.

Have you or your company conducted similar experiments with machine vs. human analysis? How about service vs. service analysis? If so, please share your results or thoughts in the comments. If not, go try it and report back. It will make the industry better as a whole.



About the Author

Jason Falls
Jason Falls is the founder of Social Media Explorer and one of the most notable and outspoken voices in the social media marketing industry. He is a noted marketing keynote speaker, author of two books and unapologetic bourbon aficionado. He can also be found at JasonFalls.com.

