Social Sentiment’s Missing Measures

Social-analytics accuracy is essential, whether you seek a broad understanding of attitudes that affect your business, early warning of emerging threats, or a way to spot individual issues for customer care. Yet as I wrote in 2012, “focus on accuracy distracts users from the real goal, not 95% analysis accuracy but rather support for the most effective possible business decision making.” The most accurate and sophisticated listening/response program will fail if you’re not measuring values that matter and communicating useful and usable insights.

“Emotions tend to signal ‘what really matters’,” according to Rosalind Picard, professor at the MIT Media Lab. Given emotions’ importance, sentiment analysis is key to effective social intelligence. It’s also one of my focus areas: I organize a conference on sentiment and other “human data,” the Sentiment Analysis Symposium, coming up March 5-6 in New York. I will concentrate on sentiment in this article, aiming to bring out principles that apply to all of social analytics. Let’s take a look then at social sentiment’s missing measures, and at other metrics-related steps we can take to improve our business decision making.

Over-Simplification

Over-simplification is issue #1. I find that business analysts are too often satisfied with crude positive/negative sentiment scoring. We can do better. There are two issues here. The first is the over-simplification involved in shoe-horning sentiment into two catch-all categories. The second is the practice of scoring itself. Scoring is useful but reductionist; if you stop with a score, you will fail to see indicators and explanations that lie beyond. Add to these a third, linked point: If you treat analysis dashboards as the measurement end-product, you miss out on truly valuable insights that can be gained by studying abstract attributes such as engagement, advocacy, and connection. And a fourth point: For true social intelligence, you need to reach beyond the metrics and indicators to explore root causes.

Let’s evaluate each of these points in turn.

I define sentiment analysis as a systematic attempt to identify, quantify, and exploit attitudes, opinion, and emotion in online, social, and enterprise sources. Attitudes, opinion, and emotion: These reflect everyday “affective” states; they convey our feelings about a person, product, brand, or policy.

We have the computing power and the data to classify sentiment in fashions that make business sense. Missing measure #1 is sentiment classified beyond positive/negative polarity, categorized instead according to business-aligned splits: promoter/detractor, satisfied/disappointed, happy/sad/angry, or whatever means business for you. Flexible automated methods, for instance supervised and unsupervised machine learning (that is, with and without predefined categories), but also expert or crowd-sourced human analysis, turn these analyses into just another classification problem. So don’t accept categories that aren’t the best match for your business problem.

Who gets this point? Vendors Kanjoya and Crimson Hexagon for two, but they, and other vendors I’ll cite in this article, are exceptions to the rule.
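To make the “just another classification problem” point concrete, here is a minimal sketch of a supervised classifier trained on business-aligned labels rather than positive/negative polarity. It is a toy Naive Bayes model in plain Python. The four training sentences, the promoter/detractor labels, and the names `train` and `classify` are all illustrative assumptions, not any vendor’s method, and a real deployment would need far more labeled data plus a proper accuracy evaluation.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def classify(model, text):
    """Return the most probable label under Naive Bayes with add-one smoothing."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in label_counts.items():
        logp = math.log(n_docs / total_docs)  # prior for this label
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical hand-labeled examples using business-aligned categories.
TRAINING = [
    ("I would recommend this to all my friends", "promoter"),
    ("Love it, telling everyone to buy one", "promoter"),
    ("Never buying from them again", "detractor"),
    ("Terrible support, I tell people to stay away", "detractor"),
]

model = train(TRAINING)
print(classify(model, "I recommend it to my friends"))
print(classify(model, "Stay away, terrible"))
```

Swapping in satisfied/disappointed or happy/sad/angry only means changing the labels on the training examples; the classification machinery stays the same.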

Scoring Points

Promoter/detractor, which I suggested as a sentiment category pair, is famously measured via the question, “How likely is it that you would recommend Company X to a friend or colleague?” with answers aggregated into a Net Promoter Score (NPS). A sentiment score is similarly a summation of positives and negatives, although in this case, the ratings are extracted via natural-language processing (NLP) from documents or messages or verbatim (free-text) survey responses. Both an NPS and a sentiment score should be considered a starting point for deeper exploration. On their own, and unlike a consumer-finance credit score, neither score is adequate to support decision making.
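For reference, the conventional NPS calculation takes answers on a 0-to-10 scale, counts 9s and 10s as promoters and 0 through 6 as detractors, and subtracts the detractor percentage from the promoter percentage. The sketch below assumes that convention, which this article does not itself spell out; the function name and the sample ratings are hypothetical.

```python
def net_promoter_score(ratings):
    """Return NPS on a -100..100 scale from 0-10 likelihood-to-recommend ratings.

    Assumes the conventional cut-offs: 9-10 = promoter, 0-6 = detractor,
    7-8 = passive (counted in the total but in neither group).
    """
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)

# Two hypothetical customer bases that look nothing alike.
polarized = [10, 10, 0, 0]   # half delighted, half furious
lukewarm = [7, 8, 7, 8]      # everyone mildly content

print(net_promoter_score(polarized))
print(net_promoter_score(lukewarm))
```

Both groups produce the same score, which is one illustration of why a single number is a starting point rather than a basis for a decision.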

Blame the gap on the qualitative, subjective, and highly variable nature of the sentiment source material, typically text. Text-derived scores are unreliable for comparison purposes, carry no explanatory power, and mask other important measures… more on which toward the end of this article. For now, what measures lurk behind sentiment scores?

Behind the Scores

Missing measure #2 is sentiment density. When a lot of feeling is packed into a short text, that’s a message, review, or article your customer care, market research, or product quality staff should home in on. NetBase, via the Brand Passion Index, is an example of a vendor that shows it gets this need.

The challenge is that in mixed messages, positives and negatives may balance out to a near-neutral score. An example along the lines of “The location was great, but I’ve never experienced worse service” (net mildly negative) illustrates why we need sentiment density as a new, distinct measure, to complement sentiment scores. A density measure could also help create comparability across sources, between 140-character-max tweets and long-form product reviews.
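One way a density measure might be defined, offered as a sketch rather than as any vendor’s formula, is total absolute sentiment divided by text length. The code below assumes that definition and uses a tiny hand-made lexicon with made-up weights; a real system would take sentiment values from an NLP engine rather than a four-word dictionary.

```python
import re

# Toy lexicon with hypothetical weights, for illustration only.
LEXICON = {"great": 1.0, "love": 1.0, "worse": -1.5, "terrible": -1.5}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def net_score(text):
    """Conventional score: positives and negatives are summed and can cancel."""
    return sum(LEXICON.get(tok, 0.0) for tok in tokenize(text))

def sentiment_density(text):
    """Assumed definition: total absolute sentiment per token."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(abs(LEXICON.get(tok, 0.0)) for tok in tokens) / len(tokens)

mixed = "The location was great, but I've never experienced worse service"
flat = "The restaurant is on the corner of Main Street"

for text in (mixed, flat):
    print(round(net_score(text), 2), round(sentiment_density(text), 2), text)
```

With these toy weights, the mixed review nets out mildly negative yet carries far more feeling per word than the flat sentence. Dividing by length is also what makes a tweet and a long review comparable.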

Refining sentiment resolution by creating scores for individual features or aspects, for instance for food versus service versus price as reported in restaurant reviews, will help, but you can still have a cancellation effect within each score. How can you surface a cancellation problem? By looking at missing measure #3, variation. A variation (or dispersion) metric would flag a mixed-ratings example like my location/service case. It would tell you to create more-granular scores (e.g., separate scores for service and price) or, if the tool you’re using isn’t capable of that refinement, it would flag cases for closer human examination.
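A variation metric could be as simple as the standard deviation of the individual sentiment values that feed a score. The sketch below assumes that choice and an arbitrary flagging threshold of 0.5; the function name, the threshold, and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def summarize(scores, threshold=0.5):
    """Report the average, the spread, and whether the spread looks 'mixed'.

    `scores` holds the individual sentiment values (per sentence, per aspect,
    or per mention) that would otherwise be collapsed into one number.
    The threshold is an arbitrary illustrative cut-off.
    """
    if not scores:
        raise ValueError("need at least one score")
    spread = pstdev(scores)  # population standard deviation
    return {"mean": mean(scores), "spread": spread, "mixed": spread > threshold}

mixed_review = [1.0, -1.0]   # e.g. great location, poor service
bland_review = [0.0, 0.0]    # nothing much felt either way

print(summarize(mixed_review))
print(summarize(bland_review))
```

Both reviews average to zero, but only the first is flagged, which is the cue to split the score by aspect or to pass the review to a person.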

Similar to variation/dispersion is volatility, missing measure #4, a measure of variation over time. A basic trend line will help you see volatility, if the reporting is at the right frequency. If it’s not, the volatility (significant, rapid swings in mood or feeling) will be hidden. To flag volatility, as an indicator of both risk and opportunity, we need a new measure. Among vendors, the closest I’ve found to supporting these metrics is Social Market Analytics, which targets financial-market analyses.
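To illustrate how reporting frequency can hide volatility, the sketch below uses one simple candidate measure, the mean absolute change between consecutive readings. That definition, the function names, and the sample series are assumptions for illustration, not a description of any vendor’s metric.

```python
from statistics import mean

def volatility(series):
    """Assumed measure: mean absolute change between consecutive readings."""
    changes = [abs(b - a) for a, b in zip(series, series[1:])]
    if not changes:
        raise ValueError("need at least two readings")
    return mean(changes)

def resample_mean(series, window):
    """Average the series over consecutive windows (coarser reporting)."""
    return [mean(series[i:i + window]) for i in range(0, len(series), window)]

# Hypothetical fine-grained sentiment readings that swing back and forth.
fine = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
coarse = resample_mean(fine, 4)  # the same data reported four times less often

print(volatility(fine), volatility(coarse))
```

The fine-grained series shows large swings while the averaged series looks perfectly flat, so a trend line drawn from the coarse report would miss the volatility entirely.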

Now, how else can we improve our social intelligence, and our overall business decision making?

Too Much Information, and Too Little

There is such a thing as too much information. I was caught by one particular response to a blogged eConsultancy question, “If you could ask Facebook for one new metric, what would it be?” The reply from Peter Wood, UK social media director at STEAK, was “I wouldn’t ask Facebook for another measurement. I’d ask it to cull the 95% of metrics that mean absolutely nothing to most social media marketers, let alone clients.”

Definitely, every public-facing organization wants to know what’s being said, online and on-social, about its (and competitors’) brands, products, and people. In social-media measurement, too much information is a distraction, just as too little is, per my missing sentiment measures.

There’s plenty to social intelligence beyond sentiment, so here I’ll refer you to Steve Rappaport of the Advertising Research Foundation, whose Digital Metric Field Guide will be coming out soon. Steve provided me with a preview of the guide, which recommends 197 metrics, based on consultations with 30+ “authoritative metrics sources,” citing nearly 150 research studies and reports, with 12 essays contributed by recognized industry experts. For now, you can read snippets on Steve’s Blog, and also check out Steve’s workshop, The Insider’s Guide to Social Media Measurement, which is one of several offered as part of my March Sentiment Analysis Symposium.

That’s a lot of metrics, 197, and we haven’t heard the last on this topic. Some metrics that are useful to one organization seeking to accomplish a certain task will be of no use to another, in a different industry or with a different task. What I referred to as “abstract attributes,” however (I gave the examples of engagement, advocacy, and connection), will be universally applicable. (Symposium speakers will cover those topics too.) And in all cases, the ability to reach beyond the metrics, to assess and explore root causes, the Why behind the measured What, is of great value.

Reach Through, Reach Beyond

I claimed that text-derived scores carry no explanatory power. For explanations, you have to explore the source material.

Many dashboard products will allow you to reach through to the text (tweets, survey verbatims, product reviews, or e-mail messages) that was the source of measured and reported values. Reach-through is more complicated when you’re working with derived indicators such as advocacy and engagement, but it is possible nonetheless. It is less precise, however, if your dashboard was populated with keyword-reliant technology.

Capable NLP technologies will identify topics and concepts related to keywords — call all that good stuff “features” — and associated relationships and attributes including sentiment. Via these associations, reach-through can be particularly useful. You’ll be able to assess salient information that is topically relevant to your business challenge and explains the stats reported by your metrics and indicators.

Root-cause analysis is an exploratory process rather than a measured quantity; otherwise I’d offer root causes as my fifth and final missing measure. Root causes are essential, as are social-analytics accuracy and the right choice of metrics, in the search for business insight.

Now go and explore.
