Previous HP Labs studies have shown that it is possible to predict the ebb and flow of Twitter “trending topics” by extrapolating from early observations.
In new research, the Labs team takes this predictive capability one step further by inventing a technique to forecast popularity on Twitter for a news article before it is published – with up to 84% accuracy.
The analysis, “The Pulse of News in Social Media: Forecasting Popularity,” by Sitaram Asur, Roja Bandari, and Bernardo Huberman, draws on data from over 40,000 news articles published over nine days in August 2011.
HP Labs’ study reinforces the intuition that the source of news matters most in determining how many tweets will link to a given article. But as social media continues to increase in importance, key questions remain about how users are influenced to act – beyond clicking the “retweet” button – and the nature of journalism in 140 characters.
How the system works
The researchers hypothesized that four factors would determine an article’s popularity:
- The news source that creates and publishes the article
- The category of news the article belongs to (e.g. business, health, or sports)
- Whether the language in the article was emotional or objective
- Whether celebrities, famous brands, or other notable institutions are mentioned
The team then used publicly available tools to assign each news article a score for each of the four factors. For example, the Stanford Named Entity Recognizer was used to identify text representing a person or company name, and the historical prominence of that person or company on Twitter determined the score.
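As a rough illustration of the entity-scoring step, the sketch below assumes the named entities have already been extracted (for instance by a recognizer such as Stanford NER) and that a lookup table of historical prominence exists. The table, its numbers, the `entity_score` function, and the choice of averaging are all hypothetical; the article does not say how the researchers combined the prominence of several entities.

```python
from statistics import mean

# Hypothetical lookup: historical prominence of each entity on Twitter,
# e.g. average tweets per past article mentioning it. Made-up numbers.
HISTORICAL_PROMINENCE = {
    "Apple": 120.0,
    "Lady Gaga": 300.0,
    "Federal Reserve": 45.0,
}


def entity_score(entities, prominence=HISTORICAL_PROMINENCE, default=0.0):
    """Score an article by the average prominence of the entities it mentions.

    `entities` is a list of names already extracted from the article text.
    Unknown entities count as `default`; averaging is an assumption.
    """
    if not entities:
        return default
    return mean([prominence.get(name, default) for name in entities])


# An article mentioning two known entities
score = entity_score(["Apple", "Lady Gaga"])
```

Averaging is only one option here; taking the maximum or the sum would reward articles differently, and the source does not indicate which the team preferred.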
Finally, a prediction for the number of tweets the article would receive was calculated using standard statistical models.
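To make the final step concrete, here is a minimal sketch that assumes the "standard statistical model" is an ordinary least-squares linear regression over the four factor scores. That choice is an assumption made for illustration, since the article does not name the model, and the training numbers, `fit`, `predict`, and `solve` are hypothetical names and toy values rather than the researchers' code or data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(features, targets):
    """Least-squares fit via the normal equations (X^T X) w = X^T y, with an intercept."""
    X = [[1.0] + list(row) for row in features]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row):
    """Predicted tweet count: intercept plus the weighted sum of factor scores."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Toy training set: (source, category, subjectivity, entity) scores per article,
# paired with the tweets each article received. All values are invented.
FEATURES = [
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.0, 0.0, 1.0),
    (1.0, 1.0, 1.0, 1.0),
    (2.0, 1.0, 0.0, 3.0),
]
TWEETS = [5.0, 3.0, 2.0, 2.5, 6.5, 10.5]

weights = fit(FEATURES, TWEETS)
forecast = predict(weights, (1.0, 1.0, 0.0, 2.0))  # score a not-yet-published article
```

Because every input is known before publication, the same `predict` call can be made on a draft article, which is the property the article highlights. A real system would need far more training data and some care with skewed tweet counts.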
(For a detailed look at the methodology, read the full paper.)
Results, limitations, and future research
The model can predict how often an article will be tweeted with up to 84% accuracy. In many ways, it confirms common editorial wisdom: stories that mention celebrities, come from credible sources, and belong to popular categories of news (e.g. technology) are more likely to generate tweets. The difference, of course, is that HP’s software automates this judgment, and does so with a level of accuracy the researchers did not expect.
Interestingly, whether the language in the article was emotional or objective had only a very minor influence on its distribution on Twitter, suggesting that “link bait” headlines and hysterical adjectives don’t propagate messages any further than straightforward reporting.
“The tool we’ve created is not just useful to news organizations that want to increase their stories’ distribution on Twitter,” says Bernardo Huberman, an HP Senior Fellow and co-author of the study. “For example, activists and politicians are increasingly using social media to influence public opinion. By testing their messages using our algorithm, they may be able to improve the visibility of their cause.”
Huberman goes on to say that, in principle, a reporter could swap out her editor for algorithmic analysis, but that doing so would lead to a singular focus on generating tweets at the expense of other priorities. “Social media is an important and growing influence on the public agenda, but it is not the only driver,” he adds.
Indeed, previous research from HP Labs has shown that even on Twitter, influence and popularity are not the same thing. Expect the brilliant minds at HP Labs to build on this research in the future by integrating their influence algorithm to more accurately measure the economics of attention.