Data Central
Official news from HP

HP Labs study forecasts the popularity of news articles on Twitter – before they’re published

Previous HP Labs studies have shown that it is possible to predict the ebb and flow of Twitter “trending topics” by extrapolating from early observations.

In new research, the Labs team takes this predictive capability one step further by inventing a technique to forecast popularity on Twitter for a news article before it is published – with up to 84% accuracy.

The analysis, “The Pulse of News in Social Media: Forecasting Popularity,” by Sitaram Asur, Roja Bandari, and Bernardo Huberman, draws on data from over 40,000 news articles published over nine days in August 2011.

HP Labs’ study reinforces the intuition that the source of news matters most in determining how many tweets will link to a given article.  But as social media continues to increase in importance, key questions remain about how users are influenced to act – beyond clicking the “retweet” button – and the nature of journalism in 140 characters.

How the system works

The researchers hypothesized that four factors would determine an article’s popularity:

- The news source that creates and publishes the article

- The category of news the article belongs to (e.g. business, health, or sports)

- Whether the language in the article is emotional or objective

- Whether celebrities, famous brands, or other notable institutions are mentioned


The team then used publicly available tools to assign each news article a score for each of the four factors. For example, the Stanford Named Entity Recognizer was used to identify text representing a person or company name, and the historical prominence of that person or company on Twitter determined the score.
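
To make the entity factor concrete, here is a minimal Python sketch of one way such a score could be computed. It assumes the named entities have already been extracted from the article, and it treats "historical prominence" as the average number of tweets earned by earlier articles that mentioned the entity. That definition, the HISTORY table, and the function names are illustrative assumptions, not the researchers' code.

```python
from statistics import mean

# Hypothetical history: tweet counts of past articles, keyed by an entity
# each article mentioned. Names and numbers are invented for illustration.
HISTORY = {
    "Acme Corp": [120, 80, 100],
    "Jane Doe": [10, 30],
}

def prominence(entity, history=HISTORY):
    """Average tweets per past article mentioning the entity (0.0 if unseen)."""
    counts = history.get(entity, [])
    return mean(counts) if counts else 0.0

def entity_score(entities, history=HISTORY):
    """Score an article by its most prominent mentioned entity (assumed rule)."""
    return max((prominence(e, history) for e in entities), default=0.0)

# Entities are assumed to come from a named entity recognizer run beforehand.
score = entity_score(["Acme Corp", "Jane Doe"])
```

Taking the maximum is only one possible choice; averaging across all mentioned entities would be an equally reasonable assumption.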

Finally, a prediction for the number of tweets the article would receive was calculated using standard statistical models.
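
The post does not say which statistical models were used, so the sketch below stands in ordinary least squares regression as one plausible example. It fits a weight for each of the four factor scores and predicts a tweet count for a new article. The training rows, the targets, and the function names are all made up for illustration.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Assumes the feature rows are not collinear, so the system is solvable.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend a constant for the intercept
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(weights, row):
    """Intercept plus the weighted sum of the factor scores."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))

# Invented training data: (source, category, subjectivity, entity) scores
# for past articles, with the number of tweets each one received.
past_scores = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0),
               (0, 0, 0, 1), (1, 1, 1, 1), (2, 1, 0, 3)]
past_tweets = [5.0, 3.0, 2.0, 2.5, 6.5, 10.5]

weights = fit_linear(past_scores, past_tweets)
forecast = predict(weights, (1, 1, 0, 2))  # scores for an unpublished article
```

Because every input is a score that can be computed from the article's text and source alone, a forecast like this can be made before the article has been published.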

(For a detailed look into the methodology, read the full paper.)

Results, limitations, and future research

The model can predict how often an article will be tweeted with surprising accuracy.  And in many ways, it confirms common editorial wisdom: stories that mention celebrities, come from credible sources, and belong to popular categories of news (e.g. technology) are more likely to generate tweets. The difference, of course, is that HP’s software automates this process with a level of accuracy the researchers did not expect.

Interestingly, whether the language in an article was emotional or objective had very little influence on how widely it spread on Twitter, suggesting that “link bait” headlines and hysterical adjectives don’t propagate messages any further than straightforward reporting.

“The tool we’ve created is not just useful to news organizations that want to increase their stories’ distribution on Twitter,” says Bernardo Huberman, an HP Senior Fellow and co-author of the study.  “For example, activists and politicians are increasingly using social media to influence public opinion.  By testing their messages using our algorithm, they may be able to improve the visibility of their cause.”

Huberman goes on to say that, in principle, a reporter could swap out her editor for algorithmic analysis, but that would lead to a singular focus on generating tweets at the expense of other priorities.  “Social media is an important and growing influence on the public agenda, but it is not the only driver,” he allowed.

Indeed, previous research from HP Labs has shown that even on Twitter, influence and popularity are not the same thing. Expect the brilliant minds at HP Labs to build on this research in the future by integrating their influence algorithm to more accurately measure the economics of attention.

Comments
Lauren Proctor (anon) | 02-22-2012 08:32 PM

Truly fascinating research.  I've found in my work that there's also a brand authority tipping point that's closely related to what you note about the importance of the news source.  

For example, when a business is first establishing their online presence it's an uphill climb to create buzz and reach across channels.   Then somewhere along the way there's a clear momentum shift we half-jokingly call the industry-celebrity effect.  

It's amazing to see that tipping point but I wonder...as more brands and news sources learn how to use the Internet will we see an unlimited number of authorities and increased sharing or is there a saturation point that means there can only be so many Mashable's, HuffPo's, etc?

Guidelines

Data Central is the official HP corporate blog, brought to you by the corporate communications team in Palo Alto. Before commenting, please read our community guidelines. For more news and press contacts, visit the HP newsroom. Note: all times GMT
