- Data Central - the official HP corporate blog
Displaying articles for: 02-05-2012 - 02-11-2012
HP Labs study forecasts the popularity of news articles on Twitter – before they’re published
Previous HP Labs studies have shown that it is possible to predict the ebb and flow of Twitter “trending topics” by extrapolating from early observations.
In new research, the Labs team takes this predictive capability one step further by inventing a technique to forecast popularity on Twitter for a news article before it is published – with up to 84% accuracy.
The analysis, “The Pulse of News in Social Media: Forecasting Popularity,” by Sitaram Asur, Roja Bandari, and Bernardo Huberman, draws on data from over 40,000 news articles published over nine days in August 2011.
HP Labs’ study reinforces the intuition that the source of news matters most in determining how many tweets will link to a given article. But as social media continues to increase in importance, key questions remain about how users are influenced to act – beyond clicking the “retweet” button – and the nature of journalism in 140 characters.
How the system works
The researchers hypothesized that four factors would determine an article’s popularity:
- The news source that creates and publishes the article
- The category of news the article belongs to (e.g. business, health, or sports)
- Whether the language in the article was emotional or objective
- Whether celebrities, famous brands, or other notable institutions are mentioned
The team then used publicly available tools to assign each news article a score for each of the four factors. For example, the Stanford Named Entity Recognizer was used to identify text representing a person or company name, and the historical prominence of that person or company on Twitter determined the score.
Finally, a prediction for the number of tweets the article would receive was calculated using standard statistical models.
(For a detailed look into the methodology, read the full paper)
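To make the steps above concrete, here is a minimal sketch in Python of how four per-article scores could feed a prediction. It rests on several assumptions: the post only says "standard statistical models," so ordinary least-squares linear regression is used here purely as an illustration; the class and function names (`ArticleFeatures`, `fit_least_squares`, `predict`) are hypothetical; and every score and tweet count is an invented toy number, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class ArticleFeatures:
    """Hypothetical container for the four factor scores described above."""
    source_score: float    # how widely the news source has been tweeted before
    category_score: float  # how widely the news category has been tweeted before
    subjectivity: float    # 0.0 = objective language, 1.0 = emotional language
    entity_score: float    # Twitter prominence of the people/companies named

    def as_row(self):
        # The leading 1.0 lets the model learn a constant (intercept) term.
        return [1.0, self.source_score, self.category_score,
                self.subjectivity, self.entity_score]


def solve_linear_system(a, b):
    """Solve a @ x = b using Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_least_squares(rows, targets):
    """Fit weights via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, features):
    """Predicted tweet count for an article that has not been published yet."""
    return sum(w * x for w, x in zip(weights, features.as_row()))


if __name__ == "__main__":
    # Invented training set: past articles and the tweets each one received.
    history = [
        (ArticleFeatures(0.9, 0.6, 0.2, 0.8), 120),
        (ArticleFeatures(0.2, 0.3, 0.7, 0.1), 15),
        (ArticleFeatures(0.5, 0.9, 0.4, 0.3), 60),
        (ArticleFeatures(0.7, 0.2, 0.9, 0.6), 70),
        (ArticleFeatures(0.1, 0.5, 0.1, 0.0), 8),
        (ArticleFeatures(0.4, 0.4, 0.5, 0.9), 55),
    ]
    weights = fit_least_squares([f.as_row() for f, _ in history],
                                [tweets for _, tweets in history])
    draft = ArticleFeatures(0.8, 0.7, 0.3, 0.5)  # scores for an unpublished draft
    print(f"Predicted tweets: {predict(weights, draft):.1f}")
```

The key point the sketch tries to capture is that all four scores can be computed from the article text and from historical Twitter data, so a forecast is available before the article goes live.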
Results, limitations, and future research
The model can predict how often an article will be tweeted with surprising accuracy. And in many ways, it confirms common editorial wisdom: stories that mention celebrities, come from credible sources, and belong to popular categories of news (e.g. technology) are more likely to generate tweets. The difference, of course, is that HP’s software automates this process with a level of accuracy the researchers did not expect.
Interestingly, whether or not the language in the article was emotional or objective had a very minor influence on its distribution on Twitter, suggesting that “link bait” headlines and hysterical adjectives don’t propagate messages any further than straightforward reporting.
“The tool we’ve created is not just useful to news organizations that want to increase their stories’ distribution on Twitter,” says Bernardo Huberman, an HP Senior Fellow and co-author of the study. “For example, activists and politicians are increasingly using social media to influence public opinion. By testing their messages using our algorithm, they may be able to improve the visibility of their cause.”
Huberman goes on to say that, in principle, a reporter could swap out her editor for algorithmic analysis, but that would lead to a singular focus on generating tweets at the expense of other priorities. “Social media is an important and growing influence on the public agenda, but it is not the only driver,” he allowed.
Indeed, previous research from HP Labs has shown that even on Twitter, influence and popularity are not the same thing. Expect the brilliant minds at HP Labs to build on this research in the future by integrating their influence algorithm to more accurately measure the economics of attention.
HP Labs 2011 Annual Research Report
(Contributed by Martina Van Trucco, HP Labs)
This year’s HP Labs Annual Report is now available, giving a unique look into the state-of-the-art in a wide variety of technologies.
The breadth and depth of innovation coming out of HP Labs’ worldwide R&D centers is nothing short of astonishing. A few of the 138 Labs innovations appearing in HP products last year were:
- low-power servers for web-scale data centers (Project Moonshot)
- a file system for massive databases
- photonics that replace copper wire with light
- the Vayu Internet-on-your-TV device
Our demonstrator program, which takes selected research and creates real-life product demos, and customer co-innovation programs continue to generate great results. We have featured each of our 2011 demonstrator projects, including work on secure device architectures for personal computing, enterprise collaboration tools, seamless and secure document workflows, and predictive analytics applications in the oil & gas and healthcare industries.
This year we have also included a series of spotlights on specific topics for which many of our researchers have real passion: healthcare, sustainability, global and social impact, and thought leadership.
Finally, we are pleased to share a separate companion report, the HP Labs 2011 Publications report, which features the 658 technical papers published by our researchers. If you come across a publication that interests you while browsing it, remember that you can access all of HP Labs’ external Tech Reports on our site.





