It’s a fiscal touchstone of the Internet era: there’s enormous value to be found in the information we collectively share through our online likes, check-ins, searches, browsing and buying histories.
Companies of all kinds are keen to learn ever more about us, and are increasingly willing to pay for the privilege, notes Bernardo Huberman, HP Senior Fellow and director of the Social Computing Research Group at HP Labs. Yet currently the people actually generating the data aren’t benefitting financially at all (even though, in many cases, they do receive free access to a useful service).
“There’s no reason, in principle, why individuals shouldn’t be paid in return for the data they create,” suggests Huberman. “If we can do that while taking into account the privacy attitudes of the participants, we can help people better control how their data is used and at the same time open up new possibilities for innovative social and technological research.”
That’s the argument behind a new paper, “A Market for Unbiased Private Data: Paying Individuals According to their Privacy Attitudes,” [pdf] written by Huberman and HP colleague Christina Aperjis.
For such a market to work, they argue, buyers need to be sure they’re receiving unbiased data and individual sellers must be confident they’re getting the best possible price for their information within their tolerance for sharing.
Existing research shows that people tend to want either a significant price for data they feel is revealing, or very little for data that’s liable to biases of various kinds. The result: buyers generally can’t afford the data sets that would be most useful to them.
What’s your appetite for risk?
The solution, say Aperjis and Huberman, is to create a market that lets sellers participate according to their specific attitudes about privacy and risk.
This results in smaller but statistically valid sets of data that can be made available to buyers. Because of the validity of that data, buyers need to purchase only a fraction of a larger data set to compute reliable statistics about that larger set – allowing them to pay relatively fewer individuals the higher sums it takes to get more valid data.
Trust, transparency, and choice as competitive advantages
Managing the process is an intermediary known as the market-maker. Taking a small cut of every trade, market-makers want to maximize market volume and thus have an incentive to act as an honest broker, enforcing transparency and choice with respect to privacy.
In fact, the better job the market-maker does to promote and protect individuals’ privacy, the more trust will exist within the market – and trust is a critical factor in consumer adoption of new business models. For example, giving individuals the most choice about how and why their data is used can widen the appeal of opting-in, increasing the efficiency of the market and giving the market-maker a competitive advantage.
How to move beyond present-day data markets
Third-party dealers of private data sets already exist, note Aperjis and Huberman. What’s been missing is the chance for individuals to participate in these transactions – whether by defining how the data may be used or by getting paid. In particular, the inability to compensate sellers is due in no small part to the fact that prior models for private data markets don’t specify how to set appropriate prices.
For example, simply asking sellers to name a specific dollar value would result in biased datasets because people who value their private information the least would always offer the lowest prices. Instead, Aperjis and Huberman set prices by asking sellers to make a series of choices between payment plans that trade off risk and reward.
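The bias problem described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the pricing mechanism from the paper: the seller population, the `trait` being measured, and the link between asking price and trait are all hypothetical. It compares buying from the cheapest sellers with buying from a uniform random sample of the same size.

```python
import random
import statistics


def simulate(n_sellers=10_000, n_bought=500, seed=0):
    """Compare two ways of buying data from the same seller population.

    Hypothetical setup: each seller has a private trait (0 to 100), and
    sellers with a higher trait value tend to ask a higher price.
    """
    rng = random.Random(seed)
    sellers = []
    for _ in range(n_sellers):
        trait = rng.uniform(0, 100)
        asking_price = trait + rng.gauss(0, 5)  # assumed price/trait link
        sellers.append((asking_price, trait))

    # The statistic the buyer actually wants: the population mean.
    true_mean = statistics.fmean([t for _, t in sellers])

    # Strategy 1: buy only from the sellers who named the lowest price.
    cheapest = sorted(sellers)[:n_bought]
    cheapest_mean = statistics.fmean([t for _, t in cheapest])

    # Strategy 2: buy from a uniform random sample of the same size.
    uniform = rng.sample(sellers, n_bought)
    uniform_mean = statistics.fmean([t for _, t in uniform])

    return true_mean, cheapest_mean, uniform_mean


if __name__ == "__main__":
    true_mean, cheapest_mean, uniform_mean = simulate()
    print(f"population mean:       {true_mean:.1f}")
    print(f"cheapest-sellers mean: {cheapest_mean:.1f}")
    print(f"random-sample mean:    {uniform_mean:.1f}")
```

Under these assumptions, the estimate from the cheapest sellers falls far from the population mean, while the random sample of the same size lands close to it. That gap is why a buyer can prefer a smaller, unbiased sample even when each seller in it costs more.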
That said, the researchers acknowledge that in many common scenarios, individuals do receive a service in return for sharing their private data – a free query from a search engine, for example, or a valuable connection with friends through a social network. But it’s not hard to imagine new markets where buyers would be willing to pay individuals directly for their personal information.
A pharmaceutical company, for example, might need a reliable sample of people with a particular disease and who use a specific drug. In a case like that, say the HP researchers, “it is of the utmost importance for the buyer of the data to obtain an unbiased sample of individuals with certain characteristics.”
The research value of unbiased, private data
Such data sets may cost more money to compile, but that actually makes them more likely to be available for peer review and verification. And that’s a positive change, suggests Huberman.
In a letter published in the Feb 16th edition of Nature, Huberman drew attention to the research value of information produced by users of social media. Yet analyses of such data, he wrote, are typically “not accessible to researchers beyond the authors of the work.”
“In some cases the source of the data itself remains hidden, leading not only to problems of verification but also about the generality of the results,” he added.
In contrast, Huberman and Aperjis’ new approach could create data sets that are both reliable and available to others to either confirm or challenge the research inferences drawn from them.
These sets could be used for commercial purposes, certainly. But they might also drive new insights across many fields of research – in education, for example, in urban planning or for medical applications such as the tracking of disease outbreaks.
“Overall, in a global economy where the proper handling of data is an increasing concern, there’s a lot to be said for creating markets where private data is traded openly,” Huberman suggests, “and for doing it in a way that benefits the people who actually generated it.”
Scientific analysis of data produced by users of social media sites has world-changing potential. By understanding patterns in activity, these analyses can improve everything from business to education to the overall human condition.
But because the data is often proprietary, the scientific process of peer-review is beginning to break down, potentially leading to troubling results.
In a letter published this week by the premier science journal Nature, Bernardo Huberman, the Director of the HP Labs Social Computing Group, shines a light on the issue, noting “many of the big data results that are coming out are obtained from private sources that are not accessible to researchers beyond the authors of the work.”
“Even worse, in some cases the source of the data itself remains hidden, leading not only to problems of verification but also about the generality of the results,” he says.
(Bear in mind that Huberman’s reference to “private” should be interpreted as “owned by a certain corporation,” not “an individual’s personal information.” The point is that meaningful, verifiable results can still be gleaned from data that is aggregated and anonymous, rather than individual and personally identifiable.)
Dr. Huberman continues:
More importantly, we need to recognize that these results will only be meaningful if they are universal, in the sense that many other data sets reveal the same behavior. This actually uncovers a deeper problem. If another set of data does not validate results obtained with private data, how do we know if it is because they are not universal or the authors made a mistake?
Huberman and the HP Labs Social Computing Research Group have published dozens of scientific research experiments based on publicly available data from services as diverse as Twitter, YouTube, Digg, Wikipedia, and Gnutella.
Their latest study, "The Pulse of News in Social Media: Forecasting Popularity," was recently featured in Technology Review and The Atlantic.
Read the full letter:
The interactive nature of the web has created research opportunities which have been highlighted in this journal and exploited by a number of researchers from the social and information sciences. Patterns that were hard to discern when operating with limited data sets have become apparent as enormous repositories of data collected by large services such as Twitter, Facebook, and Google are accessed by researchers and business professionals.
There is however a serious problem with many of these studies. As recently re-iterated in this journal by Ravetz (Nature 481, 25 (2012)), Science is unique in that peer review, publication and replication are essential to its progress. And yet, many of the big data results that are coming out are obtained from private sources that are not accessible to researchers beyond the authors of the work. Even worse, in some cases the source of the data itself remains hidden, leading not only to problems of verification but also about the generality of the results. While ideally one would like to have the authors share the data, at least these data sources should be accessible to others to verify the findings. This is common practice within the physical and biological communities.
More importantly, we need to recognize that these results will only be meaningful if they are universal, in the sense that many other data sets reveal the same behavior. This actually uncovers a deeper problem. If another set of data does not validate results obtained with private data, how do we know if it is because they are not universal or the authors made a mistake? Moreover, as many practitioners of social network research are starting to discover, many of the results are becoming part of a “cabinet de curiosites” devoid of much generality and hard to falsify.
Besides the potential for fraud, if this trend continues we'll see a small group of scientists with access to private data repositories enjoy an unfair amount of attention in the community at the expense of equally talented researchers whose only flaw is the lack of right "connections" to private data.
Bernardo A. Huberman
Director, Social Computing Group
Editor’s note: for another analysis of this emerging topic, read “Why Facebook’s data sharing matters,” by Marshall Kirkpatrick in Read/Write Web.
Imagine you’re planning to book a hotel room at the Hotel Canal Grande in Venice, Italy. Before you make a reservation, you’ll probably visit an online ratings site like Trip Advisor to see how many stars previous guests give the hotel. You might also ask your Facebook friends to see if they have an opinion, too.
Which source is more likely to influence your decision, the user-generated ratings site or your friends?
HP Labs today published a new study answering this very question, following up on other recent research on how social influence shapes decision-making. In their new experiment, “Swayed by Friends or by the Crowd?”, HP researchers discovered:
- For a user deciding between two choices, an additional rating star has a much larger effect than an additional friend’s recommendation on the probability of selecting an item.
- Negative opinions from friends are more influential than positive opinions.
- People exhibit more random behavior in their choices when the decision involves less cost and risk.
The study validates the intuition that negative reviews from customers are more powerful than positive ones, adding a new dimension to the adage that “satisfied customers tell three friends, angry customers tell 3,000.”
It also suggests that enabling positive and negative feedback to flow equally through online social networks is a potentially powerful feature for users communicating the value of goods and services to each other.
Read the full research report here [PDF] or below.
New research from HP Labs shows that our choices can be reversed when we’re exposed to the recommendations of others. In a surprising twist, though, we’re more likely to change our minds when fewer, not more, people disagree with us.
“What this implies,” Huberman says, “is that rather than overwhelming consumers with strident messages about an alternative product or service, in social media, gentle reporting of a few people having chosen that product or service can be more persuasive.”
The experiment – devised by Huberman along with Haiyi Zhu, an HP Labs summer intern from Carnegie Mellon University, and Yarun Luon of HP Labs – reveals several other factors that determine whether choices can be reversed through social influence, too. It’s the latest product of HP Labs’ pioneering program in social computing, which is dedicated to creating software and algorithms that provide meaningful context to huge sets of unstructured data.
Study results: the power of opinion
Opinions and product ratings are everywhere online. But when do they actually influence our own choices?
To find out, the HP team asked several hundred people to make a series of choices between two different pieces of furniture. After varying amounts of time, they were asked to choose again between the same items, but this time they were told that a certain number of other people had preferred the opposite item. (Separately, the experiment also asked subjects to choose between two different baby pictures, to control for variance in subject matter.)
(Figure: an example comparison used in the experiment)
Analysis of the resulting choices showed that receiving a small amount of social pressure to reverse one’s opinion (by being told that just a few people had chosen differently) was more likely to produce a reversed vote than when the pressure felt was much greater (i.e. where an overwhelming number of people were shown as having made a different choice).
The team also discovered:
- People were more likely to be influenced if they weren’t prompted to change their mind immediately after they had expressed their original preference.
- The more time that people spent on their choice, the more likely they were to reverse that choice and conform to the opinion of others later on.
Attention marketers: Implications in theory and practice
The experiment pits two theories of social influence against each other.
Psychological reactance theory suggests that when we face opposition to our beliefs, our need for self-preservation drives us to stick to them strongly. On the other hand, social influence and conformity theory argues that we like to feel socially connected with others and as a result will reverse our opinion if we feel it will restore that sense of belonging and self-esteem.
The research team’s results suggest that the first theory is more powerful when we’re presented with the opinions of many others, while the second has more power when we’re imagining ourselves as members of a smaller group. It also supports earlier Labs work showing that our votes on rankings are influenced by our own desire to impact the choices of others.
These are all insights that online marketers can use to alter the design of their recommendation systems and thereby influence their customers’ behavior, Huberman suggests.
Tackling the problem of ‘big data’
To conduct the experiment the HP team used Rankr, a new mobile, cloud-based polling application created by the Social Computing Group.
As well as polling people on simple preferences, Rankr can be used to sort much larger lists of relative preferences, such as desired product features, or crowdsourced concert playlists.
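The source does not describe how Rankr turns individual choices into a ranked list, so the following is only a minimal sketch of one common approach: ordering items by the share of pairwise comparisons they win. The function name `rank_from_pairwise` and the sample votes are hypothetical.

```python
from collections import defaultdict


def rank_from_pairwise(votes):
    """Rank items by win rate across pairwise votes.

    `votes` is a list of (winner, loser) pairs. This is a simple
    illustrative heuristic, not Rankr's actual algorithm.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1

    # Highest win rate first; ties are broken alphabetically.
    return sorted(
        appearances,
        key=lambda item: (-wins[item] / appearances[item], item),
    )


if __name__ == "__main__":
    votes = [
        ("sofa", "chair"),
        ("sofa", "lamp"),
        ("chair", "lamp"),
        ("sofa", "chair"),
    ]
    print(rank_from_pairwise(votes))
```

A win-rate ordering is easy to compute and explain, but it ignores how strong each opponent was; a production system would likely need a more careful model.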
Given the vast number of choices that companies and individuals now face when dealing with information, efficient mechanisms for filtering and ranking sets of possibilities are of increasing value. Similarly, understanding how product rankings work, and how to best exploit them, will grow in importance as ever more such ‘unstructured’ data is created online every day.
“Customers see tremendous value in the ability to make sense of this data,” remarked HP CEO Léo Apotheker last week. “HP has an opportunity to lead in this area, transform unstructured information into meaningful insights and deliver it to customers better than anyone else.”
Next up: the power of friends
Huberman and his team next plan to focus on whether specific quantities of recommendations or the source of those recommendations (their perceived ‘quality,’ in a sense) carry more influence.
When presented with the fact that 1,000 people recommend a specific item but that four close friends like the alternative, for example, which would most of us choose?
Social media has exploded in popularity in China – HP’s launch of a Chinese-language blog two weeks ago is just one indication. Still, scientists and sociologists around the world are only beginning to turn their attention towards understanding how these services are actually being used.
Adding to this nascent body of research, a paper published today by HP Labs delivers new insights into the content-sharing preferences of users on Chinese microblogging service Sina Weibo.
The paper, titled “What Trends in Chinese Social Media,” [PDF] highlights HP’s unique approach to data analysis and follows a series of recent announcements about the company’s investments in the areas of cloud computing, R&D and social media in China.
Why study social media in China?
China has the largest number of Internet-connected citizens of any country (420 million as of June 2010 [pdf]), and the rise of social media there has been particularly explosive.
For example, Sina Weibo -- a Chinese microblogging site similar to Twitter -- has already amassed over 140 million users as of May 2011, nearly all of whom are located inside China and post in the Chinese language.
One of the principal tasks of the Social Computing research group at HP Labs is to create software analytics and algorithms that can add meaningful context to the huge data sets produced by those millions of social media users.
For example, in 2010 the Social Computing team introduced an algorithm that ranks Twitter users according to their influence. This algorithm has a number of applications, from spam filtering and advertising targeting to recommending which users to follow.
Key takeaways from the research
For their experiment, HP scientists examined the topics that were most popular on Sina Weibo during a 30-day period and compared them to previous findings from research on Twitter. According to the paper’s authors, Louis Yu, Sitaram Asur, and Bernardo Huberman, there are significant differences in the content shared in China compared with a global social network such as Twitter:
- In China, people tend to use Sina Weibo to share entertainment-type content such as jokes, images, and videos, and a significantly large percentage of posts are “retweets” (a re-posting of another user’s tweet). The trends that are formed are almost entirely caused by retweets of this kind of content.
- In contrast, prior HP research shows that trending topics on Twitter have more to do with current events than entertainment and that the effect of retweets is not as large (although it is still substantial).
- There are more unverified accounts among the top 100 trend-setters on Sina Weibo than on Twitter, and most of the unverified accounts feature discussion forums for user-contributed videos, images, and humor.
“It is important to note that the differences in behavior between the two user bases are relative and not absolute,” says Bernardo Huberman, director of HP’s Social Computing lab. “The study doesn’t suggest that news-sharing does not occur on Sina Weibo, because it certainly does. However, our experiment does show that this type of behavior was relatively less common than on Twitter during the trial period.”
The paper also includes a brief history of social media in China and illustrates key differences in how Sina Weibo and Twitter work.
Future plans for researching Chinese social media
The HP Labs researchers behind this study are already planning to build on this initial work, as demand for social media analytics in China is sure to continue to grow.
“This paper is just the beginning,” adds Huberman. “It is useful and interesting for us to understand at an aggregate level the type of content being shared on services like Sina Weibo, but we believe there is even more value for HP and our customers in understanding the flow of content among specific, influential individuals on the network.”
Find HP in social media…in China and around the world
HP on Sina Weibo: www.weibo.com/hpchina
China HP blog: www.hp.com.cn/hpchinablog
Follow the author of this post: @ethanbauley
Frequently asked questions about the research
Why are posts on Sina Weibo referred to as “tweets”?
On Sina Weibo, the act of “retweeting” is called “zhuanfa,” which actually means “retweet” in Chinese. Rather than defining “zhuanfa” in the paper, the authors chose to use this translation.
What is a “trending topic”?
Trending topics reveal popular themes from current discussions on a social network. Sina Weibo offers a list of 50 keywords that appear most frequently in users’ tweets over the past hour. They are ranked according to the frequency of appearances. Similarly, Twitter provides users a list of the top ten trending topics of the moment.
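The frequency ranking described above can be sketched in a few lines. This is an illustration only: how Sina Weibo actually extracts keywords from posts is not described here, so the sketch assumes each post already comes with a timestamp and a list of keywords. The function name `trending_keywords` and the sample posts are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def trending_keywords(posts, now, top_n=50, window=timedelta(hours=1)):
    """Return the top_n keywords by frequency within the time window.

    `posts` is an iterable of (timestamp, keywords) pairs, where
    keywords is a list of already-extracted strings (an assumption).
    """
    cutoff = now - window
    counts = Counter()
    for timestamp, keywords in posts:
        if cutoff <= timestamp <= now:
            counts.update(keywords)  # count every appearance
    return counts.most_common(top_n)


if __name__ == "__main__":
    now = datetime(2011, 5, 1, 12, 0)
    posts = [
        (now - timedelta(minutes=5), ["joke", "video"]),
        (now - timedelta(minutes=30), ["joke"]),
        (now - timedelta(minutes=50), ["joke", "image", "video"]),
        (now - timedelta(hours=2), ["news", "news"]),  # outside the window
    ]
    print(trending_keywords(posts, now, top_n=2))
```

Setting `top_n=10` would mirror the length of the Twitter list mentioned above, although Twitter's own ranking method is not specified here.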
What is a “verified account”?
Both Twitter and Sina Weibo authenticate certain accounts, verifying the ownership and identity of the account holder. This authenticity is communicated to other users via a badge or other visual cue.