A warning about causation, correlation and big data

by on 12-21-2011 11:09 AM - last edited on 12-18-2011 11:30 PM

The January 2012 issue of Wired had a very useful article for anyone diving into the big data pool, titled "Trials and Errors: Why Science Is Failing Us". It is useful because it covers the basic but counterintuitive principle that correlation must not be confused with causation: just because two things appear to be linked does not mean one causes the other. This is the mantra of statistics professors everywhere.

A week or so ago I heard a great analogy that drives the concept home. It concerned the rollout of telephone service into rural areas, where a direct correlation was observed between the installation of telephones and murder rates. Obviously this does not mean that telephone poles caused murders. Something was going on, but it may have been much more subtle than that.

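To make the telephone example concrete, here is a minimal Python sketch of how a hidden third factor can manufacture a strong correlation. It assumes, purely for illustration, that town population drives both telephone installations and murder counts; the population range, coefficients and noise levels are invented and are not data from the article. In this model telephones have no effect on murders at all, yet the raw correlation comes out high. Once the linear effect of population is removed from both variables (a partial correlation), the link largely disappears.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after an ordinary least-squares fit on xs,
    i.e. ys with the linear effect of xs removed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def simulate(n_towns=1000, seed=42):
    """Hypothetical towns: population (the confounder) drives BOTH
    telephone installations and murders; phones never affect murders."""
    rng = random.Random(seed)
    population = [rng.uniform(1, 100) for _ in range(n_towns)]
    phones = [0.5 * p + rng.gauss(0, 2) for p in population]
    murders = [0.02 * p + rng.gauss(0, 0.2) for p in population]
    return population, phones, murders

population, phones, murders = simulate()

# Raw correlation: looks like phones and murders are strongly linked.
raw = pearson(phones, murders)

# Partial correlation: correlate what is left after controlling for population.
partial = pearson(residuals(phones, population),
                  residuals(murders, population))

print(f"raw correlation:            {raw:.2f}")
print(f"controlling for population: {partial:.2f}")
```

In real data the confounder is rarely known or measured in advance, which is exactly why asking "Why?" matters before acting on a correlation.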
This is also the concern many have expressed about global warming. Although some debate whether anything is going on at all, many more are concerned about exactly what is going on, since the underlying arguments and simulations seem to shift too much. It can seem a little too easy to place the blame on humans, because human activity is about the only factor humans can control. I am not trying to come down on either side of this issue, just pointing out why even some scientifically minded people will be hard to persuade without a more formal causal understanding.

Whenever you hear someone say "From this relationship we can conclude that A causes B", it is worth asking "Why?" and seeing whether they can explain it. The article gives some great examples of common-sense observations in the pharmaceutical space that turned out to be completely wrong.

Businesses that start to tap into today's massive data collection and analysis capabilities will have to be concerned with these same types of issues. Root cause analysis skills will need to be honed, since we can't rely on a computer-generated correlation alone to justify actions that may significantly impact the business. These higher-level skills will have to be developed.
