Around the Storage Block Blog
Find out about all things data storage from Around the Storage Block at HP Communities.

What’s the big deal about big data?

By Vish Mulchand and Patrick Osborne, HP Storage


A quick Google search on the term “big data” turns up 37,900,000 results in 0.11 seconds. That’s a lot of data about a term that’s generating equal parts buzz and confusion these days.


For a basic definition, Wikipedia describes big data as "datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics and visualizing…" That's a good start, but consider this:


Who’s picking up the pieces of the data explosion?

The biggest big data challenges come from larger enterprises dealing with storing, accessing, managing and analyzing petabytes of data. Web 2.0 companies are coping with very large distributed aggregations of loosely structured and often incomplete data. Healthcare providers are figuring out the best ways to store and archive large amounts of medical image data. And media and entertainment companies are wrestling with creating, storing and distributing high-definition content. Increasingly huge data volumes also pose a challenge for any enterprise that needs to store its data for compliance and regulatory purposes.


More than a “big data” problem—it’s a “big everything” problem

In all these scenarios, the complexity and scalability limitations of legacy architectures can stunt the deployment of emerging applications, along with the ability to harness the power of corporate information more effectively.


Think about it: For today’s organizations, this really is a “big everything” problem—one brought on not only by the rapid growth of unstructured data and ineffective archive solutions, but also by general content proliferation, the analytical data explosion and the advent of massive content depots. This in turn creates big challenges—like how to:

  • Effectively scale infrastructure to meet new requirements?
  • Manage spiraling costs?
  • Move data to different parts of the infrastructure to better optimize cost and performance?
  • Manage refresh cycles and protect technology investments?

The solution lies in scalable, converged infrastructure

However you label it, you can tame the data explosion with a converged infrastructure that includes:

  • Scalable storage that lets you start small and grow, scaling out with storage and advanced software that allows large content depots and archives to be addressed as one global data resource
  • Thin provisioning to defer capacity purchases until capacity is actually written, rather than when it is allocated
  • Automated storage tiering for non-disruptive data movement, so the right data lives on the right tier at the right cost
  • Data deduplication to reduce the capacity consumed and deliver capacity efficiency that keeps pace with data growth
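To see why deduplication keeps capacity consumption from tracking raw data growth, here is a minimal sketch of block-level deduplication. It is purely illustrative (not HP's implementation): each fixed-size block is keyed by its content hash, so identical blocks are stored only once, and a file is reduced to a "recipe" of hashes.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size for this sketch


def dedup_store(data: bytes, store: dict) -> list:
    """Split data into blocks, keep each unique block once (keyed by its
    SHA-256 digest), and return the list of digests needed to rebuild it."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only the first copy is stored
        recipe.append(digest)
    return recipe


store = {}
# Three identical blocks plus one unique block:
payload = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
recipe = dedup_store(payload, store)
print(len(recipe), len(store))  # 4 logical blocks, only 2 physically stored
rebuilt = b"".join(store[d] for d in recipe)  # lossless reassembly
```

Real deduplication engines use variable-size chunking and persistent indexes, but the capacity math is the same: duplicate data costs one copy plus pointers.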

Learn more about Big Data


>> Calvin Zito (@HPStorageGuy) podcast: HP Vertica for big data

>> Around the Storage Block blogs on Big Data

>> HP Vertica Analytics System page


(Editor's note: this blog post was updated on 13 September 2011, removing details about HP Discover sessions from June 2011 and adding current links under the "Learn more about Big Data" subheading)

irshadraihan | 06-02-2011 11:31 PM

Great post, Vish and Patrick.


Indeed this is a "Big Everything" era, as enterprises rely more and more on IT as a competitive differentiator. As the need for analytics and compliance drives the storage and analysis of data, companies will have to become smart about how to archive inactive data and process active data.


Your post offers a great storage perspective on the challenge of Big Data while driving home the point about HP's Converged Infrastructure initiative. The Vertica platform, which also runs on Converged Infrastructure, is available to customers and has gained significant traction in the marketplace. Readers who wish to learn more can visit the booth at HP Discover.

Philippe Blondeaux | 10-05-2011 09:30 AM

If you are interested in Big Data, I would also strongly recommend that you check out our Enterprise Data Warehouse Solution Appliance from HP. It is an MPP platform that scales from 50 TB up to 500 TB across 10 to 40 active compute nodes. Its ultra shared-nothing architecture is optimized for parallel processing of time-consuming queries on replicated or distributed tables, delivering response times many times better than competing solutions that lack a shared-nothing architecture yet claim to cater to both OLTP and DW workloads.

It has an internal high-speed InfiniBand network for reorganizing data to optimize query performance. Find out all the technical details in the following PDF document at
