The HP Discover Event blog is the official blog for the HP Discover showcase technology event where you can learn about game-changing innovations in enterprise software, hardware, services, and networking. Visit the blog for regular behind the scenes updates and insights.

Think Hadoop's massive analytics power requires massive energy consumption? Think again.

When we look back a few years from now, it may well turn out that Hadoop is the "it technology" of 2012—much the way cloud was in, say, 2009. It's no wonder, considering that at the end of last year, 600 blog posts, 34,000 tweets and 240,000 pieces of content were being published on the Web every minute, according to a report by JMP Securities, as reported by CBS Marketwatch.


As you're probably aware, Hadoop is an open-source, distributed data-processing technology that takes advantage of large clusters of industry-standard servers to create a single, highly available environment capable of storing and managing petabytes of information. The trick—as many enterprises are discovering—is figuring out how to analyze that information and use it in real time to make better business decisions.
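To make the "distributed processing" idea concrete, here is a minimal sketch of the MapReduce pattern that Hadoop is built around—a map phase that emits key/value pairs from raw records (as each node would for its local data block), followed by a shuffle-and-reduce phase that aggregates by key. This is a plain-Python illustration of the model, not Hadoop's actual API; the function names are our own.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) pairs from each input record,
    # as each node would do for the data block it holds locally.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle_and_reduce(pairs):
    # Shuffle: group the emitted pairs by key.
    # Reduce: sum the counts for each word.
    groups = defaultdict(int)
    for word, count in pairs:
        groups[word] += count
    return dict(groups)

docs = ["Hadoop stores data", "Hadoop processes data in parallel"]
counts = shuffle_and_reduce(map_phase(docs))
print(counts)
```

In a real Hadoop cluster, the map and reduce steps run on many servers at once and the framework handles the shuffle over the network—which is what lets the same simple program scale from megabytes to petabytes.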


We don't need to crow about the advantages of a technology that makes handling massive amounts of data easier. But HP blogger William Kosik raises a good point in his recent post: one more advantage of Hadoop is its potential for enterprises looking for green data center solutions.


Kosik writes: "When considering the advances in computing efficiency that can be achieved by using Hadoop, HP’s Autonomy is a powerful solution to increase computational ability by creating a link between the Autonomy's IDOL search software and the Apache Hadoop computing platform. It can be embedded in each node of the Hadoop cluster to analyze and summarize data, giving users the ability to automatically analyze any piece of information across large amounts of unstructured data, such as web pages, email and digitized office documents. Efficiency is the name of the game here."


If that doesn't pique your interest, consider that the HP AppSystem for Apache Hadoop is the first solution to publish industry-leading results for a 10-terabyte (TB) dataset, processed in 5,128 seconds (just over 1.4 hours). Built on HP Converged Infrastructure consisting of an 18-node HP ProLiant Generation 8 (Gen8) DL380 cluster and HP Networking, HP solutions proved to be 3.8 times and 2.6 times faster than Oracle and SGI Hadoop offerings, respectively.[1]
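The throughput behind that headline number is easy to check. A quick back-of-the-envelope calculation—assuming the conventional 1 TB = 1,024 GB, which reproduces the ~1.99 GB/s rate cited in the benchmark footnote—looks like this:

```python
# Throughput implied by the cited TeraSort result:
# a 10 TB dataset sorted in 5,128 seconds.
data_set_gb = 10 * 1024            # 10 TB expressed in GB (assuming 1 TB = 1,024 GB)
elapsed_seconds = 5128

throughput_gb_per_s = data_set_gb / elapsed_seconds
elapsed_hours = elapsed_seconds / 3600

print(f"{throughput_gb_per_s:.2f} GB/s over {elapsed_hours:.2f} hours")
```

That works out to roughly 2 GB sorted every second, sustained for close to an hour and a half across the 18-node cluster.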



Sources cited:


  1. As the first vendor to submit performance results for the 10TB TeraSort benchmark, an 18-node cluster of HP ProLiant Gen8 DL380 servers sorted the 10TB data set in 5,128 seconds, a rate of 1.99 gigabytes (GB) per second; it sorted the 100GB data set in 55 seconds, a rate of 1.82 GB per second. On a per-node basis, the HP ProLiant Gen8 DL380 was 3.8 times faster than Oracle’s 2010 100GB result and 2.6 times faster than SGI’s 2011 100GB result. Hardware configuration: 18 HP ProLiant DL380 Gen8 servers; dual 6-core Intel® E5-2667 2.9GHz processors; 64 GB memory; 16 x 1 TB SAS 7.2K disks per node; 4 x 1Gb Ethernet. Software configuration: Red Hat Enterprise Linux 6.2; Java Platform, Standard Edition, JDK 6 Update 29-b11.

About the Author
Sharing info and insights to help you get the most out of HP Discover.

The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation.