When we look back a few years from now, it may well turn out that Hadoop is the "it technology" of 2012—much the way cloud was in, say, 2009. It's no wonder, considering that at the end of last year, 600 blog posts, 34,000 tweets and 240,000 pieces of content were being published on the Web every minute, according to a report by JMP Securities, as reported by CBS Marketwatch.
As you're probably aware, Hadoop is an open-source distributed data-processing technology that takes advantage of large clusters of industry-standard servers to create a single, highly available environment capable of storing and managing petabytes of information. The trick, as many enterprises are discovering, is figuring out how to analyze that information and use it in real time to make better business decisions.
We don't need to crow about the advantages of a technology that enables you to handle massive amounts of data more easily. But HP blogger William Kosik makes a good point in a recent post: Hadoop also has a lot of potential for enterprises looking for green data center solutions.
Kosik writes: "When considering the advances in computing efficiency that can be achieved by using Hadoop, HP’s Autonomy is a powerful solution to increase computational ability by creating a link between the Autonomy's IDOL search software and the Apache Hadoop computing platform. It can be embedded in each node of the Hadoop cluster to analyze and summarize data, giving users the ability to automatically analyze any piece of information across large amounts of unstructured data, such as web pages, email and digitized office documents. Efficiency is the name of the game here."
If that doesn't pique your interest, consider that the HP AppSystem for Apache Hadoop is the first to deliver industry-leading performance for a 10-terabyte (TB) dataset, processed in 5,128 seconds (approximately 1.4 hours). Built on HP Converged Infrastructure consisting of an 18-node HP ProLiant Generation 8 (Gen8) DL380 cluster and HP Networking, HP solutions proved to be 3.8 times and 2.6 times faster, on a per-node basis, than Oracle and SGI Hadoop offerings, respectively.
HP was the first vendor to submit performance results for the 10TB Terasort benchmark. An 18-node cluster of HP ProLiant Gen8 DL380 servers sorted the 10TB data set in 5,128 seconds, a rate of 1.99 gigabytes (GB) per second; it sorted the 100GB data set in 55 seconds, a rate of 1.82 GB per second. On a per-node basis, the HP ProLiant Gen8 DL380 was 3.8 times faster than Oracle’s 2010 100GB result and 2.6 times faster than SGI’s 2011 100GB result. Hardware configuration: 18 HP ProLiant DL380 Gen8 servers; dual 6-core Intel® E5-2667 2.9GHz processors; 64GB memory; 16 x 1TB SAS 7.2K disks per node; 4 x 1Gb Ethernet. Software configuration: Red Hat Enterprise Linux 6.2; Java Platform, Standard Edition, JDK 6 Update 29-b11.
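If you want to sanity-check the sort rates quoted above, the arithmetic is simple enough to script. The snippet below is only a rough sketch, not HP's benchmark methodology. It assumes 1 TB = 1,024 GB (the source does not say which convention was used), and the function name `throughput_gb_per_s` and the `RUNS` table are made up for illustration. Under that assumption the results land close to the published 1.99 GB/s and 1.82 GB/s figures, give or take rounding.

```python
# Rough check of the Terasort figures quoted in the post.
# Assumption (not stated in the source): 1 TB = 1,024 GB.

NODES = 18  # cluster size given in the post


def throughput_gb_per_s(size_gb: float, seconds: float) -> float:
    """Aggregate sort rate in gigabytes per second."""
    return size_gb / seconds


# Data set label -> (size in GB, elapsed seconds), both taken from the post.
RUNS = {
    "10TB": (10 * 1024, 5128),
    "100GB": (100, 55),
}

if __name__ == "__main__":
    for label, (size_gb, seconds) in RUNS.items():
        rate = throughput_gb_per_s(size_gb, seconds)
        print(
            f"{label}: {rate:.2f} GB/s across the cluster, "
            f"{rate / NODES:.3f} GB/s per node, "
            f"{seconds / 3600:.2f} hours elapsed"
        )
```

Dividing by the node count gives the per-node rate, which is the basis the post uses for its Oracle and SGI comparisons. Those competitor results are not reproduced here because the source does not give their raw numbers.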