I find it interesting how the energy efficiency path usually takes me to strange and wonderful places. This time it has led me to the world of Hadoop, where, I had been told, strange and wonderful things are happening. I entered with some trepidation, and soon found that, yes indeed, wonderful things are happening.
After hearing about the copious near-miraculous feats attributed to Hadoop, I homed in on just a couple. First, Hadoop (paired with a stellar technology company like HP, of course) can reduce the energy required to run the massive computational workloads it is famous for. Second, Hadoop can improve reliability and reduce backbone traffic by distributing replicated data across different racks. This distributed approach can improve energy use, since nodes holding duplicated data can drop into a low-power state or even shut down entirely, depending on how the workload is routed through the system. It also increases reliability: if a server or rack is lost to some type of failure, the job simply moves to a node that holds a duplicate of the data and continues without missing a beat.
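That duplication is configured rather than coded: HDFS's dfs.replication setting decides how many copies of each block exist, and the rack-aware placement policy spreads those copies across racks. Here is a minimal sketch of adjusting that knob from Hadoop's Java client API; the file path is hypothetical, invented just for the example:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        // Standard Hadoop client configuration; picks up core-site.xml / hdfs-site.xml.
        Configuration conf = new Configuration();

        // dfs.replication controls how many copies HDFS keeps of each block.
        // Three replicas (the default) typically span two racks, so an entire
        // rack can fail, or be powered down, without losing access to the data.
        conf.setInt("dfs.replication", 3);

        FileSystem fs = FileSystem.get(conf);

        // Replication can also be raised per file after the fact, for example
        // for a hot data set that many jobs read. The path here is made up.
        Path hot = new Path("/data/hot-dataset");
        fs.setReplication(hot, (short) 5);

        fs.close();
    }
}
```

More replicas mean more nodes that can serve a given block, which is exactly what gives the scheduler room to park the rest in low-power states.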
As further proof that Hadoop has some green roots, researchers at Rutgers University and the Barcelona Supercomputing Center have proposed a platform called GreenHadoop (http://bit.ly/LyxCKn), "a MapReduce framework for a data center powered by a photovoltaic solar array and the electrical grid (as a backup)." In their paper, "GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks," the authors show that GreenHadoop can "significantly increase green energy consumption and decrease electricity cost" in the data center.
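The heart of the idea is scheduling: jobs with slack in their deadlines wait for sunshine, while urgent ones fall back to grid power. Here is a toy sketch of that logic; it is only an illustration of the concept, not the authors' code, and the job names, wattages, and the predictedSolarWatts() hook are all invented for the example:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy sketch of GreenHadoop's core idea (not the paper's implementation):
// run jobs eagerly while solar output covers the load, defer jobs that
// have deadline slack, and use the grid only when a deadline forces it.
public class GreenSchedulerSketch {
    static class Job {
        final String name;
        final int powerWatts;      // estimated draw while running (assumed)
        final boolean deferrable;  // true if the deadline has slack
        Job(String name, int powerWatts, boolean deferrable) {
            this.name = name; this.powerWatts = powerWatts; this.deferrable = deferrable;
        }
    }

    // Hypothetical hook: in the paper this role is played by a solar-output predictor.
    static int predictedSolarWatts() { return 800; }

    public static void main(String[] args) {
        Queue<Job> pending = new ArrayDeque<>();
        pending.add(new Job("nightly-etl", 600, true));
        pending.add(new Job("fraud-scan", 500, false));

        int solarBudget = predictedSolarWatts();
        while (!pending.isEmpty()) {
            Job job = pending.poll();
            if (job.powerWatts <= solarBudget) {
                solarBudget -= job.powerWatts;
                System.out.println("run on solar: " + job.name);
            } else if (job.deferrable) {
                System.out.println("defer until solar recovers: " + job.name);
            } else {
                // No slack: fall back to the grid rather than miss a deadline.
                System.out.println("run on grid (backup): " + job.name);
            }
        }
    }
}
```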
When considering the advances in computing efficiency that can be achieved with Hadoop, HP's Autonomy offers a powerful way to increase computational ability by linking Autonomy's IDOL search software with the Apache Hadoop computing platform. IDOL can be embedded in each node of the Hadoop cluster to analyze and summarize data, giving users the ability to automatically analyze any piece of information "across large amounts of unstructured data, such as web pages, email and digitized office documents." Efficiency is the name of the game here. See the Fact Sheet for more info.
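IDOL's node-level integration is proprietary, so I won't pretend to show it. But the general pattern of analyzing unstructured text right where it lives is plain MapReduce. A minimal mapper, just to make the pattern concrete (a generic illustration, not Autonomy's software):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Counts terms in unstructured text, computed on the node that stores the
// data. A reducer would sum the per-term counts emitted here.
public class TermCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text term = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString().toLowerCase());
        while (tokens.hasMoreTokens()) {
            term.set(tokens.nextToken());
            context.write(term, ONE);
        }
    }
}
```

Shipping the analysis to the data, rather than hauling the data to the analysis, is where the efficiency comes from.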
In a similar vein, researchers at Stanford University published a paper on Hadoop's ability to process workloads while keeping energy use to a minimum. The paper (http://bit.ly/KlO96a) outlines an approach that could fundamentally change how data center power and cooling systems are designed. Using a multi-job batch workload consisting of several scans and sorts of 32 GB of data, and putting idle nodes to sleep, the experiments showed a 9% to 51% reduction in cluster energy use.
The paper says it best: "The energy efficiency of a cluster can be improved...by matching the number of active nodes to the current needs of the workload, placing the remaining nodes in low-power standby modes…" With this kind of thinking (classic IT and facilities synergy), data center energy use can keep falling while uptime keeps rising.
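The arithmetic behind "matching active nodes to the workload" is refreshingly simple. Here is a back-of-envelope sketch, not the paper's implementation; the node counts, slots per node, and the size of the "covering subset" (the minimum set of nodes that still holds at least one replica of every block) are all assumed numbers:

```java
// Sketch of the paper's principle: keep only as many nodes awake as the
// current task backlog needs, but never fewer than the covering subset.
public class NodeScalingSketch {
    public static void main(String[] args) {
        int totalNodes = 40;        // cluster size (assumed)
        int slotsPerNode = 8;       // concurrent tasks per node (assumed)
        int coveringSubset = 14;    // min nodes covering every block (assumed)
        int pendingTasks = 96;      // current backlog (assumed)

        // Nodes needed to absorb the backlog, rounded up.
        int needed = (pendingTasks + slotsPerNode - 1) / slotsPerNode;

        // Never drop below the covering subset, never exceed the cluster.
        int active = Math.min(totalNodes, Math.max(needed, coveringSubset));
        int standby = totalNodes - active;

        System.out.printf("active=%d standby=%d (backlog of %d tasks)%n",
                active, standby, pendingTasks);
        // 96 tasks / 8 slots = 12 nodes needed, but the covering subset
        // keeps 14 awake; the other 26 can enter low-power standby.
    }
}
```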
Read more about new HP offerings for Hadoop.
Check out these posts about Hadoop from HP Technology Services bloggers:
Ian Jagger: Where the Hadoop Wheels Hit the Road