In my previous blog, I wrote about the basics of a transformation initiative for your IT infrastructure, and I focused on modeling the “As Is” state. I closed by asking: How can we deal with the different “V’s” of Big Data (like Volume, Velocity, Variety or Veracity), and what are the possible impacts on the “As Is” IT architecture?
To help answer these questions, HP has created a new “To Be” model with input from many sources, such as an article by Hortonworks’ Shaun Connolly titled Big Data Refinery Fuels Next-Generation Data Architecture. And we have further elaborated the concept of the Big Data Refinery System, since this is the key element in an IT transformation for Big Data.
The future IT infrastructure for Big Data requires:
1. A Big Data Refinery system that can provide key ‘Big Data Services’ to:
- Capture, store and aggregate data,
- Process requests, queries and other transactions on this data,
- Provide information interaction and linkage with other processes,
- Provide a development environment to create, discover and test new analytics.
2. A well-defined interlock system to link the Refinery with the Transaction and current Business Intelligence platforms.
3. An integrated approach to the Governance, Protection and Management of data.
1. The Refinery System
The core element of this infrastructure transformation is a system that will enable IT to be a Big Data Service provider to the business.
The primary Services for a Refinery system that we have identified are: Capture, Store, Analyze, Develop and Search.
Having a specific Refinery system will allow us to choose the best platform to store, aggregate and transform multi-structured data without interfering with the systems that handle actual business transactions and interactions. Defining everything as a service will enable IT to choose where best to provision each service – on premises or in the cloud – and to provide the best approach. Each service of the refinery carries a technology impact and/or a technology decision.
Here is where packages like Hadoop, MapReduce, Pig, Hive and Vertica, or solutions like Autonomy, can provide the technology platforms to build the refinery system. The platform decision will depend on the business case and on the data sources and data types to manage.
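To make the “Aggregate” service of the refinery concrete, here is a minimal sketch of the map/reduce pattern that platforms like Hadoop automate at cluster scale. The event records, field names and the bytes-per-user metric are all hypothetical, chosen only to illustrate the shape of the computation on multi-structured data:

```python
from collections import defaultdict

# Hypothetical "captured" raw events from different sources (web logs,
# sensor readings), already normalized to dictionaries.
raw_events = [
    {"source": "web", "user": "alice", "bytes": 512},
    {"source": "web", "user": "bob", "bytes": 2048},
    {"source": "sensor", "user": "alice", "bytes": 128},
    {"source": "web", "user": "alice", "bytes": 1024},
]

def map_phase(events):
    """Map: emit (key, value) pairs -- here, bytes per user."""
    for event in events:
        yield event["user"], event["bytes"]

def reduce_phase(pairs):
    """Reduce: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# The refinery's "Aggregate" service, in miniature.
bytes_per_user = reduce_phase(map_phase(raw_events))
print(bytes_per_user)  # {'alice': 1664, 'bob': 2048}
```

A real refinery would run this pattern in parallel across many nodes and persist the results; the platform choice (Hadoop, Hive, Vertica and so on) determines how that scaling and storage happens.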
2. The Interlock System
To unlock the value of the Big Data Refinery system, we need a clear interlock with runtime models and data for ongoing refinement and analysis, as well as linkage with the Business Intelligence part of the system. Seamless interaction and integration between these systems is what unlocks value. A key consideration when choosing the platform for the refinery system, then, is the integration capabilities and impact of that platform.
An Interlock platform can be composed of several different connectors between the Refinery, Transaction and Business Intelligence platforms.
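The shape of such a connector can be sketched as a simple extract-and-load step: pull refined aggregates from the refinery side and publish them where the Business Intelligence platform can consume them. Everything here – the function names, the metric, and the in-memory “store” – is a hypothetical stand-in for whatever the chosen platforms actually expose:

```python
def refinery_extract():
    """Hypothetical refinery output: refined aggregates keyed by metric."""
    return {"bytes_per_user": {"alice": 1664, "bob": 2048}}

def bi_load(store, aggregates):
    """Connector: load each refined aggregate into the BI-facing store,
    appending batches so downstream reporting can track history."""
    for metric, values in aggregates.items():
        store.setdefault(metric, []).append(values)
    return store

# One interlock cycle: refinery -> Business Intelligence.
bi_store = {}
bi_load(bi_store, refinery_extract())
print(bi_store)
```

In practice each arrow in the interlock (Refinery to Transaction, Refinery to BI) would be a connector of this kind, and the platform's native integration capabilities determine how much of it you must build yourself.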
3. Integrated Governance, Protection and Management
The refinery system requires a new, integrated Data Lifecycle Management approach, which will become a critical component of the Big Data solution. All aspects need to be considered, beginning with data creation, continuing through data storage, and terminating with data destruction. Big Data Lifecycle Management is essential to the success of any Big Data initiative.
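The creation-to-destruction lifecycle above can be sketched as a simple age-based retention check. This is a minimal illustration, assuming a single hypothetical 365-day retention window; a real policy would vary by data class, regulation and business need:

```python
from datetime import date, timedelta

# Hypothetical retention window: records older than this are candidates
# for destruction; everything else is retained.
RETENTION = timedelta(days=365)

def lifecycle_action(created, today):
    """Return the lifecycle stage for a record created on `created`."""
    return "destroy" if today - created > RETENTION else "retain"

today = date(2012, 12, 1)
print(lifecycle_action(date(2011, 1, 15), today))  # destroy
print(lifecycle_action(date(2012, 6, 1), today))   # retain
```

Governance and protection then layer on top of this: who may change the policy, and how destruction is proven to have happened.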
Today, IT infrastructures that support data architecture need to evolve. They will need to accommodate new systems that support the services we have identified as the “Big Data Refinery”. These must be capable of storing, aggregating, and transforming multi-structured raw data sources into usable formats that help fuel new insights for the business. Connecting this system with actual business transactions and interactions, as well as with business intelligence, will allow the generation of business value. A new Big Data Lifecycle Management approach will be critical to ensure long-term success.
Enterprises’ IT departments need to understand the analytics requirements for Big Data and plan for the transformation of today’s infrastructure in order to provide Big Data services to the business.
Click here to learn how the HP Big Data Strategy Workshop can help you build a roadmap for your Big Data strategy, while reducing risk and accelerating decision-making.
Update, 12.5.12: This week at HP Discover in Frankfurt, HP announced new services that can help your organization plan, deploy and support a Big Data environment. Click here to learn more about one-day workshops that focus on your Big Data infrastructure strategy, analytics infrastructure or storage platforms; and to learn how the new HP Proactive Care for SAP can help you speed up the analysis of large amounts of enterprise data.