Big Data analytics played a significant role in Barack Obama's presidential campaign, as outlined in this featured story in the MIT Technology Review by Sasha Issenberg. To understand the voter sentiment that drove its strategy in each voter segment, the campaign needed to turn its 180-million-person voter file, along with its data about volunteers, donors and online constituents, into usable information. And it needed to do this fast, with two very simple objectives: get 2008 Obama voters to vote for him again, and register and mobilize new voters. To do this, the campaign needed a robust analytics platform.
The Obama campaign's strategy was to build a holistic view of each constituent based on their data, voting pattern and campaign interactions, and to make this information available to the campaign as a whole. As in many other political campaign infrastructures, knowledge about constituents and their campaign interactions was scattered across disparate databases. The campaign needed to analyze this large volume of data and extract an integrated perspective from it to yield valuable information. Enter HP Vertica Analytics Platform.
Here are the top 5 defining characteristics of Vertica, along with an analysis of its enabling capabilities in this context:
1. Storage. Analytical processing involves large volumes of data. Aggressive encoding and compression of the stored data allow high volumes to be stored and retrieved in a timely manner, which in turn makes more views of the data practical. Vertica supports 13 different types of encoding, with compression ratios of up to 60 to 1. Several operations can be performed directly on the encoded data, and decoding is deferred until the values are actually needed, so results materialize quickly.
2. Query. The platform must enable high-speed queries on the data that matters. Traditional row-oriented relational databases tend to retrieve all the columns in a row, pertinent or otherwise, when the purpose of the query should determine which columns are retrieved for analysis. Vertica uses columnar storage, so a query reads only the columns it needs.
3. Processing. Massively Parallel Processing (MPP) is critical to taking advantage of data projections, which distribute both storage and workload across nodes. MPP enables real-time analytics on large volumes of data. Vertica's asynchronous tuple mover process allows data to be loaded and queried concurrently, with very low data latency (seconds) and full context (years of detailed history).
4. Scalability. In this environment, the only constant is the continuous growth of the data to be analyzed. The ability to add more resources on the fly through clustering is essential. Vertica's grid-based architecture provides linear scalability on clusters of commodity servers, so you can size the cluster for the performance you need.
5. Availability. The need to have the right information at the right time makes the availability of the analytics platform a vital requirement. Vertica's grid-based architecture has built-in, native, high availability. The projections are organized so that if a node fails, a copy is available on one of the surviving nodes.
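To make the storage and query points above a little more concrete, here is a minimal Python sketch of the two ideas: laying data out by column, and run-length encoding a sorted column so that a simple count can run on the encoded data without decoding it. This is a toy illustration, not Vertica's implementation. The voter records, the column names and the helpers `rle_encode`, `rle_decode` and `count_equal` are all hypothetical and exist only for this example.

```python
from itertools import groupby

# Hypothetical voter records in a row-oriented layout:
# (voter_id, state, last_election_voted, donated)
rows = [
    ("v1", "OH", 2008, True),
    ("v2", "OH", 2008, False),
    ("v3", "OH", 2012, True),
    ("v4", "FL", 2008, True),
    ("v5", "FL", 2012, True),
]

# Column-oriented layout: one list per attribute, so a query that only
# needs "state" never has to touch the other three columns.
columns = {
    "voter_id": [r[0] for r in rows],
    "state": [r[1] for r in rows],
    "last_election_voted": [r[2] for r in rows],
    "donated": [r[3] for r in rows],
}


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


def count_equal(runs, target):
    """Count matching values directly on the encoded runs, without decoding."""
    return sum(length for value, length in runs if value == target)


# The "state" column is already grouped by value, so it compresses well.
state_runs = rle_encode(columns["state"])

# Answer "how many voters are in Ohio?" from the encoded column alone.
ohio_voters = count_equal(state_runs, "OH")

print(state_runs)
print(ohio_voters)
```

Here the five-value state column shrinks to two runs, and the count never expands them. A real columnar engine chooses among many encodings per column and works at a vastly larger scale, but the principle is the same: read only the columns the query needs and decode as late as possible.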
Vertica is architected from the get-go for complex, large-scale, real-time analytics. In our world of petabytes (that is headed toward Brontobyte land), we are guaranteed to have unimaginable volumes of data with the ever-increasing need for analysis to glean valuable intelligence. In the absence of such analysis, all we will have is Big Data—without information—and therefore, no ROI (Note: Return on Information).
To win a presidential campaign in a country the size of the United States, you need the right tools. And these tools must enable the effective execution of an underlying strategy. I have no intention of running for political office. But if I ever do, I know which analytics platform I will use to better comprehend my voter base.
What say you? How are you analyzing the data that your enterprise has access to? What do you think are other defining characteristics for a viable analytics platform? Please let me know.