In my last blog post in the series “Tracing Data for Provenance, Transparency and Accountability in Cloud Computing”, I discussed how data is taking centre stage in cloud computing, and how current system tools are unable to effectively log file accesses and transfers within a Cloud environment.
These two factors call for a data-centric, detective approach which enables data events in the cloud to be captured, recorded and analyzed. We need a solution that enables all cloud stakeholders to monitor data files in the cloud and ensure that they remain where they should be.
In this post, I want to introduce our proposed framework for managing information in the cloud, as well as the granularities at which data logs are collected.
TrustCloud, proposed by HP Labs Singapore in collaboration with ArcSight (an enterprise security leader HP acquired in 2010), enables all cloud stakeholders to trace their data in and out of the cloud. It adopts an end-to-end, data-centric methodology grounded in a five-layer framework: systems, data, workflow, laws and regulations, and policies (see Figure 1).
Figure 1: HP Labs Singapore TrustCloud Framework
At the systems layer, data events (such as file create, write, delete, or transfer) are tracked at both the file and block level. They are recorded as data logs by kernel-space sensors deployed on all virtual and physical machines in the cloud. These logs are then securely transmitted to the data layer and analyzed there for end-to-end cloud data provenance. Finally, workflows and audit trails linking data events to human users and policies are distilled at the workflow layer and checked against the laws and regulations layer and the policies layer.
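To make the idea of a file-level data log more concrete, here is a minimal Python sketch. The record fields (`DataLogRecord`, `host`, `user`, `path`, `dest`) and the `provenance_trail` helper are hypothetical illustrations, not TrustCloud's actual log format. The sketch only shows how events captured at the systems layer could be ordered per file and chained into a simple provenance trail at the data layer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class EventType(Enum):
    # File-level events named in the post; block-level events are omitted here.
    CREATE = "create"
    WRITE = "write"
    DELETE = "delete"
    TRANSFER = "transfer"


@dataclass(frozen=True)
class DataLogRecord:
    # Hypothetical fields a kernel-space sensor might emit for one data event.
    timestamp: float            # seconds since epoch
    host: str                   # virtual or physical machine that observed the event
    user: str                   # account that triggered the event
    event: EventType
    path: str                   # file the event applies to
    dest: Optional[str] = None  # destination host, only set for TRANSFER events


def provenance_trail(records: List[DataLogRecord]) -> Dict[str, List[DataLogRecord]]:
    """Group data log records by file path, each group ordered by timestamp."""
    trails: Dict[str, List[DataLogRecord]] = {}
    # Sort once globally so every per-file list is appended in time order.
    for rec in sorted(records, key=lambda r: r.timestamp):
        trails.setdefault(rec.path, []).append(rec)
    return trails


if __name__ == "__main__":
    logs = [
        DataLogRecord(3.0, "vm-2", "alice", EventType.TRANSFER, "/data/q3.csv", dest="vm-7"),
        DataLogRecord(1.0, "vm-2", "alice", EventType.CREATE, "/data/q3.csv"),
        DataLogRecord(2.0, "vm-2", "bob", EventType.WRITE, "/data/q3.csv"),
    ]
    for rec in provenance_trail(logs)["/data/q3.csv"]:
        print(rec.timestamp, rec.user, rec.event.value, rec.host, rec.dest)
```

The sketch deliberately leaves out the harder parts the post mentions, such as securing the logs in transit and linking events across machines after a transfer.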
HP Labs Singapore and ArcSight are collaborating to build data-centric cloud forensics tools designed to give cloud stakeholders the ability to track their data. For example, the tools would let stakeholders identify important files to track and, if those files are compromised or stolen, alert them and show them the history of the violations. Another example is the ability to send text messages to stakeholders when their files leave predefined boundaries (e.g. banking data leaving a country).
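The boundary alert in the second example can be sketched as a simple policy check over transfer events. In this hypothetical Python sketch, each host is mapped to a country, a tracked file carries a set of allowed countries, and any transfer whose destination falls outside that set produces an alert message. The names `HOST_COUNTRY`, `Policy` and `check_transfer`, and the placeholder phone number, are illustrative assumptions; actually delivering the text message is left out.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

# Hypothetical mapping from machine name to the country it is hosted in.
HOST_COUNTRY: Dict[str, str] = {
    "vm-sg-1": "SG",
    "vm-sg-2": "SG",
    "vm-us-1": "US",
}


@dataclass(frozen=True)
class Policy:
    # Hypothetical per-file boundary policy set by a cloud stakeholder.
    path: str
    allowed_countries: FrozenSet[str]
    owner_phone: str  # where a text-message alert would be sent


def check_transfer(policy: Policy, path: str, src_host: str, dest_host: str) -> Optional[str]:
    """Return an alert message if a tracked file leaves its allowed boundary, else None."""
    if path != policy.path:
        return None  # this policy does not track the file
    # Unknown destination hosts are treated as outside the boundary (fail closed).
    dest_country = HOST_COUNTRY.get(dest_host, "UNKNOWN")
    if dest_country in policy.allowed_countries:
        return None
    return (f"ALERT to {policy.owner_phone}: {path} moved from {src_host} "
            f"to {dest_host} ({dest_country}), outside {sorted(policy.allowed_countries)}")


if __name__ == "__main__":
    policy = Policy("/bank/ledger.db", frozenset({"SG"}), "+65 0000 0000")
    print(check_transfer(policy, "/bank/ledger.db", "vm-sg-1", "vm-us-1"))
```

Failing closed on unknown hosts is one possible design choice; a deployment could just as reasonably log such transfers for review instead of alerting immediately.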
Now the key question is: “How did TrustCloud achieve data traceability in as complex, virtualized, dynamic and distributed an environment as the cloud?” The answer lies in the systems layer, which I will discuss in my next post.
For more information on TrustCloud, please refer to the following HP Labs Technical Report: TrustCloud: A Framework for Accountability and Trust in Cloud Computing