You may have heard that cloud computing represents a paradigm change. But what exactly is this paradigm change that so many sales and marketing folks are referring to? Surely, there must be a basis for these words.
This was an issue we pondered at great length when HP Labs Singapore was started in 2010. And it led us to ask three fundamental questions we felt needed to be answered:
- What exactly differentiates cloud computing from traditional computing?
- Once these differences are understood, what then are the biggest challenges with cloud computing?
- How can we solve these challenges?
These questions led me on a journey to understand the cloud paradigm change. I consulted former business partners and friends who work in the private and public sectors, and even spoke with relatives. The less technical the interviewees were, the better, as they were able to articulate common-sense problems with the cloud, minus the buzzwords.
At the end of my straw poll, the answers to my questions were clear:
- The cloud means we no longer physically own our computing infrastructure and resources; we would only own the data that we put in the cloud.
- But to put our data in the cloud, we would have to trust the cloud service provider with our data. As such, a key challenge and possible barrier to entry with cloud computing is trust.
- There were several approaches to solving the problem of trust, and they generally fell into one of two buckets: either preventive or detective approaches.
Existing approaches to cloud security weren’t adequate
Preventive approaches, such as fully homomorphic encryption, were all the rage in cloud security research, but they didn’t allow you to diagnose or solve a security breach once it occurred. And detective approaches, which can identify breaches in privacy or security policies and procedures (e.g. intrusion detection systems, or security audit trails, logs and analysis tools), were only minimally explored.
So few detective approaches existed at the time because users lacked mechanisms to trace who touched their data, when, and where. In other words, there was no way for a cloud user to detect how many copies of their files were in the cloud, or whether and when their files were accessed, moved or modified.
Realizing this traceability problem helped elucidate and underscore the cloud paradigm change: it is a shift from a focus on systems to a focus on data. This led us to investigate whether existing tools were able to trace the creation, editing, access, transfer and change history of data in the cloud, i.e. cloud data provenance. To our surprise and joy as researchers, no tools that enabled full traceability of data in cloud infrastructures existed.
The existing logging mechanisms were mainly system-centric and built for debugging or monitoring system health. They were not built for tracing data created within and across machines. Furthermore, they monitored only the virtual machine layer, without paying attention to the physical machines hosting those virtual machines.
Additionally, while file-intrusion detection and prevention tools such as TripWire existed, they merely compared key signature changes and did not record and track the history and evolution of data in the cloud!
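To make the contrast concrete, here is a minimal sketch of what a file-centric record might look like. It is an illustration only, not TrustCloud's implementation: the `ProvenanceEvent` and `ProvenanceLog` names, their fields, and the in-memory storage are all assumptions made for this example. A signature comparison can only say that a file's hash differs between two points in time; a log of this kind also keeps who acted on the file, what they did, on which host, and when.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    """One file-centric record (hypothetical schema, for illustration)."""
    file_id: str
    action: str         # e.g. "create", "read", "modify", "copy", "move"
    actor: str          # who touched the data
    host: str           # machine on which the action took place
    timestamp: datetime
    content_hash: str   # SHA-256 of the file content after the action


class ProvenanceLog:
    """Append-only, in-memory event log (assumed design, not a real tool)."""

    def __init__(self):
        self._events = []

    def record(self, file_id, action, actor, host, content, timestamp=None):
        # Store the event together with a hash of the content at that moment.
        event = ProvenanceEvent(
            file_id=file_id,
            action=action,
            actor=actor,
            host=host,
            timestamp=timestamp or datetime.now(timezone.utc),
            content_hash=hashlib.sha256(content).hexdigest(),
        )
        self._events.append(event)
        return event

    def history(self, file_id):
        """All recorded events for one file, in the order they were logged."""
        return [e for e in self._events if e.file_id == file_id]

    def hosts_seen(self, file_id):
        """Every host on which an event for this file was recorded."""
        return {e.host for e in self.history(file_id)}


log = ProvenanceLog()
log.record("report.txt", "create", "alice", "host-a", b"v1")
log.record("report.txt", "read", "bob", "host-a", b"v1")
log.record("report.txt", "copy", "bob", "host-b", b"v1")
log.record("report.txt", "modify", "carol", "host-b", b"v2")

# Print the full trail: when, who, what, where, and the content fingerprint.
for e in log.history("report.txt"):
    print(e.timestamp.isoformat(), e.actor, e.action, e.host, e.content_hash[:8])
```

In this sketch a signature check would report only that the final hash differs from the first one, while the log shows that the change was made by `carol` on `host-b` after the file had been copied there. A real system would also have to capture these events reliably at both the virtual and physical machine layers, which this example does not attempt.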
These findings led to the formation of TrustCloud, a research project we launched to increase trust in cloud computing via detective, file-centric approaches that increase data traceability, accountability and transparency in the cloud. We are excited about the potential for TrustCloud and the possibility of solving a problem that will only grow in importance over the next few years. After all, this is the beginning of a new era of data transparency and accountability in cloud computing.
Part 1 of a 4-article series on Tracing Data for Provenance, Transparency and Accountability in Cloud Computing. Stay tuned for Part 2 next week.