Guest post by Andy Sparkes, Technical Director, HP Storage
The recent release of HP StoreAll with Express Query and Autonomy IDOL integration demonstrates the power of storage to provide an intelligent large-scale data repository. This intelligence resides in two locations:
1. The Express Query database is integrated into the file-system.
This deep integration means that the database is able to use the snapshot and replication features of HP StoreAll for its protection and disaster recovery. Current solutions use side-band databases, which are typically relational and not optimal for managing constantly updated and changing metadata in a scale-out system. Side-band databases also require their own hardware, management, monitoring and DR resources. StoreAll has a unique segmented architecture that enables the deployment of a distributed database, which can utilise the underlying capabilities of the file-system for management and protection.
Generating and extracting metadata from objects and files and subsequent operations on them is a complex task. This has to be done without slowing down the system ingest rate whilst carrying out numerous insert operations into the database.
The database design uses technology from HP Labs and is based on understanding how to trade off scale, ingest rate, query rate and freshness. The task becomes even more complex in a scale-out solution with entry points into the namespace across multiple nodes. Each segment is able to journal all the changes to the file system. These journals are scanned and run through a pipeline that sorts, indexes and merges all of the records. This journaling mechanism means that an audit record of the file system is also generated and stored in the Express Query database. Each stage of the pipeline is discrete and can be scaled asymmetrically, independently of the other stages.
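To make the sort, index and merge stages more concrete, here is a minimal Python sketch. It is an illustration only and not the Express Query implementation: the `JournalRecord` fields, the stage functions and the sample paths are all assumptions made for this example.

```python
import heapq
from collections import defaultdict
from typing import NamedTuple


class JournalRecord(NamedTuple):
    """One hypothetical journal entry: a change to a path at a point in time."""
    timestamp: int
    path: str
    operation: str  # e.g. "create", "modify", "delete"


def sort_stage(segment_journal):
    """Sort one segment's journal records by timestamp."""
    return sorted(segment_journal)


def index_stage(sorted_records):
    """Index one segment: path -> time-ordered history of operations."""
    index = defaultdict(list)
    for record in sorted_records:
        index[record.path].append((record.timestamp, record.operation))
    return dict(index)


def merge_stage(segment_indexes):
    """Merge per-segment indexes, keeping each path's history in time order."""
    merged = defaultdict(list)
    for index in segment_indexes:
        for path, history in index.items():
            merged[path] = list(heapq.merge(merged[path], history))
    return dict(merged)


# Two segments journal their changes independently (sample data).
segment_a = [
    JournalRecord(3, "/lab/a.jpg", "modify"),
    JournalRecord(1, "/lab/a.jpg", "create"),
]
segment_b = [
    JournalRecord(5, "/lab/a.jpg", "delete"),
    JournalRecord(2, "/lab/b.jpg", "create"),
]

# Each stage is a separate function, so each could be scaled on its own.
audit_index = merge_stage(
    index_stage(sort_stage(journal)) for journal in (segment_a, segment_b)
)
print(audit_index["/lab/a.jpg"])
```

Because the merged index keeps every operation per path in time order, it also serves as a simple audit trail, which is the side effect of journaling described above.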
The Express Query database mirrors all the system metadata associated with each file or object, including its retention state, who owns it, what tier it sits on and when it was last accessed. Custom metadata for each file or object can also be added via a published API. Tagging, or the addition of custom metadata, allows an application or user to start building structure around the data.
The metadata database provides the first intelligent layer. It can be queried to carry out numerous operations, ranging from generating reports on the state and composition of your data repository to identifying candidate lists of files and objects for subsequent actions. This intelligence allows multiple policies to be deployed against the data. Express Query also has a synchronisation mechanism, so it can be used on existing file-systems, enabling the installed base to exploit this technology.
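As a rough illustration of how a report and a candidate list might be derived from such metadata, the sketch below filters an in-memory catalogue in Python. It does not use the Express Query API; the `FileMetadata` fields, tier names and sample files are assumptions chosen to match the attributes mentioned above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class FileMetadata:
    """Hypothetical mirror of the system metadata held for one file."""
    path: str
    owner: str
    tier: str
    retention_state: str
    last_accessed: datetime
    custom: dict = field(default_factory=dict)


def tiering_candidates(records, tier, idle_days, now):
    """Return paths on `tier` that have not been accessed for `idle_days`."""
    cutoff = now - timedelta(days=idle_days)
    return [r.path for r in records if r.tier == tier and r.last_accessed < cutoff]


def report_by_tier(records):
    """Count files per tier, a simple composition report."""
    counts = {}
    for r in records:
        counts[r.tier] = counts.get(r.tier, 0) + 1
    return counts


now = datetime(2013, 1, 31)
catalogue = [
    FileMetadata("/lab/images/a1.jpg", "jsmith", "tier1", "normal", datetime(2012, 6, 1)),
    FileMetadata("/lab/images/b2.jpg", "jsmith", "tier1", "normal", datetime(2013, 1, 30)),
    FileMetadata("/lab/scans/c3.tif", "akhan", "tier2", "retained", datetime(2011, 3, 15)),
]

candidates = tiering_candidates(catalogue, tier="tier1", idle_days=90, now=now)
report = report_by_tier(catalogue)
print(candidates)  # ['/lab/images/a1.jpg']
print(report)      # {'tier1': 2, 'tier2': 1}
```

A policy engine could then act on the candidate list, for example by moving those files to a cheaper tier, without ever walking the file system.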
2. The Autonomy IDOL connector offers a second, deeper level of intelligence.
IDOL generates indexes from content and uses them to generate meaning-based relationships. With Express Query, the IDOL indexes become real time: IDOL is efficiently informed of what has changed and where it is located, allowing a targeted indexing process to run. File-system scans for new content have been completely eliminated.
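The idea of targeted indexing can be sketched in a few lines of Python. This is a toy inverted index driven by change notifications, not IDOL's index format or connector interface; the `ContentIndex` class, the `(path, operation)` notification shape and the sample files are assumptions for illustration.

```python
def tokenize(text):
    """Split text into a set of lower-case terms."""
    return set(text.lower().split())


class ContentIndex:
    """Toy inverted index: term -> set of paths containing that term."""

    def __init__(self):
        self.terms = {}
        self.indexed_terms = {}  # path -> terms currently indexed for it

    def _remove(self, path):
        """Drop every index entry that points at `path`."""
        for term in self.indexed_terms.pop(path, set()):
            self.terms[term].discard(path)
            if not self.terms[term]:
                del self.terms[term]

    def apply_changes(self, changes, read_content):
        """Re-index only the paths named in the change notifications."""
        touched = 0
        for path, operation in changes:
            self._remove(path)
            if operation != "delete":
                terms = tokenize(read_content(path))
                self.indexed_terms[path] = terms
                for term in terms:
                    self.terms.setdefault(term, set()).add(path)
            touched += 1
        return touched

    def search(self, term):
        return sorted(self.terms.get(term.lower(), set()))


# A dict stands in for the file system in this sketch.
store = {"/lab/report.txt": "storage audit report", "/lab/notes.txt": "audit notes"}
index = ContentIndex()
index.apply_changes(
    [("/lab/report.txt", "create"), ("/lab/notes.txt", "create")],
    store.__getitem__,
)

# One file changes; only that file is read and re-indexed.
store["/lab/notes.txt"] = "meeting notes"
touched = index.apply_changes([("/lab/notes.txt", "modify")], store.__getitem__)
print(touched, index.search("audit"), index.search("meeting"))
```

The work done per update is proportional to the number of changed files rather than the size of the repository, which is why a change feed removes the need for periodic scans.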
Two technologies come together for one big data solution
What we are saying is this: The storage layer has become smart. It now understands what it is storing and is able to take decisions on how it stores its data.
IDOL allows you to understand the content being stored and the relationships within it. This approach also transforms backup and archive workloads, as simple queries can be used instead of expensive find and stat commands.
Focusing on protocols and APIs
HP StoreAll is now a truly multi-protocol device. It supports the mainstream classic protocols such as CIFS and NFS, and it also supports a number of RESTful protocols. The key differentiator is that the product allows cross-protocol access across its object and classic protocols.
The REST protocols provide two options. The first is a pure PUT/GET object layer. The second option interacts directly with the file-system and the Express Query database. The StoreAll object API acts as both a data path and a metadata tagging mechanism. This API is fully documented with supported commands and code examples.
ISVs and customers are using this API to remove the need for a secondary database and provide a fast search capability into large-scale data repositories.
Examples of API power and flexibility
First, an example of a file upload:
curl -T temp/a1.jpg http://22.214.171.124/storeall_share1/lab/images/xyz
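For readers who would rather script this than use curl: with an HTTP URL, `curl -T` sends the file as the body of an HTTP PUT. The Python sketch below builds the equivalent request with the standard library without sending it. The host name `storeall.example.com` and the payload bytes are placeholders, and any authentication or headers a real deployment may require are not shown.

```python
import urllib.request


def build_upload_request(base_url, remote_path, payload):
    """Build (but do not send) an HTTP PUT carrying the file bytes."""
    url = f"{base_url.rstrip('/')}/{remote_path.lstrip('/')}"
    return urllib.request.Request(url, data=payload, method="PUT")


request = build_upload_request(
    "http://storeall.example.com",        # hypothetical host
    "storeall_share1/lab/images/xyz",     # path taken from the curl example
    b"placeholder image bytes",           # stand-in for the contents of a1.jpg
)
print(request.get_method(), request.full_url)

# To actually send it against a reachable system:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```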
The API can also be used to add further metadata by assigning keys and associating values with them:
curl -X PUT https://126.96.36.199/storeall_share1/lab/images/xy
&value="Smith, John; 8136"
Further operations such as deleting and modifying metadata, setting WORM status and retention periods are also available. The API is also used for searching for files based on system and custom metadata.
Here the example shows files being selected on custom metadata:
The files can be identified either by full pathname or by a GUID. Files stored with pathnames can subsequently be accessed via the classic protocols using that pathname. This cross-protocol nature allows you to experiment and maintain your current data structures whilst learning how to exploit the capabilities that object storage brings.
What makes this a smart choice for hyperscale storage
The combination of a distributed database, scale-out storage and Autonomy technology makes HP StoreAll a very smart place to store your unstructured data and start to exploit it.
Here’s where you can find more information about HP StoreAll Storage.