Around the Storage Block Blog
Find out about all things data storage from Around the Storage Block at HP Communities.

Welcome to Flash 2.0: HP 3PAR Thin Deduplication with Express Indexing

By Ivan Iannaccone, 3PAR StoreServ Product Management

 

With the introduction of HP 3PAR Thin Deduplication and Thin Clones software, HP 3PAR StoreServ continues to set the gold standard for hardware-accelerated thin technologies that drive up capacity efficiency and extend flash media lifespan. Together with cMLC SSD and Thin Deduplication, HP 3PAR StoreServ truly enables a flash-based solution at HDD-based costs.  Welcome to Flash 2.0!

 

Flash_Mainstream.JPG

 

How the magic happens

HP 3PAR StoreServ has two secret sauces: the ASIC and the highly virtualized architecture. Thin Deduplication is the perfect blend of hardware and software: the ASIC generates hashes and assures data integrity, while the HP 3PAR OS handles metadata and volume management between the various virtualization layers.

 

The implementation of Thin Deduplication is influenced by 3PAR StoreServ support for mixed workloads, which means concurrently serving different I/O streams (sequential and random) of variable I/O sizes. As writes are received, they are segmented into Cluster Memory Pages (CMPs). The CMP is 16 KB, the same size that has made HP 3PAR Thin Provisioning so successful. It’s a perfect balance for:

  • Performance – CPU interrupts, host I/O sweet spot, host I/O average block size and metadata handling
  • Efficiency – granularity of write updates, deduplication and space reclamation
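
To make the segmentation step concrete, here is a minimal Python sketch of how a host write could be cut into 16 KB pages. Only the 16 KB page size comes from the post; the function name `split_into_pages` and the (page, start, count) layout are assumptions made for illustration and are not the HP 3PAR OS implementation.

```python
PAGE_SIZE = 16 * 1024  # 16 KB, the CMP size quoted in the post


def split_into_pages(offset, length, page_size=PAGE_SIZE):
    """Return (page_index, start_in_page, byte_count) for each page a write touches.

    Hypothetical helper: the post only says writes are segmented into
    16 KB pages, not how the mapping is computed.
    """
    pieces = []
    end = offset + length
    while offset < end:
        page_index, start = divmod(offset, page_size)
        count = min(page_size - start, end - offset)
        pieces.append((page_index, start, count))
        offset += count
    return pieces


if __name__ == "__main__":
    # A 16 KB write that starts 8 KB into a page straddles two pages.
    print(split_into_pages(8 * 1024, 16 * 1024))
```

An aligned 16 KB write maps to exactly one page, which is the case where a full-page signature can be computed without reading any existing data first.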

As a new I/O write comes in, HP 3PAR Express Indexing performs metadata lookups to compare the signature of the incoming request to the signatures of data already stored in the array. The technology uses the computed hash signature as an index to look up a match using a three-level translation exception table mechanism. If a match is found, the L3 page table entry is set to point to the existing copy of the data page. If no match is found, a new block is allocated on the back-end to host the new page, and its hash is saved for comparison against future writes. To guard against hash collisions, Thin Deduplication leverages the ASICs to perform a bit-to-bit comparison before any new write update is marked as a duplicate.
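
The lookup described above can be sketched in a few lines of Python. This is an illustrative model only: the class `DedupStore`, its flat dictionaries and the choice of SHA-256 are assumptions (the post does not name the hash function, and the real array uses ASIC-generated hashes and a three-level table rather than a single dictionary).

```python
import hashlib


class DedupStore:
    """Toy model of hash-indexed deduplication with a full-data verify step."""

    def __init__(self):
        self.blocks = {}      # block_id -> page bytes (stands in for the back-end)
        self.index = {}       # hash signature -> block_ids stored under that signature
        self.page_table = {}  # (volume, page_number) -> block_id
        self._next_id = 0

    def write_page(self, volume, page_number, data):
        """Store one page; return (block_id, was_duplicate)."""
        signature = hashlib.sha256(data).digest()  # assumed hash, not HP's
        for block_id in self.index.get(signature, []):
            # A matching signature is only a candidate: compare the full
            # contents so a hash collision is never treated as a duplicate.
            if self.blocks[block_id] == data:
                self.page_table[(volume, page_number)] = block_id
                return block_id, True
        # No verified match: allocate a new block and remember its signature.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.index.setdefault(signature, []).append(block_id)
        self.page_table[(volume, page_number)] = block_id
        return block_id, False

    def read_page(self, volume, page_number):
        return self.blocks[self.page_table[(volume, page_number)]]


if __name__ == "__main__":
    store = DedupStore()
    page = bytes(16 * 1024)
    print(store.write_page("vol1", 0, page))  # first copy is stored
    print(store.write_page("vol2", 7, page))  # second copy points at the first
```

Keeping a list of block ids per signature is what lets the sketch survive a collision: two different pages with the same signature would simply occupy two blocks.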

 

All of the above operations occur inline, after the write has been acknowledged back to the host but before the data is flushed to the back-end, and therefore do not impact host latency.

 

Thin Deduplication implements an online garbage collection process to reclaim space that is no longer referenced. This runs continuously and is completely automated and transparent. Unlike some competing implementations, Thin Deduplication does not restrict the number of times a given page can be referenced.
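
As a rough illustration of reclaiming unreferenced space, the Python sketch below counts how many page-table entries point at each stored block and frees the blocks with no references. The function `collect_garbage` and the dictionary layout are hypothetical; the post does not describe how the array's garbage collector tracks references.

```python
from collections import Counter


def collect_garbage(blocks, page_table):
    """Delete blocks that no page-table entry points to; return the freed ids.

    blocks:     dict of block_id -> page bytes (assumed layout)
    page_table: dict of (volume, page_number) -> block_id (assumed layout)
    """
    references = Counter(page_table.values())
    freed = [block_id for block_id in blocks if references[block_id] == 0]
    for block_id in freed:
        del blocks[block_id]
    return freed


if __name__ == "__main__":
    blocks = {0: b"a", 1: b"b", 2: b"c"}
    # Block 0 is shared by two volumes; block 1 is referenced by nothing.
    page_table = {("vol1", 0): 0, ("vol2", 0): 0, ("vol1", 1): 2}
    print(collect_garbage(blocks, page_table))
```

Nothing in the sketch caps the reference count, which mirrors the point above that a page can be referenced any number of times.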

 

Dedupe[1].JPG

 

Why HP 3PAR 7450 “All-Flash Array” versus an “Amateur Flash Array”?

There are more than 30 flash companies in the industry today, all with some level of interesting technology and niche implementation. HP 3PAR StoreServ is taking flash mainstream, by combining low cost of flash storage from a $/GB usable/raw perspective with a solution that satisfies all the existing use cases that are important in Primary Storage.

Primary storage is all about protecting the data in any possible way, while offering integrity and availability and assuring the best performance with optimized efficiency with data compaction, rack density, power/cooling and ease of use. This is what HP 3PAR StoreServ Architecture is all about.

A proliferation of flash companies brings “architectures” and new ways to spin features or functionalities:

  • Architectures that claim to be built from the ground up for flash and, interestingly, only support certain types of flash from certain vendors/suppliers and fixed drive sizes. All complemented with a lot of DRAM cache with a side of NVRAM.
  • Architectures where scale-out equals replacing controllers (Replace-Out), where there is documentation but no implementation (Fake-Out) or where you just aggregate the manageability layer (Knock-Out).
  • Architectures that say RAID is bad, but actually still use a RAID-based implementation and concepts. (For those who wonder, 3PAR introduced its RAID alternative in 2009, HP 3PAR RAID MP, multiple parities with multiple mirrors wide striped across multiple enclosures.)
  • Architectures with guaranteed performance and some with 100% performance claims while actually just using 50% of available performance. And even others with the best metadata handling, all in cache or not, shared or distributed, yeah…hard to keep up.

There are some interesting choices and some good technology out there, but only one Flash-Optimized Architecture that meets the criteria for Modern Tier-1 Storage that is ready for the Next Style of IT.

 

Read up on flash

 

Edison Competitive Review

Check out the Edison assessment of efficiency technologies with an overview of other solutions from the competition.

 

ESG White Paper on HP 3PAR 7450

Read Mark Peters’ assessment of the HP 3PAR approach to flash.

 

BrightTALK Flash Vendor Panel

Watch this BrightTALK vendor panel discussion that happened last week between HP, EMC, Pure Storage and Kaminario. It’s a fair and interesting conversation.

Comments
nate | ‎06-19-2014 09:30 PM

Can you talk more about RAID MP? My understanding (as a 3PAR customer since '06) is that RAID MP was simply an extension of RAID 5 with another parity bit stored on another disk. At the time of introduction 3PAR specifically told me that RAID 6/MP was little more than a "check box" for some customers who had stupid requirements of a platform that "must" support RAID 6 even though there was no need (at the time) on 3PAR (I'd argue the need still isn't there on most 3PAR systems other than ones that are pretty small).

 

The magic was already there in the chunklet-based RAID going back to the origins of 3PAR.

 

Though your post above seems to imply there is something special in RAID MP.

 

A couple articles I wrote years ago

  • 81,000 RAID arrays on one of my 3PAR boxes at the time (and I include the sample script so other customers can get their number if they are curious how many arrays are running on their systems)
  • Do you really need RAID 6? - explaining some of the bits behind why RAID on 3PAR is so different/better. Also shows an example of how RAID 6 failed horribly for one user, and how there is already triple parity RAID out there on some platforms (perhaps quadruple by now; that post is four years old)

I am in the process of quoting out a 7450 with these new SSDs, likely a 4-node right off the bat, to replace/augment an aging F-class running a 93% write workload. It will be a nice upgrade, even before the de-dupe stuff. Will be putting that 5 year warranty on the SSDs to the test.

 

I would like to see HP give customers the ability to cluster nodes in 74xx vertically instead of horizontally though - I mentioned this directly at tech day last year. It would provide the ability to survive a failure of the shelf that contains the two controllers and stay online. Given the 74xx cross system interconnect is now cable based instead of hard wired it should not be difficult. I suppose with the 6 nines guarantee program that the likelihood of a "shelf" that has the controller pair failing completely is probably quite rare... but still would like to see the option :)

 

 

 

Gabriel_py | ‎06-20-2014 09:44 PM

Nice feature! There seems to be a typo in the first image: it says 460TB raw and 1.2PB usable. It should be the other way around, shouldn't it?

| ‎06-20-2014 10:16 PM

Gabriel - Absolutely correct. Remember that deduplication increases useable capacity so useable capacity we quoted is based on 4:1 dedupe ratio.

 

Nate - I'm sitting in a Turkish restaurant in Cologne Germany and will leave it to the team to respond to you. Long way from Vegas. 

Ivan Iannaccone | ‎06-22-2014 10:21 PM

Hi Nate,

Thanks for your comment and your interest in the article. RAID MP is actually more than that, and as a matter of fact it's the other way around: since 3.1.2, RAID 5 is just RAID MP with a single parity. We just continued to use industry standard terms because that is what the majority of customers are used to, and we believe it's important for customers to know and understand RAID overheads up front. RAID MP allows us to integrate further with the background pd scrubbing process to detect any potential issue at the device block layer and trigger RAID rebuilds down to 512-byte granularity if necessary, versus having to do a read and rebuild at the step-size level. This is particularly important when it comes to flash, as it allows us to be proactive in detecting and correcting any CRC/bit errors. As for whether to use single parity or double parity protection, we are still leaving this decision to customers; data shows that with flash, MTBFs are greater and rebuilds are faster. We are working on a few new things and I will talk about them in a blog later this year.

Regarding the cluster interconnect, we were actually limited by how HDDs worked and by the fact that you could only have active/passive connectivity to a device. Flash does not have this or other such limitations, and it's great to have an architecture that can now exploit this new media. And you are correct: the occurrence of such a failure is beyond the six nines boundaries from both a model perspective (how the architecture is designed) and an observation perspective (how the systems perform in production).

 

Alex Macaronis | ‎06-24-2014 05:59 AM

So... This is dedupe... but it's only on the 7450? Not for conventional disk based appliances?

 

 

Ivan Iannaccone | ‎06-25-2014 08:32 PM

Hi Alex,

 

HP 3PAR StoreServ 7450 is an All Flash model, therefore deduplication on HDD is not applicable.

 

Ivan

AllMarketing | ‎10-28-2014 02:36 AM

I find the conversation of thin deduplication and compaction quite amusing. HP has to be the only vendor that can start talking about having deduplication and compaction (compression) since June and not deliver anything. How about you innovate and quit lying to the public about these features. There is NO deduplication or compaction as of Oct 27th and I don't think it is appropriate to mislead customers/prospects.

| ‎11-04-2014 10:13 PM

@AllMarketing - when we announced Thin Deduplication, we were clear that it was coming.  And customers were absolutely aware.  That said, Thin Deduplication has been available to customers for a while, first in beta and now in general release.  In fact I just saw an email from the field reporting that a customer who moved data from HDDs to Thin Dedupe SSDs saw 14.6:1 compaction. 

 

HP 3PAR Thin Deduplication is here.

The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation.