
Optimizing Deduplication Performance


In his recent blog post, Deduplication - The Power of Flexibility, Gary Parker discusses the importance of data deduplication and the trade-offs among the various deduplication options available in the market.

One point in particular stood out: “for the highest performance levels, a recommended best practice is to use flexible deduplication policies to leverage post-process deduplication for the initial backup (for speed), and then switch to inline deduplication for subsequent backups.” I would like to expand on that, because it is an important element of a good deduplication implementation.

Every deduplication method has trade-offs, no matter which vendor is selected. Inline deduplication saves storage space because it does not require a staging area, but it usually is not as fast as post-process or concurrent deduplication, because inline is a CPU-intensive process that performs parsing, hashing, compression, and unique-block look-up on the fly for incoming data.

While the CPU is performing all those calculations, some delay is added to the flow of transferred data. Thanks to new-generation processors, the impact can be limited by increasing the number of available cores. As an open solution, FalconStor VTL has the benefit of letting you choose the server that best fits your performance needs.
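
To make that concrete, below is a minimal sketch in Python of the per-block work an inline engine performs on the data path. The names and structure are hypothetical and heavily simplified (a real engine uses variable-size chunking and a persistent index), but the sequence of steps is the one described above:

```python
import hashlib
import zlib

# Hypothetical, simplified repository: hash -> index of the stored block.
block_index = {}
unique_blocks = []

def ingest_inline(block: bytes) -> int:
    """Dedupe one block on the data path: hash, look up, store only if new."""
    digest = hashlib.sha256(block).digest()      # hashing: CPU cost per block
    if digest in block_index:                    # unique-block look-up
        return block_index[digest]               # duplicate: nothing is written
    unique_blocks.append(zlib.compress(block))   # compression, then the write
    block_index[digest] = len(unique_blocks) - 1
    return block_index[digest]
```

Every incoming block pays the hash, look-up, and compression cost before its write completes; that is where the inline delay comes from.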

On the other hand, post-process (and concurrent) deduplication are faster because they are stand-alone operations. Post-process simply compresses the data and writes it to disk as soon as it is received; deduplication is performed after the backup operation completes and does not impact the ongoing backup. Post-process is therefore the most basic and fastest way to back up data to disk.
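
Continuing the sketch above (same hypothetical names), post-process splits the work in two: during the backup window only the cheap compress-and-land step runs, and the expensive hash-and-look-up pass is deferred until the backup is done:

```python
staging = []  # staging area on disk: the extra space post-process requires

def ingest_post_process(block: bytes) -> None:
    """Backup window: just compress and land the block; no dedupe yet."""
    staging.append(zlib.compress(block))

def deduplicate_staged() -> None:
    """After the backup completes: run the dedupe pass off the data path."""
    while staging:
        ingest_inline(zlib.decompress(staging.pop(0)))
```

The trade-off is visible in the code: ingest runs as fast as compression and disk allow, but the staged copy consumes space until the dedupe pass reclaims it.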

So what is the problem with using inline deduplication for the first backup?

One thing to know about inline deduplication is that its speed depends on the deduplication ratio: the higher the ratio, the less data has to be written to disk. Initial backups usually have a low deduplication ratio because most blocks are unique; the ratio improves on subsequent backups, which contain mostly data that has not changed since the last backup. Because the first backup’s deduplication ratio is lower, its backup speed is reduced.
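
Some illustrative arithmetic (the ratios here are hypothetical, chosen only to show the pattern): the unique data an inline engine must actually write is the incoming data divided by the deduplication ratio, so a mostly-unique first backup writes far more than a mostly-duplicate later one:

```python
def tb_written(backup_tb: float, dedupe_ratio: float) -> float:
    """Unique data written to disk = incoming data / deduplication ratio."""
    return backup_tb / dedupe_ratio

print(tb_written(10.0, 1.2))   # first backup, mostly unique blocks: ~8.3 TB
print(tb_written(10.0, 20.0))  # later backup, mostly duplicates: 0.5 TB
```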

This is why we recommend performing the first backup with post-process deduplication, leveraging its fast ingest and avoiding inline’s initial performance impact, and then switching to inline for subsequent backups. Thankfully, we have the flexibility to do that without having to recreate the job and start all over again. That would be a nightmare!
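
In FalconStor VTL that switch is a policy setting on the existing job; as a purely hypothetical sketch, the decision logic amounts to no more than this:

```python
def choose_dedupe_method(completed_backups: int) -> str:
    """First run: post-process for ingest speed; afterwards: inline."""
    return "post-process" if completed_backups == 0 else "inline"
```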

Each of the deduplication processes is a tool to be used based on your needs. When you decide which vendor’s deduplication solution to implement, be sure it includes the full range of options (as FalconStor does); otherwise, your flexibility to craft a solution for your specific business and technical needs will be severely restricted.

Darrell Riddle

Sr. Director, Product Marketing

Website: www.falconstor.com