Data Deduplication
Now, let’s see how FalconStor deduplication solutions address each of these important decision elements.
Flexibility – FalconStor® Virtual Tape Library (VTL) allows backup administrators to fully align deduplication policies with business goals by letting them choose the method that best meets their specific requirements.
- Inline deduplication has the primary benefit of minimizing storage requirements.
- Post-process deduplication is ideal when the key goal is to back up as quickly as possible or to create off-site tape copies when replication is not available.
- Concurrent deduplication is similar to post-processing but starts as soon as the first set of data has been written and runs concurrently with backup. This is highly suitable for clustered VTL environments. Because replication starts sooner, data can be quickly recovered from a remote site.
- No deduplication can be used on data that does not deduplicate effectively or on data that is exported to physical tape. Examples include image files and pre-compressed or encrypted data. Selectively “turning off” deduplication saves processing cycles and focuses effort on the data that yields the highest ratios and value; a small policy sketch follows this list.
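To make the idea of policy alignment concrete, here is a minimal Python sketch of a policy table that maps backup job types to a deduplication mode. The job categories and the names used here (DedupMode, POLICY, mode_for) are illustrative assumptions for this post, not FalconStor VTL configuration syntax or APIs.

```python
# Hypothetical sketch of policy-based deduplication selection.
# Mode names mirror the options described above; the data categories and
# mappings are illustrative assumptions, not FalconStor product settings.

from enum import Enum

class DedupMode(Enum):
    INLINE = "inline"            # minimize storage; dedupe as data is written
    POST_PROCESS = "post"        # fastest ingest; dedupe after backup completes
    CONCURRENT = "concurrent"    # dedupe alongside backup; replication starts sooner
    NONE = "none"                # skip dedup for data that will not reduce well

# Example policy table: which mode to apply per backup job type.
POLICY = {
    "databases": DedupMode.INLINE,           # highly redundant data
    "file_servers": DedupMode.CONCURRENT,    # large jobs, early replication desired
    "initial_full": DedupMode.POST_PROCESS,  # ingest speed matters most
    "encrypted_archives": DedupMode.NONE,    # pre-compressed/encrypted data
    "image_files": DedupMode.NONE,
}

def mode_for(job_type: str) -> DedupMode:
    """Return the deduplication mode for a backup job, defaulting to inline."""
    return POLICY.get(job_type, DedupMode.INLINE)

if __name__ == "__main__":
    for job in ("databases", "encrypted_archives", "unknown_job"):
        print(f"{job}: {mode_for(job).value}")
```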
Deduplication is important because industry-standard backup practices inherently create large amounts of duplicate data: the backup system copies the same data to secondary storage over and over again. By eliminating that duplication, companies can keep more data online for longer, reduce secondary storage requirements, and significantly lower costs.
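As a rough illustration of why this matters, here is a back-of-the-envelope calculation assuming twelve retained weekly full backups of a 10 TB data set with a 2% weekly change rate. The figures are assumptions for the example, not measurements.

```python
# Back-of-the-envelope arithmetic with assumed (illustrative) figures.
full_size_tb = 10.0      # size of one full backup
retained_fulls = 12      # number of fulls kept on secondary storage
change_rate = 0.02       # fraction of data that changes between fulls

without_dedup = full_size_tb * retained_fulls                                  # every full stored whole
with_dedup = full_size_tb + full_size_tb * change_rate * (retained_fulls - 1)  # first full + changes only

print(f"Without deduplication: {without_dedup:.1f} TB")             # 120.0 TB
print(f"With deduplication:    {with_dedup:.1f} TB")                # 12.2 TB
print(f"Reduction ratio:       {without_dedup / with_dedup:.1f}:1")  # ~9.8:1
```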
Here are elements to consider when looking at a deduplication system:
Some data yields better deduplication results than others, so a deduplication solution should allow backup administrators to align deduplication policies with business goals by letting them choose the method that best meets their specific requirements. These options include the following:
- Inline deduplication has the primary benefit of minimizing storage requirements. It is ideal for small storage configurations or environments where immediate replication is desired.
- Post-process deduplication is ideal when the key goal is to back up as quickly as possible. As its name implies, it occurs after the backup process completes, thus it can be scheduled to run at any time.
- Concurrent deduplication is similar to post-processing, but starts as soon as the first set of records has been written and runs concurrently with backup. Deduplication engines start working immediately, making full use of available CPU resources. This is highly suitable for clustered VTL environments. Replication starts sooner, so data can be quickly recovered from a remote site. One good practice is to use flexible deduplication policies to leverage post-process deduplication for the initial backup (for speed) and then switch to inline for subsequent backups. A minimal sketch of the inline versus post-process write path follows this list.
- No deduplication is for data that does not deduplicate effectively or is exported to physical tape. Examples include image files and pre-compressed or encrypted data. Selectively turning off deduplication saves CPU cycles and focuses on deduplicating the data that yields the highest ratios and value.
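For readers who want a feel for the mechanical difference between inline and post-process modes, here is a toy Python sketch built on a hash-based chunk store. The chunk size, class names, and method names are assumptions for illustration only; this is not how FalconStor VTL is implemented.

```python
# Toy illustration of inline vs. post-process deduplication timing.
# All names and the fixed chunk size are assumptions for this sketch.

import hashlib

CHUNK = 4096  # fixed-size chunks for simplicity; real systems often use variable chunking

class ChunkStore:
    def __init__(self):
        self.chunks = {}   # hash -> chunk bytes (each unique chunk stored once)
        self.staging = []  # raw backups awaiting post-process deduplication

    def _dedupe(self, data: bytes) -> list[str]:
        """Split data into chunks, store only new chunks, return chunk references."""
        refs = []
        for i in range(0, len(data), CHUNK):
            piece = data[i:i + CHUNK]
            digest = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(digest, piece)  # store only chunks not seen before
            refs.append(digest)
        return refs

    def write_inline(self, data: bytes) -> list[str]:
        # Inline: dedupe on the write path, so only unique chunks ever land on disk.
        return self._dedupe(data)

    def write_post_process(self, data: bytes) -> None:
        # Post-process: land the backup at full speed, dedupe later.
        self.staging.append(data)

    def run_post_process(self) -> list[list[str]]:
        # Later pass: dedupe the staged backups and release the raw copies.
        refs = [self._dedupe(data) for data in self.staging]
        self.staging.clear()
        return refs

if __name__ == "__main__":
    store = ChunkStore()
    payload = b"A" * CHUNK * 3  # three identical chunks
    refs = store.write_inline(payload)
    print(len(refs), "chunk references,", len(store.chunks), "unique chunk stored")
```

In the inline path only unique chunks are ever written, while the post-process path lands the full backup first and reclaims duplicates afterward, which is exactly the speed-versus-capacity trade-off described above.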
The Reality of Data Domain Claims
When I read the announcement "EMC Transforms Backup and Recovery Landscape…", I was surprised to find out that their new DD990 was "over 6x faster than their nearest competitor." Compared against FalconStor's enterprise offerings currently available, that would mean the DD990 could ingest data at a rate of over 168 TB per hour. Hmm, let's consider the facts. FalconStor can ingest a sustained 28 TB per hour with our inline global deduplication. So I ventured to the EMC website and found that they actually ingest data at only 31 TB per hour inline (peak). FalconStor also offers the option to use policy-based post-processing for an ingest rate of 40 TB per hour, plus the flexibility to switch between inline and post-process deduplication based on the particular needs of the business, because one size does not fit all. Looking closer at EMC's posted numbers, you will find that they reach only 15 TB per hour of inline ingest without BOOST, and BOOST requires the customer to host the deduplication processing on their backup servers, which creates hidden costs and complexity.
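The arithmetic behind that comparison is easy to check using only the rates quoted above (all figures are as stated in this post):

```python
# Sanity check of the "6x faster" claim using the figures quoted above.
falconstor_inline_tb_per_hr = 28   # sustained inline rate, per the post
emc_claim_multiplier = 6           # "over 6x faster than their nearest competitor"

implied_dd990_rate = falconstor_inline_tb_per_hr * emc_claim_multiplier
print(f"Implied DD990 ingest if the claim held: over {implied_dd990_rate} TB per hour")  # 168
print("Posted DD990 inline ingest: 31 TB per hour peak (15 TB per hour without BOOST)")
```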
We could detail the advantages of the FalconStor products vs. EMC, such as flexible deduplication, high availability, simplified licensing, and many other features, but common sense should put EMC's statement into the 'too good to be true' category. It looks like FalconStor’s position as having the fastest sustained deduplication speeds in the industry still stands: 28 TB per hour with inline deduplication and 40 TB per hour with post-process deduplication.
Let’s call it what it is: data deduplication is the waste management system of the storage industry, and just as with any other waste management process, you need that system to be very efficient. But to start, as with any epidemic, let’s take a look at the symptoms of data duplication! The biggest duplicate producer in today’s IT world is the traditional backup process. Yes, I’m talking about the antiquated, passé, and totally broken batch backup process that produces far more data than you will ever use, yet still less than what you really need.