Now, let’s see how FalconStor deduplication solutions address each of these important decision elements.
Flexibility – FalconStor® Virtual Tape Library (VTL) allows backup administrators to fully align deduplication policies with business goals by letting them choose the method that best meets their specific requirements.
- Inline deduplication has the primary benefit of minimizing storage requirements.
- Post-process deduplication is ideal when the key goal is to back up as quickly as possible or to create off-site tape copies when replication is not available.
- Concurrent deduplication is similar to post-processing but starts as soon as the first set of data has been written and runs concurrently with backup. This is highly suitable for clustered VTL environments. Because replication starts sooner, data can be quickly recovered from a remote site.
- No deduplication can be used on data that does not deduplicate effectively or on data that is exported to physical tape. Examples include image files and pre-compressed or encrypted data. Selectively “turning off” deduplication saves CPU cycles and focuses them on the data that yields the highest ratios and value.
Deduplication is important because industry-standard backup practices inherently create large amounts of duplicate data: the backup system repeatedly copies the same data to secondary storage. By eliminating that duplication, companies can keep more data online longer at significantly lower cost and reduce secondary storage requirements.
Here are elements to consider when looking at a deduplication system:
Some data yields better deduplication results than others, so a deduplication solution should allow backup administrators to align deduplication policies with business goals by letting them choose the method that best meets their specific requirements. These options include the following:
- Inline deduplication has the primary benefit of minimizing storage requirements. It is ideal for small storage configurations or environments where immediate replication is desired.
- Post-process deduplication is ideal when the key goal is to back up as quickly as possible. As its name implies, it occurs after the backup process completes, thus it can be scheduled to run at any time.
- Concurrent deduplication is similar to post-processing, but starts as soon as the first set of records has been written and runs concurrently with backup. Deduplication engines start working immediately, making full use of available CPU resources. This is highly suitable for clustered VTL environments. Replication starts sooner, so data can be quickly recovered from a remote site. One good practice is to use flexible deduplication policies to leverage post-process deduplication for the initial backup (for speed) and then switch to inline for subsequent backups.
- No deduplication is for data that does not deduplicate effectively or is exported to physical tape. Examples include image files and pre-compressed or encrypted data. Selectively turning off deduplication saves CPU cycles and focuses on deduplicating the data that yields the highest ratios and value.
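To see why image files and pre-compressed or encrypted data are poor deduplication candidates, consider a minimal hash-based, fixed-block sketch in Python (a generic illustration, not FalconStor's actual engine): repetitive data such as repeated full backups collapses to a few unique blocks, while compressed or encrypted data is effectively random, so nearly every block is unique.

```python
import hashlib
import os

def dedup_ratio(data: bytes, block_size: int = 4096) -> float:
    """Logical size divided by physical size after fixed-block deduplication."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest() for b in blocks}
    return len(blocks) / len(unique)

# Repetitive data (like repeated full backups) dedupes extremely well...
repetitive = b"same file contents in every backup".ljust(4096, b"x") * 1000
# ...while encrypted or pre-compressed data looks random to the engine.
random_like = os.urandom(4096 * 1000)

print(dedup_ratio(repetitive))   # → 1000.0
print(dedup_ratio(random_like))  # → 1.0
```

This is exactly why turning deduplication off for such data makes sense: the engine burns CPU cycles hashing and indexing blocks it will almost never see again.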
Let’s call it what it is: data deduplication is the waste management system of the storage industry, and as with any other waste management process, you really need your system to be efficient. But to start, as with any epidemic, let’s take a look at the symptoms of data duplication! The biggest duplicate producer in today’s IT world is the traditional backup process. Yes, I’m talking about the antiquated, passé, and totally broken batch backup process that produces far more data than you will ever use, yet less than what you really need.
Now that server virtualization technologies have been proven in many environments, more people are looking at virtualization to improve the efficiency of their primary workloads in the data center. Despite the benefits realized from virtualizing non-mission-critical applications, two questions remain on the minds of IT professionals. One: since traditional backup doesn’t work in virtual environments, how can I effectively protect virtualized workloads? We are talking mission-critical applications here! Two: I know how much I reduced my server infrastructure with virtualization, but I also know how much my storage costs went up as a result. So how can I reduce my storage costs while implementing server virtualization?
In a recent report from ESG on the “Impact of Server Virtualization on Data Protection,” when asked about top server virtualization initiatives for 2010, most respondents placed backup, recovery, and replication right after virtualizing more workloads. It is well understood that server virtualization breaks traditional backup processes. The consolidation of servers and workloads leaves very few resources for backup applications to perform data copies. In virtual server environments, CPU utilization climbs to 60 to 70 percent, up from an average of 20 percent in physical environments, leaving very little headroom for the most demanding job of them all: backup. In addition, network utilization increases to such a degree that very little bandwidth remains for the massive data transfers that backup operations require.
It is hard for a technical person not to ask how something works. How does one deduplication technology differ from another? How big a block size does it use? What form of hashing does it use? These are all great technical questions, but what are the really important questions?
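Block size and chunking strategy do matter, though, and a small Python sketch (a generic illustration, not any vendor's implementation) shows why: with naive fixed-size blocks, inserting a single byte at the front of a file shifts every block boundary, so the engine finds no duplicates at all between the two versions.

```python
import hashlib
import os

def block_hashes(data: bytes, block_size: int) -> set:
    """SHA-256 fingerprint of every fixed-size block in the data."""
    return {hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)}

original = os.urandom(16384)       # 16 KB of sample backup data
shifted = b"\x00" + original       # the same data with one byte inserted up front

a = block_hashes(original, 512)
b = block_hashes(shifted, 512)
print(len(a & b))  # → 0: every boundary shifted, no shared blocks detected
```

This boundary-shift problem is the reason many deduplication engines use variable-size, content-defined chunking rather than fixed blocks, and it is a fair follow-up question to ask any vendor.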
I thought it was a joke when I first read about Sepaton's 40:1 deduplication rate guarantee, and when you look closer... it is a joke! Here are the details if you want to confirm... Sepaton 40 to 1 Guarantee
Almost as difficult as explaining when to use affect versus effect in a sentence is explaining why it is hard to accurately size a deduplication solution for a specific customer's environment, and I spend a lot of time doing it. The simple answer is, "the deduplication ratio will depend on your data and backup methods."
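That answer can be made concrete with a toy simulation (illustrative numbers only, not a sizing tool): the same hash-based engine produces very different ratios depending on how much data changes between backups and how many full backups are retained.

```python
import hashlib
import os
import random

def simulate(full_backups: int, change_rate: float, blocks: int = 1000) -> float:
    """Deduplication ratio for N full backups of a dataset with a given
    per-backup change rate (fraction of blocks rewritten each cycle)."""
    random.seed(0)
    dataset = [os.urandom(4096) for _ in range(blocks)]
    seen, logical = set(), 0
    for _ in range(full_backups):
        for block in dataset:
            logical += 1
            seen.add(hashlib.sha256(block).digest())
        # Mutate a fraction of the blocks before the next full backup.
        for i in random.sample(range(blocks), int(blocks * change_rate)):
            dataset[i] = os.urandom(4096)
    return logical / len(seen)

print(round(simulate(20, 0.01), 1))  # → 16.8 (1% change rate: high ratio)
print(round(simulate(20, 0.20), 1))  # → 4.2  (20% change rate: much lower)
```

Same engine, same retention, wildly different ratios, which is exactly why a flat guaranteed ratio should raise an eyebrow.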
So much for New Year’s resolutions. February flew past without me posting a blog. Cut me some slack, it was a short month. I have also taken on the additional role of VTL Product Manager in order to better direct the development of VTL/SIR. So sending me your comments and concerns is a short path to the top.
Happy New Year! One of my New Year’s resolutions is to post more regularly to this blog, but to keep the posts short and hopefully interesting so you find them worth reading! So here goes…
Although "The Times They Are a-Changin'" is the title of a really old Bob Dylan song, it sure seems prophetic as it pertains to the storage industry today. In just a few short years, we have seen the fall of Fibre Channel and the rise of SAS as the mainstay of SAN storage.