Falconstor Community

Data Deduplication – the Storage Industry’s Waste Management System

Let’s call it what it is: data deduplication is the waste management system of the storage industry, and as with any waste management process, you need your system to be very efficient. But first, as you would with any pandemic, let’s look at the symptoms of data duplication! The biggest producer of duplicates in today’s IT world is the traditional backup process. Yes, I’m talking about the antiquated, passé, and totally broken batch backup process that produces more data than you could ever use and far less than what you really need.

Traditional backup, without naming any particular product, has a few challenges in our modern IT environments. The first is its addiction to copying the same data over and over again, leaving behind a very long trail of tape that is hard to manage, or even to make sense of, even with the best catalogues.
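To put a rough number on that "copying data over and over again" habit, here is a minimal sketch, not any vendor's implementation. It assumes fixed-size 4 KB chunks identified by SHA-256 hashes, and a made-up dataset of 100 chunks in which one chunk is overwritten in place before each of five nightly full backups. The names `CHUNK_SIZE`, `make_dataset`, and `dedup_totals` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products often use variable-size chunks


def make_dataset(num_chunks: int) -> bytearray:
    """Build synthetic data in which every chunk has distinct content."""
    data = bytearray()
    for i in range(num_chunks):
        block = hashlib.sha256(i.to_bytes(8, "big")).digest()  # 32 bytes
        data += block * (CHUNK_SIZE // len(block))
    return data


def dedup_totals(backups):
    """Return (logical_bytes, stored_bytes) after fixed-size chunk deduplication."""
    seen = set()
    logical = stored = 0
    for backup in backups:
        for offset in range(0, len(backup), CHUNK_SIZE):
            chunk = backup[offset:offset + CHUNK_SIZE]
            logical += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:  # only chunks never seen before cost storage
                seen.add(digest)
                stored += len(chunk)
    return logical, stored


# Simulate five nightly full backups; one chunk is overwritten in place each night.
dataset = make_dataset(100)
backups = []
for night in range(5):
    start = night * CHUNK_SIZE
    new_block = hashlib.sha256(b"night-%d" % night).digest()
    dataset[start:start + CHUNK_SIZE] = new_block * (CHUNK_SIZE // len(new_block))
    backups.append(bytes(dataset))

logical, stored = dedup_totals(backups)
print(f"written by backups: {logical} bytes")
print(f"stored after dedup: {stored} bytes")
print(f"reduction ratio:    {logical / stored:.2f}:1")
```

In this toy setup almost everything after the first night is a repeat, so only the changed chunks add to what is stored. The in-place overwrite is a deliberate simplification: an insertion that shifts the data would defeat fixed-size chunking, which is one reason real solutions differ so much from each other.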

Second, its thirst for CPU and network resources is the reason we have a “backup window.” It can very simply bring all other systems to a halt while the backup runs. An old friend used to say: if you want to know whether you have a solid network and IT infrastructure, start your backups and watch what happens.

Given these symptoms, data deduplication looks like a good remedy, but not all deduplication solutions are created equal. The choice should be based on the type of IT operations that you have. The performance of the solution will directly affect your backup window. Common sense tells you that the smaller the backup window, the higher the performance you need, but there is a bit more to it than that. Your deduplication solution should also support different types and techniques of backup operations. One example is how effective the solution is at deduplicating multiplexed backup streams!

Also, since deduplication itself is a CPU-intensive operation, you should check whether you can exclude some data types from deduplication and apply the process only where it matters. It doesn’t make sense to try to deduplicate encrypted data, for example, as it’s pretty much all unique data. You may also want to exclude other data streams that are not very friendly to the deduplication process, such as archived medical imagery or microscope and telescope data.
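The point about encrypted data can be illustrated with a small sketch. It assumes each backup run is encrypted with a fresh nonce, and it uses a toy SHA-256 keystream as a stand-in for a real cipher (it is for illustration only and is not secure). The names `toy_encrypt` and `unique_fraction`, the key, and the nonces are all hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size


def toy_encrypt(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Toy stand-in, NOT real encryption."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = data[offset:offset + 32]
        keystream = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out += bytes(a ^ b for a, b in zip(block, keystream))
    return bytes(out)


def unique_fraction(streams) -> float:
    """Fraction of chunks across all streams that are unique (1.0 means nothing dedups)."""
    seen = set()
    total = 0
    for stream in streams:
        for offset in range(0, len(stream), CHUNK_SIZE):
            total += 1
            seen.add(hashlib.sha256(stream[offset:offset + CHUNK_SIZE]).digest())
    return len(seen) / total


# The same 50-chunk dataset backed up three times.
plaintext = b"".join(hashlib.sha256(bytes([i])).digest() * 128 for i in range(50))
plain_backups = [plaintext] * 3

# The same three backups, each encrypted with its own nonce before reaching the target.
key = b"hypothetical-key"
encrypted_backups = [
    toy_encrypt(plaintext, key, b"run-%d" % run) for run in range(3)
]

plain_fraction = unique_fraction(plain_backups)
encrypted_fraction = unique_fraction(encrypted_backups)
print(f"unique chunks, plain:     {plain_fraction:.0%}")
print(f"unique chunks, encrypted: {encrypted_fraction:.0%}")
```

Under these assumptions the plain backups share two thirds of their chunks, while every encrypted chunk is new to the store, so the CPU spent hashing them buys nothing. That is the case for an exclusion policy.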

And since we are talking about backups: although data deduplication allows you to retain data longer on disk, many organizations are still required to back up data to tape. In that case, you’d want a solution that streamlines the process and integrates seamlessly with your tape infrastructure. Having two separate infrastructures and performing backups twice, once to the deduplication target and once to a tape library, defeats the whole purpose of deduplication.

And to get back to the waste management analogy: in the greatest borough of all, Manhattan, on curbside collection days you see only garbage bags and no containers, and there is definitely a logistical reason for that. So, as with any waste management system, when choosing your data deduplication solution, try to align it with your business and IT goals and operations.

Check out FalconStor’s next data deduplication announcement on April 26th. You may find your solution there!

Fadi Albatal
