Dedupe will also be an increasingly important feature for cloud providers. These organizations must provide complete, competitive services to an expanding client base while relying upon existing physical and environmental resources. Dedupe will allow these businesses to grow without further taxing their capital budgets, physical space or power and cooling needs.
Dedupe, though, is only as powerful as the solution you select. If dedupe is on your to-do list in the third or fourth quarter, consider these factors before choosing a vendor:
- Scalability – Deploy a deduplication solution that will scale as a single repository. Data continues to explode; don't get caught with islands of storage without the ability to globally share deduplication resources.
- Data availability – Look for clustering and high availability to guarantee immediate data availability and fast recovery from a failure, especially in a multi-site infrastructure that relies on a single data repository at the disaster recovery (DR) site. Even remote office server downtime is costly to a company.
- Integration with backup – Non-disruptive integration into your existing infrastructure is critical to keep backup schedules uninterrupted and indirect costs down. For a large tape-based infrastructure, a virtual tape library (VTL) is important because it eliminates the software license fees and manpower costs of changing backup processes. In this environment, make sure your VTL vendor provides exact emulation of your existing tape library and tape drive types as well as integrated deduplication. For corporations that have already moved to disk-to-disk (D2D) backup, or are planning to make that investment, a LAN-based (CIFS/NFS/OST) appliance with integrated deduplication will fill the need.
- Deployment model – The deduplication solution should be open and able to meet your infrastructure (Fibre Channel or iSCSI SAN, or CIFS/NFS LAN) and architectural needs. It should be available as a fully integrated and configured appliance that provides fast, easy deployment; as a gateway that can use existing DAS or SAN storage; or as pure software able to run on existing servers and storage, in the event your company has a select server and storage vendor list designed to meet specific SLA requirements.
- Direct tape integration – While many companies talk about eliminating tape, it still plays a tertiary role for long-term data archiving. With deduplication, tape production can be reduced from daily or weekly tape creation to monthly, saving considerable cost and time while still providing long-term archiving. Confirm that your vendor can support direct tape out at both the central data center and the DR site without having to go back through the backup media server. There is nothing worse than having to purchase another backup server, software licenses and personnel resources just to create tape at your DR site.
- Evaluate time to recovery – Methods and performance for restoring deduplicated data vary as much as the vendors themselves. Not all vendors rebuild the egg as well as others, and for some, performance depends on the age and size of the data set, so it is critical to test restore and recovery performance on your specific data sets. A good deduplication system will provide direct access to data, read-ahead technology and high-speed disk spanning to mitigate the time required to reconstruct data on restore.
- Virtual deployment – Dedupe can be deployed as a virtual appliance, saving you hardware costs and allowing you to take full advantage of your virtual infrastructure; it's a perfect fit for small or remote offices with existing ESX servers.
- Dedupe ratio – Dedupe ratios are like gas mileage: they vary based on use. Remember that the deduplication ratio depends on the nature of the data, how frequently it changes and how long the data is retained. Don't focus on the theoretical dedupe ratios that vendors publish; your actual ratio is contingent on your unique data factors.
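To see why ratios are so data-dependent, consider a minimal fixed-block deduplication sketch in Python. The block size and SHA-256 fingerprinting here are illustrative assumptions; commercial products typically use variable-size, content-defined chunking, but the principle is the same: identical blocks are stored once, so repetitive data (like repeated full backups) dedupes dramatically while unique data barely dedupes at all.

```python
import hashlib
import os

def dedupe_ratio(data: bytes, block_size: int = 4096) -> float:
    """Split data into fixed-size blocks, store one copy of each unique
    block (identified by its SHA-256 fingerprint), and return the ratio
    of logical size to deduplicated physical size."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest(): b for b in blocks}
    physical = sum(len(b) for b in unique.values())
    return len(data) / physical if physical else 1.0

# Highly repetitive data: 100 identical blocks dedupe to a single block.
repetitive = b"A" * 4096 * 100
print(dedupe_ratio(repetitive))        # 100.0 (a 100:1 ratio)

# Random data: every block is unique, so there is almost nothing to save.
random_data = os.urandom(4096 * 100)
print(dedupe_ratio(random_data))       # ~1.0 (no meaningful reduction)
```

The same chunk index also explains restore behavior: reading deduplicated data back means looking up and reassembling blocks scattered across the repository, which is why restore performance deserves testing on your own data sets.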
Dedupe goes a long way toward reducing data storage costs by making storage much more efficient, which in turn can shrink the overall footprint inside the data center. Because of this, it's not hard to see why the rest of this year will be rich in data deduplication adoption.