The 100-Year Archive and the Data Preservation Explosion—Part One: The Compounding Storage Growth Rate and Long-term Data Preservation Demand a Next-Generation Archive

The technology market is always changing and innovating across the computing, networking, and storage elements, the holy trinity of the Von Neumann architecture. Innovation in one area spurs changes in the other two, as the cycle of innovation progresses with each element alternating between being the leader and the bottleneck within the overall architecture. Today, new requirements are emerging around the storage, retention, and restoration of electronically generated data assets, which demand new approaches and new levels of optimization across all three elements.

To understand the emerging storage challenges, segmentation of the storage market is necessary. Segmenting the market by data access patterns, active versus passive, bifurcates it into two distinct sections: Operational Storage and Long-term Archival Storage. The two have drastically different requirements. Operational Storage is the high-access, high-performance, low-latency, and very dynamic storage that is actively used daily for business operations, as well as for short-term backups and snapshots for quick recovery purposes. Long-term Archival Storage is the cost-efficient, deduplicated, and compressed storage used for information preservation; its data is traditionally infrequently accessed. Both capabilities are essential for businesses; however, the demands and requirements for Long-term Archival Storage are undergoing significant evolution.

IDC’s Datasphere highlights archival data volume, which currently accounts for 60% of the total Datasphere. IDC’s growth estimates for archive data volumes have risen consistently over the last few years, as the implications of new mandates and regulations have become apparent. Archival data volume will compound exponentially as more information falls under lengthy data retention and preservation mandates and, we believe, will quickly eclipse the other categories in the Datasphere.
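To make the compounding effect concrete, the short sketch below projects archive volume under a constant compound annual growth rate. The starting volume (10 ZB) and the 30% CAGR are purely illustrative assumptions, not IDC figures:

```python
def project_volume(initial_zb: float, cagr: float, years: int) -> float:
    """Project archive volume after `years` of compound growth.

    `initial_zb` and `cagr` below are hypothetical values chosen only
    to illustrate compounding, not published market estimates.
    """
    return initial_zb * (1 + cagr) ** years

# A hypothetical 10 ZB archive growing at 30% per year:
for year in (5, 10, 20):
    print(f"Year {year}: {project_volume(10.0, 0.30, year):.0f} ZB")
```

Under these assumed numbers the archive roughly doubles every three years, which is the dynamic that makes monolithic archive architectures cost-prohibitive over long horizons.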

Today’s Data Storage Volumes

As more informational assets become subject to preservation, organizations will have to bear the ever-growing expense of data retention, which will negatively impact profitability. Traditional archive solutions, with their monolithic architectures, cannot scale efficiently to meet the expected data volumes or the new preservation technical requirements. Today’s solutions become cost-prohibitive as preservation volumes grow.

A 100-year preservation lifecycle carries many future implications and demands for long-term archives. Data portability between storage systems is a concern, as storage systems age and are replaced by newer technologies. Assuming a typical system refresh cycle of roughly ten years, data preserved for 100 years would need to be migrated to new systems approximately ten times over its lifecycle. How does an organization maintain and prove data integrity across ten migrations? The Cloud will face similar problems with technology upgrades and multi-Cloud interoperability.

Virtualization could abstract the stored information in a data center or Cloud, making it easier to migrate and reducing the impact of system-level technology changes. However, this implies that the abstraction would need to remain backward compatible for up to 100 years, which is “Infinity Plus” from a development perspective and a rather complicated problem.

Another concern surrounds the application that generated the data. Fifty to one hundred years from today, will the application still be able to process information written 10, 25, 50, or 100 years earlier, as it evolves from Version 1.0 to Version 100 over time? Will the company that developed the application still be in business to provide application access, licensing, and backward compatibility? These and other questions and concerns must be addressed to mitigate the impact of the Long-term Archival data tsunami.
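One common way to prove data integrity across repeated migrations is fixity checking: record a cryptographic digest of each object at ingest, then recompute and compare it after every migration. The sketch below, using SHA-256 and a hypothetical in-memory manifest, is a minimal illustration of the idea, not a complete preservation system:

```python
import hashlib

def fixity_digest(data: bytes) -> str:
    """SHA-256 digest recorded at ingest and re-checked after each migration."""
    return hashlib.sha256(data).hexdigest()

def verify_after_migration(data: bytes, recorded_digest: str) -> bool:
    """Recompute the digest on the target system and compare to the record."""
    return fixity_digest(data) == recorded_digest

# Ingest: record the digest alongside the object (manifest is hypothetical).
original = b"archived record contents"
manifest = {"object-001": fixity_digest(original)}

# After each of the ~10 migrations, re-verify the bytes read back from
# the new system against the ingest-time manifest.
migrated = original
assert verify_after_migration(migrated, manifest["object-001"])
```

In practice the manifest itself must also be preserved and protected for the full retention period, since the proof of integrity is only as trustworthy as the stored digests.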

This blog series will focus on the changes in regulations and mandates, extended retention periods, data integrity concerns, expense management, and other technical challenges facing the Long-term Archival Storage segment. Fundamentally, we believe a new approach is needed to meet these Long-term Archival requirements.