The 100-year Archive and the Data Preservation Explosion—Part Three: The Impending “Data-Centric” Storage Revolution

If we revisit the Von Neumann architecture from Part One, it has compute, network, and storage. Put simply, a compute engine and its more complex sibling, the application, execute instructions and deliver answers. Once finished with a calculation, the compute engine resets to zero and prepares for the next one. Compute is a transient function and does not remember its previous state. Networking is similar in most instances: it takes data, encapsulates it, applies a destination address, and sends the packet into the network. If the packet is never received, it is merely retransmitted. After sending, the router resets.

The storage element is not stateless…it is stateful. The bits sent to the storage system are recorded and kept. The information might need to be stored for minutes, hours, days, months, years, decades, and in some cases, centuries. I like to make the analogy to one’s bank account. One makes a deposit, gets monthly statements, and expects to be able to walk in and make a withdrawal at any point in the future. In banking and storage, there is a long-term fiduciary duty to the customer to be a good steward of the money or data at all times, a responsibility that can reach far into the future. With all of their benefits, containers look like a game-changing technology for storage…if only they were not stateless.

A Data-Centric Approach to Storage – A Revolution in Storage

Let’s assume the stateless challenge can be overcome and look at how container technology could be applied to storage. In working through this exercise with Marc Staimer of Dragonslayer, he immediately understood the far-reaching implications of disaggregating data from the operating system and hardware and creating a data-centric container. “It is all about the data. It is a genuine ‘Data-Centric’ approach,” said Marc. It could be one of the real game-changers in storage in many, many years. Let’s take a look.

  • Portability is a core feature of containers. If a container were filled with data, it could be transferred between on-premises systems, as well as across public and private clouds, with no impact on the data ensconced within it. This data-centric capability would give customers numerous options to move data to the most cost-effective solution on the market, especially as retention periods extend to 10, 25, 50, and in many cases, 100 years. In a 100-year retention scenario, an organization would cycle through six to eight traditional storage systems, with six to eight data transfers between systems, and would have to maintain data integrity through every transfer. With containers, the container could be moved in its entirety with no impact on or change to its data contents.
  • Futureproofed for years to come. Since containers are based on an open-source standard, the application programming interface (API) between the container and the rest of the world is continually being updated. New underlying operating systems and hardware advancements are integrated into the existing API and, from a container perspective, work seamlessly. Traditionally, futureproofing in storage technology has been nonexistent. I call it the “Zip Drive” problem: it was amazing technology at the time, but today I have stacks of Zip disks and have lost the Zip drive, which renders all the data on those disks inaccessible. There are numerous similar examples of enterprise storage systems that have reached end of life (EOL) with mission-critical data still stored on them. From a “data-centric” perspective, having access to the data in 10, 25, 50, or 100 years is a critical consideration as more information becomes subject to retention mandates. Betting on an existing storage vendor being around in 10 years is a challenge, and betting that they will be around in 100 is insane given the turnover in the tech market. Regardless of what happens to their storage or cloud vendor, organizations are still mandated to retain access to data and reproduce it on demand, in several retention instances for up to 100 years into the future.
  • Variable payload, or variable container size, is another exciting capability. Unlike physical media standards such as LTO, which have a fixed capacity due to physical limitations, container size is theoretically unlimited. This flexibility allows very large containers that would offer hyper-deduplication and compression for cost-efficient long-term archival storage. Alternatively, where there are accessibility constraints, as with an eDiscovery corpus, the container could be smaller for accelerated retrieval.
  • Passive and active container features are another benefit, with many implications and a broad range of new features and functionality. Storing mandated data in a container with a powerful immutable retention capability, plus secure delete upon retention expiry, would deliver significant cost savings for an organization attempting to manage the data explosion caused by new retention mandates. Another significant concern, especially when storing data in a publicly accessible cloud, is data integrity. With both active and passive capabilities, containers could maintain their own health records. A scheduled data integrity health check, with a journaled log of data health, access logs, attempted-access logs, alerts for cyber threats, and checksum or hash verification, would confirm that the data stored 10, 25, 50, or 100 years ago is still the data in the container. For compliance, regulatory, and privacy audits, and for eDiscovery chain-of-custody inquiries, this kind of data integrity functionality will likely become a must-have capability of all future storage products and services.
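
To make the last bullet more concrete, here is a minimal Python sketch of a container that keeps its own health records. Everything in it is an assumption made for illustration: the DataContainer class, its field names, and the journal format are hypothetical and do not correspond to any existing container standard or product. It records a SHA-256 digest per object at ingest, re-verifies those digests on each health check, journals the result, and refuses deletion until the retention period has expired.

```python
import hashlib
import time
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class DataContainer:
    """Hypothetical data-centric container: payload plus its own health records."""
    payload: dict                     # object name -> stored bytes
    retain_until: float               # retention expiry (Unix timestamp)
    manifest: dict = field(default_factory=dict)   # object name -> digest at ingest
    journal: list = field(default_factory=list)    # append-only health and access log

    def __post_init__(self):
        # Record a digest for every object when the container is created.
        if not self.manifest:
            self.manifest = {name: sha256_hex(data) for name, data in self.payload.items()}

    def health_check(self, now=None):
        """Re-hash every object, compare against the manifest, journal the result."""
        now = time.time() if now is None else now
        corrupted = sorted(
            name for name, digest in self.manifest.items()
            if name not in self.payload or sha256_hex(self.payload[name]) != digest
        )
        entry = {"time": now, "event": "health_check",
                 "ok": not corrupted, "corrupted": corrupted}
        self.journal.append(entry)
        return entry

    def delete(self, now=None):
        """Allow deletion only after retention expiry; journal every attempt."""
        now = time.time() if now is None else now
        if now < self.retain_until:
            self.journal.append({"time": now, "event": "delete_refused"})
            return False
        # Assumption: clearing the dicts stands in for a real secure erase.
        self.payload.clear()
        self.manifest.clear()
        self.journal.append({"time": now, "event": "deleted"})
        return True


if __name__ == "__main__":
    container = DataContainer(payload={"contract.pdf": b"signed copy"},
                              retain_until=time.time() + 3600)
    print(container.health_check())   # digests still match the manifest
    print(container.delete())         # refused: retention has not expired
    print(container.journal)          # both events are in the journal
```

In a real design the journal would itself need tamper protection, and the scheduled check would run wherever the container currently lives. The sketch only shows that the integrity evidence can travel with the data rather than with the storage system.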

So, containers can solve the data portability problem that arises when end-of-life systems are retired, a problem that has always plagued the storage industry in long-term storage, backup, and archival. The data-centric approach would refocus the industry away from the speeds and feeds of its systems and toward data confidentiality, integrity, accessibility, availability, and nonrepudiation in this new world of never-ending storage endpoints and rising cyber threats.

Containers have the power to revolutionize the storage industry if the Stateless issue can be solved and a Persistent Virtual Storage Container created.