TRENDS IN HYPER-SCALE STORAGE FOR GROWING DATA WORKLOADS
By Farid Yavari
Storage has finally become an interesting field, full of innovation and change, addressing growing new requirements for storage flexibility, density, and performance. The falling price of flash, the introduction of various flavors of storage-class memory, and the increasing appetite for commoditizing data center infrastructure have all helped fuel innovation in how data is stored and accessed.
Companies face the challenge of storing their ever-growing data efficiently, at a cost point that is palatable to CTOs and CFOs, while maintaining the levels of performance and SLAs needed to provide storage services to end users and applications. At the same time, internal IT organizations face the challenge of competing with the flexibility and price points offered externally through public clouds.
Two new trends have emerged in architecting solutions for next-generation “hyper-scale” data centers and storage infrastructures that must grow on demand to meet compute, memory, and storage requirements. These requirements include on-demand provisioning, instant capacity management, and the flexibility to scale each individual component independently, driving cost efficiency and directly improving TCO.
First, as outlined in the table below, there are legacy SAN environments running transactional OLTP workloads, primarily based on Fibre Channel and NFS, with high-performance SLA targets (greater than ~500K IOPS, less than ~5ms response time to the application). This environment is built on storage appliances and SAN installations with complete HA capabilities that provide data protection and service resiliency to the applications. The growth rate of the traditional SAN environment is relatively low compared to other storage infrastructures, and its impact on revenue is high enough to justify paying the premium for brand-name storage technologies that come with full HA and data protection capabilities. Understandably, many companies are unwilling to try groundbreaking technologies within an OLTP infrastructure, as stability, security, and availability are the primary goals for this environment.
The second architecture, described in the three columns to the right of the table, and arguably the fastest growing segment of every data center, is the scale-out environment running NoSQL, cloud, and big data workloads. From the storage perspective, these environments usually run in a direct-attached storage (DAS) or disaggregated storage model based on protocols such as iSCSI, PCIe, or NVMe. The scale of the storage infrastructure, especially for big data analytics, can reach hundreds of petabytes, which makes these environments extremely TCO driven. Many applications running in them have built-in resiliency, anticipate hardware failures, and can self-heal at the application layer. Document and key-value stores, as well as analytics applications, feature server- and rack-aware, replication-based data resiliency to guard data against hardware failures. When data protection and self-healing are handled at the app layer, the need to build HA features into the storage layer is eliminated, which opens the door to consumer-grade, commodity hardware that can fail without impact to service availability.
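To make the rack-aware replication idea concrete, here is a minimal sketch of replica placement in Python. It is purely illustrative (the function name and data shapes are assumptions, not any real system's API): the placement loop spreads each copy of the data across distinct racks before it ever reuses a rack, so a single rack failure cannot take out every replica.

```python
# Hypothetical sketch of rack-aware replica placement, the idea behind
# application-layer resiliency in scale-out NoSQL and analytics stores.
# Names and data shapes are illustrative, not a real system's API.

def place_replicas(nodes, replication_factor=3):
    """Pick replica nodes for one piece of data.

    nodes: list of (node_name, rack_name) tuples.
    Strategy: place one replica per rack before reusing any rack,
    so losing a whole rack never loses every copy.
    """
    # Group available nodes by the rack they live in.
    by_rack = {}
    for node, rack in nodes:
        by_rack.setdefault(rack, []).append(node)

    rack_lists = list(by_rack.values())
    replicas = []
    depth = 0  # how many nodes we have already used from each rack
    max_depth = max(len(members) for members in rack_lists)
    while len(replicas) < replication_factor and depth < max_depth:
        # One pass per depth level: visit every rack once.
        for members in rack_lists:
            if depth < len(members) and len(replicas) < replication_factor:
                replicas.append(members[depth])
        depth += 1
    return replicas

# Four nodes across three racks: the first three replicas land on
# three different racks; only a fourth copy reuses a rack.
cluster = [("n1", "rackA"), ("n2", "rackA"), ("n3", "rackB"), ("n4", "rackC")]
print(place_replicas(cluster, 3))  # → ['n1', 'n3', 'n4']
```

Real systems layer more on top of this (consistent hashing, load balancing, datacenter awareness), but the core rule is the same: the application, not the storage appliance, decides where copies live, which is what lets it tolerate commodity hardware failing underneath it.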
In future blogs, we’ll take a closer look at these trends in hyper-scale data centers, and how to achieve desirable TCO as well as resiliency and performance.