DATA ANALYTICS AND WHY IT MATTERS IN SDS PLATFORMS
By Pete McCallum
There was a time, not so long ago, when a storage administrator actually had to know something in order to do their job. There was no automation, auto-tiering, virtualization, API sets, QoS, or analytics; all we had were metaLUNs, concatenated metaLUNs, extent management, and RAID sets. We used to sit at the same lunch table as the UNIX guys who had to write code to open a text editor, and who had never EVER used a mouse.
Yes, it used to be that when something broke or started to slow down, we would fix it by actually going into a console or shell and typing some magic commands to save the day.
These days, managing storage is a very different proposition. We have such awesome capabilities emerging from software-defined platform stacks, such as IO-path QoS, hypervisor-and-cloud-agnostic protocols, and scale-out metadata - just to name a few. With all of these advancements, one would tend to think the days of the storage administrator have gone away. And I would tend to agree to some extent.
No longer is the storage administrator really concerned with finite volume management and provisioning. Today, storage performance almost manages itself. Thin provisioning is less about capacity optimization and more about data mobility. And there is almost as much data about our data as there is data.
In some ways we have converted storage administration into air-traffic control: finding optimal data paths and managing congestion of IO as things scale beyond reason. This is where analytics really comes into play.
In all aspects of IT, administration is taking a back seat to business integration, where knowing what has happened (reporting), plus what is happening (monitoring), starts to generate knowledge (analytics) about what is happening in the business. When we add predictive analytics, we add the ability to make not only technology decisions but, ostensibly, business decisions too, which can make a huge difference in meeting market demands and avoiding pitfalls. This moves IT (as well as storage) out of reactive mode and into proactive mode, which is the number one benefit of predictive analytics.
Let’s see how this applies to a business-IT arrangement through a real-world example: month-end close-out of books in a large company. In the past, an IT department would provide infrastructure that met the worst-case scenario of performance impact. So, despite having a 3,000 IOPS requirement for 27 days of the month, the 35,000 IOPS month-end churn (for about eight hours) pushed for an all-flash array at 4x the cost of spinning disk. Because the volumes require a tremendous amount of “swing space” as journals fill and flush, reporting is run against copies of data, and Hadoop clusters scale up to analyze the data sets, almost a PB of storage capacity is required to support 200TB of actual production data. All of this is thick-provisioned across two datacenters for redundancy and performance in case of a problem or emergency.
Most of this data would be made available to the business through reporting and monitoring, which would allow an IT architect to decide on a storage and server platform that would handle this kind of load. Manual or semi-manual analytics of many different consoles and systems would merge the data (perhaps into a spreadsheet) where we would find that (apologies if my math is off a little):
- 30% of all data load happens on one day of the month.
- 90% of the “storage sprawl” is used for something other than production data. Of the remaining 10% used for production data, perhaps 2% of that space actually requires the performance.
- Cost/TB/IOPS is skewed to fit 10% of the capacity (or 0.2% for real!), and 30% of the total load, at 8-20x the cost.
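For anyone who wants to redo the spreadsheet step, here is a minimal Python sketch of that arithmetic using the raw figures from the month-end example. The 1,000 TB total (standing in for "almost a PB"), the 30-day month, and the function name are assumptions for illustration only.

```python
# Figures taken from the month-end example. The 1,000 TB total and the
# 30-day month are assumptions standing in for "almost a PB" and "a month".
TOTAL_CAPACITY_TB = 1000
PRODUCTION_TB = 200
BASELINE_IOPS = 3_000
PEAK_IOPS = 35_000
PEAK_HOURS = 8
HOURS_IN_MONTH = 30 * 24


def sprawl_summary():
    """Return capacity and load ratios for the month-end scenario."""
    production_share = PRODUCTION_TB / TOTAL_CAPACITY_TB
    return {
        # Fraction of provisioned capacity holding production data.
        "production_share": production_share,
        # Fraction holding copies, swing space, and analysis sets.
        "sprawl_share": 1 - production_share,
        # How much harder month-end pushes the array than a normal day.
        "peak_to_baseline": PEAK_IOPS / BASELINE_IOPS,
        # Fraction of the month during which the peak actually occurs.
        "peak_time_share": PEAK_HOURS / HOURS_IN_MONTH,
    }


if __name__ == "__main__":
    for name, value in sprawl_summary().items():
        print(f"{name}: {value:.3f}")
```

With these inputs, about 80% of capacity is non-production, the peak is close to 12 times the baseline, and it lasts for roughly 1% of the month. That is the same neighborhood as the rounded percentages above, and it makes the same point: the whole platform is sized and priced for a sliver of the data and a sliver of the time.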
Far more correlations can be drawn from this data, and many of them are directly actionable and meaningful to the business. For example, one could:
- Right-size the performance load to the actual requirements of the dataset, rather than incurring tremendous expense to meet the worst-case scenario.
- Manually shift storage performance tiers prior to month-end (or automatically if the storage platform allows).
- Thin provision or use non-volatile, mountable snapshots for handling data mining and “copy data” to reduce storage sprawl.
All of these are actionable through a good virtualization platform (like FreeStor) and analytics on platform and application metadata. If we add a truly heterogeneous SDS platform (like FreeStor) that can operate across different performance and platform tiers of storage, we start gaining a breadth of insight into the infrastructure that surpasses anything an admin could reasonably wrap their day around. However, because of the sheer volume and complexity of capabilities, automation and foresight MUST be imbued into the control plane.
This is where intelligent predictive analytics comes in: it’s not about seeing into the future as much as it is correlating events from the past with current events to adjust capabilities in the present. If I know all the capabilities of my targets (performance, capacity, cache, storage layout for read/write optimization, etc.) and I know the trends in requirements from the source applications, AND I know the capabilities and features of the SDS platform (like FreeStor), then I should be able to correlate events and occurrences into policy-based actions that monitor security, performance, protection, and cost SLAs against actual point-in-time events in the system. I can then recommend or automate adjustments to IO paths, storage targets, DR strategies, and new operational requests through intelligent predictive analytics.
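To make the idea of correlating past and current events a little more concrete, here is a minimal Python sketch of a policy that picks a storage tier from historical and live IOPS. Everything in it is hypothetical: the tier names, the IOPS ceilings, the 20% headroom, and the function names are illustrative assumptions, not FreeStor's API or any vendor's actual policy engine.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Tier:
    name: str
    max_iops: int          # assumed sustained IOPS ceiling for the tier
    relative_cost: float   # cost per TB relative to the cheapest tier


# Hypothetical tiers; the 4x cost ratio echoes the month-end example,
# while the IOPS ceilings are made up for illustration.
TIERS = [
    Tier("spinning-disk", max_iops=5_000, relative_cost=1.0),
    Tier("all-flash", max_iops=50_000, relative_cost=4.0),
]


def predicted_iops(history, day_of_month):
    """Average the IOPS seen on this day of the month in past months."""
    samples = [iops for day, iops in history if day == day_of_month]
    return mean(samples) if samples else None


def recommend_tier(history, day_of_month, current_iops, headroom=1.2):
    """Pick the cheapest tier that covers both past and present demand."""
    expected = predicted_iops(history, day_of_month)
    # Correlate the past trend with the live reading and add headroom.
    demand = max(current_iops, expected or 0) * headroom
    for tier in sorted(TIERS, key=lambda t: t.relative_cost):
        if tier.max_iops >= demand:
            return tier.name
    # Nothing fits: fall back to the fastest tier available.
    return max(TIERS, key=lambda t: t.max_iops).name


if __name__ == "__main__":
    # (day of month, observed IOPS) pairs from two earlier months.
    history = [(15, 3_000), (30, 35_000), (15, 3_100), (30, 34_000)]
    print(recommend_tier(history, 15, current_iops=2_900))
    print(recommend_tier(history, 30, current_iops=3_000))
```

On a mid-month day the policy stays on the cheaper tier. On the day that has historically carried the month-end churn it recommends flash, even though the live reading is still quiet. A real control plane would weigh far more signals (capacity, cache, protection, and cost SLAs), but the shape is the same: history plus the current state, turned into a policy action.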
All this boils down to operational efficiencies for the business, cost savings in key infrastructure purchasing decisions, better SLA management for business workloads, faster conversion of data into information, and faster time-to-value. I know these are big phrases and promises, but we see it every day. No longer is it enough to be an administrator or an infrastructure architect. No longer is it enough for the CIO to manage a budget and hope systems don’t go down. These days, every aspect of IT is part of the business revenue stream and is a partner in making businesses profitable and efficient. Predictive analytics is a key enabler for this new requirement.