
The BeFree Blog

As an early pioneer and industry leader in innovative, software-defined storage solutions, we often have thoughts and expertise we would like to share. Here at FalconStor, we strive to provide IT organizations and customers with solutions that give them the flexibility to BE FREE. Our newest platform, FreeStor, is all about delivering the freedom and flexibility to manage storage sprawl and truly unify heterogeneous storage infrastructures. We also like to offer thought-provoking and alternative views on storage challenges, infrastructure, and the industry itself. Check back often for our latest thoughts, and BE FREE to share your own thoughts and comments. After all, ideas spark other ideas, and community discussion shapes cultures. Let’s share and learn together. | Sincerely – Gary Quinn – CEO

 

By Farid Yavari – Vice President, Technology - FalconStor
09 Feb 2016

Today, in most data centers, cloud, NoSQL (no Structured Query Language), and analytics infrastructures are largely deployed on a direct-attached storage (DAS) architecture, and these are generally Total Cost of Ownership (TCO)-driven deployments.

The DAS approach binds the compute and storage resources together, preventing independent scaling and tech-refresh cycles. The converged DAS model works very well at smaller scale, but as the infrastructure grows to a substantial size, wasted compute or storage can greatly affect the TCO of the environment. Since the DAS model is constrained by the available slots in a server, scale is limited and often quickly outgrown. In some compute-heavy environments, there may be enough DAS allocated to the servers, but the workload needs more Central Processing Units (CPUs), so some of the allocated DAS sits unused when additional nodes are added. In addition, since the compute infrastructure is usually on a more aggressive tech-refresh cycle than the storage, converging the two in a single solution limits the flexibility of the tech refresh. There is a trend to disaggregate at least the warm, cold, and archive data from the compute capacity, and to use storage servers in separate racks as Internet Small Computer System Interface (iSCSI) targets to carve out the storage capacity. Hot data, especially if it resides on Solid State Disks (SSDs), is not easily moved to a disaggregated model because of network bandwidth and throughput requirements.
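
To make the stranded-capacity point concrete, here is a minimal back-of-envelope sketch in Python. The node counts and per-node capacity are illustrative assumptions, not figures from any particular deployment.

    # Back-of-envelope model of stranded DAS capacity: when a workload needs
    # more CPU than storage, every extra node bought for CPU also drags in
    # local disks that sit idle. All numbers here are illustrative.

    def stranded_das_tb(nodes_for_cpu, nodes_for_storage, tb_per_node):
        nodes_bought = max(nodes_for_cpu, nodes_for_storage)
        return (nodes_bought - nodes_for_storage) * tb_per_node

    # Example: 100 nodes' worth of CPU but only 60 nodes' worth of disk,
    # at 24 TB of DAS per node -> 40 nodes of storage bought but unused.
    print(stranded_das_tb(100, 60, 24), "TB stranded")   # 960 TB stranded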

The disaggregated iSCSI storage servers are basically commodity servers with just enough compute to drive the input/output (IO), and a large amount of storage to act as a pool of dense capacity. They can contain high-performance SSDs, Hard Disk Drives (HDDs), or extremely low-cost, low-performance options such as Shingled Magnetic Recording (SMR) drives, depending on the workload's performance and price requirements. In some SMR-based storage servers, a very thin layer of Non-Volatile Dual In-line Memory Module (NVDIMM) is used as a buffer to convert random write IOs to sequential ones for better efficiency.
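
A toy Python sketch of that buffering idea follows. It only models the reordering step (absorb random writes, flush them in address order); a real NVDIMM/SMR data path also has to handle zone management, power-fail semantics, and overwrites, none of which is shown here.

    # Toy model of the buffering idea: random writes land in a small persistent
    # buffer (playing the role of the NVDIMM) and are flushed to the backend in
    # LBA order, so the drive only ever sees sequential writes.

    class SequentializingBuffer:
        def __init__(self, capacity_blocks, backend):
            self.capacity = capacity_blocks
            self.pending = {}          # lba -> data, absorbed in any order
            self.backend = backend     # callable(lba, data); wants sequential IO

        def write(self, lba, data):
            self.pending[lba] = data
            if len(self.pending) >= self.capacity:
                self.flush()

        def flush(self):
            for lba in sorted(self.pending):   # one ascending, sequential pass
                self.backend(lba, self.pending[lba])
            self.pending.clear()

    flushed = []
    buf = SequentializingBuffer(4, backend=lambda lba, data: flushed.append(lba))
    for lba in (7, 2, 9, 1):                   # random-order writes from the host
        buf.write(lba, b"x")
    print(flushed)                             # [1, 2, 7, 9]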

Some high-performance storage servers accommodate up to 240 terabytes (TB) of all-flash capacity sitting on a 12G Serial Attached SCSI (SAS) backend in 2 Rack Units (RUs), with two separate X86 servers in the same chassis acting as “controllers” and a total of four 40-Gigabit (40G) Ethernet connections (two on each server). There are other examples of very low-cost, all-HDD storage servers with up to 109 6TB 3.5” Serial Advanced Technology Attachment (SATA) drives and two single-core X86 controllers with 10G Ethernet connections to the network in a 4 RU footprint.
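
For a sense of density, the quick arithmetic below works out capacity per rack unit for the two example configurations just described; the figures come straight from the paragraph above.

    # Quick arithmetic on the two example configurations above.
    flash_tb, flash_ru = 240, 2          # all-flash server: 240 TB in 2 RU
    hdd_tb, hdd_ru = 109 * 6, 4          # all-HDD server: 109 x 6 TB in 4 RU

    print(flash_tb / flash_ru, "TB per RU (all-flash)")      # 120.0
    print(hdd_tb / hdd_ru, "TB per RU (all-HDD)")            # 163.5
    print(4 * 40, "Gb/s of Ethernet per all-flash chassis")  # 160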

Carving out iSCSI target logical unit numbers (LUNs) in a storage rack and presenting them to various initiators in a different compute rack is a valid disaggregated model for storage in a scale-out architecture. In some instances, using iSCSI Extensions for RDMA (iSER) with routable Remote Direct Memory Access (RDMA) can further increase the throughput and input/output operations per second (IOPS) of the architecture. There is an added cost for the network upgrade that needs to be accounted for, usually around 20-25 percent of the total cost of the solution. The storage network needs a minimum of 40G connectivity on the storage servers and 10G connectivity on the initiator side. The network switches need extra-large buffers to prevent packet drops, and in many cases priority flow control (PFC), explicit congestion notification (ECN), and quantized congestion notification (QCN) become necessary.
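
As a sketch of how that 20-25 percent figure plays into budgeting, the snippet below grosses up a hypothetical storage-plus-compute spend so that the network lands at the stated share of the total; the dollar amount is purely an assumption for illustration.

    # Applying the 20-25 percent rule of thumb above to a hypothetical budget.
    def total_with_network(storage_and_compute_cost, network_share=0.25):
        """Gross up the budget so the network is `network_share` of the total."""
        return storage_and_compute_cost / (1.0 - network_share)

    base = 1_000_000                     # hypothetical storage + compute spend
    for share in (0.20, 0.25):
        total = total_with_network(base, share)
        print(f"network at {share:.0%} of total -> budget {total:,.0f} "
              f"(network portion {total - base:,.0f})")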

There are many ways to build a disaggregated storage model depending on use cases and requirements. In our next blog, we will cover how a disaggregated model benefits from a properly architected software platform, to gain not only value and utility, but essential features for the applications and the driving business needs.

By Farid Yavari – Vice President, Technology - FalconStor
02 Feb 2016

Storage has finally become an interesting field, full of innovation and change, addressing growing new requirements for storage flexibility, density, and performance. The falling price of flash and the introduction of various flavors of storage-class memory, combined with an increasing appetite for commoditization of data center infrastructure, have helped fuel innovation in how data is stored and accessed.

Companies are faced with the challenge of how to store their ever-growing data efficiently, at a cost point that is palatable to CTOs and CFOs, while maintaining the right levels of performance and SLAs in order to provide storage services to end users and applications. At the same time, internal IT organizations face the challenge of competing with the flexibility and price points offered externally through public clouds.

Two new trends have emerged in architecting solutions for next-generation “hyper-scale” data centers and storage infrastructures that must grow on demand to meet compute, memory, and storage requirements. These requirements include on-demand provisioning, instant capacity management, and flexibility to scale each individual component independently, driving cost efficiency and a direct impact on the TCO.

First, as outlined in the table below, there are legacy SAN environments running online transaction processing (OLTP) workloads, primarily based on Fibre Channel and NFS, with high-performance SLA targets (greater than ~500K IOPS, less than ~5ms response time to the application). This environment is built on storage appliances and SAN installations with complete high-availability (HA) capabilities that provide data protection and service resiliency to the applications. The growth rate of the traditional SAN environment compared to other storage infrastructures is relatively low, and its impact on revenue is high enough to justify paying the premium for brand-name storage technologies that come with all the HA and data protection capabilities. Understandably, many companies are unwilling to try groundbreaking technologies within an OLTP infrastructure, as stability, security, and availability are the primary goals for this environment.

The second architecture, described in the three columns to the right of the table, and arguably the fastest-growing segment of every data center, is the scale-out environment running NoSQL, cloud, and big data workloads. From the storage perspective, these environments usually run on a direct-attached storage (DAS) or disaggregated storage model based on various protocols such as iSCSI, PCIe, or NVMe. The scale of the storage infrastructure, especially for big data analytics, can reach hundreds of petabytes, which makes these environments extremely TCO-driven. Many applications running in these environments have built-in resiliency, anticipate hardware failures, and can self-heal at the application layer. The document and key-value stores, as well as analytics applications, feature server- and rack-aware, replication-based data resiliency to guard data against hardware failures. When data protection and self-healing are handled at the application layer, the need to build HA features into the storage layer is eliminated, which opens the door to using consumer-grade, commodity hardware that can fail without impact to service availability.
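
As a generic illustration of rack-aware, replication-based resiliency (not any particular product's placement algorithm), the sketch below spreads each object's replicas across distinct racks so a rack-level failure cannot remove every copy.

    # Generic sketch of rack-aware replica placement: put each copy of an
    # object on a different rack so a rack-level failure cannot remove every
    # replica. Real placement policies also weigh capacity, load, and topology.

    def place_replicas(servers_by_rack, replicas=3):
        """Pick one server from each of `replicas` different racks
        (this simplified version requires replicas <= number of racks)."""
        racks = sorted(servers_by_rack)
        if replicas > len(racks):
            raise ValueError("need at least one rack per replica")
        return [(rack, servers_by_rack[rack][0]) for rack in racks[:replicas]]

    topology = {"rack-a": ["a1", "a2"], "rack-b": ["b1"], "rack-c": ["c1", "c2"]}
    print(place_replicas(topology))
    # [('rack-a', 'a1'), ('rack-b', 'b1'), ('rack-c', 'c1')]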

In future blogs, we’ll take a closer look at these trends in hyper-scale data centers, and how to achieve desirable TCO as well as resiliency and performance.

By Pete McCallum – Director, Data Center Solutions Architecture - FalconStor
14 Oct 2015

There was a time, not so long ago, when a storage administrator actually had to know something in order to do their job. There was no automation, auto-tiering, virtualization, API sets, QoS, or analytics; all we had were metaLUNs, concatenated metaLUNs, extent management, and RAID sets. We used to sit at the same lunch table as the UNIX guys who had to write code to open a text editor, and who had never EVER used a mouse.

Yes, it used to be that when something broke or started to slow down, we would fix it by actually going into a console or shell and typing some magic commands to save the day.

These days, managing storage is a very different proposition. We have such awesome capabilities emerging from software-defined platform stacks, such as IO-path QoS, hypervisor-and-cloud-agnostic protocols, and scale-out metadata - just to name a few. With all of these advancements, one would tend to think the days of the storage administrator have gone away. And I would tend to agree to some extent.

No longer is the storage administrator really concerned with finite volume management and provisioning. Today, storage performance almost manages itself. Thin provisioning is less about capacity optimization and more about data mobility. And there is almost as much data about our data as there is data.

In some ways we have converted storage administration into air-traffic control: finding optimal data paths and managing congestion of IO as things scale beyond reason. This is where analytics really comes into play.

In all aspects of IT, administration is taking a back seat to business integration, where knowing what has happened (reporting) plus what is happening (monitoring) starts to generate knowledge (analytics) about what is happening in the business. When we add predictive analytics, we gain the ability to make not only technology decisions but, ostensibly, business decisions too, which can make a huge difference in meeting market demands and avoiding pitfalls. This moves IT (as well as storage) out of reactive mode and into proactive mode, which is the number one benefit of predictive analytics.

Let’s see how this applies to a business-IT arrangement through a real-world example: month-end close-out of the books at a large company. In the past, an IT department would provide infrastructure sized for the worst-case performance scenario: despite a 3,000 IOPS requirement for 27 days of the month, the 35,000 IOPS month-end churn (lasting about eight hours) pushed for an all-flash array at 4x the cost of spinning disk. Because the volumes require a tremendous amount of “swing space” as journals fill and flush, reporting is run against copies of data, and Hadoop clusters scale up to analyze the data sets, almost a PB of storage capacity is required to support 200TB of actual production data. All of this is thick-provisioned across two data centers for redundancy and performance in case of a problem or emergency.

Most of this data would be made available to the business through reporting and monitoring, which would allow an IT architect to decide on a storage and server platform that could handle this kind of load. Manual or semi-manual analysis across many different consoles and systems would merge the data (perhaps into a spreadsheet), where we would find that (apologies if my math is off a little; a rough sketch follows the list):

  • 30% of all data load happens on one day of the month.
  • 90% of the “storage sprawl” is used for other than production data. Of the remaining 10% used for production data, perhaps 2% of that space actually requires the performance.
  • Cost/TB/IOPS is skewed to fit 10% of the capacity (or 0.2% for real!), and 30% of the total load, at 8-20x the cost.
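
Here is one way to run those rough numbers in Python, using only the figures from the example above (3,000 IOPS baseline, 35,000 IOPS for about eight hours, 200TB of production data, and the assumption that the “almost a PB” is thick-provisioned at each of the two sites). The exact percentages depend on what you count, which is exactly the caveat above, but the skew is unmistakable.

    # One way to run the rough numbers, using only the figures quoted above.
    baseline_iops, peak_iops = 3_000, 35_000
    peak_hours, month_hours = 8, 30 * 24

    burst_ops = peak_iops * peak_hours * 3600
    steady_ops = baseline_iops * (month_hours - peak_hours) * 3600
    print(f"burst share of the month's IO:    {burst_ops / (burst_ops + steady_ops):.0%}")
    print(f"burst share of the month's hours: {peak_hours / month_hours:.1%}")

    production_tb = 200
    provisioned_tb = 2 * 1_000   # assumes the "almost a PB" is mirrored at two sites
    print(f"production share of provisioned capacity: {production_tb / provisioned_tb:.0%}")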

There are far more data correlations that can be made – correlations that are obviously actionable and meaningful to the business. For example, one could:

  • Right-size the performance load to the actual requirements of the dataset, rather than incurring tremendous expense to meet the worst-case scenario.
  • Manually shift storage performance tiers prior to month-end, or automatically if the storage platform allows (see the sketch after this list).
  • Thin provision or use non-volatile, mountable snapshots for handling data mining and “copy data” to reduce storage sprawl.
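
A hypothetical policy sketch for that month-end tier shift is below. The promote and demote callables stand in for whatever interface a given platform exposes; they are assumptions made for illustration, not FreeStor's or any other product's API.

    # Hypothetical policy for the tier-shift bullet: promote the close-out
    # volumes to flash a day before month-end and demote them once the books
    # are closed. `promote` and `demote` are placeholders for whatever the
    # storage platform exposes; no specific product API is implied.
    import calendar
    from datetime import date

    def month_end_tier_policy(today, volumes, promote, demote, lead_days=1):
        last_day = calendar.monthrange(today.year, today.month)[1]
        days_to_close = last_day - today.day
        for vol in volumes:
            if days_to_close <= lead_days:
                promote(vol, tier="flash")       # month-end churn is imminent
            elif today.day <= 2:
                demote(vol, tier="capacity")     # close-out is over, step back down

    month_end_tier_policy(
        date(2016, 2, 28), ["erp-journal", "erp-reporting"],
        promote=lambda vol, tier: print(f"promote {vol} -> {tier}"),
        demote=lambda vol, tier: print(f"demote {vol} -> {tier}"),
    )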

All of these are actionable through a good virtualization platform (like FreeStor) and analytics on platform and application metadata. If we add a truly heterogeneous SDS platform (like FreeStor) that can operate across different performance and platform tiers of storage, we start gaining a breadth of insight into the infrastructure that surpasses anything an admin could reasonably wrap their day around. However, because of the sheer volume and complexity of capabilities, automation and foresight MUST be imbued into the control plane.

This is where intelligent predictive analytics comes in: it’s not about seeing into the future as much as it is about correlating events from the past with current events to adjust capabilities in the present. If I know all the capabilities of my targets (performance, capacity, cache, storage layout for read/write optimization, etc.), and I know the trends in requirements from the source applications, AND I know the capabilities and features of the SDS platform (like FreeStor), then I should be able to correlate events and occurrences into policy-based actions that reconcile security, performance, protection, and cost SLAs with actual point-in-time events in the system. I can then recommend or automate adjustments to IO paths, storage targets, DR strategies, and new operational requests through intelligent predictive analytics.
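
A minimal sketch of that correlate-and-act loop, assuming nothing about any specific product: project the recent IOPS trend one interval ahead and raise a recommendation before the current tier runs out of headroom. The thresholds, sample values, and function names are all illustrative.

    # Minimal sketch of the correlate-and-act idea: project the recent IOPS
    # trend one interval ahead and recommend a tier change before the current
    # target saturates. Thresholds, names, and sample values are illustrative.

    def forecast_next(samples):
        """Naive linear trend: last value plus the average recent delta."""
        deltas = [b - a for a, b in zip(samples, samples[1:])]
        return samples[-1] + sum(deltas) / len(deltas)

    def recommend(samples, tier_limit_iops, headroom=0.8):
        predicted = forecast_next(samples)
        if predicted > tier_limit_iops * headroom:
            return f"predicted {predicted:,.0f} IOPS: move the volume to a faster tier"
        return "no action: current tier still has headroom"

    history = [9_000, 12_000, 16_000, 21_000, 27_000]   # climbing toward month-end
    print(recommend(history, tier_limit_iops=30_000))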

All this boils down to operational efficiencies for the business, cost savings in key infrastructure purchasing decisions, better SLA management for business workloads, faster conversion of data into information, and faster time-to-value. I know these are big phrases and promises, but we see it every day. No longer is it enough to be an administrator or an infrastructure architect. No longer is it enough for the CIO to manage a budget and hope systems don’t go down. These days, every aspect of IT is part of the business revenue stream and is a partner in making businesses profitable and efficient. Predictive analytics is a key enabler for this new requirement.

By Pete McCallum – Director, Data Center Solutions Architecture - FalconStor
21 Aug 2015

Let’s face it: embracing new storage technologies and capabilities, and upgrading to new hardware, often results in added complexity and cost. The reality is that when IT equipment, platforms, and applications do not integrate with one another, the resulting “sprawl” of storage islands and silos on disparate systems can be costly, risky, disruptive, and time-consuming. But it does not have to be that way.

Few organizations have the luxury of performing a massive infrastructure replacement or maintaining completely identical infrastructures for primary and secondary storage. Hardware/platform incompatibility, different system generations, different architectures, and different media types can compromise even the most diligent efforts at protecting and replicating business critical data.

A properly architected software-defined storage approach can ease many of these integration and management pains. Software-defined storage implemented at the network fabric layer, abstracted from the underlying hardware, bypasses storage sprawl issues because it standardizes all tools, data services, and management.

Horizontal software-defined storage, deployed across the infrastructure in a common way, should accommodate storage silos in geographically dispersed data centers, locally on different storage systems, or across physical and virtual infrastructures. Software-defined storage eliminates the accumulation of point solutions and regards all storage as equal. This enables the delivery of common data services such as migration, continuity, recovery, and optimization that can be executed consistently across the entire storage infrastructure, which reduces complexity and the number of silos to manage, and lowers the cost of licensing data services array by array.

The key to solving the problem is not to solve it at all, but to work with it through a truly horizontal software-defined storage platform that can marry unlike infrastructures, including arrays, servers, hypervisors, and the private or hybrid cloud. It’s time to move the industry forward and BE FREE to eliminate the legacy of silos and infrastructure complexity.

