Opening a cloud account was easy – but how do you get all of your data there?

Everyone today is talking about storing data in the cloud, and there’s a bit of hype around it – it’s cheap, it has virtually unlimited capacity, it’s cool, so why not use it? But have you ever thought about how long it actually takes to copy your data to the cloud? No? Let’s shed some light on it.

What kind of data are we dealing with?

Let’s first cover some basics. Why are you copying data to the cloud in the first place? Are you looking for a way to protect your data, looking for cheap long-term storage, or trying to avoid huge capital expenses for storage? The next step is to define what kind of data you need to copy – production data, non-production data, backups, or archives. Then, define how much data you have – 10TB, 100TB, 1PB, more? And finally – exactly how big is your pipe to the public internet?

Let’s look at the following table to better understand why this is important.

Data type       | Storage requirement | Bandwidth requirement
Production      | Low                 | Low
Non-production  | Medium              | Medium
Backup          | Large               | Huge
Archive         | Extremely large     | Infinite

The following table shows how long it takes to transfer data to the cloud with today’s high-speed datacenter links.

Data size | 1Gbps link | 10Gbps link
1TB       | 0.2 days   | <0.1 days
10TB      | 1.7 days   | 0.2 days
100TB     | 16.5 days  | 1.7 days
1PB       | 165.3 days | 16.5 days
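
If you want to run the same numbers for your own environment, the arithmetic is simple: data size divided by effective throughput. The figures in the table above line up with an effective throughput of roughly 70MB/s on a 1Gbps link – about 55-60% of line rate, which is a fairly realistic real-world figure. Here is a minimal Python sketch you can adapt; the 0.56 efficiency factor is an assumption, not a guarantee:

```python
# Rough transfer-time estimate: data size / effective throughput.
# The 0.56 efficiency factor is an assumption (roughly 70 MB/s on a
# 1 Gbps link); adjust it to whatever your WAN link actually delivers.

def transfer_days(data_tb, link_gbps, efficiency=0.56):
    data_bits = data_tb * 1e12 * 8               # TB -> bits (decimal units)
    effective_bps = link_gbps * 1e9 * efficiency
    return data_bits / effective_bps / 86400     # seconds -> days

for size_tb in (1, 10, 100, 1000):               # 1 TB up to 1 PB
    print(f"{size_tb:>5} TB: {transfer_days(size_tb, 1):6.1f} days @ 1 Gbps, "
          f"{transfer_days(size_tb, 10):5.1f} days @ 10 Gbps")
```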

The most common reason to copy production data to the cloud is to protect it, usually by building a cloud Disaster Recovery (DR) site and syncing data to it. For production data, storage capacity requirements are relatively low, but storage performance and link latency requirements are high. Since only changed pieces of data are synced to the cloud, internet bandwidth is not the limiting factor – even a lower-speed link such as 100Mbps should be good enough to keep production data protected.
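
To sanity-check that claim, compare how much data actually changes per day with how much a modest link can push per day. A quick sketch – the 20TB dataset, the 2% daily change rate, and the 0.56 link efficiency are all illustrative assumptions:

```python
# How much changed data can a link sync per day, vs. how much changes?
# Dataset size, daily change rate, and link efficiency are illustrative
# assumptions; plug in your own figures.

def tb_per_day(link_mbps, efficiency=0.56):
    return link_mbps * 1e6 * efficiency * 86400 / 8 / 1e12   # bits/s -> TB/day

dataset_tb = 20
daily_change_tb = dataset_tb * 0.02        # assume ~2% of the data changes per day
capacity_tb = tb_per_day(100)              # what a 100 Mbps link can move per day

print(f"Changed data: {daily_change_tb:.1f} TB/day, "
      f"100 Mbps link capacity: {capacity_tb:.1f} TB/day")
```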

The main reasons for storing non-production data (e.g. fileserver data) in the cloud are lower storage costs and ease of maintenance. There are various approaches to running fileservers in the cloud – cloud-native and hybrid fileserver products, or typical storage-as-a-service products. With non-production data, your main goal should be to pay less for more – so think of utilizing all the cool storage technologies such as deduplication, compression, or anything else that might help you reduce your data footprint in the cloud. Since you will be transferring at least a few tens of TBs to the cloud, you will need a bigger pipe, something in the 500Mbps – 1Gbps range.

Backups require large capacity, measured in tens of TBs, and are usually stored on magnetic tapes or virtual tape libraries (VTLs). The costs of tape libraries or storage arrays in the datacenter can quickly go through the roof, so considering cheap cloud storage makes perfect sense. But how do you transfer such a large amount of data to the cloud? If you’re storing your backups on physical tapes – congratulations, you just locked yourself out of cloud storage for as long as you keep doing so. Consider using a VTL or a cloud tape import service instead – this is the first step of your cloud journey. Next, you will need at least a 1Gbps line to even think about this option. And even if you have 1Gbps of bandwidth, consider yourself lucky if you achieve 70% of what you’re paying for. Keep in mind that you’re transferring data to the cloud – it is most likely not in the building next door, but in another state or maybe even another country. There’s latency involved, packets travel back and forth, and that round-trip time reduces the final transfer speed from you to the cloud.
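
The latency effect is easy to underestimate: a single TCP stream can never move data faster than its window size divided by the round-trip time, no matter how fat the pipe is. A quick sketch of that ceiling – the window sizes and the 40ms RTT are illustrative assumptions:

```python
# Throughput ceiling for a single TCP stream: window size / round-trip time.
# This is why long-distance transfers rarely hit the advertised link speed
# unless window scaling is tuned or many streams run in parallel.

def tcp_ceiling_mbps(window_bytes, rtt_ms):
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6

print(tcp_ceiling_mbps(64 * 1024, 40))        # default-ish 64 KB window: ~13 Mbps
print(tcp_ceiling_mbps(4 * 1024 * 1024, 40))  # 4 MB scaled window: ~840 Mbps
```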

Archives require enormous capacity, as this is the data you are legally bound to keep for X number of years. Think of it as a copy of all your backups from the past 10 years. We’re talking about 100TB to 1PB or even more data that must be stored somewhere. Everything we mentioned in the previous paragraph applies here, multiplied by ten. Simply put – don’t copy this data to the cloud over your internet pipe, just don’t. If you have 100TB of data to copy, guess what – on a 1Gbps link you are looking at 16-17 days at minimum. That’s fine if you have that much time to wait, but there are more elegant ways to do it that will save you a lot of time and trouble.

How am I going to transfer all my TBs of data to the cloud then?

Ask for help! There are several technologies and techniques out there that can help you. For example, deduplication can reduce your local backup or archive storage footprint by up to 90-95%. That means 1PB of raw data takes no more than 50-100TB after deduplication – surely not something that takes months to copy. If you have 100TB, you will end up with 5-10TB, and that can be copied to the cloud over a 1Gbps link in a day or so.
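
To put those numbers in perspective, here is the same transfer-time arithmetic applied after deduplication. The 90-95% reduction ratios are the ones quoted above; your actual ratio depends entirely on your data:

```python
# Transfer time after deduplication (same ~56% link efficiency assumption
# as before; reduction ratios are the 90-95% figures quoted in the post).

def transfer_days(data_tb, link_gbps=1, efficiency=0.56):
    return data_tb * 1e12 * 8 / (link_gbps * 1e9 * efficiency) / 86400

for raw_tb, reduction in ((1000, 0.90), (1000, 0.95), (100, 0.95)):
    left_tb = raw_tb * (1 - reduction)
    print(f"{raw_tb:>4} TB raw, {reduction:.0%} dedup -> {left_tb:5.0f} TB, "
          f"~{transfer_days(left_tb):4.1f} days @ 1 Gbps")
```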

Another option you might want to look at, to avoid overutilizing your public internet link, is something we internally call ‘the pigeon’. The technique is very simple – the cloud vendor sends you a NAS device via courier service. You attach it to your internal network and copy the data to it at full speed (1Gbps, 10Gbps, as fast as you can). Once you’re done copying, you send the NAS device back to the cloud vendor and the data gets imported into your account. You might be surprised, but this whole process takes only a few days, compared to the weeks or months required to copy anything above 200TB over a 1Gbps link. If you have hundreds of TBs of data, you most likely have at least a 10Gbps local network, which is much faster than a 1Gbps internet link – and in this process, every Gbps counts.
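
Whether the pigeon beats the wire is another back-of-the-envelope calculation: local copy time at LAN speed plus courier and import days, versus pushing the same data over the WAN. A rough sketch – the 80% LAN efficiency and the roughly 4 days of shipping and import are assumptions, not vendor commitments:

```python
# 'The pigeon' vs. the wire: copy to a shipped NAS over the LAN, add
# courier/import time, and compare against a straight WAN transfer.
# All efficiency factors and the 4-day round trip are illustrative.

def days(data_tb, gbps, efficiency):
    return data_tb * 1e12 * 8 / (gbps * 1e9 * efficiency) / 86400

data_tb = 200
pigeon_days = days(data_tb, 10, 0.80) + 4    # 10 Gbps LAN copy + ~4 days shipping/import
wan_days    = days(data_tb, 1, 0.56)         # direct 1 Gbps WAN transfer

print(f"Pigeon: ~{pigeon_days:.1f} days, WAN: ~{wan_days:.1f} days")
```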

The combination of these two techniques will help you quickly transfer data to the cloud and get all the benefits cloud storage has to offer – low cost, low maintenance and access to data on demand.

How much will all this cost me?

Before we answer this question, let’s talk about TCO – Total Cost of Ownership. The purchase price of storage equipment is only a fraction of what that equipment actually costs you. There are many hidden costs involved – one-time employee costs for research, procurement, implementation, and configuration, to name a few. Then there are datacenter space, power, and cooling costs for as long as you run the equipment (e.g. 5 years). On top of that come ongoing employee costs for system administrators, training, and certifications. And only then do we get to the equipment price itself. With cloud storage, there are no hidden costs involved: you pay for what you use, for as long as you need it. From our experience, we have seen TCO reductions in the range of 50-70%.
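
There is no universal TCO formula, but the components listed above are easy to add up for your own case. The sketch below uses purely illustrative placeholder numbers over a five-year term – substitute your own quotes, salaries, and cloud pricing:

```python
# Back-of-the-envelope 5-year TCO comparison. Every figure below is a
# placeholder assumption; replace it with your own quotes and pricing.

YEARS = 5
on_prem = {
    "storage hardware":                     250_000,
    "research/procurement/implementation":   40_000,
    "datacenter space, power, cooling":      30_000 * YEARS,
    "admin time, training, certifications":  35_000 * YEARS,
}
cloud_price_per_tb_month = 10        # illustrative archive-tier price
capacity_tb = 500

on_prem_total = sum(on_prem.values())
cloud_total = cloud_price_per_tb_month * capacity_tb * 12 * YEARS

print(f"On-prem 5-year TCO: ${on_prem_total:,}")
print(f"Cloud   5-year TCO: ${cloud_total:,}")
```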

Can FalconStor help me in my cloud journey?

Sure we can! Our product portfolio covers protection for all the data types mentioned above. FalconStor StorSafe Virtual Tape Library (VTL) supports on-premises, hybrid cloud, and pure cloud deployments, and offers the best deduplication performance out there. It doesn’t require you to learn a whole new backup method, and it’s the only product capable of importing physical tapes into a VTL – all of which is great if you’re running a legacy backup system and don’t want to redesign your internal backup processes. FalconStor StorGuard can help you build a DR site in the cloud and move away from any kind of vendor lock-in. Why not run production servers in one cloud and build your DR site in another? Our products can help you do it! FalconStor StorSight serves as the control panel for both StorSafe and StorGuard, allowing you to monitor your storage use across the datacenter and the cloud.

Learn more about our product portfolio or contact us if you have any additional questions.

Goodbye datacenter, hello cloud!

Lamar Duke
FalconStor is the trusted data protection software leader modernizing disaster recovery and backup operations for the hybrid cloud world. The company enables enterprise customers and managed service providers to secure, migrate, and protect their data while reducing data storage and long-term retention costs by up to 95 percent. More than 1,000 organizations and managed service providers worldwide standardize on FalconStor as the foundation for their cloud-first data protection future.