Part 1 of a 2-part blog.
Copy Data Management
2015 was the year of science fiction blockbusters such as The Martian and Star Wars, and more than anything else this got me thinking about space travel. Some of you might agree that space travel is the next big challenge mankind has taken on, driven by the "exponential population" growth on Earth.
Hopefully our brightest rocket scientists are building the next generation of spaceships to make it possible to explore far-flung galaxies, so humans can colonize them and continue our way of life. Here is what one of the great visionaries of our time had to say about these next-generation rockets.
“If one can figure out how to effectively reuse rockets just like airplanes, the cost of access to space will be reduced by as much as a factor of a hundred. A fully reusable vehicle has never been done before. That really is the fundamental breakthrough needed to revolutionize access to space.” --Elon Musk
I for one would not want to journey on an antiquated spaceship with obsolete technology, so more power to Elon Musk for revolutionizing space travel. It is exactly this kind of fresh, out-of-the-box thinking that will make space travel truly possible for everyone.
Back on Earth, in the data center storage space, we are seeing challenges that are fundamentally similar, the biggest of them being "exponential data growth". This McKinsey report and podcast discusses the problem in detail and is enlightening on how next-generation storage infrastructure architectures take the right approach to solving it.
One of the biggest contributors to this exponential data growth is the multiple copies of data created by the various siloed approaches spread across different storage platforms in today's data centers.
Copy data management as a technology was conceived as a direct result of this problem.
Traditional "copy data management" platforms are built on classic three-tier SAN architectures and use block protocols such as FC and iSCSI. Block protocols have always been cumbersome to implement and manage, and because of the physical limitations of that architecture they run into performance challenges at several levels, such as physical fan-in/fan-out storage controller limits and serious degradation of storage IO during background garbage collection operations.
This is where Cohesity comes into play: it brings a next-generation hyper-converged platform that converges backup, copy data management, and in-place analytics into a single platform.
In a previous blog post I discussed the fully distributed, integrated backup software on the Cohesity platform. In this blog we will look at how to use those backups to enable "copy data management" and spin up Test & Dev environments on a hyper-converged, fully distributed platform using the underlying NFS protocol. This next-generation architecture and the use of file-based protocols simplify the workflow of provisioning Test & Dev VMs.
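Under the hood, presenting a backup as a datastore essentially boils down to mounting an NFS export on the ESX hosts. The platform automates this step, but for illustration, here is roughly what that mount looks like through the vSphere API via pyVmomi. This is a minimal sketch; the vCenter address, credentials, export path, and datastore name are all assumptions, not values from the product.

```python
# Illustrative sketch: mounting an NFS export as a vSphere datastore
# with pyVmomi. The platform automates this; all names below are assumed.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; verify certs in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="<password>", sslContext=ctx)
try:
    content = si.RetrieveContent()
    hosts = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    spec = vim.host.NasVolume.Specification(
        remoteHost="cohesity-cluster.example.com",  # assumed cluster address
        remotePath="/testdev_view",                 # assumed NFS export path
        localPath="cohesity-testdev",               # datastore name on ESX
        accessMode="readWrite")
    for host in hosts.view:
        # Mount the same NFS export on every host in the cluster.
        host.configManager.datastoreSystem.CreateNasDatastore(spec)
finally:
    Disconnect(si)
```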
Cohesity has patented a new way of taking snapshots that enables CDM with zero-copy snapshots, and thus allows unlimited snapshots with no performance degradation. This type of copy-on-write technology has been implemented by other storage vendors on single or dual controllers; only Cohesity has taken this revolutionary technology and implemented it on a fully distributed storage platform. Here is an example from a lab system on which I set up a backup job to run every 10 minutes on 10/31/2015 and then literally forgot about it. It had taken 18,261 snapshots over that period of time.
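As a quick sanity check on that number, a back-of-the-envelope calculation shows that 18,261 snapshots at one every 10 minutes works out to roughly 127 days of continuous operation, which lines up with a job started at the end of October:

```python
# Sanity check: how long does it take to accumulate 18,261 snapshots
# at one snapshot every 10 minutes?
snapshots = 18261
interval_minutes = 10

total_minutes = snapshots * interval_minutes  # 182,610 minutes
total_days = total_minutes / 60 / 24          # ~126.8 days

print(f"{snapshots} snapshots at {interval_minutes}-minute intervals "
      f"= {total_days:.1f} days of continuous operation")
# -> 18261 snapshots at 10-minute intervals = 126.8 days of continuous operation
```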
To the best of my knowledge, no other storage vendor has implemented COW-based snapshots on a fully distributed, scale-out storage platform with all the bells and whistles of next-generation file systems baked in, i.e. real-time auto-tiering, de-duplication, compression, bit-rot detection, error correction, MapReduce-based garbage collection, and strong consistency.
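To illustrate why zero-copy snapshots are cheap, here is a conceptual sketch of the copy-on-write idea. To be clear, this is not Cohesity's actual on-disk implementation, just the core principle: a snapshot duplicates only metadata pointers, and new writes allocate fresh blocks, so taking a snapshot costs the same no matter how much data is stored.

```python
# Conceptual illustration of copy-on-write (COW) snapshots -- not the
# vendor's actual implementation. A snapshot copies pointers to
# immutable blocks, never the blocks themselves.

class COWVolume:
    def __init__(self):
        self.blocks = {}     # block_id -> immutable data
        self.block_map = {}  # logical offset -> block_id
        self.next_id = 0

    def write(self, offset, data):
        # Writes always allocate a new block; old blocks stay intact,
        # so any snapshot that references them remains valid.
        self.blocks[self.next_id] = data
        self.block_map[offset] = self.next_id
        self.next_id += 1

    def snapshot(self):
        # Zero-copy: duplicate only the pointer table (metadata),
        # which costs the same regardless of how much data exists.
        return dict(self.block_map)

    def read(self, offset, block_map=None):
        m = block_map if block_map is not None else self.block_map
        return self.blocks[m[offset]]

vol = COWVolume()
vol.write(0, "v1")
snap = vol.snapshot()  # O(metadata), no data copied
vol.write(0, "v2")     # new block; the snapshot still sees "v1"
assert vol.read(0) == "v2" and vol.read(0, snap) == "v1"
```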
Now let's take a look at how we enable CDM on the Cohesity platform in three simple steps.
In this example, we will walk through the Test & Dev workflow to show the CDM capabilities of the Cohesity platform.
- Find the backup job you would like to use to create Test & Dev VMs. As shown below, the job can be searched for using wildcards. Select the entire job or individual VMs; if the job is selected, all the VMs in that job will be presented back to vCenter from the Cohesity platform via an NFS datastore. In this example I am going to select VMs from multiple different backups, add them to a shopping cart, and then provision them all in the same workflow. This can be hugely beneficial, as developers use multiple permutations and combinations of infrastructure VMs to troubleshoot and develop. Here, c2400-vm-1 and c2400-vm-37 are from the latest backup, while c2400-vm-18 is from the previous backup run.
- Edit c2400-vm-18 to pick an earlier backup date and add it to the cart.
- In the second step, we can rename the VMs, with an option to add a prefix or suffix to the names of the VMs that will be provisioned. The workflow also allows the Test & Dev VMs to be provisioned into an isolated network or a pre-configured network port group that can assign IP addresses to the VMs via DHCP. The location to provision the VMs is selected next, the name of the datastore to be mounted on the ESX hosts is provided, and lastly the VMs can either be left powered off or powered on after provisioning.
- A summary of the provisioned environment and its details is displayed when one clicks Finish. (For readers who prefer automation, a scripted sketch of these same steps follows below.)
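For those who would rather script this workflow than click through the UI, here is a rough sketch of what the same three steps could look like against a REST API. To be explicit: the endpoint paths, payload fields, and authentication scheme below are hypothetical placeholders I made up for illustration, not documented Cohesity API calls; consult the platform's actual REST reference before attempting anything like this.

```python
# Hypothetical sketch of scripting the Test & Dev clone workflow.
# Endpoint paths and payload fields are illustrative placeholders,
# NOT documented Cohesity API calls.
import requests

BASE = "https://cohesity-cluster.example.com/api"     # hypothetical base URL
session = requests.Session()
session.headers["Authorization"] = "Bearer <token>"   # assumed auth scheme

# Step 1: find the backed-up VMs we want to clone (the "shopping cart").
vms = session.get(f"{BASE}/search", params={"query": "c2400-vm-*"}).json()
cart = [vm for vm in vms
        if vm["name"] in ("c2400-vm-1", "c2400-vm-37", "c2400-vm-18")]

# Step 2: rename with a prefix, pick the network, and name the datastore.
clone_spec = {
    "objects": cart,
    "prefix": "testdev-",                 # rename option from the workflow
    "networkPortGroup": "TestDev-DHCP",   # pre-configured port group
    "datastoreName": "cohesity-testdev",  # NFS datastore mounted on ESX
    "powerOn": True,
}

# Step 3: kick off the clone and capture the summary.
task = session.post(f"{BASE}/clones", json=clone_spec).json()
print("Clone task submitted:", task)
```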
As seen above, the clone operation was successful, and it took only 26 seconds to provision the 3 VMs to the ESX hosts, as shown below.
On the vCenter side we can now go ahead and power on these VMs and run through all the Test & Dev use cases for them. These Test & Dev operations are completely independent of the ongoing backup jobs configured for these particular VMs in the c2400-job.
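The power-on can of course be automated from vCenter as well. Here is a minimal pyVmomi sketch; the vCenter address, credentials, and the "testdev-" naming prefix (carried over from the rename step above) are assumptions for illustration.

```python
# Minimal pyVmomi sketch: power on the freshly cloned Test & Dev VMs.
# The vCenter address, credentials, and "testdev-" prefix are assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; verify certs in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="<password>", sslContext=ctx)
try:
    content = si.RetrieveContent()
    vms = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in vms.view:
        # Power on only the cloned Test & Dev VMs that are still off.
        if vm.name.startswith("testdev-") and \
           vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOff:
            vm.PowerOnVM_Task()  # async; returns a vCenter task object
finally:
    Disconnect(si)
```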
Because the platform implements QoS for its different workloads, i.e. Test & Dev and Backup, each governed by its own QoS policies, powering on these VMs does not degrade other Test & Dev workloads or cause the workloads to step on top of each other.
As a final step, once all Test & Dev functions have been performed on these VMs, the environment can be torn down: all the VMs are deleted and the datastore is unmounted from the ESX cluster, as sketched below.
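Here is what that teardown might look like scripted with pyVmomi. Again, this is a hedged sketch: the connection details, the "testdev-" prefix, and the "cohesity-testdev" datastore name are all assumptions, and a production script would wait on each task rather than fire and forget.

```python
# Hedged teardown sketch (pyVmomi): power off and delete the cloned
# Test & Dev VMs, then unmount the NFS datastore from every ESX host.
# All names and credentials below are assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="<password>", sslContext=ctx)
try:
    content = si.RetrieveContent()
    vms = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in vms.view:
        if vm.name.startswith("testdev-"):
            if vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOn:
                vm.PowerOffVM_Task()  # async; wait for completion in real code
            vm.Destroy_Task()         # removes the VM from inventory and disk

    hosts = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in hosts.view:
        for ds in host.datastore:
            if ds.name == "cohesity-testdev":
                # Unmount the NFS datastore from this host.
                host.configManager.datastoreSystem.RemoveDatastore(ds)
finally:
    Disconnect(si)
```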
Fundamentally, Cohesity has built Copy Data Management into the heart of OASIS, its fully distributed, scale-out filesystem. This in and of itself is an amazingly HUGE technology feat!
As seen above, Cohesity enables the entire life-cycle of CDM for VMware environments with its 1st GA release. Going forward, it will be straightforward for Cohesity to apply this architecture to database applications such as SQL and Oracle, enabling customers to take advantage of CDM there as well.
So, in conclusion, Copy Data Management is one of the most crucial next-generation technologies enterprise companies will need to adopt to stay ahead of "exponential data growth" and their overall data management challenges. CDM can enable customers to save big on CAPEX and OPEX through the convergence of all secondary workflows onto a single platform, which will enable organizations to "Do More with Less".