Whenever I hear someone say "Do more with less", to me it just means CONVERGENCE.
Almost everyone who uses a smartphone is the ‘beneficiary’ of convergence and the following picture should make it clear what it is.
The smartphone is THE embodiment of convergence. It takes many different workflows and integrates them all very tightly with hardware and software thereby eliminating redundant hardware.
Now let's talk about convergence in the data center world. Flash, convergence, hyperconvergence and numerous other innovations have totally upended the primary storage world. However, secondary storage platforms have seen no architectural innovation in the past decade or more.
But what are these Secondary Storage platforms?
Well my definition is that these are platforms that are running workloads other than the mission critical tier 1 revenue generating applications in your enterprise. These workloads could be backup / recovery, test & dev, analytics, archival, file shares etc. So anything but the ‘money-maker’ apps.
Today’s secondary storage systems are based on legacy architectures and are complex and error prone to implement. For example, an early adopter of Cohesity mentioned that an incumbent solution “takes a village to configure and implement”. These systems are also atrociously expensive to purchase and rely on ‘forklift upgrades’ to scale up. Beyond setup, IT admins have to manage multiple different systems/products through multiple panes of glass to provide the various secondary storage services. In some extreme cases admins even press expensive primary storage into service to support these secondary workflows.
So how does Cohesity address this seemingly intractable issue? It does so by building a converged appliance that brings compute and hybrid storage to a dense hardware building block.
But a hardware spec doesn’t a platform make! Hardware spec is the tiniest part of the story. The biggest differentiator is the core principle on which Cohesity is built: Bring simplicity to secondary storage environments in the data center.
Cohesity converges the essential secondary storage workflows such as backup / recovery, copy data management, analytics, file services, and cataloging / indexing, among other things, onto its scale-out distributed platform. This is not very different from what happened with the smartphone convergence I mentioned above. Cohesity simply eliminates the complexities of managing multiple product UIs, workflows, and systems, a fragmentation that leads to inefficiencies which have unfortunately become an accepted reality in the data center. Cohesity is changing this ‘reality’.
It is easy to make tall claims. But let me peel back the architecture just a little to explain how Cohesity has built a fault-tolerant, scale-out data platform that can always stay ahead of data management problems. The following block diagram will help you understand how Cohesity converges these workflows onto a state-of-the-art, truly distributed scale-out filesystem, built with all the core storage services any enterprise storage subsystem is expected to have.
Every component of this architecture deserves a blog post of its own. But one thing I am especially gaga about is the scale-out design.
Every node in the cluster runs all the software services, and each node is a complete peer of the others. The metadata is infinitely scalable. This scale-out design avoids situations such as hot disks or hot nodes, and it enables the cluster's performance to scale linearly with every single node added to the cluster. There is simply no theoretical limit on how big it can grow (OK purists, there ALWAYS are limits).
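To make the hot-node point concrete, here is a minimal sketch of consistent hashing, a technique commonly used in scale-out systems of this kind to spread data evenly across peer nodes so that adding a node remaps only a small fraction of the data. This is a generic illustration, not Cohesity's actual placement algorithm; the class and node names are hypothetical.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys (e.g. data chunks) onto peer nodes via a hash ring.

    Each node is hashed to many 'virtual node' positions on the ring,
    which evens out the load and avoids hot nodes. Adding a node only
    takes over the ring segments nearest its positions, so roughly
    1/N of the keys move instead of all of them.
    """

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Place `vnodes` virtual positions for this node on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}:{i}"), node))

    def node_for(self, key):
        # The key is owned by the first virtual node clockwise from its hash.
        h = self._hash(key)
        i = bisect.bisect(self._ring, (h, ""))
        return self._ring[i % len(self._ring)][1]

# Adding a fourth node moves only a fraction of the keys.
ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {f"chunk-{i}": ring.node_for(f"chunk-{i}") for i in range(1000)}
ring.add_node("node-d")
moved = sum(1 for k, old in before.items() if ring.node_for(k) != old)
print(f"{moved} of 1000 chunks remapped")
```

The same idea extends to metadata placement: because every node is a peer and ownership is determined by hashing rather than a central directory, capacity and performance can grow together as nodes are added.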
So, can I take the liberty of describing Cohesity as "The Uber solution for Workflow Convergence" for the datacenter (VERY different from ‘Uber for storage’, which might be an interesting idea in itself...)?
Messrs Merriam (both G and C) and Webster certainly think so!
BTW, this same model of application / services scaling is what the Googles, Facebooks and Amazons of the world have been implementing for a decade. It’s about time this very efficient, cost-effective, fully redundant, fully fault-tolerant model of distributed computing is adopted by the secondary storage side of the data center!