
It’s no secret. Application developers are drawn to the agility, scalability, and versatility of public cloud services: instances are instantiated, scaled, and updated on demand. There is no waiting for IT service tickets to be processed, and DevOps has precise control over the infrastructure. The only worries are the bill at the end of the month and losing the umbrella of network security that the corporate firewall provides.

What is surprising, though, is that IT seems to struggle to provide the same experience with on-premises infrastructure. Didn’t IT invest in Hyper-Converged Infrastructure (HCI), Software-Defined Storage (SDS), and all-flash storage to eliminate complexity, lower cost, and remove every hurdle to satisfying the demands of the hypervisors, cloud native databases, and container orchestrators that developers love?

The answer is likely “yes, to all of them”. So you, the application owner, are left with the burden of integrating, managing, scaling, and automating each of them individually to satisfy the needs of developers. In an era where applications rule the enterprise, the time has come to step up and demand more from your IT infrastructure! What exactly? Let me outline it below.

1. An API-first approach to data storage

What does this really mean? Many data storage vendors, software-defined or traditional, claim to provide some level of API support, but the reality is that no option available today offers a true API-first approach.

Storage administrators need to automate repeated tasks, streamline complex sequences of operations, and accelerate time-critical operations to keep up with the change requests of application owners. Storage vendors naturally cater to these needs by providing a scripting option for their products, but it is mostly delivered as an afterthought.

Most vendors provide a command-line interface (CLI) for automation. However, the CLI is designed for human interaction, and using it for automation can jeopardize your data: console output is meant to be read by people, not parsed by scripts, and it is unreliable as a machine interface. Others provide an API that covers only a subset of the array’s functionality and is scoped to a single array. What if you have 10, 20, or 100 arrays? How many vendors allow you to do firmware upgrades through their API?
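The difference is easy to see in a small sketch. The Python snippet below contrasts scraping console output with consuming a structured API response; the CLI output format, the JSON payload, and the field names are all hypothetical and don’t come from any specific product:

```python
import json

# Scraping CLI output is fragile: any change to column order, spacing,
# or units in a firmware update silently breaks the parser.
def capacity_from_cli(console_output: str) -> int:
    # Hypothetical console line: "vol01   online   512 GiB"
    fields = console_output.split()
    return int(fields[2])  # breaks silently if the vendor reorders columns

# Consuming a structured API response: the contract is explicit,
# and a missing or renamed field fails loudly instead of corrupting data.
def capacity_from_api(response_body: str) -> int:
    # Hypothetical JSON payload from a REST endpoint
    volume = json.loads(response_body)
    return volume["capacity_gib"]

print(capacity_from_cli("vol01 online 512 GiB"))                    # 512
print(capacity_from_api('{"name": "vol01", "capacity_gib": 512}'))  # 512
```

Both calls return the same number today, but only the second one states its contract; the first depends on whatever the console happened to print.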

In a true API-first approach, every aspect of your data storage system is controlled through a native API that works reliably at scale, so every storage-related request from application owners can be fully automated.
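As a sketch of what “fully automated at scale” could look like in practice — assuming a hypothetical fleet-wide endpoint and client, not any particular vendor’s API — provisioning across many arrays collapses into a single loop:

```python
# A minimal sketch of fleet-wide provisioning through one API entry point.
# The endpoint path and payload shape are hypothetical.
from dataclasses import dataclass

@dataclass
class VolumeRequest:
    name: str
    size_gib: int
    array_id: str

def provision(requests, post):
    """Provision every requested volume through a single API.

    `post` stands in for an authenticated HTTP POST; here it simply
    records each call so the control flow is visible and testable.
    """
    results = []
    for req in requests:
        results.append(post(f"/arrays/{req.array_id}/volumes",
                            {"name": req.name, "sizeGiB": req.size_gib}))
    return results

calls = []
provision([VolumeRequest("db-data", 512, "array-07"),
           VolumeRequest("db-log", 64, "array-07")],
          post=lambda path, body: calls.append((path, body)) or body)
print(len(calls))  # 2
```

Whether you have 2 arrays or 100, the request list is the only thing that grows; the automation itself doesn’t change.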

2. Storage architected for modern workloads

It’s established that having an API is key to simplifying data storage infrastructure and accelerating provisioning. However, an API is not an excuse for unnecessarily complicated storage architectures that are unfit for modern application workloads.

In traditional architectures, servers attach to shared storage arrays over a SAN. Even an API won’t rescue storage administrators from the complexity of managing a SAN in rapidly changing environments, on top of the complexities of a shared storage array that serves a plethora of different workloads. This architectural limitation turns an application owner’s request to provision a volume to an application server into a multi-level burden for the storage administrator.

Software-Defined Storage doesn’t change this fundamental architecture; it merely replaces proprietary storage hardware with off-the-shelf components.

Hyper-Converged Infrastructure changes this architecture by collapsing compute and storage services to run side by side on a single server. There is no longer an (expensive) SAN connecting disaggregated compute and storage. Simpler, right? Don’t be fooled. Stringent hardware, firmware, and software compatibility recipes, plus the CPU, memory, and networking resources that compute and storage now compete for, cause management nightmares at scale.

Lastly, all of these architectures are built from the ground up for shared storage. But many modern workloads are built for shared-nothing environments, for which these architectures introduce unnecessary capacity and performance overheads. You know what I’m talking about if you’ve ever tried to run Apache Cassandra, Apache CouchDB, or Apache Kafka on a shared storage array.

Why settle for storage architectures that limit agility with unnecessary complexity in automation, monitoring, and troubleshooting, when they don’t serve modern cloud native workloads as well as they serve traditional ones?

3. Multi-vendor infrastructure management software

Only a few large enterprises operate a single-vendor IT infrastructure, and there are many good reasons for that: best-of-breed vendor selection, prevention of lock-in, or simply IT infrastructure inherited through company acquisitions.

Whatever the reasons may be, the people providing infrastructure services for your applications are left managing heterogeneous devices that might be incompatible, lack a uniform level of automation, and inevitably slow down their ability to cater to your application’s needs. Although API standardization efforts promise to help manage heterogeneous devices, vendors are incentivized to provide additional, differentiating capabilities on top of the standards, which renders the standards meaningless.

This means that you’re tasked with integrating across heterogeneous infrastructure with different APIs if you want to know the details about your applications in the datacenter. Why should you have to put up with this any longer?
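In practice, that integration burden usually ends up as a hand-written adapter layer in your own tooling. The sketch below is hypothetical — the vendor classes, the metric, and the returned values are made up — but the shape of the work is real:

```python
# A hypothetical adapter layer: one class per vendor API that the
# application owner must write and maintain themselves.
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    @abstractmethod
    def used_gib(self, volume: str) -> int:
        """Report used capacity for a volume, in GiB."""

class VendorA(StorageBackend):
    def used_gib(self, volume):
        # In reality this would call vendor A's REST API.
        return 120  # hypothetical value

class VendorB(StorageBackend):
    def used_gib(self, volume):
        # In reality this would parse vendor B's CLI output.
        return 80  # hypothetical value

def total_used(backends, volume):
    # The "holistic view" has to be stitched together by hand,
    # one adapter per vendor.
    return sum(b.used_gib(volume) for b in backends)

print(total_used([VendorA(), VendorB()], "db-data"))  # 200
```

Every new vendor, API version, or firmware quirk adds another adapter to maintain — work that produces no value for the application itself.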

4. More than just storage telemetry

Artificial intelligence for IT operations (AIOps) is the latest trend that data storage vendors promote with their products. The fundamental idea is great, and a necessity: your applications are becoming more distributed and interdependent, and the number of parameters, metrics, and points of failure is too large for humans to comprehend. AIOps is needed to help your operations team automatically spot and react to issues in real time.

But data storage vendors implement analytics merely by collecting log files that were originally intended for support, then retrofitting them into a pseudo-analytics pipeline. The facilities to extract, transmit, and analyze telemetry data are an afterthought, not optimized for effective use of machine learning and deep learning at scale. Storage vendors need to perform expensive data cleaning, transformation, and correlation operations, which make real-time analytics for their products practically impossible.

Even the storage vendors with the best AIOps capabilities have a delay of at least 15 minutes in their analytics pipeline, and most measure theirs in hours. A lot can happen in 15 minutes, never mind an hour. Why settle for an analytics engine that counts in minutes or hours rather than seconds?

5. A single tool for analytics and management

Analytics is only one part of AIOps; the objective is to react quickly to issues in real time. For storage array vendors today, that means sending a notification to a storage administrator, sometimes with instructions, and sometimes proactively. How quickly can humans react to these notifications?

The core problem for storage vendors, and one that will persist, is that their analytics services are entirely isolated from their control plane. This, too, stems from the fact that analytics was merely an afterthought rather than a conscious design decision. So even if storage array vendors were to rearchitect their products to deliver real-time analytics, they would still be unable to act on issues automatically.

Think of a smart thermostat that can predict that your room temperature will exceed your comfort level, but requires you to walk over and adjust every heater in the room manually. Why wouldn’t the thermostat adjust the heater settings automatically?

You should demand the same experience from your IT infrastructure: a complete AIOps engine for data storage that combines analytics with the ability to act on solvable issues automatically, in real time (so you don’t have to).
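In code, the gap between “analytics only” and a closed loop is small but decisive. Here is a minimal, hypothetical sketch — the metric, threshold, and remediation action are all invented for illustration:

```python
# A minimal closed-loop sketch: detect and act in the same pipeline.
# Metric names, thresholds, and the remediation are hypothetical.
def closed_loop(latencies_ms, threshold_ms, remediate):
    """Return the volumes that were remediated automatically.

    An analytics-only pipeline would stop after computing `hot`
    and page a human; a closed loop hands the result straight
    to an automated action.
    """
    hot = [vol for vol, latency in latencies_ms.items() if latency > threshold_ms]
    for vol in hot:
        remediate(vol)  # e.g. rebalance, throttle a noisy neighbor, migrate
    return hot

actions = []
hot = closed_loop({"vol01": 4.0, "vol02": 19.5}, threshold_ms=10.0,
                  remediate=actions.append)
print(hot)      # ['vol02']
print(actions)  # ['vol02']
```

The detection step is the same either way; the difference is whether its output feeds a human’s inbox or the control plane.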

6. A holistic view into your datacenter

While we’re at it, why limit yourself to data storage? Your applications depend not only on data storage from a single vendor, but also on compute resources from heterogeneous vendors, and of course on networking.

Gaining holistic insight into what your application is using in your datacenter is essential for effective troubleshooting and planning. Why settle for anything that provides only partial information about your application stack?

If any of these points resonate with your current environment, get in touch with us and learn how Cloud-Defined Storage can benefit your application infrastructure. And don’t forget to follow us on social media to be notified when we discuss the benefits of Cloud-Defined Storage for Kubernetes, VMware, Microsoft, and bare-metal applications in future blog posts.



Tobias Flitsch

Principal Product Manager

Tobias Flitsch has been working in enterprise data storage for more than 10 years. As a former solution architect and technical product manager, Tobias focused on various scale-up and scale-out file and object storage solutions, big data, and applied machine learning. At Nebulon, his product management focus is on understanding and solving customer challenges with large-scale infrastructure.