Data Storage 101: All Storage Solutions Are Not Equal

Your business generates quite a bit of data – from the email you receive to the PDFs you create, the video you capture, and the PowerPoint presentations you make.
The challenge for your business is ensuring you have sufficient storage, and a way to manage it.
Jeff Ready, co-founder of Scale Computing, helps us understand some of the fundamentals of data storage:
Are there certain vertical industries that use more storage than others?
Yes, certainly there are. Document-intensive businesses, such as those in the medical, legal, and construction fields, are verticals where we've seen a tremendous amount of storage need on a per-employee basis. Likewise, businesses that deal with different kinds of media, such as television and radio stations, video editing shops, and photographers, use a lot of storage. Finally, the move to digital video cameras for surveillance (and away from VCRs and analog cameras) drives a lot of storage use in retail and other industries where visual security is a concern.
Are there certain types of businesses (by size or revenue) that use more storage than others?
Generally speaking, your very small 1-5 person businesses don't use a lot of network storage. This is not always the case, particularly in the video editing and photography areas described above, where a very small business may have tremendous storage needs. Likewise, with video surveillance, it's really about the number of cameras and the length of time you want to store the video. If you have a fairly large retail storefront with 10 cameras, you have a large storage need even though you may only need a handful of employees. All that said, as a broad rule of thumb, we see people looking at network storage solutions starting at around 20 employees.
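To make the camera math concrete, here is a rough back-of-the-envelope estimate in Python. The bitrate and retention figures are illustrative assumptions, not numbers from the interview:

```python
# Rough surveillance storage estimate: capacity = cameras x bitrate x retention.
# All figures below are illustrative assumptions.

cameras = 10                 # e.g., a large retail storefront
bitrate_mbps = 2.0           # assumed per-camera video bitrate, megabits/second
retention_days = 30          # assumed retention window

seconds = retention_days * 24 * 3600
total_bits = cameras * bitrate_mbps * 1_000_000 * seconds
total_tb = total_bits / 8 / 1_000_000_000_000    # bits -> terabytes

print(f"~{total_tb:.1f} TB to retain {retention_days} days of video")
# ~6.5 TB under these assumptions: a large storage need for a small staff.
```

Even modest camera counts reach multiple terabytes quickly, which is why surveillance drives network storage purchases in otherwise small shops.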

Is storage, overall, a concern for all businesses or just certain types?
It's something that affects all businesses very directly. When you think about the risk of "the server going down" and losing the hard drives inside it, does that give you a cold sweat? If so, then you haven't thought through what a proper storage architecture should look like. The fact is, computer hardware fails, and you need to be ready for it. The more servers you have, and the more people you have using them, the more likely you are to experience a failure. What happens when the failure occurs depends on the preparations you made ahead of time.

In the worst case, the data is just gone forever, which is clearly not a pretty picture. In a slightly more sophisticated scenario, perhaps you have tape backups of your data. However, a surprising amount of the time, the tape backups don't restore cleanly. When was the last time you tried a restore, and how confident are you that it will work? How much data gets lost because the last backup didn't occur immediately before the crash? And finally, how much time and productivity is lost while the restore process is underway? It can take days or even weeks to rebuild a crashed infrastructure. Think of how long it took to set up in the first place; in a crash-and-restore-from-backup scenario, you may be looking at rebuilding the infrastructure and THEN attempting to restore from backup. How much of your employees' time is flushed down the drain while all this is taking place? Scary thoughts.

What we advocate is an architecture that maintains data uptime, connectivity, and integrity even if the hardware that data sits on experiences a failure, because it's not a matter of IF some component will fail, it's a matter of when. With our TrueCluster technology, we embrace this fact such that you can replace the failed hardware without the end users ever realizing there was a hardware problem with the storage system in the first place.
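Ready's point about untested tape backups suggests a simple discipline: verify restores routinely, before a crash forces the issue. Below is a minimal sketch in generic Python (not tied to any particular backup product) that compares a restored copy against the original via checksums:

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large files don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing or differ in the restore."""
    problems = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = restored_dir / rel
        if not dst.exists():
            problems.append(f"missing: {rel}")
        elif file_sha256(src) != file_sha256(dst):
            problems.append(f"mismatch: {rel}")
    return problems

# Hypothetical usage: restore last night's backup to a scratch location,
# then check it against the live data before you need it in anger.
# problems = verify_restore(Path("/data"), Path("/mnt/restore-test"))
```

A check like this answers "how confident are you the restore will work?" with evidence rather than hope.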
For those who have a basic file server and find they need more storage, what are their options?
Often you are stuck with buying a new system, copying the data to it (called data migration), and getting rid of the old one. That's the most common scenario, and frankly, it's a pain in the rear. This is compounded by the fact that when you buy that new system, you have to guess at how much storage you're going to use in the future, and you will almost certainly guess low. Storage needs typically double every 12 months. That means if you guess low, you're going to be buying another new server and migrating data again, sooner rather than later. If you buy too much storage, you are paying today's prices for storage you may not use for another 2-3 years, which means you probably paid at least double what it would cost by the time you actually use it (storage costs typically drop by 30-50% per year). It's a "damned if you do, damned if you don't" problem, compounded by the "what happens when that server fails" issue I described above.
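To illustrate that "damned if you do, damned if you don't" arithmetic, here is a toy model using the two trends quoted above (demand doubling every 12 months, prices dropping 30-50% per year). The $5,000-per-terabyte starting price and the 40% decline rate are illustrative assumptions:

```python
# Toy model: buy three years of capacity up front vs. buy each year's
# increment at that year's lower price. Growth and price-decline rates
# follow the trends quoted in the interview; exact figures are assumed.

price_per_tb = 5000.0   # assumed year-0 price per usable TB
need_tb = 4.0           # assumed year-0 storage need
years = 3

# Option A: buy all the capacity you'll need in year 3 at today's price.
final_need = need_tb * 2 ** years            # demand doubles each year
buy_up_front = final_need * price_per_tb

# Option B: buy only each year's increment as prices fall 40%/year.
buy_incremental, owned, price = 0.0, 0.0, price_per_tb
for year in range(years + 1):
    needed = need_tb * 2 ** year
    buy_incremental += (needed - owned) * price
    owned = needed
    price *= 0.60

print(f"Buy up front:      ${buy_up_front:,.0f}")    # $160,000
print(f"Buy incrementally: ${buy_incremental:,.0f}") # ~$63,680
```

Under these assumptions, pre-buying costs roughly 2.5x as much as growing incrementally, which is the economic case for storage that scales in small chunks.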
With the increased use of hosted applications, is the need for storage reduced?
Yes and no. If you have your email hosted with someone else, you don't need to worry about having that storage internally. On the other hand, you will be paying monthly for storage that's hosted elsewhere, so for large archives of information you need to consider the tradeoffs. You will often end up paying more for hosted storage over a year than it would cost to buy that storage internally, so it's a matter of deciding what makes the most sense for your organization. Oftentimes, you'll be in a hybrid scenario, using both hosted applications and things you run internally. Likewise, when thinking about large files such as videos and medical images, an organization may find that the speed of the connection to the hosted provider is too slow for its needs. In that case, you'll want internal storage solutions.
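The hosted-versus-internal tradeoff is essentially a break-even calculation. A quick sketch, where the per-gigabyte hosted rate and the internal hardware cost are hypothetical numbers, not figures from the interview:

```python
# Break-even point: recurring hosted storage fees vs. a one-time
# internal purchase. All prices here are illustrative assumptions.

archive_gb = 2000            # size of the archive you need to keep
hosted_per_gb_month = 0.15   # assumed hosted rate, $/GB/month
internal_cost = 2500.0       # assumed one-time cost of internal storage

monthly_hosted = archive_gb * hosted_per_gb_month
breakeven_months = internal_cost / monthly_hosted

print(f"Hosted: ${monthly_hosted:,.0f}/month")
print(f"Internal purchase pays for itself in ~{breakeven_months:.0f} months")
# ~8 months here, consistent with the point that a year of hosted
# storage can cost more than owning it (setting aside power and admin).
```

The right answer shifts with archive size and how long you keep the data, which is why large, long-lived archives tend to favor internal storage.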
What makes your data storage solution more affordable than others, such as IBM's XIV?
For the markets we've been talking about, that isn't a great comparison, because the XIV solution is not at all geared to companies of that size. More appropriately, you would want to look at solutions from Dell's EqualLogic line and HP's LeftHand line. Those solutions have entry price points of around $30k and per-usable-terabyte pricing of $5-6k for most real-world configurations. We come in at right around half of that or better. That's still a dramatic difference, so your question remains valid.

The answer is that our development of the TrueCluster technology allows us to use an entirely different architecture than these competitors, and by means of this architecture we are able to do a lot of things at the software level that they are forced to do with hardware. In turn, that drives down the cost significantly, and it adds to the scalability and the ability to easily add large or small chunks of storage over time. What TrueCluster does is eliminate all single points of failure in the architecture. When you have a single point of failure, you are forced to "throw money at the problem" in order to try to prevent that single point from actually failing. I liken it to the Space Shuttle: you can't have just one of anything critical on the Shuttle, because you can't have the mission fail because one component went belly-up. Therefore, instead of one circuit board that does XYZ, the Shuttle will have two or three that do the same thing. Of course, that costs two or three times as much, even though you don't get any additional benefit other than having those backups. This is how most networked storage architectures work. You'll read about multiple power supplies, multiple "controllers," multiple NICs, all serving the purpose of recovering from equipment failure by brute force of money.

Instead, TrueCluster is designed so that every element in your storage cluster (what we call a node) is exactly like every other node. There is no "master" unit that controls all the network traffic, and no "central database" that keeps track of where all the individual data blocks are located. With TrueCluster, it doesn't matter if you lose node 1, or node 6, or node 27. Because all the nodes are the same, any still-functioning node can pick up the slack while the failed node is replaced. By doing so, you don't need all the hardware redundancy in the individual components that drives up price, because the entirety of the system is self-redundant by means of our technology. We didn't need to buy all that extra hardware "to be safe," and therefore the customers don't have to pay for it. While all this may sound rather complex, that complexity happens on the back end, without the need for active management by users and administrators.
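To give a feel for the "no master node" idea, here is a generic illustration of symmetric replication. This is a sketch of the general technique, not Scale's actual TrueCluster implementation: each block is written to more than one identical node, so a read can be served by any surviving replica.

```python
import random

# Generic sketch of symmetric-node replication: every node is identical,
# so losing any one of them leaves the data reachable elsewhere.
# This illustrates the concept, not Scale's actual code.

class Cluster:
    def __init__(self, nodes: int, replicas: int = 2):
        self.nodes = {n: {} for n in range(nodes)}   # node id -> {block: data}
        self.replicas = replicas

    def write(self, block: str, data: bytes):
        # No master decides placement; any N distinct nodes will do.
        for n in random.sample(list(self.nodes), self.replicas):
            self.nodes[n][block] = data

    def fail(self, node: int):
        del self.nodes[node]     # simulate losing a node outright

    def read(self, block: str) -> bytes:
        # Any surviving node holding a replica can serve the read.
        for store in self.nodes.values():
            if block in store:
                return store[block]
        raise IOError(f"all replicas of {block} lost")

c = Cluster(nodes=4)
c.write("invoice.pdf", b"...")
c.fail(0)                        # no single point of failure to lose
print(c.read("invoice.pdf"))     # still served by a surviving replica
```

Because the redundancy lives in the software layer, each node can be built from inexpensive hardware instead of duplicated controllers and power supplies.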
When shopping for storage, what are some things to keep in mind?
I believe there are four key things to think about: scalability, ease of use, recovery from failure (sometimes called "high availability"), and price. We've touched on all of these, and it is our belief that they are the critical elements for mid-market success in dealing with storage. Systems need the ability to grow as your data needs grow, and you want to do so without throwing old systems away or over-purchasing and over-paying. You need to be able to keep your business running even when those hard drives or other components fail, because they will. You need a system that can be managed by someone who is not a "storage specialist" but rather the same person who manages all facets of your technology infrastructure (as an aside, this is where products designed for Fortune 500 customers fail in the mid-market, because the makeup of the IT department is completely different). Finally, there's the matter of price: what does it cost per terabyte, what does it cost to get started (this is why XIV is not a mid-market option), and what does it cost to grow from there?


About Ramon Ray

Ramon Ray, Marketing & Technology Evangelist, Smallbiztechnology.com & Infusionsoft. Full bio at . Check him out on Google Plus, Twitter or Facebook.