Smart or Dumb Data Management
When You Back Up Your Data, Do You Think About It?

Backing up, storing and moving data sound like simple things. You just do it. But if you have a lot of data to back up, store and move, you’ve really got to think about the best way to do it. It’s like freight. Moving 1 basketball is not a problem, but moving 10,000 basketballs from New York to LA is something you’ve really got to think about. Once you move them, how will you store them? What if you need 50 basketballs one week, 3 the next and 500 the week after, but will not use 4,000 of them until next year?

Bill Andrews, President and CEO of ExaGrid Systems, Inc., shares some insight into how to ensure your data backup and storage are done as efficiently as possible.
He shares insight about Single Instance Store, Archival Storage, Remote Site, Next Generation Backup Agent on the Application Servers and Disk-Based Backup – which is probably the most common type of backup for smaller businesses.
There is an increasing amount of confusion in the marketplace surrounding the different types of data reduction methods for backing up data. Customers are interested in reducing the amount of data they store for both primary storage and backup storage, yet they often find it difficult to understand the differences between the various vendors and the technologies available to store and reduce data. There are five areas where data reduction of any sort can be applied. Many customers find that they can “mix and match” the technologies to meet the unique demands of their businesses.
Single Instance Store
Single Instance Store applies to primary storage. If a 2MB PowerPoint presentation is emailed to 50 employees and each employee stores it in his or her own file share, then 50 copies are stored in primary storage, accounting for 100MB. Single Instance Store keeps only one copy and replaces the other 49 copies with a stub file or pointer. By using Single Instance Store, redundant data is eliminated and storage consumption and cost are reduced. Microsoft has recognized the value of Single Instance Store and has incorporated it into Windows. Storage overhead can be reduced by 10% to 20%, with even greater savings achieved when combined with other data reduction methods.
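To make the PowerPoint example concrete, here is a minimal Python sketch of the idea. It is an illustration only, not how Windows or any particular product implements Single Instance Store: the function name single_instance_store, the use of a SHA-256 content hash to spot identical files, and the in-memory dictionaries are all assumptions made for the example.

```python
import hashlib

def single_instance_store(files):
    """Keep one copy of each distinct file content.

    files: dict mapping a path to the file's bytes (hypothetical input shape).
    Returns (blobs, pointers): blobs holds one copy per distinct content,
    pointers maps every path to the hash of its content (the "stub").
    """
    blobs = {}
    pointers = {}
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        blobs.setdefault(digest, data)  # store the content only the first time
        pointers[path] = digest         # every path just points at the copy
    return blobs, pointers

MB = 1024 * 1024
deck = bytes(2 * MB)  # stand-in for the 2MB presentation
shares = {f"user{i:02d}/deck.ppt": deck for i in range(50)}

blobs, pointers = single_instance_store(shares)
logical_mb = sum(len(data) for data in shares.values()) // MB
stored_mb = sum(len(data) for data in blobs.values()) // MB
print(f"logical: {logical_mb}MB, stored: {stored_mb}MB, pointers: {len(pointers)}")
```

In this sketch the 50 file shares still list the presentation, but only one 2MB copy is held. A real system would also have to handle a user editing his or her copy, which this example leaves out.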
Archival Storage
In most environments, a large percentage of primary data has not been accessed in years or is never accessed. Primary data that hasn’t been accessed for an extended period of time can and should be archived to low-cost SATA storage or even to DVD or tape instead of expensive primary storage. By using archival storage, only active data resides on high-cost storage; all non-active data is archived on a less expensive storage medium. In a case where an organization has 100TB of data, 50TB may be stored on primary storage with 50TB stored on low-cost SATA, DVD or tape, providing significant cost savings. In the archival world, redundant data is stored as it would be in primary storage, so organizations enable Single Instance Store on the archival data as well.
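Deciding what to archive comes down to a rule about last access. The Python sketch below shows one way such a rule could look. The function name split_by_last_access, the 365-day idle threshold, and the sample file names are hypothetical, since the article does not say how long "an extended period of time" is.

```python
from datetime import datetime, timedelta

def split_by_last_access(files, now, max_idle_days=365):
    """Split (path, last_access) pairs into active and archive candidates.

    max_idle_days is an assumed policy value, not a figure from the article.
    """
    cutoff = now - timedelta(days=max_idle_days)
    active, archive = [], []
    for path, last_access in files:
        if last_access < cutoff:
            archive.append(path)  # idle too long: move to low-cost storage
        else:
            active.append(path)   # still in use: keep on primary storage
    return active, archive

now = datetime(2024, 1, 1)
catalog = [
    ("reports/q3.xls", now - timedelta(days=12)),
    ("projects/2019/plan.doc", now - timedelta(days=1500)),
    ("hr/handbook.pdf", now - timedelta(days=400)),
]
active, archive = split_by_last_access(catalog, now)
print("keep on primary:", active)
print("archive:", archive)
```

The right threshold depends on the business. Some organizations may also need to weigh file type or regulatory retention rules, which a single age cutoff does not capture.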
Remote Site
Many companies with multiple remote sites are now moving all data back to a central data center to be backed up. The problem is that this takes an enormous amount of WAN bandwidth between the remote site and the central data center. The key to making this method successful is to move the least amount of data possible. Reducing the amount of data requires a combination of methods. First, companies can employ Single Instance Store so repetitive files are eliminated at each remote site. Next, companies should employ a data reduction technology that moves only the changes between backups. Finally, companies should use replication that sends only the bytes that change over the WAN, and schedule that replication for a time when WAN usage is low. Used together, these methods keep the amount of data crossing the WAN to a minimum and move it when the WAN is used the least.
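The idea of sending only what changed can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not any vendor's replication protocol: the fixed 4096-byte block size, the SHA-256 comparison, and the function names block_hashes and changed_blocks are all invented for the example, and it ignores the case where the new data is shorter than the old.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size chosen for the example

def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[start:start + block_size]).digest()
        for start in range(0, len(data), block_size)
    ]

def changed_blocks(previous, current, block_size=BLOCK_SIZE):
    """Return {block index: block bytes} for blocks that are new or differ."""
    old = block_hashes(previous, block_size)
    changes = {}
    for index, start in enumerate(range(0, len(current), block_size)):
        block = current[start:start + block_size]
        if index >= len(old) or hashlib.sha256(block).digest() != old[index]:
            changes[index] = block  # only these blocks need to cross the WAN
    return changes

previous = bytes(100 * BLOCK_SIZE)    # last night's backup at the remote site
current = bytearray(previous)
current[5 * BLOCK_SIZE] = 1           # a change inside block 5
current[42 * BLOCK_SIZE + 10] = 7     # a change inside block 42
delta = changed_blocks(previous, bytes(current))

sent = sum(len(block) for block in delta.values())
print(f"blocks to send: {sorted(delta)}, bytes sent: {sent} of {len(current)}")
```

Here two small edits mean two blocks travel over the WAN instead of all one hundred. Scheduling the transfer for a quiet period is a separate step that this sketch does not cover.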
Next Generation Backup Agent on the Application Servers
Some companies apply data reduction at the backup agent level. Data reduction at the application server reduces the amount of data sent over the network, resulting in a smaller backup window and less data stored on the backup disk. This approach, however, is time-consuming, costly and risky, since it requires the customer to remove the existing backup application and replace all of the agents and the backup server with a new agent and backup server approach.
Disk-Based Backup
Since backup history is kept for years, potentially hundreds of copies of the same data are stored over and over again. When a backup occurs night after night, the data backed up is almost identical to the night before. As a result, many companies are deploying data reduction appliances to reduce the amount of data stored.
Typically only 2% of data changes between backups. Data reduction only stores those changes. If 1TB is backed up and the next day backed up again, only 20GB will have changed, so instead of storing 1TB twice, or a total of 2TB, you only store the first 1TB and the 20GB of changes. For example: ten backups of 1TB of data without data reduction would require 10TB of storage. With data reduction, only 1TB plus nine 20GB changes will be stored for a total of 1.18TB. In this example, the data reduction is 8.5 to 1.
To further increase the data reduction ratio, the last full backup can also be compressed after the backup is completed to disk. Typical compression is 2 to 1, so the last full backup takes 500GB instead of 1TB. Add 180GB of changed data (9 additional backups at 20GB each) and the total is only 680GB. That is 680GB of storage for ten full backups with last backup compression and data reduction, versus 10TB of storage without data reduction. In this example the data reduction is 14.7 to 1.
Longer retention periods result in even greater storage savings. For 1TB of data with 20 full backups, 20TB of storage is required without data reduction but only 880GB of storage is required with data reduction (500GB for the compressed last backup plus 19 x 20GB of changes). This results in a data reduction ratio of 22.7 to 1.
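The arithmetic in these examples is easy to rerun for your own numbers. The short Python sketch below uses the same simple model as the text: one full backup kept (optionally compressed) plus a fixed percentage of changes for every other backup. The function names and parameters are made up for illustration, and 1TB is treated as 1,000GB as in the examples above.

```python
def storage_with_reduction_gb(full_gb, backups, change_rate=0.02, compression=1.0):
    """Storage needed when only one full copy plus the changes are kept.

    compression=2.0 models the 2 to 1 last backup compression described above.
    """
    last_full = full_gb / compression
    changes = (backups - 1) * full_gb * change_rate
    return last_full + changes

def reduction_ratio(full_gb, backups, change_rate=0.02, compression=1.0):
    """Storage without data reduction divided by storage with it."""
    without_reduction = full_gb * backups
    with_reduction = storage_with_reduction_gb(
        full_gb, backups, change_rate, compression
    )
    return without_reduction / with_reduction

scenarios = [
    ("10 backups, data reduction only", 10, 1.0),
    ("10 backups, plus 2 to 1 compression", 10, 2.0),
    ("20 backups, plus 2 to 1 compression", 20, 2.0),
]
for label, backups, compression in scenarios:
    stored = storage_with_reduction_gb(1000, backups, compression=compression)
    ratio = reduction_ratio(1000, backups, compression=compression)
    print(f"{label}: {stored:.0f}GB stored, {ratio:.1f} to 1")
```

Changing change_rate shows how sensitive the savings are to the 2% assumption: data that changes more between backups will see a lower ratio.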
Last backup compression plus data reduction uses far less storage and can reduce the storage costs for backup by up to 75% over standard SATA drives without compression and data reduction.
It doesn’t matter if the primary storage ran Single Instance Store and then archived the data, because whatever data remains still needs to be backed up. The remaining data is backed up over and over again. Also, a history of how the data looked at different points in time is required for regulations, audits, financial purposes, legal discovery and many other reasons.
All backup applications can write to a disk volume, a NAS (network-attached storage) share or a tape library. Therefore, SATA disk can be placed behind a backup server as an iSCSI or NAS target. Market-leading products ship with a NAS interface, high-quality SATA drives, last backup compression and data reduction in an easy-to-use, turnkey appliance. Most solutions with data reduction offer a single-site system for onsite disk backup while using tape for offsite, but two-site solutions that eliminate offsite tape are becoming more popular.