Data deduplication
Data deduplication is the elimination of redundant data. In the deduplication process, duplicate data is removed so that only one copy is stored, while an index of all data is retained so that any item can still be retrieved when required. Because only unique data is stored, deduplication reduces the required storage capacity. For example, a typical email system might contain 100 instances of the same one-megabyte (MB) file attachment. If the email platform is backed up or archived without deduplication, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is a reference to the one saved copy. In this example, a 100 MB storage demand is reduced to about 1 MB.
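The email example above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `DedupStore` class, its method names, and the use of SHA-256 digests to detect identical content are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy single-instance store: one copy per unique content, plus an index."""

    def __init__(self):
        self.blobs = {}   # digest -> bytes, each unique content stored once
        self.index = {}   # logical name -> digest (the retained index)

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        # Every logical item is still indexed, even when it is a duplicate.
        self.index[name] = digest

    def get(self, name):
        return self.blobs[self.index[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


attachment = b"x" * (1024 * 1024)  # stand-in for a 1 MB attachment
store = DedupStore()
for i in range(100):
    store.put(f"mailbox-{i}/report.pdf", attachment)

logical_bytes = 100 * len(attachment)   # what a plain backup would store
physical_bytes = store.stored_bytes()   # what the deduplicated store holds
```

After the loop, the index holds 100 entries but `physical_bytes` equals the size of a single attachment, and `store.get()` returns the full attachment for any of the 100 names.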
Benefits
In general, data deduplication improves data protection, increases the speed of service, and reduces costs.
- The business benefits of data deduplication range from improved overall data integrity to lower overall data protection costs. Deduplication can reduce the amount of disk needed for backup by 90 percent or more.
- With reduced acquisition costs, and reduced power, space, and cooling requirements, disk becomes suitable for first-stage backup and restore and for retention that can easily extend to months.
- With data on disk, restore service levels are higher, media handling errors are reduced, and more recovery points are available on fast recovery media.
- Data deduplication also reduces the data that must be sent across a WAN for remote backups, replication, and disaster recovery.
- Data deduplication is also valuable in virtual environments, where it can deduplicate the VMDK files needed to deploy virtual machines.
- The ability to deduplicate snapshot files (i.e. VMSN and VMSD files in VMware) can give considerable cost savings compared with a conventional disk backup environment, while still providing more recovery points for disaster recovery.
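The WAN benefit listed above can be illustrated with a small Python sketch in which a backup is split into chunks and only chunks the remote site does not already hold are transmitted. The fixed 4 KiB chunk size, the function names, and the dictionary standing in for the remote site are all assumptions for illustration; real products differ in how they chunk and index data.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


def chunk_digests(data, size=CHUNK_SIZE):
    """Split data into fixed-size chunks and return (digest, chunk) pairs."""
    pairs = []
    for offset in range(0, len(data), size):
        chunk = data[offset:offset + size]
        pairs.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return pairs


def replicate(data, remote_chunks):
    """Send only chunks missing from remote_chunks.

    Returns the number of bytes sent and the recipe (ordered list of
    digests) needed to rebuild the data at the remote site.
    """
    sent = 0
    recipe = []
    for digest, chunk in chunk_digests(data):
        if digest not in remote_chunks:
            remote_chunks[digest] = chunk  # simulated transfer over the WAN
            sent += len(chunk)
        recipe.append(digest)
    return sent, recipe


def restore(recipe, remote_chunks):
    """Rebuild the original data from its recipe at the remote site."""
    return b"".join(remote_chunks[digest] for digest in recipe)


remote = {}  # hypothetical remote site: digest -> chunk

# First backup: four distinct chunks, so everything is sent.
monday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
sent_monday, recipe_monday = replicate(monday, remote)

# Second backup: only the last chunk changed, so only that chunk is sent.
tuesday = monday[:3 * CHUNK_SIZE] + b"\xff" * CHUNK_SIZE
sent_tuesday, recipe_tuesday = replicate(tuesday, remote)
```

In this sketch the first backup transfers all four chunks, while the second transfers a single chunk, and both backups remain fully restorable from their recipes.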
Major Commercial Players
Many vendors offer data deduplication products, and the field has grown as virtual tape library (VTL) vendors have entered it. Vendors include Asigra, Avamar (acquired by EMC), Diligent Technologies (acquired by IBM), Data Domain, Druvaa Software, ExaGrid, FalconStor Software, NetApp, Quantum, Sepaton, Spectra Logic, Symantec PureDisk and Tandberg Data.
Quantum Corp. developed an early patent for variable-length block data deduplication and provides deduplication products for environments ranging from small distributed offices to large enterprise data centers. Quantum's data deduplication technology is used to protect more than 400 petabytes of data, has been tested and validated by industry analysts, and has been adopted by storage vendors such as Dell and EMC.
ExaGrid uses byte-level deduplication in its disk backup appliance, with an emphasis on deduplication performance and scalability.