
Data deduplication

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by DataProtection (talk | contribs) at 22:13, 20 April 2009. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Data deduplication is the elimination of redundant data. In the deduplication process, duplicate data is removed so that only one copy is stored; an index of all data is retained so that any instance can still be retrieved when required. Because only unique data is stored, deduplication reduces the storage capacity required. For example, a typical email system might contain 100 instances of the same one-megabyte (MB) file attachment. If the email platform is backed up or archived without deduplication, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is simply a reference to the saved copy. In this example, a 100 MB storage demand is reduced to about 1 MB, plus the small index of references.
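The email attachment example can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular product works: the class name `DedupStore`, its methods, and the mailbox paths are hypothetical, and it assumes whole-file deduplication keyed by a SHA-256 digest of the content.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one stored copy per unique payload."""

    def __init__(self):
        self.blobs = {}  # SHA-256 digest -> payload bytes (stored once)
        self.index = {}  # logical name -> digest (kept for every instance)

    def put(self, name, payload):
        digest = hashlib.sha256(payload).hexdigest()
        # Store the payload only if this content has not been seen before.
        self.blobs.setdefault(digest, payload)
        # Every instance still gets an index entry pointing at the one copy.
        self.index[name] = digest

    def get(self, name):
        return self.blobs[self.index[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


attachment = b"x" * (1024 * 1024)  # a 1 MB attachment (assumed content)
store = DedupStore()
for i in range(100):
    store.put(f"mailbox{i}/report.bin", attachment)

logical_bytes = 100 * len(attachment)
print(logical_bytes // 2**20, "MB logical,",
      store.stored_bytes() // 2**20, "MB stored")
```

All 100 names remain retrievable through the index, while the payload itself occupies 1 MB rather than 100 MB.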

Benefits

In general, data deduplication improves data protection, increases the speed of service, and reduces costs.

  • The business benefits of data deduplication range from improved overall data integrity to lower overall data protection costs. Data deduplication can reduce the amount of disk needed for backup by 90 percent or more.
  • With reduced acquisition costs, and reduced power, space, and cooling requirements, disk becomes suitable for first-stage backup and restore and for retention that can easily extend to months.
  • With data on disk, restore service levels are higher, media handling errors are reduced, and more recovery points are available on fast recovery media.
  • Data deduplication also reduces the data that must be sent across a WAN for remote backups, replication, and disaster recovery.
  • Data deduplication is also valuable in virtual environments, where it can deduplicate the VMDK files needed to deploy virtual machines.
  • The ability to deduplicate snapshot files (for example, VMSN and VMSD files in VMware) can yield considerable cost savings compared with a conventional disk backup environment while still providing more recovery points for disaster recovery.
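The reduction in WAN traffic for remote backups mentioned in the list above can be illustrated with a minimal sketch. It assumes fixed-size 4 KiB chunks identified by SHA-256 digests, and uses a plain dictionary to stand in for the remote site; the function names `chunk_digests`, `replicate`, and `restore` are hypothetical and do not correspond to any vendor's interface.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products vary


def chunk_digests(data, size=CHUNK_SIZE):
    """Yield (digest, chunk) pairs for fixed-size chunks of data."""
    for offset in range(0, len(data), size):
        chunk = data[offset:offset + size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


def replicate(data, remote_chunks):
    """Send only chunks the remote lacks; return (recipe, bytes_sent)."""
    recipe = []
    bytes_sent = 0
    for digest, chunk in chunk_digests(data):
        if digest not in remote_chunks:
            remote_chunks[digest] = chunk  # stands in for a network transfer
            bytes_sent += len(chunk)
        recipe.append(digest)  # the recipe is enough to rebuild the data
    return recipe, bytes_sent


def restore(recipe, remote_chunks):
    """Rebuild a backup at the remote site from its list of digests."""
    return b"".join(remote_chunks[digest] for digest in recipe)


remote = {}
# First backup: eight distinct 4 KiB chunks (32 KiB in total).
monday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(8))
# Second backup: identical except that the second chunk has changed.
tuesday = monday[:CHUNK_SIZE] + b"\xff" * CHUNK_SIZE + monday[2 * CHUNK_SIZE:]

recipe1, first_sent = replicate(monday, remote)
recipe2, second_sent = replicate(tuesday, remote)
print(first_sent, second_sent)
```

In this sketch the first backup transfers all 32 KiB, while the second transfers only the single changed 4 KiB chunk; both backups remain fully restorable from the remote chunk store.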

Major Commercial Players

Many vendors offer data deduplication products, and the field has grown further as virtual tape library (VTL) vendors have entered it. Vendors include Asigra, Avamar (acquired by EMC), Diligent Technologies (acquired by IBM), Data Domain, Druvaa Software, ExaGrid, FalconStor Software, NetApp, Quantum, Sepaton, Spectra Logic, Symantec PureDisk, and Tandberg Data.

Quantum Corp. has long been an innovator in data deduplication technology. It developed the pioneering patent for variable-length block data deduplication, and it provides data deduplication solutions that protect a full range of environments, from small distributed offices to the largest enterprise data centers. Quantum's data deduplication technology is used to protect more than 400 petabytes of data, has been tested and validated by key industry analysts, and has been adopted by storage vendors such as Dell and EMC.


ExaGrid is another innovator, using next-generation byte-level deduplication to push the bounds of deduplication performance and scalability in a disk backup appliance.
