Continuous data protection

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by DovidBenAvraham (talk | contribs) at 06:28, 3 July 2019 (Enhance with text and references previously added by me to Backup#CDP). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Continuous data protection (CDP), also called continuous backup or real-time backup, refers to backup of computer data by automatically saving a copy of every change made to that data, essentially capturing every version of the data that the user saves. In its true form it allows the user or administrator to restore data to any point in time.[1] The technique was patented by British entrepreneur Pete Malcolm in 1989 as "a backup system in which a copy [editor's emphasis] of every change made to a storage medium is recorded as the change occurs [editor's emphasis]".[2]

CDP runs as a service that captures changes to data to a separate storage location. There are multiple methods for capturing continuous live data changes involving different technologies that serve different needs. CDP-based solutions can provide fine granularities of restorable objects ranging from crash-consistent images to logical objects such as files, mail boxes, messages, and database files and logs.[3]

The "Backup" article covers this topic more briefly in a sub-subsection that has been renamed "CDP" to avoid confusion.

Differences from traditional backup

True continuous data protection differs from traditional backup in that the point in time to recover to need not be specified until the restore is performed.[4] Traditional backups can restore data only to the time at which the backup was made. True continuous data protection, in contrast to "snapshots", has no backup schedules.[4] When data is written to disk, it is also asynchronously written to a second location, usually another computer over the network.[5] This introduces some overhead to disk-write operations but eliminates the need for scheduled backups.
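The idea of recording every write and choosing the recovery point only at restore time can be illustrated with a minimal sketch. It is not taken from any cited product: the `ChangeJournal` class, its method names, and the in-memory list standing in for the second storage location are all assumptions made for illustration, and the asynchronous network transfer is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeJournal:
    """Hypothetical CDP journal: one entry per write, kept in arrival order."""
    size: int                                    # size of the protected volume in bytes
    entries: list = field(default_factory=list)  # (timestamp, offset, data) tuples

    def record(self, timestamp: float, offset: int, data: bytes) -> None:
        # Called for every write to the protected volume; no schedule is involved.
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside the protected volume")
        if self.entries and timestamp < self.entries[-1][0]:
            raise ValueError("timestamps must not go backwards")
        self.entries.append((timestamp, offset, bytes(data)))

    def restore(self, as_of: float) -> bytes:
        # Replay every write made at or before `as_of` onto an empty volume.
        volume = bytearray(self.size)
        for timestamp, offset, data in self.entries:
            if timestamp > as_of:
                break  # entries are in timestamp order, so nothing later applies
            volume[offset:offset + len(data)] = data
        return bytes(volume)


journal = ChangeJournal(size=8)
journal.record(1.0, 0, b"AAAA")
journal.record(2.0, 2, b"BB")
journal.record(3.0, 4, b"CCCC")
state_between_writes = journal.restore(as_of=2.5)  # any instant can be chosen
```

Because the journal keeps every write, the recovery point is a parameter of `restore` rather than a property of a backup schedule.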

Continuous vs near continuous

In true CDP, "backup write operations are executed at the level of the basic input/output system (BIOS) of the microcomputer in such a manner that normal use of the computer is unaffected".[2] True CDP backup must therefore in practice be run in conjunction with a virtual machine,[5][6][7] which rules it out for ordinary personal backup applications. For that reason it is discussed in the "Enterprise client-server backup" article, rather than in the "Backup" article.

Some solutions marketed as continuous data protection may only allow restores at fixed intervals such as 15 minutes, one hour, or 24 hours, because they automatically take incremental backups at those intervals. Such "near-CDP" schemes are not universally recognized as true continuous data protection, as they do not provide the ability to restore to any point in time. When the interval is shorter than one hour, near-CDP solutions are typically based on periodic snapshots. One example is Arq Backup,[8] disk-based backup software that periodically creates restore points using a snapshot.[9] "Near-CDP" solutions use snapshots because "to avoid downtime, high-availability systems may instead perform the backup on ... a read-only copy of the data set frozen at a point in time—and allow applications to continue writing to their data".
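The restore-point limitation of interval-based schemes can be sketched as follows. The `SnapshotStore` class and its method names are hypothetical, and full in-memory copies stand in for whatever snapshot mechanism a real product uses.

```python
import bisect


class SnapshotStore:
    """Hypothetical near-CDP store: keeps copies taken at fixed intervals."""

    def __init__(self, interval: float):
        self.interval = interval  # seconds between snapshots, e.g. 900 for 15 minutes
        self.times = []           # snapshot timestamps, ascending
        self.images = []          # volume contents at each timestamp

    def maybe_snapshot(self, now: float, volume: bytes) -> bool:
        # Take a snapshot only if a full interval has passed since the last one.
        if self.times and now - self.times[-1] < self.interval:
            return False
        self.times.append(now)
        self.images.append(bytes(volume))
        return True

    def restore(self, as_of: float) -> bytes:
        # Only snapshot instants are restorable: return the latest one <= as_of.
        index = bisect.bisect_right(self.times, as_of) - 1
        if index < 0:
            raise LookupError("no snapshot at or before the requested time")
        return self.images[index]
```

In this sketch, a version of the data that exists only between two snapshots is never stored, so asking for that instant returns the preceding snapshot instead.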

There is debate in the industry as to whether the granularity of backup must be "every write" to be CDP, or whether a "near-CDP" solution that captures the data every few minutes is good enough. The latter is sometimes called near continuous backup. The debate hinges on the use of the term continuous: whether only the backup process must run continuously and automatically, which is often sufficient to achieve the benefits cited above, or whether the ability to restore from the backup must also be continuous. The Storage Networking Industry Association (SNIA) uses the "every write" definition.[4]

Differences from RAID, replication or mirroring

Continuous data protection differs from RAID, replication, or mirroring in that these technologies protect only one copy of the data (the most recent). If data becomes corrupted in a way that is not immediately detected, these technologies simply preserve the corrupted data, with no way to restore an uncorrupted version.[10]

Continuous data protection protects against some effects of data corruption by allowing restoration of a previous, uncorrupted version of the data. Transactions that took place between the corrupting event and the restoration are lost, however. They could be recovered through other means, such as journaling.
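The trade-off described above, recovering an uncorrupted version at the cost of later transactions, can be shown with a short sketch. The function name, the `(timestamp, value)` history format, and the validity check are assumptions for illustration only.

```python
def restore_before_corruption(history, is_valid):
    """Return (restored_value, lost_timestamps).

    `history` is a hypothetical CDP version list of (timestamp, value) pairs,
    oldest first. A mirror would hold only history[-1], corrupted or not.
    """
    for index in range(len(history) - 1, -1, -1):
        _, value = history[index]
        if is_valid(value):
            # Everything written after the restored version is discarded.
            lost = [timestamp for timestamp, _ in history[index + 1:]]
            return value, lost
    raise LookupError("no uncorrupted version in the history")


history = [
    (1, "balance=100"),
    (2, "balance=120"),
    (3, "bal\x00nce=##"),    # corrupting write
    (4, "bal\x00nce=##+5"),  # later transaction applied to corrupted data
]
value, lost = restore_before_corruption(history, lambda v: v.startswith("balance="))
```

Here `value` is the last version that passes the check, and `lost` lists the writes that would have to be recovered by other means.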

Backup disk size

In some situations, continuous data protection requires less space on backup media (usually disk) than traditional backup. Most continuous data protection solutions save byte-level or block-level differences rather than file-level differences. This means that if one byte of a 100 GB file is modified, only the changed byte or block is backed up. Traditional incremental and differential backups make copies of entire files. Since around 2013, however, enterprise client-server backup applications have implemented a capability for block-level incremental backup, designed for large files such as databases.
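Block-level differencing can be sketched as below. The 4096-byte block size, the function names, and the dictionary used to hold changed blocks are illustrative assumptions, not the format of any particular product.

```python
def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> dict:
    """Map block index -> new block contents for every block that differs."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    changes = {}
    block_count = (max(len(old), len(new)) + block_size - 1) // block_size
    for index in range(block_count):
        start = index * block_size
        old_block = old[start:start + block_size]
        new_block = new[start:start + block_size]
        if old_block != new_block:
            changes[index] = new_block
    return changes


def apply_blocks(old: bytes, changes: dict, new_length: int,
                 block_size: int = 4096) -> bytes:
    """Rebuild the new version from the old one plus the saved blocks."""
    # Start from the old data, truncated or zero-padded to the new length.
    result = bytearray(old[:new_length].ljust(new_length, b"\x00"))
    for index, block in changes.items():
        start = index * block_size
        result[start:start + len(block)] = block
    return bytes(result)
```

With these assumed parameters, changing a single byte of a file causes one 4096-byte block to be saved, regardless of the file's total size.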

Risks and disadvantages

The protection afforded by continuous data protection is often heralded without consideration of the disadvantages and challenges that it can present. Specifically, the continuous bandwidth usage can adversely affect network performance, especially in operations where file sizes are large, such as multimedia and CAD environments. To mitigate this risk, companies employ throttling techniques that prioritize network traffic to reduce the impact of backup on day-to-day operations.[11]
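One common way to limit the bandwidth used by backup traffic is a token bucket. The sketch below is a generic rate limiter offered as an illustration, not the specific throttling technique of the cited source; the class name, parameters, and explicit `now` argument are assumptions.

```python
class TokenBucket:
    """Hypothetical bandwidth throttle for backup traffic (token-bucket sketch)."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float, now: float = 0.0):
        self.rate = rate_bytes_per_s  # long-run average allowed throughput
        self.capacity = burst_bytes   # largest burst allowed at once
        self.tokens = burst_bytes     # bucket starts full
        self.last = now               # time of the last refill

    def try_send(self, nbytes: int, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should queue the change and retry later
```

A backup agent built this way would hold changes locally whenever `try_send` returns `False`, trading a short delay in replication for less contention with other network traffic.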

References

  1. ^ Behtash, Behzad (2010-05-10). "Why Continuous Data Protection's Getting More Practical". Disaster recovery/business continuity. Informationweek. Retrieved 2011-11-12.
  2. ^ a b Peter B. Malcolm (13 November 1989). "US Patent 5086502: Method of operating a data processing system". Google Patents. Retrieved 29 November 2016. Filing date Nov 13, 1989 ... a backup system in which a copy of every change made to a storage medium is recorded as the change occurs ... backup write operations are executed at the level of the basic input/output system (BIOS) ...
  3. ^ "An Overview of Continuous Data Protection". Infosectoday.com. Retrieved 2011-11-12.
  4. ^ a b c "Data Protection Best Practices" (PDF). SNIA. Storage Networking Industry Association. 23 October 2017. 2.1.4 Continuous Data Protection (CDP). Retrieved 27 June 2019. ...pros to the use of snapshots:[new paragraph]Allows for the recovery of files from a specific point in time (based on snapshot schedule)....CDP can provide the ability to restore to any previous point in time, since the backups are taking place near-instantaneously; therefore, the potential for data loss is very small.
  5. ^ a b Wu, Victor (4 March 2017). "EMC RecoverPoint for Virtual Machine Overview". Victor Virtual. WuChiKin. Retrieved 22 June 2019. The splitter splits out the Write IOs to the VMDK/RDM of a VM and sends a copy to the production VMDK and also to the RecoverPoint for VMs cluster.
  6. ^ "System Requirements". R1Soft. 19 September 2018. Backup Agent – Windows tab, Backup Agent – Linux tab. Retrieved 29 June 2019.
  7. ^ van Doorn, Gijsbert Janssen (March 2019). "Future of Backup: From Periodic to Continuous" (PDF). Zerto.com. Zerto. Inconsistent recovery. Retrieved 2 July 2019. In today's IT environment, applications do not reside on a single virtual machine (VM), but instead are spread across different VMs with different roles.
  8. ^ Reitshamer, Stefan (5 July 2017). "Troubleshooting backing up open/locked files on Windows". Arq Blog. Haystack Software LLC. Retrieved 25 June 2019. Arq uses Windows Volume Shadow Copy Service (VSS) to back up files that are open/locked. [Reitshamer is the principal developer of Arq Backup]
  9. ^ "Continuous data protection (CDP) explained: True CDP vs near-CDP". ComputerWeekly.com. TechTarget. July 2010. Retrieved 22 June 2019. ... copies data from a source to a target. True CDP does this every time a change is made, while so-called near-CDP does this at pre-set time intervals. Near-CDP is effectively the same as snapshotting....True CDP systems record every write and copy them to the target where all changes are stored in a log.
  10. ^ Mayer, Alex (6 November 2017). "Backup Types Explained: Full, Incremental, Differential, Synthetic, and Forever-Incremental". Nakivo Blog. Nakivo. Full Backup, Incremental Backup, Differential Backup, Mirror Backup, Reverse Incremental Backup, Continuous Data Protection (CDP), Synthetic Full Backup, Forever-Incremental Backup. Retrieved 17 May 2019.
  11. ^ "Off-Site Backup - The Bandwidth Hog". Archived 2011-07-07 at the Wayback Machine.