
Data Version Control (software)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Conscious AI (talk | contribs) at 10:32, 5 October 2022 (Description, info box and overview were added.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
DVC
Original author(s): Dmitry Petrov
Developer(s): Iterative.ai
Initial release: May 4, 2017
Stable release: 2.24.0 / September 19, 2022
Repository: github.com/iterative/dvc
Written in: Python
License: Apache License 2.0
Website: https://dvc.org/

DVC is a free and open-source, platform-agnostic version control system for data, machine learning (ML) models, and experiments.[1] It is designed to make ML models shareable and experiments reproducible, and to track versions of models, data, and pipelines.[2][3][4][5]

DVC works on top of Git repositories and cloud storage.[6][7]

The first (beta) version of DVC (DVC 0.6) was launched in May 2017. In May 2020, DVC 1.0 was publicly released by Iterative.ai.[8][9]

Overview

DVC is designed to bring the best practices of software development into machine learning workflows. It does this by extending Git, a conventional software version control tool, with cloud storage for datasets and ML models.

Specifically, DVC makes machine learning operations:

  • Codified: it codifies datasets and models by storing pointers to data files held in cloud storage.
  • Reproducible: it makes it easy for users to reproduce experiments and rebuild datasets from raw data. These features also make it possible to automate the construction of datasets and the training, evaluation, and deployment of ML models.
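The pointer idea described above can be illustrated with a minimal Python sketch. It is not DVC's implementation: the function names, the `.pointer.json` file format, the use of SHA-256, and the local directory standing in for cloud storage are all assumptions made for illustration. The large file is copied to storage under its content hash, and only a small pointer file, which could be committed to Git, stays next to the code.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, storage: Path) -> Path:
    """Copy data_file into storage under its digest and write a pointer file.

    `storage` is a local directory standing in for a cloud bucket
    (an assumption of this sketch).
    """
    digest = file_digest(data_file)
    storage.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(data_file, storage / digest)
    # Hypothetical pointer format: file name plus content hash.
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    pointer.write_text(json.dumps({"path": data_file.name, "sha256": digest}))
    return pointer


def is_up_to_date(pointer: Path) -> bool:
    """Check whether the local data file still matches the recorded hash."""
    meta = json.loads(pointer.read_text())
    data_file = pointer.with_name(meta["path"])
    return data_file.exists() and file_digest(data_file) == meta["sha256"]


def restore(pointer: Path, storage: Path) -> None:
    """Recreate the data file from storage using only the pointer file."""
    meta = json.loads(pointer.read_text())
    shutil.copyfile(storage / meta["sha256"], pointer.with_name(meta["path"]))
```

Because the pointer records a content hash, checking out an older pointer and calling `restore` would bring back the matching version of the dataset, and `is_up_to_date` gives a simple way to decide whether a dependent step needs to be rerun.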

References

  1. ^ Hewage, Nipuni; Meedeniya, Dulani (February 2022). "Machine Learning Operations: A Survey on MLOps Tool Support". ResearchGate.
  2. ^ Barrak, Amine; Eghan, Ellis E.; Adams, Bram (March 2021). "On the Co-evolution of ML Pipelines and Source Code - Empirical Study of DVC Projects". IEEE Xplore.
  3. ^ Wiggers, Kyle. "MLOps startup Iterative.ai nabs $20M". VentureBeat.
  4. ^ Ivancic, Kristijan. "Data Version Control With Python and DVC". Real Python.
  5. ^ "MLOps Company Iterative Achieves Significant Customer and Company Growth in 2021". Business Wire.
  6. ^ Hall, Susan. "Iterative.ai: Git-Based Machine Learning Tools for ML Engineers". The New Stack.
  7. ^ "What is DVC?". MLOps Guide.
  8. ^ Petrov, Dmitry. "DVC 3 Years and 1.0 Pre-release". Iterative.ai.
  9. ^ Anadiotis, George. "Streamlining data science with open source: Data version control and continuous machine learning". ZDNET.