
Data Version Control (software)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Conscious AI (talk | contribs) at 10:37, 6 October 2022 (Overview and references are added). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
DVC
Original author(s): Dmitry Petrov
Developer(s): Iterative.ai
Initial release: May 4, 2017; 5 years ago
Stable release: 2.24.0 / September 19, 2022; 16 days ago
Repository: github.com/iterative/dvc
Available in: Python
License: Apache License 2.0
Website: https://dvc.org/

DVC is a free and open-source, platform-agnostic version control system for data, ML models, and experiments.[1] It is designed to make ML models shareable and experiments reproducible, and to track versions of models, data, and pipelines.[2][3][4][5]

DVC works on top of Git repositories and cloud storage.[6][7]

The first (beta) version of DVC (DVC 0.6) was launched in May 2017. In May 2020, DVC 1.0 was publicly released by Iterative.ai.[8][9]

Overview

DVC is designed to incorporate the best practices of software development into machine learning workflows.[10] It does this by extending Git, a traditional software development tool, with cloud storage for datasets and ML models.[11]

Specifically, DVC makes machine learning operations:

  • Codified: it codifies datasets and models by storing pointers to the data files, while the files themselves are kept in cloud storage.[4]
  • Reproducible: it makes it easy for users to reproduce experiments and to rebuild datasets from raw data.[12][13] These features also allow users to automate the construction of datasets and the training, evaluation, and deployment of ML models.[14]
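
The pointer idea in the first item can be illustrated with a minimal Python sketch. It is a conceptual illustration only and does not use DVC's code or its actual file format. The function names (`content_hash`, `track`, `restore`), the `.pointer.json` suffix, the local directory standing in for cloud storage, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()  # hash choice is an assumption of this sketch
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, storage: Path) -> Path:
    """Copy data_file into content-addressed storage and write a pointer file.

    The pointer is small enough to commit to Git; the data itself is not.
    """
    digest = content_hash(data_file)
    storage.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(data_file, storage / digest)
    # Hypothetical pointer name and layout, chosen only for illustration.
    pointer = data_file.parent / (data_file.name + ".pointer.json")
    pointer.write_text(json.dumps({"path": data_file.name, "hash": digest}))
    return pointer


def restore(pointer: Path, storage: Path) -> Path:
    """Recreate the data file that a pointer refers to from storage."""
    meta = json.loads(pointer.read_text())
    target = pointer.parent / meta["path"]
    shutil.copyfile(storage / meta["hash"], target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.csv"
        data.write_text("feature,label\n0.5,1\n")
        pointer = track(data, root / "remote")  # "remote" stands in for cloud storage
        data.unlink()                           # simulate a fresh checkout without data
        print(restore(pointer, root / "remote").read_text())
```

Because the pointer records a hash of the file's content, committing it to Git ties each revision of the code to one exact version of the data, which is what makes it possible to return to an earlier experiment later.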

References

  1. ^ Hewage, Nipuni; Meedeniya, Dulani (February 2022). "Machine Learning Operations: A Survey on MLOps Tool Support". ResearchGate.
  2. ^ Barrak, Amine; Eghan, Ellis E.; Adams, Bram (March 2021). "On the Co-evolution of ML Pipelines and Source Code - Empirical Study of DVC Projects". IEEE Xplore.
  3. ^ Wiggers, Kyle. "MLOps startup Iterative.ai nabs $20M". VentureBeat.
  4. ^ a b Ivancic, Kristijan. "Data Version Control With Python and DVC". Real Python.
  5. ^ "MLOps Company Iterative Achieves Significant Customer and Company Growth in 2021". Business Wire.
  6. ^ Hall, Susan. "Iterative.ai: Git-Based Machine Learning Tools for ML Engineers". The New Stack.
  7. ^ "What is DVC?". MLOps Guide.
  8. ^ Petrov, Dmitry. "DVC 3 Years and 1.0 Pre-release". Iterative.ai.
  9. ^ Anadiotis, George. "Streamlining data science with open source: Data version control and continuous machine learning". ZDNET.
  10. ^ Petrov, Dmitry. "The Road to AI Hell Starts with Good MLOps Intentions". The New Stack.
  11. ^ Lardinois, Frederic. "Iterative raises $20M for its MLOps platform". TechCrunch.
  12. ^ "AITech interview with Dmitry Petrov, Co-Founder & CEO at Iterative.ai". AI Tech Park.
  13. ^ "Data Versioning for CD4ML – Part 2". AI Singapore.
  14. ^ Baena, Daniel. "How to build an efficient Machine Learning project workflow using Data Version Control (DVC)". Rappi Tech.