Data contract

From Wikipedia, the free encyclopedia

In data management, a data contract is an agreement between data producers and data consumers.[1] It contains a detailed schema creating a link between business (logical representation of the data) and technology (its physical implementation). A data contract also describes advanced metadata, such as data quality rules, SLA, and behavior. Data contracts can take several forms, but YAML is very common.[2]

The Linux Foundation project Bitol has published a data contract standard called the Open Data Contract Standard (ODCS).[3] Its current version is 3.0.2.[4]

History

In December 2021, Andrew Jones at GoCardless wrote about how the company was using data contracts,[5] and in October 2022 he wrote about their implementation.[6]

In August 2022, Jean-Georges Perrin published a popular reference article on the PayPal Technology Blog describing the use of data contracts in a Data Mesh implementation.[7] In May 2023, PayPal open-sourced its Data Contract Template.[8]

In June 2023, Andrew Jones published Driving Data Quality with Data Contracts: A comprehensive guide to building reliable, trusted, and effective data platforms,[9] which is, to date, the only published book on the topic.

In November 2023, Bitol, a Linux Foundation project, released the first version of the Open Data Contract Standard (ODCS), a compatible fork of the PayPal template.[10]

In September 2024, Ronald Angel at Miro wrote about the company's implementation of data contracts.[11]

In October 2024, Bitol released ODCS v3.0.0 with enhanced support for data quality.[12]

Implementation

The Apache 2.0-based Bitol project divides data contracts into several sections[13]:

Fundamentals: This section contains general information about the contract, such as its name, domain, and version, and leaves room for additional descriptive information.

Schema: This section describes the dataset and the schema of the data contract. The schema is a critical element of the contract because data quality rules are built on it. A data contract focuses on a single dataset, which may contain several tables and their columns.

Data quality: This section describes data quality rules and their parameters. They are tightly linked to the schema defined in the schema section.

Pricing: This section explains pricing if/when there is a need to bill customers for using this data product, whether the customer is internal or external.

Team: This important part lists stakeholders and the history of their relation with this data contract. It usually excludes consumers.

Roles: This section lists the roles that a consumer may need to access the dataset depending on the type of access they require.

Service-level agreement (SLA): This section describes the service-level agreements that apply to the data. Data quality and SLA are combined as Data QoS.

Infrastructure: This section describes the servers and storage in potentially several environments, like production, development, test, and so on.

Business rules: This section describes the organization's business rules that apply to the data.

Custom properties: This section covers custom and other properties in a data contract using a list of key/value pairs. This structure offers flexibility without requiring a new version of the standard whenever someone needs additional properties.
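
The sections above can be pictured as the top-level keys of a single document. The following is a minimal sketch in Python, not an ODCS-conformant contract: the snake_case keys, the "orders" dataset, and the helper missing_sections are all illustrative assumptions and do not reproduce the field names defined by the standard.

```python
from typing import Any

# Section names taken from the list above. The snake_case keys are
# illustrative assumptions, not the field names used by ODCS itself.
SECTIONS = (
    "fundamentals",
    "schema",
    "data_quality",
    "pricing",
    "team",
    "roles",
    "sla",
    "infrastructure",
    "business_rules",
    "custom_properties",
)

# A hypothetical contract for a made-up "orders" dataset.
contract: dict[str, Any] = {
    "fundamentals": {"name": "orders", "domain": "sales", "version": "1.0.0"},
    "schema": {
        "tables": [
            {
                "name": "orders",
                "columns": [
                    {"name": "order_id", "type": "string", "required": True},
                    {"name": "amount", "type": "decimal", "required": True},
                ],
            }
        ]
    },
    # Quality rules refer back to tables and columns declared in the schema.
    "data_quality": [
        {"table": "orders", "column": "amount", "rule": "min", "value": 0}
    ],
    "sla": {"latency_hours": 24},
    # Free-form key/value pairs, so no new standard version is needed.
    "custom_properties": [{"key": "costCenter", "value": "hypothetical-123"}],
}


def missing_sections(c: dict[str, Any]) -> list[str]:
    """Return the section names from SECTIONS that the contract does not define."""
    return [s for s in SECTIONS if s not in c]


print(missing_sections(contract))
# ['pricing', 'team', 'roles', 'infrastructure', 'business_rules']
```

In this sketch no section is treated as mandatory; which sections a contract must contain is decided by the standard and by the organization using it.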

Product-oriented data engineering & management

Data contracts are gaining popularity as Data Products are gaining traction.[14][15]

Best practices

Usually, a data contract is created by one data producer for one or many data consumers.

A data contract is designed to be enhanced iteratively. Data engineers can start with the few elements in the header and the schema. Over time, data engineers and owners can add more information, like data quality and SLA.
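
As a rough illustration of this iterative approach, the sketch below validates the same rows against a contract at two stages of its life: first with only a header and a schema, then after a data quality rule has been added. The dictionary layout, the min rule, and the validate function are assumptions made for this example and are not part of any published standard.

```python
from typing import Any


def validate(rows: list[dict[str, Any]], contract: dict[str, Any]) -> list[str]:
    """Check rows against whichever contract sections are present."""
    problems: list[str] = []
    for i, row in enumerate(rows):
        for col in contract["schema"]["columns"]:
            if col.get("required") and row.get(col["name"]) is None:
                problems.append(f"row {i}: missing {col['name']}")
        # Quality rules are optional, so an early contract simply skips this step.
        for rule in contract.get("data_quality", []):
            value = row.get(rule["column"])
            if value is not None and value < rule["min"]:
                problems.append(f"row {i}: {rule['column']} below {rule['min']}")
    return problems


# Iteration 1: header and schema only.
contract: dict[str, Any] = {
    "fundamentals": {"name": "orders", "version": "0.1.0"},
    "schema": {
        "columns": [
            {"name": "order_id", "required": True},
            {"name": "amount", "required": True},
        ]
    },
}
rows = [{"order_id": "A1", "amount": 10}, {"order_id": "A2", "amount": -5}]
first_pass = validate(rows, contract)

# Iteration 2: the owner adds a quality rule and bumps the version.
contract["fundamentals"]["version"] = "0.2.0"
contract["data_quality"] = [{"column": "amount", "min": 0}]
second_pass = validate(rows, contract)

print(first_pass)   # []
print(second_pass)  # ['row 1: amount below 0']
```

The consumer-facing check does not change between iterations; only the contract grows, which is what allows producers to start small.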

Most data contracts are implemented using a YAML file, which is both human- and computer-readable and language-agnostic.

The symbol for a data contract is either an equilateral triangle (rotated 90°), symbolizing schema, business meaning, and SLAs,[16] or a file icon.[17]

References

  1. ^ Segner, Michael (8 December 2022). "Data Contracts – Everything You Need to Know". Retrieved 9 April 2025.
  2. ^ "Data Contract Specification" (MIT License). Innoq. Retrieved 9 April 2025.
  3. ^ "Open Data Contract Standard (ODCS)". GitHub. Retrieved 18 March 2025.
  4. ^ Flook, Peter. "Version 3.0.2". GitHub. Bitol. Retrieved 9 April 2025.
  5. ^ "Improving Data Quality with Data Contracts". Medium. GoCardless. Retrieved 4 April 2025.
  6. ^ "Implementing Data Contracts at GoCardless". Medium. GoCardless. Retrieved 4 April 2025.
  7. ^ Perrin, Jean-Georges (Aug 3, 2022). "The next generation of Data Platforms is the Data Mesh". The PayPal Technology Blog (Medium). PayPal. Retrieved 9 April 2025.
  8. ^ "Data Contract Template". GitHub. PayPal. Retrieved 18 March 2025.
  9. ^ Jones, Andrew (Jun 30, 2023). Driving Data Quality with Data Contracts (1 ed.). Packt. p. 206. ISBN 9781837635009. Retrieved 18 March 2025.
  10. ^ "Open Data Contract Standard". GitHub. Bitol. Retrieved 18 March 2025.
  11. ^ "Achieving Reliable Data Products: Insights from Metadata and Collaboration". Miro. Retrieved 4 April 2025.
  12. ^ "ODCS Version 3.0.0". GitHub. Bitol. Retrieved 18 March 2025.
  13. ^ Collective. "Open Data Contract Standard (ODCS)". Bitol. Retrieved 19 March 2025. This article contains content from this source licensed under Apache 2.0
  14. ^ Gioia, Andrea (Feb 9, 2025). "Data Contract vs. Data Product Specifications". Medium. Retrieved 9 April 2025.
  15. ^ Perrin, Jean-Georges (Feb 14, 2025). "Data Product vs. Data Contract: What's the Difference?". Data Mesh Learning (Medium). Retrieved 9 April 2025.
  16. ^ Perrin, Jean-Georges (Feb 14, 2025). "Data Product vs. Data Contract: What's the Difference?". Data Mesh Learning (Medium). Retrieved 9 April 2025.
  17. ^ "Data Contract Specification" (MIT License). Innoq. Retrieved 9 April 2025.