Data contract
In data management, a data contract is an agreement that links data producers and data consumers. It also links the business view of the data (its logical representation) with the technology that stores it (its physical implementation). A data contract also describes advanced metadata, such as data quality rules, service-level agreements (SLAs), and behavior.
History
In May 2023, PayPal open-sourced its Data Contract Template.[1]
In June 2023, Andrew Jones published Driving Data Quality with Data Contracts: A comprehensive guide to building reliable, trusted, and effective data platforms.[2] As of 2025, it is the only book published on the topic.
In November 2023, Bitol, a Linux Foundation project, released the first version of the Open Data Contract Standard (ODCS), a compatible fork of the PayPal template.[3]
In October 2024, Bitol released ODCS v3.0.0 with enhanced support for data quality.[4]
Implementation
Data contracts are divided into several sections:
Fundamentals: This section contains general information about the contract, such as its name, domain, and version, along with free-text descriptions.
Schema: This section describes the dataset and its schema. It provides the basis for the data quality rules. A data contract typically covers a single dataset, which may contain several tables and their columns.
Data quality: This section describes data quality rules and parameters. They are tightly linked to the tables and columns defined in the schema section.
Pricing: This section describes the pricing that applies when consumers are billed for using the data product.
Team: This section lists the stakeholders and the history of their involvement with the data contract.
Roles: This section lists the roles a consumer may need in order to access the dataset, depending on the type of access required.
Service-level agreement (SLA): This section describes the service-level agreements (SLAs) that apply to the data. Together, data quality and SLAs make up what is called data quality of service (Data QoS).
Infrastructure: This section describes the servers and storage, potentially across several environments.
Business rules: This section describes the organization's business rules that apply to the data.
Custom properties: This section holds custom and other properties as a list of key/value pairs. This structure offers flexibility without requiring a new template version whenever someone needs additional properties.
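The sections above can be pictured as a single structured record. The following Python sketch is illustrative only: the class, field, and function names (DataContract, Table, Column, QualityRule, unknown_rule_targets) are hypothetical and do not reproduce the ODCS field names, and the example values are made up. It models a subset of the sections (fundamentals, a schema whose columns carry both a logical and a physical type, data quality rules, an SLA, and custom properties as key/value pairs) and shows one consequence of quality rules being tied to the schema: a rule that targets a column the schema does not define can be detected automatically.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    logical_type: str   # business-level type, e.g. "string" or "date"
    physical_type: str  # implementation type, e.g. "VARCHAR(36)"


@dataclass
class Table:
    name: str
    columns: list[Column]


@dataclass
class QualityRule:
    table: str
    column: str
    rule: str  # hypothetical rule name, e.g. "unique" or "not_null"


@dataclass
class DataContract:
    # Fundamentals (hypothetical subset)
    name: str
    domain: str
    version: str
    # Schema: one dataset made of one or more tables
    tables: list[Table]
    # Sections that can be filled in later; empty by default
    quality: list[QualityRule] = field(default_factory=list)
    sla: dict[str, str] = field(default_factory=dict)
    custom_properties: list[tuple[str, str]] = field(default_factory=list)


def unknown_rule_targets(contract: DataContract) -> list[str]:
    """Return quality rules whose table/column is not defined in the schema."""
    known = {
        (table.name, column.name)
        for table in contract.tables
        for column in table.columns
    }
    return [
        f"{r.table}.{r.column}:{r.rule}"
        for r in contract.quality
        if (r.table, r.column) not in known
    ]


# Hypothetical contract: one table, two columns, two quality rules.
contract = DataContract(
    name="orders",
    domain="sales",
    version="1.0.0",
    tables=[
        Table("orders", [
            Column("order_id", "string", "VARCHAR(36)"),
            Column("order_date", "date", "DATE"),
        ]),
    ],
    quality=[
        QualityRule("orders", "order_id", "unique"),
        QualityRule("orders", "customer_id", "not_null"),  # not in the schema
    ],
    sla={"latency": "1d"},
    custom_properties=[("dataSteward", "jane.doe")],
)

print(unknown_rule_targets(contract))  # ['orders.customer_id:not_null']
```

Keeping the logical and physical types side by side on each column mirrors the idea that a contract bridges the business and technical views of the same data.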
Best practices
Usually, a data contract is created by one data producer for one or more data consumers.
A data contract is designed to be enhanced iteratively. Data engineers can start with the few elements of the fundamentals section and the schema, then, together with data owners, add more information, such as data quality rules and SLAs, over time.
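This iterative approach can be sketched with plain Python dictionaries. In this hypothetical example, the first version holds only the fundamentals and the schema, and each later version adds one section. The helper names (add_section, missing_optional), the section keys, and the choice to bump the minor version for every added section are assumptions made for illustration, not rules from ODCS or any other standard.

```python
import copy

# Sections a first version needs; everything else can be added later.
# These section names are illustrative, not ODCS keys.
REQUIRED = ("fundamentals", "schema")
OPTIONAL = ("quality", "sla", "team", "roles", "pricing", "infrastructure")


def missing_optional(contract: dict) -> list:
    """List the optional sections the contract does not have yet."""
    return [section for section in OPTIONAL if section not in contract]


def add_section(contract: dict, section: str, content) -> dict:
    """Return a new contract version with one more section.

    The input contract is left unchanged. Bumping the minor version is an
    assumption: adding a section is treated as a backward-compatible change.
    """
    for required in REQUIRED:
        if required not in contract:
            raise ValueError(f"missing required section {required!r}")
    updated = copy.deepcopy(contract)
    updated[section] = copy.deepcopy(content)
    major, minor, _patch = (int(p) for p in contract["fundamentals"]["version"].split("."))
    updated["fundamentals"]["version"] = f"{major}.{minor + 1}.0"
    return updated


# Version 1: fundamentals and schema only (hypothetical values).
v1 = {
    "fundamentals": {"name": "orders", "domain": "sales", "version": "1.0.0"},
    "schema": {"orders": ["order_id", "order_date"]},
}
# Later versions add data quality, then an SLA.
v2 = add_section(v1, "quality", [{"column": "order_id", "rule": "unique"}])
v3 = add_section(v2, "sla", {"latency": "1d"})

print(v3["fundamentals"]["version"])  # 1.2.0
print(missing_optional(v3))  # ['team', 'roles', 'pricing', 'infrastructure']
print(v1["fundamentals"]["version"])  # 1.0.0
```

Returning a copy instead of mutating the input keeps every published version intact, so consumers relying on an earlier version are not affected when a section is added.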
Most data contracts are implemented as YAML files, which are both human- and machine-readable and language-agnostic.
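To illustrate what such a file can look like, the sketch below renders a contract held in Python dictionaries as block-style YAML. It is a deliberately small, standard-library-only emitter written for this example: to_yaml and scalar are hypothetical helpers that handle only dictionaries with plain string keys, lists, strings, numbers, booleans, and null. In practice, an existing YAML library such as PyYAML (yaml.safe_dump) would normally be used instead. The contract keys shown are illustrative and are not taken from ODCS.

```python
import json


def scalar(value) -> str:
    """Render a single value as a YAML scalar."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, dict):
        return "{}"  # only reached for empty mappings
    if isinstance(value, list):
        return "[]"  # only reached for empty sequences
    # JSON-style double-quoted strings are also valid YAML double-quoted scalars.
    return json.dumps(str(value))


def to_yaml(value, indent: int = 0) -> str:
    """Render nested dicts and lists as block-style YAML, two spaces per level.

    Keys are assumed to be plain strings that need no quoting.
    """
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}{key}:")
                lines.append(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}{key}: {scalar(item)}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)) and item:
                # Put the first line of the nested block on the "- " line.
                first, _, rest = to_yaml(item, indent + 1).partition("\n")
                lines.append(f"{pad}- {first.lstrip()}")
                if rest:
                    lines.append(rest)
            else:
                lines.append(f"{pad}- {scalar(item)}")
    else:
        lines.append(pad + scalar(value))
    return "\n".join(lines)


# Hypothetical contract content.
contract = {
    "name": "orders",
    "version": "1.0.0",
    "schema": [
        {"name": "orders", "columns": ["order_id", "order_date"]},
    ],
    "sla": {"latency": "1d"},
}

print(to_yaml(contract))
```

Strings are always emitted in double quotes, so values such as version numbers stay strings rather than being read back as numbers. Because the result is plain text, the same file can be read by tooling written in any language.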
References
1. "Data Contract Template". GitHub. PayPal. Retrieved 18 March 2025.
2. Jones, Andrew (30 June 2023). Driving Data Quality with Data Contracts (1st ed.). Packt. p. 206. ISBN 9781837635009. Retrieved 18 March 2025.
3. "Open Data Contract Standard". GitHub. Bitol. Retrieved 18 March 2025.
4. "ODCS Version 3.0.0". GitHub. Bitol. Retrieved 18 March 2025.