
Data vault modeling

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Dlinstedt (talk | contribs) at 13:56, 10 September 2008 (Rewrote page to meet guidelines, NO LINKS to web-sites, even though Ralph Kimball and Bill Inmon receive their links, apparently I do not.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Data Vault Modeling

Data Vault Modeling is a hybrid data modeling approach for enterprise data warehousing. Its formal name is "Common Foundational Integration Modeling Architecture." The Data Vault Model has no relationship to, and no bearing on, the security devices to which Oracle stakes its claim. The model is based on repeatable design and on consistent, redundant (fault-tolerant) capabilities that support MPP scalability of data sets.

Technical Definition: The Data Vault is a detail-oriented, historical-tracking, and uniquely linked set of normalized tables that supports one or more functional areas of business. It is a hybrid approach that combines the best of third normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise.

The Data Vault project approach is based on SEI/CMMI, PMP, Six Sigma, TQM, and Function Point Analysis components, where risk management, repeatability, measurability, adaptability, and flexibility are critical to success. The approach covers all project-based components, including a project plan, data breakdown structure, work breakdown structure, organization breakdown structure, and process breakdown structure. Each of these components is measured for build time and estimated for risk of completion.

The Data Vault Model is made up of three basic entities (see ERD):
Hubs = Unique list of business keys (defined by the same semantic grain).
Links = Associations, or transactions, across those business keys; this is where the flexibility takes place.
Satellites = Context for the Hubs and Links across the model. Satellites hold descriptive data over time; this is where the data warehousing component lives.
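The three entity types above can be illustrated with a minimal in-memory sketch in Python. It is not a reference implementation: the class layout, the method names (`add`, `record`, `current`), and the sample keys such as `C-100` are assumptions made for illustration, and a real Data Vault would normally be built as database tables.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Hub:
    """Unique list of business keys for one concept (e.g. customer)."""
    name: str
    keys: set = field(default_factory=set)

    def add(self, business_key: str) -> None:
        self.keys.add(business_key)  # a set keeps each key unique


@dataclass
class Link:
    """Associations across business keys held in two or more hubs."""
    name: str
    hubs: tuple  # the hubs this link connects, in key order
    rows: set = field(default_factory=set)

    def add(self, *business_keys: str) -> None:
        if len(business_keys) != len(self.hubs):
            raise ValueError("expected one business key per hub")
        # Only associate keys that already exist in their hubs.
        for hub, key in zip(self.hubs, business_keys):
            if key not in hub.keys:
                raise KeyError(f"{key!r} is not in hub {hub.name!r}")
        self.rows.add(tuple(business_keys))


@dataclass
class Satellite:
    """Descriptive attributes over time for a hub key or a link row."""
    name: str
    history: dict = field(default_factory=dict)

    def record(self, parent_key, attributes: dict, load_date: datetime) -> None:
        # Append instead of overwriting, so every version is kept.
        self.history.setdefault(parent_key, []).append((load_date, dict(attributes)))

    def current(self, parent_key) -> dict:
        # The version with the latest load date.
        return max(self.history[parent_key], key=lambda version: version[0])[1]


# Hypothetical sample data.
customer = Hub("customer")
customer.add("C-100")
customer.add("C-100")  # repeated key, stored once
product = Hub("product")
product.add("P-7")

purchase = Link("purchase", (customer, product))
purchase.add("C-100", "P-7")

customer_details = Satellite("customer_details")
customer_details.record("C-100", {"city": "Denver"}, datetime(2008, 1, 1))
customer_details.record("C-100", {"city": "Boston"}, datetime(2008, 9, 10))
```

In this sketch the Hub holds only keys, the Link holds only associations between keys, and all descriptive, time-varying data sits in the Satellite, which keeps both versions of the customer's attributes rather than overwriting the older one.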

The Data Model is patterned on a simplified view of neurons, dendrites, and synapses: neurons correspond to Hubs and Hub Satellites, some Links are dendrites (vectors of information), and other Links are synapses (vectors in the opposite direction). By applying a set of data mining algorithms, Links can be scored with confidence and strength ratings. They can also be created and dropped on the fly as the system "learns" about relationships that do not yet exist in the model. In other words, the model can be automatically morphed, adapted, and adjusted as it is used and fed new structures. This is what we call true dynamic data warehousing (automatic adaptation of structure over time).
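The article does not say which data mining algorithms produce the confidence and strength ratings. As one possible reading, the sketch below borrows the support and confidence measures from association rule mining: strength is the share of all observations that contain a key pair, and confidence is the share of observations of the first key that also contain the second. The function names, the `min_confidence` threshold, and the sample keys are all assumptions.

```python
from collections import Counter


def score_links(observations):
    """Return {(a, b): (strength, confidence)} for observed key pairs.

    strength   = share of all observations that contain the pair
    confidence = share of observations of `a` that also contain `b`
    """
    pair_counts = Counter(observations)
    left_counts = Counter(a for a, _ in observations)
    total = len(observations)
    return {
        (a, b): (n / total, n / left_counts[a])
        for (a, b), n in pair_counts.items()
    }


def refresh_links(active_links, observations, min_confidence=0.5):
    """Create links that pass the threshold; drop links that no longer do."""
    scores = score_links(observations)
    keep = {pair for pair, (_, conf) in scores.items() if conf >= min_confidence}
    created = keep - active_links
    dropped = active_links - keep
    return keep, created, dropped


# Hypothetical observations of (customer key, product key) pairs.
observations = [
    ("C-100", "P-7"),
    ("C-100", "P-7"),
    ("C-100", "P-9"),
    ("C-200", "P-9"),
]
scores = score_links(observations)
links, created, dropped = refresh_links({("C-100", "P-9")}, observations)
```

With these observations, the existing link between `C-100` and `P-9` falls below the threshold and is dropped, while links for the other two pairs are created. This is how a link set could adapt as new data arrives, under the stated assumptions.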

Dan Linstedt is the author, creator, and inventor of the Data Vault. No further links appear at the bottom of this page because the maintainers of Wikipedia regard them as advertisement. Data Vault Modeling is in the public domain and has been freely available in a number of articles and forums since 2000. The Data Vault is based on more than 10 years of research and design; it was originally conceived in 1990 and released in 2000 in articles in The Data Administration Newsletter.