
Metadata repository

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Bgwhite (talk | contribs) at 07:37, 23 May 2014 (WP:CHECKWIKI error fix #94. Stray ref tag. Do general fixes and cleanup if needed. - using AWB). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A metadata repository is a database created to store metadata. Metadata itself is information about the structures that contain the actual data. Metadata is often said to be "data about data", but this is misleading. Data profiles are an example of actual "data about data". Metadata is one layer of abstraction removed from this: it is data about the structures that contain data. Metadata may describe the structure of any data, of any subject, stored in any format.

A well-designed metadata repository typically contains data far beyond simple definitions of the various data structures. Typical repositories store dozens to hundreds of separate pieces of information about each data structure.

Comparing the metadata of two data items, one digital and one physical, helps show what metadata really is:

First, digital: for data stored in a database, we may have a table called "Patient" with many columns, each containing data that describes a different attribute of each patient. One of these columns may be named "Patient_Last_Name". What metadata describes the column that contains the actual surnames of patients in the database? We have already used two items: the name of the column that contains the data (Patient_Last_Name) and the name of the table that contains the column (Patient). Other metadata might include the maximum length of a last name that may be entered, whether a last name is required (can we have a patient without a Patient_Last_Name?), and whether the database converts surnames entered in lower case to upper case. Security-related metadata may record the restrictions that limit who may view these names.
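The column example can be made concrete with a small Python sketch. It is illustrative only: the class name `ColumnMetadata`, its fields, the length of 35, and the viewer roles are hypothetical values chosen to mirror the items listed above. The point is that the object describes the column's structure and rules, not any patient's surname.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnMetadata:
    """Metadata about a column: it describes the structure, not the surnames stored in it."""
    table_name: str
    column_name: str
    max_length: int
    required: bool            # can a patient exist without a last name?
    force_upper_case: bool    # does the database upper-case entered values?
    allowed_viewer_roles: frozenset = field(default_factory=frozenset)  # security metadata

    def normalize(self, value: str) -> str:
        """Apply the rules this metadata describes to an incoming value."""
        if self.required and not value:
            raise ValueError(f"{self.column_name} is required")
        if len(value) > self.max_length:
            raise ValueError(f"{self.column_name} exceeds {self.max_length} characters")
        return value.upper() if self.force_upper_case else value

    def can_view(self, role: str) -> bool:
        """Check the security restriction recorded for this column."""
        return role in self.allowed_viewer_roles


# Hypothetical values, for illustration only.
patient_last_name = ColumnMetadata(
    table_name="Patient",
    column_name="Patient_Last_Name",
    max_length=35,
    required=True,
    force_upper_case=True,
    allowed_viewer_roles=frozenset({"clinician", "registrar"}),
)

print(patient_last_name.normalize("smith"))   # SMITH
print(patient_last_name.can_view("billing"))  # False
```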

Second, physical: for data stored in a brick-and-mortar library, we have many volumes and may have various media, including books. Metadata about books would include ISBN, Binding_Type, Page_Count, Author, etc. Within Binding_Type, metadata would include the possible bindings, materials, etc.

This contextual information about business data includes its meaning and content, the policies that govern it, its technical attributes, the specifications that transform it, and the programs that manipulate it.[1]: 171 

Definition

The metadata repository is responsible for physically storing and cataloging metadata. The stored metadata should be generic, integrated, current, and historical. Generic means that the meta model stores metadata in generic terms rather than in an application-specific way, so that if the organization's database standard changes from one product to another, the physical meta model of the metadata repository does not need to change. Integrated means that all parts of the enterprise can view all metadata subject areas. The repository should also be designed so that both current and historical metadata can be accessed.[2] Metadata repositories were formerly referred to as data dictionaries.[1]: 239 
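The "generic" and "current and historical" requirements can be illustrated with a minimal Python sketch. It assumes a hypothetical design in which every piece of metadata is stored as a row of generic terms (object kind, object name, attribute, value) with a validity period. Because nothing in the layout is specific to one database product, switching products changes only the stored values, not the repository's structure, and closing old versions instead of overwriting them keeps history available. The names `MetadataVersion` and `MetadataRepository` are illustrative, not a standard API.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class MetadataVersion:
    # Generic terms (object kind, object name, attribute), not the catalog
    # layout of any one database product.
    object_kind: str                  # e.g. "column"
    object_name: str                  # e.g. "Patient.Patient_Last_Name"
    attribute: str                    # e.g. "max_length"
    value: str
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


class MetadataRepository:
    def __init__(self) -> None:
        self._versions: List[MetadataVersion] = []

    def record(self, kind: str, name: str, attribute: str, value: str, as_of: date) -> None:
        """Store a new value, closing (not overwriting) the previous current version."""
        key = (kind, name, attribute)
        for i, v in enumerate(self._versions):
            if (v.object_kind, v.object_name, v.attribute) == key and v.valid_to is None:
                self._versions[i] = replace(v, valid_to=as_of)
        self._versions.append(MetadataVersion(kind, name, attribute, value, as_of))

    def value_as_of(self, kind: str, name: str, attribute: str, when: date) -> Optional[str]:
        """Return the historical value in effect on `when`, or None if there was none."""
        key = (kind, name, attribute)
        for v in self._versions:
            if ((v.object_kind, v.object_name, v.attribute) == key
                    and v.valid_from <= when
                    and (v.valid_to is None or when < v.valid_to)):
                return v.value
        return None

    def current_value(self, kind: str, name: str, attribute: str) -> Optional[str]:
        """Return the value of the still-open (current) version, if any."""
        key = (kind, name, attribute)
        for v in self._versions:
            if (v.object_kind, v.object_name, v.attribute) == key and v.valid_to is None:
                return v.value
        return None


repo = MetadataRepository()
repo.record("column", "Patient.Patient_Last_Name", "max_length", "30", date(2010, 1, 1))
repo.record("column", "Patient.Patient_Last_Name", "max_length", "35", date(2013, 6, 1))
print(repo.current_value("column", "Patient.Patient_Last_Name", "max_length"))                   # 35
print(repo.value_as_of("column", "Patient.Patient_Last_Name", "max_length", date(2012, 1, 1)))  # 30
```

A production repository would more likely keep these rows in a database table with an index on the key columns; the linear scans here only keep the sketch short.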

Repository vs. registry

A metadata repository is similar to a metadata registry in that both store only metadata. They differ in that a repository provides response times suitable for browsing and reporting, whereas a registry provides response times suitable for service virtualization.[3]

Reason for use

Each database management system (DBMS) and database tool has its own language for the metadata components within it. Database applications already have their own repositories or registries that are expected to provide all of the functionality needed to access the data stored within them. Vendors do not want other companies to be able to easily migrate data away from their products and into competitors' products, so they handle metadata in proprietary ways. CASE tools, DBMS dictionaries, ETL tools, data-cleansing tools, OLAP tools, and data mining tools all handle and store metadata differently. Only a metadata repository can be designed to store the metadata components from all of these tools.[4]
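One way a repository can absorb metadata from tools that each describe it differently is a set of small adapters that translate every tool's export into one tool-neutral record. The Python sketch below assumes two hypothetical export formats, a catalog-style DBMS dictionary row and an ETL mapping entry, plus a hypothetical type-name mapping. Real tools' export formats vary and are not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommonColumn:
    """The repository's own tool-neutral description of a column."""
    table: str
    column: str
    data_type: str
    length: Optional[int]
    source_tool: str


# Hypothetical mapping from tool-specific type names to repository terms.
TYPE_SYNONYMS = {"varchar2": "varchar", "string": "varchar", "varchar": "varchar"}


def normalize_type(raw: str) -> str:
    lowered = raw.lower()
    return TYPE_SYNONYMS.get(lowered, lowered)


def from_dbms_dictionary(row: dict) -> CommonColumn:
    # Hypothetical catalog-style export: upper-case keys, length in its own field.
    return CommonColumn(
        table=row["TABLE_NAME"],
        column=row["COLUMN_NAME"],
        data_type=normalize_type(row["DATA_TYPE"]),
        length=row.get("DATA_LENGTH"),
        source_tool="dbms_dictionary",
    )


_TYPE_WITH_LENGTH = re.compile(r"^(\w+)(?:\((\d+)\))?$")


def from_etl_mapping(mapping: dict) -> CommonColumn:
    # Hypothetical ETL export: a "Table.Column" target and a "type(length)" string.
    table, column = mapping["target"].split(".", 1)
    match = _TYPE_WITH_LENGTH.match(mapping["type"])
    if match is None:
        raise ValueError(f"unrecognized type: {mapping['type']!r}")
    base, length = match.groups()
    return CommonColumn(
        table=table,
        column=column,
        data_type=normalize_type(base),
        length=int(length) if length else None,
        source_tool="etl_tool",
    )


dbms = from_dbms_dictionary({"TABLE_NAME": "Patient", "COLUMN_NAME": "Patient_Last_Name",
                             "DATA_TYPE": "VARCHAR2", "DATA_LENGTH": 35})
etl = from_etl_mapping({"target": "Patient.Patient_Last_Name", "type": "string(35)"})
print(dbms.data_type, etl.data_type)  # varchar varchar
```

Keeping `source_tool` on each record lets the repository show where a definition came from even after the two descriptions have been reconciled.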

Design

Metadata repositories should store metadata in four classifications: ownership, descriptive characteristics, rules and policies, and physical characteristics. Ownership identifies the data owner and the application owner. Descriptive characteristics define the names, types, lengths, and definitions that describe business data or business processes. Rules and policies define security, data cleanliness, timelines for data, and relationships. Physical characteristics define the origin or source and the physical location.[1]: 176 Just as a logical data model helps in creating a database, a logical meta model can help identify the metadata requirements for business data.[1]: 185 

A metadata repository can be centralized, decentralized, or distributed. A centralized design uses one database for the metadata repository, which stores metadata for all applications across the business. A centralized metadata repository has the same advantages and disadvantages as a centralized database: it is easier to manage because all the metadata is in one database, but bottlenecks may occur.
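The four classifications can be sketched as a logical meta model in Python, with one type per classification and a repository entry that requires all four. The class names, field names, and sample values below are hypothetical and only follow the categories named above; they are not taken from the cited source.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Ownership:
    data_owner: str
    application_owner: str


@dataclass
class DescriptiveCharacteristics:
    name: str
    data_type: str
    length: int
    definition: str


@dataclass
class RulesAndPolicies:
    security: List[str] = field(default_factory=list)
    cleanliness: List[str] = field(default_factory=list)
    timelines: List[str] = field(default_factory=list)
    relationships: List[str] = field(default_factory=list)


@dataclass
class PhysicalCharacteristics:
    origin: str      # origin or source system
    location: str    # physical location


@dataclass
class RepositoryEntry:
    """One metadata entry, with every classification required at construction time."""
    ownership: Ownership
    descriptive: DescriptiveCharacteristics
    rules: RulesAndPolicies
    physical: PhysicalCharacteristics


# Hypothetical sample entry.
entry = RepositoryEntry(
    ownership=Ownership(data_owner="Patient Registration", application_owner="Clinical IT"),
    descriptive=DescriptiveCharacteristics(
        name="Patient_Last_Name", data_type="varchar", length=35,
        definition="Surname of the patient as registered."),
    rules=RulesAndPolicies(security=["viewable by clinicians only"],
                           cleanliness=["stored in upper case"],
                           relationships=["belongs to Patient"]),
    physical=PhysicalCharacteristics(origin="registration system",
                                     location="Patient table, clinical database"),
)
print(sorted(asdict(entry)))  # ['descriptive', 'ownership', 'physical', 'rules']
```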

A decentralized metadata repository stores metadata in multiple databases, separated by location, by department, or both. This makes the repository more involved to manage than a centralized one, but the advantage is that the metadata can be divided among individual departments.

A distributed metadata repository also uses a decentralized approach, but unlike a decentralized metadata repository, the metadata remains in its original application. An XML gateway is created that acts as a directory for accessing the metadata within each application.[1]: 246 The advantages and disadvantages of a distributed metadata repository mirror those of a distributed database.
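The gateway-as-directory idea can be sketched in Python. In this sketch each application keeps its own metadata and exposes it through a callable. The gateway only records how to reach each application and wraps whatever it fetches in an XML response built with the standard library's `xml.etree.ElementTree`. The `MetadataGateway` class, the element names, and the `clinical` application are hypothetical, and a real gateway would reach applications over a network protocol rather than by in-process function calls.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Each application keeps its own metadata; the gateway only knows how to reach it.
MetadataSource = Callable[[str], Dict[str, str]]


class MetadataGateway:
    def __init__(self) -> None:
        self._directory: Dict[str, MetadataSource] = {}

    def register(self, application: str, source: MetadataSource) -> None:
        """Add an application to the directory without copying its metadata."""
        self._directory[application] = source

    def lookup(self, application: str, object_name: str) -> str:
        """Fetch metadata from the owning application and return it as XML."""
        source = self._directory[application]   # KeyError if the application is unknown
        attributes = source(object_name)        # read in place, not copied into the gateway
        root = ET.Element("metadata", application=application, object=object_name)
        for key, value in sorted(attributes.items()):
            ET.SubElement(root, "attribute", name=key).text = value
        return ET.tostring(root, encoding="unicode")


def clinical_app_metadata(object_name: str) -> Dict[str, str]:
    # Hypothetical stand-in for the clinical application's own metadata store.
    catalog = {"Patient.Patient_Last_Name": {"max_length": "35", "required": "yes"}}
    return catalog[object_name]


gateway = MetadataGateway()
gateway.register("clinical", clinical_app_metadata)
print(gateway.lookup("clinical", "Patient.Patient_Last_Name"))
```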

Entity-Relationship/Object-Oriented

Metadata repositories can be designed using either an entity-relationship model or an object-oriented design.
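As a minimal sketch of the entity-relationship option, the Python example below uses the standard library's `sqlite3` module to model tables and columns as two related entities joined by a foreign key. The schema (`data_table`, `data_column`) is a hypothetical illustration, not a published meta model. An object-oriented design would instead represent tables and columns as objects that hold references to one another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
    CREATE TABLE data_table (
        table_id   INTEGER PRIMARY KEY,
        table_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE data_column (
        column_id   INTEGER PRIMARY KEY,
        table_id    INTEGER NOT NULL REFERENCES data_table(table_id),
        column_name TEXT NOT NULL,
        max_length  INTEGER,
        is_required INTEGER NOT NULL CHECK (is_required IN (0, 1)),
        UNIQUE (table_id, column_name)
    );
""")

# Hypothetical metadata rows for the Patient example.
conn.execute("INSERT INTO data_table (table_id, table_name) VALUES (1, 'Patient')")
conn.execute(
    "INSERT INTO data_column (table_id, column_name, max_length, is_required) "
    "VALUES (1, 'Patient_Last_Name', 35, 1)"
)

# The relationship lets the repository answer "which table owns this column?"
rows = conn.execute("""
    SELECT t.table_name, c.column_name, c.max_length
    FROM data_column AS c
    JOIN data_table AS t ON t.table_id = c.table_id
    ORDER BY c.column_name
""").fetchall()
print(rows)  # [('Patient', 'Patient_Last_Name', 35)]
```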


References

  1. Moss, L. T.; Atre, S. (2003). Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications. Addison-Wesley Professional. ISBN 0-201-78420-3.
  2. Marco, D.; Jennings, M. (2004). Universal Metadata Models. Wiley. pp. 36–43. ISBN 0-471-08177-9.
  3. Thompson, J. (9 November 2007). "Q&A: What Is a Registry/Repository, and Who Should Consider One?" (PDF). Gartner. p. 5.
  4. Marco, D. (2000). Building and Managing the Metadata Repository: A Full Lifecycle Guide. Wiley. ISBN 978-0471355236.