User:Danimlucht/sandbox
Metadata management (also known as metadata strategy) is a term used to describe the steps necessary to manage metadata effectively and accurately within a target environment.
There are many unique advantages to implementing a metadata strategy:
Advantages
According to EW Solutions, those advantages are:
- Recognize the value of data and its components
- Develop a map for managing expanding information requirements
- Highlight the importance of enterprise data management
- Address data quality, data integrity, and data reuse
- Enable strategic information to be consistently and accurately derived from operational data
- Improve productivity through component development, management and reuse
- Reduce time necessary for software development cycle
- Share information with customers and business partners
- Gain competitive advantage from an enterprise perspective of information resources and impact analysis
Goal
In essence, managing metadata promotes cooperation between responsible parties, makes data easier and more efficient to find, and promotes accuracy when creating and cataloging databases and database fields. Implementing a metadata management protocol also provides a common structure and naming standards, which can assist in enterprise data management.
Barriers
Implementing a metadata strategy is not always an easy "get". There are many barriers to convincing executives of the importance and overall necessity of a project of this magnitude.
- Resources - In an enterprise environment, the sheer amount of metadata already present (and likely unaccounted for) can strain available resources; identifying and cataloging the existing metadata could take an extended period of time.
- Investment - Convincing executives and sponsors of the importance of metadata is always challenging. New projects that have no immediate visible or financial benefit (and that will, in fact, cost a substantial amount to initiate) are not high on prioritization lists.
- Necessity - "But we've always done it this way..." is a standard reaction to adopting new procedures. When naming conventions or standard procedures are verbally passed on in an enterprise environment, there is always the chance (and likelihood) of instructions changing over time.
Lifecycle
In Step I, the incumbent metadata manager researches current metadata practices, procedures, and case studies extensively in order to formulate a comprehensive implementation plan for future metadata needs.
- Researching metadata needs
This step is the initial scope assessment. During this phase, the investigation looks into the current metadata procedures and collections, the ultimate goal of compiling metadata, and the participants who will be involved in the project. Interviews and other questioning techniques should be used to acquire this information. "The purpose of the interview is to acquire preliminary information and contacts, and also to establish a better understanding between metadata staff and content providers regarding the project background information. In the interviews, contact information, metadata schedule, metadata scope, legacy system, metadata context and metadata role and function have to be included and clarified.[5]"
- Review current standards
This step involves the review of any previously implemented metadata procedures for conformity, standards, and accessibility.
- Further Investigation
Database requirements are analyzed and case studies are created using live data and experimental metadata.
- Strategy
Once case studies are complete, metadata standards and strategy need to be established. Schemas, system components, and staffing are all decided upon.
In Step II, the metadata research is compiled and creation begins.
- Determine characteristics
This step involves determining which metadata characteristics and specifications will be necessary. Creation date, responsible user, physical location, and value are just some of the potential fields.
- Automate
Many of the characteristics can be automated via the tools used to develop the metadata itself. Creation dates, usernames, last accessed date/time, and last known updates are just some of the fields that could be automated in specific circumstances.
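As a rough illustration of automated capture, the sketch below reads a file's location, size, and last modified/accessed times from the operating system using Python's standard `os.stat`, so none of these fields needs manual entry. The function name and field names are hypothetical; a real metadata tool would collect such values through its own mechanisms.

```python
import getpass
import os
from datetime import datetime, timezone


def _current_user() -> str:
    # getpass.getuser() can fail where no login name is configured (e.g. some containers).
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def _iso(timestamp: float) -> str:
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def capture_file_metadata(path: str) -> dict:
    """Fill in metadata fields for a file without any manual entry.

    The field names are hypothetical; map them to whatever characteristics
    were chosen in the previous step.
    """
    info = os.stat(path)
    return {
        "physical_location": os.path.abspath(path),
        "size_bytes": info.st_size,
        "last_modified": _iso(info.st_mtime),
        "last_accessed": _iso(info.st_atime),
        # True creation time is platform-dependent, so the capture time is recorded instead.
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        # The account running the capture, standing in for "responsible user".
        "recorded_by": _current_user(),
    }
```

Because the values come straight from the file system at capture time, they stay accurate only if the capture is re-run whenever the file changes.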
In Step III, the methods of maintaining metadata need to be determined. Simple, unchanging metadata can be kept in a stable environment: the fewer the changes, the less maintenance is required. Complicated and ever-changing metadata, however, needs equally fluid maintenance and will require more intervention and handling than stable metadata.
In Step IV, foreseeable updates to metadata need to be addressed. These updates can include changes in the organization or in the overall metadata procedures, as well as changes to the data itself or technological updates to systems or standards. When systems are updated, special care must be taken when migrating the metadata repository.
In Step V, metadata storage schemas are chosen and implemented. Options include relational databases and RDF triplestores.
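As an illustration of the relational option, the sketch below keeps field-level metadata in a single SQLite table using Python's built-in `sqlite3` module. The table, column, and function names are hypothetical; the composite primary key is one way to ensure each field is described only once.

```python
import sqlite3

# Hypothetical schema: one row per described data field.
SCHEMA = """
CREATE TABLE IF NOT EXISTS field_metadata (
    table_name        TEXT NOT NULL,
    field_name        TEXT NOT NULL,
    definition        TEXT,
    responsible_user  TEXT,
    physical_location TEXT,
    created_at        TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (table_name, field_name)
)
"""


def open_repository(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def register_field(conn, table_name, field_name, definition, responsible_user, location):
    # Parameterized insert inside a transaction; a duplicate (table, field)
    # pair raises sqlite3.IntegrityError and the transaction is rolled back.
    with conn:
        conn.execute(
            "INSERT INTO field_metadata (table_name, field_name, definition, "
            "responsible_user, physical_location) VALUES (?, ?, ?, ?, ?)",
            (table_name, field_name, definition, responsible_user, location),
        )


def fields_owned_by(conn, responsible_user):
    rows = conn.execute(
        "SELECT table_name, field_name FROM field_metadata "
        "WHERE responsible_user = ? ORDER BY table_name, field_name",
        (responsible_user,),
    )
    return rows.fetchall()
```

A query such as `fields_owned_by(conn, "jdoe")` answers "which fields is this person responsible for?", which is the kind of question a metadata repository is meant to make easy.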
"In many cases, metadata must survive even after the deletion of the data it describes.[4]" Once data has been removed from the database and working environment, the metadata should record the reason for and method of the deletion, and this record should be kept for a set period afterward. This removes any doubt as to whether the data was removed by mistake or intentionally, and lets users know promptly that the data has been removed.
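One way to keep metadata alive after deletion is a "tombstone" record that stores who deleted the data, why, and how, together with a retention period after which the tombstone itself can be purged. A minimal sketch; the class, its fields, and the 365-day default retention are assumptions for illustration, not a prescribed policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DeletionRecord:
    """Metadata kept after the data it describes has been deleted (hypothetical fields)."""
    record_id: str
    deleted_at: datetime
    deleted_by: str
    reason: str   # why the data was removed
    method: str   # how it was removed, e.g. "purge" or "archive"
    retention: timedelta = timedelta(days=365)  # assumed retention period

    def expires_at(self) -> datetime:
        return self.deleted_at + self.retention

    def is_expired(self, now: datetime) -> bool:
        # Once the retention period has passed, the tombstone itself may be removed.
        return now >= self.expires_at()


def purge_expired(records, now):
    """Keep only the deletion records that are still within their retention period."""
    return [r for r in records if not r.is_expired(now)]
```

While a tombstone is retained, a user looking for `record_id` can be told when, why, and by whom the data was removed instead of finding nothing.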
Listed below are various standards that have advanced metadata practice in recent years (in some cases, older processes that have been expanded upon to create newer standards).
As one of the older metadata standards on this list, the Library of Congress Subject Headings (LCSH) have been in use for over 115 years. Using a distinctive naming and categorizing format, LCSH was organizing descriptive metadata before the term even existed. Because of the extensive amount of data captured and its ease of searching, many newer standards have borrowed structure and ideas from LCSH.
FAST (Faceted Application of Subject Terminology) "is an enumerative, faceted subject heading schema derived from the Library of Congress Subject Headings (LCSH)[7]."
The International Standard Name Identifier's mission "is to assign to the public name(s) of a researcher, inventor, writer, artist, performer, publisher, etc. a persistent unique identifying number in order to resolve the problem of name ambiguity in search and discovery[8]." While many things can be distinguished by other identifiers or attributes (an ISBN distinguishes two books with the same title, as can genre), names are harder to disambiguate. A search for John Smith is likely to match several different people. ISNI aims to solve this by assigning each public identity a unique identifier, making searches easier and faster. Without such an identifier, a searcher must rely on other, more general metadata terms ("John Smith musician", "John Smith writer", "John Smith explorer"). Searching on "John Smith 123456" instead would reduce the number of results returned while also increasing the speed and accuracy of the search.
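An ISNI is a 16-character identifier whose last character is a check character calculated with the ISO 7064 MOD 11-2 algorithm, so it can be "X". A minimal sketch of validating an identifier before it is stored or searched, assuming the input may contain spaces, hyphens, or an "ISNI" prefix; the function names are hypothetical.

```python
def isni_check_char(base15: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(identifier: str) -> bool:
    """Validate a 16-character ISNI, ignoring spaces, hyphens and an "ISNI" prefix."""
    compact = identifier.upper().replace("ISNI", "").replace(" ", "").replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return isni_check_char(compact[:15]) == compact[15]
```

For example, `is_valid_isni("0000 0002 1825 0097")` returns `True`, while `is_valid_isni("0000 0002 1825 0098")` returns `False`. A passing check only means the identifier is well formed; confirming whom it identifies still requires a lookup.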
ONIX for Books is an XML-based standard for describing books (electronic and otherwise) and book-related products. ONIX uses the following "sub-identifiers", which it defines as:
- ISBN - International Standard Book Number : is a unique international identifier for monographic publications.
- ISTC - International Standard Text Code : is a numbering system supporting the unique identification of textual works.
- DOI - Digital Object Identifier : provides a system to support persistent identification of content objects and related entities in digital networks.
- ISNI - International Standard Name Identifier : ISNI will enable the public identities of parties involved in media content industries to be uniquely identified so that they can be clearly disambiguated where otherwise there might be doubt.
- ISSN - International Standard Serial Number : (ISO 3297) identifies serial publications and other continuing resources, whether available in print or digital formats.
- ISMN - International Standard Music Number : (ISO 10957) identifies publications consisting of musical notation whether for sale, hire or gratis.
- RFID - Radio Frequency ID technology : The RFID tag can be inserted in a book as part of the binding process or attached as part of a label and will uniquely identify each copy of a book.
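Several of these identifiers carry built-in check digits, which gives a metadata repository an inexpensive first-pass quality check. The sketch below validates an ISBN-13 using the standard weighting (alternating weights of 1 and 3 over the first 12 digits, with the check digit bringing the weighted total to a multiple of 10). It illustrates one identifier only and is not part of the ONIX standard itself; the function names are hypothetical.

```python
def isbn13_check_digit(first12: str) -> int:
    """Weights alternate 1, 3, 1, 3, ... across the first 12 digits."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])
```

For example, `is_valid_isbn13("978-0-306-40615-7")` returns `True`; changing the final digit to 8 makes it return `False`.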
XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable, which makes it well suited to easier and more efficient metadata searches. A machine cannot, on its own, know what metadata is or interpret it; XML's explicit structure makes data (and metadata) machine-readable, removing the need for a human "interpretation" step.
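A minimal sketch of a machine reading an XML metadata record with Python's standard `xml.etree.ElementTree`. The `<record>` element and its children are hypothetical and are not taken from any particular metadata standard.

```python
import xml.etree.ElementTree as ET

# A hypothetical metadata record; element names are illustrative only.
RECORD = """
<record>
  <title>Quarterly Sales Extract</title>
  <creator>Finance Data Team</creator>
  <created>2014-01-19</created>
  <location>warehouse/sales/q4.csv</location>
</record>
"""


def parse_record(xml_text: str) -> dict:
    """Turn each child element of the root into a field name/value pair."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in root}
```

Because each value is wrapped in a named element, a program can pick out `parse_record(RECORD)["created"]` without any human having to say which part of the text is the date.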
Resource Description Framework is "the standard for encoding metadata and other knowledge on the Semantic Web...What is meant by “semantic” in the Semantic Web is not that computers are going to understand the meaning of anything, but that the logical pieces of meaning can be mechanically manipulated by a machine to useful ends[11]." As Joshua Tauberer explains, RDF lets knowledge, which machines otherwise have little ability to deal with, be expressed as data that they can translate and manipulate. RDF expresses each statement as a subject-predicate-object triple and is a reliable, widely used standard for web metadata.
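To show how the subject-predicate-object layout supports mechanical manipulation, the sketch below stores a few statements as plain Python tuples and answers pattern queries over them. The `ex:` resources are hypothetical, and `dc:` and `foaf:` are used here as shorthand for Dublin Core and FOAF terms; real RDF identifies resources with full IRIs and would normally be kept in a dedicated triplestore.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Statements as (subject, predicate, object); the ex: resources are hypothetical.
TRIPLES: List[Triple] = [
    ("ex:report42", "dc:title", "Annual Metadata Audit"),
    ("ex:report42", "dc:creator", "ex:jsmith"),
    ("ex:jsmith", "foaf:name", "John Smith"),
    ("ex:report43", "dc:creator", "ex:jsmith"),
]


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
```

Finding everything created by `ex:jsmith`, or following `dc:creator` from a report to its creator and then reading `foaf:name`, requires no understanding of what a "report" or a "name" is, only matching of triple patterns.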
The Virtual International Authority File (VIAF) is a compilation of many nations' name authority files. Instead of each country's authority file being consulted separately, the records describing the same entity are linked and accessible through VIAF.
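The linking idea can be sketched as a lookup from any one country's record to a shared cluster, and from there to every other country's form of the name. The cluster structure, source codes, record IDs, and headings below are all hypothetical and do not reflect VIAF's actual data model or API.

```python
# Hypothetical cluster data: each cluster links the authority records that
# different national files hold for the same person. All IDs are made up.
CLUSTERS = {
    "cluster-1": {
        "LC": ("n-0001", "Smith, John, 1950-"),
        "DNB": ("d-0001", "Smith, John"),
        "BNF": ("f-0001", "Smith, John (1950-....)"),
    },
}


def find_cluster(source: str, local_id: str):
    """Return the cluster that links a given national record, or None."""
    for cluster_id, members in CLUSTERS.items():
        record = members.get(source)
        if record is not None and record[0] == local_id:
            return cluster_id
    return None


def name_forms(cluster_id: str) -> dict:
    """Every national heading form linked under one cluster."""
    return {source: heading for source, (_, heading) in CLUSTERS[cluster_id].items()}
```

Starting from one national record, `name_forms(find_cluster("DNB", "d-0001"))` yields the headings used by all linked files, so a search in one country can reach records catalogued under another country's form of the name.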
FRBR is a conceptual entity-relationship model developed by the International Federation of Library Associations and Institutions (IFLA) that relates user tasks of retrieval and access in online library catalogues and bibliographic databases from a user’s perspective. It represents a more holistic approach to retrieval and access as the relationships between the entities provide links to navigate through the hierarchy of relationships. The model is significant because it is separate from specific cataloguing standards such as AACR2 or International Standard Bibliographic Description (ISBD).
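FRBR's Group 1 entities form a hierarchy: a work is realized through expressions, each expression is embodied in manifestations, and each manifestation is exemplified by items. The sketch below models that hierarchy with Python dataclasses to show how the relationships support navigation. The attributes chosen for each entity are simplified, hypothetical examples, since FRBR is a conceptual model rather than a data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:            # a single physical or digital copy
    barcode: str


@dataclass
class Manifestation:   # a specific published edition or format
    publisher: str
    year: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:      # a particular realization, e.g. a translation
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:            # the abstract intellectual creation
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self) -> List[Item]:
        """Navigate the hierarchy from the work down to every held copy."""
        return [item
                for expr in self.expressions
                for man in expr.manifestations
                for item in man.items]
```

A user who asks for a work, rather than for one edition, can be shown every copy of every translation and edition through `all_items()`.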
According to the Dublin Core Metadata Initiative, "DCMI supports shared innovation in metadata design and best practices across a broad range of purposes and business models." DCMI is especially prominent in metadata standards development because of its metadata vocabulary terms (see Dublin Core Elements).
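The Dublin Core Metadata Element Set, Version 1.1, defines 15 elements, all of them optional and repeatable. A minimal sketch that flags field names falling outside that set, for example when mapping an existing catalog onto Dublin Core; the function name is hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set, Version 1.1.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
})


def unknown_elements(record: dict) -> set:
    """Return field names in a record that are not Dublin Core elements."""
    return set(record) - DC_ELEMENTS
```

A record such as `{"title": ..., "creator": ..., "author": ...}` would have `author` flagged, signalling that it should be mapped to `creator` before the record is shared.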
Challenges/Business Problems
- Gaining interest and investment from executive team/project sponsors
- Consolidating resources
- Planning a budget and time-frame
- Cataloging existing data and metadata sources
Goals and Objectives
- Standardize naming conventions and other non-standard protocols
- Provide descriptive information about data including responsible party, characteristics, physical location, date of creation, and intended use
- Organize and consolidate existing stores of data
Usage Plans
- Establish a responsible party for maintaining metadata
- Catalog the use of data as part of the metadata analysis
Sources
- Identify potential sources
- Existing database records
- Paper documentation/files
- Standard protocol
- Naming conventions
- Verbal instructions
- Establish accuracy of existing sources
- Create new repository for existing sources
Quality
- Establish protocol for creating new metadata
- Create entry/update/delete standards for dealing with existing metadata
- Secure the metadata repository against alteration or deletion by unauthorized users
- Document and catalog metadata details
- Determine if the metadata storage will be active or passive
- Active metadata is ever changing and very fluid. When the data changes, the metadata automatically changes with it.
- Passive metadata is stable. If data changes, passive metadata requires action in order to change with it (either by human intervention or an automated process needing to be "fired").
- Identify where metadata will be stored
- Federated Metadata - Unrelated metadata is stored in differing locations and can be accessed through an intermediary.
- Distributed Metadata - Related and similar metadata is stored in multiple locations, sometimes overlapping, and is not accessed through an intermediary.
- Determine system requirements
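The difference between active and passive metadata can be sketched in a few lines: below, the active copy is recomputed inside the setter every time the data changes, while the passive copy stays stale until something explicitly calls `refresh_passive`. The class, the `describe` helper, and its fields are hypothetical.

```python
from datetime import datetime, timezone


def describe(value) -> dict:
    """Hypothetical metadata fields derived from a data value."""
    return {"length": len(str(value)), "described_at": datetime.now(timezone.utc)}


class DataRecord:
    def __init__(self, value):
        self._value = value
        self.active_meta = describe(value)
        self.passive_meta = describe(value)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        # Active metadata: updated automatically as part of every change to the data.
        self.active_meta = describe(new_value)

    def refresh_passive(self):
        # Passive metadata: unchanged until a person or a scheduled job "fires" this.
        self.passive_meta = describe(self._value)
```

Active metadata is always current but adds work to every change; passive metadata is cheaper to keep but can drift out of date between refreshes.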
Measurement
- Create "follow-up" procedures
- Measure the use of metadata
- Report on effectiveness of metadata repository compared to previous storage procedures
Business Intelligence Positions
After a metadata management plan has been implemented, the BI environment should include a few professionals designated to create, update, and manage the metadata repository and components thereof. Two of the more important and advanced positions handling metadata (among other types of data) are listed below.
A data steward is, as the name suggests, someone who is in charge of and oversees specific data fields. The steward's responsibility is to create, update, manage, and maintain data within the databases. While the custodian oversees the structural components of databases, the steward is more concerned with the operational components (the fields, definitions, values, and users). A steward should be on the front line advocating for the accurate and appropriate use of the data. For this reason, stewards are normally the people working with software development and IT personnel to create the database, import the data, cleanse it, and oversee its daily use. While it may seem as though the steward is in charge of the minute details of the database and the custodian is in charge of the big picture, there is considerable collaboration between these positions in order to implement new operational strategies and security measures.
In more specific metadata terms, the data steward is the person responsible for maintaining the metadata fields and values once decided upon. This would be the person who would ultimately make any database changes specific to a field he or she monitors.
A data custodian can be seen as a gateway to the structural components of the data itself. While this person does not handle the creation, maintenance, or ultimate deletion (or archiving) of data, he or she is in charge of the environment surrounding the data. This person deals with network permissions, security, granting or denying access, server upgrades and maintenance, and administrative controls of the databases. Even though this position does not handle the minute details of the data itself, the incumbent is generally required to be able to define and organize the database contents for accessibility. It is also the responsibility of the data custodian to understand how each database (and the data therein) is populated and used operationally, so as to be able to answer questions and provide solutions to data-driven needs.
The custodian can be seen as a DBA (database administrator) - a high-level position with experience in network systems as well as in data management. In terms of hierarchy, the custodian would be the manager in charge of any data stewards. This leaves the custodian better able to handle the storage and security aspects while the stewards are charged with monitoring content and usage.
In terms of metadata, the custodian is the person responsible for the creation and maintenance of the metadata repository and any security or network permissions involved. The data steward would handle the actual metadata fields while the custodian would handle the structural components of the repository.
Available Software
- ASG Software Solutions
- Informatica Metadata Manager
- ER/Studio
- MetadataPortal Metadata Management
- InfoLibrarian Metadata Management Software Products
- Metadigger Free Metadata Management Software
- InfoSphere Metadata Workbench
See Also
- Metadata standards
- Metadata
- Metadata registry
- Resource Description Framework
- Functional Requirements for Bibliographic Records
- Library of Congress Subject Headings
- Dublin Core
- Data element
- Data steward
- Data custodian
References
- Smith, Anne Marie. "Turning Data into Knowledge: Creating and Implementing a Meta Data Strategy" (PDF). EW Solutions. Retrieved 19 January 2014.
- Rowlands, Ian. "Strategic and Tactical Issues In Metadata Management" (PDF). ASG Software Solutions. Retrieved 19 January 2014.
- Jareo, Ben. "Best Practices Metadata" (PDF). Informatica World. Retrieved 19 January 2014.
- "Introduction to metadata management". PwC. Retrieved 19 January 2014.
- "Metadata Lifecycle Model". MAAT. Retrieved 19 January 2014.
- "Library of Congress Subject Headings". Library of Congress. Retrieved 26 January 2014.
- "OCLC releases FAST (Faceted Application of Subject Terminology) as Linked Data". OCLC. Retrieved 19 January 2014.
- "ISNI". ISNI. Retrieved 26 January 2014.
- "ONIX". Editeur. Retrieved 26 January 2014.
- "What is XML?". W3Schools. Retrieved 26 January 2014.
- Tauberer, Joshua. "What is RDF and what is it good for?". Retrieved 19 January 2014.
- "VIAF: The Virtual International Authority File". VIAF. Retrieved 19 January 2014.
- OCLC. "OCLC Research Activities and IFLA's Functional Requirements for Bibliographic Records". Retrieved 19 January 2014.
- "Dublin Core Metadata Element Set, Version 1.1". Dublin Core Metadata Initiative. DCMI. Retrieved 19 January 2014.
- Smith, Anne Marie. "How to implement a meta data strategy". The Data Administration Newsletter. TDAN. Retrieved 19 January 2014.
- Agnew, Grace (2005). "Developing a metadata strategy: A road map" (PDF). Journal of Digital Asset Management. 1 (6): 372–385.
- "What does a data steward do?". wiseGEEK. Retrieved 8 February 2014.
- "What does a data custodian do?". wiseGEEK. Retrieved 8 February 2014.