Semi-structured data

Semi-structured data^[1] is a form of structured data that does not obey the tabular structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. For this reason, it is also known as a self-describing structure.
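As an illustration, the following Python sketch parses a small JSON document (JSON being one common semi-structured format) and recovers its field names and nesting from the data alone, with no separate schema. The record and the `field_paths` helper are hypothetical, and lists are simply treated as leaf values to keep the sketch short.

```python
import json

# A hypothetical record: the keys ("name", "address", ...) are the markers
# that label each value, so no external schema is needed to interpret it.
DOC = '{"name": "Ada", "address": {"city": "London", "postcode": "NW1"}}'

def field_paths(value, prefix=""):
    """Return the dotted path of every leaf field, discovered from the data itself."""
    if isinstance(value, dict):
        paths = []
        for key, child in value.items():
            # Descend one level in the hierarchy, extending the path.
            paths.extend(field_paths(child, f"{prefix}{key}."))
        return paths
    # Any non-dict value (including a list) is treated as a leaf;
    # drop the trailing dot from its accumulated path.
    return [prefix[:-1]]

record = json.loads(DOC)
print(field_paths(record))  # ['name', 'address.city', 'address.postcode']
```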

In semi-structured data, entities belonging to the same class may have different attributes even though they are grouped together, and the order of the attributes is not significant.
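A minimal Python sketch of both properties, assuming JSON records held as dictionaries; the records themselves are made up for illustration:

```python
import json

# Two hypothetical entities of the same class ("person") stored together.
# The second has an attribute the first lacks, and lists its keys in a different order.
people = [
    json.loads('{"name": "Ada", "born": 1815}'),
    json.loads('{"email": "grace@example.org", "name": "Grace"}'),
]

# The set of attributes is discovered per record rather than fixed by a table schema.
all_attributes = sorted(set().union(*(p.keys() for p in people)))
print(all_attributes)  # ['born', 'email', 'name']

# Key order carries no meaning: dictionaries with the same pairs compare equal.
print(json.loads('{"a": 1, "b": 2}') == json.loads('{"b": 2, "a": 1}'))  # True
```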

Semi-structured data have become increasingly common since the advent of the Internet, where full-text documents and databases are no longer the only forms of data and different applications need a medium for exchanging information. Semi-structured data are also often found in object-oriented databases.

Pros and cons

Advantages

Programmers persisting objects from their application to a database do not need to worry about the object-relational impedance mismatch, and can often serialize objects via a lightweight library.
Support for nested or hierarchical data often simplifies data models representing complex relationships between entities.
Support for lists of objects simplifies data models by avoiding messy translations of lists into a relational data model.
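The following Python sketch illustrates these points under simple assumptions: the `Order` and `LineItem` classes are hypothetical, and the standard-library `json` and `dataclasses` modules stand in for a lightweight serialization library. An order and its list of line items are stored as one nested document, where a relational design would typically split them into two tables linked by a foreign key.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical application classes: an order containing a list of line items.
@dataclass
class LineItem:
    sku: str
    quantity: int

@dataclass
class Order:
    order_id: int
    customer: str
    items: list[LineItem] = field(default_factory=list)

order = Order(42, "Ada", [LineItem("PEN-1", 3), LineItem("INK-2", 1)])

# asdict() converts nested dataclasses and lists recursively, so the whole
# object graph becomes a single JSON document with no join table.
doc = json.dumps(asdict(order))
print(doc)

# Reading it back: rebuild the nested objects from the parsed document.
data = json.loads(doc)
restored = Order(data["order_id"], data["customer"],
                 [LineItem(**item) for item in data["items"]])
assert restored == order
```

Rebuilding the objects by hand keeps the sketch dependency-free; a serialization library would usually automate that step.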

Disadvantages

Unlike semi-structured data, the traditional relational data model has a popular, ready-made query language, SQL.
Prone to "garbage in, garbage out": because the data model imposes fewer constraints, less forethought is needed to operate a data application, so malformed or inconsistent data can enter the system more easily.
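The sketch below shows the problem in Python with hypothetical records: a misspelled key and a wrongly typed value are accepted as readily as valid data, and catching them falls to application code. The required fields checked by the hypothetical `problems` function are assumptions for illustration.

```python
import json

# Hypothetical records: nothing in the data model stops a misspelled key
# ("emial") or a wrongly typed value from being stored alongside good data.
records = [
    json.loads('{"name": "Ada", "email": "ada@example.org"}'),
    json.loads('{"name": "Grace", "emial": "grace@example.org"}'),
    json.loads('{"name": 7}'),
]

def problems(record):
    """Hand-written checks standing in for constraints a schema would enforce."""
    issues = []
    if not isinstance(record.get("name"), str):
        issues.append("name is missing or not a string")
    if "email" not in record:
        issues.append("email is missing")
    unknown = set(record) - {"name", "email"}
    if unknown:
        issues.append(f"unexpected keys: {sorted(unknown)}")
    return issues

for r in records:
    print(problems(r))
```

In practice a schema language such as JSON Schema can replace checks like these, at the cost of reintroducing some of the up-front design that semi-structured data otherwise avoids.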

References

^ Peter Buneman (1997). "Semistructured data" (PDF). Symposium on Principles of Database Systems.

External links

UPenn Database Group – semi-structured data and XML
Semi-Structured data analytics: Relational or Hadoop platform? by IBM