
Document-oriented database

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Rediosoft (talk | contribs) at 13:25, 9 November 2012. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented, or semi-structured, information. Document-oriented databases are one of the main categories of so-called NoSQL databases, and the popularity of the term "document-oriented database" (or "document store") has grown with the use of the term NoSQL itself. In contrast to well-known relational databases and their notions of "relations" (or "tables"), these systems are designed around an abstract notion of a "document".

Documents

The central concept of a document-oriented database is the notion of a document. While each implementation differs on the details of this definition, in general they all assume that documents encapsulate and encode data (or information) in some standard format or encoding. Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).
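To illustrate the idea of one document in several encodings, the following is a minimal sketch using only the Python standard library. It does not reflect how any particular database stores data; the variable names and the `document` root element are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

# A small sample record, held as a plain Python dictionary.
document = {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}

# JSON encoding: keys are sorted so the output is stable.
as_json = json.dumps(document, sort_keys=True)

# XML encoding: one child element per field under an assumed <document> root.
root = ET.Element("document")
for field, value in document.items():
    ET.SubElement(root, field).text = value
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
```

Both strings carry the same information; only the encoding differs.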

Documents inside a document-oriented database are similar, in some ways, to records or rows in relational databases, but they are less rigid. They are not required to adhere to a standard schema, nor must they all have the same sections, slots, parts, keys, or the like. For example, here is a document:

 {
    FirstName:"Bob", 
    Address:"5 Oak St.", 
    Hobby:"sailing"
 }

Another document could be:

 {
    FirstName:"Jonathan", 
    Address:"15 Wanamassa Point Road", 
    Children:[
       {Name:"Michael", Age:10}, 
       {Name:"Jennifer", Age:8}, 
       {Name:"Samantha", Age:5}, 
       {Name:"Elena", Age:2}
    ]
 }

The two documents share some fields (FirstName and Address) and differ in others (Hobby and Children). Unlike a relational database, where each record would have the same set of fields and unused fields might be kept empty, there are no empty "fields" in either document (record) in this case. This approach allows new information to be added without requiring any explicit statement about which other pieces of information are left out.
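As a minimal sketch of this point, the two documents can be modelled as Python dictionaries and their field sets compared. This is not tied to any particular database; the helper names `shared_fields` and `fields_only_in` are assumptions introduced for illustration.

```python
# The two documents from the text, modelled as plain Python dictionaries.
bob = {
    "FirstName": "Bob",
    "Address": "5 Oak St.",
    "Hobby": "sailing",
}

jonathan = {
    "FirstName": "Jonathan",
    "Address": "15 Wanamassa Point Road",
    "Children": [
        {"Name": "Michael", "Age": 10},
        {"Name": "Jennifer", "Age": 8},
        {"Name": "Samantha", "Age": 5},
        {"Name": "Elena", "Age": 2},
    ],
}


def shared_fields(a, b):
    """Return the sorted field names present in both documents."""
    return sorted(a.keys() & b.keys())


def fields_only_in(a, b):
    """Return the sorted field names present in `a` but absent from `b`."""
    return sorted(a.keys() - b.keys())


print(shared_fields(bob, jonathan))
print(fields_only_in(bob, jonathan))
print(fields_only_in(jonathan, bob))
```

Neither dictionary needs a placeholder for the field it lacks, which is the contrast with a fixed relational schema drawn above.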

Keys

Documents are addressed in the database via a unique key that represents that document. Often, this key is a simple string. In some cases, this string is a URI or path. Regardless, this key can be used to retrieve the document from the database. Typically, the database maintains an index on the key so that document retrieval is fast.
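Key-based addressing can be sketched with a toy in-memory store, where a Python dictionary plays the role of the index on the key. The `DocumentStore` class, its method names, and the path-style key are hypothetical and are not the API of any listed product.

```python
class DocumentStore:
    """Toy in-memory store: a dict acts as the index from key to document."""

    def __init__(self):
        self._by_key = {}

    def put(self, key, document):
        """Store (or overwrite) the document under its unique key."""
        self._by_key[key] = document

    def get(self, key):
        """Return the document for `key`, or None if the key is unknown."""
        return self._by_key.get(key)


store = DocumentStore()
# The key here is a path-like string, one of the forms mentioned in the text.
store.put("/people/bob", {"FirstName": "Bob", "Hobby": "sailing"})

print(store.get("/people/bob"))
print(store.get("/people/unknown"))
```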

Retrieval

Another defining characteristic of a document-oriented database is that, beyond the simple key-document (or key-value) lookup used to retrieve a document, the database offers an API or query language for retrieving documents based on their contents. For example, a query might return all the documents that have a certain field set to a certain value. The set of query APIs or query language features available, as well as the expected performance of the queries, varies significantly from one implementation to the next.
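The "field set to a certain value" query can be sketched as a simple filter over a list of documents. The function name `find_by_field` and the sample record for "Alice" are hypothetical. The linear scan is used only for clarity and is not how any given implementation is claimed to work.

```python
def find_by_field(documents, field, value):
    """Return every document that has `field` and whose value equals `value`."""
    return [doc for doc in documents if field in doc and doc[field] == value]


people = [
    {"FirstName": "Bob", "Hobby": "sailing"},
    {"FirstName": "Jonathan", "Address": "15 Wanamassa Point Road"},
    {"FirstName": "Alice", "Hobby": "sailing"},  # hypothetical extra record
]

sailors = find_by_field(people, "Hobby", "sailing")
print([doc["FirstName"] for doc in sailors])
```

Documents that lack the field altogether are simply skipped, so the query works even though the documents do not share a schema.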

Organization

Implementations offer a variety of ways of organizing documents, including notions of:

  • Collections
  • Tags
  • Non-visible Metadata
  • Directory hierarchies
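Two of these notions, collections and tags, can be sketched as follows. This is an illustrative toy only: the `OrganizedStore` class, its methods, and the sample data are assumptions and are not modelled on a specific product.

```python
from collections import defaultdict


class OrganizedStore:
    """Toy store that groups documents into collections and labels them with tags."""

    def __init__(self):
        self._collections = defaultdict(dict)  # collection name -> {key: document}
        self._tags = defaultdict(set)          # tag -> {(collection name, key)}

    def put(self, collection, key, document, tags=()):
        """Store a document in a collection and record any tags for it."""
        self._collections[collection][key] = document
        for tag in tags:
            self._tags[tag].add((collection, key))

    def in_collection(self, collection):
        """Return a copy of the key -> document mapping for one collection."""
        return dict(self._collections.get(collection, {}))

    def with_tag(self, tag):
        """Return tagged documents, ordered by (collection, key)."""
        return [self._collections[c][k] for c, k in sorted(self._tags.get(tag, ()))]


store = OrganizedStore()
store.put("people", "bob", {"FirstName": "Bob"}, tags=["sailor"])
store.put("people", "jonathan", {"FirstName": "Jonathan"}, tags=["parent"])
store.put("boats", "dinghy", {"Length": 3}, tags=["sailor"])

print(sorted(store.in_collection("people")))
print(store.with_tag("sailor"))
```

A tag cuts across collections, whereas each document here belongs to exactly one collection; that is one design choice among several possible ones.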

Implementations

Name | Publisher | License | Language | Notes | RESTful API
BaseX | BaseX Team | BSD License | Java, XQuery | Support for XML, JSON and binary formats; client/server architecture; concurrent structural and full-text searches and updates; REST APIs. | Yes
Clusterpoint | Clusterpoint Ltd. | Free community license / Commercial[1] | C++ | Scalable, high-performance, schema-free, document-oriented database management system platform with server-based data storage, fast full-text search engine functionality, information ranking for search relevance and clustering. | Yes
Couchbase | Couchbase | Apache License | Erlang and C | Distributed NoSQL document database. | Yes [2]
CouchDB | Apache Software Foundation | Apache License | Erlang | JSON over REST/HTTP with Multi-Version Concurrency Control and limited ACID properties. Uses map and reduce for views and queries.[3] | Yes [4]
djondb | djondb | GPL | C++ | Document-oriented database optimized for enterprises. | No
eXist | eXist, [3] | GPL | XQuery, Java | XML over REST/HTTP, WebDAV, Lucene full-text search, validation, versioning, clustering, triggers, URL rewriting, collections, ACLs, XQuery Update | Yes [5]
Firebase | Firebase | Proprietary | Scala | Distributed realtime JSON database; accessible directly from a web browser. | Yes [6]
FleetDB | FleetDB | MIT License | Clojure | A JSON-based schema-free database optimized for agile development. | (unknown)
Aurinko | Aurinko | Eclipse Public License | Clojure | A very compact networked document database engine implementation. | (unknown)
Jackrabbit | Apache Software Foundation | Apache License | Java | | (unknown)
Lotus Notes | IBM | Proprietary | LotusScript, Java, Lotus @Formula | | (unknown)
MarkLogic | MarkLogic Corporation | Free Express license or Commercial | REST, Java, XQuery, XSLT, C++ | Fast, secure, scalable, distributed, enterprise-grade document-oriented database with Multi-Version Concurrency Control, integrated full-text search and ACID-compliant transaction semantics | Yes
MongoDB | 10gen, Inc | GNU AGPL v3.0[7] | C++ | Fast, document-oriented database optimized for highly transient data. | Optional [8]
MUMPS Database[9] | | Proprietary and GNU Affero GPL[10] | MUMPS | Commonly used in health applications. | (unknown)
OrientDB | Orient Technologies | Apache License | Java | JSON over HTTP | Yes
Apache Cassandra | Apache Cassandra | Apache License | Java | JSON over HTTP | Yes
RavenDB | Hibernating Rhinos | AGPL Free or Commercial | C#, F#, VB.NET | JSON over REST/HTTP with a rich .NET client API. ACID compliant. Uses Lucene.net to provide indexes using a LINQ interface, optionally with MapReduce functionality. | Yes [11]
Redis | | BSD License | ANSI C | Key-value store supporting lists and sets with a fast, simple and binary-safe protocol. | (unknown)
Rocket U2 | Rocket Software | Proprietary | | UniData, UniVerse | Yes (Beta)
Terrastore | | Apache License | Java | JSON/HTTP | (unknown)
ThruDB | | BSD License | C++, Java | Built on top of the Apache Thrift framework; provides indexing and document storage services for building and scaling websites. An alternate implementation is being developed in Java. Alpha software. | (unknown)
VaultDB | RedIO Software Inc. | Free development license or Commercial | C++, Java | Encrypted document store, free Developer Edition, ACID transactions, multi-recipient cryptosystem, schema-free, supports PHP and replication. | (unknown)


XML database implementations

Most XML databases are document-oriented databases.
