Relational model
The relational model for management of a database is a data model based on predicate logic and set theory.
The Model
The fundamental assumption of the relational model is that all data are represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n sets. In the mathematical model, reasoning about such data is done in two-valued predicate logic (that is, without NULLs), meaning there are two possible evaluations for each proposition: either true or false. Data are operated upon by means of the relational algebra and the relational calculus.
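As a small illustration of "a subset of the Cartesian product", the following sketch uses two hypothetical domains (PERSONS and CITIES, invented for this example) and a hypothetical relation lives_in. It assumes a closed-world reading, in which a pair absent from the relation is taken as a false proposition, so every proposition evaluates to exactly true or false.

```python
from itertools import product

# Two small, hypothetical domains (sets of permitted values).
PERSONS = {"Alice", "Bob", "Carol"}
CITIES = {"Paris", "Rome"}

# The Cartesian product PERSONS x CITIES: every possible (person, city) pair.
all_pairs = set(product(PERSONS, CITIES))

# A relation is a subset of that product. Each pair is read as the true
# proposition "person lives in city"; under the assumed closed-world reading,
# pairs that are absent are false (two-valued logic, no "unknown").
lives_in = {("Alice", "Paris"), ("Bob", "Rome")}

assert lives_in <= all_pairs  # the relation really is a subset of the product


def holds(person: str, city: str) -> bool:
    """Evaluate the proposition 'person lives in city' against the relation."""
    return (person, city) in lives_in


print(len(all_pairs))           # 6
print(holds("Alice", "Paris"))  # True
print(holds("Carol", "Rome"))   # False
```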
The relational data model permits the designer to create a consistent logical model of information, to be refined through database normalization. The access plans and other implementation and operation details are handled by the DBMS engine, and should not be reflected in the logical model. This contrasts with common practice for SQL DBMSs in which performance tuning often requires changes to the logical model.
The basic relational building block is the domain, or data type. A tuple is an unordered set of attribute values; each attribute value pairs an attribute name with a value drawn from that attribute's domain. A heading is a set of ordered pairs of attribute name and domain, and a relvar (relation variable) is a named variable whose value is, at any given time, a relation with that heading. A relation is a heading together with a set of tuples that conform to it. Although these relational concepts are mathematically defined, they correspond loosely to traditional database concepts: a relation is similar to the traditional concept of a table, and a tuple is similar to the concept of a row.
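The definitions above can be sketched in Python under some simplifying assumptions: a heading is modelled as a dict from attribute name to a Python type standing in for the domain, a tuple as a frozenset of (attribute name, value) pairs, and a relation body as a Python set. HEADING and make_tuple are hypothetical names chosen for this illustration.

```python
# Hypothetical heading: attribute name -> domain (modelled as a Python type).
HEADING = {"customer_id": int, "name": str}


def make_tuple(**attrs):
    """Build a tuple as an unordered set of (attribute name, value) pairs,
    checking that it conforms to HEADING."""
    if set(attrs) != set(HEADING):
        raise ValueError(f"attributes {sorted(attrs)} do not match the heading")
    for name, value in attrs.items():
        if not isinstance(value, HEADING[name]):
            raise TypeError(f"{name}={value!r} is not in domain {HEADING[name].__name__}")
    return frozenset(attrs.items())


t1 = make_tuple(customer_id=1, name="Jo Lee")
t2 = make_tuple(name="Jo Lee", customer_id=1)  # same attributes, written in another order

# The body of a relation is a set of such tuples.
relation = {t1, t2, make_tuple(customer_id=2, name="Dorothy Red")}

print(t1 == t2)       # True: attribute order is irrelevant
print(len(relation))  # 2: a set cannot hold the same tuple twice
```

Using frozensets rather than Python tuples is the design choice that makes attribute order irrelevant, matching the unordered definition of a relational tuple.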
The basic principle of the relational model is the Information Principle: all information is represented by data values in relations. Consequently, relvars are not linked to one another by pointers or other structural connections at design time; instead, designers use the same domain in several relvars, and if one attribute depends on another, that dependency is enforced through referential integrity.
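A minimal sketch of what enforcing such a dependency amounts to, assuming two hypothetical relvars held as lists of dicts that share a customer_id attribute over the same domain. The helper dangling_references is invented for this example; it reports child tuples whose foreign-key value matches no key value in the parent, which is exactly what a referential integrity constraint forbids. Lists are used instead of sets only because Python dicts are not hashable.

```python
# Hypothetical relvars; both draw customer_id from the same domain (integers).
customers = [
    {"customer_id": 1, "name": "Jo Lee"},
    {"customer_id": 2, "name": "Dorothy Red"},
]
orders = [
    {"order_no": 10, "customer_id": 1},
    {"order_no": 11, "customer_id": 3},  # 3 matches no customer
]


def dangling_references(child, fk, parent, key):
    """Return the child tuples whose foreign-key value matches no key value in parent."""
    parent_keys = {row[key] for row in parent}
    return [row for row in child if row[fk] not in parent_keys]


print(dangling_references(orders, "customer_id", customers, "customer_id"))
# [{'order_no': 11, 'customer_id': 3}]
```

A DBMS enforcing referential integrity would reject the update that produced the second order rather than merely report it afterwards.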
Competition
Other database models include the hierarchical model and the network model. Some systems built on these older architectures are still in use today in data centers with high data-volume needs, or where existing systems are so complex that migrating to systems based on the relational model would be cost-prohibitive. Also of note are newer object-oriented databases, although many of them are DBMS-construction kits rather than proper DBMSs.
The relational model was the first formal database model. After it was defined, informal models were made to describe hierarchical databases (the hierarchical model) and network databases (the network model). Hierarchical and network databases existed before relational databases, but were only described as models after the relational model was defined, in order to establish a basis for comparison.
History
The relational model was invented by Edgar F. "Ted" Codd as a general model of data (Codd, 1970), and was subsequently maintained and developed by Chris Date and Hugh Darwen, among others. In The Third Manifesto (1995), Date and Darwen show how the relational model can be extended with object-oriented features without compromising its fundamental principles.
Misimplementation
SQL, initially promoted as the standard language for relational databases, has in fact always violated the relational model. SQL DBMSs are therefore not, strictly speaking, RDBMSs, and the current ISO SQL standard does not mention the relational model or use relational terms or concepts.
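One frequently cited deviation is that SQL tables are bags (multisets) rather than sets: a table may be declared without any key and may then hold the same row more than once. The sketch below demonstrates this with Python's built-in sqlite3 module and an in-memory database; the table t and its columns are hypothetical, and SQLite is used here only as a convenient, widely available SQL engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table declared without any key: SQL accepts this.
conn.execute("CREATE TABLE t (x INTEGER, y TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (1, "a"), (2, "b")])

rows = conn.execute("SELECT x, y FROM t").fetchall()
distinct_rows = conn.execute("SELECT DISTINCT x, y FROM t").fetchall()

print(len(rows))           # 3: the duplicate row is kept (bag semantics)
print(len(distinct_rows))  # 2: DISTINCT must be requested to get set semantics
conn.close()
```

In the relational model the second insertion would have no effect, since a relation, being a set, cannot contain the same tuple twice.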
Implementation
There have been several attempts to produce a true implementation of the relational model as originally developed by Codd and later refined by Date, Darwen and others, but none has yet been a popular success. Rel is one of the more recent attempts to do this.
Controversies
Codd himself proposed a three-valued logic version of the relational model, and a four-valued logic version has also been proposed, in order to deal with missing information. These have never been implemented, presumably because of the attendant complexity. SQL NULLs were intended to be part of a three-valued logic system, but fall short of one because of logical errors in the standard and in its implementations.
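To show why a third truth value complicates reasoning, here is a minimal sketch of the AND/OR/NOT truth tables that SQL uses (Kleene's strong three-valued logic), with Python's None standing in for UNKNOWN. The function names and(3), or3, not3 and eq3 are hypothetical, and eq3 only mimics SQL's rule that any comparison involving a missing value yields UNKNOWN; this is not a model of Codd's or any other specific proposal in full.

```python
from typing import Optional

# None stands for UNKNOWN, the third truth value.
TV = Optional[bool]


def and3(a: TV, b: TV) -> TV:
    if a is False or b is False:
        return False          # FALSE dominates AND
    if a is None or b is None:
        return None
    return True


def or3(a: TV, b: TV) -> TV:
    if a is True or b is True:
        return True           # TRUE dominates OR
    if a is None or b is None:
        return None
    return False


def not3(a: TV) -> TV:
    return None if a is None else not a


def eq3(x, y) -> TV:
    """SQL-style comparison: comparing with a missing value (None) is UNKNOWN."""
    if x is None or y is None:
        return None
    return x == y


x = None  # a "NULL" attribute value
print(eq3(x, x))                          # None: even NULL = NULL is UNKNOWN
print(or3(eq3(x, 1), not3(eq3(x, 1))))    # None: "x = 1 OR NOT x = 1" is not TRUE
```

The last line is the crux: in two-valued logic "p OR NOT p" is always true, but once a third value is admitted that tautology no longer holds, which is one source of the counter-intuitive query results associated with NULLs.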
Design
Database normalization is usually performed when designing a relational database, to improve the logical consistency of the database design and the transactional performance.
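A central step in normalization is decomposing a relation by projection and checking that the original can be recovered by a natural join (a lossless decomposition). The sketch below assumes hypothetical data in which a customer's city is repeated on every order; because city is functionally dependent on customer_id in this data, splitting it out removes the repetition without losing information. The helpers relation, project and natural_join are written for this illustration and are not a library API.

```python
def relation(*rows):
    """A relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def project(rel, attrs):
    """Projection onto attrs; building a set discards duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    """Combine every pair of tuples that agree on all of their shared attributes."""
    result = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                result.add(frozenset({**dt, **du}.items()))
    return result


# Hypothetical relation with redundancy: customer 7's city is stored twice.
orders_with_city = relation(
    {"order_no": 1, "customer_id": 7, "city": "Rome"},
    {"order_no": 2, "customer_id": 7, "city": "Rome"},
    {"order_no": 3, "customer_id": 8, "city": "Paris"},
)

# Decompose: in customer_city each customer's city is stored only once.
orders = project(orders_with_city, {"order_no", "customer_id"})
customer_city = project(orders_with_city, {"customer_id", "city"})

print(len(customer_city))                                       # 2
print(natural_join(orders, customer_city) == orders_with_city)  # True: lossless
```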
There are two commonly used systems of diagramming to aid in the visual representation of the relational model: the entity-relationship diagram (ERD), and the related IDEF diagram used in the IDEF1X method created by the U.S. Air Force based on ERDs.
Example database
An idealized, very simple example of a description of some relvars and their attributes:
Customer(Customer ID, Tax ID, Name, Address, City, State, Zip, Phone)
Order(Order No, Customer ID, Invoice No, Date Placed, Date Promised, Terms, Status)
Order Line(Order No, Order Line No, Product Code, Qty)
Invoice(Invoice No, Customer ID, Order No, Date, Status)
Invoice Line(Invoice No, Line No, Product Code, Qty Shipped)
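In this design we have five relvars: Customer, Order, Order Line, Invoice, and Invoice Line. Some attributes act as candidate keys, uniquely identifying the tuples of their relvar (for example, Customer ID in Customer); others act as foreign keys, referring to a candidate key of another relvar (for example, Customer ID in Order).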
Usually one candidate key is arbitrarily chosen to be called the primary key and used in preference over the other candidate keys, which are then called alternate keys.
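In SQL the distinction between primary and alternate keys is typically expressed with PRIMARY KEY and UNIQUE constraints. The sketch below uses Python's sqlite3 module with a cut-down, hypothetical customer table; it assumes, purely for illustration, that Tax ID is a second candidate key of Customer, which the example design above does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical DDL: customer_id is chosen as the primary key; tax_id is
# assumed (for this example only) to be another candidate key, declared
# UNIQUE so that it acts as an alternate key.
conn.execute("""
    CREATE TABLE customer (
        customer_id TEXT NOT NULL PRIMARY KEY,
        tax_id      TEXT NOT NULL UNIQUE,
        name        TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO customer VALUES ('1234567890', '555-5512222', 'Jo Lee')")

try:
    # New primary-key value, but a tax_id that is already in use.
    conn.execute("INSERT INTO customer VALUES ('9999999999', '555-5512222', 'Someone Else')")
    alternate_key_enforced = False
except sqlite3.IntegrityError:
    alternate_key_enforced = True

print(alternate_key_enforced)  # True: the alternate key is enforced like the primary key
```

Choosing one candidate key as primary does not weaken the others: each remains a uniqueness constraint.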
A candidate key is a minimal set of attributes that uniquely identifies each tuple: no two tuples of the relation may have the same value for it. Duplicate tuples are impossible in any case, because a relation is a set; a collection that allowed them would be something else, namely a bag (multiset). A key can be composite, that is, composed of several attributes. Below is a tabular depiction of a relation of our example Customer relvar; a relation can be thought of as a value that can be assigned to a relvar.
Example: customer relation
Customer ID  Tax ID       Name              Address           [More fields...]
==============================================================================
1234567890   555-5512222  Jo Lee            323 Broadway      ...
2223344556   555-5523232  Dorothy Red       1200 Main Street  ...
3334445563   555-5533322  Linda de la Cruz  871 1st Street    ...
4232342432   555-5325523  E. F. Codd        123 It Way        ...
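The two properties of a candidate key, uniqueness and minimality, can be checked against a sample relation value. The sketch below uses hypothetical Order Line data (attribute names and values invented for this example) and hypothetical helpers is_unique and is_candidate_key. Note that a sample value can only refute a key; declaring a key is a constraint on every value the relvar may ever hold.

```python
from itertools import combinations

# Hypothetical Order Line tuples, shown as dicts.
order_lines = [
    {"order_no": 1, "line_no": 1, "product": "A", "qty": 5},
    {"order_no": 1, "line_no": 2, "product": "B", "qty": 5},
    {"order_no": 2, "line_no": 1, "product": "B", "qty": 5},
    {"order_no": 2, "line_no": 2, "product": "B", "qty": 5},
]


def is_unique(rows, attrs):
    """True if no two rows agree on all of attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """Unique, and minimal: no proper non-empty subset of attrs is also unique."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


print(is_candidate_key(order_lines, ("order_no",)))                # False: not unique
print(is_candidate_key(order_lines, ("order_no", "line_no")))      # True: composite key
print(is_candidate_key(order_lines, ("order_no", "line_no", "qty")))  # False: not minimal
```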
If we attempted to insert a new customer with the ID 1234567890, this would violate the design of the relvar since Customer ID is a primary key and we already have a customer 1234567890. The DBMS must reject a transaction such as this that would render the database inconsistent by a violation of an integrity constraint.
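The rejection described above can be observed in any SQL engine that enforces primary keys. This sketch uses Python's sqlite3 module with a reduced, hypothetical customer table holding two of the customers from the table; the name "New Customer" is invented. The connection's context manager commits on success and rolls back if the statement fails, so the database is left unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT NOT NULL PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("1234567890", "Jo Lee"), ("2223344556", "Dorothy Red")],
)
conn.commit()

try:
    # 'with conn' commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO customer VALUES (?, ?)", ("1234567890", "New Customer"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(rejected, count)  # True 2: the duplicate key was refused, the data is unchanged
```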
Foreign keys are integrity constraints enforcing that the value of an attribute set is drawn from a candidate key in another relation; for example, in the Order relation the attribute Customer ID is a foreign key. A join is the operation that draws on information from several relations at once. By joining relvars from the example above, we could query the database for all of the Customers, Orders, and Invoices. If we wanted only the tuples for a specific customer, we would specify this using a restriction condition.
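Join and restriction can be sketched as plain functions over relations. In this hypothetical example, relations are lists of dicts (a simplification: duplicates are not eliminated as they would be in a true set), and restrict and natural_join are illustrative helpers rather than any library's API. Joining Customer and Order on their shared Customer ID attribute and then restricting to one customer yields that customer's orders.

```python
def restrict(rel, predicate):
    """Restriction (selection): keep only the tuples that satisfy the predicate."""
    return [t for t in rel if predicate(t)]


def natural_join(r, s):
    """Natural join: combine every pair of tuples that agree on all shared attributes."""
    result = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                result.append({**t, **u})
    return result


# Hypothetical sample data.
customers = [
    {"customer_id": "1234567890", "name": "Jo Lee"},
    {"customer_id": "2223344556", "name": "Dorothy Red"},
]
orders = [
    {"order_no": 1, "customer_id": "1234567890"},
    {"order_no": 2, "customer_id": "2223344556"},
    {"order_no": 3, "customer_id": "1234567890"},
]

jo_orders = restrict(natural_join(customers, orders),
                     lambda t: t["customer_id"] == "1234567890")
print([t["order_no"] for t in jo_orders])  # [1, 3]
```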
If we wanted to retrieve all of the Orders for Customer 1234567890, we could query the database to return every row in the Order table with Customer ID 1234567890 and join the Order table to the Order Line table based on Order No.
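Expressed in SQL, the query combines a restriction on Customer ID with a join on Order No. The sketch below runs it through Python's sqlite3 module against hypothetical tables orders and order_line (named so as to avoid the SQL reserved word ORDER) filled with invented sample rows. The REFERENCES clause documents the foreign key; SQLite enforces it only when PRAGMA foreign_keys = ON is set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        customer_id TEXT NOT NULL
    );
    CREATE TABLE order_line (
        order_no      INTEGER NOT NULL REFERENCES orders(order_no),
        order_line_no INTEGER NOT NULL,
        product_code  TEXT NOT NULL,
        qty           INTEGER NOT NULL,
        PRIMARY KEY (order_no, order_line_no)
    );
    INSERT INTO orders VALUES (1, '1234567890'), (2, '2223344556');
    INSERT INTO order_line VALUES (1, 1, 'P-100', 3), (1, 2, 'P-200', 1), (2, 1, 'P-100', 5);
""")

rows = conn.execute("""
    SELECT o.order_no, l.order_line_no, l.product_code, l.qty
    FROM orders AS o
    JOIN order_line AS l ON l.order_no = o.order_no   -- join on Order No
    WHERE o.customer_id = ?                           -- restriction on Customer ID
    ORDER BY o.order_no, l.order_line_no
""", ("1234567890",)).fetchall()

print(rows)  # [(1, 1, 'P-100', 3), (1, 2, 'P-200', 1)]
```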
There is a flaw in our database design above. The Invoice relvar contains an Order No attribute, so each tuple in the Invoice relvar will have exactly one Order No, which implies that there is precisely one Order for each Invoice. But in reality an invoice can be created against many orders, or indeed for no particular order. Additionally, the Order relvar contains an Invoice No attribute, implying that each Order has a corresponding Invoice. But again this is not always true in the real world: an order is sometimes paid through several invoices, and sometimes paid without an invoice. In other words, there can be many Invoices per Order and many Orders per Invoice. This is a many-to-many relationship between Order and Invoice (also called a non-specific relationship). To represent this relationship in the database, a new relvar should be introduced whose role is to specify the correspondence between Orders and Invoices:
OrderInvoice(Order No, Invoice No)
Now the Order relvar has a one-to-many relationship to the OrderInvoice relvar, as does the Invoice relvar. If we want to retrieve every Invoice for a particular Order, we can query for all invoices where Order No in the Order relation equals Order No in OrderInvoice, and where Invoice No in OrderInvoice equals Invoice No in Invoice.
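The following sketch shows the linking relvar in SQL, using Python's sqlite3 module. Table names (orders, invoice, order_invoice) and all sample rows are hypothetical: order 1 is billed on two invoices, invoice 101 covers two orders, and order 3 has no invoice, which the original design could not represent. The helper invoices_for_order is likewise invented for this example; it performs the two equality joins described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (order_no   INTEGER PRIMARY KEY);
    CREATE TABLE invoice (invoice_no INTEGER PRIMARY KEY, status TEXT);
    -- Linking relvar: one row per (order, invoice) correspondence.
    CREATE TABLE order_invoice (
        order_no   INTEGER NOT NULL REFERENCES orders(order_no),
        invoice_no INTEGER NOT NULL REFERENCES invoice(invoice_no),
        PRIMARY KEY (order_no, invoice_no)
    );
    INSERT INTO orders  VALUES (1), (2), (3);
    INSERT INTO invoice VALUES (100, 'paid'), (101, 'open');
    INSERT INTO order_invoice VALUES (1, 100), (1, 101), (2, 101);
""")


def invoices_for_order(order_no):
    """Every invoice linked to the given order, via the OrderInvoice relvar."""
    return conn.execute("""
        SELECT i.invoice_no, i.status
        FROM orders AS o
        JOIN order_invoice AS oi ON oi.order_no = o.order_no
        JOIN invoice AS i ON i.invoice_no = oi.invoice_no
        WHERE o.order_no = ?
        ORDER BY i.invoice_no
    """, (order_no,)).fetchall()


print(invoices_for_order(1))  # [(100, 'paid'), (101, 'open')]
print(invoices_for_order(3))  # []
```

The composite primary key on order_invoice prevents the same correspondence from being recorded twice, while still allowing many invoices per order and many orders per invoice.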
References
- Codd, E. F. (1970). "A relational model of data for large shared data banks". Communications of the ACM, Vol. 13, No. 6, pp. 377-387. Retrieved Sept. 4, 2004, from http://www.acm.org/classics/nov95/toc.html
- Date, Christopher J. (2003). An Introduction to Database Systems. 8th ed.