Relational algebra

In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics. The theory was introduced by Edgar F. Codd.^[1]

The main application of relational algebra is to provide a theoretical foundation for relational databases, particularly query languages for such databases, chief among which is SQL. Relational databases store tabular data represented as relations. Queries over relational databases often likewise return tabular data represented as relations.

The main purpose of relational algebra is to define operators that transform one or more input relations to an output relation. Given that these operators accept relations as input and produce relations as output, they can be combined and used to express complex queries that transform multiple input relations (whose data are stored in the database) into a single output relation (the query results).

Unary operators accept a single relation as input. Examples include operators to filter certain attributes (columns) or tuples (rows) from an input relation. Binary operators accept two relations as input and combine them into a single output relation. For example, taking all tuples found in either relation (union), removing tuples from the first relation found in the second relation (difference), extending the tuples of the first relation with tuples in the second relation matching certain conditions, and so forth.

Introduction

Relational algebra received little attention outside of pure mathematics until the publication of E.F. Codd's relational model of data in 1970. Codd proposed such an algebra as a basis for database query languages.

Relational algebra operates on homogeneous sets of tuples $S=\{(s_{j1},s_{j2},...s_{jn})|j\in 1...m\}$ where we commonly interpret m to be the number of rows of tuples in a table and n to be the number of columns. All entries in each column have the same type.

A relation also has a unique tuple called the header which gives each column a unique name or attribute inside the relation). Attributes are used in projections and selections.

Set operators

The relational algebra uses set union, set difference, and Cartesian product from set theory, and adds additional constraints to these operators to create new ones.

For set union and set difference, the two relations involved must be union-compatible—that is, the two relations must have the same set of attributes. Because set intersection is defined in terms of set union and set difference, the two relations involved in set intersection must also be union-compatible.

For the Cartesian product to be defined, the two relations involved must have disjoint headers—that is, they must not have a common attribute name.

In addition, the Cartesian product is defined differently from the one in set theory in the sense that tuples are considered to be "shallow" for the purposes of the operation. That is, the Cartesian product of a set of n-tuples with a set of m-tuples yields a set of "flattened" $(n + m)$ -tuples (whereas basic set theory would have prescribed a set of 2-tuples, each containing an n-tuple and an m-tuple). More formally, R × S is defined as follows:

$R\times S:=\{(r_{1},r_{2},\dots ,r_{n},s_{1},s_{2},\dots ,s_{m})|(r_{1},r_{2},\dots ,r_{n})\in R,(s_{1},s_{2},\dots ,s_{m})\in S\}$

The cardinality of the Cartesian product is the product of the cardinalities of its factors, that is, |R × S| = |R| × |S|.

Projection

A projection ( $Π$ ) is a unary operation written as $\Pi _{a_{1},\ldots ,a_{n}}(R)$ where $a_{1},\ldots ,a_{n}$ is a set of attribute names. The result of such projection is defined as the set that is obtained when all tuples in R are restricted to the set $\{a_{1},\ldots ,a_{n}\}$ .

Note: when implemented in SQL standard the "default projection" returns a multiset instead of a set, and the $Π$ projection to eliminate duplicate data is obtained by the addition of the DISTINCT keyword.

Selection

A generalized selection (σ) is a unary operation written as $\sigma _{\varphi }(R)$ where $φ$ is a propositional formula that consists of atoms as allowed in the normal selection and the logical operators $\wedge$ (and), $\lor$ (or) and $\neg$ (negation). This selection selects all those tuples in R for which $φ$ holds.

To obtain a listing of all friends or business associates in an address book, the selection might be written as $\sigma _{{\text{isFriend = true}}\,\lor \,{\text{isBusinessContact = true}}}({\text{addressBook}})$ . The result would be a relation containing every attribute of every unique record where $isFriend$ is true or where $isBusinessContact$ is true.

Rename

A rename (ρ) is a unary operation written as $\rho _{a/b}(R)$ where the result is identical to R except that the b attribute in all tuples is renamed to an a attribute. This is commonly used to rename the attribute of a relation for the purpose of a join.

To rename the "isFriend" attribute to "isBusinessContact" in a relation, $\rho _{\text{isBusinessContact / isFriend}}({\text{addressBook}})$ might be used.

There is also the $\rho _{x(A_{1},\ldots ,A_{n})}(R)$ notation, where R is renamed to x and the attributes $\{a_{1},\ldots ,a_{n}\}$ are renamed to $\{A_{1},\ldots ,A_{n}\}$ .^[2]

Natural join

Natural join (⨝) is a binary operator that is written as (R ⨝ S) where R and S are relations.^[a] The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. For an example consider the tables Employee and Dept and their natural join:^{[citation needed]}

*Employee*
Name	EmpId	DeptName
Harry	3415	Finance
Sally	2241	Sales
George	3401	Finance
Harriet	2202	Sales
Mary	1257	Human Resources

*Dept*
DeptName	Manager
Finance	George
Sales	Harriet
Production	Charles

*Employee* ⨝ *Dept*
Name	EmpId	DeptName	Manager
Harry	3415	Finance	George
Sally	2241	Sales	Harriet
George	3401	Finance	George
Harriet	2202	Sales	Harriet

Note that neither the employee named Mary nor the Production department appear in the result.

This can also be used to define composition of relations. For example, the composition of Employee and Dept is their join as shown above, projected on all but the common attribute DeptName.

The natural join is arguably one of the most important operators since it is the relational counterpart of the logical AND operator. Note that if the same variable appears in each of two predicates that are connected by AND, then that variable stands for the same thing and both appearances must always be substituted by the same value (this is a consequence of the idempotence of the logical AND).

Formally the semantics of the natural join are defined as follows:

R\bowtie S=\left\{r\cup s\ \vert \ r\in R\ \land \ s\in S\ \land \ r_{A}=s_{A}\right\}

1

It is usually required that R and S must have at least one common attribute, but if this constraint is omitted, and R and S have no common attributes, then the natural join becomes the Cartesian product.

The natural join can be simulated with Codd's primitives as follows. Assume that c₁,...,c_m are the attribute names common to R and S, r₁,...,r_n are the attribute names unique to R and s₁,...,s_k are the attribute names unique to S. Furthermore, assume that the attribute names x₁,...,x_m are neither in R nor in S. In a first step the common attribute names in S can be renamed:

T=\rho _{x_{1}/c_{1},\ldots ,x_{m}/c_{m}}(S)=\rho _{x_{1}/c_{1}}(\rho _{x_{2}/c_{2}}(\ldots \rho _{x_{m}/c_{m}}(S)\ldots ))

2

Then we take the Cartesian product and select the tuples that are to be joined:

P=\sigma _{c_{1}=x_{1},\ldots ,c_{m}=x_{m}}(R\times T)=\sigma _{c_{1}=x_{1}}(\sigma _{c_{2}=x_{2}}(\ldots \sigma _{c_{m}=x_{m}}(R\times T)\ldots ))

3

Finally we take a projection to get rid of the renamed attributes:

U=\Pi _{r_{1},\ldots ,r_{n},c_{1},\ldots ,c_{m},s_{1},\ldots ,s_{k}}(P)

4

Implementations

The first query language to be based on Codd's algebra was Alpha, developed by Dr. Codd himself. Subsequently, ISBL was created, and this pioneering work has been acclaimed by many authorities^[3] as having shown the way to make Codd's idea into a useful language. Business System 12 was a short-lived industry-strength relational DBMS that followed the ISBL example.

In 1998 Chris Date and Hugh Darwen proposed a language called Tutorial D intended for use in teaching relational database theory, and its query language also draws on ISBL's ideas.^[4] Rel is an implementation of Tutorial D. Bmg is an implementation of relational algebra in Ruby which closely follows the principles of Tutorial D and The Third Manifesto.^[5]

Even the query language of SQL is loosely based on a relational algebra, though the operands in SQL (tables) are not exactly relations and several useful theorems about the relational algebra do not hold in the SQL counterpart (arguably to the detriment of optimisers and/or users). The SQL table model is a bag (multiset), rather than a set. For example, the expression $(R\cup S)\setminus T=(R\setminus T)\cup (S\setminus T)$ is a theorem for relational algebra on sets, but not for relational algebra on bags.^[6]

Notes

^ In Unicode, the join symbol is ⨝ (U+2A1D), and the bowtie symbol, occasionally used instead, is ⋈ (U+22C8).

References

^ Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM. 13 (6): 377–387. doi:10.1145/362384.362685. S2CID 207549016.
^ Silberschatz, Abraham; Henry F. Korth; S. Sudarshan (2020). Database system concepts (Seventh ed.). New York. p. 56. ISBN 978-0-07-802215-9. OCLC 1080554130.{{cite book}}: CS1 maint: location missing publisher (link)
^ C. J. Date. "Edgar F. Codd - A.M. Turing Award Laureate". amturing.acm.org. Retrieved 2020-12-27.
^ C. J. Date and Hugh Darwen. "Databases, Types, and the Relational model: The Third Manifesto" (PDF). Retrieved 2024-07-04.
^ "Bmg documentation". Retrieved 2024-07-04.
^ Hector Garcia-Molina; Jeffrey D. Ullman; Jennifer Widom (2009). Database systems: the complete book (2nd ed.). Pearson Prentice Hall. ISBN 978-0-13-187325-4.

External links

RAT Relational Algebra Translator Free software to convert relational algebra to SQL
Lecture Videos: Relational Algebra Processing - An introduction to how database systems process relational algebra
Lecture Notes: Relational Algebra – A quick tutorial to adapt SQL queries into relational algebra
Relational – A graphic implementation of the relational algebra
Query Optimization This paper is an introduction into the use of the relational algebra in optimizing queries, and includes numerous citations for more in-depth study.
Relational Algebra System for Oracle and Microsoft SQL Server
Pireal – An experimental educational tool for working with Relational Algebra
DES – An educational tool for working with Relational Algebra and other formal languages
RelaX - Relational Algebra Calculator (open-source software available as an online service without registration)
RA: A Relational Algebra Interpreter
Translating SQL to Relational Algebra

[3] In Unicode, the join symbol is ⨝ (U+2A1D), and the bowtie symbol, occasionally used instead, is ⋈ (U+22C8).

[Codd1970-1] Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM. 13 (6): 377–387. doi:10.1145/362384.362685. S2CID 207549016.

[2] Silberschatz, Abraham; Henry F. Korth; S. Sudarshan (2020). Database system concepts (Seventh ed.). New York. p. 56. ISBN 978-0-07-802215-9. OCLC 1080554130.{{cite book}}: CS1 maint: location missing publisher (link)

[4] C. J. Date. "Edgar F. Codd - A.M. Turing Award Laureate". amturing.acm.org. Retrieved 2020-12-27.

[5] C. J. Date and Hugh Darwen. "Databases, Types, and the Relational model: The Third Manifesto" (PDF). Retrieved 2024-07-04.

[6] "Bmg documentation". Retrieved 2024-07-04.

[Garcia-MolinaUllman2009-7] Hector Garcia-Molina; Jeffrey D. Ullman; Jennifer Widom (2009). Database systems: the complete book (2nd ed.). Pearson Prentice Hall. ISBN 978-0-13-187325-4.

[1]

[2]

[a]

[3]

[4]

[5]

[6]

v t e Database management systems
Types	Object-oriented comparison Relational list comparison Key–value Column-oriented list Document-oriented Wide-column store Graph NoSQL NewSQL In-memory list Multi-model comparison Cloud Blockchain-based database
Concepts	Database ACID Armstrong's axioms Codd's 12 rules CAP theorem CRUD Null Candidate key Foreign key PACELC design principle Superkey Surrogate key Unique key
Objects	Relation table column row View Transaction Transaction log Trigger Index Stored procedure Cursor Partition
Components	Concurrency control Data dictionary JDBC XQJ ODBC Query language Query optimizer Query rewriting system Query plan
Functions	Administration Query optimization Replication Sharding
Related topics	Database models Database normalization Database storage Distributed database Federated database system Referential integrity Relational algebra Relational calculus Relational model Object–relational database Transaction processing
Category Outline