Join (SQL)

A join combines records from two tables in a relational database and results in a new (temporary) table, also called joined table. In the Structured Query Language (SQL), there are two types of joins: inner and outer.

As a special case, a table (base table, view, or joined table) can be joined with itself again. This is called self-join.

Mathematically, join is relation composition, the fundamental operation in relational algebra, and generalizing function composition.

Join form

Sample Tables

Employee Table
LastName	DepartmentID
Rafferty	31
Jones	33
Steinberg	33
Robinson	34
Smith	34
Jasper	36

Department Table
DepartmentName	DepartmentID
Sales	31
Engineering	33
Clerical	34
Marketing	35

Inner join

An inner join essentially finds the intersection between the two tables. This is the most common type of join used, and is considered the default join type. The join example below takes all the records from table A (in this case, employee) and finds the matching record(s) in table B (department) based on the join predicate. If no match is found, the record from A is not included in the joined table. If multiple results are found in B that match the predicate (condition given), then one row will be returned for each (the values from A will be repeated).

Inner joins can be further classified into equi-joins, natural joins, and cross joins.

Special care must be taken when joining tables on columns that can be NULL since NULL values will never match each other, unless the join condition uses explicitly the IS NULL or IS NOT NULL predicates. NULL equality in joins can also be simulated to some degree by using the SQL COALESCE function or CASE expressions.

Example of an explicit inner join:

SELECT *
FROM   employee 
       INNER JOIN department 
          ON employee.DepartmentID = department.DepartmentID

Example of an implicit inner join:

SELECT *  
FROM   employee, department 
WHERE  employee.DepartmentID = department.DepartmentID

Inner join result:

Employee.LastName	Employee.DepartmentID	Department.DepartmentName	Department.DepartmentID
Smith	34	Clerical	34
Jones	33	Engineering	33
Robinson	34	Clerical	34
Steinberg	33	Engineering	33
Rafferty	31	Sales	31

Notice that employee Jasper and department Marketing do not appear. Neither of these records have accompanying rows in their associative tables, and are thus omitted from the inner join result.

Equi-Join

Equi Join = Inner Join : they use equality "=" of common attributes to join tables

Natural Join

NATURAL JOIN allows you to leave off the join condition completely if the tables concerned share common column names on which you want to join. However, it's ambiguous at best and leaves you open to problems if columns get added or renamed, no more than two tables can be joined using this method, and it gives you little control over the specifics of a join if columns join across the tables in an unusual way.

One thing to be aware of though is the NATURAL JOIN facility, which allows you to miss off the join condition completely if the tables concerned share common column names on which you want to join. For example, if your STUDENT and CLASS table share a common STUDENT_NO column, you could NATURAL JOIN them thus;

SELECT student_name, class_name
FROM   student
       NATURAL JOIN class

However, it's ambiguous at best and leaves you open to problems if columns get added or renamed, no more than two tables can be joined using this method, and it gives you little control over the specifics of a join if columns join across the tables in an unusual way. Tom's advice with NATURAL JOINs- forget that they exist.

Cross join

While not used very commonly, a cross join is the foundation upon which inner joins are built. A cross join returns the cartesian product of the sets of rows from the joined tables. Thus, it is an inner join where the join condition always evaluates to True.

If A and B are two sets then cross join = A X B.

The SQL code for a cross join lists the tables to be joined (FROM), but does not include any filtering join predicate (WHERE).

Example of an explicit cross join:

SELECT *
FROM   employee
       CROSS JOIN department

Example of an implicit cross join:

SELECT *
FROM   employee, department;

Employee.LastName	Employee.DepartmentID	Department.DepartmentName	Department.DepartmentID
Rafferty	31	Sales	31
Jones	33	Sales	31
Steinberg	33	Sales	31
Smith	34	Sales	31
Robinson	34	Sales	31
Jasper	36	Sales	31
Rafferty	31	Engineering	33
Jones	33	Engineering	33
Steinberg	33	Engineering	33
Smith	34	Engineering	33
Robinson	34	Engineering	33
Jasper	36	Engineering	33
Rafferty	31	Clerical	34
Jones	33	Clerical	34
Steinberg	33	Clerical	34
Smith	34	Clerical	34
Robinson	34	Clerical	34
Jasper	36	Clerical	34
Rafferty	31	Marketing	35
Jones	33	Marketing	35
Steinberg	33	Marketing	35
Smith	34	Marketing	35
Robinson	34	Marketing	35
Jasper	36	Marketing	35

As you can see the cross join does not apply any predicate when matching records for the joined table. These joins are almost never used, except to generate all possible combinations of records from tables that do not share a common element. Still, the results of a cross joins can be further filtered.

Outer Joins

Outer joins are subdivided further into left outer joins, right outer joins, and full outer joins.

Left Outer Join

A left outer join is very different from an inner join. Instead of limiting results to those in both tables, it limits results to those in the "left" table (A). This means that if the ON clause matches 0 records in B, a row in the result will still be returned — but with NULL in each column from B.

A left outer join returns all the values from left table + matched values from right table (or NULL in case of no matching value).

For example, this allows us to find the employee's departments, but still show the employee even when their department is NULL or does not exist (contrary to the inner join example above, where employees in non-existent departments were ignored).

Example of a left outer join:

SELECT distinct *  
FROM   employee 
       LEFT OUTER JOIN department 
          ON employee.DepartmentID = department.DepartmentID

Employee.LastName	Employee.DepartmentID	Department.DepartmentName	Department.DepartmentID
Jones	33	Engineering	33
Rafferty	31	Sales	31
Robinson	34	Clerical	34
Smith	34	Clerical	34
Jasper	36	NULL	NULL
Steinberg	33	Engineering	33

Right Outer Join

A right outer join is much like a left outer join, except that the tables are reversed. Every record from the right side, B or department, will be returned, and NULL will be returned for columns from A for those rows that have no matching record in A.

A right outer join returns all the values from right table and matched values from left table (or NULL in case of no matching value).

Example right outer join:

SELECT * 
FROM   employee 
       RIGHT OUTER JOIN department 
          ON employee.DepartmentID = department.DepartmentID

Employee.LastName	Employee.DepartmentID	Department.DepartmentName	Department.DepartmentID
Smith	34	Clerical	34
Jones	33	Engineering	33
Robinson	34	Clerical	34
Steinberg	33	Engineering	33
Rafferty	31	Sales	31
NULL	NULL	Marketing	35

Full Outer Join

A full outer join combines the results of both left and right outer joins. These joins will show records from both tables, and fill in NULLs for missing matches on either side.

Some database systems do not support this functionality, but it can be emulated through the use of left and right outer joins and unions (see below).

Example full outer join:

SELECT *  
FROM   employee 
       FULL OUTER JOIN department 
          ON employee.DepartmentID = department.DepartmentID

LastName	Department.DepartmentID	DepartmentName	Department.DepartmentID
Smith	34	Clerical	34
Jones	33	Engineering	33
Robinson	34	Clerical	34
Jasper	36	NULL	NULL
Steinberg	33	Engineering	33
Rafferty	31	Sales	31
NULL	NULL	Marketing	35

The same example, for use on databases that do not support FULL OUTER JOIN:

SELECT employee.LastName,
       employee.DepartmentID,  
       department.DepartmentName,
       department.DepartmentID
FROM   employee 
       LEFT JOIN department 
          ON employee.DepartmentID = department.DepartmentID
UNION
SELECT employee.LastName,
       employee.DepartmentID,  
       department.DepartmentName,
       department.DepartmentID
FROM   employee
       RIGHT JOIN department
         ON employee.DepartmentID = department.DepartmentID
WHERE  employee.DepartmentID IS NULL

Implementation

The efficient implementation of joins has been the goal of much work in database systems, because joins are both extremely common and difficult to execute efficiently. The difficulty results from the fact that (inner) joins are both commutative and associative. In practice, this means that the user merely supplies the list of tables to be joined and the join conditions to be used, and the database system has the task of determining the most efficient way to perform the operation. Determining how to execute a query containing joins is done by the query optimizer. It has two basic freedoms:

Join order: Because joins are commutative, the order in which tables are joined does not change the final result set of the query. However, join order does have an enormous impact on the cost of the join operation, so choosing the best join order is very important.
Join method: Given two tables and a join condition, there are multiple algorithms to produce the result set of the join. Which algorithm is most efficient depends on the sizes of the input tables, the number of rows from each table that match the join condition, and the operations required by the rest of the query.

Many join algorithms treat their inputs differently. The inputs to a join are referred to as the outer and inner join operands, or left and right, respectively. In the case of nested loops, for example, the entire inner relation will be scanned for each row of the outer relation. Query plans involving joins can be classified as:

Left-deep: the inner operand of each join in the plan is a base table (rather than another join).
Right-deep: the outer operand of each join in the plan is a base table.
Bushy: neither left-deep nor right-deep; both inputs to a join may be joins themselves.

These names are derived from the appearance of the query plan if drawn as a tree, with the outer join relation on the left and the inner relation on the right (as is the convention).

Join algorithms

There are three fundamental algorithms to perform a join operation.

Nested loops

This is the simplest join algorithm. For each tuple in the outer join relation, the entire inner join relation is scanned, and any tuples that match the join condition are added to the result set. Naturally, this algorithm performs poorly if either the inner or outer join relation is very large. The performance can be enhanced if the inner relation has an index on joining column.

A refinement to this technique is called "block nested loops" (BNL): for every block in the outer relation, the entire inner relation is scanned. For each match between the current inner tuple and one of the tuples in the current block of the outer relation, a tuple is added to the join result set. This variant means that more computation is done for each tuple of the inner relation, but far fewer scans of the inner relation are required.

Merge join

If both join relations are sorted by the join attribute, the join can be performed trivially:

For each tuple in the outer relation,
1. Consider the current "group" of tuples from the inner relation; a group consists of a set of contiguous tuples in the inner relation with the same value in the join attribute.
2. For each matching tuple in the current inner group, add a tuple to the join result. Once the inner group has been exhausted, both the inner and outer scans can be advanced to the next group.

This is one reason why many optimizers keep track of the sort order of query nodes — if one or both input relations to a merge join is already sorted on the join attribute, an additional sort is not required. Otherwise, the DBMS will need to perform the sort, usually using an external sort to avoid consuming too much memory.

Vorlage:Seealso

Hash join

Join optimization

Description coming soon

Semi join

A semi join is an optimization technique for joins on distributed databases. The join predicates are applied in multiple phases, starting with the earliest possible. This can reduce the size of the intermediate results that must be exchanged with remote nodes, thus reducing inter node network traffic. It can be improved with a Bloom filter (hashing).

External links

Various Join Algorithm Implementations

Vorlage:Databases