Binary search tree
| Binary search tree | |
| --- | --- |
| Type | tree |
| Invented | 1960 |
| Invented by | P.F. Windley, A.D. Booth, A.J.T. Colin, and T.N. Hibbard |

In computer science, a binary search tree (BST), sometimes also called an ordered or sorted binary tree, is a node-based binary tree data structure in which each node has a comparable key (and an associated value) and satisfies the restriction that the key in any node is larger than the keys in all nodes in that node's left sub-tree and smaller than the keys in all nodes in that node's right sub-tree. Each node has no more than two child nodes, and each child must either be a leaf node or the root of another binary search tree. The left sub-tree contains only nodes with keys less than the parent node's key; the right sub-tree contains only nodes with keys greater than the parent node's key. BSTs are also dynamic data structures: the size of a BST is limited only by the amount of free memory in the operating system. The main advantage of binary search trees is that they remain ordered, which provides quicker search times than many other data structures. The common properties of binary search trees are as follows: [1]
- The left subtree of a node contains only nodes with keys less than the node's key.
- The right subtree of a node contains only nodes with keys greater than the node's key.
- The left and right subtree each must also be a binary search tree.
- Each node can have up to two successor nodes.
- There must be no duplicate nodes.
- A unique path exists from the root to every other node.
Generally, the information represented by each node is a record rather than a single data element. However, for sequencing purposes, nodes are compared according to their keys rather than any part of their associated records.
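As a minimal sketch of this idea (the class and field names below are illustrative, not taken from the article or any particular library), a node can carry a whole record while ordering decisions use only the key:

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class BSTNode:
    key: int                        # comparable key used for ordering
    record: Any = None              # associated record; never used in comparisons
    left: Optional["BSTNode"] = None
    right: Optional["BSTNode"] = None

# Placement decisions compare keys only: new_key < node.key selects the left
# subtree, new_key > node.key the right subtree; the record is just carried along.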
The major advantage of binary search trees over other data structures is that the related sorting algorithms and search algorithms such as in-order traversal can be very efficient. The other advantages are:
- Insertion and deletion are fast when the tree is kept balanced.
- Searching is very efficient, and the code is simpler than for many comparable data structures.
- Keys are stored in the nodes in such a way that searching, insertion and deletion can all be done efficiently.
- The implementation of a binary search tree is simple.
- The nodes are allocated dynamically, so the tree can grow and shrink at run time.
Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays. Some of their disadvantages are as follows:
- The shape of the binary search tree depends entirely on the order of insertions, and it can degenerate.
- When inserting or searching for an element, the key of each visited node has to be compared with the key of the element to be inserted or found, so searching can become slow in a tall tree.
- The keys may be long, which increases the cost of each comparison and hence the run time.
- After a long intermixed sequence of random insertions and deletions, the expected height of the tree approaches the square root of the number of keys, √n, which grows much faster than log n.
Determining whether a tree is a BST or not
Sometimes we already have a binary tree, and we need to determine whether it is a BST. This is an interesting problem which has a simple recursive solution.
The BST property—every node in the right subtree has to be larger than the current node and every node in the left subtree has to be smaller than the current node (equality should not arise, since only unique values are stored in the tree)—is the key to figuring out whether a tree is a BST or not. On first thought it might look like we can simply traverse the tree, checking at every node whether the node contains a value larger than the value at its left child and smaller than the value at its right child, and conclude that we have a BST if this condition holds for all nodes. This is the so-called "greedy approach", making a decision based on local properties alone. But this approach clearly won't work for the following tree:
    20
   /  \
 10    30
      /  \
     5    40
In the tree above, each node does satisfy the condition that it contains a value larger than its left child and smaller than its right child, and yet it is not a BST: the value 5 is in the right subtree of the node containing 20, a violation of the BST property.
How do we solve this? It turns out that instead of making a decision based solely on the values of a node and its children, we also need information flowing down from the parent as well. In the case of the tree above, if we could remember about the node containing the value 20, we would see that the node with value 5 is violating the BST property contract.
So the condition we need to check at each node is:
- if the node is the left child of its parent, then it must be smaller than (or equal to) the parent, and it must pass down the value of its parent to its own right subtree to make sure none of the nodes in that subtree is greater than the parent; and similarly,
- if the node is the right child of its parent, then it must be larger than the parent, and it must pass down the value of its parent to its own left subtree to make sure none of the nodes in that subtree is less than the parent.
A simple but elegant recursive solution in C++ can explain this further:
struct TreeNode {
    int data;
    TreeNode *left;
    TreeNode *right;
};

// Returns true when every key in the subtree rooted at 'node' lies within
// [minData, maxData]; the allowed range shrinks as the recursion descends.
bool isBST(TreeNode *node, int minData, int maxData) {
    if (node == NULL) return true;
    if (node->data < minData || node->data > maxData) return false;
    return isBST(node->left, minData, node->data) && isBST(node->right, node->data, maxData);
}
The initial call to this function can be something like this:
if (isBST(root, INT_MIN, INT_MAX)) {  // INT_MIN and INT_MAX come from <climits>
    puts("This is a BST.");
} else {
    puts("This is NOT a BST!");
}
Essentially we keep track of a valid range (starting from [INT_MIN, INT_MAX]) and keep shrinking it for each node as we go down recursively.
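For comparison, here is a Python sketch of the same range-shrinking check (assuming nodes with data, left and right attributes, mirroring the C++ struct above); using ±infinity as the initial bounds sidesteps the corner case where a key equals INT_MIN or INT_MAX:

def is_bst(node, min_data=float("-inf"), max_data=float("inf")):
    # An empty subtree is trivially a BST.
    if node is None:
        return True
    # The key must lie within the range handed down by the ancestors.
    if node.data < min_data or node.data > max_data:
        return False
    # Shrink the range: left keys must stay below node.data, right keys above it.
    return (is_bst(node.left, min_data, node.data) and
            is_bst(node.right, node.data, max_data))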
Difference between binary tree and binary search tree
Binary tree: In short, a binary tree is a tree in which each node has at most two children. In a binary tree, the left and right children may contain values that are greater than, less than, or equal to the parent node's value.
  3
 / \
4   5
Binary search tree: A binary search tree imposes the additional condition that, for every node, all keys in its left subtree are less than or equal to the node's key, which in turn is less than all keys in its right subtree.
  4
 / \
3   9
Operations
Operations, such as find, on a binary search tree require comparisons between nodes. These comparisons are made with calls to a comparator, which is a subroutine that computes the total order (linear order) on any two keys. This comparator can be explicitly or implicitly defined, depending on the language in which the binary search tree was implemented. A common comparator is the less-than function, for example a < b, where a and b are the keys of two nodes in a binary search tree.
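As an illustrative sketch (the function and field names here are assumptions, not taken from a specific library), a search routine can accept the comparator explicitly; operator.lt reproduces the usual a < b ordering:

import operator

def find_with_comparator(node, key, less=operator.lt):
    # Iterative search over nodes exposing key, left and right attributes.
    while node is not None:
        if less(key, node.key):        # key orders before node.key: go left
            node = node.left
        elif less(node.key, key):      # key orders after node.key: go right
            node = node.right
        else:                          # neither orders first, so the keys are equal
            return node
    return None                        # key not present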
Searching
Searching a binary search tree for a specific key can be a recursive or an iterative process.
We begin by examining the root node. If the tree is null, the key we are searching for does not exist in the tree. Otherwise, if the key equals that of the root, the search is successful and we return the node. If the key is less than that of the root, we search the left subtree. Similarly, if the key is greater than that of the root, we search the right subtree. This process is repeated until the key is found or the remaining subtree is null. If the searched key is not found before a null subtree is reached, then the item must not be present in the tree. This is easily expressed as a recursive algorithm:
function Find-recursive(key, node):  // call initially with node = root
    if node = Null or node.key = key then
        return node
    else if key < node.key then
        return Find-recursive(key, node.left)
    else
        return Find-recursive(key, node.right)
The same algorithm can be implemented iteratively:
function Find(key, root):
    current-node := root
    while current-node is not Null do
        if current-node.key = key then
            return current-node
        else if key < current-node.key then
            current-node := current-node.left
        else
            current-node := current-node.right
    return Null
Because in the worst case this algorithm must search from the root of the tree to the leaf farthest from the root, the search operation takes time proportional to the tree's height (see tree terminology). On average, binary search trees with n nodes have O(log n) height. However, in the worst case, binary search trees can have O(n) height, when the unbalanced tree resembles a linked list (degenerate tree).
Insertion
Insertion begins as a search would begin; if the key is not equal to that of the root, we search the left or right subtree as before. Eventually, we reach an external node and add the new key-value pair (here encoded as a record 'newNode') as its right or left child, depending on the new node's key. In other words, we examine the root and recursively insert the new node into the left subtree if its key is less than that of the root, or into the right subtree if its key is greater than or equal to the root's.
Here's how a typical binary search tree insertion might be performed in C++:
void insert(Node*& root, int data) {
    if (!root)                        // empty spot reached: attach a new node here
        root = new Node(data);
    else if (data < root->data)
        insert(root->left, data);
    else if (data > root->data)
        insert(root->right, data);
    // note: duplicate keys are silently ignored
}
The above destructive procedural variant modifies the tree in place. It uses only constant heap space (and the iterative version uses constant stack space as well), but the prior version of the tree is lost. Alternatively, as in the following Python example, we can reconstruct all ancestors of the inserted node; any reference to the original tree root remains valid, making the tree a persistent data structure:
def binary_tree_insert(node, key, value):
    if node is None:
        return TreeNode(None, key, value, None)
    if key == node.key:
        return TreeNode(node.left, key, value, node.right)
    if key < node.key:
        return TreeNode(binary_tree_insert(node.left, key, value), node.key, node.value, node.right)
    else:
        return TreeNode(node.left, node.key, node.value, binary_tree_insert(node.right, key, value))
The part that is rebuilt uses O(log n) space in the average case and O(n) in the worst case (see big-O notation).
In either version, this operation requires time proportional to the height of the tree in the worst case, which is O(log n) time in the average case over all trees, but O(n) time in the worst case.
Another way to explain insertion is that in order to insert a new node in the tree, its key is first compared with that of the root. If its key is less than the root's, it is then compared with the key of the root's left child. If its key is greater, it is compared with the root's right child. This process continues, until the new node is compared with a leaf node, and then it is added as this node's right or left child, depending on its key.
There are other ways of inserting nodes into a binary tree, but this is the only way of inserting nodes at the leaves and at the same time preserving the BST structure.
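As a sketch of the iterative, in-place variant mentioned above (assuming a mutable TreeNode(left, key, value, right) node like the one used in the persistent example, and ignoring duplicate keys as the C++ version does):

def binary_tree_insert_iterative(root, key, value=None):
    new_node = TreeNode(None, key, value, None)   # the leaf to be attached
    if root is None:
        return new_node
    node = root
    while True:
        if key < node.key:
            if node.left is None:                 # empty spot found on the left
                node.left = new_node
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:                # empty spot found on the right
                node.right = new_node
                return root
            node = node.right
        else:                                     # duplicate key: leave the tree unchanged
            return root

This variant uses constant extra space, since it neither recurses nor rebuilds any ancestors.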
Deletion
There are three possible cases to consider:
- Deleting a leaf (node with no children): Deleting a leaf is easy, as we can simply remove it from the tree.
- Deleting a node with one child: Remove the node and replace it with its child.
- Deleting a node with two children: Call the node to be deleted N. Do not delete N. Instead, choose either its in-order successor node or its in-order predecessor node, R. Copy the value of R to N, then recursively call delete on R until reaching one of the first two cases. If you choose the in-order successor, note that because the right subtree is non-empty (the node has two children), the successor is the node with the least value in that right subtree; it has at most one subtree, so deleting it falls into one of the first two cases.
Broadly speaking, nodes with children are harder to delete. As with all binary trees, a node's in-order successor is its right subtree's left-most child, and a node's in-order predecessor is the left subtree's right-most child. In either case, this node will have zero or one children. Delete it according to one of the two simpler cases above.

Consistently using the in-order successor or the in-order predecessor for every instance of the two-child case can lead to an unbalanced tree, so some implementations select one or the other at different times.
Runtime analysis: Although this operation does not always traverse the tree down to a leaf, this is always a possibility; thus in the worst case it requires time proportional to the height of the tree. It does not require more even when the node has two children, since it still follows a single path and does not visit any node twice. The following Python methods, written for a node class that stores key, left_child, right_child and a parent pointer, implement the three cases above:
def find_min(self):  # Gets the node with the minimum key (the leftmost node) in a subtree
    current_node = self
    while current_node.left_child:
        current_node = current_node.left_child
    return current_node

def replace_node_in_parent(self, new_value=None):
    if self.parent:
        if self == self.parent.left_child:
            self.parent.left_child = new_value
        else:
            self.parent.right_child = new_value
    if new_value:
        new_value.parent = self.parent

def binary_tree_delete(self, key):
    if key < self.key:
        self.left_child.binary_tree_delete(key)
    elif key > self.key:
        self.right_child.binary_tree_delete(key)
    else:  # delete the key here
        if self.left_child and self.right_child:  # if both children are present
            successor = self.right_child.find_min()
            self.key = successor.key
            successor.binary_tree_delete(successor.key)
        elif self.left_child:  # if the node has only a *left* child
            self.replace_node_in_parent(self.left_child)
        elif self.right_child:  # if the node has only a *right* child
            self.replace_node_in_parent(self.right_child)
        else:  # this node has no children
            self.replace_node_in_parent(None)
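The methods above are written against a node class that is not defined in the article; a minimal sketch of such a class (attribute names taken from the code above, constructor shape assumed) could be:

class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent        # back-pointer used by replace_node_in_parent
        self.left_child = None
        self.right_child = None

    # find_min, replace_node_in_parent and binary_tree_delete shown above
    # would be defined as methods of this class.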
Traversal
Once the binary search tree has been created, its elements can be retrieved in order by recursively traversing the left subtree of the root node, accessing the node itself, then recursively traversing the right subtree of the node, continuing this pattern with each node in the tree as it is recursively accessed. As with all binary trees, one may conduct a pre-order traversal or a post-order traversal, but neither is likely to be useful for binary search trees. An in-order traversal of a binary search tree will always result in a sorted list of node items (numbers, strings or other comparable items).
The code for in-order traversal in Python is given below. It will call callback for every node in the tree.
def traverse_binary_tree(node, callback):
    if node is None:
        return
    traverse_binary_tree(node.leftChild, callback)
    callback(node.value)
    traverse_binary_tree(node.rightChild, callback)
The example tree referred to in the lead section of this article is the following:

        8
      /   \
     3     10
    / \      \
   1   6      14
      / \    /
     4   7  13

With respect to this tree,
- The pre-order traversal is: 8, 3, 1, 6, 4, 7, 10, 14, 13.
- The in-order traversal is: 1, 3, 4, 6, 7, 8, 10, 13, 14.
- The post-order traversal is: 1, 4, 7, 6, 3, 13, 14, 10, 8.
Traversal requires O(n) time, since it must visit every node; because any traversal must visit all n nodes, this is asymptotically optimal.
Sort

A binary search tree can be used to implement a simple but efficient sorting algorithm. Similar to heapsort, we insert all the values we wish to sort into a new ordered data structure—in this case a binary search tree—and then traverse it in order, building our result:
def build_binary_tree(values):
    tree = None
    for v in values:
        tree = binary_tree_insert(tree, v, v)  # use the value itself as both key and payload
    return tree
def get_inorder_traversal(root):
    '''
    Returns a list containing all the values in the tree, starting at *root*.
    Traverses the tree in-order (leftChild, root, rightChild).
    '''
    result = []
    traverse_binary_tree(root, lambda element: result.append(element))
    return result
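Putting the two helpers together yields a complete, if naive, tree sort; the name treesort below is illustrative, and the sketch assumes the node attribute names used by binary_tree_insert and traverse_binary_tree agree (the snippets above mix left/right and leftChild/rightChild):

def treesort(values):
    # Build the tree by repeated insertion, then read it back in sorted order.
    return get_inorder_traversal(build_binary_tree(values))

print(treesort([5, 1, 4, 2, 3]))    # expected output: [1, 2, 3, 4, 5]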
The worst-case time of build_binary_tree is O(n²): if you feed it a sorted list of values, it chains them into a linked list with no left subtrees. For example, build_binary_tree([1, 2, 3, 4, 5]) yields the tree (1 (2 (3 (4 (5))))).
There are several schemes for overcoming this flaw with simple binary trees; the most common is the self-balancing binary search tree. If this same procedure is done using such a tree, the overall worst-case time is O(n log n), which is asymptotically optimal for a comparison sort. In practice, the poor cache performance and added overhead in time and space for a tree-based sort (particularly for node allocation) make it inferior to other asymptotically optimal sorts such as heapsort for static list sorting. On the other hand, it is one of the most efficient methods of incremental sorting, adding items to a list over time while keeping the list sorted at all times.
Types
There are many types of binary search trees. AVL trees and red-black trees are both forms of self-balancing binary search trees. A splay tree is a binary search tree that automatically moves frequently accessed elements nearer to the root. In a treap (tree heap), each node also holds a (randomly chosen) priority and the parent node has higher priority than its children. Tango trees are trees optimized for fast searches.
Two other terms used to describe binary search trees are complete and degenerate.
A complete binary tree is a binary tree that is completely filled, with the possible exception of the bottom level, which is filled from left to right; in other words, all nodes are as far left as possible. In a complete binary tree with the root at level 0, every level d above the bottom level contains exactly 2^d nodes, meaning that all possible nodes exist at these levels. For the bottom level, not every node has to exist, but the nodes that do exist must fill in from left to right.
A degenerate tree is a tree where each parent node has only one child node. It is unbalanced and, in the worst case, performance degrades to that of a linked list. If the insertion function does not handle re-balancing, a degenerate tree is easily constructed by feeding it data that is already sorted; in a performance measurement, such a tree will essentially behave like a linked list.
Performance comparisons
D. A. Heger (2004)[2] presented a performance comparison of binary search trees. Treap was found to have the best average performance, while red-black tree was found to have the smallest amount of performance variations.
Optimal binary search trees

If we do not plan on modifying a search tree, and we know exactly how often each item will be accessed, we can construct[3] an optimal binary search tree, which is a search tree where the average cost of looking up an item (the expected search cost) is minimized.
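As a sketch of the underlying idea only (not the specific construction cited above), the classic dynamic program computes the minimal expected cost over all tree shapes, assuming the keys are given in sorted order together with their access frequencies and that every search is successful:

def optimal_bst_cost(freq):
    # freq[i] is the access frequency of the i-th smallest key.
    n = len(freq)
    prefix = [0] * (n + 1)                       # prefix sums for O(1) range weights
    for i, f in enumerate(freq):
        prefix[i + 1] = prefix[i] + f
    # cost[i][j] = minimal weighted search cost for the keys i .. j-1
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            weight = prefix[j] - prefix[i]       # every key in the range gains one level of depth
            cost[i][j] = weight + min(
                cost[i][r] + cost[r + 1][j]      # try each key r as the root of this subtree
                for r in range(i, j)
            )
    return cost[0][n]

This straightforward version runs in O(n³) time; Knuth's refinement of the same recurrence brings it down to O(n²).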
Even if we only have estimates of the search costs, such a system can considerably speed up lookups on average. For example, if you have a BST of English words used in a spell checker, you might balance the tree based on word frequency in text corpora, placing words like the near the root and words like agerasia near the leaves. Such a tree might be compared with Huffman trees, which similarly seek to place frequently used items near the root in order to produce a dense information encoding; however, Huffman trees store data elements only in leaves, and these elements need not be ordered.
If we do not know the sequence in which the elements in the tree will be accessed in advance, we can use splay trees which are asymptotically as good as any static search tree we can construct for any particular sequence of lookup operations.
Alphabetic trees are Huffman trees with the additional constraint on order, or, equivalently, search trees with the modification that all elements are stored in the leaves. Faster algorithms exist for optimal alphabetic binary trees (OABTs).
References
- ^ Gilberg, R.; Forouzan, B. (2001), "8", Data Structures: A Pseudocode Approach With C++, Pacific Grove, CA: Brooks/Cole, p. 339, ISBN 0-534-95216-X
- ^ Heger, Dominique A. (2004), "A Disquisition on The Performance Behavior of Binary Search Tree Data Structures" (PDF), European Journal for the Informatics Professional, 5 (5): 67–75
- ^ Gonnet, Gaston. "Optimal Binary Search Trees". Scientific Computation. ETH Zürich. Retrieved 1 December 2013.
Further reading
This article incorporates public domain material from Paul E. Black. "Binary Search Tree". Dictionary of Algorithms and Data Structures. NIST.
- Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "12: Binary search trees, 15.5: Optimal binary search trees". Introduction to Algorithms (2nd ed.). MIT Press & McGraw-Hill. pp. 253–272, 356–363. ISBN 0-262-03293-7.
- Jarc, Duane J. (3 December 2005). "Binary Tree Traversals". Interactive Data Structure Visualizations. University of Maryland.
- Knuth, Donald (1997). "6.2.2: Binary Tree Searching". The Art of Computer Programming. Vol. 3: "Sorting and Searching" (3rd ed.). Addison-Wesley. pp. 426–458. ISBN 0-201-89685-0.
- Long, Sean. "Binary Search Tree" (PPT). Data Structures and Algorithms Visualization-A PowerPoint Slides Based Approach. SUNY Oneonta.
- Parlante, Nick (2001). "Binary Trees". CS Education Library. Stanford University.
External links
- Literate implementations of binary search trees in various languages on LiteratePrograms
- Goleta, Maksim (27 November 2007). "Goletas.Collections". goletas.com. Includes an iterative C# implementation of AVL trees.
- Jansens, Dana. "Persistent Binary Search Trees". Computational Geometry Lab, School of Computer Science, Carleton University. C implementation using GLib.
- Binary Tree Visualizer (JavaScript animation of various BT-based data structures)
- Kovac, Kubo. "Binary Search Trees" (Java applet). Korešpondenčný seminár z programovania.
- Madru, Justin (18 August 2009). "Binary Search Tree". JDServer. C++ implementation.
- Tarreau, Willy (2011). "Elastic Binary Trees (ebtree)". 1wt.eu.
- Binary Search Tree Example in Python
- "References to Pointers (C++)". MSDN. Microsoft. 2005. Gives an example binary tree implementation.
- Igushev, Eduard. "Binary Search Tree C++ implementation".
- Stromberg, Daniel. "Python Search Tree Empirical Performance Comparison".