Data structures

B-tree
B-tree
Type	tree
Operation
Time complexity in big O notation
Operation	Average
Search	O(log n)
Insert	O(log n)
Delete	O(log n)
Space complexity
Space	O(n)

	Linked list	Array	Dynamic array	Balanced tree
Search	Θ(n)	Θ(1)	Θ(1)	Θ(log n)
Insert/delete at beginning	Θ(1)	—	Θ(n)	Θ(log n)
Insert/delete at end	Θ(1)	—	Θ(1) amortized	Θ(log n)
Insert/delete in middle	search time + Θ(1)	—	Θ(n)	Θ(log n)
Wasted space (average)	Θ(n)	0	Θ(n)	Θ(n)

Linked lists

A linked list whose nodes contain two fields: an integer value and a link to the next node

A linked list is a data structure used for collecting a sequence of objects, which allows efficient addition, removal and retrieval of elements from any position in the sequence. It is implemented as nodes, each of which contains a reference (i.e., a link) to the next and/or previous node in the sequence.

Hash tables

A hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought.

Separate chaining

In the strategy known as separate chaining, direct chaining, or simply chaining, each slot of the bucket array is a pointer to a linked list that contains the key-value pairs that hashed to the same location. Lookup requires scanning the list for an entry with the given key. Insertion requires adding a new entry record to either end of the list belonging to the hashed slot. Deletion requires searching the list and removing the element. (The technique is also called open hashing or closed addressing, which should not be confused with 'open addressing' or 'closed hashing'.)

Chained hash tables with linked lists are popular because they require only basic data structures with simple algorithms, and can use simple hash functions that are unsuitable for other methods.

The cost of a table operation is that of scanning the entries of the selected bucket for the desired key. If the distribution of keys is sufficiently uniform, the average cost of a lookup depends only on the average number of keys per bucket—that is, on the load factor.

For separate-chaining, the worst-case scenario is when all entries were inserted into the same bucket, in which case the hash table is ineffective and the cost is that of searching the bucket data structure. If the latter is a linear list, the lookup procedure may have to scan all its entries; so the worst-case cost is proportional to the number n of entries in the table.

Separate chaining with list heads

Some chaining implementations store the first record of each chain in the slot array itself. The purpose is to increase cache efficiency of hash table access. To save memory space, such hash tables often have about as many slots as stored entries, meaning that many slots have two or more entries

Trees

B tree

A B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic amortized time. The B-tree is a generalization of a binary search tree in that a node can have more than two children. (Comer, p. 123) Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and filesystems.

Search

Searching is similar to searching a binary search tree. Starting at the root, the tree is recursively traversed from top to bottom. At each level, the search chooses the child pointer (subtree) whose separation values are on either side of the search value.

A B Tree insertion example with each iteration.

Deletion

There are two popular strategies for deletion from a B-Tree.

locate and delete the item, then restructure the tree to regain its invariants

or

do a single pass down the tree, but before entering (visiting) a node, restructure the tree so that once the key to be deleted is encountered, it can be deleted without triggering the need for any further restructuring

The algorithm below uses the former strategy.

There are two special cases to consider when deleting an element:

the element in an internal node may be a separator for its child nodes
deleting an element may put its node under the minimum number of elements and children.

Each of these cases will be dealt with in order.

Deletion from a leaf node

Search for the value to delete.
If the value is in a leaf node, it can simply be deleted from the node,
If underflow happens, check siblings to either transfer a key or fuse the siblings together.
if deletion happened from right child retrieve the max value of left child if there is no underflow in left child
in vice-versa situation retrieve the min element from right

Deletion from an internal node

Each element in an internal node acts as a separation value for two subtrees, and when such an element is deleted, two cases arise. In the first case, both of the two child nodes to the left and right of the deleted element have the minimum number of elements, namely L−1. They can then be joined into a single node with 2L−2 elements, a number which does not exceed U−1 and so is a legal node. Unless it is known that this particular B-tree does not contain duplicate data, we must then also (recursively) delete the element in question from the new node.

In the second case, one of the two child nodes contains more than the minimum number of elements. Then a new separator for those subtrees must be found. Note that the largest element in the left subtree is still less than the separator. Likewise, the smallest element in the right subtree is the smallest element which is still greater than the separator. Both of those elements are in leaf nodes, and either can be the new separator for the two subtrees.

If the value is in an internal node, choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator.
This has deleted an element from a leaf node, and so is now equivalent to the previous case.

Rebalancing after deletion

If deleting an element from a leaf node has brought it under the minimum size, some elements must be redistributed to bring all nodes up to the minimum. In some cases the rearrangement will move the deficiency to the parent, and the redistribution must be applied iteratively up the tree, perhaps even to the root. Since the minimum element count doesn't apply to the root, making the root be the only deficient node is not a problem. The algorithm to rebalance the tree is as follows:^{[citation needed]}

If the right sibling has more than the minimum number of elements
- Add the separator to the end of the deficient node.
- Replace the separator in the parent with the first element of the right sibling.
- Append the first child of the right sibling as the last child of the deficient node
Otherwise, if the left sibling has more than the minimum number of elements.
- Add the separator to the start of the deficient node.
- Replace the separator in the parent with the last element of the left sibling.
- Insert the last child of the left sibling as the first child of the deficient node
If both immediate siblings have only the minimum number of elements
- Create a new node with all the elements from the deficient node, all the elements from one of its siblings, and the separator in the parent between the two combined sibling nodes.
- Remove the separator from the parent, and replace the two children it separated with the combined node.
- If that brings the number of elements in the parent under the minimum, repeat these steps with that deficient node, unless it is the root, since the root is permitted to be deficient.

The only other case to account for is when the root has no elements and one child. In this case it is sufficient to replace it with its only child.

Binary search is typically (but not necessarily) used within nodes to find the separation values and child tree of interest.