User:Xionbox/Algorithms
Below is a summary of algorithms and data structures worth knowing.
Analysis
Amortized analysis
Amortized analysis is a method of analyzing algorithms that considers the entire sequence of operations of the program. It allows for the establishment of a worst-case bound for the performance of an algorithm irrespective of the inputs by looking at all of the operations. At the heart of the method is the idea that while certain operations may be extremely costly in resources, they cannot occur at a high-enough frequency to weigh down the entire program because the number of less costly operations will far outnumber the costly ones in the long run, "paying back" the program over a number of iterations. It is particularly useful because it guarantees worst-case performance while accounting for the entire set of operations in an algorithm.
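As a concrete sketch (illustrative, not part of the original page), the classic example is a dynamic array that doubles its capacity when full: an individual append is occasionally expensive because a resize copies every element, but doubling guarantees that n appends cost O(n) in total, so the amortized cost per append is O(1).

```python
class DynamicArray:
    """Append-only dynamic array that doubles its capacity when full."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity

    def append(self, value):
        if self.size == self.capacity:
            # Costly case: the O(n) copy happens only after n cheap appends,
            # so the total cost of n appends is O(n), i.e. O(1) amortized each.
            self.capacity *= 2
            new_data = [None] * self.capacity
            new_data[:self.size] = self.data
            self.data = new_data
        self.data[self.size] = value   # cheap case: O(1)
        self.size += 1
```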
Big O
Big-O notation (Landau notation) describes the limiting behavior of a function as its argument tends towards a particular value or infinity, usually in terms of simpler functions. Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented using the same O notation.
Notation | Name | Example
---|---|---
O(1) | constant | Determining if a number is even or odd; using a constant-size lookup table or hash table
O(log log n) | double logarithmic | Finding a key in an array sorted on the keys with interpolation search
O(log n) | logarithmic | Finding an item in a sorted array with a binary search or a balanced search tree, as well as all operations in a binomial heap
O(n^c), 0 < c < 1 | fractional power | Searching in a k-d tree
O(n) | linear | Finding an item in an unsorted list, a malformed tree (worst case), or an unsorted array; adding two n-bit integers by ripple carry
O(n log n) | linearithmic, loglinear, or quasilinear | Performing a fast Fourier transform; heapsort, quicksort (best and average case), or merge sort
O(n^2) | quadratic | Multiplying two n-digit numbers by a simple algorithm; bubble sort (worst case or naive implementation), Shell sort, quicksort (worst case), selection sort, or insertion sort
O(n^c), c > 1 | polynomial or algebraic | Tree-adjoining grammar parsing; maximum matching for bipartite graphs
L_n[α, c], 0 < α < 1 | L-notation or sub-exponential | Factoring a number using the quadratic sieve or number field sieve
O(c^n), c > 1 | exponential | Finding the (exact) solution to the traveling salesman problem using dynamic programming; determining if two logical statements are equivalent using brute-force search
O(n!) | factorial | Solving the traveling salesman problem via brute-force search; generating all unrestricted permutations of a poset; finding the determinant with expansion by minors
The statement "f(n) is O(n!)" is sometimes weakened to f(n) = O(n^n) to derive simpler formulas for asymptotic complexity. For any k > 0 and c > 0, O(n^c (log n)^k) is a subset of O(n^(c+ε)) for any ε > 0, so O(n^c (log n)^k) may be considered as a polynomial with some bigger order.
Data structures
Operation | Linked list | Array | Dynamic array | Balanced tree
---|---|---|---|---
Indexing | Θ(n) | Θ(1) | Θ(1) | Θ(log n)
Insert/delete at beginning | Θ(1) | — | Θ(n) | Θ(log n)
Insert/delete at end | Θ(1) | — | Θ(1) amortized | Θ(log n)
Insert/delete in middle | search time + Θ(1) | — | Θ(n) | Θ(log n)
Wasted space (average) | Θ(n) | 0 | Θ(n) | Θ(n)
Linked lists
A linked list is a data structure used for collecting a sequence of objects, which allows efficient insertion and removal of elements at any position in the sequence (given a reference to that position), though access by index requires linear time. It is implemented as nodes, each of which contains a reference (i.e., a link) to the next and/or previous node in the sequence.
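A minimal sketch of a singly linked node with O(1) insertion at the front (names are illustrative):

```python
class Node:
    """A singly linked list node holding a value and a link to the next node."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def prepend(head, value):
    """Insert at the front in O(1) and return the new head."""
    return Node(value, head)

# Build the list 1 -> 2 -> 3
head = None
for v in (3, 2, 1):
    head = prepend(head, v)
```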
Hash tables

A hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought.
Separate chaining

In the strategy known as separate chaining, direct chaining, or simply chaining, each slot of the bucket array is a pointer to a linked list that contains the key-value pairs that hashed to the same location. Lookup requires scanning the list for an entry with the given key. Insertion requires adding a new entry record to either end of the list belonging to the hashed slot. Deletion requires searching the list and removing the element. (The technique is also called open hashing or closed addressing, which should not be confused with 'open addressing' or 'closed hashing'.)
Chained hash tables with linked lists are popular because they require only basic data structures with simple algorithms, and can use simple hash functions that are unsuitable for other methods.
The cost of a table operation is that of scanning the entries of the selected bucket for the desired key. If the distribution of keys is sufficiently uniform, the average cost of a lookup depends only on the average number of keys per bucket—that is, on the load factor.
For separate chaining, the worst-case scenario is when all entries are inserted into the same bucket, in which case the hash table is ineffective and the cost is that of searching the bucket data structure. If the latter is a linear list, the lookup procedure may have to scan all its entries, so the worst-case cost is proportional to the number n of entries in the table.
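The following is a minimal separate-chaining sketch (illustrative; the bucket count and the use of Python's built-in hash are arbitrary choices):

```python
class ChainedHashTable:
    """Hash table using separate chaining: each bucket is a list of (key, value) pairs."""

    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: replace its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # otherwise append to the chain

    def lookup(self, key):
        for k, v in self._bucket(key):   # scan the chain for the key
            if k == key:
                return v
        raise KeyError(key)
```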
Separate chaining with list heads

Some chaining implementations store the first record of each chain in the slot array itself. The purpose is to increase the cache efficiency of hash table access. To save memory, such hash tables often have about as many slots as stored entries, meaning that many slots hold two or more entries.
Queue

A queue is a first-in, first-out (FIFO) abstract data type: elements are added at the back (the tail) and removed from the front (the head).
Double-ended queue
A double-ended queue (often abbreviated to deque, pronounced "deck") is an abstract data type that generalizes a queue: elements can be added to or removed from either the front (head) or the back (tail). It is also often called a head-tail linked list.
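For illustration, Python's collections.deque supports all four end operations:

```python
from collections import deque

d = deque([2, 3])
d.appendleft(1)   # add at the front: deque([1, 2, 3])
d.append(4)       # add at the back:  deque([1, 2, 3, 4])
d.popleft()       # remove from the front -> 1
d.pop()           # remove from the back  -> 4
```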
Priority queue
It is exactly like a regular queue or stack data structure, but additionally, each element is associated with a "priority".
- stack: elements are pulled in last-in, first-out order (e.g. a stack of papers)
- queue: elements are pulled in first-in, first-out order (e.g. a line in a cafeteria)
- priority queue: elements are pulled highest-priority-first (e.g. cutting in line, or VIP service)
A priority queue must at least support the following operations:

- insert_with_priority: add an element to the queue with an associated priority
- pull_highest_priority_element: remove the element with the highest priority from the queue and return it. This operation is also known as "pop_element(Off)", "get_maximum_element", or "get_front(most)_element". Some conventions consider lower priorities to be higher, so it may also be known as "get_minimum_element", and is often referred to as "get-min" in the literature. The literature also sometimes provides separate "peek_at_highest_priority_element" and "delete_element" functions, which can be combined to produce "pull_highest_priority_element".
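A minimal sketch using Python's heapq module, a binary min-heap, so a lower number means a higher priority (the "get-min" convention mentioned above):

```python
import heapq

pq = []                                   # the heap, a plain list
heapq.heappush(pq, (2, "write report"))   # insert_with_priority
heapq.heappush(pq, (1, "fix outage"))
heapq.heappush(pq, (3, "clean desk"))

priority, task = heapq.heappop(pq)        # pull_highest_priority_element
print(task)                               # -> "fix outage" (lowest number = highest priority)
```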
Circular buffer (or circular queue)

A circular buffer, cyclic buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. The circular buffer is well suited as a FIFO buffer, while a standard, non-circular buffer is well suited as a LIFO buffer. The structure lends itself easily to buffering data streams; multimedia playback is one application where an overwriting circular buffer can be used. Circular buffering makes a good implementation strategy for a queue with a fixed maximum size.
Start / End Pointers
Generally, a circular buffer requires three pointers:
- one to the actual buffer in memory
- one to point to the start of valid data
- one to point to the end of valid data
Alternatively, a fixed-length buffer with two integers to keep track of indices can be used in languages that do not have pointers.
While there are numerous ways to label the pointers and exact semantics can vary, one point is worth noting for the overwriting case: once the buffer is full, each newly written element overwrites the oldest one, and the start pointer is incremented along with the end pointer.
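A minimal sketch of a fixed-length overwriting circular buffer using the two-integer variant described above (names are illustrative):

```python
class CircularBuffer:
    """Fixed-size ring buffer; writes overwrite the oldest element when full."""

    def __init__(self, capacity):
        self.data = [None] * capacity
        self.start = 0          # index of the oldest valid element
        self.count = 0          # number of valid elements

    def write(self, value):
        end = (self.start + self.count) % len(self.data)
        self.data[end] = value
        if self.count == len(self.data):
            # Buffer full: the oldest element was overwritten, so advance start too.
            self.start = (self.start + 1) % len(self.data)
        else:
            self.count += 1

    def read(self):
        if self.count == 0:
            raise IndexError("buffer is empty")
        value = self.data[self.start]
        self.start = (self.start + 1) % len(self.data)
        self.count -= 1
        return value
```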
Stack

In computer science, a stack is a last-in, first-out (LIFO) abstract data type and data structure. A stack can have any abstract data type as an element, but is characterized by three fundamental operations: push, pop, and stack top. The push operation adds a new item to the top of the stack, or initializes the stack if it is empty; if the stack is full and cannot accept the given item, it is considered to be in an overflow state. The pop operation removes the item from the top of the stack, either revealing previously concealed items or leaving the stack empty; popping an empty stack puts it into an underflow state (there are no items left to remove). The stack top operation returns the data at the top-most position without deleting it; it, too, underflows if the stack is empty.
A stack is a restricted data structure, because only a small number of operations are performed on it. The nature of the pop and push operations also means that stack elements have a natural order. Elements are removed from the stack in the reverse order to the order of their addition: therefore, the lower elements are those that have been on the stack the longest.
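A minimal sketch backed by a Python list; a fixed capacity is assumed purely to demonstrate the overflow state:

```python
class Stack:
    """Bounded LIFO stack with the three fundamental operations."""

    def __init__(self, capacity):
        self.items = []
        self.capacity = capacity

    def push(self, item):
        if len(self.items) == self.capacity:
            raise OverflowError("stack overflow")
        self.items.append(item)

    def pop(self):
        if not self.items:
            raise IndexError("stack underflow")
        return self.items.pop()

    def top(self):
        if not self.items:
            raise IndexError("stack underflow")
        return self.items[-1]
```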
Trees
In graph theory, a k-ary tree is a rooted tree in which each node has no more than k children. It is also sometimes known as a k-way tree, an N-ary tree, or an M-ary tree. A binary tree is the special case where k = 2.

A node is a structure which may contain a value, a condition, or represent a separate data structure (which could be a tree of its own). Each node in a tree has zero or more child nodes, which are below it in the tree (by convention, trees are drawn growing downwards). A node that has a child is called the child's parent node (or ancestor node, or superior). A node has at most one parent.
Nodes that do not have any children are called leaf nodes. They are also referred to as terminal nodes.
A free tree is a tree that is not rooted.
The height of a node is the length of the longest downward path from that node to a leaf. The height of the root is the height of the tree. The depth of a node is the length of the path to its root (i.e., its root path). These quantities are commonly needed in the manipulation of the various self-balancing trees, AVL trees in particular. Conventionally, the height value −1 corresponds to a subtree with no nodes, whereas zero corresponds to a subtree with one node.
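A short sketch of the height computation under this convention, for an illustrative binary tree node:

```python
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def height(node):
    """Length of the longest downward path to a leaf.

    Returns -1 for an empty subtree and 0 for a single node,
    matching the convention described above.
    """
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))
```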
The topmost node in a tree is called the root node. Being the topmost node, the root node will not have parents. It is the node at which operations on the tree commonly begin (although some algorithms begin with the leaf nodes and work up ending at the root). All other nodes can be reached from it by following edges or links. (In the formal definition, each such path is also unique). In diagrams, it is typically drawn at the top. In some trees, such as heaps, the root node has special properties. Every node in a tree can be seen as the root node of the subtree rooted at that node.
An internal node or inner node is any node of a tree that has child nodes and is thus not a leaf node. Similarly, an external node or outer node is any node that does not have child nodes and is thus a leaf.
A subtree of a tree T is a tree consisting of a node in T and all of its descendants in T. (This is different from the formal definition of subtree used in graph theory.) The subtree corresponding to the root node is the entire tree; the subtree corresponding to any other node is called a proper subtree (in analogy to the term proper subset).
B tree

A B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children. (Comer, p. 123) Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and filesystems.
Search
Searching is similar to searching a binary search tree. Starting at the root, the tree is recursively traversed from top to bottom. At each level, the search chooses the child pointer (subtree) whose separation values are on either side of the search value.
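A sketch of this search, assuming each node carries a sorted list keys, a flag is_leaf, and (for internal nodes) a list children with one more entry than keys; these field names are illustrative:

```python
import bisect

def btree_search(node, key):
    """Search a B-tree rooted at `node`; return True if `key` is present."""
    i = bisect.bisect_left(node.keys, key)        # binary search within the node
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if node.is_leaf:
        return False
    return btree_search(node.children[i], key)    # descend into the chosen subtree
```

As noted at the end of this section, binary search (here via bisect) is typically used to find the separation values within a node.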

Deletion
There are two popular strategies for deletion from a B-Tree.
- locate and delete the item, then restructure the tree to regain its invariants
or
- do a single pass down the tree, but before entering (visiting) a node, restructure the tree so that once the key to be deleted is encountered, it can be deleted without triggering the need for any further restructuring
The algorithm below uses the former strategy.
There are two special cases to consider when deleting an element:
- the element in an internal node may be a separator for its child nodes
- deleting an element may put its node under the minimum number of elements and children.
Each of these cases will be dealt with in order.
Deletion from a leaf node
- Search for the value to delete.
- If the value is in a leaf node, it can simply be deleted from the node.
- If underflow occurs, check the siblings to either transfer a key or fuse the siblings together:
  - if the deletion happened in the right child, retrieve the maximum value of the left child, provided the left child would not underflow;
  - in the converse situation, retrieve the minimum element from the right child.
Deletion from an internal node
Each element in an internal node acts as a separation value for two subtrees, and when such an element is deleted, two cases arise. In the first case, both of the two child nodes to the left and right of the deleted element have the minimum number of elements, namely L−1. They can then be joined into a single node with 2L−2 elements, a number which does not exceed U−1 and so is a legal node. Unless it is known that this particular B-tree does not contain duplicate data, we must then also (recursively) delete the element in question from the new node.
In the second case, one of the two child nodes contains more than the minimum number of elements. Then a new separator for those subtrees must be found. Note that the largest element in the left subtree is still less than the separator. Likewise, the smallest element in the right subtree is the smallest element which is still greater than the separator. Both of those elements are in leaf nodes, and either can be the new separator for the two subtrees.
- If the value is in an internal node, choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator.
- This has deleted an element from a leaf node, and so is now equivalent to the previous case.
Rebalancing after deletion
If deleting an element from a leaf node has brought it under the minimum size, some elements must be redistributed to bring all nodes up to the minimum. In some cases the rearrangement will move the deficiency to the parent, and the redistribution must be applied iteratively up the tree, perhaps even to the root. Since the minimum element count does not apply to the root, making the root the only deficient node is not a problem. The algorithm to rebalance the tree is as follows:
- If the right sibling has more than the minimum number of elements:
  - Add the separator to the end of the deficient node.
  - Replace the separator in the parent with the first element of the right sibling.
  - Append the first child of the right sibling as the last child of the deficient node.
- Otherwise, if the left sibling has more than the minimum number of elements:
  - Add the separator to the start of the deficient node.
  - Replace the separator in the parent with the last element of the left sibling.
  - Insert the last child of the left sibling as the first child of the deficient node.
- If both immediate siblings have only the minimum number of elements:
  - Create a new node with all the elements from the deficient node, all the elements from one of its siblings, and the separator in the parent between the two combined sibling nodes.
  - Remove the separator from the parent, and replace the two children it separated with the combined node.
  - If that brings the number of elements in the parent under the minimum, repeat these steps with that deficient node, unless it is the root, since the root is permitted to be deficient.
The only other case to account for is when the root has no elements and one child. In this case it is sufficient to replace it with its only child.
Binary search is typically (but not necessarily) used within nodes to find the separation values and child tree of interest.
DSW algorithm
The DSW algorithm is a method for efficiently balancing binary search trees; that is, decreasing their height to O(log n), where n is the total number of nodes. Unlike a self-balancing binary search tree, it does not do this incrementally during each operation, but periodically, so that its cost can be amortized over many operations.
The algorithm requires linear (O(n)) time and operates in place. The original algorithm, due to Day, generates as compact a tree as possible: all levels of the tree are completely full except possibly the bottom-most. The Stout–Warren modification generates a complete binary tree, namely one in which the bottom-most level is filled strictly from left to right. This is a useful transformation to perform if it is known that no more inserts will be done.
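For intuition only, the sketch below produces the same end result, a compact tree from an arbitrary BST, by a simpler route: flatten the tree to a sorted list in O(n), then rebuild top-down. Note this is not DSW's rotation technique; DSW achieves the result in place without the auxiliary list. The sketch reuses the TreeNode class from the earlier example.

```python
def to_sorted_list(node, out):
    """In-order traversal: collects node values in sorted order."""
    if node is not None:
        to_sorted_list(node.left, out)
        out.append(node.value)
        to_sorted_list(node.right, out)

def build_balanced(values, lo, hi):
    """Rebuild a height-balanced BST from sorted values[lo:hi]."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = TreeNode(values[mid])
    node.left = build_balanced(values, lo, mid)
    node.right = build_balanced(values, mid + 1, hi)
    return node

def rebalance(root):
    values = []
    to_sorted_list(root, values)
    return build_balanced(values, 0, len(values))
```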
Left child-right sibling binary tree
In computer science, a left child-right sibling binary tree is a binary tree representation of a k-ary tree. The process of converting from a k-ary tree to an LC-RS binary tree is not reversible in general without additional information.
To form a binary tree from an arbitrary k-ary tree by this method, the root of the original tree is made the root of the binary tree. Then, starting with the root, each node's leftmost child in the original tree is made its left child in the binary tree, and its nearest sibling to the right in the original tree is made its right child in the binary tree.
If the original tree was sorted, the new tree will be a binary search tree.
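A sketch of this conversion (field names are illustrative; the k-ary node is assumed to hold an ordered list of children):

```python
class KaryNode:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

class BinaryNode:
    def __init__(self, value):
        self.value = value
        self.left = None    # leftmost child in the original tree
        self.right = None   # next sibling to the right in the original tree

def to_lcrs(node):
    """Convert a k-ary tree to its left-child right-sibling binary form."""
    if node is None:
        return None
    b = BinaryNode(node.value)
    prev = None
    for child in node.children:
        cb = to_lcrs(child)
        if prev is None:
            b.left = cb          # leftmost child becomes the left child
        else:
            prev.right = cb      # later children become right-siblings
        prev = cb
    return b
```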
Trie or prefix tree

A trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree shows what key it is associated with. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that correspond to keys of interest.
In the typical example, keys are listed in the nodes and values below them: each complete English word has an arbitrary integer value associated with it. A trie can be seen as a deterministic finite automaton, although the symbol on each edge is often implicit in the order of the branches.
It is not necessary for keys to be explicitly stored in nodes; in such diagrams, words are shown only to illustrate how the trie works.
Though it is most common, tries need not be keyed by character strings. The same algorithms can easily be adapted to ordered lists of any construct, e.g., permutations of a list of digits or shapes. In particular, a bitwise trie is keyed on the individual bits making up a short, fixed-size value such as an integer or a pointer to memory.
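A minimal sketch of a string-keyed trie with insertion and lookup, where only nodes that terminate a key carry a value, matching the description above (names are illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # maps one character to the child node
        self.value = None    # set only for nodes that end a stored key

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value

    def lookup(self, key):
        node = self.root
        for ch in key:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node.value

t = Trie()
t.insert("tea", 3)
print(t.lookup("tea"))   # -> 3
print(t.lookup("ted"))   # -> None
```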