Tango tree

A Tango tree is an online binary search tree that is $O{\Bigl (}\log \log(n){\Bigr )}$ -competitive proposed by Erik D. Demaine, Dion Harmon, John Iacono, and Mihai Patrascu in 2004.

Overview

Tango trees were designed to surpass the usual $O{\Bigl (}\log(n){\Bigr )}$ binary search tree cost of operations. They perform basic operations such as searches in $O{\Bigl (}\log \log(n){\Bigr )}$ time. This optimization is achieved dynamically by adjusting the search tree structure after each search. They are similar in their dynamic behaviour to other types of structure like a Splay tree however the competitive ratio is dramatically improved.

The approach is similar to the Greedy BST algorithm that while searching for an element rearranges the nodes on the search path to minimize the cost of future searches.

For Tango Trees the approach is a classic divide and conquer approach combined with a bring to top approach.

The main divide and conquer idea behind this data structure is to extract from the original tree a number of virtual smaller subtrees all with a normal O(log number of subtree elements) cost of access. These subtrees are dynamically balanced to offer the usual $O{\Bigl (}\log(n){\Bigr )}$ performance for data retrieval.

The bring to top approach is not done at the node level as much as at the subtree level which further improve competitiveness. Once the original tree has been adjusted to include a collection of these subtrees, it is possible to greatly improve the cost of access of these subtrees. Both the Tango tree and these subtrees are a type of Self-balancing binary search tree.

Tango tree achieves this outstanding competitive ratio by using a combination of augmentation of attributes in the data structure, a more elaborated algorithm and the use of other type of trees for some of its structure.

Example

Similar Data Structures

Red Black tree introduced by Bayer in 1972 and have a $O{\Bigl (}\log(n){\Bigr )}$ competitive ratio
Splay tree Introduced by Sleator and Tarjan in 1985 and have a $O{\Bigl (}\log(n){\Bigr )}$ competitive ratio
AVL tree Introduced by Adelson and Landis in 1962 and have a $O{\Bigl (}\log(n){\Bigr )}$ competitive ratio
Multi-Splay Tree Introduced by Sleator and Wang in 2006 and have a $O{\Bigl (}\log \log(n){\Bigr )}$ competitive ratio

Advantages

Tango Trees offer unsurpassed competitive ratio retrieval for online data. Online data means that operations that are not known in advance before the data structure is created. Outstanding search performance for a Tango tree relies on the fact that accessing nodes constantly updates the structure of the search trees. That way the searches are rerouted to searches in much shallower balanced trees. Obviously significantly higher access time constitutes an advantage for nearly all practical applications that offer searches as a use case. Dictionary searches like telephone directory search would be just one of the possible an examples.

Disadvantages

The Tango tree focuses on data searches on static data structures, and does not support deletions or insertions, so it might not be appropriate in every situation. The Tango tree uses augmentation, meaning storing more data in a node than in a node of a plain binary search tree. Tango trees use $O{\Bigl (}\log \log(n){\Bigr )}$ bits. Although that is not a significant increase, it results in a bigger memory footprint. It is a complex algorithm to implement like for instance [splay tree], and it also makes use rarely used operations of Red-Black Tree. Tango trees change when they are accessed in a 'read-only' manner (i.e. by find operations). This complicates the use of Tango trees in a multi-threaded environment. It is believed that Tango Tree would work in a practical situation where a very large data set with strong spatial and temporal coherence fit in the memory.

Terminology and Concepts

There are several types of trees besides the Red-Black trees (RB) used as a base for all Tree structures:

Reference Trees
Tango Trees
Auxiliary Trees

As all trees are derived from RB trees, they are also [Binary Search Trees] with all their inherent behaviour. Auxiliary trees can be considered sub-trees of the Tango Tree. Tango Trees are the actual employed trees while in production mode. Reference Trees are used only initial set-up and for illustration of the concepts. Any search in the Reference Tree creates a path from root to the searched node. We call that a Preferred Path and the Preferred Child attribute specific to the Reference Tree indicates if the preferred path of a node goes to the left or right child if any. A Preferred Path is determined by the longest path formed by preferred children. Any new search in the Reference Tree will carve new paths and modify the existing paths. Correspondingly, the preferred children change too. Any switch from right to left or vice versa is called an Interleave. Interleaves changes are the basis for analysis of expected performance. Fig. 2,3 and 1 examples of Reference Tree, Auxiliary tree and Tango Tree, respectively.

Fig. 2 Reference Tree example with Preferred Paths in dark red

Fig. 3 Example of Auxiliary Tree formed from a Preferred Path.

Operations

As we stated Tango Trees are static so they support only searches. That also means that there is a construction phase where the elements are inserted in the Tango Tree. That start-up cost and any search performance during the construction period is not considered part of the operational part of Tango trees therefor the performance is not $O{\Bigl (}\log \log(n){\Bigr )}$ competitive. The outstanding idea behind Tango Trees is to collect the nodes belonging to a Preferred Path as a balanced tree of height O(log log n) called auxiliary tree and then assemble them in a tree of trees where higher trees contain the mostly accessed preferred paths elements.

Search

To search for a node x in a Tango tree, we search for the element in the collection of Auxiliary Trees that make up the Tango Tree like in any ordinary binary search tree. Simultaneously we adjust the corresponding affected Auxiliary Trees in the Tango Tree. This will preserve the ideal structure that will give us this unprecedented search performance. We could achieve this ideal structure by updating the Reference Tree P after every search and recreating the Tango tree however this would be very expensive and nonperforming. The reasons why such a structure is $O{\Bigl (}\log \log(n){\Bigr )}$ competitive is explained in the Analysis There is a direct way to update the structure and that is shown in the [Algorithm]

Tango Tree Life Cycle

The main phases are Construction and Operation

Construction First create the reference tree and insert the desired data in it. Update the attributes of depth for each node After this phase the data and the value of the depth for the nodes will remain unchanged. Let's call that field d for further reference and understand that it always refers to the Reference tree not to the Tango tree as that can cause confusions. While in principle the reference tree can be any balanced tree that is augmented with the depth of each node in the tree the TODO [Demaine et al. 2004] uses [red-black tree]. Secondly we will perform some warm-up searches with the goal to create a decent distribution of Preferred Paths in the Reference Tree. Remember there is no Tango tree and all this is done on line. This means that performance is not critical at this point.

After this begins the phase of collecting the preferred paths. Out of each Preferred Path we create a new Auxiliary Tree which is just an ordinary RedBlack Tree where the nodes inherit the value of field d. That value will stay unchanged forever because it stays with the node even if the node is moved in the trees. There is no Tango Tree at this point. We add auxiliary trees such a way that the largest one is the top of a tree and the rest and 'hung' below it. This way we effectively create a forrest where each tree is an Auxilary tree.

See

where the roots of the composing auxiliary tree are depicted by magenta nodes. It after this step that the Tango tree becomes operational. See [Construction Algorithms And Pseudo-code] for main algorithms And pseudo-code.

Operation

The operation phase is the main phase where we perform searches in the Tango tree. See [Operation Algorithms And Pseudo-code] for main algorithms And pseudo-code.

Data Augumentation

Reference Tree augmentation

Besides the regular RB Tree fields we introduce two more fields: isPreferredChild representing the direction the last search traversing the node took. It could be either Boolean or a pointer to another node. In the figures it's represented by a red line between a node and its preferred child. d representing the depth of the node in the Reference Tree. See value of d in Fig 3.

Tango Tree Augmentation

For each nodes in Tango or Auxiliary Tree we also introduce several new fields:

isRoot that will be false for all nodes except for the root.
This is depicted in magenta for nodes where isRoot is true

maxD that is the Maximum value of d for all the children of the node
. This is depicted as MD in the figures. For some nodes this value is undefined and the field is not depicted in the figure

minD that is the Minimum value of d for all the children of the node
This is depicted as mD in the figures. For some nodes this value is undefined and the field is not depicted in the figure

isRoot, maxD and minD will change in time when nodes are moved around in the structure.

Algorithms

The main task in a Tango Tree is to maintain an 'ideal' structure mirroring changes that occur in the reference tree. As recreating the Tango tree form the reference tree would not be performing the algorithms will have only use and modify the Tango tree. That means that after the construction phase the reference tree could and should be destroyed. After this we would refer to it as virtual meaning 'as if it existed'.

Construction Algorithms And Pseudo-code

Construct the reference tree and perform warm-up searches. Function: constructReferenceTree Input: None Output: ReferenceTree p

ReferenceTree p = new ReferenceTree() insertDataInReferenceTree() p.setAllDepths() p.warmUpSearches() ArrayList<PreferredPath> paths = p.collectAndSortPreferredPaths() assert paths.size() > 0 PreferredPath top = p.collectTopNodePath(p.root) TangoTree tangoTree = new TangoTree(top) tangoTree.updateMinMaxD() while (takeNext PreferredPath path in paths) do if (path.top = p.root) then continue; // skip the top path as it was already added else RBTree auxTree = new RBTree(path) auxTree.updateMinMaxD() auxTree.root.isRoot = true tangoTree.hang(auxTree) return p

Construct an Auxiliary tree out of a Preferred Path. Used in the construction phase. Function: constructAuxiliaryTree Input: Preferred Path path Output: AuxiliaryTree this RBTree(PreferredPath path) this() RefTreeNode refNode = path.top while (Next PreferredPath path in paths exists) do RBNode n = new RBNode(refNode.value, RedBlackColor.RED, null, null) n.d = refNode.d this.insert(n) refNode = refNode.getPreferredChild()

Construct a Tango tree from the Reference tree. Function: constructTangoTree Input: ReferenceTree p Output: TangoTree tangoTree ArrayList<PreferredPath> paths = p.collectAndSortPreferredPaths() assert paths.size() > 0 PreferredPath top = p.collectTopNodePath(p.root) TangoTree tangoTree = new TangoTree(top) tangoTree.updateMinMaxD() while (Next PreferredPath path in paths exists) do if (path.top = p.root) then continue; // skip the top path as it was already added else RBTree auxTree = new RBTree(path) auxTree.updateMinMaxD() auxTree.root.isRoot = true return tangoTree

Set the depths of all nodded in the tree. Used only once for setting the depths of all nodes in the reference tree. Function: setAllDepths Input: None setAllDepthsHelper(root) Algorithm 14. Set the depths of all nodded in the tree. Function: setAllDepthsHelper Input: RefTreeNode n if (n == NILL) then return n.setD(getDepth(n)) if (n.right != null) setAllDepthsHelper(((RefTreeNode) n.right)) if (n.left != null) then setAllDepthsHelper((RefTreeNode) (n.left))

Operation Algorithms And Pseudo-code

There is just one operation: search that calls a number of algorithms to rearrange the data structure.

Tango search pseudocode. Used by the main operation on a Tango tree.
Function: tangoSearch
Input: int vKey
Node currentRoot = root
Node currentNode = root
Boolean found = false
while (true) do
	if (currentNode == NILL) then
		found = false
		break
	if (currentNode.isRoot && currentNode != root) then
		cutAndJoin(minIgnoreMinusOne(currentNode.minD, currentNode.d) - 1, currentRoot, currentNode)
		currentRoot = currentNode
	if (currentNode.value == vKey) then
		found = true
		if (currentNode != currentRoot) then
			cutAndJoin(currentNode.d, currentRoot, currentRoot)
		break
	if (currentNode.value < vKey) then
		currentNode = (RBNode) currentNode.right
	else
		currentNode = (RBNode) currentNode.left
if (found) then
	return currentNode
else
	return NILL


Tango Cut and Join used by Tango Search
Function: cutAntJoin
Input: int d, Node currentRoot, Node newlyFoundRoot
RBNode n = currentRoot
RBNode l = NILL// l is the last node that gets cut
RBNode lPrime = NILL// l' is the first node that escapes from cutting
RBNode r = NILL
RBNode rPrime = NILL
if (currentRoot.maxD <= d) // no nodes to be cut besides maybe the root
	if (currentRoot.d > d) 
		l = currentRoot
		r = currentRoot
		lPrime = getPredecessor(l, currentRoot)
		rPrime = getSuccessor(r, currentRoot)
	
 else // there are nodes to be cut underneath
	l = getL(d, n, currentRoot)
	// determine lPrime as predecessor of l or NILL if it's the last
	if (l != NILL) 
		lPrime = getPredecessor(l, currentRoot)
	 else 
		lPrime = NILL// - infinity maybe redundant			
	// end calculating l and l prime
	// find r the right side node of the cutting range based on value
	n = currentRoot
	r = getR(d, n, currentRoot)
	if (r != NILL) // the root is not to be cut
		rPrime = getSuccessor(r, currentRoot)
	
checkLandR(d, l, r, currentRoot)
RBTree aTree = NILL
if (lPrime == NILL && rPrime == NILL) // nothing to cut therefore so aTree is the whole
	aTree = new RBTree()
	aTree.root = currentRoot
 else 
	RBTreesPair aAndDtreePair = new RBTreesPair()
	aTree = tangoCut(lPrime, rPrime, currentRoot)		
RBTree afterCutAndJoin = tangoJoin(aTree, newlyFoundRootOrCurrentIfWeFound)


Tango Cut used by Tango cut And Join to separate nodes that need to be pushed to top from the rest of nodes.
Function: tangoCut
Input: RBNode lPrime, RBNode rPrime, RBNode aRoot
Output: RBTree
saveShadowAuxTrees(aRoot)
if (lPrime == null || rPrime == null) {// just one splitAnd COncatenate
	return simplifiedTangoCut(lPrime, rPrime, aRoot)
}
RBTree a = new RBTree()
a.root = aRoot
RBTreesPair firstPartAndLastPart = new RBTreesPair()
split(lPrime, a, firstPartAndLastPart)
RBTree b = firstPartAndLastPart.firstPart
RBTree c = firstPartAndLastPart.lastPart
firstPartAndLastPart.firstPart.verifyProperties()
firstPartAndLastPart.lastPart.verifyProperties()
firstPartAndLastPart = new RBTreesPair()
split(rPrime, c, firstPartAndLastPart)// problem
firstPartAndLastPart.firstPart.verifyProperties()
firstPartAndLastPart.lastPart.verifyProperties()
RBTree d = firstPartAndLastPart.firstPart
RBTree e = firstPartAndLastPart.lastPart		
// disconnect d
rPrime.left = NILL
d.root.parent = NILL

Tango Join used by Tango Cut and Join to join the result of Tango cut to auxiliary trees.
Function: tangoJoin
Input: RBTree a, RBNode newlyFoundRoot
Output: RBTree finalJoinResult
RBTree bPrevOp = new RBTree()
RBTree d = new RBTree()		
order(bPrevOp,d, a, newlyFoundRoot)
RBNodePair lAndR = bPrevOp.searchBoundedLPrimeAndRPrime(d.root.value)
if (lPrime == null || rPrime == null) // just one split and one concatenate
	return simplifiedTangoJoin(lPrime, rPrime, bPrevOp, d.root)
RBNode lPrime = lAndR.lPrime
RBNode rPrime = lAndR.rPrime
RBTreesPair firstPartAndLastPart = new RBTreesPair()
split(lPrime, bPrevOp, firstPartAndLastPart)
RBTree b = firstPartAndLastPart.firstPart
RBTree c = firstPartAndLastPart.lastPart
firstPartAndLastPart.firstPart.verifyProperties()
firstPartAndLastPart.lastPart.verifyProperties()
firstPartAndLastPart = new RBTreesPair()
split(rPrime, c, firstPartAndLastPart)//
firstPartAndLastPart.firstPart.verifyProperties()
firstPartAndLastPart.lastPart.verifyProperties()
RBTree e = firstPartAndLastPart.lastPart
// reconnect d which is normally newlyFoundRoot
d.root.isRoot = false// un-mark, a difference from tangoCut
// concatenate part
rPrime.parent = NILL// avoid side effects
RBTree res1 = concatenate(rPrime, d, e)
lPrime.parent = NILL// avoid side effects
RBTree res2 = concatenate(lPrime, b, res1)
return res2 

Check if a node n is in an auxiliary tree defined by currentRoot. Used to verify wandering.
Function: isInThisTree
Input:  RBNode n, RBNode currentRoot
Output: Boolean v
if (n.isRoot and n != currentRoot) then
	return false
else
	return true

Find node l as left of range used by Tango Cut, different from the [Demaine et al. 2004] paper.
Function: getL
Input: int d, Node n, Node currentRoot
Output: Node l
Node l = n
if (left[n] != NIL) and (not(isRoot(n) or n == currentRoot) and ((left(n).maxD > d) or (left(n).d > d)) then
	l=getL(d, left(n), currentRoot)
else
	if (n.d > d)
		l = n
	else
		l=getL(d, right(n), currentRoot)
return l


Find node r as the right limit of the range used by Tango Cut.
Function: getR
Input: int d, Node n, Node currentRoot
Output: Node r
Node r = n
if (right[n] != NIL) and (not(isRoot(n) or n == currentRoot) and ((right(n).maxD > d) or (right(n).d > d)) then
	r=getR(d, left(n), currentRoot)
else
	if (n.d > d)
		r = n
	else
		r=getR(d, right(n), currentRoot)
return r

Return Sibiling. Within enclosing auxiliary tree boundary
Function: siblingBound
Input:  RBNode n, RBNode boundingRoot
Output: Node p
if (n == left[parent[n]] && isInThisTreeAcceptsNull(left[parent[n]]), boundingRoot)) then
	if (isInThisTreeAcceptsNull(right[parent[n]], boundingRoot) then
		return right[parent[n]]
	else
		return NILL
else 
	if (isInThisTreeAcceptsNull(left[parent[n]], boundingRoot)) then
		return left[parent[n]]
 	else
		return NILL


Return Uncle. Within enclosing auxiliary tree boundary
Function: uncleBound
Input:  RBNode n, RBNode boundingRoot
Output: Node p
if (isInThisTreeAcceptsNull(parent[n], boundingRoot)) then
	return siblingBound(boundingRoot)
else
	return NILL


Update Min Max D values in red black tree augmented node. Used to update Tango Tree node attributes.
Function: updateMinMaxD
Input:  RBNode n
int minCandidate
if (n.left != NILL) then
	updateMinMaxD(n.left)
if (n.right != NILL) {
	updateMinMaxD(n.right)
if (n.left != NILL) then
	int maxFromLeft = max(n.left.d, n.left.maxD)
	n.maxD = maxFromLeft > n.maxD ? maxFromLeft : n.maxD
	if (n.left.minD != -1) {
		minCandidate = min(n.left.d, n.left.minD)
	else
		minCandidate = n.left.d
	if (n.minD != -1) then
		n.minD = min(n.minD, minCandidate)
	else
		n.minD = minCandidate
if (n.right != NILL) then
	int maxFromRight = max(n.right.d, n.right.maxD)
	n.maxD = maxFromRight > n.maxD ? maxFromRight : n.maxD
	if (n.right.minD != -1) then
		minCandidate = min(n.right.d, n.right.minD)
	else
		minCandidate = n.right.d
	if (n.minD != -1) then
		n.minD = min(n.minD, minCandidate)
	else
		n.minD = minCandidate

Search Bounded lPrime and rPrime used by Tango Join.
Function: searchBoundedLPrimeAndRPrime
Input:  RBTree rbTree, int value
Output: NodePair p
RBNode n = root
RBNode prevPrevN = NILL
RBNode prevN = NILL
RBNodePair lAndR
while (n != NILL && isInThisTree(n, rbTree.root)) do
	int compResult = value.compareTo(n.value)
	if (key(n) == value) then
		lAndR = new RBNodePair(n, n)
		return lAndR
	else 
		if (key(n) < value) then
			prevPrevN = prevN
			prevN = n
			if (isInThisTree(n.left, rbTree.root) then
				n = n.left
			else
				n = NILL
		else
			prevPrevN = prevN
			prevN = n
			if (isInThisTree(n.right, rbTree.root) then
				n = n.right
			else
				n = NILL
lAndR = new RBNodePair(prevPrevN, prevN)
return lAndR


The minimum in a binary search tree is always located if the left side path is traversed down to the leaf in O(log n) time:
Minimum Value Tree pseudocode. Used by successor.
Function: min_val_tree
Input: Node x
Output: Node x
while left(x) != NIL do
	x = left(x)
return x

Maximum Value Tree pseudocode used by predecessor.
Function: max_val_tree
Input: Node x
Output: Node x
while right(x) != NIL do
	x = right(x)
return x


The next two algorithms describe how to compute the predecessor and successor of a node. The predecessor of a node x is a node with the greatest value smaller the key[x].
Predecessor computing pseudocode used to find lPrime.
Function: predecessor
Input: RBNode x
Output: RBNode y
RBNode y = null
if (n.left != null && isInThisTree(((RBNode) (n.left)), root)) 
	return getMaximum((RBNode) (n.left), root)
y = (RBNode) (n.parent)
if (y == currentRoot) // don't let it escape above it's own root
	return NILL// --------------------------------------------------------------------->		
while (y != NILL && (y != currentRoot) && n == ((RBNode) (y.left))) do
	n = y
	if (isInThisTree(((RBNode) (y.parent)), root)) 
		y = (RBNode) (y.parent)
	 else 
		y = null
return (RBNode) y

Successor computing algorithm used to find rPrime.
Function: successor
Input: Node x
Output: Node y
RBNode y = NILL
if (n.right != NILL && isInThisTree(((RBNode) (n.right)), currentRoot)) 
	return getMinimum((RBNode) (n.right), currentRoot)

y = (RBNode) (n.parent)
if (y == currentRoot) // don't let it escape above it's own root
	return NILL// --------------------------------------------------------------------->		
while (y != NILL && isInThisTree(y, currentRoot) && n == ((RBNode) (y.right))) do
	n = y
	y = (RBNode) (y.parent)

return (RBNode) y


Traverse Tree pseudocode used by Tango Search.
Function: traverse_tree
Input: Node x
Output: None
if x != NIL and InTheAuxiliaryTree(x) then
	traverse_tree (left(x))
	traverse_tree (right(x))


Search Tree algorithm used to find lPrime and rPrime for Tango Join.
Function: search_tree
Input: Node x, value
Output: Node x
if x = NIL or k = key[x] then 
	return x
if k < key[x] then 
	return search_tree(left[x]; value)
else 
	return search_tree(right[x]; value)


Find minimum by ignoring specific values used for the calculation of d.
Function:  minIgnoreMinusOne
Input: int minD, int d
Output: int d
if (minD == -1) then
	return d
if (d == -1) then
	return minD
return min(minD, d)

RedBlack split and Red Black concatenate algorithms are described in the RON WEIN, 2005, Efficient Implementation of Red-Black Trees with Split and Catenate Operations, http://www.cs.tau.ac.il/~wein/publications/pdfs/rb_tree.pdf

Analysis

Here are some elements necessary to understand why the Tango Tree achieve such an amazing performance and become $O{\Bigl (}\log \log(n){\Bigr )}$ competitive.

Wilber's 1st Lower Bound [Wil89]

Fix an arbitrary static lower bound tree P with no relation to the actual BST T, but over the same keys. In the application that we consider, P is a perfect binary tree. For each node y in P, we label each access X1 L if key X1 is in y's left subtree in P, R if key X1 is in y's right subtree in P, or leave it blank if otherwise. For each y, we count the number of interleaves (the number of alterations) between accesses to the left and right subtrees: interleave(y)= ? of alternations L ? R.

Wilber's 1st Lower Bound [Wil89] states that the total number of interleaves is a lower bound for all BST data structures serving the access sequence x. The lower bound tree P must remain static.

Proof.

File:Wilber 1 bound picture.jpg

We define the transition point of y in P to be the highest node z in the BST T such

that the root-to-z path in T includes a node from the left and right subtrees if y in P. Observe that the transition point is well defined, and does not change until we touch z. In addition, the transition point is unique for every node y.

Lemma 1

The running time of an access xi is $O{\Bigl (}(k+1)(1+lglg(n)){\Bigr )}$ , where k is the number of nodes whose preferred child changes during access xi.

Lemma 2

The number of nodes whose preferred child changes from left to right or from right to left during an access xi is equal to the interleave bound ${\Bigl (}IBi(X{\Bigr )}$ of access xi.

Theorem 1

The running time of the Tango BST on an sequence X of m accesses over the universe ${1,2,...,n}.$ is $O{\Bigl (}(OPT(X)+n)(1+lglg(n)){\Bigr )}$ where $OPT{\Bigl (}X{\Bigr )}$ is the cost of the offline optimal BST servicing X.

Corollary 1.1

When m = (n), the running time of the Tango BST is

$O{\Bigl (}OPT(X)(1+lglg(n)){\Bigr )}$

-BST model

-Comparativeness def
-Splay configured

- PP ? TANGO connection

- If no reconfiguring cost how much

- How do you do reconfiguring in kloglog u

Back to BST model

- Given sequence of accesses

X=<X1,X2,...,Xm>

What would it mean to say that some BST execute this sequence optimally that is best possible
- If we are allowed to look at X ? D off-line BSTs
- If not ? on-line (dynamic) BST

Question: Is the dy-BST that executes X as well as best off-line BST for X
Relationship (or ? there of) to entropy y

Why does the fact that splay trees met entropy bound does not answer the above question?

SET UP:
Assume over $[n]={1,2,....n}$
Access sequence $X=<X1,X2,...Xm>$ €[n]^m
- $O{\Bigl (}\log \log(n){\Bigr )}$ must process the sequence in order
MODEL: - Search from the root to X_i and then before some arbitrary num of rotation

COST: num nodes in search path + num of rotations

DEF: OPT(x): smallest cost over all (off-line) BST i.e length of the shortest sequence that executes X
COMPETITIVENESS
BST runs X in K.OPT(X)?K competitive
if K cost ? DYNAMICALUU

- Splay configured to be OPTIMA BST

But in fact not known if such BST even exists.
What is comparativeness of BALANCES STATIC BST?

Trivially (m log n)

- one of the most important or at least most frustrating O in DS_i:

Is there O(₁) competitive BST

- until recently only trivially O(log) Comparativeness (obtained by balances BST) was best known
Trivial upper bound for balanced BST was obtained lower bound O(m).
Need better LB.

LOWER BOUND
Trivial lower bound O(m)

Wilber interleave bound W2

Defined with respect to arbitrary (but fixed) tree P. take P? perfect BST

left region $L(y)=y+itsleftsubtree$
right region $R(y)=itsrightsubtree$

the amount of interleaving through y: $I(y)$

$L(7)={7,6}$
$R(7)={9,8}$

e.g.
X=<3,1,4,1,5,9,2,6,5,3,5,8,9,7,3, 2,3,8,4,6,L7>
x⁷=<9,6,8,9,78,6,7>
I(y)=5

INTERLEAVE BOUND: W2

File:Diagram05.jpg

for any access tag X and any ref tree P examples
EXAMPLES SCANNING

File:Diagram07.jpg

? OPT(X)=O(m)
achievable by SPLAY

In fact: think of what DYNAMIC FINGER says
EXAMPLE RANDOM X

|B(randomX)=O(mlog)
that is highest W2 can give
O(mlog)? worst case

IDEA FOR TANGO (good BST)
Simulate interleave bound
DIFFERENT VIEW OF INTERLEAVE BOUND
PREFERRED PATH
node if: its preferred child indicates
if L(y) or R(y) contains the latest accessed element accessed by element in X
[[ Image:1 Reference tree after inserting all the elements and before any searches, d is the depth in tree, Md is the Max depth of a node considering all it's subchildren. JPG| frame | center | sizepx |Reference Tree after inserting all the elements and before any searches. d is the depth in tree. Md is the Max depth of a node considering all it's subchildren]]

Reference Tree after first search for element 17

Reference Tree after first 2 searches for 17 and 7

Reference Tree after performing searches for 17,7,1,18,23,31,27,13,9

preferred paths after accesses:
X=<7,1,18,23,31,2713,9...>
-since reference tree is balanced high of ? per. path O(logn)

Back to interleave
|B(X)=total ? times
the preferred paths changes as we execute X

IDEA FOR BST (tango)
Simulate IB(x)
Tango: after every access to element in X, preferred paths, in P are redefined.
WANT TANGO after EVERY ACCESS
to look as follows

top AUXILIARY TBE preferred path at balanced BST

second level Preferred Paths

third level etc

Reference Tree P with Preferred Paths marked in red after performing all warm-up searches for elements 17,7,1,18,23,31,27,13,9,20 searches

NOTICE: because preferred paths are PATHS, all the children subpaths form consecutive keys in[n]
thus the subtree hang in different places on top path

height of every path representation in TANGO is $O{\Bigl (}\log \log(n){\Bigr )}$ since P has O(log) vertices

tango of previous example

7 Second formed Auxiliary Tree formed from a Preferred Path from the previously searched Reference Tree.

Last Auxiliary Tree to be added to the Tango Tree, note that nodes that are not part of a preferred path become one node Auxiliary Trees.

9 Tango Tree based on previous Reference Tree where the first 2 Auxiliary Trees were added.

Tango Tree generated from the reference Tree P after the last warm-up search for for element 20, Nodes in pink are roots of Auxiliary Trees.

IMAGINING that after every access the TANGO looks the way we want (that is as defined above) and that we get that MAGICALLY FOR TREE. So we are not concerned how we get from Tango T_i to Tango T_i+1 after accessing X_i+1.

Given that for tree

How much would just accesses that is searches, of X COST

X=<X₁,X₂,...X_i,...X_m> how much is access to X_i in T_i O(K_i.loglogn)

number of times preferred child changes while searching for X_i

thus access to the whole X costs

File:Diagram03.jpg

cost searches = O(IB_(x).loglogn)=O(opt(x),logn) given transformation for tree

REST OF CLASS

System accessing Xi how do we transform T_i to T_i+1
spending O(Klglogn) time

(that is same time as accessing Xi)

We would then need following operations on Auxiliary Trees

search
cutting Aux Tree in 2

1 containing all elements at dept= d in P

2-containing all elements at dept > d in P

joining 2 Aux Trees that store disjoint paths when the bottom of one

Each preferred path as Auxiliary Tree -> RB tree

3 additional info in every node of AUX Tree

1. own depth in P
2. max depth in its subtree
3. node is marked if root of AUX Tree

join + cut can be achieved by standard RB tree operation

- split at x

- concatenate two RB trees

- HOW TO DO CUT IN TANGO

Reference Tree after inserting all the elements and before any searches. d is the depth in tree. Md is the Max depth of a node considering all it's subchildren

Remember the Reference Tree after warm-up search for 20 looked like this

If after that we would do a search for 30 it would have looked like this, where the path 16,24,20,22 and 23 becoming 16 , 24, 28, 30 therefore loosing 20, 22 and 23

Tango Tree after search for 20 with the Auxiliary tree corresponding

Tango Tree after search for 20 with the Auxiliary tree corresponding to the preferred path that will change after searching for 30 and cut points l, l, l prime and r prime marked

- You can find l in AUX by going to left child as long as its subtree contains an element of greater depth then l

-similarly you can find r

l' = predecessor of l
r'= successor of r

CUT

Implementation Tango Cut using RedBlack Split

Tango Join Algorithm as a Sequence of Tree Split Un-mark and Concatenate RP Operations

- cut of D (by concatenating its root)

- merge c,l,r,& E all done in $O{\Bigl (}\log \log(n){\Bigr )}$ time

Bibliography

Erik D. Demaine, Dion Harmon, John Iacono and Mihai Patrascu, Dynamic optimality - almost [competitive online binary search tree], In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, pages 484-490, Rome, Italy, October 2004, http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=9430

Allen, Brian; and Munro, Ian (1978), "Self-organizing search trees", Journal of the ACM 25 (4): 526–535, doi:10.1145/322092.322094
Knuth, Donald. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Page 478 of section 6.2.3.

References

DANIEL DOMINIC SLEATOR AND ROBERT ENDRE TARJAN, 1985, Self-adjusting binary search trees. Journal of the ACM, 32(3):652-686, July 1985. ERIK D. DEMAINE, DION HARMON, JOHN IACONO, AND MIHAI PATRASCU, 2004, Dynamic optimality-almost, FOCS 2004 ERIK D. DEMAINE, DION HARMON, JOHN IACONO, AND MIHAI PATRASCU, Dynamic optimality-almost. SIAM J. Computers., 37(1):240-251, 2007 R. BAYER 1972, Symmetric Binary B-trees: Data Structure and Maintenance Algorithms, Acta Informatica 1:290-306 L. J. GUIBAS AND R. SEDGEWICK, 1978, A dichromatic framework for balanced trees. Nineteenth Annual IEEE Symposium on Foundations of Computer Science, pages 8-12, 1978 CHENGWEN CHRIS WANG, JONATHAN DERRYBERRY, DANIEL DOMINIC SLEATOR 2006, O(log log n)-Competitive Dynamic Binary Search Trees, SODA ’06, January 22-26, Miami, FL 2006 SIAM ISBN 0-89871-605-5/06/01 ROBERT WILBER, 1989, Lower bounds for accessing binary search trees with rotations, SIAM Journal on Computing, 18(1):56-67, 1989 ROBERT ENDRE TARJAN, 1983, Linking and cutting trees, In Data Structures and Network Algorithms, chapter 5, pages 59-70. Society for Industrial and Applied Mathematics,1983 DANIEL DOMINIC SLEATOR, Open Source top-down splay tree implementation, http://www.link.cs.cmu.edu/splay/

External links

ERIK D. DEMAINE, DION HARMON, JOHN IACONO, AND MIHAI PATRASCU, 2007, Dynamic optimality-almost.

RON WEIN, 2005, Efficient Implementation of Red-Black Trees with Split and Catenate Operations

RON WEIN, 2005, Efficient Implementation of Red-Black Trees with Split and Catenate Operations

DANIEL DOMINIC SLEATOR, Open Source top-down splay tree implementation