
Oblivious data structure

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Congsen1991 (talk | contribs) at 19:22, 4 December 2015 (Created page with '== Introduction == In most conditions, even if the data is encrypted, the access pattern can be achieved, and this pattern can leak some important information su...'). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Introduction

In many settings, even when the data itself is encrypted, the access pattern can still be observed, and this pattern can leak important information such as encryption keys. In cloud data outsourcing, this leakage of the access pattern remains a serious problem. An access pattern is a specification of an access mode for every attribute of a relation schema; for example, the sequence in which a user reads or writes data in the cloud is an access pattern.

We say a machine is oblivious if the sequence of locations it accesses is identical for any two inputs with the same running time, so that the data access pattern is independent of the input.
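To illustrate the idea, here is a small Python comparison (our own example, not from the literature): a linear scan touches the same index sequence for every input of a given length, so it is oblivious, whereas the index trace of a binary search depends on the target being searched for.

```python
def linear_search_trace(arr, target):
    """Linear scan: the index trace is [0, 1, ..., n-1] for every input of
    length n, so the access pattern reveals nothing about the target."""
    trace, found = [], -1
    for i in range(len(arr)):
        trace.append(i)
        if found < 0 and arr[i] == target:
            found = i
    return found, trace

def binary_search_trace(arr, target):
    """Binary search over a sorted array: the index trace depends on the
    target, so an observer of the access pattern learns where it lies."""
    trace, lo, hi = [], 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        trace.append(mid)
        if arr[mid] == target:
            return mid, trace
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, trace
```

Running both on the same array shows identical traces for the linear scan but target-dependent traces for the binary search.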

An oblivious data structure is a data structure that reveals nothing about the sequence or pattern of the operations that have been applied to it except for the final result of the operations.

       Applications:

  • Cloud data outsourcing: When writing or reading data from a cloud server, oblivious data structures are useful. Since modern databases rely heavily on data structures, oblivious data structures come in handy there as well.
  • Secure processor: Tamper-resilient secure processors are used to defend against physical attacks and against malicious intruders accessing users' computer platforms. Existing secure processors designed in academia and industry, such as AEGIS and Intel SGX, encrypt memory contents, but the memory addresses are still transferred in the clear on the memory bus, and research has shown that these memory buses can give away information about encryption keys. With practical oblivious data structures, a secure processor can obfuscate its memory access pattern in a provably secure manner.
  • Secure computation: Traditionally, secure computation used the circuit model, but this model does not scale well as the amount of data grows. RAM-model secure computation was proposed as an alternative to the traditional circuit model, and oblivious data structures are used there to prevent the access pattern from being leaked.

Oblivious Data Structures

Oblivious RAM

Goldreich and Ostrovsky introduced this term in the context of software protection.

The memory access pattern of an oblivious RAM is probabilistic, with a probability distribution that is independent of the input. Goldreich and Ostrovsky proved the following theorem about oblivious RAM: let RAM(m) denote a RAM with m memory locations and access to a random oracle. Then t steps of an arbitrary RAM(m) program can be simulated by fewer than O(t·(log₂ t)³) steps of an oblivious RAM(m·(log₂ m)²). Moreover, every oblivious simulation of RAM(m) must make at least max{m, (t-1)·log₂ m} accesses in order to simulate t steps.

The square-root algorithm simulates an oblivious RAM as follows.

  1. For every √m accesses, randomly permute the first m + √m memory locations.
  2. To access a word, check the shelter words first.
  3. If the word is in the shelter, access one of the dummy words; if it is not, access its permuted location.

To simulate t steps of the original RAM, the oblivious RAM needs t + √m steps; the cost per access is O(√m·log m).
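A toy Python sketch of this square-root simulation follows (our own simplification: a real construction re-permutes the memory with an oblivious sorting network, while here we simply reshuffle in place for illustration).

```python
import random

class SqrtORAM:
    """Toy sketch of the square-root simulation: m real words plus
    sqrt(m) dummy words are stored under a random permutation, with a
    sqrt(m)-sized shelter that is scanned on every access."""

    def __init__(self, m):
        self.m = m
        self.s = max(1, int(round(m ** 0.5)))  # sqrt(m)
        self.store = {i: 0 for i in range(m)}  # logical memory contents
        self.trace = []                        # physical cells touched
        self._permute()

    def _permute(self):
        # start a new epoch: fresh random permutation, empty shelter
        cells = list(range(self.m + self.s))
        random.shuffle(cells)
        self.perm = dict(enumerate(cells))
        self.shelter = {}
        self.next_dummy = 0
        self.accesses = 0

    def access(self, v, value=None):
        if self.accesses == self.s:       # epoch over after sqrt(m) accesses
            self._permute()
        self.accesses += 1
        if v in self.shelter:             # shelter hit: touch a fresh dummy
            self.trace.append(self.perm[self.m + self.next_dummy])
            self.next_dummy += 1
        else:                             # miss: touch pi(v) exactly once
            self.trace.append(self.perm[v])
            self.shelter[v] = self.store[v]
        if value is not None:
            self.store[v] = self.shelter[v] = value
        return self.shelter[v]
```

Within an epoch, each real cell is touched at most once and shelter hits are masked by dummy accesses, which is what makes the physical trace uninformative.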

Another simulation is the hierarchical algorithm. The basic idea is to treat the shelter memory as a buffer and to extend it to multiple levels of buffers. At level i there are 4^i buckets, each holding log t items, and each level has a randomly selected hash function.

The operations work as follows. At first, the program is loaded into the last level, which can be said to have 4t buckets. For a read, check the bucket h_i(V) at each level: while (V,X) has not yet been found, check the bucket h_i(V), in which there is at most one real match and the remaining entries are dummies; once (V,X) has been found, pick a bucket at random to access at each remaining level. For a write, put (V,X) into the first level; if the first i levels are full, move all of their items into level i+1 and empty the first i levels.

The time cost per level is O(log t); the cost for each access is O((log t)²); and the cost of hashing is O(t·(log t)³).
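The read and write paths of the hierarchical scheme can be sketched as follows (a toy Python model of our own: the per-level hash function is simulated with a random salt, and the rebuild step uses plain rehashing where the real scheme uses oblivious sorting).

```python
import random

class HierarchicalORAM:
    """Toy model of the hierarchical scheme: level i has 4^(i+1) buckets
    and its own hash function (simulated by a random salt)."""

    def __init__(self, levels=4):
        self.levels = levels
        self.tables = [self._empty_level(i) for i in range(levels)]

    def _empty_level(self, i):
        return {"salt": random.random(),
                "buckets": [[] for _ in range(4 ** (i + 1))]}

    def _h(self, i, v):
        tbl = self.tables[i]
        return hash((tbl["salt"], v)) % len(tbl["buckets"])

    def read(self, v):
        found = None
        for i in range(self.levels):          # scan every level, top down
            buckets = self.tables[i]["buckets"]
            if found is None:
                for key, val in buckets[self._h(i, v)]:
                    if key == v:              # freshest copy wins
                        found = val
            else:                             # already found: dummy access
                _ = buckets[random.randrange(len(buckets))]
        return found

    def write(self, v, x):
        top = self.tables[0]["buckets"]
        if sum(len(b) for b in top) >= len(top):   # first level full
            self._rebuild()                        # cascade items down
        self.tables[0]["buckets"][self._h(0, v)].append((v, x))

    def _rebuild(self):
        # merge the full top levels into the first level with room
        j = next((i for i in range(1, self.levels)
                  if sum(len(b) for b in self.tables[i]["buckets"])
                  < len(self.tables[i]["buckets"])), self.levels - 1)
        merged = {}
        for i in range(j, -1, -1):     # deepest first, so fresher overwrites
            for b in self.tables[i]["buckets"]:
                for key, val in b:
                    merged[key] = val
        for i in range(j + 1):         # fresh hash functions, empty levels
            self.tables[i] = self._empty_level(i)
        for key, val in merged.items():
            self.tables[j]["buckets"][self._h(j, key)].append((key, val))
```

Note how a read probes exactly one bucket per level whether or not the item has already been found, which is the source of the scheme's obliviousness.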

Oblivious Tree

An oblivious tree is a rooted tree with the following properties:

  • All the leaves are at the same level.
  • All the internal nodes have degree at most 3.
  • Only the nodes along the rightmost path in the tree may have degree one.

The oblivious tree is a data structure similar to a 2-3 tree, but with the additional property of being oblivious. The rightmost path may have degree one, which helps in describing the update algorithms. The oblivious tree uses randomization to achieve O(log n) expected running time for its update operations, and any two sequences of operations M and N that lead to the same final contents produce trees with the same probability distribution. The tree supports three operations:

  • CREATE(L): build a new tree storing the sequence of values L at its leaves.
  • INSERT(b, i, T): insert a new leaf storing the value b as the i-th leaf of the tree T.
  • DELETE(i, T): remove the i-th leaf from T.

Steps of CREATE: the list of nodes at level i is obtained by traversing the list of nodes at level i+1 from left to right and repeatedly doing the following:

  1. Choose d ∈ {2, 3} uniformly at random.
  2. If fewer than d nodes are left at level i+1, set d to the number of nodes left.
  3. Create a new node n at level i with the next d nodes at level i+1 as children, and compute the size of n as the sum of the sizes of its children.
[Figure: an oblivious tree]

For example, if the coin tosses of d ∈ {2, 3} have the outcome 2, 3, 2, 2, 2, 2, 3, the string “OBLIVION” is stored in the oblivious tree shown in the figure.

Both INSERT(b, i, T) and DELETE(i, T) have O(log n) expected running time, and for INSERT and DELETE we have:

INSERT(b, i, CREATE(L)) = CREATE(L[1], …, L[i], b, L[i+1], …)

DELETE(i, CREATE(L)) = CREATE(L[1], …, L[i-1], L[i+1], …)

For example, running CREATE(ABCDEFG) or INSERT(C, 2, CREATE(ABDEFG)) yields the same probability distribution over output trees.
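The CREATE procedure can be sketched in Python as follows (a toy illustration of our own; the tuple-based node representation and the frontier helper are assumptions, not part of the original construction).

```python
import random

def create(leaves):
    """Sketch of the oblivious-tree CREATE: build level i from level i+1
    by repeatedly grouping the next d children, with d drawn uniformly
    from {2, 3} and clamped when fewer nodes remain."""
    level = [("leaf", v) for v in leaves]
    while len(level) > 1:
        next_level, i = [], 0
        while i < len(level):
            d = min(random.choice((2, 3)), len(level) - i)
            next_level.append(("node", level[i:i + d]))
            i += d
        level = next_level
    return level[0]

def frontier(node):
    """Read the stored string back off the leaves, left to right."""
    if node[0] == "leaf":
        return node[1]
    return "".join(frontier(child) for child in node[1])
```

Whatever the random choices, the leaf sequence is preserved; only the internal shape of the tree varies, which is exactly the randomness the obliviousness argument relies on.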

Cache-Oblivious kd-tree

A kd-tree is used for answering orthogonal range queries.

The kd-tree is a binary tree of height O(log₂ N) in which each node splits its set of points into two subsets of equal size. On even levels of the tree the dividing lines are horizontal, and on odd levels they are vertical. In this way a rectangular region is associated with each node, and the nodes on any particular level of the tree partition the plane into disjoint regions. This structure answers queries in O(log_B N + T/B) memory transfers using O(N (log₂ N)²) space.

In the RAM model, a kd-tree on N points can be constructed recursively in O(N log₂ N) time: the dividing line at the root is found using an O(N)-time median algorithm, the points are distributed into two sets according to this line in O(N) time, and the two subtrees are constructed recursively.
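The recursive construction, together with an orthogonal range query, can be sketched as follows (a plain RAM-model Python sketch of our own; it sorts at every level, giving O(N log² N) construction rather than the O(N log N) obtained with a linear-time median algorithm).

```python
def build_kdtree(points, depth=0):
    """Recursive kd-tree construction: split the point set in half at the
    median coordinate, alternating the splitting axis per level."""
    if not points:
        return None
    if len(points) == 1:
        return {"point": points[0]}
    axis = depth % 2                              # alternate x / y splits
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                           # median position
    return {"axis": axis,
            "median": pts[mid][axis],
            "left": build_kdtree(pts[:mid], depth + 1),
            "right": build_kdtree(pts[mid:], depth + 1)}

def range_query(node, lo, hi):
    """Report all points p with lo[k] <= p[k] <= hi[k] for k in {0, 1}."""
    if node is None:
        return []
    if "point" in node:
        p = node["point"]
        return [p] if all(lo[k] <= p[k] <= hi[k] for k in (0, 1)) else []
    out = []
    if lo[node["axis"]] <= node["median"]:        # query box reaches left half
        out += range_query(node["left"], lo, hi)
    if hi[node["axis"]] >= node["median"]:        # query box reaches right half
        out += range_query(node["right"], lo, hi)
    return out
```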

Cache-Oblivious Buffer Heap

The cache-oblivious buffer heap is a cache-oblivious priority queue that supports Delete, Delete-Min and Decrease-Key operations in O((1/B)·log₂(N/M)) amortized cache misses each, where B is the size of a block, N is the number of items in the priority queue, and M is the number of items that fit in memory. A buffer heap with N items consists of 1 + log₂ N levels.

A Delete(x) operation deletes the element x from the queue if it exists.

A Delete-Min() operation retrieves and deletes an element with the minimum key from the queue, if one exists.

A Decrease-Key(x, kx) operation inserts the element x with key kx into the queue if x does not already exist in the queue; otherwise it replaces the smallest key k'x of x in the queue with kx, provided kx < k'x, and deletes all the remaining keys of x in the queue.
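The semantics of these three operations can be illustrated with a simple Python priority queue using lazy deletion (our own sketch: it reproduces only the interface behaviour, not the cache-oblivious layout or the stated cache-miss bounds).

```python
import heapq

class DecreaseKeyPQ:
    """Functional sketch of the buffer-heap interface (Delete, Delete-Min,
    Decrease-Key) built on a binary heap with lazy deletion."""

    def __init__(self):
        self.heap = []          # (key, element) pairs, possibly stale
        self.key = {}           # current key of each live element

    def decrease_key(self, x, kx):
        # insert x, or lower its key if kx is smaller than the stored key
        if x not in self.key or kx < self.key[x]:
            self.key[x] = kx
            heapq.heappush(self.heap, (kx, x))

    def delete(self, x):
        self.key.pop(x, None)   # stale heap entries are skipped later

    def delete_min(self):
        while self.heap:
            k, x = heapq.heappop(self.heap)
            if self.key.get(x) == k:     # ignore stale / deleted entries
                del self.key[x]
                return x, k
        return None
```

Lazy deletion mirrors the buffer heap's strategy of letting stale copies linger and discarding them when they surface.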

Applications of the buffer heap:

  • Cache-oblivious undirected single-source shortest paths (SSSP)

The algorithm incurs O((m/B)·log₂(n/M)) cache misses for the O(m) priority-queue operations it performs, and in addition O(n + m/B) cache misses for accessing the O(n) adjacency lists.

  • Cache-oblivious Directed SSSP
  • Cache-aware Undirected APSP

Cache Oblivious Binary Search Tree

This data structure provides a cache-oblivious layout of static binary search trees that permits searches in the asymptotically optimal number of memory transfers. An advantage of this data structure is that it avoids the use of pointers. The basic idea is to maintain a dynamic binary tree of height log n + O(1) using existing methods and to embed this tree in a static binary tree.

The depth d(v) of a node v in a tree T is the number of nodes on the simple path from the node to the root. The height h(T) of T is the maximum depth of a node in T, and the size |T| of T is the number of nodes in T. A complete tree T is a tree with 2^h(T) - 1 nodes.

       There are four memory layouts for static trees: DFS, inorder, BFS and van Emde Boas layouts.

  •        DFS layout: The nodes of T are stored in the order they are visited by a left-to-right depth first traversal of T.
  •        Inorder layout: The nodes of T are stored in the order that they are visited by a left-to-right inorder traversal of T.
  • BFS layout: The nodes of T are stored in the order they are visited by a left-to-right breadth-first traversal of T.
  • Van Emde Boas layout: The layout is defined recursively. A tree with only one node is stored as a single node record. If a tree T has two or more nodes, let H0 = h(T)/2, let T0 be the tree consisting of all nodes in T with depth at most H0, and let T1, …, Tk be the subtrees of T rooted at nodes with depth H0 + 1, numbered from left to right. We denote T0 the top tree and T1, …, Tk the bottom trees of the recursion. The layout of T consists of the recursive layout of T0 followed by the recursive layouts of T1, …, Tk in order.
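The van Emde Boas order of a complete binary tree can be computed with a short recursion (our own Python sketch; nodes are identified by their BFS numbers, with the root numbered 1 and the children of v numbered 2v and 2v+1).

```python
def veb_layout(height):
    """Compute the van Emde Boas order of a complete binary tree of the
    given height: lay out the top half of the tree first, then each
    bottom subtree from left to right."""
    def rec(root, h):
        if h == 1:
            return [root]
        top_h = h // 2                 # height of the top tree T0
        bot_h = h - top_h              # height of the bottom trees
        order = rec(root, top_h)
        roots = [root]                 # roots of the bottom trees sit
        for _ in range(top_h):         # top_h levels below `root`
            roots = [c for v in roots for c in (2 * v, 2 * v + 1)]
        for r in roots:
            order += rec(r, bot_h)
        return order
    return rec(1, height)
```

For height 3 this yields [1, 2, 4, 5, 3, 6, 7]: the root, then the left subtree as a contiguous block, then the right subtree, which is what keeps any root-to-leaf path within few memory blocks at every scale.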

Using this structure to store n elements takes (1 + ε)·n times the element size of memory, and it performs searches in worst case O(log_B n) memory transfers, updates in amortized O((log₂ n)²/B) memory transfers, and range queries in worst case O(log_B n + k/B) memory transfers, where k is the size of the output.

Quickheaps

The quickheap is a simple and efficient data structure for implementing priority queues in main and secondary memory. Quickheaps enable efficient element insertion, minimum extraction, deletion of arbitrary elements, and modification of the priority of elements within the heap. For a queue with m elements, it requires only O(log m) extra integers.

To implement a quickheap we need the following structures:

  • An array heap to store the elements.
  • A stack S to store the positions of the pivots partitioning heap.
  • An integer idx to indicate the first cell of the quickheap.
  • An integer capacity to indicate the size of heap.
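Using these components, minimum extraction via incremental quicksort can be sketched as follows (our own simplified Python version: it supports construction and Extract-Min only, and omits the insertion machinery and the fixed capacity of the full quickheap).

```python
import random

class Quickheap:
    """Simplified quickheap: the array `heap`, the pivot stack S and the
    integer idx from the description above. Extract-Min refines the array
    with quicksort partitions only as far as needed to expose the minimum."""

    def __init__(self, elems):
        self.heap = list(elems)
        self.idx = 0                   # first cell of the quickheap
        self.S = [len(self.heap)]      # pivot positions; sentinel at the end

    def _partition(self, lo, hi):
        # partition heap[lo:hi] around a random pivot, return its position
        p = self.heap[random.randrange(lo, hi)]
        less = [x for x in self.heap[lo:hi] if x < p]
        equal = [x for x in self.heap[lo:hi] if x == p]
        more = [x for x in self.heap[lo:hi] if x > p]
        self.heap[lo:hi] = less + equal + more
        return lo + len(less)

    def find_min(self):
        # incremental quicksort: partition until a pivot sits at idx
        while self.S[-1] != self.idx:
            self.S.append(self._partition(self.idx, self.S[-1]))
        return self.heap[self.idx]

    def extract_min(self):
        m = self.find_min()
        self.idx += 1                  # the minimum's cell leaves the heap
        self.S.pop()
        return m
```

The pivot stack records how much of the array is already partially sorted, so repeated extractions reuse earlier partitioning work instead of sorting the whole array up front.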