Soft heap
![]() | This article includes a list of references, related reading, or external links, but its sources remain unclear because it lacks inline citations. (April 2023) |
In computer science, a soft heap is a variant on the simple heap data structure that has constant amortized time complexity for 5 types of operations. This is achieved by carefully "corrupting" (increasing) the keys of at most a constant number of values in the heap.
Details
The constant time operations are:
- create(S): Create a new soft heap
- insert(S, x): Insert an element into a soft heap
- meld(S, S' ): Combine the contents of two soft heaps into one, destroying both
- delete(S, x): Delete an element from a soft heap
- findmin(S): Get the element with minimum key in the soft heap
Other heaps such as Fibonacci heaps achieve most of these bounds without any corruption, but cannot provide a constant-time bound on the critical delete operation. The amount of corruption can be controlled by the choice of a parameter ε, but the lower this is set, the more time insertions require (O(log 1/ε) for an error rate of ε).
More precisely, the guarantee offered by the soft heap is the following: for a fixed value ε between 0 and 1/2, at any point in time there will be at most ε*n corrupted keys in the heap, where n is the number of elements inserted so far. Note that this does not guarantee that only a fixed percentage of the keys currently in the heap are corrupted: in an unlucky sequence of insertions and deletions, it can happen that all elements in the heap will have corrupted keys. Similarly, we have no guarantee that in a sequence of elements extracted from the heap with findmin and delete, only a fixed percentage will have corrupted keys: in an unlucky scenario only corrupted elements are extracted from the heap.
The soft heap was designed by Bernard Chazelle in 2000. The term "corruption" in the structure is the result of what Chazelle called "carpooling" in a soft heap. Each node in the soft heap contains a linked list of keys and one common key. The common key is an upper bound on the values of the keys in the linked list. Once a key is added to the linked list, it is considered corrupted because its value is never again relevant in any of the soft heap operations: only the common keys are compared. This is what makes soft heaps "soft"; one cannot be sure whether any particular value put into it will be corrupted. The purpose of these corruptions is effectively to lower the information entropy of the data, enabling the data structure to break through information-theoretic barriers regarding heaps.
Applications
Despite their limitations and unpredictable nature, soft heaps are useful in the design of deterministic algorithms. They were used to achieve the best complexity to date for finding a minimum spanning tree. They can also be used to easily build an optimal selection algorithm, as well as near-sorting algorithms, which are algorithms that place every element near its final position, a situation in which insertion sort is fast.
A simple example is a selection algorithm, for find the th largest of a group of numbers:
- Initialize a soft heap with error rate , allowing at most 33% of its inserted keys to be corrupted.
- Insert all elements into the heap.
- Repeat times: perform a findmin operation and delete the key that it returns.
- Let be the deleted element whose correct key is largest.
- Compare to all of the elements, partition them into the subset less than or equal to and the subset greater than , and recursively call the same selection algorithm in the subset that contains the th largest.
In analyzing this algorithm, call the originally given values the "correct" keys, and the values stored in the heap the "stored" keys. The stored key of is possibly larger than its correct key (if was corrupted), and even this larger value is smaller than all the stored keys of the remaining elements in the heap (as we were removing minimums), including at least uncorrupted elements. Thus, at least these remaining uncorrupted elements are greater than . On the other hand, the set of elements less than or equal to includes all of the deleted elements, and so also has size at least . Thus, divides the elements somewhere between 33%/66% and 66%/33%. Because each level of recursion reduces the problem size by a constant factor, the total time of the algorithm can be bounded by a geometric series, showing that it is .
References
- Chazelle, Bernard (November 2000). "The soft heap: an approximate priority queue with optimal error rate" (PDF). J. ACM. 47 (6): 1012–1027. CiteSeerX 10.1.1.5.9705. doi:10.1145/355541.355554. S2CID 12556140.
- Kaplan, Haim; Zwick, Uri (2009). "A simpler implementation and analysis of Chazelle's soft heaps". Proceedings of the Nineteenth Annual ACM–SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics. pp. 477–485. CiteSeerX 10.1.1.215.6250. doi:10.1137/1.9781611973068.53. ISBN 978-0-89871-680-1.