Multi-key quicksort

Three-way radix quicksort, also known as multi-key quicksort,^[1] is an algorithm for sorting strings. This hybrid of quicksort and radix sort was originally suggested by P. Shackleton and reported in C.A.R. Hoare's seminal paper on quicksort; its modern incarnation was developed by Jon Bentley and Robert Sedgewick in the mid-1990s.^[2] The algorithm is designed to exploit the property that in many problems, strings tend to have shared prefixes.

Description

The three-way radix quicksort algorithm sorts an array of $N$ (pointers to) strings in lexicographic order. It is assumed that all strings are of equal length $K$ ; if the strings are of varying length, they must be padded with extra elements that are less-than any element in the strings.^{[note 1]} The pseudocode for the algorithm is then^{[note 2]}

function sort(a : array of string, d : integer):
    if length(a) ≤ 1 or d ≥ K:
        return
    p ← pivot(a, d)
    i, j ← partition(a, d, p)
    sort(a[0..i], d)
    sort(a[i..j], d+1)
    sort(a[j..length(a)], d)

The pivot function must return a single character. Bentley and Sedgewick suggest either picking the median of a[0][d], ..., a[length(a)−1][d] or some random character in that range.^[2] The partition function is a variant of the one used in ordinary three-way quicksort: it rearranges a so that all of a[0], ..., a[i−1] have an element at position d that is less than p, a[i], ..., a[j−1] have $p$ at position $d$ , and strings from $j$ onward have a d'th element larger than $p$ .

Practical implementations of multi-key quicksort can benefit from the same optimizations typically applied to quicksort: median-of-three pivoting, switching to insertion sort for small arrays, etc.^[3]

Notes

^ One way to do so without altering the in-memory representation of the strings is to index them using a function that returns −1 or some other small value when the index is out of range.
^ Arrays and strings are zero-indexed. Array slicing is assumed to be a non-copying, constant-time operation.

References

^ This article incorporates public domain material from Paul E. Black. "multikey Quicksort". Dictionary of Algorithms and Data Structures. NIST.
^ ^a ^b Bentley, Jon; Sedgewick, Robert (1997). Fast algorithms for sorting and searching strings (PDF). Proc. Annual ACM-SIAM Symp. on Discrete Algorithms (SODA). ISBN 0-89871-390-0.
^ Bentley, Jon; Sedgewick, Robert (1998). "Sorting Strings with Three-Way Radix Quicksort". Dr. Dobb's Journal.

[3] One way to do so without altering the in-memory representation of the strings is to index them using a function that returns −1 or some other small value when the index is out of range.

[4] Arrays and strings are zero-indexed. Array slicing is assumed to be a non-copying, constant-time operation.

[1] This article incorporates public domain material from Paul E. Black. "multikey Quicksort". Dictionary of Algorithms and Data Structures. NIST.

[soda-2] Bentley, Jon; Sedgewick, Robert (1997). Fast algorithms for sorting and searching strings (PDF). Proc. Annual ACM-SIAM Symp. on Discrete Algorithms (SODA). ISBN 0-89871-390-0.

[5] Bentley, Jon; Sedgewick, Robert (1998). "Sorting Strings with Three-Way Radix Quicksort". Dr. Dobb's Journal.

[1]

[2]

[note 1]

[note 2]

[3]