Library sort

Class	Sorting algorithm
Data structure	Array
Worst-case performance
Best-case performance
Average performance
Worst-case space complexity
Optimal	?

Library sort, or gapped insertion sort, is a sorting algorithm that uses an insertion sort, but with gaps in the array to accelerate subsequent insertions. The name comes from an analogy:^[1]

Suppose a librarian were to store his books alphabetically on a long shelf, starting with the A's at the left end, and continuing to the right along the shelf with no spaces between the books until the end of the Z's. If the librarian acquired a new book that belongs to the B section, once he finds the correct space in the B section, he will have to move every book over, from the middle of the B's all the way down to the Z's in order to make room for the new book. This is an insertion sort. However, if he were to leave a space after every letter, as long as there was still space after B, he would only have to move a few books to make room for the new one. This is the basic principle of the Library Sort.
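
The analogy can be made concrete by counting how many books have to move to free a position on the shelf. The following is only an illustrative sketch: the function name `books_to_move` and the use of `None` for an empty shelf position are assumptions made here, not part of the original description.

```python
def books_to_move(shelf, index):
    """Count the books that must shift right to free shelf[index].

    Empty positions are represented by None (an assumption of this sketch).
    Books shift one place right until the first empty position is reached;
    if there is none, the shelf is assumed to have room past its right end.
    """
    moved = 0
    while index < len(shelf) and shelf[index] is not None:
        moved += 1
        index += 1
    return moved


packed = ["A", "B", "C", "D", "E"]
gapped = ["A", None, "B", None, "C", None, "D", None, "E", None]

print(books_to_move(packed, 1))  # new book goes before "B": B, C, D and E all move
print(books_to_move(gapped, 2))  # same insertion on the gapped shelf: only B moves
```

On the packed shelf the cost grows with the number of books to the right of the insertion point; on the gapped shelf it is bounded by the distance to the next gap.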

The algorithm was proposed by Michael A. Bender, Martín Farach-Colton, and Miguel Mosteiro in 2004^[2] and published in 2006.^[3]

Like the insertion sort it is based on, library sort is a stable comparison sort and can be run as an online algorithm. Unlike insertion sort, which takes O(n²) time, it was shown to run in O(n log n) time with high probability (comparable to quicksort). The mechanism used for this improvement is very similar to that of a skip list. The paper gives neither a full implementation nor the exact algorithms for important parts such as insertion and rebalancing. Further information would be needed to assess how the efficiency of library sort compares with that of other sorting methods in practice.

Compared to basic insertion sort, the drawback of library sort is that it requires extra space for the gaps. The amount and distribution of that space is implementation-dependent. In the paper the size of the needed array is (1 + ε)n,^[3] but no further recommendation is given on how to choose ε. One weakness of insertion sort is that it may require a large number of swap operations, which is costly if memory writes are expensive. Library sort may improve on that somewhat in the insertion step, as fewer elements need to move to make room, but it adds an extra cost in the rebalancing step.
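
As a rough illustration of the space cost, the sketch below computes the number of array slots for a few values of ε. The helper name `gapped_array_size` is hypothetical, and rounding up to a whole number of slots is an assumption of this sketch, since the text above only gives the size as (1 + ε)n.

```python
from fractions import Fraction
from math import ceil


def gapped_array_size(n, epsilon):
    """Slots needed for n elements with gap factor epsilon: ceil((1 + ε) n).

    epsilon is converted to an exact Fraction so that a value such as
    "0.1" does not pick up floating-point rounding error.
    """
    return ceil((1 + Fraction(epsilon)) * n)


for eps in ("0.1", "0.5", "1", "3"):
    size = gapped_array_size(1000, eps)
    print(f"ε = {eps}: {size} slots, {size - 1000} of them spare")
```

A larger ε leaves more room between elements, so insertions shift fewer elements, at the price of proportionally more memory.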

Implementation

Algorithm

Suppose we have an array of n elements. We choose the gap size ε, which gives a final array of size (1 + ε)n. The algorithm works in log n rounds. In each round we insert as many elements as the final array already contains, and then rebalance the array. To find the insertion position, we apply binary search in the final array and then shift the following elements until we hit an empty space. Once the round is over, we rebalance the final array by inserting spaces between each pair of elements.
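
The doubling of the number of stored elements from round to round is what keeps the number of rounds logarithmic. A minimal sketch of the round schedule follows; the function name `round_sizes` is hypothetical, and the first element is assumed to be placed directly before the first round, as in the implementations in this article.

```python
def round_sizes(n):
    """Return how many elements each round inserts when sorting n elements.

    The first element is placed directly; each later round inserts as many
    elements as the array already holds, capped by what is left.
    """
    sizes = []
    inserted = 1
    while inserted < n:
        batch = min(inserted, n - inserted)
        sizes.append(batch)
        inserted += batch
    return sizes


print(round_sizes(10))          # [1, 2, 4, 2]
print(len(round_sizes(1000)))   # 10 rounds
```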

The algorithm has three important steps:

1. Binary search: Find the insertion position by applying binary search within the already inserted elements. If the middle position is an empty space, move linearly to the left or right until an occupied position is found.

2. Insertion: Insert the element at the position found, shifting the following elements by one position until an empty space is hit.

3. Rebalancing: Insert spaces between each pair of elements in the array. Here we have used a queue to accomplish this. This takes linear time, and because there are log n rounds in the algorithm, total rebalancing takes only O(n log n) time.
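
The three steps can be sketched compactly as follows. This is a simplified illustration under stated assumptions, not the algorithm from the paper: empty slots are represented by `None`, ε is taken to be a whole number of gaps per element, rebalancing copies the elements into a temporary list instead of a queue, and the names `find_slot`, `insert_at`, `rebalance` and `gapped_sort` are hypothetical.

```python
def find_slot(slots, element, last):
    """Step 1: binary search over slots[0..last], skipping runs of gaps."""
    lo, hi = 0, last + 1              # the insertion index lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        probe = mid
        while probe < hi and slots[probe] is None:
            probe += 1                # move right past empty slots
        if probe == hi or slots[probe] > element:
            hi = mid                  # everything from mid on is larger or empty
        else:
            lo = probe + 1            # equal elements stay in front (stable)
    return lo


def insert_at(slots, element, index):
    """Step 2: place element at index, shifting right up to the next gap.

    Returns the index of the slot that was filled last.
    """
    while slots[index] is not None:
        slots[index], element = element, slots[index]
        index += 1
    slots[index] = element
    return index


def rebalance(slots, gaps):
    """Step 3: re-spread the elements, leaving `gaps` empty slots after each.

    Returns the index of the last occupied slot.
    """
    items = [x for x in slots if x is not None]
    stride = gaps + 1
    for i in range(len(slots)):
        slots[i] = None
    for i, item in enumerate(items):
        slots[i * stride] = item
    return (len(items) - 1) * stride


def gapped_sort(values, gaps=1):
    """Sort values (which must not contain None) using the three steps."""
    if not values:
        return []
    slots = [None] * ((1 + gaps) * len(values))
    slots[0] = values[0]
    last, inserted = 0, 1
    while inserted < len(values):
        # One round: insert as many elements as are already stored.
        for _ in range(min(inserted, len(values) - inserted)):
            index = find_slot(slots, values[inserted], last)
            last = max(last, insert_at(slots, values[inserted], index))
            inserted += 1
        last = rebalance(slots, gaps)
    return [x for x in slots if x is not None]


print(gapped_sort([5, 2, 9, 1, 5, 6], gaps=1))  # [1, 2, 5, 5, 6, 9]
```

Using a temporary list in `rebalance` keeps the sketch short; it costs the same linear time per round as the queue-based approach described above.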

C implementation

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
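//Swaps two int lvalues through a temporary
#define swap(a,b) do { int swap_tmp=(a); (a)=(b); (b)=swap_tmp; } while(0)
//last: index of the rightmost occupied slot; queue: scratch space for balance().
//Empty slots hold -1, so only non-negative integers can be sorted, and the
//fixed-size queue limits the input to about a million elements.
int last,queue[1000008];

//This routine rebalances the array, i.e. inserts the given number of spaces
//in between numbers with the help of a queue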
void balance(int f[], int e, int inserted)
{
	int top,bottom;
	top=bottom=0;
	int i=1,s=1,t;
	while(s<inserted)
	{
		t=0;
		while(t<e)
		{
			if(f[i]!=-1)
			{
				queue[bottom++]=f[i];
			}
			f[i++]=-1;
			t++;
		}
		if(f[i]!=-1)
			queue[bottom++]=f[i];
		f[i++]=queue[top++];
		s++;
	}
	last=i-1;
}

//This routine inserts the element in the position found
//by binary search and then swaps the positions of the following 
//elements till an empty space is hit
void insert(int f[], int element , int position)
{
	if(f[position]==-1)
	{
		f[position]=element;
		if(position>last)
			last=position;
	}
	else
	{
		int temp=element;
		swap(temp,f[position]);
		position++;
		while(f[position]!=-1)
		{
			swap(temp,f[position]);
			position++;
		}
		f[position]=temp;
		if(position>last)
			last=position;
	}
}

//This routine applies a binary search on the final array for
//finding the place where the new element will be inserted
void find_place(int f[], int element, int start, int end)
{
	int mid=start+((end-start)/2);
	if(start==end)
	{
		if(f[mid]==-1)
		{
			f[mid]=element;
			if(mid>last)
				last=mid;
			return;
		}
		else if(f[mid]<=element)
		{
			insert(f,element,mid+1);
			return;
		}
		else
		{
			insert(f,element,mid);
			return;
		}
	}
	int m=mid;
	while( m < end && f[m] == -1 )
		m++;
	if(m==end)
	{
		if(f[m]!=-1&&f[m]<=element)
			insert(f,element,m+1);
		else
			find_place(f,element,start,mid);
	}
	else if(m==start)
	{
		if(f[m]>element)
			insert(f,element,m);
		else
			find_place(f,element,m+1,end);
	}
	else
	{
		if(f[m]==element)
		{
			insert(f,element,m+1);
		}
		else if(f[m]>element)
		{
			find_place(f,element,start,m-1);
		}
		else
			find_place(f,element,m+1,end);
	}
}

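//Reads n, then n non-negative integers, then the gap size e,
//and prints the integers in sorted order
int main()
{
	int i,k,n,e;
	int *s,*f;
	if(scanf("%d",&n)!=1 || n<1)                 //Scan the number of elements.
		return 1;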
	s=(int *)malloc(sizeof(int)*n);
	for(i=0;i<n;i++)
		scanf("%d",&s[i]);
	scanf("%d",&e);                              // Choose the gap size.
	f=(int *)malloc((1+e)*n*sizeof(int));
	for(i=0;i<(1+e)*n;i++)
		f[i]=-1;
	f[0]=s[0];
	i=1;
	last=0;
	int inserted=1;
	while( inserted < n )
	{
		k=inserted;
		while(inserted < n && k--)
		{
			find_place(f,s[i],0,last);
			inserted++;
			i++;
		}
		balance(f,e,inserted);
	}
	for(i=0;i<(1+e)*n;i++)
		if(f[i]>=0)
			printf("%d ",f[i]);
	printf("\n");
	return 0;
}

Python implementation

def library_sort(array, epsilon):

    def binary_search(array, element, start, end):        
        mid = start + ((end-start)//2)
        if start == end:
            if array[mid] is not None and array[mid] <= element: return mid + 1
            else: return mid
        else:
            m = mid
            while m < end and array[m] is None: m += 1
            if m == end:
                if array[m] is not None and array[m] <= element: return m + 1
                else: return binary_search(array, element, start, mid)
            elif m == start:
                if array[m] > element: return m
                else: return binary_search(array, element, m+1, end)
            else:
                if array[m] == element: return m + 1
                elif array[m] > element: return binary_search(array, element, start, m-1)
                else: return binary_search(array, element, m+1, end)            


    def insert(array, element, index):
        nonlocal last_insert_index
        if array[index] is None:
            array[index] = element
        else:
            while array[index] is not None:
                array[index], element = element, array[index]
                index += 1
            array[index] = element
            index += 1
        if index > last_insert_index:
            last_insert_index = index


    def balance(array, num_spaces, total_inserted):
        nonlocal last_insert_index
        queue = [None] * len(array)
        inserted = index = 1
        top = bottom = 0

        while inserted < total_inserted:
            spaces = 0
            while spaces < num_spaces:
                if array[index] is not None:
                    queue[bottom] = array[index]
                    bottom += 1
                array[index] = None
                index += 1; spaces += 1
            if array[index] is not None:
                queue[bottom] = array[index]
                bottom += 1
            array[index] = queue[top]
            index += 1; top += 1; inserted += 1
            
        last_insert_index = index - 1

    
    array_len = len(array)
    if array_len == 0:
        return []  # nothing to sort; avoids indexing an empty list below
    copy = [None] * (1 + epsilon) * array_len
    copy[0] = array[0]
    last_insert_index = 0
    inserted = index = 1

    while inserted < array_len:
        round_inserts = inserted

        while inserted < array_len and round_inserts > 0:
            insertion_index = binary_search(copy, array[index], 0, last_insert_index)
            insert(copy, array[index], insertion_index)
            round_inserts -= 1; inserted += 1; index += 1
        balance(copy, epsilon, inserted)

    return [x for x in copy if x is not None]

Analysis

The two graphs show the performance of library sort and insertion sort on the same inputs. The running time of library sort grows approximately as O(n log n), while that of insertion sort grows as O(n²).

References

[1] Budd, Timothy A., An Active Learning approach to Data Structures using C (PDF)
[2] http://arxiv.org/abs/cs/0407003
[3] Bender, M. A., Farach-Colton, M., and Mosteiro, M. (2006). "Insertion Sort is O(n log n)". Theory of Computing Systems. 39 (3): 391. doi:10.1007/s00224-005-1237-z.

External links

Gapped Insertion Sort

