Heap Sort

Heap sort is a comparison based sorting technique based on Binary Heap data structure. It is similar to selection sort where we first find the maximum element and place the maximum element at the end. We repeat the same process for remaining element.

What is Binary Heap?
Let us first define a Complete Binary Tree. A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.

A Binary Heap is a Complete Binary Tree where items are stored in a special order such that value in a parent node is greater(or smaller) than the values in its two children nodes. The former is called as max heap and the latter is called min heap. The heap can be represented by binary tree or array.

Why array based representation for Binary Heap?
Since a Binary Heap is a Complete Binary Tree, it can be easily represented as array and array based representation is space efficient. If the parent node is stored at index I, the left child can be calculated by 2 * I + 1 and right child by 2 * I + 2.

Heap Sort Algorithm for sorting in increasing order:
1. Build a max heap from the input data.
2. At this point, the largest item is stored at the root of the heap. Replace it with the last item of the heap followed by reducing the size of heap by 1. Finally, heapify the root of tree.
3. Repeat above steps until size of heap is greater than 1.

How to build the heap?
Heapify procedure can be applied to a node only if its children nodes are heapified. So the heapification must be performed in the bottom up order.

Lets understand with the help of an example:

From the below animation it can be seen that Heap sort works fastest for a few uniquely sorted array of elements. The time required to sort the elements increases in the order: reversed, random and then nearly sorted.

What are random, nearly sorted, reversed and few unique elements?

  1. Random: {6, 9, 5, 8, 7, 4, 2, 1, 3}
    Here all the elements are arranged in a random fashion.
  2. Nearly sorted: {5, 6, 7, 8, 9, 1, 2, 3, 4}
    The array is nearly sorted. For example, the above array if divided into two, are sorted individually.
  3. Reversed: {9, 8, 7, 6, 5, 4, 3, 2, 1}
    The array elements are reverse sorted and we are supported to sort them in ascending order.
  4. Few Unique: {1, 1, 2, 9, 4, 3, 2, 5, 9, 6}
    The array contains duplicate entries.
Random Nearly Sorted Reversed Few Unique

Key

  1. Black values are sorted.
  2. Gray values are unsorted.
  3. A red triangle marks the algorithm position.


Implementation

// A heap has current size and array of elements
struct MaxHeap
{
    int size;
    int* array;
};
 
/* The main function to heapify a Max Heap. The function assumes that everything under given root (element at index idx) is already heapified */
void maxHeapify(struct MaxHeap* maxHeap, int idx)
{
    int largest = idx;  // Initialize largest as root
    int left = (idx << 1) + 1;  // left = 2*idx + 1
    int right = (idx + 1) << 1; // right = 2*idx + 2
 
    // See if left child of root exists and is greater than root
    if (left < maxHeap->size && maxHeap->array[left] > maxHeap->array[largest])
        largest = left;
 
    // See if right child of root exists and is greater than the largest so far
    if (right < maxHeap->size && maxHeap->array[right] > maxHeap->array[largest])
        largest = right;
 
    // Change root, if needed
    if (largest != idx)
    {
        swap(&maxHeap->array[largest], &maxHeap->array[idx]);
        maxHeapify(maxHeap, largest);
    }
}
 
// A utility function to create a max heap of given capacity
struct MaxHeap* createAndBuildHeap(int *array, int size)
{
    int i;
    struct MaxHeap* maxHeap = (struct MaxHeap*) malloc(sizeof(struct MaxHeap));
    maxHeap->size = size;   // initialize size of heap
    maxHeap->array = array; // Assign address of first element of array
 
    /* Start from bottom most and rightmost internal mode and heapify all internal modes in bottom up way */
    for (i = (maxHeap->size - 2) / 2; i >= 0; --i)
        maxHeapify(maxHeap, i);
    return maxHeap;
}
 
// The main function to sort an array of given size
void heapSort(int* array, int size)
{
    // Build a heap from the input data.
    struct MaxHeap* maxHeap = createAndBuildHeap(array, size);
 
    /* Repeat following steps while heap size is greater than 1. The last element in max heap will be the minimum element */
    while (maxHeap->size > 1)
    {
       /* The largest item in Heap is stored at the root. Replace it with the last item of the heap followed by reducing the size of heap by 1.*/
        swap(&maxHeap->array[0], &maxHeap->array[maxHeap->size - 1]);
        --maxHeap->size;  // Reduce heap size
 
        // Finally, heapify the root of tree.
        maxHeapify(maxHeap, 0);
    }
}

Analysis

Time complexity of heapify is O(Logn). Time complexity of createAndBuildHeap() is O(n) and overall time complexity of Heap Sort is O(nLogn). 

Heap sort primarily competes with quick sort which is typically somewhat faster due to some factors, but the worst-case running time for quick sort is O(n2), which is unacceptable for large data sets and can be deliberately triggered given enough knowledge of the implementation, creating a security risk. Thus, because of the O(n log n) upper bound on heap sort’s running time and constant upper bound on its auxiliary storage, embedded systems with real-time constraints or systems concerned with security often use heap sort.

Heap sort also competes with merge sort, which has the same time bounds. Merge sort requires Ω(n) auxiliary space, but heap sort requires only a constant amount. Heap sort typically runs faster in practice on machines with small or slow data caches. On the other hand, merge sort has several advantages over heapsort:

  • Merge sort on arrays has considerably better data cache performance, often outperforming heap sort on modern desktop computers because merge sort frequently accesses contiguous memory locations (good locality of reference); heap sort references are spread throughout the heap.
  • Heap sort is not a stable sort; merge sort is stable.
  • Merge sort can be adapted to operate on singly linked lists with O(1) extra space. Heap sort can be adapted to operate on doubly linked lists with only O(1) extra space overhead.

Auxiliary Space: O(1)
Sorting In Place: Yes
Stable: No

Heap sort algorithm has limited uses because Quick sort and Merge sort are better in practice. Nevertheless, the Heap data structure itself is enormously used.

Source: Wikipedia, GeeksForGeeks