Bucket Sort

Bucket sort, or bin sort, is a sorting algorithm that works by partitioning an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, and is a cousin of radix sort in the most to least significant digit flavour. Bucket sort can be implemented with comparisons and therefore can also be considered a comparison sort algorithm. The computational complexity estimates involve the number of buckets.

Bucket sort works as follows:

  1. Set up an array of initially empty “buckets”.
  2. Scatter: Go over the original array, putting each object in its bucket.
  3. Sort each non-empty bucket.
  4. Gather: Visit the buckets in order and put all elements back into the original array.

Following diagram demonstrates working of bucket sort.


bucketSort(arr[], n)
1) Create n empty buckets (Or lists).
2) Do following for every array element arr[i].
.......a) Insert arr[i] into bucket[n*array[i]]
3) Sort individual buckets using insertion sort.
4) Concatenate all sorted buckets.


//#include <algorithm>
//#include <vector>
// Function to sort arr[] of size n using bucket sort
void bucketSort(float arr[], int n)
    // 1) Create n empty buckets
    vector b[n];
    // 2) Put array elements in different buckets
    for (int i=0; i<n; i++)
       int bi = n*arr[i]; // Index in bucket
    // 3) Sort individual buckets
    for (int i=0; i<n; i++)
       sort(b[i].begin(), b[i].end());
    // 4) Concatenate all buckets into arr[]
    int index = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < b[i].size(); j++)
          arr[index++] = b[i][j];


If we assume that insertion in a bucket takes O(1) time then steps 1 and 2 of the above algorithm clearly take O(n) time. The O(1) is easily possible if we use a linked list to represent a bucket (In the above code, C++ vector is used for simplicity). Step 4 also takes O(n) time as there will be n items in all buckets.
The main step to analyze is step 3. This step also takes O(n) time on average if all numbers are uniformly distributed.

Auxiliary Space: O(n.k)
Sorting In Place: No
Stable: Yes

Bucket sort with two buckets is effectively a version of quick sort where the pivot value is always selected to be the middle value of the value range. While this choice is effective for uniformly distributed inputs, other means of choosing the pivot in quick sort such as randomly selected pivots make it more resistant to clustering in the input distribution.

The n-way merge sort algorithm also begins by distributing the list into n sublists and sorting each one; however, the sublists created by merge sort have overlapping value ranges and so cannot be recombined by simple concatenation as in bucket sort. Instead, they must be interleaved by a merge algorithm. However, this added expense is counterbalanced by the simpler scatter phase and the ability to ensure that each sublist is the same size, providing a good worst-case time bound.

Top-down radix sort can be seen as a special case of bucket sort where both the range of values and the number of buckets is constrained to be a power of two. Consequently, each bucket’s size is also a power of two, and the procedure can be applied recursively. This approach can accelerate the scatter phase, since we only need to examine a prefix of the bit representation of each element to determine its bucket.

Source: Wikipedia