BucketSort

In BucketSort we have an input list and a number of buckets. In order to distribute the data records to the buckets we need a function which tells, into which bucket an actual record has to go. And this function must comply with the following relation:

a	<	b	<	c
f(a)	<	f(b)	<	f(c)

Having m buckets, the buckets are numbered and the function returns an integer between 1 and m. Normally, after a sort, in every bucket will be records with different key values. Therefor BucketSort in itself is normally not a complete sorting algorithm; with one exception: there are as many buckets as there are possible key values. If we want to sort by unsigned 16-bit-integers and we put up 65.536 buckets, then the sort can be done in one go. Any key is inspected only once; thus the time for a sort increases in a linear fashion with the number of records.

The algorithm is stable and the time for a sort does not depend on the key values.

Timing

Normally, after one distribution, there will be records with different key values in the same bucket; it is necessary to sort them internally. For this internal sort BucketSort can be used recursively, but then a different function for selecting the buckets is necessary (different key values!). Depending on this function, the behavior of BucketSort changes completely.

If we are sorting by a positive 8-bit number, put up 256 buckets, and take the value of the key as the number of the destination bucket, the algorithm is identical with ExtraDix. Hence the timing is

T = E * n * lg₂₅₆ (N) = E * n

where E is a constant reflecting the implementation. The time increases in a linear fashion with the number of records.

We use only two buckets and a pivot for the distribution function. If the key value is smaller or equal to the pivot, the record goes into the first bucket or else into the second bucket. Then we call the function recursively. The result is that BucketSort converts into Quicksort. Hence the timing is

T = C * n * lg₂ (n)

where C is a constant reflecting the implementation.

We use ten buckets. The function to determine the bucket number converts the key value into a decimal number and depending on the repetitions the corresponding digit (starting with the most significant digit) is selected; the digit gives directly the number of the bucket where the record shall go. BucketSort behaves like an inverted RadixSort and the timing is

T = D * n * lg₁₀ (N)

where D is a constant reflecting the implementation.

We use ten buckets. The function to determine the bucket number converts the key value into a decimal number and depending on the repetitions the corresponding digit (starting with the least significant digit) is selected; the digit gives directly the number of the bucket where the record shall go. After distributing all records to the buckets, the records are recollected and the sort is repeated for the next digit position. BucketSort now behaves exactly like RadixSort and the timing is

T = E * n * lg₁₀ (N)

where E is a constant reflecting the implementation.

The normal way how BucketSort is used recursively is to use a fixed number m of buckets on any level. The keys of all records are checked in order to find the smallest value and the biggest one. Then the difference between the smallest and biggest value is divided evenly into m intervals and for each interval a start value is calculated. The distribution function is just a nested sequence of if-statements; if the actual key value is bigger than x put the record into bucket m, if the actual key value is bigger than y put the record into bucket m-1 and so on. Now the content of a bucket is recursively distributet to m sub-buckets; therefor the timing is

T = F * n * lg_m (N)

where F is a constant reflecting the implementation.

Example

For the demonstration we use the variant simulating Quicksort with two buckets. The number of necessary division levels is calculated dynamically by help of the average of the key values. All records with a value smaller or equal to the average go into the first bucket and all others into the second bucket. Buckets with sorted content are displayed in orange.

Input List: average = 385

387	310	529	174	011	227	020	901	522	771

Bucket 1: average = 148

310	174	011	227	020

Bucket 2: average = 622

					387	529	901	522	771

We continue with bucket 1. The average is 148; therefor all records with a value smaller or equal to 148 go into bucket three and all others into bucket four.

Bucket 3: average = 15

011	020

Bucket 4: average = 237

		310	174	227

We continue with bucket 3. The average is 15; therefor all records with a value smaller or equal to 15 go into bucket five and all others into bucket six.

Bucket 5:

011

Bucket 6:

	020

We now recombine Bucket 5 and 6 to form the sorted bucket 3:

Sorted Bucket 3:

011	020

We continue with bucket 4. The average is 237; therefor all records with a value smaller or equal to 237 go into bucket seven and all others into bucket eight.

Bucket 7: average = 200

		174	227

Bucket 8:

				310

We continue with bucket 7. The average is 200; therefor all records with a value smaller or equal to 200 go into bucket nine and all others into bucket ten.

Bucket 9:

		174

Bucket 10:

			227

We can recombine them to form the sorted bucket 7:

Sorted Bucket 7:

		174	227

Bucket 8 has only one element and is sorted by definition. We can recombine it with bucket 7 to build sorted bucket 4:

Sorted Bucket 4:

		174	227	310

We can now recombine bucket 3 and bucket 4 to form the sorted bucket 1:

Sorted Bucket 1:

011	020	174	227	310

We now make a shortcut and get directly the sorted bucket 2:

Sorted Bucket 2:

					387	522	529	771	901

The Sorted List is build by recombining bucket 1 and 2:

011	020	174	227	310	387	522	529	771	901

It is obvious from the relative position of the buckets to each other that BucketSort can easily be realized as an in place algorithm.