Jump to content

Bubble sort

From Wikiversity

The time necessary for a sort with BubbleSort increases in a quadratic fashion with the number of records. The algorithm works in place and stable. In the best case, when the data records are already sorted, the time increases in a linear fashion with the number of records.

Short Description

[edit | edit source]

The main idea behind BubbleSort is to find the biggest key value and to move that record to the end of the list. This moving is done by comparing the key value of the actual record with the key value of the next record. If the key value of the next record is smaller, the two records are swapped; if the value is bigger, the candidate for having the biggest key value is from then on the next record.

The algorithm works with an outer and an inner loop. After the first repetition of the outer loop the record with the biggest key value is definitely on top of the list. When searching for the second biggest key value, it is not necessary to compare it with the last element, since we already know the result of such a compare; the number of repetitions of the inner loop is decreased by one with every repetition of the outer loop.

Since every record with the actually biggest key value is moved up, it is normally not just one record that moves upwards during a repetition of the outer loop. This is compared with the bubbles in a drink and that's where the algorithm got its name from.

Stability

[edit | edit source]

If there are records with identical key values and they become neighbors during the sort, the upper record becomes the new candidate for having the biggest key value. If the algorithm is realized with a bigger- and not a bigger-or-same- clause, it is stable, because records with identical key value do not change their position relative to each other.

Timing

[edit | edit source]

For an estimate of the runtime we visualize the number of compare operations. In the vertical direction we have the repetitions of the outer loop and in the horizontal direction the number of compare operations.

   1  2  3  4  5  6  7  8  9 10   |   compares
   x  x  x  x  x  x  x  x  x  -   |      9
   x  x  x  x  x  x  x  x  -  -   |      8
   x  x  x  x  x  x  x  -  -  -   |      7
   x  x  x  x  x  x  -  -  -  -   |      6
   x  x  x  x  x  -  -  -  -  -   |      5
   x  x  x  x  -  -  -  -  -  -   |      4
   x  x  x  -  -  -  -  -  -  -   |      3
   x  x  -  -  -  -  -  -  -  -   |      2
   x  -  -  -  -  -  -  -  -  -   |      1
   -  -  -  -  -  -  -  -  -  -   |      0

Half of the n*n matrix is filled except for the diagonal; that makes ½ n2 - n compares. Since we don't need the exact number but the rough behavior, we calculate the number of compare operations with

NC = ½ n2

Worst Case & Normal Case

[edit | edit source]

In order to get the worst case concerning the swap operations, we assume that the list is sorted in reverse order. Then every compare operation is followed by a swap.

NS = ½ n2

Every compare- and swap-operation needs time. We evaluate the time for a complete sort with

TSort = ½ * C * n2 + ½ * S * n2
TSort = n2 * (½ C + ½ * S)
TSort = B * n2

where B, C and S are constants and n the number of records to be sorted.

With an even distribution of the key values the calculation of the necessary number of steps becomes a bit complicated and the correct evaluation can be found here [1]. It increases with ¼ n2. The time for a sort will be in average twice as fast as for the worst case. But the timing still follows the quadratic characteristic, only the actual value of B changes.

Termination: The loop terminates at j=i+1. At that point A[i]  A[i+1], A[i]  A[i+2], …, A[i]  A[n]. So, A[i] is the smallest element between A[i] through A[n]. QED

Theorem: BubbleSort correctly sorts the input array A.

Loop invariant for lines 1 through 4 is that A[i]  A[i+1].

Initialization: Starts with A[1].

Maintenance: After the iteration i=p, for some 1<pn, the above lemma for lines 2 through 4 guarantees that A[p] is the smallest element from A[p] through A[n]. So, A[p]  A[p+1].

Termination: The loop 1 through 4 terminates at i=n. At that point A[1]  A[2]  . . .  A[n]. Hence, the correctness theorem is proved. QED

Complexity:  i=1n  j=i+1n 1 =  i=1n (n –i) = n2 – n(n+1)/2 = n2/2 – n/2 = (n2)

Best Case

[edit | edit source]

Now we come to the best case: the records are already sorted. Without any additions to the algorithm we still have the same number of compare operations, but no swaps. This information about "no swaps" can be used to speed up the sort when the records are sorted or nearly sorted. At the beginning of the outer loop we set a marker to NO_SWAPS_DONE. In the inner loop this value is set to SWAPS_DONE in case any records were swapped. At the end of the outer loop we check the value of this marker. If there were no swaps, the records are sorted and the algorithm can terminate. We have a best case behavior of

TBest = n * C

Nearly Sorted

[edit | edit source]

Now we have a look at a nearly sorted situation by adding a new record at the end of the list; and the key value is the smallest of all records. The number of compare operations will still be

NC = ½ n2

and the number of swaps will be

NS = n

Compared with algorithms like InsertionSort the timewise behavior is pretty bad.

Example

[edit | edit source]

In the list to be sorted are 10 records. A blue background indicates that there has to be a swap of data records. A green background indicates, that the upper record becomes the new candidate for the biggest key value.


First repetition of outer loop

387 310 529 174 011 227 020 901 522 771
310 387 529 174 011 227 020 901 522 771
310 387 529 174 011 227 020 901 522 771
310 387 174 529 011 227 020 901 522 771
310 387 174 011 529 227 020 901 522 771
310 387 174 011 227 529 020 901 522 771
310 387 174 011 227 020 529 901 522 771
310 387 174 011 227 020 529 901 522 771
310 387 174 011 227 020 529 522 901 771
310 387 174 011 227 020 529 522 771 901


Second repetition of outer loop

310 387 174 011 227 020 529 522 771 901
310 387 174 011 227 020 529 522 771 901
310 174 387 011 227 020 529 522 771 901
310 174 011 387 227 020 529 522 771 901
310 174 011 227 387 020 529 522 771 901
310 174 011 227 020 387 529 522 771 901
310 174 011 227 020 387 529 522 771 901
310 174 011 227 020 387 522 529 771 901
310 174 011 227 020 387 522 529 771 901


Third repetition of outer loop

310 174 011 227 020 387 522 529 771 901
174 310 011 227 020 387 522 529 771 901
174 011 310 227 020 387 522 529 771 901
174 011 227 310 020 387 522 529 771 901
174 011 227 020 310 387 522 529 771 901
174 011 227 020 310 387 522 529 771 901
174 011 227 020 310 387 522 529 771 901
174 011 227 020 310 387 522 529 771 901


Fourth repetition of outer loop

174 011 227 020 310 387 522 529 771 901
011 174 227 020 310 387 522 529 771 901
011 174 227 020 310 387 522 529 771 901
011 174 020 227 310 387 522 529 771 901
011 174 020 227 310 387 522 529 771 901
011 174 020 227 310 387 522 529 771 901
011 174 020 227 310 387 522 529 771 901


Fifth repetition of outer loop

011 174 020 227 310 387 522 529 771 901
011 174 020 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901


Sixth repetition of outer loop

011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901


The algorithm terminates after this loop due to the fact that there were no swaps done.


  1. Knuth, Donald E.: The Art of Computer Programming, Vol. 3: Sorting and Searching, 2nd Edition, Addison-Wesley, 1981