QuickSort

From Wikiversity
Jump to navigation Jump to search

Quicksort belongs to the group of dynamically dividing algorithms and works with recursions. It is not stable.

The main idea behind Quicksort is to divide a group of data records into two subgroups in such a way that all the elements in the lower subgroup are smaller than those of the upper subgroup. In order to do this division, a pivot element is selected. In the original version of the algorithm this was in a fixed way either the first or the last element. The best choice for a pivot is a value which sorts half of the records to each subgroup. Without having a look at all the key values a good selection is not possible. And if the pivot is selected just by chance, any choice is as good as all the others; and selecting the first or last element makes the implementation of the algorithm a little bit easier.

We assume that we use the last element as the pivot. We start at the beginning of the list, until we find an element which is bigger than the pivot and remember it. Then we start at the end of the list and search for the first element which is smaller than the pivot. These records are then swapped. Then, beginning from the remembered positions, again the first bigger and the first smaller element are looked up and swapped. This is repeated until the two pointer meet in the middle. At the end of a split operation it is checked whether the pivot itself has to be moved.

Then the algorithm takes both subgroups and calls itself for the next division operation. If a subgroup contains only one or no element, there is no further division possible and the process returns to the calling level. Since both subgroups are sorted and all the elements of the lower subgroup were smaller than those of the upper subgroup, all the elements of the returned group are sorted without any more handling.

Timing[edit | edit source]

On each level the records are divided into two subgroups. With n data records we get

L = lg2(n)

levels. For every subgroup on any level all the records have to be compared with the pivot. On any level the number of compare operations is n. With an even distribution the number of swaps will be ½ n. The time for a sort can be estimated with

T = C * n * lg2(n) + D * ½ * n * lg2(n)
T = (C + D * ½) * n * lg2(n)
T = E * n * lg2(n)

where E is a constant.

If the input shall be sorted ascending, the list is already sorted descending, and the last element is selected as a pivot, all elements but one will go into one of the sublists. The number of levels will then be n and not lg2(n). The timing can then be aproximated as a worst case by

T = E * n2

Variations[edit | edit source]

In order to make sure that the worst case will not happen, there are some strategies. The simplest is to check the last, the first and the element in the middle and to select the one that is nearer to the average of the three. With little effort the chance to make a bad selection can be lowered significantly.

Another possibility for improvement is to select the pivot just by chance. With big numbers of data records this method makes it improbable to run into a worst case.

Another possibility is to calculate the average of the key values and to select the key as a pivot which has the smallest distance to the average. We have to sum up all the values in order to calculate the average which makes n operations. Then all key values are compared with the average and if a key is found with a smaller distance to the average the candidate is changed. For this again n operations are needed. The effort for the hole selection increases in a linear fashion with the number of records. This method does not guarantee that both subgroups will have the same number of records, but with evenly distributed key values the selection will be pretty good.

The search for an optimal pivot, for example by generating histograms, causes a much higher effort.

Example[edit | edit source]

The pivot-element is shown in red and the sublists in green and yellow. We show the variant where the last element of a domain is selected as pivot.

Step 1: The pivot is 771.

387 310 529 174 011 227 020 901 522 771

The first element bigger than the pivot is 901. The last element smaller than the pivot is 522. These elements are swapped.

387 310 529 174 011 227 020 522 901 771

Step 2: Now we handle the lower subgroup of step 1. The pivot is 522; the first element bigger is 529 and the last element smaller is 020. The two swap. No more swaps possible.

387 310 529 174 011 227 020 522
387 310 020 174 011 227 529 522

Step 3: Now the lower subgroup of step 2. The pivot is 227. The first bigger value is 387; the last smaller value is 011; the two swap. Now the first bigger value is 310 and the last smaller value is 174; the two swap.

387 310 020 174 011 227
011 174 020 310 387 227

Step 4: The lower subgroup of step 3 is divided now. The pivot is 20. The first bigger value is 174 but there is no last smaller value between 174 and the pivot: therefor the pivot and 174 swap.

011 174 020
011 020 174

Step 5: The upper subgroup can not be divided. Now the lower subgroup is divided but there will be no further subdivisions

011 020

Result 1: The sorted lists of step 4 and step 5 are joined

011 020 174

Step 6: We continue with the upper subgroup of step 3. The pivot is 227; the first bigger value is 310 but there is no last smaller value between the pivot and 310; therefor they swap.

310 337 227
227 337 310

Step 7: we sort the upper subgroup of step 6

310 337

Result 2: The sorted lists of step 6 and step 7 are joined

011 020 174 227 310 337

Step 8: Now we come to the upper subgroup of step 2. There is no smaller last element therefor the two records swap.

529 522
522 529

Result 3: Result 2 and Step 8 are joined

011 020 174 227 310 337 522 529

Step 9: We sort the upper subgroup of step 1. The pivot is 771; since there is no last smaller element, the two records are swapped.

901 771
771 901

Result: The data of step 9 are added.

011 020 174 227 310 337 522 529 771 901