InsertionSort

From Wikiversity
Jump to navigation Jump to search

InsertionSort is a „natural“ sorting algorithm. In the simplest form it works with an input list and an already sorted list. If the sorted list is empty, the first record of the input list is moved to the sorted list. Then record after record is taken and its appropriate position in the sorted list is searched; all records behind this position are moved up by one position and the record is inserted, hence the name.

Variants[edit]

There are three main ways to find the appropriate position (we assume that the input records are read in original order):

  • Start to End
Starting with the first record, the key of the actual record is compared record by record with those of the sorted list; if a matching or bigger key is found, the insertion position is just in front of this element. The sorting is not stable, since sequences of identical key values are inverted.
  • End to Start
Starting with the last record, the key of the actual record is compared record by record with the keys in the sorted list; if a matching or smaller key is found, the insertion position is just behind this element. The sorting is stable, since the order of records with identical keys is the same as in the input array.
  • Binary Search
When searching in a sorted list, binary search is said to be the fastest solution to find a matching record or a position next to a matching key, if it existed. Normally InsertionSort with binary search is not stable. This can be changed by help of a little addition: if the key at the insertion position matches with the key of the input record, the last record with identical key is identified and the insertion position is just behind this element.

Timing[edit]

In order to make an estimate about the timing we asume to have n records in the sorted list and we are going to add one. If the input data are evenly distibuted, in average we will have

CO ≈ ½ n

compare operations in order to find the correct position. Then we will have

RM ≈ ½ n

record moves in order to get the correct position cleared for inserting the actual record. The time for this can be estimated with

Tn+1 = n * (CO + RM) = n * F

where F, CO and RM are constants. Since we allready did n insertions, we can estimate the total time for sorting a list as

Ttot = n * Tn+1 = n2 * F

The charactristic of InsertionSort is quadratic.

When comparing InsertionSort with other algorithms one should take into account that copying or moving records can be a very expensive operation. When using binary search instead of linear search for finding the appropriate position, the sorting speed is in maximum doubled what is near to nothing when working with an algorithm with a quadratic characteristic.

Example[edit]

For a short demonstration we start with this input list and the empty output list:

387 310 529 174 011 227 020 901 522 771
- - - - - - - - - -

When starting a sort, the first record can be moved directly to the output list

- 310 529 174 011 227 020 901 522 771
387 - - - - - - - - -

When now looking for the correct new position of the second record with the key value 310, this position is found directly in front of the first element of the sorted list.

- - 529 174 011 227 020 901 522 771
310 387 - - - - - - - -

The key value 529 is bigger than any of the sorted list, so this record goes to the last place.

- - - 174 011 227 020 901 522 771
310 387 529 - - - - - - -

The key value 174 is smaller than any of the sorted list, so this record goes to the first place.

- - - - 011 227 020 901 522 771
174 310 387 529 - - - - - -

The principle should be clear by now, so we stop the demonstration.

But we can already see a very interesting effect: The sorted list has the same length as the part of the input list, which has already been processed. Therefore it is not necessary to have distinct input and output lists. If they are combined, InsertionSort turns out to be "in place".