# Averages

Subject classification: this is a mathematics resource.

Subject classification: this is a statistics resource.

Given any collection of real numbers, an **average** is a single number intended to give an estimate of the general magnitude of the numbers. Formally, it is a function from a set of *n* numbers to a single number with the following properties:

- If all the numbers are equal, their average should also equal this value: AV(x, x, x, ...) = x.
- The average must not exceed the maximum of the numbers nor be less than their minimum. We may wish to be stricter and say that if not all the numbers are equal, it must be greater than the minimum and less than the maximum, but this would rule out the *median* as an average, since the median of say (1,1,1,1,2) is 1.
- The average must be multiplicatively linear, i.e. if all numbers are multiplied by the same constant *k* their average will be multiplied by the same number: AV(kx, ky) = k·AV(x, y).
- The average must be order invariant; if we permute the numbers, it will not change their average: AV(y, x) = AV(x, y). This rules out just picking the first number, or the average of the first and last, or other weighted averages.
- The average must be monotonic; if any one number increases (the others being unchanged), the average must not decrease, and vice versa. This rules out some "robust measures", where outliers are rejected before the average is taken. We may wish to be stricter and say that if any number increases, so must the average. Again, this would rule out the median as an average, since the medians of, say, (1,1,1,1,2) and (1,1,1,2,2) are both 1.
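These properties can be spot-checked numerically for any candidate function. The sketch below is only a randomized test on positive inputs, not a proof; the helper name `first_failed_property`, the sample ranges and the tolerance are assumptions made for this illustration.

```python
import random
import statistics

def first_failed_property(av, trials=200, n=5, seed=0, tol=1e-9):
    """Return the name of the first property `av` violates on random
    positive samples, or None if no violation is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.uniform(0.1, 10.0) for _ in range(n)]
        k = rng.uniform(0.1, 5.0)
        a = av(xs)
        # Equal inputs must give back the common value.
        if abs(av([xs[0]] * n) - xs[0]) > tol:
            return "equal inputs"
        # The result must lie between the minimum and the maximum.
        if not (min(xs) - tol <= a <= max(xs) + tol):
            return "bounds"
        # Multiplying every input by k must multiply the result by k.
        if abs(av([k * x for x in xs]) - k * a) > tol * max(1.0, k * a):
            return "multiplicative linearity"
        # Permuting the inputs must not change the result.
        shuffled = xs[:]
        rng.shuffle(shuffled)
        if abs(av(shuffled) - a) > tol:
            return "order invariance"
        # Increasing one input must not decrease the result.
        bumped = xs[:]
        bumped[rng.randrange(n)] += rng.uniform(0.0, 3.0)
        if av(bumped) < a - tol:
            return "monotonicity"
    return None

print(first_failed_property(statistics.fmean))   # arithmetic mean
print(first_failed_property(statistics.median))  # median (non-strict versions)
print(first_failed_property(lambda xs: xs[0]))   # just picking the first number
```

The check uses the non-strict versions of the bounds and monotonicity rules, so the median is expected to pass, while a function that returns only the first number should be caught by the permutation test.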

It might be supposed that the average should be translation invariant, so that if all numbers are increased by the same constant *k* their average will increase by the same number: AV(x+k,y+k) = AV(x,y)+k. However, it can be shown that there is only one average meeting this and the other requirements strictly: the arithmetic mean discussed below. If the less strict versions of the requirements are used, the median and other quantiles would meet all the requirements.

It might also be supposed that the average should be a continuous function of the numbers. Again, this would rule out quantiles.

## The arithmetic mean

The simplest average is the *arithmetic mean*, defined as the sum of the numbers divided by the number of numbers. Thus the average of {1,2,3,4,5} is (1+2+3+4+5)/5 = 3.

*Exercise:* Verify that this meets all the conditions above, including the stricter versions.

## The geometric mean

Another common average is the *geometric mean*, obtained by multiplying all the numbers together and, if there are *n* numbers, taking the *nth* root. Thus the geometric mean of {1,2,3,4,5} is (1×2×3×4×5)^{1/5} = 2.605 (approximately).

Note that the geometric mean should not be used if any of the numbers is negative (why?) and is zero if any of the numbers is zero, no matter how large the other numbers are (why?).
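A minimal sketch of the calculation follows. It assumes the common approach of averaging logarithms rather than forming the product directly, which gives the same value but avoids overflow for long lists; the function name and the error messages are illustrative choices.

```python
import math

def geometric_mean(xs):
    """Geometric mean of non-negative numbers, computed via logarithms."""
    xs = list(xs)
    if not xs:
        raise ValueError("geometric mean of an empty collection is undefined")
    if any(x < 0 for x in xs):
        raise ValueError("this sketch rejects negative numbers")
    if any(x == 0 for x in xs):
        # A single zero makes the whole product, and hence the mean, zero.
        return 0.0
    # exp(mean(log x)) equals the nth root of the product of the x.
    return math.exp(math.fsum(math.log(x) for x in xs) / len(xs))

print(round(geometric_mean([1, 2, 3, 4, 5]), 3))
print(geometric_mean([0, 1000, 1000000]))
```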

*Exercise:* Verify that the geometric mean meets all the conditions above, including the stricter versions.

## The arithmetic/geometric inequality

Unless all the numbers are equal, the geometric mean of a collection of non-negative numbers is always less than their arithmetic mean.

This is easily proved for just two numbers and three numbers; the outlines of the proof are:

If *a* and *b* are two unequal non-negative numbers, then

$$\left(\sqrt{a} - \sqrt{b}\right)^2 > 0.$$

This can be rearranged as

$$\frac{a+b}{2} > \sqrt{ab}.$$

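If *a*, *b* and *c* are three non-negative numbers, not all equal, then

$$(a+b+c)\left[(a-b)^2 + (b-c)^2 + (c-a)^2\right] > 0.$$

So, expanding the left-hand side and dividing by 2,

$$a^3 + b^3 + c^3 - 3abc > 0.$$

This can be rearranged as

$$\frac{a^3 + b^3 + c^3}{3} > abc = \left(a^3 b^3 c^3\right)^{1/3},$$

which is the required inequality for the three numbers *a*³, *b*³ and *c*³. Any three non-negative numbers can be written in this form.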

For four numbers, take the numbers in pairs and apply the already proven result for two numbers.

## Quantifying the arithmetic/geometric inequality

How can we assess whether the geometric mean is close to the arithmetic mean, or substantially less?

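Given a set of n positive numbers {x_{1} ... x_{n}}, let their arithmetic mean be m and

$$y_i = \frac{x_i}{m} - 1, \quad \text{so} \quad x_i = m\,(1 + y_i).$$

Then the sum of the y_{i} is zero. (Why?) We assume that the x_{i} do not diverge too much from their mean, so the y_{i} are fairly small numbers and we can expand log (1+y) in a power series. Writing g for the geometric mean,

$$\log g = \frac{1}{n}\sum_{i=1}^{n} \log x_i = \log m + \frac{1}{n}\sum_{i=1}^{n} \log(1 + y_i) \approx \log m + \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \frac{y_i^2}{2}\right) = \log m - \frac{1}{2n}\sum_{i=1}^{n} y_i^2$$

(since Σ y_{i} is zero). Thus

$$g \approx m\left(1 - \frac{s^2}{2m^2}\right),$$

where s^{2} is the variance of the x_{i}, i.e. the arithmetic mean of the (x_{i} - m)^{2}. (Why?)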

In other words, the greater the dispersion of the numbers about their arithmetic mean, the greater the difference between the two means.
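The effect of dispersion can be tried out numerically. The sketch below assumes the second-order estimate g ≈ m(1 - s²/(2m²)), where g is the geometric mean, m the arithmetic mean and s² the variance with divisor n; this estimate is what the power-series argument leads to under the stated smallness assumption, and the function name is hypothetical.

```python
import math
from statistics import fmean, pvariance

def compare_geometric_estimate(xs):
    """Return (m, exact geometric mean, second-order estimate) for positive xs."""
    xs = [float(x) for x in xs]
    m = fmean(xs)
    s2 = pvariance(xs)  # population variance: mean of (x_i - m)^2
    exact = math.exp(fmean([math.log(x) for x in xs]))
    estimate = m * (1.0 - s2 / (2.0 * m * m))
    return m, exact, estimate

# Two samples with the same arithmetic mean but different dispersion.
for sample in ([9.0, 10.0, 11.0], [1.0, 10.0, 19.0]):
    m, exact, estimate = compare_geometric_estimate(sample)
    print(sample, "mean:", m, "geometric:", exact, "estimate:", estimate)
```

For the tightly clustered sample the estimate should track the exact value closely; for the widely spread one the estimate is rougher, but the gap between the two means is visibly larger.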

*Exercise:* Some approximations have been made in deriving this result. Demonstrate by actual calculations that the result is true in general.

## Root mean square

The **RMS** is the square root of the arithmetic mean of the squares of a collection of numbers, i.e.

$$\mathrm{RMS} = \sqrt{\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n}}.$$

This average should not be used for a mixture of positive and negative numbers (why?).

*Exercise:* Verify that this meets all the conditions above, including the stricter versions.

Unless all the numbers are equal, the RMS is always greater than the arithmetic mean. This is easily proved for just two numbers *a* and *b* by considering the square of the RMS and of the arithmetic mean:

$$\frac{a^2 + b^2}{2} - \left(\frac{a+b}{2}\right)^2 = \frac{(a-b)^2}{4} > 0.$$

A similar but more complex proof will work for any number of numbers. With a slight extension of the proof, it may be shown that if *m* is the arithmetic mean and *s*^{2} is the variance of a set of numbers (computed with divisor *n*), then

$$\mathrm{RMS}^2 = m^2 + s^2.$$
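The relation between the RMS, the arithmetic mean and the variance can be checked on a small example. This sketch assumes the identity RMS² = m² + s² with s² taken as the population variance (divisor n), which is what `statistics.pvariance` computes.

```python
import math
from statistics import fmean, pvariance

def rms(xs):
    """Square root of the arithmetic mean of the squares."""
    xs = [float(x) for x in xs]
    if not xs:
        raise ValueError("RMS of an empty collection is undefined")
    return math.sqrt(fmean([x * x for x in xs]))

data = [1.0, 2.0, 3.0, 4.0, 5.0]
m = fmean(data)
s2 = pvariance(data)
print(rms(data) ** 2, m * m + s2)  # the two values should agree up to rounding
print(rms(data) > m)
```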

## Harmonic mean

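The **harmonic mean** of a set of numbers is the reciprocal of the arithmetic mean of the reciprocals of those numbers. Thus for three numbers *a*, *b* and *c* we have

$$H = \frac{3}{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c}} = \frac{3abc}{ab + bc + ca}.$$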

This mean should not be used if any number is zero or negative (why?).

*Exercise:* Verify that this meets all the conditions above, including the stricter versions.

Unless all the numbers are equal, the harmonic mean is always less than the geometric mean. This follows because its reciprocal is the arithmetic mean of the reciprocals of the numbers, which is greater than the geometric mean of the reciprocals, and that in turn is the reciprocal of the geometric mean. Thus we have:

- harmonic mean < geometric mean < arithmetic mean < RMS.
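The chain of inequalities can be observed on any list of positive numbers that are not all equal. The following is a small sketch with illustrative function names; it checks one example and is not a proof.

```python
import math

def harmonic_mean(xs):
    return len(xs) / math.fsum(1.0 / x for x in xs)

def geometric_mean(xs):
    return math.exp(math.fsum(math.log(x) for x in xs) / len(xs))

def arithmetic_mean(xs):
    return math.fsum(xs) / len(xs)

def rms(xs):
    return math.sqrt(math.fsum(x * x for x in xs) / len(xs))

def four_means(xs):
    """Return (harmonic, geometric, arithmetic, RMS) for positive numbers."""
    xs = [float(x) for x in xs]
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("this sketch assumes a non-empty list of positive numbers")
    return harmonic_mean(xs), geometric_mean(xs), arithmetic_mean(xs), rms(xs)

h, g, a, q = four_means([1, 2, 3, 4, 5])
print(h, g, a, q)
print(h < g < a < q)
```

When all the inputs are equal, for example `four_means([4, 4, 4])`, the four values coincide up to rounding.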

## Rth power mean

The **rth power mean** of a set of numbers, for any real number r, is

$$M_r = \left(\frac{x_1^r + x_2^r + \cdots + x_n^r}{n}\right)^{1/r}.$$

This average should only be used for positive numbers if r < 0 and non-negative numbers if r > 0 (why?).

*Exercise:* Verify that this meets all the conditions above, including the stricter versions.

This mean is undefined for r = 0, but the limit as r tends to 0 is the geometric mean. Thus all of the averages we have considered so far are special cases of this mean (r = -1, harmonic; r = 0, geometric; r = 1, arithmetic; r = 2, RMS).

It can be shown that for any collection of positive numbers (not all equal), this mean is a continuous, strictly monotonic increasing function of r; the inequalities above are special cases of this. As r tends to infinity, the mean tends to the maximum of the x_{i}, and as r tends to minus infinity, the mean tends to the minimum of the x_{i}.
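A sketch of the power mean as a single function is given below. It assumes the formula ((x_1^r + ... + x_n^r)/n)^(1/r) for r ≠ 0 and substitutes the limiting values described above at r = 0 and r = ±infinity; the function name and the restriction to positive inputs are choices made for this example.

```python
import math

def power_mean(xs, r):
    """rth power mean of positive numbers, with limits at r = 0 and +/-inf."""
    xs = [float(x) for x in xs]
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("this sketch assumes a non-empty list of positive numbers")
    if r == 0:
        # Limit as r tends to 0: the geometric mean.
        return math.exp(math.fsum(math.log(x) for x in xs) / len(xs))
    if r == math.inf:
        return max(xs)
    if r == -math.inf:
        return min(xs)
    return (math.fsum(x ** r for x in xs) / len(xs)) ** (1.0 / r)

data = [1, 2, 3, 4, 5]
for r in (-math.inf, -10, -1, 0, 1, 2, 10, math.inf):
    print(r, power_mean(data, r))
```

Printing the values for increasing r should show them rising from the minimum towards the maximum, in line with the monotonicity statement above.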

## Power plus 1 mean

The **power plus 1 mean** of a set of numbers, for any real number s, is

$$\frac{x_1^{s+1} + x_2^{s+1} + \cdots + x_n^{s+1}}{x_1^{s} + x_2^{s} + \cdots + x_n^{s}}.$$

If s=0, this is the arithmetic mean; if s=-1 it is the harmonic mean.

Note in particular that if s=1, this is RMS^{2}/(arithmetic mean). If *m* is the arithmetic mean and *s*^{2} is the variance of a set of numbers (here *s*^{2} denotes the variance, not the parameter s), then the s=1 mean is

$$m + \frac{s^2}{m}.$$

It can be shown that this type of mean behaves much like the rth power mean. For any collection of positive numbers (not all equal), this mean is a continuous, strictly monotonic increasing function of s. As s tends to infinity, the mean tends to the maximum of the x_{i}, and as s tends to minus infinity, the mean tends to the minimum of the x_{i}.

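Both of these types of mean can be regarded as a special case of the following:

$$\left(\frac{x_1^{r+s} + x_2^{r+s} + \cdots + x_n^{r+s}}{x_1^{s} + x_2^{s} + \cdots + x_n^{s}}\right)^{1/r}.$$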

Clearly, if s=0 this is the rth power mean; if r=1 this is the power plus 1 mean.
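The special cases can be confirmed numerically. The sketch below assumes the two-parameter form (Σ x_i^(r+s) / Σ x_i^s)^(1/r), which is consistent with the special cases stated in this section; the function names are hypothetical and the case r = 0 is deliberately left out.

```python
import math

def two_parameter_mean(xs, r, s):
    """(sum of x^(r+s) / sum of x^s)^(1/r) for positive xs and r != 0."""
    xs = [float(x) for x in xs]
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("this sketch assumes a non-empty list of positive numbers")
    if r == 0:
        raise ValueError("r = 0 is not handled in this sketch")
    numerator = math.fsum(x ** (r + s) for x in xs)
    denominator = math.fsum(x ** s for x in xs)
    return (numerator / denominator) ** (1.0 / r)

def power_plus_1_mean(xs, s):
    """Special case r = 1: sum of x^(s+1) divided by sum of x^s."""
    return two_parameter_mean(xs, 1, s)

data = [1, 2, 3, 4, 5]
print(power_plus_1_mean(data, 0))     # arithmetic mean
print(power_plus_1_mean(data, -1))    # harmonic mean
print(power_plus_1_mean(data, 1))     # RMS squared over arithmetic mean
print(two_parameter_mean(data, 2, 0)) # s = 0, r = 2: the RMS
```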

## Mixed averages

More types of average can be found by mixing different averages, provided that the formula is symmetric in the variables. For example, for any three numbers x, y, z, the following are all averages:
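As one illustration, the sketch below checks two symmetric mixtures chosen for this example. They are hypothetical choices, not necessarily the formulas the text has in mind: the arithmetic mean of the three pairwise geometric means, and the geometric mean of the arithmetic and harmonic means. Positive inputs are assumed.

```python
import itertools
import math

def mean_of_pairwise_geometric_means(x, y, z):
    """Arithmetic mean of the geometric means of the three pairs."""
    return (math.sqrt(x * y) + math.sqrt(y * z) + math.sqrt(z * x)) / 3.0

def geometric_mean_of_am_and_hm(x, y, z):
    """Geometric mean of the arithmetic mean and the harmonic mean."""
    arithmetic = (x + y + z) / 3.0
    harmonic = 3.0 / (1.0 / x + 1.0 / y + 1.0 / z)
    return math.sqrt(arithmetic * harmonic)

def is_symmetric(av, x, y, z, tol=1e-12):
    """True if `av` gives the same value for every ordering of x, y, z."""
    values = [av(*p) for p in itertools.permutations((x, y, z))]
    return max(values) - min(values) <= tol

for av in (mean_of_pairwise_geometric_means, geometric_mean_of_am_and_hm):
    value = av(1.0, 2.0, 4.0)
    print(av.__name__, value, is_symmetric(av, 1.0, 2.0, 4.0), 1.0 <= value <= 4.0)
```

A mixture that is not symmetric, such as one that treats x differently from y and z, would fail the `is_symmetric` check and so would not qualify.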

## The median and other quantiles

As noted above, the median is an average if we do not require some strict inequalities in the definition. In fact, this is true of any quantile of the distribution, even maximum and minimum. It may seem perverse to call the maximum and minimum "averages". However, the fundamental purpose of an average is to give an estimate of the order of magnitude of a group of numbers. If the range of numbers is small, the maximum and minimum can do this; if the range is very large, any one number as the average may be misleading. As noted above, the maximum and minimum are limiting values of other averages as parameters tend to infinity.

It may be argued that any quantile other than the median is "biased" hence unsatisfactory. However, if in a group of numbers a few of them are much larger or much smaller than the others ("outliers"), any average may seem biased. For example, consider {1,2,3,4,5,6,7,8,144}. The arithmetic mean is 20, far higher than the *upper quartile* of 7 and indeed far higher than all but one of the numbers.
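The figures in this example can be reproduced with Python's standard `statistics` module. Quartile conventions differ between textbooks and software; this sketch assumes the `"inclusive"` method of `statistics.quantiles`, which agrees with the upper quartile quoted above for this data set.

```python
from statistics import fmean, median, quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 144]

mean_value = fmean(data)
# n=4 asks for the three quartile cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
below_mean = sum(1 for x in data if x < mean_value)

print("arithmetic mean:", mean_value)
print("median:", median(data))
print("upper quartile:", q3)
print("numbers below the mean:", below_mean, "of", len(data))
```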

## The mode

The mode is not an average. Firstly, it is not always uniquely defined; for example, in {1,1,1,2,2,3,3,3} there are two modes, 1 and 3 (which are also the minimum and maximum). Secondly, and more importantly, it does not satisfy the monotonicity rule. Consider the set of eight numbers

- {1,1,1,2,2,2,2,3}

The mode is 2. Now suppose the seventh number increases from 2 to 3:

- {1,1,1,2,2,2,3,3}

The joint modes are now 1 and 2. Now suppose the sixth number increases from 2 to 4:

- {1,1,1,2,2,3,3,4}

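The mode is now 1. Two of the numbers have increased and none has decreased, yet the mode has fallen from 2 to 1, so the mode is not monotonic.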
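The three stages of this example can be replayed with `statistics.multimode`, which returns every most-frequent value. This is a small sketch; the variable names are illustrative and each list is written in sorted order, as in the text.

```python
from statistics import multimode

stages = [
    [1, 1, 1, 2, 2, 2, 2, 3],  # original set
    [1, 1, 1, 2, 2, 2, 3, 3],  # one 2 increased to 3
    [1, 1, 1, 2, 2, 3, 3, 4],  # another 2 increased to 4, then re-sorted
]

modes = [multimode(stage) for stage in stages]

# Each sorted list is element-by-element at least as large as the previous one.
never_decreases = all(
    all(later >= earlier for earlier, later in zip(a, b))
    for a, b in zip(stages, stages[1:])
)

for stage, mode in zip(stages, modes):
    print(stage, "mode(s):", mode)
print("numbers never decrease:", never_decreases)
```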

## Transformation means

Let f(x) be any strictly monotonic function. Given a set of numbers {x_{1} ... x_{n}}, define:

1. y_{i} = f(x_{i})
2. Y as the arithmetic mean of the y_{i}
3. X as the solution of the equation Y = f(X).

X is then the *transformation mean* of the x_{i} with respect to f(x).

*Examples:* f(x) = x^{r} gives the rth power mean; f(x) = log(x) gives the geometric mean.
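The three steps of the definition can be sketched directly in code. This version assumes the caller supplies the inverse function explicitly (the parameter `f_inverse` is an addition for this example) instead of solving Y = f(X) numerically.

```python
import math

def transformation_mean(xs, f, f_inverse):
    """Transformation mean of xs with respect to a strictly monotonic f."""
    xs = list(xs)
    if not xs:
        raise ValueError("mean of an empty collection is undefined")
    ys = [f(x) for x in xs]            # step 1: transform each number
    y_mean = math.fsum(ys) / len(ys)   # step 2: arithmetic mean of the y values
    return f_inverse(y_mean)           # step 3: solve Y = f(X) for X

data = [1.0, 2.0, 3.0, 4.0, 5.0]
geometric = transformation_mean(data, math.log, math.exp)
rms = transformation_mean(data, lambda x: x ** 2, math.sqrt)
harmonic = transformation_mean(data, lambda x: 1.0 / x, lambda y: 1.0 / y)
print(harmonic, geometric, rms)
```

Here f(x) = x² and f(x) = 1/x are the r = 2 and r = -1 cases of f(x) = x^{r}; both are strictly monotonic on the positive numbers used in this example.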

*Exercises*

- Will such a mean always obey the conditions at the beginning of the article?
- Would using a different sort of mean at step 2, e.g. the geometric mean, give yet another sort of average?
- Why does the equation in step 3 always have one and only one solution? Does it matter if f(x) is not continuous?
- Find a function f(x) such that the transformation mean equals the median.