# Moving Average/Exponential

## Exponential moving average

An exponential moving average (EMA), also known as an exponentially weighted moving average (EWMA),[1] is a type of infinite impulse response filter that applies weighting factors which decrease exponentially. The weighting for each older datum decreases exponentially, never reaching zero. The graph at right shows an example of the weight decrease.

The EMA is defined for a series ${\displaystyle C:=(C(0),C(1),\ldots )}$ of collected data indexed by a set of dates ${\displaystyle T:=\mathbb {N} _{0}}$, where ${\displaystyle C(t)}$ is the datum collected at time index ${\displaystyle t\in T}$.

• First, define a value ${\displaystyle \alpha }$ with ${\displaystyle 0<\alpha <1}$ that represents the degree of weighting decrease, a constant smoothing factor between 0 and 1. A lower α discounts older observations faster.
• The weights are defined by
${\displaystyle w_{t}=(1-\alpha )\cdot \alpha ^{t}}$ for all ${\displaystyle t\in T:=\mathbb {N} _{0}}$ with ${\displaystyle \sum _{t=0}^{\infty }w_{t}=1}$ (geometric series).
• The sum of weights from 0 to the time index ${\displaystyle t\in T}$ is defined by:
${\displaystyle s(t)=\sum _{k=0}^{t}w_{k}=\sum _{k=0}^{t}(1-\alpha )\cdot \alpha ^{k}=(1-\alpha )\cdot \underbrace {\sum _{k=0}^{t}\alpha ^{k}} _{={\frac {1-\alpha ^{t+1}}{1-\alpha }}}=(1-\alpha )\cdot {\frac {1-\alpha ^{t+1}}{1-\alpha }}=1-\alpha ^{t+1}}$
• The discrete probability mass function is defined by:
${\displaystyle p_{t}(x)={\begin{cases}{\frac {w_{t-x}}{s(t)}}={\frac {(1-\alpha )\cdot \alpha ^{t-x}}{1-\alpha ^{t+1}}}&\mathrm {for} \ 0\leq x\leq t,\\[8pt]0&\mathrm {for} \ x<0\ \mathrm {or} \ x>t\end{cases}}}$
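The weights, their partial sums, and the probability mass function above can be checked numerically. The following is a minimal Python sketch; the function names `weight`, `weight_sum`, and `pmf` and the sample values α = 0.5, t = 3 are illustrative choices, not part of the definition.

```python
def weight(alpha: float, age: int) -> float:
    """w_age = (1 - alpha) * alpha**age, the weight of a datum that is `age` steps old."""
    return (1.0 - alpha) * alpha ** age


def weight_sum(alpha: float, t: int) -> float:
    """s(t) = w_0 + ... + w_t, using the closed form 1 - alpha**(t + 1)."""
    return 1.0 - alpha ** (t + 1)


def pmf(alpha: float, t: int, x: int) -> float:
    """p_t(x): normalised weight of the datum collected at time index x."""
    if x < 0 or x > t:
        return 0.0
    return weight(alpha, t - x) / weight_sum(alpha, t)


alpha, t = 0.5, 3
probs = [pmf(alpha, t, x) for x in range(t + 1)]
print(probs)       # the largest probability belongs to the newest index x = t
print(sum(probs))  # equals 1 up to floating-point rounding
```

Dividing by `weight_sum` is what turns the truncated geometric weights into a probability distribution over the indices 0 to t.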

The definition above creates the exponential moving average EMA with discrete probability mass function ${\displaystyle p_{t}}$ by

${\displaystyle EMA(t):=\sum _{k\in T}p_{t}(k)\cdot C(k)=\sum _{k=0}^{t}p_{t}(k)\cdot C(k)=\sum _{k=0}^{t}{\frac {w_{t-k}}{s(t)}}\cdot C(k)=\sum _{k=0}^{t}{\frac {(1-\alpha )\cdot \alpha ^{t-k}}{1-\alpha ^{t+1}}}\cdot C(k)}$
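The sum above can be evaluated directly as a weighted average. This is a small Python sketch under the assumption that the whole history `data[0..t]` is available as a list; the name `ema_direct` is hypothetical.

```python
def ema_direct(data, alpha):
    """EMA(t) for t = len(data) - 1, computed as the expected value under p_t."""
    if not data:
        raise ValueError("at least one observation is required")
    t = len(data) - 1
    norm = 1.0 - alpha ** (t + 1)  # s(t), the sum of the weights w_0..w_t
    weighted = sum((1.0 - alpha) * alpha ** (t - k) * c for k, c in enumerate(data))
    return weighted / norm


# Weights 0.125, 0.25, 0.5 on the data 2, 4, 8, normalised by s(2) = 0.875.
print(ema_direct([2.0, 4.0, 8.0], alpha=0.5))  # 5.25 / 0.875 = 6.0
```

Because the weights are normalised, a constant series returns that constant for every t.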

The ${\displaystyle EMA}$ at time index ${\displaystyle t\in T:=\mathbb {N} _{0}}$ may be calculated recursively:

${\displaystyle EMA(0):=C(0)}$ and
${\displaystyle EMA(t+1):={\frac {1-\alpha }{1-\alpha ^{t+2}}}\cdot C(t+1)+\alpha \cdot {\frac {1-\alpha ^{t+1}}{1-\alpha ^{t+2}}}\cdot EMA(t)}$ for all ${\displaystyle t\in T=\mathbb {N} _{0}=\{0,1,2,3,...\}}$

Where:

• The coefficient ${\displaystyle \alpha }$ represents the degree of weighting decrease from ${\displaystyle EMA(t)}$ to ${\displaystyle EMA(t+1)}$. This implements the aging of data from ${\displaystyle t}$ to time index ${\displaystyle t+1}$.
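• The fraction ${\displaystyle {\frac {1-\alpha ^{t+1}}{1-\alpha ^{t+2}}}={\frac {s(t)}{s(t+1)}}}$ adjusts the normalising denominator from ${\displaystyle s(t)}$, used in ${\displaystyle EMA(t)}$, to ${\displaystyle s(t+1)}$, used in ${\displaystyle EMA(t+1)}$.
• The coefficient ${\displaystyle {\frac {1-\alpha }{1-\alpha ^{t+2}}}=p_{t+1}(t+1)={\frac {w_{(t+1)-(t+1)}}{s(t+1)}}}$ is the weight of the newest datum ${\displaystyle C(t+1)}$ in the EMA at time index ${\displaystyle t+1}$.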
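The recursion can be sketched in Python as follows. The function name `ema_recursive` and the variable names are illustrative; the two coefficients are exactly the ones described in the list above.

```python
def ema_recursive(data, alpha):
    """Return [EMA(0), EMA(1), ..., EMA(len(data) - 1)] using the recursion."""
    if not data:
        return []
    out = [float(data[0])]  # EMA(0) := C(0)
    for t in range(len(data) - 1):
        denom = 1.0 - alpha ** (t + 2)                        # s(t + 1)
        new_coeff = (1.0 - alpha) / denom                     # p_{t+1}(t+1)
        old_coeff = alpha * (1.0 - alpha ** (t + 1)) / denom  # aging and renormalisation
        out.append(new_coeff * data[t + 1] + old_coeff * out[-1])
    return out


print(ema_recursive([2.0, 4.0, 8.0], alpha=0.5))
```

For this input the intermediate value is EMA(1) = 10/3 and the final value is EMA(2) = 6, the same result as the direct weighted sum over all three observations. The two coefficients add up to one at every step, so each update is a convex combination of the new datum and the previous average.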

### Initialization of EMA and Elimination of MA Impact from old data

${\displaystyle EMA(0)}$ may be initialized in a number of different ways, most commonly by setting ${\displaystyle EMA(0)}$ to the first collected datum ${\displaystyle C(0)}$ at time index 0 as shown above, though other techniques exist, such as starting the calculation of the moving average after the first 4 or 5 observations. Furthermore, only the most recent subset of the collected data up to the time index ${\displaystyle t}$, rather than the total history of collected data, might be used for ${\displaystyle EMA(t)}$. The discrete probability mass function then puts weights on the most recent ${\displaystyle m+1}$ values of the collected data by:

${\displaystyle p_{t}(x)={\begin{cases}{\frac {w_{t-x}}{s(t)}}={\frac {(1-\alpha )\cdot \alpha ^{t-x}}{1-\alpha ^{t+1}}}&\mathrm {for} \ 0\leq x\leq t\ \mathrm {and} \ t\leq m,\\[8pt]{\frac {w_{t-x}}{s(m)}}={\frac {(1-\alpha )\cdot \alpha ^{t-x}}{1-\alpha ^{m+1}}}&\mathrm {for} \ (t-m)\leq x\leq t\ \mathrm {and} \ t>m,\\[8pt]0&\mathrm {otherwise} \end{cases}}}$

For ${\displaystyle t>m}$ the window contains exactly the weights ${\displaystyle w_{0},\ldots ,w_{m}}$, whose sum is ${\displaystyle s(m)=1-\alpha ^{m+1}}$, so the probabilities again sum to one.

Limiting the average to the most recent ${\displaystyle m+1}$ values of the collected data eliminates the impact of very old data on the resulting moving average completely. Without this limit, choosing a small ${\displaystyle \alpha }$ makes old data less important than recent data and discounts older observations faster, but even the oldest datum still has an impact on the calculation of ${\displaystyle EMA(t)}$ at time index ${\displaystyle t}$.
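A windowed EMA can be sketched in Python as below. The sketch assumes that the weights inside the window are renormalised by the sum of the weights actually used, so that they add up to one; the function name `ema_windowed` and the sample data are illustrative.

```python
def ema_windowed(data, alpha, m):
    """EMA at t = len(data) - 1 using only the most recent m + 1 observations."""
    if not data:
        raise ValueError("at least one observation is required")
    t = len(data) - 1
    first = max(0, t - m)  # oldest time index that still receives weight
    weights = [(1.0 - alpha) * alpha ** (t - k) for k in range(first, t + 1)]
    norm = sum(weights)    # renormalise so the window weights sum to one
    return sum(w * c for w, c in zip(weights, data[first:])) / norm


# The outlier at time index 0 lies outside the window (m = 2) and has no effect.
print(ema_windowed([100.0, 2.0, 4.0, 8.0], alpha=0.5, m=2))  # 6.0
```

With the full history and the same α, the value 100 would still contribute with a small but nonzero weight.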

The initialization of ${\displaystyle EMA(t)}$ could incorporate something about values prior to the available data, i.e. history before ${\displaystyle t=0}$. The initialization could therefore introduce an error in ${\displaystyle EMA(t)}$. In view of this, the early results should be regarded as unreliable until the iterations have had time to converge. This is sometimes called a 'spin-up' interval.

This formulation of EMA is designed as an application of an expected value, which is a standard definition in probability theory.

According to Hunter (1986),[2] this can be written as a repeated application of the recursive formula for different times ${\displaystyle t}$ without standardisation, i.e. without enforcing

${\displaystyle \sum _{x\in T}p_{t}(x)=1}$.

An alternative approach defined by Roberts (1959)[3] also omits the standardisation of the probability distribution, while the basic principle of the exponential moving average remains the same.
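A recursion without standardisation, together with the spin-up effect of the initial value, can be sketched as follows. The sketch is written in this section's convention, where α multiplies the previous average; the symbols used in the cited papers may differ, and the name `ema_unstandardised` and the `start` parameter are hypothetical.

```python
def ema_unstandardised(data, alpha, start):
    """Apply S <- (1 - alpha) * C + alpha * S repeatedly, beginning from `start`."""
    s = float(start)
    out = []
    for c in data:
        s = (1.0 - alpha) * c + alpha * s
        out.append(s)
    return out


data = [2.0, 4.0, 8.0]
low = ema_unstandardised(data, alpha=0.5, start=0.0)
high = ema_unstandardised(data, alpha=0.5, start=10.0)
# The gap caused by the two initial values shrinks by a factor alpha per step.
print([h - l for h, l in zip(high, low)])  # [5.0, 2.5, 1.25]
```

After n updates the influence of the initial value is multiplied by αⁿ, which is why the early results depend on the initialization and the later ones effectively do not.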

### Application to measuring computer performance

Some computer performance metrics, e.g. the average process queue length or the average CPU utilization, use a form of exponential moving average with the recursive definition

${\displaystyle S_{n}=\alpha (t_{n}-t_{n-1})\times Y_{n}+(1-\alpha (t_{n}-t_{n-1}))\times S_{n-1}.}$

Here α is defined as a function of time between two readings. An example of a coefficient giving bigger weight to the current reading, and smaller weight to the older readings is

${\displaystyle \alpha (t_{n}-t_{n-1})=1-\exp \left({-{{t_{n}-t_{n-1}} \over {W\times 60}}}\right)}$

where exp() is the exponential function, the time ${\displaystyle t_{n}}$ of each reading is expressed in seconds, and ${\displaystyle W}$ is the period of time in minutes over which the reading is said to be averaged (the mean lifetime of each reading in the average). Given the above definition of α, the moving average can be expressed as

${\displaystyle S_{n}=\left(1-\exp \left(-{{t_{n}-t_{n-1}} \over {W\times 60}}\right)\right)\times Y_{n}+\exp \left(-{{t_{n}-t_{n-1}} \over {W\times 60}}\right)\times S_{n-1}}$

For example, a 15-minute average L of a process queue length Q, measured every 5 seconds (time difference is 5 seconds), is computed as

${\displaystyle {\begin{aligned}L_{n}&=\left(1-\exp \left({-{5 \over {15\times 60}}}\right)\right)\times Q_{n}+e^{-{5 \over {15\times 60}}}\times L_{n-1}\\[6pt]&=\left(1-\exp \left({-{1 \over {180}}}\right)\right)\times Q_{n}+e^{-1/180}\times L_{n-1}\\[6pt]&=Q_{n}+e^{-1/180}\times (L_{n-1}-Q_{n})\end{aligned}}}$
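The update rule above can be sketched in Python for readings taken at arbitrary intervals. The function name `update_average`, its parameter names, and the simulated constant queue length are assumptions made for illustration only.

```python
import math


def update_average(previous, reading, dt_seconds, window_minutes):
    """One step of S_n = Y_n + exp(-dt / (W * 60)) * (S_{n-1} - Y_n)."""
    decay = math.exp(-dt_seconds / (window_minutes * 60.0))
    return reading + decay * (previous - reading)


# 15-minute average of a queue length that is constantly 2, sampled every
# 5 seconds for 15 minutes (180 readings), starting from an average of 0.
average = 0.0
for _ in range(180):
    average = update_average(average, reading=2.0, dt_seconds=5.0, window_minutes=15.0)
print(average)  # about 1.264, i.e. 2 * (1 - 1/e)
```

After one averaging period W the average has covered a fraction 1 − 1/e (about 63%) of the distance to a constant reading, which is consistent with W being the mean lifetime of a reading in the average.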