Least-Squares Method

Content summary

A brief introduction to the least-squares method and its statistical meaning.

Goals

This learning project offers learning activities and some applications of the least-squares method. With this project, one should understand the intention of the least-squares method and what it means, and should be able to apply simple least-squares methods to find a good approximation for a given function. For a more mathematical explanation, visit the page "Least squares" to obtain more information.

Learning materials

Texts

 Numerical Mathematics and Computing Chapter 12.1

 Numerical Method for Engineers: With Software and Programming Applications Chapter 17.3

 Statistics for Management and Economics Chapter 17.1

 T. Strutz: Data Fitting and Uncertainty. A Practical Introduction to Weighted Least Squares and Beyond. 2nd edition, Springer Vieweg, 2016, ISBN 978-3-658-11455-8.

Lessons

Lesson 1: Introduction to Least-Squares Method

The goal of the least-squares method is to find a good estimate of the parameters that fit a function f(x) to a set of data $x_{1},\dots ,x_{n}$ . The least-squares method requires that the estimated function deviate as little as possible from the data in the sense of the 2-norm. Generally speaking, least-squares methods fall into two categories, linear and non-linear. We can also classify these methods further, for example into ordinary least squares (OLS), weighted least squares (WLS), alternating least squares (ALS), and partial least squares (PLS).

To best fit a set of data, the least-squares method minimizes the sum of squared residuals (also called the sum of squared errors, SSE),

$S=\sum _{i=1}^{n}r_{i}^{2}$ ,

where the residual $r_{i}$ is the difference between the actual data point and the value predicted by the regression line, defined as

$r_{i}=y_{i}-f(x_{i})$ , where the n data pairs are $(x_{i},y_{i})\!$ and the model function is $f(x_{i})$ .

Here, we can choose the parameters of f(x) so that the approximating function best fits the data set.

For example, in the graph on the right, $R3=Y3-f(X3)$ , and $(R1^{2}+R2^{2}+R3^{2}+R4^{2}+R5^{2})$ , the sum of the squares of each red line's length, is what we want to minimize.
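The sum of squared residuals can be computed directly from the definitions above. Here is a minimal sketch in Python; the model f and the data values are hypothetical illustrations, not taken from the text:

```python
# Sum of squared residuals S = sum of r_i^2, with r_i = y_i - f(x_i).
def sse(f, xs, ys):
    """Return the sum of squared residuals of model f over the data pairs."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical example: residuals of the line y = 2x + 1 against noisy data.
xs = [0, 1, 2, 3]
ys = [1.1, 2.9, 5.2, 6.8]
f = lambda x: 2 * x + 1
print(sse(f, xs, ys))  # residuals 0.1, -0.1, 0.2, -0.2, so S is about 0.10
```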

Lesson 2: Linear Least-Squares Method

The linear least-squares (LLS) method assumes that the data set falls on a straight line. Therefore, $f(x)=ax+b$ , where a and b are constants. However, due to experimental error, some data points might not lie exactly on the line; there is an error (residual) between the estimated function and the real data. The linear least-squares method (or $l_{2}$ approximation) defines the best-fit function as the one that minimizes $S=\sum _{i=1}^{n}(y_{i}-(ax_{i}+b))^{2}$ . The advantages of LLS:

1. If we assume that the errors have a normal probability distribution, then minimizing S gives us the best estimates of a and b.

2. We can easily use calculus to determine the approximate values of a and b.

To minimize S, the following conditions must be satisfied: ${\frac {\partial S}{\partial a}}=0$ and ${\frac {\partial S}{\partial b}}=0$ . Taking the partial derivatives, we obtain $\sum _{i=0}^{n}2((ax_{i}+b)-y_{i})x_{i}=0$ and $\sum _{i=0}^{n}2((ax_{i}+b)-y_{i})=0$ .

This system consists of two simultaneous linear equations in the two unknowns a and b. (These two equations are the so-called normal equations.)

Based on simple manipulation of the summations, we can find that

$a={\frac {1}{c}}\left[(n+1)\sum _{i=0}^{n}x_{i}y_{i}-\left(\sum _{i=0}^{n}x_{i}\right)\left(\sum _{i=0}^{n}y_{i}\right)\right]$ and $b={\frac {1}{c}}\left[\left(\sum _{i=0}^{n}x_{i}^{2}\right)\left(\sum _{i=0}^{n}y_{i}\right)-\left(\sum _{i=0}^{n}x_{i}\right)\left(\sum _{i=0}^{n}x_{i}y_{i}\right)\right]$ , where

$c=(n+1)\left(\sum _{i=0}^{n}x_{i}^{2}\right)-\left(\sum _{i=0}^{n}x_{i}\right)^{2}$ .

Thus, for the data set $(i,y_{i})$ , where i is an integer between 1 and n, the best estimated function is

$y=ax+b$ , where $a={\frac {1}{n^{3}-n}}\left[12\sum _{i=1}^{n}iy_{i}-6(n+1)\sum _{i=1}^{n}y_{i}\right]$ and $b={\frac {1}{n^{2}-n}}\left[(4n+2)\sum _{i=1}^{n}y_{i}-6\sum _{i=1}^{n}iy_{i}\right]$ .
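The closed-form formulas for a, b, and c above can be checked numerically. Here is a sketch in Python; the four sample points are a hypothetical illustration, and the data are indexed 0..n as in the derivation (i.e. n+1 points):

```python
def linear_lls(xs, ys):
    """Fit y = a*x + b using the closed-form normal-equation solution
    from Lesson 2 (data x_0..x_n, y_0..y_n, i.e. len(xs) = n + 1)."""
    m = len(xs)  # this is n + 1 in the text's notation
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    c = m * sxx - sx * sx
    a = (m * sxy - sx * sy) / c
    b = (sxx * sy - sx * sxy) / c
    return a, b

# Points lying exactly on y = 3x - 2 are recovered exactly.
a, b = linear_lls([0, 1, 2, 3], [-2, 1, 4, 7])
print(a, b)  # 3.0 -2.0
```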

Lesson 3: Linear Least-Squares Method in matrix form

We can also represent the estimated linear function by the following model: $y_{i}=a_{1}x_{i,1}+a_{2}x_{i,2}+...+a_{m}x_{i,m}+r_{i}$ .

It can also be represented in matrix form: ${Y}=[X]{A}+{R}$ , where [X] is a matrix of coefficients derived from the data set (it need not be square, depending on the number of variables, m, and data points, n); the vector ${Y}$ contains the values of the dependent variable, ${Y}^{T}={\begin{bmatrix}y_{1}&y_{2}&...&y_{n}\end{bmatrix}}$ ; the vector ${A}$ contains the unknown coefficients we'd like to solve for, ${A}^{T}={\begin{bmatrix}a_{1}&a_{2}&...&a_{m}\end{bmatrix}}$ ; and the vector ${R}$ contains the residuals, ${R}^{T}={\begin{bmatrix}r_{1}&r_{2}&...&r_{n}\end{bmatrix}}$ .

To minimize the sum of squared residuals ${R}^{T}{R}$ , we follow the same method as in Lesson 2: take the partial derivative with respect to each coefficient and set it equal to zero. As a result, we have a system of normal equations, which can be represented in the following matrix form: $[[X]^{T}[X]]{A}={[X]^{T}[Y]}$ .

To solve this system, we have many options, such as LU decomposition, Cholesky decomposition, the inverse matrix, and Gauss-Seidel iteration. (Generally, the normal equations do not yield diagonally dominant matrices, so the Gauss-Seidel method is not recommended.)
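The normal equations $[[X]^{T}[X]]{A}={[X]^{T}[Y]}$ can be assembled and solved with NumPy. A minimal sketch, using a hypothetical straight-line design matrix:

```python
import numpy as np

# Hypothetical design matrix for a straight-line fit y = a*x + b:
# each row of [X] is (x_i, 1), so the unknown vector is A = (a, b).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([-2.0, 1.0, 4.0, 7.0])
X = np.column_stack([x, np.ones_like(x)])

# Normal equations: [X]^T [X] A = [X]^T Y.
A = np.linalg.solve(X.T @ X, X.T @ y)
print(A)  # recovers a = 3, b = -2 (up to rounding)
```

In practice, `np.linalg.lstsq(X, y, rcond=None)` solves the same problem more stably, since forming $[X]^{T}[X]$ squares the condition number of the system.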

182.56.115.73 (discuss) 11:40, 23 August 2014 (UTC)KB

Lesson 4: Least-Squares Method in statistical view

From equation $[[X]^{T}[X]]{A}={[X]^{T}[Y]}$ , we can derive the following equation: ${A}=[[X]^{T}[X]]^{-1}{[X]^{T}{Y}}$ .

From this equation, we can determine not only the coefficients, but also their statistical properties.

Using calculus, the following formulas for the coefficients can be obtained:

$a={\frac {S_{xy}}{S_{x}^{2}}}$ and $b={\bar {y}}-a{\bar {x}}$ , where

$S_{xy}={\frac {\sum _{i=1}^{n}(x_{i}-{\bar {x}})(y_{i}-{\bar {y}})}{n-1}}$ , $S_{x}^{2}={\frac {\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}{n-1}}$ , ${\bar {x}}={\frac {\sum _{i=1}^{n}x_{i}}{n}}$ , and ${\bar {y}}={\frac {\sum _{i=1}^{n}y_{i}}{n}}$ .

Moreover, the diagonal and off-diagonal entries of the matrix $[[X]^{T}[X]]^{-1}$ represent, up to a common scale factor, the variances and covariances of the coefficients $a_{i}$ , respectively.

Assume the diagonal entries of $[[X]^{T}[X]]^{-1}$ are $x_{i,i}$ and the corresponding coefficients are $a_{i}$ ; then

$var(a_{i-1})=x_{i,i}*s_{y/x}^{2}$ and $cov(a_{i-1},a_{j-1})=x_{i,j}*s_{y/x}^{2}$ , where $s_{y/x}$ is called the standard error of the estimate, and $s_{y/x}={\sqrt {\frac {S}{n-2}}}$ .

(Here, the subscript y/x indicates that the error refers to predicting y from a given x.)

These two pieces of information have many applications. For example, we can derive upper and lower confidence bounds for the intercept and slope.
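The covariance formulas above give the same line as the normal equations of Lesson 2. A minimal sketch in Python, with hypothetical sample points:

```python
def stats_fit(xs, ys):
    """Fit y = a*x + b using a = S_xy / S_x^2 and b = ybar - a*xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    s_x2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    a = s_xy / s_x2
    b = ybar - a * xbar
    return a, b

# Points on the exact line y = 3x - 2 recover a = 3, b = -2 (up to rounding).
a, b = stats_fit([0, 1, 2, 3], [-2, 1, 4, 7])
print(a, b)
```

Note that the factors of n-1 cancel in the ratio $S_{xy}/S_{x}^{2}$, which is why this agrees with the normal-equation solution.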

Assignments

To better understand the application of the least-squares method, the first question will be solved by applying the LLS equations by hand, and the second one will be solved by a Matlab program.

Question 1: Linear Least-Squares Example

The following are 8 data points that show the relationship between the number of fishermen and the amount of fish (in thousands of pounds) they can catch in a day.

Number of Fishermen Fish Caught
18 39
14 9
9 9
10 7
5 8
22 35
14 36
12 22

According to this data set, what is the function relating the number of fishermen to the amount of fish caught? Hint: let the number of fishermen be x and the amount of fish caught be y, and use LLS to find the coefficients.

Calculation

With simple calculation and statistical knowledge, we can easily find:

1. ${\bar {X}}$ = 13
2. ${\bar {Y}}$ = 20.625, and
3. the following chart
X Y $X-{\bar {X}}$ $Y-{\bar {Y}}$ $(X-{\bar {X}})*(Y-{\bar {Y}})$ $(X-{\bar {X}})^{2}$
18 39 5 18.375 91.875 25
14 9 1 $-11.625$ $-11.625$ 1
9 9 $-4$ $-11.625$ 46.5 16
10 7 $-3$ $-13.625$ 40.875 9
5 8 $-8.0$ $-12.625$ 101 64
22 35 9 14.375 129.375 81
14 36 1 15.375 15.375 1
12 22 $-1$ 1.375 $-1.375$ 1

Thus, we have $\Sigma {(X-{\bar {X}})*(Y-{\bar {Y}})}=412$ and $\Sigma {(X-{\bar {X}})^{2}}=198$ , so the slope is $a={\frac {412}{198}}=2.{\bar {08}}$ .

Finally, the intercept is $b=20.625-2.{\bar {08}}*13=-6.4255$ .

Therefore, the linear least-squares line is $Y=f(X)=-6.4255+2.{\bar {08}}*X$ .
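The hand calculation above can be double-checked with a short Python script that recomputes the same sums:

```python
# Data from the table: number of fishermen (x) and fish caught (y).
fishermen = [18, 14, 9, 10, 5, 22, 14, 12]
fish = [39, 9, 9, 7, 8, 35, 36, 22]

n = len(fishermen)
xbar = sum(fishermen) / n                                            # 13.0
ybar = sum(fish) / n                                                 # 20.625
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(fishermen, fish))  # 412.0
sxx = sum((x - xbar) ** 2 for x in fishermen)                        # 198.0

a = sxy / sxx        # slope, 412/198 = 2.0808...
b = ybar - a * xbar  # intercept, about -6.4255
print(a, b)
```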

Question 2: Nonpolynomial Example

We want to fit the following data $(x_{i},y_{i})$ , where $1\leq i\leq n$ , by a function of the form $a*e^{x}+b*\ln x+c*\sin x+d*\cos x$ .

 x    y
 0.23 0.25
 0.66 $-0.27$
 0.93 $-1.12$
 1.25 $-0.45$
 1.75 0.28
 2.03 0.13
 2.24 $-0.27$
 2.57 0.26
 2.87 0.58
 2.98 1.03

Write a Matlab program that uses the least-squares method to obtain the estimated function. Hint: put the data in matrix form, and solve the system to obtain the coefficients.
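The exercise asks for a Matlab program; an equivalent sketch in Python with NumPy is shown below. Each column of the design matrix is one basis function evaluated at the data points, and `np.linalg.lstsq` solves the resulting least-squares system directly:

```python
import numpy as np

# The ten (x, y) data pairs from the table above.
x = np.array([0.23, 0.66, 0.93, 1.25, 1.75, 2.03, 2.24, 2.57, 2.87, 2.98])
y = np.array([0.25, -0.27, -1.12, -0.45, 0.28, 0.13, -0.27, 0.26, 0.58, 1.03])

# Design matrix: one column per basis function e^x, ln x, sin x, cos x.
X = np.column_stack([np.exp(x), np.log(x), np.sin(x), np.cos(x)])

# Solve min ||X*coeffs - y|| for the coefficients (a, b, c, d).
coeffs, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)
```

In Matlab, the analogous step is the backslash solve `coeffs = X \ y`.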

(In the resulting plot, the blue spots are the data and the green spots are the estimated nonpolynomial function.)