# Least-Squares Method

(Redirected from Least-Squares Fitting)

## Content summary

A brief introduction to Least-Squares method, and its statistic meaning.

## Goals

This learning project offers learning activities and some application for Least-Squares Method. With this project, one should understand the intention of Least-Squares Method, and what it means. Moreover, one should be able to apply some simple Least-Squares methods to find a good approximation for any functions. For more mathematical explanation, one should visit the following page: "Least squares" to obtain more information.

## Learning materials

### Texts

 Numerical Mathematics and Computing Chapter 12.1

 Numerical Method for Engineers: With Software and Programming Applications Chapter 17.3

 Statistics for Management and Economics Chapter 17.1

 T.Strutz: Data Fitting and Uncertainty. A practical introduction to weighted least squares and beyond. 2nd edition, Springer Vieweg, 2016, ISBN 978-3-658-11455-8.

### Lessons

#### Lesson 1: Introduction to Least-Squares Method

The goal of Least-Squares Method is to find a good estimation of parameters that fit a function, f(x), of a set of data, $x_{1}...x_{n}$ . The Least-Squares Method requires that the estimated function has to deviate as little as possible from f(x) in the sense of a 2-norm. Generally speaking, Least-Squares Method has two categories, linear and non-linear. We can also classify these methods further: ordinary least squares (OLS), weighted least squares (WLS), and alternating least squares (ALS) and partial least squares (PLS).

To fit a set of data best, the least-squares method minimizes the sum of squared residuals (it is also called the Sum of Squared Errors, SSE.)

$S=\sum _{i=1}^{i=n}r_{i}^{2}$ ,

with, $r_{i}$ , the residual, which is the difference between the actual points and the regression line, and is defined as

$r_{i}=y_{i}-f(x_{i})$ where the m data pairs are $(x_{i},y_{i})\!$ , and the model function is $f(x_{i})$ .

At here, we can choose n different parameters for f(x), so that the approximated function can best fit the data set.

For example, in the right graph, R3 = Y3 - f(X3), and $(R1^{2}+R2^{2}+R3^{2}+R4^{2}+R5^{2})$ , the sum of the square of each red line's length, is what we want to minimize.

#### Lesson 2: Linear Least-Squares Method

Linear Least-Squares (LLS) Method assumes that the data set falls on a straight line. Therefore, $f(x)=ax+b$ , where a and b are constants. However, due to experimental error, some data might not be on the line exactly. There must be error (residual) between the estimated function and real data. Linear Least-Squares Method (or $l_{2}$ approximation) defined the best-fit function as the function that minimizes $S=\sum _{i=1}^{i=n}(y_{i}-(ax_{i}+b))^{2}$ 1. If we assume that the errors have a normal probability distribution, then minimizing S gives us the best approximation of a and b.

2. We can easily use calculus to determine the approximated value of a and b.

To minimize S, the following conditions must be satisfied ${\frac {\partial S}{\partial a}}=0$ , and ${\frac {\partial S}{\partial b}}=0$ Taking the partial derivatives, we obtain $\sum _{i=0}^{i=n}(2*((ax_{i}+b)-y_{i}))*x_{i}=0$ , and $\sum _{i=0}^{i=n}(2*((ax_{i}+b)-y_{i}))=0$ .

This system actually consists of two simultaneous linear equations with two unknowns a and b. (These two equations are so-called normal equations.)

Based on the simple calculation on summation, we can easily find out that

$a={\frac {1}{c}}*[(n+1)*\sum _{i=0}^{i=n}(x_{i}*y_{i})-(\sum _{i=0}^{i=n}(x_{i}))(\sum _{i=0}^{i=n}(y_{i}))]$ and $b={\frac {1}{c}}*[(\sum _{i=0}^{i=n}((x_{i})^{2}))*(\sum _{i=0}^{i=n}(y_{i}))-(\sum _{i=0}^{i=n}(x_{i}))(\sum _{i=0}^{i=n}(x_{i}*y_{i}))]$ where

$c=(n+1)*(\sum _{i=0}^{i=n}((x_{i})^{2}))-(\sum _{i=0}^{i=n}(x_{i}))^{2}$ .

Thus, the best estimated function for data set $(i,y_{i})$ , for i is an integer between [1, n], is

$y=ax+b$ , where $a={\frac {1}{n^{3}-n}}*[12*\sum _{i=1}^{i=n}(i*y_{i})-6*(n+1)(\sum _{i=1}^{i=n}(y_{i}))]$ and $b={\frac {1}{n^{2}-n}}*[(4*n+2)*\sum _{i=1}^{i=n}(y_{i})-6*(\sum _{i=1}^{i=n}(i*y_{i}))]$ .

#### Lesson 3: Linear Least-Squares Method in matrix form

We can also represent estimated linear function in the following model: $y_{i}=a_{1}*x_{1}+a_{2}*x_{2}+...+a_{m}*x_{m}+r$ .

It can be also represented in the matrix form: ${Y}=[X]{A}+{R}$ , where [X] is a matrix containing coefficients that are derived from the data set (It might not be a square matrix based on the number of variables (m), and data point (n).); Vector ${Y}$ contains the value of dependent variable, which is ${Y}^{T}={\begin{bmatrix}y_{1}&y_{2}&...&y_{n}\end{bmatrix}}$ ; Vector ${A}$ contains the unknown coefficients that we'd like to solve, which is ${A}^{T}={\begin{bmatrix}a_{1}&a_{2}&...&a_{m}\end{bmatrix}}$ ; Vector {R} contains the residuals, which is ${R}^{T}={\begin{bmatrix}r_{1}&r_{2}&...&r_{n}\end{bmatrix}}$ .

To minimize ${R}^{T}$ , we follow the same method in lesson 2, obtaining partial derivative for each coefficient, and setting it equal zero. As a result, we have a system of normalized equations, and they can be represented in the following matrix form: $[[X]^{T}[X]]{A}={[X]^{T}[Y]}$ .

To solve the system, we have many options, such as LU method, Cholesky method, inverse matrix, and Gauss-Seidel. (Generally, the equations might not result in diagonal dominated matrices, so Gauss-Seidel method is not recommended.)

182.56.115.73 (discuss) 11:40, 23 August 2014 (UTC)KB

#### Lesson 4: Least-Squares Method in statistical view

From equation $[[X]^{T}[X]]{A}={[X]^{T}[Y]}$ , we can derive the following equation: ${A}=[[X]^{T}[X]]^{-1}{[X]^{T}{Y}}$ .

From this equation, we can determine not only the coefficients, but also the approximated values in statistic.

Using calculus, the following formulas for coefficients can be obtained:

$a=$ $S_{xy} \over {S_{x}^{2}}$ and $b={\bar {y}}-a*{\bar {x}}$ where

$S_{xy}=$ $\sum _{i=1}^{i=n}((x_{i}-{\bar {x}})*(y_{i}-{\bar {y}})) \over {n-1}$ $S_{x}^{2}=$ $\sum _{i=1}^{i=n}((x_{i}-{\bar {x}})^{2}) \over {n-1}$ ${\bar {x}}=$ $\sum _{i=1}^{i=n}(x_{i}) \over {n}$ ${\bar {y}}=$ $\sum _{i=1}^{i=n}(y_{i}) \over {n}$ .

Moreover, the diagonal values and non-diagonal values matrix $[[X]^{T}[X]]^{-1}$ represents variances and covariances of coefficient $a_{i}$ , respectively.

Assume the diagonal values of $[[X]^{T}[X]]^{-1}$ is $x_{i,i}$ and the corresponding coefficient is $a_{i}$ , then

$var(a_{i-1})=x_{i,i}*s_{y/x}^{2}$ and $cov(a_{i-1},a_{j-1})=x_{i,j}*s_{y/x}^{2}$ where $s_{y/x}$ is called stand error of the estimate, and $s_{y/x}={\sqrt {S \over {n-2}}}$ .

(Here, lower index, y/x, means that the error of certain x is caused by the inaccurate approximation of corresponding y.)

We have many application on these two information. For example, we can derive the upper and lower bound of intercept and slope.

### Assignments

To better understand the application of Least-Squares application, the first question will be solved by applying the LLS equations, and the second one will be solved by Matlab program.

#### Question1: Linear Least-Square Example

The following are 8 data points that shows the relationship between the number of fishermen and the amount of fish (in thousand pounds) they can catch a day.

Number of Fishermen Fish Caught
18 39
14 9
9 9
10 7
5 8
22 35
14 36
12 22

According to this data set, what is the function between the number of fishermen and the amount of fish caught? hint: let the number of fisherman be x, and the amount of fish caught be y, and use LLS to find the coefficients.

##### Calculation

By the simple calculation and statistic knowledge, we can easily find out:

1. ${\bar {X}}$ = 13
2. ${\bar {Y}}$ = 20.625, and
3. the following chart
X Y $X-{\bar {X}}$ $Y-{\bar {Y}}$ $(X-{\bar {X}})*(Y-{\bar {Y}})$ $(X-{\bar {X}})^{2}$ 18 39 5 18.375 91.875 25
14 9 1 $-11.625$ $-11.625$ 1
9 9 $-4$ $-11.625$ 46.5 16
10 7 $-3$ $-13.625$ 40.875 9
5 8 $-8.0$ $-12.625$ 101 64
22 35 9 14.375 129.375 81
14 36 1 15.375 15.375 1
12 22 $-1$ 1.375 $-1.375$ 1

Thus, we have $\Sigma {(X-{\bar {X}})*(Y-{\bar {Y}})}=412$ , and $\Sigma {(X-{\bar {X}})^{2}}=198$ , so the slope, a, = $412 \over {198}$ $=2.{\bar {08}}$ .

And last the intercept, b, = $20.625-2.{\bar {08}}*13=-6.4255$ .

Therefore, the linear least-squares line is $Y=f(X)=-6.4255+2.{\bar {08}}*X$ .

#### Question2: Nonpolynomial example

We have the following data $(x_{i},y_{i})$ , where $1\leq i\leq n$ , by a function of the form $a*e^{x}+b*lnx+c*sinx+d*cosx$ .

 x y 0.23 0.66 0.93 1.25 1.75 2.03 2.24 2.57 2.87 2.98 0.25 $-0.27$ $-1.12$ $-0.45$ 0.28 0.13 $-0.27$ 0.26 0.58 1.03

Write a Matlab program that uses Least-Squares method to obtain the estimated function. hint: input the data in the matrix form, and solve the system to obtain the coefficients.

The blue spots are the data, the green spots are the estimated nonpolynomial function.