# Nonlinear finite elements/Newton method for finite elements

## Newton's method for nonlinear finite elements

The finite element system of equations is of the form

$K(u) u = f ~.$

In order to write it in the form $f(x) = 0$, we define the residual

$r(u) := K(u) u - f \qquad \implies \qquad r(u) = 0 ~.$

Then, the Newton iteration formula becomes

${ u_{r+1} = u_r - \cfrac{r(u_r)}{\cfrac{dr(u_r)}{du}} = u_r - T^{-1}(u_r)~r(u_r) ~. }$

The slope of the tangent (the tangent stiffness) is

\begin{align} T(u_r) := \cfrac{dr(u_r)}{du} & = \cfrac{d}{du}[K(u) u - f]\ \\ & = K(u) + \cfrac{dK(u)}{du}u\ \\ & = K(u_r) + \cfrac{dK(u_r)}{du}u_r \end{align}

Therefore the tangent stiffness is

${ T(u_r) := K(u_r) + \cfrac{dK(u_r)}{du}u_r ~. }$

So far we have considered Newton's method only for single equations. What changes have to be made for a system of equations such as the ones encountered in FEM?

### Newton's method for a system of equations

For a system of equations, all we have to do is a straightforward extension of the method as it applies to one dimension. In this case, it's easier to think in terms of matrices instead of individual coefficients of the matrices. Thus, we have

$\mathbf{K}(\mathbf{u})~\mathbf{u} = \mathbf{f} ~.$

The residual is defined as

$\mathbf{r}(\mathbf{u}) := \mathbf{K}(\mathbf{u})~\mathbf{u} - \mathbf{f} ~.$

Then, the Newton iteration formula is

${ \mathbf{u}_{r+1} = \mathbf{u}_r - \mathbf{T}^{-1}(\mathbf{u}_r)~\mathbf{r}(\mathbf{u}_r) }$

where the tangent stiffness matrix is given by

$\text{(7)} \qquad { \mathbf{T}(\mathbf{u}_r) := \frac{\partial \mathbf{r}(\mathbf{u}_r)}{\partial \mathbf{u}_r} = \mathbf{K}(\mathbf{u}_r) + \frac{\partial \mathbf{K}(\mathbf{u}_r)}{\partial \mathbf{u}}~\mathbf{u}_r~. }$

The iterative procedure is terminated when either the residual is very small or the difference between successive solutions is less than a specified tolerance.

However, both the residual $\mathbf{r}$ and the solution $\mathbf{u}$ are vectors. We usually compare the $L_2$ (Euclidean) norm of the vectors with a tolerance $\epsilon$. In symbolic form, we check the norm of the residual using

$\lVert\mathbf{r}\rVert_{2} \le \epsilon \qquad \text{or} \qquad \sqrt{\mathbf{r}\cdot\mathbf{r}} \le \epsilon \qquad \text{or} \qquad \sqrt{\sum_{i=1}^n r_i^2} \le \epsilon ~.$

For the difference between successive solutions we check

$\cfrac{\lVert\mathbf{u}_{r+1}-\mathbf{u}_r\rVert_{2}}{\lVert\mathbf{u}_{r+1}\rVert_{2}} \le \epsilon \qquad \text{or} \qquad \cfrac{\sqrt{(\mathbf{u}_{r+1} - \mathbf{u}_r)\cdot(\mathbf{u}_{r+1} - \mathbf{u}_r)}} {\sqrt{\mathbf{u}_{r+1}\cdot\mathbf{u}_{r+1}}} \le \epsilon \qquad \text{or} \qquad \cfrac{\sqrt{\sum_{i=1}^n (u^{r+1}_i - u^r_i)^2}} {\sqrt{\sum_{i=1}^n (u^{r+1}_i)^2}} \le \epsilon ~.$

#### How do we calculate the derivative of $\mathbf{K}$ with respect to $\mathbf{u}$?

In equation (7) we have a terms that involves a partial derivative of matrix $\mathbf{K}$ with respect to the vector $\mathbf{u}$. To see what that means, let us look at the component form of the tangent stiffness matrix.

The tangent stiffness matrix is defined as

$\mathbf{T} := \frac{\partial \mathbf{r}(\mathbf{u}_r)}{\partial \mathbf{u}_r} ~.$

Let us consider a $3 \times 3$ stiffness matrix and see what the above equation means. The residual is

$\begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix} = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{21} & K_{22} & K_{23} \\ K_{31} & K_{32} & K_{33} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} - \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix} ~.$

Expanding the above matrix equation out we get

\begin{align} r_1 & = K_{11} u_1 + K_{12} u_2 + K_{13} u_3 - f_1 \\ r_2 & = K_{21} u_1 + K_{22} u_2 + K_{23} u_3 - f_2 \\ r_3 & = K_{31} u_1 + K_{32} u_2 + K_{33} u_3 - f_3 \end{align}

Taking derivatives with respect to $u_1$, we get

\begin{align} \frac{\partial r_1}{\partial u_1} & = \left(\frac{\partial K_{11}}{\partial u_1} u_1 + K_{11}\right) + \frac{\partial K_{12}}{\partial u_1} u_2 + \frac{\partial K_{13}}{\partial u_1} u_3 \\ \frac{\partial r_2}{\partial u_1} & = \left(\frac{\partial K_{21}}{\partial u_1} u_1 + K_{21}\right) + \frac{\partial K_{22}}{\partial u_1} u_2 + \frac{\partial K_{23}}{\partial u_1} u_3 \\ \frac{\partial r_3}{\partial u_1} & = \left(\frac{\partial K_{31}}{\partial u_1} u_1 + K_{31}\right) + \frac{\partial K_{32}}{\partial u_1} u_2 + \frac{\partial K_{33}}{\partial u_1} u_3 \end{align}

In shorter form, we can write

$\frac{\partial r_i}{\partial u_1} = K_{i1} + \frac{\partial K_{i1}}{\partial u_1} u_1 + \frac{\partial K_{i2}}{\partial u_1} u_2 + \frac{\partial K_{i3}}{\partial u_1} u_3 = K_{i1} + \sum_{m=1}^3 \frac{\partial K_{im}}{\partial u_1} u_m ~.$

Similarly, taking derivatives with respect to $u_2$, we get

\begin{align} \frac{\partial r_1}{\partial u_2} & = \frac{\partial K_{11}}{\partial u_2} u_1 + \left(\frac{\partial K_{12}}{\partial u_2} u_2 + K_{12}\right) + \frac{\partial K_{13}}{\partial u_2} u_3 \\ \frac{\partial r_2}{\partial u_2} & = \frac{\partial K_{21}}{\partial u_2} u_1 + \left(\frac{\partial K_{22}}{\partial u_2} u_2 + K_{22}\right) + \frac{\partial K_{23}}{\partial u_2} u_3 \\ \frac{\partial r_3}{\partial u_2} & = \frac{\partial K_{31}}{\partial u_2} u_1 + \left(\frac{\partial K_{32}}{\partial u_2} u_2 + K_{32}\right) + \frac{\partial K_{33}}{\partial u_2} u_3 \end{align}

The short form of the above is

$\frac{\partial r_i}{\partial u_2} = K_{i2} + \sum_{m=1}^3 \frac{\partial K_{im}}{\partial u_2} u_m ~.$

Finally, taking derivatives with respect to $u_3$ gives us

\begin{align} \frac{\partial r_1}{\partial u_3} & = \frac{\partial K_{11}}{\partial u_3} u_1 + \frac{\partial K_{12}}{\partial u_3} u_2 + \left(\frac{\partial K_{13}}{\partial u_3} u_3 + K_{13}\right) \\ \frac{\partial r_2}{\partial u_3} & = \frac{\partial K_{21}}{\partial u_3} u_1 + \frac{\partial K_{22}}{\partial u_3} u_2 + \left(\frac{\partial K_{23}}{\partial u_3} u_3 + K_{23}\right) \\ \frac{\partial r_3}{\partial u_3} & = \frac{\partial K_{31}}{\partial u_3} u_1 + \frac{\partial K_{32}}{\partial u_3} u_2 + \left(\frac{\partial K_{33}}{\partial u_3} u_3 + K_{33}\right) \end{align}

The short form is

$\frac{\partial r_i}{\partial u_3} = K_{i3} + \sum_{m=1}^3 \frac{\partial K_{im}}{\partial u_3} u_m ~.$

Combining the three short forms of the 9 equations, we get

${ \frac{\partial r_i}{\partial u_j} = K_{ij} + \sum_{m=1}^3 \frac{\partial K_{im}}{\partial u_j} u_m ~. }$

Each of these terms represents one component $T_{ij}$ of the matrix $\mathbf{T}$, and for a $n \times n$ stiffness matrix we get

${ T_{ij} := K_{ij} + \sum_{m=1}^n \frac{\partial K_{im}}{\partial u_j} u_m ~. }$

In matrix notation, the above equation is written as

${ \mathbf{T} := \mathbf{K} + \frac{\partial \mathbf{K}}{\partial \mathbf{u}}~\mathbf{u}~. }$

This shows us how to compute the derivative of $\mathbf{K}$ with respect to $\mathbf{u}$ and hence $\mathbf{T}$.

Note that we can form the tangent stiffness matrix over an element and assemble the contributions from each element to get the global tangent stiffness matrix. The reasons are the same as those for the standard global stiffness matrix.