# Interpolation of sparse pointwise and integral geographical Data over Time


Martin Papke*

*Universität Koblenz-Landau, Germany

Abstract

Geographical data, for example distribution densities of species as discussed in [1], are often given both by pointwise and by integral values. We present two interpolation algorithms which can handle both data types together. As such data tend to change over time, we extend our algorithms to allow a timestamp attached to each datum and give newer data more weight in the interpolation. This allows us to model time-dependent functions and data. We compare both approaches and discuss their advantages and disadvantages. The possibility of giving both pointwise and integral data extends the basic ideas from [2] by allowing for integral data points.

Keywords: Interpolation, Applied Mathematics, Numerical Mathematics

## Introduction

Let ${\displaystyle \mathbb {T} }$ be a time set. Assume we have collected data about an unknown mapping ${\displaystyle (f_{t})_{t\in \mathbb {T} }}$ that changes over time; ${\displaystyle f_{t}}$ is the mapping at time ${\displaystyle t\in \mathbb {T} }$. A collected datum ${\displaystyle (d,t)}$ indicates that ${\displaystyle d}$ was collected at time ${\displaystyle t\in \mathbb {T} }$. As time set we use ${\displaystyle \mathbb {T} :=[0,+\infty )=\mathbb {R} _{0}^{+}}$, where ${\displaystyle t_{0}\in \mathbb {T} }$ denotes the current time stamp. All ${\displaystyle t>t_{0}}$ are time stamps of the future. Collected data ${\displaystyle (d,t)}$ always have a time stamp ${\displaystyle t\leq t_{0}}$.

For the following sections we use the time index ${\displaystyle t\in \mathbb {T} }$ as the last argument of the function ${\displaystyle f}$ and write ${\displaystyle f_{t}(x)}$ for ${\displaystyle f(x,t)}$. If ${\displaystyle f_{t}}$ does not change over time, we have ${\displaystyle f_{t_{1}}=f_{t_{2}}}$ for all ${\displaystyle t_{1},t_{2}\in \mathbb {T} }$. If the domain of ${\displaystyle f}$ does not contain the time set ${\displaystyle \mathbb {T} }$, the function ${\displaystyle f}$ is regarded as static in time.

We propose two algorithms to handle sparse pointwise and integral data as they arise in geographical problems. The first algorithm uses a least squares idea; the second is based on convex combinations.

## Setting and Notations

We consider a triangulation ${\displaystyle T_{\Delta }=\{\Delta _{i}:i=1,\ldots ,n\}}$ of a region ${\displaystyle \Omega \subseteq \mathbb {R} ^{2}}$. We denote the set of nodes of ${\displaystyle T_{\Delta }}$ by ${\displaystyle \{p_{j}:j=1,\ldots ,m\}}$. For each triangle ${\displaystyle \Delta _{i}}$, we denote by ${\displaystyle j(i,\alpha )}$, ${\displaystyle \alpha =1,2,3}$, the indices of ${\displaystyle \Delta _{i}}$'s nodes, that is, the nodes of ${\displaystyle \Delta _{i}}$ are ${\displaystyle p_{j(i,1)}}$, ${\displaystyle p_{j(i,2)}}$ and ${\displaystyle p_{j(i,3)}}$. For each node ${\displaystyle p_{j}}$, we denote by ${\displaystyle p_{n(j,\beta )}}$, ${\displaystyle \beta =1,\ldots ,N(j)}$, the neighbours of ${\displaystyle p_{j}}$, that is, the nodes which are connected to ${\displaystyle p_{j}}$.

We are given two types of data for an unknown function ${\displaystyle f\colon \Omega \to \mathbb {R} }$: firstly, point values ${\displaystyle y_{k}}$, which are measured values of ${\displaystyle f}$ at nodes ${\displaystyle p_{j(k)}}$; secondly, integrals ${\displaystyle I_{\ell }}$ of ${\displaystyle f}$ over triangles ${\displaystyle \Delta _{i(\ell )}}$.

Our aim is to construct a function ${\displaystyle f\colon \Omega \to \mathbb {R} }$ which interpolates the given values in the sense that

${\displaystyle f(p_{j(k)})\approx y_{k},k=1,\ldots ,K}$

${\displaystyle \int _{\Delta _{i(\ell )}}f(x)\,dx\approx I_{\ell },\ell =1,\ldots ,L}$

taking into account that our data are, for example, values of a density distribution of a species, and hence determine ${\displaystyle f}$ only locally.

### Example: Single Point Data Collection

For example, a camera placed at a single point in a habitat automatically takes snapshots of all animals that pass this point. The aggregated data provide pointwise values of the density distribution of a species. Including the time index ${\displaystyle t\in \mathbb {T} }$, the collected data provide pointwise values of the density distribution for every month.

### Combine Different Data Collections

Two different data collections at the same time ${\displaystyle t}$ for a given population density could indicate different population estimates. This can arise, for example, when using the capture-recapture method for estimating populations, as done in [3] and [4].

## The least squares Approach

The first algorithm we propose to tackle this kind of problem is a least squares approach, which allows for more than one data point at a fixed site of our lattice.

As geographical data tend to be time dependent, we want to add a time stamp to each data point and consider a time dependent function ${\displaystyle f\colon \Omega \times [0,T]\to \mathbb {R} }$. We therefore replace the above equations by

${\displaystyle f(p_{j(k)},t_{k})\approx y_{k},k=1,\ldots ,K}$
${\displaystyle \int _{\Delta _{i(\ell )}}f(x,t_{\ell })\,dx\approx I_{\ell },\ell =1,\ldots ,L}$

At a given time ${\displaystyle t}$ we only consider values with a time stamp ${\displaystyle \leq t}$, which allows live data to be considered. As the importance of measurements decays over time, we attach to each equation a weight, given for a data point with time stamp ${\displaystyle \tau }$ by ${\displaystyle \phi (t-\tau )}$, where ${\displaystyle \phi }$ denotes the weighting function ${\displaystyle \phi (s):=e^{-\eta s}}$ and ${\displaystyle \eta \geq 0}$ the speed of decay. ${\displaystyle \eta }$ has to be chosen in a way that adapts to the given problem; a possible choice is an estimate of the time derivative of the model function ${\displaystyle f}$, because if ${\displaystyle f}$ has strong fluctuations in time, the importance of past values decays more rapidly.
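A minimal sketch of this weighting function in Python (the default `eta = 0.5` is a purely illustrative choice, as the text leaves ${\displaystyle \eta }$ problem-dependent):

```python
import math

def phi(s, eta=0.5):
    """Exponential decay weight phi(s) = exp(-eta * s).

    eta = 0.5 is an illustrative default; it should be adapted to the
    expected rate of change of the underlying function f.
    """
    return math.exp(-eta * s)

# A measurement taken 4 time units ago gets less weight than one
# taken 1 time unit ago.
old_weight, recent_weight = phi(4.0), phi(1.0)
```
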

### Constructing a linear least squares system

For a given time ${\displaystyle t}$ we construct an overdetermined linear system of equations for the node values ${\displaystyle x_{j}:=f(p_{j},t)}$ of our function ${\displaystyle f}$, which we solve in a weighted least squares sense. That is, we transform both point and integral data into equations for the values of ${\displaystyle f}$ at the nodes ${\displaystyle p_{j}}$. Each datum gives rise to one linear equation. Another set of equations models the smoothness of our function ${\displaystyle f}$.

#### Equations for the point data

For each given point datum with ${\displaystyle t_{k}\leq t}$ - future measurements are not considered to carry information about the current time - we add the equation

${\displaystyle x_{j(k)}=y_{k}}$ with weight ${\displaystyle \phi (t-t_{k})}$

The weight is chosen such that it attains its maximal value when ${\displaystyle t=t_{k}}$ and decays over time.

#### Equations for the volume data

To give an equation for a given volume datum we first have to approximate the integral by point values of the function ${\displaystyle f}$. We use the two-dimensional variant of the trapezoidal rule, that is we approximate the integral of a function ${\displaystyle g}$ over a triangle ${\displaystyle \Delta _{i}}$ by the volume of the trapezoidal body determined by the values of ${\displaystyle g}$ at the nodes of ${\displaystyle \Delta _{i}}$, hence

${\displaystyle \int _{\Delta _{i}}g(x)\,dx\approx {\frac {1}{3}}|\Delta _{i}|\sum _{\alpha =1}^{3}g(p_{j(i,\alpha )})}$

where ${\displaystyle |\Delta _{i}|}$ denotes the area of the triangle.
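This quadrature rule is straightforward to implement. The following Python sketch (function and argument names are our own) computes the triangle area from its vertex coordinates and applies the rule:

```python
def triangle_area(p1, p2, p3):
    """Area of the triangle with vertices p1, p2, p3 (2-D points),
    via half the absolute value of the cross product of two edges."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

def trapezoidal_integral(vertex_values, area):
    """Trapezoidal rule on a triangle: area times the mean vertex value."""
    return area * sum(vertex_values) / 3.0

# integral of the constant function g = 1 over the unit right triangle
area = triangle_area((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
approx = trapezoidal_integral([1.0, 1.0, 1.0], area)
```

For a constant function the rule is exact: the approximation equals the area itself.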

Hence, we get the equation

${\displaystyle {\frac {1}{3}}|\Delta _{i(\ell )}|\cdot \sum _{\alpha =1}^{3}x_{j(i(\ell ),\alpha )}=I_{\ell }}$ with weight ${\displaystyle \phi (t-t_{\ell })}$,

where the weight is chosen exactly as in the point value case.

#### Smoothness equations

If we only have a small set of data points, our system will be underdetermined and the calculated function may have strong fluctuations. To prevent that, we add for each node the equation

${\displaystyle {\frac {1}{N(j)}}\sum _{\beta =1}^{N(j)}x_{n(j,\beta )}-x_{j}=0,}$ with weight ${\displaystyle 0.1}$

stating that we want the value at each node to be approximately the mean of its neighbouring values.

The resulting linear system ${\displaystyle Ax=b}$ is solved in a weighted least squares sense, giving us point values for ${\displaystyle f}$ which we interpolate by linear functions.

### Implementation

We represent the triangulation by a structure consisting of the nodes, given by their coordinates, and the triangles, given by the indices of their vertices. From that we calculate the neighbours of each node and store them in the lattice structure as well. Given the data, we assemble for each time ${\displaystyle t}$ a matrix ${\displaystyle A}$ and a right hand side ${\displaystyle b}$ collecting the equations.

## The convex combination approach

As a second possibility to attack our problem, we propose an algorithm that handles point and integral data as distinct interpolation problems and afterwards takes a convex combination of the two solutions to obtain a solution to the full problem.

### The pointwise interpolation

At each given point, we first combine all data we have at this point into one value by taking a convex combination of the measured values, where each value is weighted with ${\displaystyle \phi (t-t_{k})}$ as in the first algorithm. That is, for a node ${\displaystyle p_{j}}$ in our lattice, we look at all values given for that node with ${\displaystyle t_{k}\leq t}$ and form the average

${\displaystyle q_{j}:={\frac {\sum _{k:t_{k}\leq t,j(k)=j}\phi (t-t_{k})y_{k}}{\sum _{k:t_{k}\leq t,j(k)=j}\phi (t-t_{k})}}}$

of the values measured at this particular point.
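A minimal sketch of this per-node averaging (the decay rate `eta = 0.5` is again an illustrative assumption):

```python
import math

def weighted_node_value(measurements, t, eta=0.5):
    """Weighted average q_j of all values measured at a single node.

    measurements: list of (y_k, t_k) pairs for this node; values with a
    time stamp t_k > t (future data) are discarded.
    """
    num = den = 0.0
    for y, tk in measurements:
        if tk <= t:
            w = math.exp(-eta * (t - tk))
            num += w * y
            den += w
    return num / den
```

A single measurement is returned unchanged, and a recent measurement dominates a much older one.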

### Interpolation of the integral values

In the same way as we interpolated the point values, we now interpolate the given integral values. Hence, we first assign to each triangle of our lattice a unique integral value by weighting the given ones. Afterwards, we interpolate these values in a linear sense, assigning the value divided by the area of the particular cell to the center of gravity of each triangle. That is, we approximate the integral by means of the midpoint rule, i.e.

${\displaystyle \int _{\Delta }f(x)\,dx\approx f(p_{\Delta })\cdot |\Delta |}$

where ${\displaystyle p_{\Delta }}$ denotes ${\displaystyle \Delta }$'s center of gravity.

The value that we want the integral to have is calculated exactly as in the point case, that is, we assign

${\displaystyle I_{\Delta _{i}}:={\frac {\sum _{\ell :t_{\ell }\leq t,i(\ell )=i}\phi (t-t_{\ell })I_{\ell }}{\sum _{\ell :t_{\ell }\leq t,i(\ell )=i}\phi (t-t_{\ell })}}}$

### Obtaining the solution to the full problem

Let ${\displaystyle f_{p}\colon \Omega \to \mathbb {R} }$ denote the interpolating function obtained in step one and ${\displaystyle f_{i}\colon \Omega \to \mathbb {R} }$ denote the interpolation of the integral values. We now let ${\displaystyle f:={\frac {n_{p}}{n_{p}+n_{i}}}f_{p}+{\frac {n_{i}}{n_{p}+n_{i}}}f_{i}}$ where ${\displaystyle n_{p}}$ and ${\displaystyle n_{i}}$ are the number of given point and integral values respectively.
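The final combination step is a one-liner; the following sketch (names are our own) weights the two interpolants by the respective number of data points:

```python
def combine(f_p, f_i, n_p, n_i):
    """Convex combination of the pointwise interpolant f_p and the
    integral interpolant f_i, weighted by the data counts n_p and n_i."""
    a = n_p / (n_p + n_i)
    return lambda x: a * f_p(x) + (1.0 - a) * f_i(x)

# with 3 point data and 1 integral datum, f_p gets weight 3/4:
# f(x) = 0.75 * f_p(x) + 0.25 * f_i(x)
f = combine(lambda x: 1.0, lambda x: 3.0, 3, 1)
```
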

## Another idea that did not work out

As a further possibility to attack our problem, we considered an algorithm that starts by interpolating the point data and then matches the integral data by a refinement of the lattice. The value at the center of gravity of each cell is chosen in a way to match the integral data.

The pointwise interpolation and the interpolation of the integral values are carried out exactly as in the convex combination approach above.

### Full solution

We then choose a function ${\displaystyle f_{3}}$ that interpolates both the point and the integral values. This did not lead to a good result, because the fluctuation of the function ${\displaystyle f_{3}}$ was so large that it could not be regarded as an approximation of a smooth function.

## Examples

As an example we generate, for a simple triangulation of ${\displaystyle \Omega =[0,1]^{2}}$, 4000 random point data and 1000 random volume data over 10 seconds. As time step we choose ${\displaystyle \tau =0.1}$, that is, we interpolate our data every ${\displaystyle 0.1}$ seconds. As random data, we start with a simple function, here

${\displaystyle f_{t}(x)=f_{t}(x_{1},x_{2})=\sin(2\pi (x_{1}+t))+\cos(\pi (x_{2}+t))}$,

which creates a diagonal shift of the graph of ${\displaystyle f_{0}}$ in time. Furthermore we add some normally distributed noise. The points and time values at which to interpolate are also generated randomly.
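The generation of the synthetic point data can be sketched as follows; the noise level 0.05 and the fixed random seed are illustrative assumptions not stated in the text:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, illustrative

def f(x1, x2, t):
    """Test function whose graph shifts diagonally over time."""
    return np.sin(2 * np.pi * (x1 + t)) + np.cos(np.pi * (x2 + t))

# 4000 point data at random locations in [0,1]^2 and times in [0,10],
# perturbed by normally distributed noise (standard deviation 0.05)
points = rng.random((4000, 2))
times = 10.0 * rng.random(4000)
values = f(points[:, 0], points[:, 1], times) + rng.normal(0.0, 0.05, 4000)
```

The volume data can be generated analogously by integrating `f` (or applying a quadrature rule) over randomly chosen triangles at random times.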


## Conclusion

Extending the ideas from [2], we allow for integral values to be given. Representing the data points as a weighted linear least squares system would allow us to use an iterative least squares solver to reuse the calculations we have already done, for example the solver LSQR discussed in [5].