Interpolation of sparse pointwise and integral geographical Data over Time




Martin Papke*

  • *Universität Koblenz-Landau, Germany

Abstract

Geographical data, for example distribution densities of species as discussed in [1], are often given both by pointwise and by integral values. We aim to give two interpolation algorithms which can handle both data types together. As these data tend to change over time, we extend our algorithms to allow a timestamp to be attached to each datum and to give newer data more weight in the interpolation. This allows us to model time dependent functions and data. We compare both approaches and discuss their advantages and disadvantages. The possibility of giving both pointwise and integral data extends the basic ideas from [2] by allowing for integral data points.

Keywords: Interpolation, Applied Mathematics, Numerical Mathematics

Introduction

Let $T$ be a time set. Assume we have collected data about an unknown mapping $f$ that changes over time; $f(\cdot, t)$ is the mapping at time $t$. Each collected datum carries a time stamp indicating the time $t_k$ at which it was collected. As time set $T$ we use an interval of time stamps, with $t_{\mathrm{now}}$ denoting the current time stamp; all $t > t_{\mathrm{now}}$ are time stamps of the future, and collected data always carry a time stamp $t_k \leq t_{\mathrm{now}}$.

For the following sections we use the time index as the last argument of the function and write $f(x, t)$ for the value of $f_t$ at $x$. If $f$ does not change over time we have $f(x, t) = f(x)$ for all $t$. If the domain of $f$ does not contain the time set $T$, the function is regarded as static in time.

We propose two algorithms to handle sparse pointwise and integral data as they arise in geographical problems. The first algorithm uses a least squares idea, the second one is based on convex combinations.

Setting and Notations

We consider a triangulation $\mathcal{T}$ of a region $\Omega$. We denote the set of nodes of $\mathcal{T}$ by $\mathcal{N} = \{x_1, \dots, x_n\}$. For each triangle $T \in \mathcal{T}$, we denote by $i_1(T), i_2(T), i_3(T)$ the indices of $T$'s nodes, that is, the nodes of $T$ are $x_{i_1(T)}$, $x_{i_2(T)}$ and $x_{i_3(T)}$. For each node $x_i$, we denote by $x_j$, $j \in N(i)$, the neighbours of $x_i$, that is, the nodes which are connected to $x_i$ by an edge of the triangulation.

We are given two types of data for an unknown function $f \colon \Omega \to \mathbb{R}$. Firstly, point values $f_i$, which are measured values of $f$ at a node $x_i$. Secondly, we are given integrals $I_T$ of $f$ over triangles $T \in \mathcal{T}$.

Our aim is to construct a function $\tilde f \colon \Omega \to \mathbb{R}$ which interpolates the given values in the sense that

$$\tilde f(x_i) \approx f_i, \qquad \int_T \tilde f \, \mathrm{d}x \approx I_T,$$

taking into account that our data are, for example, values of a density distribution of a species and hence carry only local information.
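
For concreteness, a minimal sketch of how such data records could be represented in Python; the field names and the use of dataclasses are our own choice and are not prescribed by the text.

```python
from dataclasses import dataclass

@dataclass
class PointDatum:
    node: int         # index i of the node x_i where the value was measured
    value: float      # measured value f_i
    timestamp: float  # time t_k at which the measurement was taken

@dataclass
class IntegralDatum:
    triangle: int     # index of the triangle T in the triangulation
    value: float      # measured integral I_T of f over T
    timestamp: float  # time t_k at which the measurement was taken
```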

Example: Single Point Data Collection

For example, a camera is placed at a single point in a habitat and automatically takes snapshots of all animals that pass this point. The aggregated data provide pointwise values of the density distribution of a species. Including the time index, the collected data provide pointwise values of the density distribution for every month.

Combining Different Data Collections

Two different data collections made at the same time for a given population density can indicate different population estimates. This can arise, for example, when using the capture-recapture method for estimating populations, as done in [3] and [4].

The least squares approach

The first algorithm we propose to tackle this kind of problem is a least squares approach, which allows for more than one data point at a fixed site of our lattice.

As geographical data tend to be time dependent, we want to add a time stamp to each data point and consider a time dependent function $f(x, t)$. We therefore replace the above equations by

$$\tilde f(x_i, t_k) \approx f_{i,k}, \qquad \int_T \tilde f(x, t_k) \, \mathrm{d}x \approx I_{T,k}.$$

At a given time $t$ we only consider values with a timestamp $t_k \leq t$, which allows live data to be considered. As the importance of measurements decays over time, we attach to each equation a weight, given for a data point with time stamp $t_k$ by a weighting function $w_\lambda(t, t_k)$ that is maximal for $t_k = t$ and decays as the measurement ages, where $\lambda$ denotes the speed of decay. $\lambda$ has to be chosen in a way that adapts to the given problem; a possible choice could be an estimate of the time derivative of the model function $f$, because strong fluctuations in time mean that the importance of past values decays more rapidly.
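
The text does not fix a concrete form for the weighting function; the exponential decay below is only one plausible choice satisfying the stated requirements (maximal when $t_k = t$, decaying with the age of the measurement, decay speed controlled by $\lambda$).

```python
import numpy as np

def weight(t, t_k, lam):
    """Weight of a measurement with timestamp t_k at evaluation time t.

    Equals 1 when t_k == t and decays as the measurement ages; lam > 0
    controls the speed of decay.  Future measurements (t_k > t) get
    weight 0, i.e. they are not used at all.
    """
    if t_k > t:
        return 0.0
    return np.exp(-lam * (t - t_k))
```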

Constructing a linear least squares system

For a given time $t$ we construct an overdetermined linear system of equations for the node values $\tilde f(x_i, t)$ of our function, which we solve in a weighted least squares sense; that is, we transform both point and integral data into equations for the values of $\tilde f$ at the nodes $x_i$. Each given datum gives rise to one linear equation. Another set of equations models the smoothness of our function $\tilde f$.

Equations for the point data

For each given point datum $f_{i,k}$ with time stamp $t_k \leq t$ (future measurements are not considered to give information for the current time) we add the equation

$$\tilde f(x_i, t) = f_{i,k}$$

with weight $w_\lambda(t, t_k)$.

The weight is chosen such that it has its maximal value when $t_k = t$ and decays over time.

Equations for the volume data

To give an equation for a given volume datum $I_{T,k}$ we first have to approximate the integral by point values of the function $\tilde f$. We use the two-dimensional variant of the trapezoidal rule, that is, we approximate the integral of a function over a triangle $T$ by the volume of the trapezoidal body determined by the values of the function at the nodes of $T$, hence

$$\int_T \tilde f(x, t) \, \mathrm{d}x \approx \frac{|T|}{3} \left( \tilde f(x_{i_1(T)}, t) + \tilde f(x_{i_2(T)}, t) + \tilde f(x_{i_3(T)}, t) \right),$$

where $|T|$ denotes the area of the triangle.

Hence, we get the equation

$$\frac{|T|}{3} \left( \tilde f(x_{i_1(T)}, t) + \tilde f(x_{i_2(T)}, t) + \tilde f(x_{i_3(T)}, t) \right) = I_{T,k}$$

with weight $w_\lambda(t, t_k)$,

where the weight is chosen exactly as in the point value case.
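
A small sketch of this trapezoidal approximation: the integral over a triangle is taken to be the triangle's area times the mean of the three vertex values (2D coordinate pairs assumed).

```python
def triangle_area(p1, p2, p3):
    """Area of the triangle with vertices p1, p2, p3 (2D coordinate pairs)."""
    return 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p2[1] - p1[1]) * (p3[0] - p1[0]))

def trapezoidal_integral(p1, p2, p3, f1, f2, f3):
    """Approximate the integral over the triangle by its area times the
    mean of the function values f1, f2, f3 at the three nodes."""
    return triangle_area(p1, p2, p3) * (f1 + f2 + f3) / 3.0
```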

Smoothness equations

If we have only a small set of data points, our system will be underdetermined and the calculated function may have strong fluctuations. To prevent that, we add for each node $x_i$ the equation

$$\tilde f(x_i, t) = \frac{1}{|N(i)|} \sum_{j \in N(i)} \tilde f(x_j, t)$$

with a fixed smoothness weight, stating that we want the value at each node to be approximately the mean of its neighbouring values.

The resulting linear system is solved in a weighted least squares sense, giving us point values at the nodes, which we then interpolate by linear functions on each triangle.
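
A minimal sketch of the weighted least squares solve: minimising the weighted sum of squared residuals is equivalent to scaling each row of the system by the square root of its weight and solving an ordinary least squares problem.

```python
import numpy as np

def solve_weighted_lsq(A, b, weights):
    """Solve the overdetermined system A x = b in the weighted least squares
    sense by scaling each equation with sqrt(w_i) and using ordinary
    least squares."""
    w = np.sqrt(np.asarray(weights, dtype=float))
    x, *_ = np.linalg.lstsq(np.asarray(A, dtype=float) * w[:, None],
                            np.asarray(b, dtype=float) * w, rcond=None)
    return x
```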

Implementation

We represent the triangulation by a structure consisting of the nodes, given by their coordinates, and the triangles, given by the indices of their vertices. From that we compute the neighbours of each node and store them in the lattice structure as well. Given the data, we then assemble for each time a matrix and a right hand side collecting the equations.
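
A sketch of such a lattice structure and of the assembly of the three groups of equations, building on the hypothetical PointDatum and IntegralDatum records sketched earlier; the exponential weight and the default smoothness weight are illustrative assumptions, not taken from the text.

```python
import numpy as np

class Lattice:
    """Triangulation given by node coordinates and vertex-index triples."""

    def __init__(self, nodes, triangles):
        self.nodes = np.asarray(nodes, dtype=float)        # shape (n, 2)
        self.triangles = np.asarray(triangles, dtype=int)  # shape (m, 3)
        # For every node, collect the indices of its neighbouring nodes.
        self.neighbours = [set() for _ in range(len(self.nodes))]
        for i, j, k in self.triangles:
            self.neighbours[i].update((j, k))
            self.neighbours[j].update((i, k))
            self.neighbours[k].update((i, j))

    def area(self, t):
        """Area of triangle number t."""
        p1, p2, p3 = self.nodes[self.triangles[t]]
        return 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                         - (p2[1] - p1[1]) * (p3[0] - p1[0]))

def assemble_system(lattice, point_data, integral_data, t, lam, smooth_weight=1.0):
    """Collect one equation per point datum, per integral datum (trapezoidal
    rule) and per node (smoothness), together with the associated weights."""
    n = len(lattice.nodes)
    rows, rhs, weights = [], [], []

    for d in point_data:                       # point equations
        if d.timestamp > t:
            continue
        row = np.zeros(n)
        row[d.node] = 1.0
        rows.append(row)
        rhs.append(d.value)
        weights.append(np.exp(-lam * (t - d.timestamp)))

    for d in integral_data:                    # integral equations
        if d.timestamp > t:
            continue
        row = np.zeros(n)
        row[lattice.triangles[d.triangle]] = lattice.area(d.triangle) / 3.0
        rows.append(row)
        rhs.append(d.value)
        weights.append(np.exp(-lam * (t - d.timestamp)))

    for i in range(n):                         # smoothness equations
        row = np.zeros(n)
        row[i] = 1.0
        nb = list(lattice.neighbours[i])
        row[nb] = -1.0 / len(nb)
        rows.append(row)
        rhs.append(0.0)
        weights.append(smooth_weight)

    return np.array(rows), np.array(rhs), np.array(weights)
```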

The convex combination approach

As a second possibility to attack our problem, we propose an algorithm that handles point and integral data as two distinct interpolation problems and afterwards takes a convex combination of the two solutions to obtain a solution to the full problem.

The pointwise interpolation

At each given point, we first combine all data we have at this point into one value by taking a convex combination of the measured values, where each value is weighted with $w_\lambda(t, t_k)$ as in the first algorithm. That is, for a node $x_i$ in our lattice, we look at all values given for that point with $t_k \leq t$ and form the weighted average

$$\tilde f(x_i, t) = \frac{\sum_k w_\lambda(t, t_k) \, f_{i,k}}{\sum_k w_\lambda(t, t_k)}$$

of the values measured at this particular point.
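
A small sketch of this weighted averaging at a single node; the exponential weight is again only an assumed choice for $w_\lambda$.

```python
import numpy as np

def combine_point_values(values, timestamps, t, lam):
    """Convex combination of all measurements at one node: each value is
    weighted by exp(-lam * (t - t_k)) and the weights are normalised.
    Assumes at least one measurement with timestamp <= t exists."""
    values = np.asarray(values, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    mask = timestamps <= t                    # ignore future measurements
    w = np.exp(-lam * (t - timestamps[mask]))
    return np.sum(w * values[mask]) / np.sum(w)
```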

Interpolation of the integral values

In the same way as we interpolated the point values, we now interpolate the given integral values. Hence, we first assign to each triangle of our lattice a unique integral value by weighting the given ones. Afterwards, we interpolate these values in a linear sense, assigning the combined integral value divided by the area of the particular cell to the centre of gravity of each triangle. That is, we approximate the integral by means of the midpoint rule, i.e.

$$\int_T \tilde f(x, t) \, \mathrm{d}x \approx |T| \, \tilde f(s_T, t),$$

where $s_T$ denotes $T$'s centre of gravity.

The value that we want the integral to have is calculated exactly as in the point case, that is, we assign

$$I_T(t) = \frac{\sum_k w_\lambda(t, t_k) \, I_{T,k}}{\sum_k w_\lambda(t, t_k)}.$$
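
A sketch of the midpoint-rule step: the combined integral value divided by the triangle's area is assigned to the triangle's centre of gravity (2D coordinate pairs assumed).

```python
import numpy as np

def centroid_value(p1, p2, p3, integral_value):
    """Midpoint rule: if the combined integral of f over the triangle is I,
    the value assigned to the centre of gravity is I divided by the area."""
    p1, p2, p3 = map(np.asarray, (p1, p2, p3))
    centroid = (p1 + p2 + p3) / 3.0
    area = 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p2[1] - p1[1]) * (p3[0] - p1[0]))
    return centroid, integral_value / area
```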

Obtaining the solution to the full problem

Let $\tilde f_p$ denote the interpolating function obtained in step one and $\tilde f_I$ denote the interpolation of the integral values. We now let

$$\tilde f = \frac{n_p \tilde f_p + n_I \tilde f_I}{n_p + n_I},$$

where $n_p$ and $n_I$ are the numbers of given point and integral values, respectively.
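
A sketch of this final step under one natural reading of the weighting (weights proportional to the numbers of data points); f_point and f_integral stand for the two partial interpolants, assumed callable on coordinates.

```python
def combine_solutions(f_point, f_integral, n_point, n_integral):
    """Convex combination of the pointwise and the integral interpolant,
    weighted by the numbers of given point and integral values."""
    total = n_point + n_integral
    def f(x, y):
        return (n_point * f_point(x, y) + n_integral * f_integral(x, y)) / total
    return f
```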

Another idea that did not work out

A further possibility we tried is an algorithm that starts by interpolating the point data and then matches the integral data by a refinement of the lattice: the value at the centre of gravity of each cell is chosen so as to match the integral datum.

The pointwise interpolation

The point data are combined and interpolated exactly as in the convex combination approach above: at each node we take the convex combination of all measured values with time stamp $t_k \leq t$, weighted with $w_\lambda(t, t_k)$.

Interpolation of the integral values

The given integral values are also combined exactly as before: each triangle is assigned a unique integral value by weighting the given ones, and this value, divided by the triangle's area, is assigned to the triangle's centre of gravity, which corresponds to approximating the integral by the midpoint rule.

Full solution

We then choose a function that interpolates both the point and the integral values. This did not lead to a good result, because the fluctuation of the resulting function was so large that it could not be regarded as an approximation of a smooth function.

Examples

As an example we generate, for a simple triangulation of the region $\Omega$, 4000 random point data and 1000 random volume data over 10 seconds. As time step we choose a fixed $\Delta t$, that is, we interpolate our data every $\Delta t$ seconds. As random data, we start with a simple function that creates a diagonal shift of its graph in time. Furthermore, we add some normally distributed noise. The points and time values to interpolate are also generated randomly.
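
A sketch of how such test data could be generated; the model function, the unit-square region, the noise level and the random seed below are placeholders, since the concrete choices are not reproduced in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x, y, t):
    # Placeholder for the simple model function whose graph is shifted
    # diagonally in time; the concrete function used in the text may differ.
    return np.sin(2 * np.pi * (x - 0.1 * t)) * np.sin(2 * np.pi * (y - 0.1 * t))

# 4000 random point data over 10 seconds, with normally distributed noise.
pts = rng.random((4000, 2))          # random points in the unit square
times = rng.random(4000) * 10.0      # random timestamps in [0, 10)
point_values = model(pts[:, 0], pts[:, 1], times) + rng.normal(0.0, 0.05, 4000)
```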

Conclusion

Extending the ideas from [2], we allow for integral values to be given. Representing the data points as a weighted linear least squares system would allow us to use an iterative least squares solver and thereby reuse the calculations we have already done, for example the solver LSQR discussed in [5].
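
SciPy provides an implementation of the LSQR algorithm of [5]; a minimal sketch, where the small sparse matrix and right hand side stand in for the weighted system assembled above.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import lsqr

# Stand-in system; in practice A and b come from the weighted least
# squares assembly described above.
A = csr_matrix(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
b = np.array([1.0, 2.0, 3.1])

x = lsqr(A, b)[0]   # first entry of the returned tuple is the solution
print(x)
```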

Source code files

See also

References

  1. Franklin, Janet (1995). "Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients". Progress in Physical Geography: Earth and Environment 19 (4): 474–499. doi:10.1177/030913339501900403. http://journals.sagepub.com/doi/10.1177/030913339501900403.
  2. Li, Lixin; Revesz, Peter (2004). "Interpolation methods for spatio-temporal geographic data". Computers, Environment and Urban Systems 28 (3): 201–227. doi:10.1016/s0198-9715(03)00018-8. ISSN 0198-9715. https://doi.org/10.1016/S0198-9715(03)00018-8.
  3. Schouten, Leo J.; Straatman, Huub; Kiemeney, Lambertus A. L. M.; Gimbrère, Charles H. F.; Verbeek, André L. M. (1994). "The Capture-Recapture Method for Estimation of Cancer Registry Completeness: A Useful Tool?". International Journal of Epidemiology 23 (6): 1111–1116. doi:10.1093/ije/23.6.1111. ISSN 0300-5771. https://academic.oup.com/ije/article/23/6/1111/660321.
  4. Brenner, Hermann (1995). "Use and Limitations of the Capture-Recapture Method in Disease Monitoring with Two Dependent Sources". Epidemiology 6 (1): 42–48. doi:10.1097/00001648-199501000-00009. ISSN 1044-3983. https://insights.ovid.com/crossref?an=00001648-199501000-00009.
  5. Paige, Christopher C.; Saunders, Michael A. (1982). "LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares". ACM Transactions on Mathematical Software 8 (1): 43–71. doi:10.1145/355984.355989. ISSN 0098-3500. http://dl.acm.org/citation.cfm?id=355984.355989.