Interpolation of sparse pointwise and integral geographical Data over Time




Martin Papke*

  • *Universität Koblenz-Landau, Germany

Abstract

Geographical data, for example distribution densities of species as discussed in [1], are often given both by pointwise and by integral values. We aim to give two interpolation algorithms which can handle both data types together. As these data tend to change over time, we extend our algorithms to allow a timestamp to be attached to each datum and to give newer data more weight in the interpolation. This allows us to model time dependent functions and data. We compare both approaches and discuss their advantages and disadvantages. The possibility of giving both pointwise and integral data extends the basic ideas from [2] by allowing for integral data points.

Keywords: Interpolation, Applied Mathematics, Numerical Mathematics

Introduction

Let $T$ be a time set. Assume we have collected data about an unknown mapping $f$ that changes over time; $f(\cdot, t)$ is the mapping at time $t$. Each collected datum carries a time stamp indicating the time $t_k$ at which it was collected. As time set $T$ we use an interval of time stamps, with $t_{\mathrm{now}}$ denoting the current time stamp; all $t > t_{\mathrm{now}}$ are time stamps of the future, and collected data always carry a time stamp $t_k \leq t_{\mathrm{now}}$.

For the following sections we use the time index as the last argument of the function and write $f(x, t)$ for the value of $f_t$ at $x$. If $f$ does not change over time we have $f(x, t) = f(x)$ for all $t$. If the domain of $f$ does not contain the time set $T$, the function is regarded as static in time.

We propose two algorithms to handle sparse pointwise and integral data as they arise in geographical problems. The first algorithm uses a least squares idea, the second one is based on convex combinations.

Setting and Notations

We consider a triangulation $\mathcal{T}$ of a region $\Omega$. We denote the set of nodes of $\mathcal{T}$ by $\mathcal{N} = \{x_1, \dots, x_n\}$. For each triangle $T \in \mathcal{T}$, we denote by $i_1(T), i_2(T), i_3(T)$ the indices of $T$'s nodes, that is, the nodes of $T$ are $x_{i_1(T)}$, $x_{i_2(T)}$ and $x_{i_3(T)}$. For each node $x_i$, we denote by $x_j$, $j \in N(i)$, the neighbours of $x_i$, that is, the nodes which are connected to $x_i$ by an edge of the triangulation.

We are given two types of data for an unknown function $f \colon \Omega \to \mathbb{R}$. Firstly, point values $f_i$, which are measured values of $f$ at a node $x_i$. Secondly, we are given integrals $I_T$ of $f$ over triangles $T \in \mathcal{T}$.

Our aim is to construct a function $\tilde f \colon \Omega \to \mathbb{R}$ which interpolates the given values in the sense that

$$\tilde f(x_i) \approx f_i, \qquad \int_T \tilde f \, \mathrm{d}x \approx I_T,$$

taking into account that our data are, for example, values of a density distribution of a species and hence carry only local information.
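
For concreteness, a minimal sketch of how such data records could be represented in Python; the field names and the use of dataclasses are our own choice and are not prescribed by the text.

```python
from dataclasses import dataclass

@dataclass
class PointDatum:
    node: int         # index i of the node x_i where the value was measured
    value: float      # measured value f_i
    timestamp: float  # time t_k at which the measurement was taken

@dataclass
class IntegralDatum:
    triangle: int     # index of the triangle T in the triangulation
    value: float      # measured integral I_T of f over T
    timestamp: float  # time t_k at which the measurement was taken
```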

Example: Single Point Data Collection

For example, a camera is placed at a single point in a habitat and automatically takes snapshots of all animals that pass this point. The aggregated data provide pointwise values of the density distribution of a species. Including the time index, the collected data provide pointwise values of the density distribution for every month.

Combining Different Data Collections

Two different data collections made at the same time for a given population density can indicate different population estimates. This can arise, for example, when using the capture-recapture method for estimating populations, as done in [3] and [4].

The least squares approach

The first algorithm we propose to tackle this kind of problem is a least squares approach, which allows for more than one data point at a fixed site of our lattice.

As geographical data tend to be time dependent, we want to add a time stamp to each data point and consider a time dependent function $f(x, t)$. We therefore replace the above equations by

$$\tilde f(x_i, t_k) \approx f_{i,k}, \qquad \int_T \tilde f(x, t_k) \, \mathrm{d}x \approx I_{T,k}.$$

At a given time $t$ we only consider values with a timestamp $t_k \leq t$, which allows live data to be considered. As the importance of measurements decays over time, we attach to each equation a weight, given for a data point with time stamp $t_k$ by a weighting function $w_\lambda(t, t_k)$ that is maximal for $t_k = t$ and decays as the measurement ages, where $\lambda$ denotes the speed of decay. $\lambda$ has to be chosen in a way that adapts to the given problem; a possible choice could be an estimate of the time derivative of the model function $f$, because strong fluctuations in time mean that the importance of past values decays more rapidly.
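
The text does not fix a concrete form for the weighting function; the exponential decay below is only one plausible choice satisfying the stated requirements (maximal when $t_k = t$, decaying with the age of the measurement, decay speed controlled by $\lambda$).

```python
import numpy as np

def weight(t, t_k, lam):
    """Weight of a measurement with timestamp t_k at evaluation time t.

    Equals 1 when t_k == t and decays as the measurement ages; lam > 0
    controls the speed of decay.  Future measurements (t_k > t) get
    weight 0, i.e. they are not used at all.
    """
    if t_k > t:
        return 0.0
    return np.exp(-lam * (t - t_k))
```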

Constructing a linear least squares system

For a given time $t$ we construct an overdetermined linear system of equations for the node values $\tilde f(x_i, t)$ of our function, which we solve in a weighted least squares sense; that is, we transform both point and integral data into equations for the values of $\tilde f$ at the nodes $x_i$. Each given datum gives rise to one linear equation. Another set of equations models the smoothness of our function $\tilde f$.

Equations for the point data

For each given point datum $f_{i,k}$ with time stamp $t_k \leq t$ (future measurements are not considered to give information for the current time) we add the equation

$$\tilde f(x_i, t) = f_{i,k}$$

with weight $w_\lambda(t, t_k)$.

The weight is chosen such that it has its maximal value when $t_k = t$ and decays over time.

Equations for the volume data

To give an equation for a given volume datum $I_{T,k}$ we first have to approximate the integral by point values of the function $\tilde f$. We use the two-dimensional variant of the trapezoidal rule, that is, we approximate the integral of a function over a triangle $T$ by the volume of the trapezoidal body determined by the values of the function at the nodes of $T$, hence

$$\int_T \tilde f(x, t) \, \mathrm{d}x \approx \frac{|T|}{3} \left( \tilde f(x_{i_1(T)}, t) + \tilde f(x_{i_2(T)}, t) + \tilde f(x_{i_3(T)}, t) \right),$$

where $|T|$ denotes the area of the triangle.

Hence, we get the equation

$$\frac{|T|}{3} \left( \tilde f(x_{i_1(T)}, t) + \tilde f(x_{i_2(T)}, t) + \tilde f(x_{i_3(T)}, t) \right) = I_{T,k}$$

with weight $w_\lambda(t, t_k)$,

where the weight is chosen exactly as in the point value case.
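
A small sketch of this trapezoidal approximation: the integral over a triangle is taken to be the triangle's area times the mean of the three vertex values (2D coordinate pairs assumed).

```python
def triangle_area(p1, p2, p3):
    """Area of the triangle with vertices p1, p2, p3 (2D coordinate pairs)."""
    return 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p2[1] - p1[1]) * (p3[0] - p1[0]))

def trapezoidal_integral(p1, p2, p3, f1, f2, f3):
    """Approximate the integral over the triangle by its area times the
    mean of the function values f1, f2, f3 at the three nodes."""
    return triangle_area(p1, p2, p3) * (f1 + f2 + f3) / 3.0
```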

Smoothness equations

If we have only a small set of data points, our system will be underdetermined and the calculated function may have strong fluctuations. To prevent that, we add for each node $x_i$ the equation

$$\tilde f(x_i, t) = \frac{1}{|N(i)|} \sum_{j \in N(i)} \tilde f(x_j, t)$$

with a fixed smoothness weight, stating that we want the value at each node to be approximately the mean of its neighbouring values.

The resulting linear system is solved in a weighted least squares sense, giving us point values at the nodes, which we then interpolate by linear functions on each triangle.
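
A minimal sketch of the weighted least squares solve: minimising the weighted sum of squared residuals is equivalent to scaling each row of the system by the square root of its weight and solving an ordinary least squares problem.

```python
import numpy as np

def solve_weighted_lsq(A, b, weights):
    """Solve the overdetermined system A x = b in the weighted least squares
    sense by scaling each equation with sqrt(w_i) and using ordinary
    least squares."""
    w = np.sqrt(np.asarray(weights, dtype=float))
    x, *_ = np.linalg.lstsq(np.asarray(A, dtype=float) * w[:, None],
                            np.asarray(b, dtype=float) * w, rcond=None)
    return x
```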

Implementation

We represent the triangulation by a structure consisting of the nodes, given by their coordinates, and the triangles, given by the indices of their vertices. From that we compute the neighbours of each node and store them in the lattice structure as well. Given the data, we then assemble for each time a matrix and a right hand side collecting the equations.
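
A sketch of such a lattice structure and of the assembly of the three groups of equations, building on the hypothetical PointDatum and IntegralDatum records sketched earlier; the exponential weight and the default smoothness weight are illustrative assumptions, not taken from the text.

```python
import numpy as np

class Lattice:
    """Triangulation given by node coordinates and vertex-index triples."""

    def __init__(self, nodes, triangles):
        self.nodes = np.asarray(nodes, dtype=float)        # shape (n, 2)
        self.triangles = np.asarray(triangles, dtype=int)  # shape (m, 3)
        # For every node, collect the indices of its neighbouring nodes.
        self.neighbours = [set() for _ in range(len(self.nodes))]
        for i, j, k in self.triangles:
            self.neighbours[i].update((j, k))
            self.neighbours[j].update((i, k))
            self.neighbours[k].update((i, j))

    def area(self, t):
        """Area of triangle number t."""
        p1, p2, p3 = self.nodes[self.triangles[t]]
        return 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                         - (p2[1] - p1[1]) * (p3[0] - p1[0]))

def assemble_system(lattice, point_data, integral_data, t, lam, smooth_weight=1.0):
    """Collect one equation per point datum, per integral datum (trapezoidal
    rule) and per node (smoothness), together with the associated weights."""
    n = len(lattice.nodes)
    rows, rhs, weights = [], [], []

    for d in point_data:                       # point equations
        if d.timestamp > t:
            continue
        row = np.zeros(n)
        row[d.node] = 1.0
        rows.append(row)
        rhs.append(d.value)
        weights.append(np.exp(-lam * (t - d.timestamp)))

    for d in integral_data:                    # integral equations
        if d.timestamp > t:
            continue
        row = np.zeros(n)
        row[lattice.triangles[d.triangle]] = lattice.area(d.triangle) / 3.0
        rows.append(row)
        rhs.append(d.value)
        weights.append(np.exp(-lam * (t - d.timestamp)))

    for i in range(n):                         # smoothness equations
        row = np.zeros(n)
        row[i] = 1.0
        nb = list(lattice.neighbours[i])
        row[nb] = -1.0 / len(nb)
        rows.append(row)
        rhs.append(0.0)
        weights.append(smooth_weight)

    return np.array(rows), np.array(rhs), np.array(weights)
```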

The convex combination approach

As a second possibility to attack our problem, we propose an algorithm that handles point and integral data as two distinct interpolation problems and afterwards takes a convex combination of the two solutions to obtain a solution to the full problem.

The pointwise interpolation

At each given point, we first combine all data we have at this point into one value by taking a convex combination of the measured values, where each value is weighted with $w_\lambda(t, t_k)$ as in the first algorithm. That is, for a node $x_i$ in our lattice, we look at all values given for that point with $t_k \leq t$ and form the weighted average

$$\tilde f(x_i, t) = \frac{\sum_k w_\lambda(t, t_k) \, f_{i,k}}{\sum_k w_\lambda(t, t_k)}$$

of the values measured at this particular point.
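
A small sketch of this weighted averaging at a single node; the exponential weight is again only an assumed choice for $w_\lambda$.

```python
import numpy as np

def combine_point_values(values, timestamps, t, lam):
    """Convex combination of all measurements at one node: each value is
    weighted by exp(-lam * (t - t_k)) and the weights are normalised.
    Assumes at least one measurement with timestamp <= t exists."""
    values = np.asarray(values, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    mask = timestamps <= t                    # ignore future measurements
    w = np.exp(-lam * (t - timestamps[mask]))
    return np.sum(w * values[mask]) / np.sum(w)
```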

Interpolation of the integral values

In the same way as we interpolated the point values, we now interpolate the given integral values. Hence, we first assign to each triangle of our lattice a unique integral value by weighting the given ones. Afterwards, we interpolate these values in a linear sense, assigning the combined integral value divided by the area of the particular cell to the centre of gravity of each triangle. That is, we approximate the integral by means of the midpoint rule, i.e.

$$\int_T \tilde f(x, t) \, \mathrm{d}x \approx |T| \, \tilde f(s_T, t),$$

where $s_T$ denotes $T$'s centre of gravity.

The value that we want the integral to have is calculated exactly as in the point case, that is, we assign

$$I_T(t) = \frac{\sum_k w_\lambda(t, t_k) \, I_{T,k}}{\sum_k w_\lambda(t, t_k)}.$$
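
A sketch of the midpoint-rule step: the combined integral value divided by the triangle's area is assigned to the triangle's centre of gravity (2D coordinate pairs assumed).

```python
import numpy as np

def centroid_value(p1, p2, p3, integral_value):
    """Midpoint rule: if the combined integral of f over the triangle is I,
    the value assigned to the centre of gravity is I divided by the area."""
    p1, p2, p3 = map(np.asarray, (p1, p2, p3))
    centroid = (p1 + p2 + p3) / 3.0
    area = 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p2[1] - p1[1]) * (p3[0] - p1[0]))
    return centroid, integral_value / area
```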

Obtaining the solution to the full problem

Let $\tilde f_p$ denote the interpolating function obtained in step one and $\tilde f_I$ denote the interpolation of the integral values. We now let

$$\tilde f = \frac{n_p \tilde f_p + n_I \tilde f_I}{n_p + n_I},$$

where $n_p$ and $n_I$ are the numbers of given point and integral values, respectively.
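
A sketch of this final step under one natural reading of the weighting (weights proportional to the numbers of data points); f_point and f_integral stand for the two partial interpolants, assumed callable on coordinates.

```python
def combine_solutions(f_point, f_integral, n_point, n_integral):
    """Convex combination of the pointwise and the integral interpolant,
    weighted by the numbers of given point and integral values."""
    total = n_point + n_integral
    def f(x, y):
        return (n_point * f_point(x, y) + n_integral * f_integral(x, y)) / total
    return f
```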

Another idea that did not work out

A further possibility we tried is an algorithm that starts by interpolating the point data and then matches the integral data by a refinement of the lattice: the value at the centre of gravity of each cell is chosen so as to match the integral datum.

The pointwise interpolation

The point data are combined and interpolated exactly as in the convex combination approach above: at each node we take the convex combination of all measured values with time stamp $t_k \leq t$, weighted with $w_\lambda(t, t_k)$.

Interpolation of the integral values

The given integral values are also combined exactly as before: each triangle is assigned a unique integral value by weighting the given ones, and this value, divided by the triangle's area, is assigned to the triangle's centre of gravity, which corresponds to approximating the integral by the midpoint rule.

Full solution

We then choose a function that interpolates both the point and the integral values. This did not lead to a good result, because the fluctuation of the resulting function was so large that it could not be regarded as an approximation of a smooth function.

Examples

As an example we generate, for a simple triangulation of the region $\Omega$, 4000 random point data and 1000 random volume data over 10 seconds. As time step we choose a fixed $\Delta t$, that is, we interpolate our data every $\Delta t$ seconds. As random data, we start with a simple function that creates a diagonal shift of its graph in time. Furthermore, we add some normally distributed noise. The points and time values to interpolate are also generated randomly.
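
A sketch of how such test data could be generated; the model function, the unit-square region, the noise level and the random seed below are placeholders, since the concrete choices are not reproduced in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x, y, t):
    # Placeholder for the simple model function whose graph is shifted
    # diagonally in time; the concrete function used in the text may differ.
    return np.sin(2 * np.pi * (x - 0.1 * t)) * np.sin(2 * np.pi * (y - 0.1 * t))

# 4000 random point data over 10 seconds, with normally distributed noise.
pts = rng.random((4000, 2))          # random points in the unit square
times = rng.random(4000) * 10.0      # random timestamps in [0, 10)
point_values = model(pts[:, 0], pts[:, 1], times) + rng.normal(0.0, 0.05, 4000)
```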

Conclusion

Extending the ideas from [2], we allow for integral values to be given. Representing the data points as a weighted linear least squares system would allow us to use an iterative least squares solver and thereby reuse the calculations we have already done, for example the solver LSQR discussed in [5].
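
SciPy provides an implementation of the LSQR algorithm of [5]; a minimal sketch, where the small sparse matrix and right hand side stand in for the weighted system assembled above.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import lsqr

# Stand-in system; in practice A and b come from the weighted least
# squares assembly described above.
A = csr_matrix(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
b = np.array([1.0, 2.0, 3.1])

x = lsqr(A, b)[0]   # first entry of the returned tuple is the solution
print(x)
```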

Source code files

See also

References

  1. Franklin, Janet (1995). "Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients". Progress in Physical Geography: Earth and Environment 19 (4): 474–499. doi:10.1177/030913339501900403. http://journals.sagepub.com/doi/10.1177/030913339501900403.
  2. Li, Lixin; Revesz, Peter (2004). "Interpolation methods for spatio-temporal geographic data". Computers, Environment and Urban Systems 28 (3): 201–227. doi:10.1016/s0198-9715(03)00018-8. ISSN 0198-9715. https://doi.org/10.1016/S0198-9715(03)00018-8.
  3. Schouten, Leo J.; Straatman, Huub; Kiemeney, Lambertus A. L. M.; Gimbrère, Charles H. F.; Verbeek, André L. M. (1994). "The Capture-Recapture Method for Estimation of Cancer Registry Completeness: A Useful Tool?". International Journal of Epidemiology 23 (6): 1111–1116. doi:10.1093/ije/23.6.1111. ISSN 0300-5771. https://academic.oup.com/ije/article/23/6/1111/660321.
  4. Brenner, Hermann (1995). "Use and Limitations of the Capture-Recapture Method in Disease Monitoring with Two Dependent Sources". Epidemiology 6 (1): 42–48. doi:10.1097/00001648-199501000-00009. ISSN 1044-3983. https://insights.ovid.com/crossref?an=00001648-199501000-00009.
  5. Paige, Christopher C.; Saunders, Michael A. (1982). "LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares". ACM Transactions on Mathematical Software 8 (1): 43–71. doi:10.1145/355984.355989. ISSN 0098-3500. http://dl.acm.org/citation.cfm?id=355984.355989.