Local convex hull

Local convex hull (LoCoH) is a method for estimating the size of the home range of an animal or a group of animals (e.g. a pack of wolves, a pride of lions, or a herd of buffalo), and for constructing a utilization distribution. [1] [2] The latter is a probability distribution that represents the probability of finding an animal within a given area of its home range at any point in time; or, more generally, at the points in time for which the utilization distribution has been constructed. In particular, different utilization distributions can be constructed from data pertaining to particular periods of a diurnal or seasonal cycle.

Utilization distributions are constructed from data providing the location of an individual, or of several individuals, in space at different points in time by associating a local distribution function with each point and then summing and normalizing these local distribution functions to obtain a distribution function that pertains to the data as a whole. [3] [4] [5] [6] If the local distribution function is a parametric distribution, such as a symmetric bivariate normal distribution, then the method is referred to as a kernel method, though more correctly it should be designated a parametric kernel method. If, on the other hand, the local kernel element associated with each point is a local convex polygon constructed from the point and its k − 1 nearest neighbors, then the method is nonparametric and referred to as a k-LoCoH or fixed-point LoCoH method, in contrast to the r-LoCoH (fixed radius) and a-LoCoH (adaptive radius) methods.
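
As an illustration of the parametric kernel approach just described, the following minimal sketch places a symmetric bivariate normal kernel on each relocation and lets SciPy sum and normalize the kernels into a utilization distribution. The data and variable names are purely illustrative and not taken from the LoCoH literature:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical animal relocations (x, y), e.g. from a telemetry collar.
rng = np.random.default_rng(0)
fixes = rng.normal(loc=[0.0, 0.0], scale=[2.0, 1.0], size=(200, 2))

# Parametric kernel method: a Gaussian kernel is centred on every fix;
# gaussian_kde sums and normalizes them so the result integrates to one,
# i.e. it is a utilization distribution.
ud = gaussian_kde(fixes.T)

# Probability density of finding the animal at a given location.
print(ud([[0.0], [0.0]]))  # density at the centre of the range
```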

In the case of LoCoH utilization distribution constructions, the home range can be taken as the outer boundary of the distribution (i.e. the 100th percentile). In the case of utilization distributions constructed from unbounded kernel elements, such as bivariate normal distributions, the utilization distribution is itself unbounded. In this case, the most common convention is to regard the 95th percentile of the utilization distribution as the boundary of the home range.

To construct a k-LoCoH utilization distribution (a code sketch follows the list):

  1. Locate the k − 1 nearest neighbors for each point in the dataset.
  2. Construct a convex hull for each set of nearest neighbors and the original data point.
  3. Merge these hulls together from smallest to largest.
  4. Divide the merged hulls into isopleths where the 10% isopleth contains 10% of the original data points, the 100% isopleth contains all the points, etc.
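
A minimal sketch of these four steps in Python, assuming the SciPy and Shapely libraries; the function name, the isopleth levels, and the coverage bookkeeping are illustrative choices rather than a published implementation:

```python
import numpy as np
from scipy.spatial import cKDTree
from shapely.geometry import MultiPoint, Point
from shapely.ops import unary_union

def k_locoh_isopleths(xy, k, levels=(0.10, 0.50, 0.95, 1.00)):
    """Sketch of k-LoCoH: local hulls merged smallest-first into isopleths."""
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    # Steps 1-2: each point plus its k - 1 nearest neighbours gives a
    # local convex hull (cKDTree counts the point itself as a neighbour).
    _, nbrs = cKDTree(xy).query(xy, k=k)
    hulls = [MultiPoint(xy[i]).convex_hull for i in nbrs]
    # Step 3: merge the hulls from smallest area to largest.
    hulls.sort(key=lambda h: h.area)
    pts = [Point(p) for p in xy]
    # Step 4: grow the union until each isopleth level covers the
    # required fraction of the original data points.
    isopleths, merged = {}, None
    remaining = sorted(levels)
    for h in hulls:
        merged = h if merged is None else unary_union([merged, h])
        frac = sum(merged.covers(p) for p in pts) / n
        while remaining and frac >= remaining[0]:
            isopleths[remaining.pop(0)] = merged
    return isopleths
```

For real analyses a published implementation, such as the T-LoCoH package for R linked in the references below, would normally be preferred.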

In this sense, LoCoH methods are a generalization of the home range estimator method based on constructing the minimum convex polygon (MCP) associated with the data. The LoCoH method has a number of advantages over parametric kernel methods; in particular, because the estimate is built from hulls bounded by the data themselves, LoCoH handles sharp features in habitat, such as lakes and fences, much better than unbounded parametric kernel constructions.

LoCoH has a number of implementations, including a now-defunct LoCoH Web Application.

LoCoH was formerly known as k-NNCH, for k-nearest neighbor convex hulls. It has since been shown that a-LoCoH is the best of the three LoCoH methods mentioned above (see Getz et al. [2] in the references below).

T-LoCoH

T-LoCoH (time local convex hull) is an enhanced version of LoCoH that incorporates time into the home range construction. [7] [8] Time enters the algorithm through an alternative measure of 'distance', called time scaled distance (TSD), which combines the spatial and temporal separation between any two points. This presumes that each point has a time stamp associated with it, as with GPS data. T-LoCoH uses TSD rather than Euclidean distance to identify each point's nearest neighbors, resulting in hulls that are localized in both space and time. Hulls are then sorted and progressively unioned into isopleths. Like LoCoH, UDs created by T-LoCoH generally do a good job of modeling sharp edges in habitat, such as water bodies; in addition, T-LoCoH isopleths can delineate temporal partitions of space use. [7] T-LoCoH also offers additional sorting options for hulls, allowing it to generate isopleths that differentiate internal space by both intensity of use (the conventional UD) and a variety of behavioral proxies, including directionality and time-use metrics.

Time scaled distance

The TSD for any two locations i and j separated in time by $\Delta t_{ij}$ is given by

$$d_{ij} = \sqrt{\Delta x_{ij}^{2} + \Delta y_{ij}^{2} + \left( s\, v_{\max}\, \Delta t_{ij} \right)^{2}}$$

where $\Delta x_{ij}$ and $\Delta y_{ij}$ are the differences in the two spatial coordinates, $v_{\max}$ is the maximum observed speed of the individual, and $s$ is a unitless scaling factor.

Conceptually, TSD transforms the period of time between two observations into spatial units by estimating how far the individual could have traveled during that period had it been moving at its maximum observed speed. This theoretical movement distance is mapped onto a third axis of space, and the distance is then calculated using the standard Euclidean distance equation. The TSD equation also features a scaling parameter s which controls the degree to which the temporal difference scales to spatial units. When s = 0, the temporal term drops out and TSD is equivalent to Euclidean distance (thus T-LoCoH is backward compatible with LoCoH [8]). As s increases, the temporal distance becomes more and more influential, eventually swamping the spatial distance. The TSD metric is not based on a mechanistic or diffusion model of movement, but merely serves to generate hulls that are local in space and/or time. [7]
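
For concreteness, here is a small sketch of the TSD calculation under the formula above; the function name, units, and sample values are hypothetical:

```python
import numpy as np

def tsd(p1, t1, p2, t2, s, vmax):
    """Time scaled distance between two fixes (x, y) with timestamps.

    The time gap is converted into the maximum distance the animal
    could have covered (s * vmax * dt) and treated as a third spatial
    axis before taking the Euclidean norm."""
    dx, dy = np.subtract(p2, p1)
    dt = abs(t2 - t1)
    return np.sqrt(dx**2 + dy**2 + (s * vmax * dt)**2)

# With s = 0 the temporal term drops out and TSD reduces to ordinary
# Euclidean distance, so T-LoCoH reproduces plain LoCoH.
print(tsd((0, 0), 0, (3, 4), 600, s=0.0, vmax=0.05))   # 5.0
print(tsd((0, 0), 0, (3, 4), 600, s=0.05, vmax=0.05))  # > 5.0
```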

References

  1. Getz, W. M. and C. C. Wilmers. 2004. A local nearest-neighbor convex-hull construction of home ranges and utilization distributions. Ecography 27: 489–505.
  2. Getz, W. M., S. Fortmann-Roe, P. C. Cross, A. J. Lyons, S. J. Ryan, and C. C. Wilmers. 2007. LoCoH: nonparametric kernel methods for constructing home ranges and utilization distributions. PLoS ONE 2(2): e207. doi:10.1371/journal.pone.0000207.
  3. Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. London: Chapman and Hall. 176 pp.
  4. Worton, B. J. 1987. A review of models of home range for animal movement. Ecological Modelling 38: 277–298.
  5. Worton, B. J. 1989. Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70: 164–168.
  6. Seaman, D. E. and R. A. Powell. 1996. An evaluation of the accuracy of kernel density estimators for home range analysis. Ecology 77: 2075–2085.
  7. Lyons, A., Turner, W. C., and W. M. Getz. 2013. Home range plus: a space-time characterization of movement over real landscapes. Movement Ecology 1: 2. doi:10.1186/2051-3933-1-2.
  8. http://tlocoh.r-forge.r-project.org