Simplicial depth

Last updated
Simplicial depth with respect to the six red sample points, using the modified definition of Burr et al. The large black numbers are the depths within each region, and the small blue numbers are the depths along the blue line segments. Simplicial depth.svg
Simplicial depth with respect to the six red sample points, using the modified definition of Burr et al. The large black numbers are the depths within each region, and the small blue numbers are the depths along the blue line segments.

In robust statistics and computational geometry, simplicial depth is a measure of central tendency determined by the simplices that contain a given point. For the Euclidean plane, it counts the number of triangles of sample points that contain a given point.

Contents

Definition

The simplicial depth of a point in -dimensional Euclidean space, with respect to a set of sample points in that space, is the number of -dimensional simplices (the convex hulls of sets of sample points) that contain . The same notion can be generalized to any probability distribution on points of the plane, not just the empirical distribution given by a set of sample points, by defining the depth to be the probability that a randomly chosen -tuple of points has a convex hull that contains . This probability can be calculated, from the number of simplices that contain , by dividing by where is the number of sample points. [L88] [L90]

Under the standard definition of simplicial depth, the simplices that have on their boundaries count equally much as the simplices with in their interiors. In order to avoid some problematic behavior of this definition, Burr, Rafalin & Souvaine (2004) proposed a modified definition of simplicial depth, in which the simplices with on their boundaries count only half as much. Equivalently, their definition is the average of the number of open simplices and the number of closed simplices that contain . [BRS]

Properties

Simplicial depth is robust against outliers: if a set of sample points is represented by the point of maximum depth, then up to a constant fraction of the sample points can be arbitrarily corrupted without significantly changing the location of the representative point. It is also invariant under affine transformations of the plane. [D] [ZS] [BRS]

However, simplicial depth fails to have some other desirable properties for robust measures of central tendency. When applied to centrally symmetric distributions, it is not necessarily the case that there is a unique point of maximum depth in the center of the distribution. And, along a ray from the point of maximum depth, it is not necessarily the case that the simplicial depth decreases monotonically. [ZS] [BRS]

Algorithms

For sets of sample points in the Euclidean plane (), the simplicial depth of any other point can be computed in time , [KM] [GSW] [RR] optimal in some models of computation. [ACG] In three dimensions, the same problem can be solved in time . [CO]

It possible to construct a data structure using ε-nets that can approximate the simplicial depth of a query point (given either a fixed set of samples, or a set of samples undergoing point insertions) in near-constant time per query, in any dimension, with an approximation whose error is a small fraction of the total number of triangles determined by the samples. [BCE] In two dimensions, a more accurate approximation algorithm is known, for which the approximation error is a small multiple of the simplicial depth itself. The same methods also lead to fast approximation algorithms in higher dimensions. [ASS]

Spherical depth, is defined to be the probability that a point is contained inside a random closed hyperball obtained from a pair of points from . While the time complexity of most other data depths grows exponentially, the spherical depth grows only linearly in the dimension – the straightforward algorithm for computing the spherical depth takes . Simplicial depth (SD) is linearly bounded by spherical depth (). [BS]

Related Research Articles

<span class="mw-page-title-main">Delaunay triangulation</span> Triangulation method

In mathematics and computational geometry, a Delaunay triangulation for a given set P of discrete points in a general position is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P). Delaunay triangulations maximize the minimum of all the angles of the triangles in the triangulation; they tend to avoid sliver triangles. The triangulation is named after Boris Delaunay for his work on this topic from 1934.

<span class="mw-page-title-main">Simplex</span> Multi-dimensional generalization of triangle

In geometry, a simplex is a generalization of the notion of a triangle or tetrahedron to arbitrary dimensions. The simplex is so-named because it represents the simplest possible polytope in any given dimension. For example,

<span class="mw-page-title-main">Convex hull</span> Smallest convex set containing a given set

In geometry, the convex hull or convex envelope or convex closure of a shape is the smallest convex set that contains it. The convex hull may be defined either as the intersection of all convex sets containing a given subset of a Euclidean space, or equivalently as the set of all convex combinations of points in the subset. For a bounded subset of the plane, the convex hull may be visualized as the shape enclosed by a rubber band stretched around the subset.

<span class="mw-page-title-main">Simplicial complex</span>

In mathematics, a simplicial complex is a set composed of points, line segments, triangles, and their n-dimensional counterparts. Simplicial complexes should not be confused with the more abstract notion of a simplicial set appearing in modern simplicial homotopy theory. The purely combinatorial counterpart to a simplicial complex is an abstract simplicial complex. To distinguish a simplicial from an abstract simplicial complex, the former is often called a geometric simplicial complex.

In mathematics, a building is a combinatorial and geometric structure which simultaneously generalizes certain aspects of flag manifolds, finite projective planes, and Riemannian symmetric spaces. Buildings were initially introduced by Jacques Tits as a means to understand the structure of exceptional groups of Lie type. The more specialized theory of Bruhat–Tits buildings plays a role in the study of p-adic Lie groups analogous to that of the theory of symmetric spaces in the theory of Lie groups.

In the mathematical disciplines of topology and geometry, an orbifold is a generalization of a manifold. Roughly speaking, an orbifold is a topological space which is locally a finite group quotient of a Euclidean space.

<span class="mw-page-title-main">Barycentric subdivision</span>

In mathematics, the barycentric subdivision is a standard way to subdivide a given simplex into smaller ones. Its extension on simplicial complexes is a canonical method to refine them. Therefore, the barycentric subdivision is an important tool in algebraic topology.

<span class="mw-page-title-main">Arrangement of lines</span> Subdivision of the plane by lines

In geometry, an arrangement of lines is the subdivision of the plane formed by a collection of lines. Bounds on the complexity of arrangements have been studied in discrete geometry, and computational geometers have found algorithms for the efficient construction of arrangements.

<span class="mw-page-title-main">Euclidean minimum spanning tree</span> Shortest network connecting points

A Euclidean minimum spanning tree of a finite set of points in the Euclidean plane or higher-dimensional Euclidean space connects the points by a system of line segments with the points as endpoints, minimizing the total length of the segments. In it, any two points can reach each other along a path through the line segments. It can be found as the minimum spanning tree of a complete graph with the points as vertices and the Euclidean distances between points as edge weights.

In geometry, a triangulation is a subdivision of a planar object into triangles, and by extension the subdivision of a higher-dimension geometric object into simplices. Triangulations of a three-dimensional volume would involve subdividing it into tetrahedra packed together.

<span class="mw-page-title-main">Triangulation (topology)</span>

In mathematics, triangulation describes the replacement of topological spaces by piecewise linear spaces, i.e. the choice of a homeomorphism in a suitable simplicial complex. Spaces being homeomorphic to a simplicial complex are called triangulable. Triangulation has various uses in different branches of mathematics, for instance in algebraic topology, in complex analysis or in modeling.

<span class="mw-page-title-main">Convex polytope</span> Convex hull of a finite set of points in a Euclidean space

A convex polytope is a special case of a polytope, having the additional property that it is also a convex set contained in the -dimensional Euclidean space . Most texts use the term "polytope" for a bounded convex polytope, and the word "polyhedron" for the more general, possibly unbounded object. Others allow polytopes to be unbounded. The terms "bounded/unbounded convex polytope" will be used below whenever the boundedness is critical to the discussed issue. Yet other texts identify a convex polytope with its boundary.

Medoids are representative objects of a data set or a cluster within a data set whose sum of dissimilarities to all the objects in the cluster is minimal. Medoids are similar in concept to means or centroids, but medoids are always restricted to be members of the data set. Medoids are most commonly used on data when a mean or centroid cannot be defined, such as graphs. They are also used in contexts where the centroid is not representative of the dataset like in images and 3-D trajectories and gene expression. These are also of interest while wanting to find a representative using some distance other than squared euclidean distance.

<span class="mw-page-title-main">Lloyd's algorithm</span>

In electrical engineering and computer science, Lloyd's algorithm, also known as Voronoi iteration or relaxation, is an algorithm named after Stuart P. Lloyd for finding evenly spaced sets of points in subsets of Euclidean spaces and partitions of these subsets into well-shaped and uniformly sized convex cells. Like the closely related k-means clustering algorithm, it repeatedly finds the centroid of each set in the partition and then re-partitions the input according to which of these centroids is closest. In this setting, the mean operation is an integral over a region of space, and the nearest centroid operation results in Voronoi diagrams.

<span class="mw-page-title-main">Closest pair of points problem</span>

The closest pair of points problem or closest pair problem is a problem of computational geometry: given points in metric space, find a pair of points with the smallest distance between them. The closest pair problem for points in the Euclidean plane was among the first geometric problems that were treated at the origins of the systematic study of the computational complexity of geometric algorithms.

In statistics and computational geometry, the notion of centerpoint is a generalization of the median to data in higher-dimensional Euclidean space. Given a set of points in d-dimensional space, a centerpoint of the set is a point such that any hyperplane that goes through that point divides the set of points in two roughly equal subsets: the smaller part should have at least a 1/(d + 1) fraction of the points. Like the median, a centerpoint need not be one of the data points. Every non-empty set of points has at least one centerpoint.

An ε-net in computational geometry is the approximation of a general set by a collection of simpler subsets. In probability theory it is the approximation of one probability distribution by another.

In geometry, the moment curve is an algebraic curve in d-dimensional Euclidean space given by the set of points with Cartesian coordinates of the form

In mathematics, an abstract cell complex is an abstract set with Alexandrov topology in which a non-negative integer number called dimension is assigned to each point. The complex is called “abstract” since its points, which are called “cells”, are not subsets of a Hausdorff space as is the case in Euclidean and CW complexes. Abstract cell complexes play an important role in image analysis and computer graphics.

In geometry, it is an unsolved conjecture of Hugo Hadwiger that every simplex can be dissected into orthoschemes, using a number of orthoschemes bounded by a function of the dimension of the simplex. If true, then more generally every convex polytope could be dissected into orthoschemes.

References

ASS.
Afshani, Peyman; Sheehy, Donald R.; Stein, Yannik (2015), Approximating the simplicial depth, arXiv: 1512.04856 , Bibcode:2015arXiv151204856A
ACG.
Aloupis, Greg; Cortés, Carmen; Gómez, Francisco; Soss, Michael; Toussaint, Godfried (2002), "Lower bounds for computing statistical depth", Computational Statistics & Data Analysis , 40 (2): 223–229, doi:10.1016/S0167-9473(02)00032-4, MR   1924007, S2CID   94060939
BCE.
Bagchi, Amitabha; Chaudhary, Amitabh; Eppstein, David; Goodrich, Michael T. (2007), "Deterministic sampling and range counting in geometric data streams", ACM Transactions on Algorithms , 3 (2): Art. 16, 18, arXiv: cs/0307027 , doi:10.1145/1240233.1240239, MR   2335299, S2CID   123315817
BRS.
Burr, Michael A.; Rafalin, Eynat; Souvaine, Diane L. (2004), "Simplicial depth: An improved definition, analysis, and efficiency for the finite sample case" (PDF), Proceedings of the 16th Canadian Conference on Computational Geometry, CCCG'04, Concordia University, Montréal, Québec, Canada, August 9-11, 2004, pp. 136–139
BS.
Bremner, David; Shahsavarifar, Rasoul (2017), An Optimal Algorithm for Computing the Spherical Depth of Points in the Plane, arXiv: 1702.07399 , Bibcode:2017arXiv170207399B
CO.
Cheng, Andrew Y.; Ouyang, Ming (2001), "On algorithms for simplicial depth", Proceedings of the 13th Canadian Conference on Computational Geometry, University of Waterloo, Ontario, Canada, August 13-15, 2001, pp. 53–56
D.
Dümbgen, Lutz (1992), "Limit theorems for the simplicial depth", Statistics & Probability Letters, 14 (2): 119–128, doi:10.1016/0167-7152(92)90075-G, MR   1173409
GSW.
Gil, Joseph; Steiger, William; Wigderson, Avi (1992), "Geometric medians", Discrete Mathematics , 108 (1–3): 37–51, doi: 10.1016/0012-365X(92)90658-3 , MR   1189827
KM.
Khuller, Samir; Mitchell, Joseph S. B. (1990), "On a triangle counting problem", Information Processing Letters , 33 (6): 319–321, doi: 10.1016/0020-0190(90)90217-L , MR   1045522
L88.
L90.
Liu, Regina Y. (1990), "On a notion of data depth based on random simplices", Annals of Statistics , 18 (1): 405–414, doi: 10.1214/aos/1176347507 , MR   1041400
RR.
Rousseeuw, Peter J.; Ruts, Ida (1996), "Algorithm AS 307: Bivariate Location Depth", Applied Statistics, 45 (4): 516, doi:10.2307/2986073, JSTOR   2986073
ZS.
Zuo, Yijun; Serfling, Robert (2000), "General notions of statistical depth function", Annals of Statistics , 28 (2): 461–482, CiteSeerX   10.1.1.27.7358 , doi:10.1214/aos/1016218226, MR   1790005