Persistent homology

See homology for an introduction to the notation.

Persistent homology is a method for computing topological features of a space at different spatial resolutions. More persistent features are detected over a wide range of spatial scales and are deemed more likely to represent true features of the underlying space rather than artifacts of sampling, noise, or particular choice of parameters. [1]

To find the persistent homology of a space, the space must first be represented as a simplicial complex. A distance function on the underlying space corresponds to a filtration of the simplicial complex, that is, a nested sequence of increasing subcomplexes. One common method is to take the sublevel filtration of the distance to a point cloud (equivalently, the offset filtration on the point cloud) and then take its nerve, which yields the simplicial filtration known as the Čech filtration. [2] A similar construction uses a nested sequence of Vietoris–Rips complexes, known as the Vietoris–Rips filtration. [3]
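The Vietoris–Rips construction can be sketched in a few lines. The example below is a minimal illustration, not a reference implementation: the function name `rips_filtration` and the `max_dim` parameter are hypothetical, and it assumes the convention that a simplex enters the filtration at its diameter (some texts use half that value). It enumerates every simplex up to `max_dim`, so it is only practical for very small point clouds.

```python
from itertools import combinations
from math import dist


def rips_filtration(points, max_dim=2):
    """Return (value, simplex) pairs in a valid filtration order.

    Assumed convention: a simplex appears at its diameter, i.e. the
    largest pairwise distance among its vertices.
    """
    simplices = []
    for k in range(1, max_dim + 2):              # k = number of vertices
        for simplex in combinations(range(len(points)), k):
            diameter = max(
                (dist(points[i], points[j]) for i, j in combinations(simplex, 2)),
                default=0.0,                     # single vertices appear at scale 0
            )
            simplices.append((diameter, simplex))
    # Sort by value, then by dimension, so every face precedes its cofaces.
    simplices.sort(key=lambda s: (s[0], len(s[1]), s[1]))
    return simplices


# Four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
filtration = rips_filtration(square)
```

Sorting ties by dimension is one simple way to guarantee that each prefix of the list is a simplicial complex, which is what makes the list a filtration.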

Definition

Formally, consider a real-valued function $f\colon K \to \mathbb{R}$ on a simplicial complex $K$ that is non-decreasing on increasing sequences of faces, so $f(\sigma) \leq f(\tau)$ whenever $\sigma$ is a face of $\tau$ in $K$. Then for every $a \in \mathbb{R}$ the sublevel set $K_a = f^{-1}((-\infty, a])$ is a subcomplex of $K$, and the ordering of the values of $f$ on the simplices in $K$ (which is in practice always finite) induces an ordering on the sublevel complexes that defines a filtration

$$\emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_n = K.$$

When $0 \leq i \leq j \leq n$, the inclusion $K_i \hookrightarrow K_j$ induces a homomorphism $f_p^{i,j}\colon H_p(K_i) \to H_p(K_j)$ on the simplicial homology groups for each dimension $p$. The $p$-th persistent homology groups are the images of these homomorphisms, and the $p$-th persistent Betti numbers $\beta_p^{i,j}$ are the ranks of those groups. [4] Persistent Betti numbers for $p = 0$ coincide with the size function, a predecessor of persistent homology. [5]
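In dimension 0 the persistent Betti number has a direct combinatorial reading: the rank of $H_0(K_a) \to H_0(K_b)$ is the number of connected components of $K_b$ that contain at least one vertex of $K_a$. The sketch below computes it with a union-find structure on a filtered graph. The function name and the dictionary-based input format are assumptions made for illustration, and the code assumes the function is monotone (each edge value is at least the values of its endpoints).

```python
def persistent_betti_0(vertex_values, edge_values, a, b):
    """Rank of H_0(K_a) -> H_0(K_b) for sublevel complexes of a filtered graph.

    vertex_values: dict vertex -> f(vertex)
    edge_values:   dict (u, v) -> f(edge), assumed >= f(u) and f(v)
    Requires a <= b.
    """
    # Union-find over the vertices of K_b.
    parent = {v: v for v, value in vertex_values.items() if value <= b}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path halving
            v = parent[v]
        return v

    for (u, v), value in edge_values.items():
        if value <= b:                       # edge belongs to K_b
            parent[find(u)] = find(v)

    # Count the components of K_b that meet the vertex set of K_a.
    return len({find(v) for v, value in vertex_values.items() if value <= a})


# Hypothetical example: three vertices joined by two edges.
vertex_values = {"u": 0, "v": 0, "w": 1}
edge_values = {("u", "w"): 2, ("v", "w"): 3}
print(persistent_betti_0(vertex_values, edge_values, 0, 3))
```

In this example the two components present at level 0 have merged by level 3, so the call at the end prints 1.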

Any filtered complex over a field $F$ can be brought by a linear transformation preserving the filtration to so-called canonical form, a canonically defined direct sum of filtered complexes of two types: one-dimensional complexes with trivial differential $d(e_{t_i}) = 0$ and two-dimensional complexes with trivial homology $d(e_{s_j + r_j}) = e_{r_j}$. [6]

A persistence module over a partially ordered set $P$ is a set of vector spaces $U_t$ indexed by $P$, with a linear map $u_t^s\colon U_s \to U_t$ whenever $s \leq t$, with $u_t^t$ equal to the identity and $u_t^s \circ u_s^r = u_t^r$ for $r \leq s \leq t$. Equivalently, we may consider it as a functor from $P$ considered as a category to the category of vector spaces (or $R$-modules). There is a classification of persistence modules over a field $F$ indexed by $\mathbb{N}$:

$$U \simeq \bigoplus_i x^{t_i} \cdot F[x] \oplus \left( \bigoplus_j x^{r_j} \cdot \bigl( F[x] / (x^{s_j} \cdot F[x]) \bigr) \right).$$

Multiplication by $x$ corresponds to moving forward one step in the persistence module. Intuitively, the free parts on the right side correspond to the homology generators that appear at filtration level $t_i$ and never disappear, while the torsion parts correspond to those that appear at filtration level $r_j$ and last for $s_j$ steps of the filtration (or equivalently, disappear at filtration level $s_j + r_j$). [7] [6]
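As a small worked illustration of this classification, consider a hypothetical three-step filtration (chosen here for illustration, not taken from the cited sources) in which a point $a$ appears at level 0, a second point $b$ at level 1, and the edge $ab$ at level 2. Its degree-0 persistence module has one free and one torsion summand:

```latex
% Hypothetical filtration: K_0 = \{a\}, K_1 = \{a, b\}, K_2 = \{a, b, ab\},
% and K_k = K_2 for k > 2.
% Degree-0 homology: H_0(K_0) \cong F, H_0(K_1) \cong F^2, H_0(K_k) \cong F for k \geq 2.
\[
  U \;\simeq\; x^{0} \cdot F[x] \;\oplus\; x^{1} \cdot \bigl( F[x] / (x^{1} \cdot F[x]) \bigr)
\]
% Free summand:    t_1 = 0, the component of a, born at level 0, never disappears.
% Torsion summand: r_1 = 1, s_1 = 1, the component of b, born at level 1,
%                  merged into the component of a at level r_1 + s_1 = 2.
```

Counting dimensions degree by degree recovers the homology above: the free summand contributes one dimension at every level, and the torsion summand contributes one extra dimension at level 1 only.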

Each of these two theorems allows us to uniquely represent the persistent homology of a filtered simplicial complex with a persistence barcode or persistence diagram. A barcode represents each persistent generator with a horizontal line beginning at the first filtration level where it appears and ending at the filtration level where it disappears, while a persistence diagram plots a point for each generator, with its x-coordinate the birth time and its y-coordinate the death time. Equivalently, the same data is represented by Barannikov's canonical form, [6] where each generator is represented by a segment connecting the birth and the death values plotted on separate lines for each $p$.

Stability

Persistent homology is stable in a precise sense, which provides robustness against noise. The bottleneck distance is a natural metric on the space of persistence diagrams given by

$$W_\infty(X, Y) := \inf_{\varphi\colon X \to Y} \sup_{x \in X} \lVert x - \varphi(x) \rVert_\infty,$$

where $\varphi$ ranges over bijections. A small perturbation in the input filtration leads to a small perturbation of its persistence diagram in the bottleneck distance. For concreteness, consider a filtration on a space $X$ homeomorphic to a simplicial complex determined by the sublevel sets of a continuous tame function $f\colon X \to \mathbb{R}$. The map $D$ taking $f$ to the persistence diagram of its $k$th homology is 1-Lipschitz with respect to the $\sup$-metric on functions and the bottleneck distance on persistence diagrams.

That is, $W_\infty(D(f), D(g)) \leq \lVert f - g \rVert_\infty$. [8]
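For very small diagrams the bottleneck distance can be evaluated directly from the definition. The sketch below is a brute-force illustration under stated assumptions: diagrams are lists of finite `(birth, death)` pairs, and each diagram is padded with diagonal slots so that a point may be matched to the diagonal at cost half its persistence (the usual convention that makes bijections exist between diagrams of different sizes). The function name is hypothetical. The running time is factorial in the number of points, so this is not how practical software computes the distance.

```python
from itertools import permutations


def bottleneck_distance(X, Y):
    """Brute-force bottleneck distance between two small persistence diagrams.

    X, Y: lists of (birth, death) pairs with finite coordinates.
    Indices >= len(X) (resp. len(Y)) stand for diagonal slots.
    """
    n, m = len(X), len(Y)
    size = n + m
    if size == 0:
        return 0.0

    def cost(i, j):
        if i < n and j < m:                  # point matched to point (L-infinity norm)
            return max(abs(X[i][0] - Y[j][0]), abs(X[i][1] - Y[j][1]))
        if i < n:                            # X[i] matched to the diagonal
            return (X[i][1] - X[i][0]) / 2
        if j < m:                            # Y[j] matched to the diagonal
            return (Y[j][1] - Y[j][0]) / 2
        return 0.0                           # diagonal matched to diagonal

    # Minimise, over all bijections, the largest single matching cost.
    return min(
        max(cost(i, j) for i, j in enumerate(matching))
        for matching in permutations(range(size))
    )


X = [(0.0, 4.0)]
Y = [(0.0, 5.0), (2.0, 3.0)]
print(bottleneck_distance(X, Y))
```

Here the best matching pairs `(0, 4)` with `(0, 5)` at cost 1 and sends the short-lived point `(2, 3)` to the diagonal at cost 0.5, so the distance is 1.0.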

Computation

There are various software packages for computing persistence intervals of a finite filtration. [9] The principal algorithm is based on bringing the filtered complex to its canonical form by means of upper-triangular matrices. [6]
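One common formulation of this reduction works column by column on the boundary matrix with coefficients in $\mathbb{Z}/2$. The following is a minimal sketch of that textbook procedure, not the implementation used by any particular package. The function name and the input format (a list of vertex tuples in filtration order) are assumptions for illustration.

```python
def persistence_pairs(simplices):
    """Reduce the Z/2 boundary matrix of a filtered complex, column by column.

    simplices: list of vertex tuples in filtration order (faces before cofaces).
    Returns (pairs, essential): pairs holds (birth_index, death_index) and
    essential lists indices of simplices whose class never dies.
    """
    index = {tuple(sorted(s)): i for i, s in enumerate(simplices)}
    columns = []
    for s in simplices:
        s = tuple(sorted(s))
        if len(s) > 1:
            # Boundary of a simplex: the faces obtained by dropping one vertex.
            columns.append({index[s[:k] + s[k + 1:]] for k in range(len(s))})
        else:
            columns.append(set())            # vertices have empty boundary

    pivot_to_column = {}                     # lowest nonzero row -> owning column
    pairs = []
    for j, column in enumerate(columns):
        while column and max(column) in pivot_to_column:
            # Add (mod 2) the earlier reduced column with the same pivot.
            column ^= columns[pivot_to_column[max(column)]]
        if column:
            pivot_to_column[max(column)] = j
            pairs.append((max(column), j))

    paired = {i for pair in pairs for i in pair}
    essential = [i for i in range(len(simplices)) if i not in paired]
    return pairs, essential


# A filled triangle: three vertices, three edges, then the 2-simplex.
triangle = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]
pairs, essential = persistence_pairs(triangle)
```

Each pair is one bar of the barcode, from the filtration level of the birth simplex to that of the death simplex. For the triangle above, the third edge closes a loop that is later filled by the 2-simplex, and the single essential class is the connected component born with the first vertex.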

| Software package | Creator | Latest release | Release date | Software license [10] | Open source | Programming language | Features |
|---|---|---|---|---|---|---|---|
| OpenPH | Rodrigo Mendoza-Smith, Jared Tanner | 0.0.1 | 25 April 2019 | Apache 2.0 | Yes | Matlab, CUDA | GPU acceleration |
| javaPlex | Andrew Tausz, Mikael Vejdemo-Johansson, Henry Adams | 4.2.5 | 14 March 2016 | Custom | Yes | Java, Matlab | |
| Dionysus | Dmitriy Morozov | 2.0.8 | 24 November 2020 | Modified BSD | Yes | C++, Python bindings | |
| Perseus | Vidit Nanda | 4.0 beta | | GPL | Yes | C++ | |
| PHAT [11] | Ulrich Bauer, Michael Kerber, Jan Reininghaus | 1.4.1 | | | Yes | C++ | |
| DIPHA | Jan Reininghaus | | | | Yes | C++ | |
| Gudhi [12] | INRIA | 3.4.0 | 15 December 2020 | MIT/GPLv3 | Yes | C++, Python bindings | |
| CTL | Ryan Lewis | 0.2 | | BSD | Yes | C++ | |
| phom | Andrew Tausz | | | | Yes | R | |
| TDA | Brittany T. Fasy, Jisu Kim, Fabrizio Lecci, Clement Maria, Vincent Rouvreau | 1.5 | 16 June 2016 | | Yes | R | Provides R interface for GUDHI, Dionysus and PHAT |
| Eirene | Gregory Henselman | 1.0.1 | 9 March 2019 | GPLv3 | Yes | Julia | |
| Ripser | Ulrich Bauer | 1.0.1 | 15 September 2016 | MIT | Yes | C++ | |
| the Topology ToolKit | Julien Tierny, Guillaume Favelier, Joshua Levine, Charles Gueunet, Michael Michaux | 0.9.8 | 29 July 2019 | BSD | Yes | C++, VTK and Python bindings | |
| libstick | Stefan Huber | 0.2 | 27 November 2014 | MIT | Yes | C++ | |
| Ripser++ | Simon Zhang, Mengbai Xiao, and Hao Wang | 1.0 | March 2020 | MIT | Yes | CUDA, C++, Python bindings | GPU acceleration |

References

  1. Carlsson, Gunnar (2009). "Topology and data". AMS Bulletin. 46 (2): 255–308.
  2. Kerber, Michael; Sharathkumar, R. (2013). "Approximate Čech Complex in Low and High Dimensions". In Cai, Leizhen; Cheng, Siu-Wing; Lam, Tak-Wah (eds.). Algorithms and Computation. Lecture Notes in Computer Science. Vol. 8283. Berlin, Heidelberg: Springer. pp. 666–676. doi:10.1007/978-3-642-45030-3_62. ISBN 978-3-642-45030-3. S2CID 5770506.
  3. Dey, Tamal K.; Shi, Dayu; Wang, Yusu (2019-01-30). "SimBa: An Efficient Tool for Approximating Rips-filtration Persistence via Simplicial Batch Collapse". ACM Journal of Experimental Algorithmics. 24: 1.5:1–1.5:16. doi:10.1145/3284360. ISSN 1084-6654. S2CID 216028146.
  4. Edelsbrunner, H.; Harer, J. (2010). Computational Topology: An Introduction. American Mathematical Society.
  5. Verri, A.; Uras, C.; Frosini, P.; Ferri, M. (1993). "On the use of size functions for shape analysis". Biological Cybernetics. 70 (2): 99–107. doi:10.1007/BF00200823. S2CID 39065932.
  6. Barannikov, Sergey (1994). "Framed Morse complex and its invariants". Advances in Soviet Mathematics. 21: 93–115.
  7. Zomorodian, Afra; Carlsson, Gunnar (2004-11-19). "Computing Persistent Homology". Discrete & Computational Geometry. 33 (2): 249–274. doi:10.1007/s00454-004-1146-y. ISSN 0179-5376.
  8. Cohen-Steiner, David; Edelsbrunner, Herbert; Harer, John (2006-12-12). "Stability of Persistence Diagrams". Discrete & Computational Geometry. 37 (1): 103–120. doi:10.1007/s00454-006-1276-5. ISSN 0179-5376.
  9. Otter, Nina; Porter, Mason A; Tillmann, Ulrike; et al. (2017-08-09). "A roadmap for the computation of persistent homology". EPJ Data Science. Springer. 6 (1): 17. doi:10.1140/epjds/s13688-017-0109-5. ISSN 2193-1127. PMC 6979512. PMID 32025466.
  10. Licenses here are a summary, and are not taken to be complete statements of the licenses. Some packages may use libraries under different licenses.
  11. Bauer, Ulrich; Kerber, Michael; Reininghaus, Jan; Wagner, Hubert (2014). "PHAT – Persistent Homology Algorithms Toolbox". Mathematical Software – ICMS 2014. Springer Berlin Heidelberg. pp. 137–143. doi:10.1007/978-3-662-44199-2_24. ISBN 978-3-662-44198-5. ISSN 0302-9743.
  12. Maria, Clément; Boissonnat, Jean-Daniel; Glisse, Marc; et al. (2014). "The Gudhi Library: Simplicial Complexes and Persistent Homology". Mathematical Software – ICMS 2014 (PDF). Berlin, Heidelberg: Springer. pp. 167–174. doi:10.1007/978-3-662-44199-2_28. ISBN 978-3-662-44198-5. ISSN 0302-9743. S2CID 17810678.