
Topological data analysis

In applied mathematics, topological data analysis (TDA) is an approach to the analysis of datasets using techniques from topology. Extraction of information from datasets that are high-dimensional, incomplete and noisy is generally challenging. TDA provides a general framework to analyze such data in a manner that is insensitive to the particular metric chosen, and it offers dimensionality reduction and robustness to noise. Beyond this, it inherits functoriality, a fundamental concept of modern mathematics, from its topological nature, which allows it to adapt to new mathematical tools. Work in the area has proceeded along two broad lines: one is the study of homological invariants of data on individual data sets, and the other is the use of homological invariants in the study of databases where the data points themselves have geometric structure.

The initial motivation is to study the shape of data. TDA combines algebraic topology and other tools from pure mathematics to allow a mathematically rigorous study of 'shape'. The main tool is persistent homology, an adaptation of homology to point-cloud data. Persistent homology has been applied to many types of data across many fields, and its mathematical foundation is also of theoretical importance. The unique features of TDA make it a promising bridge between topology and geometry.

The premise underlying TDA is that shape matters. Real data in high dimensions is nearly always sparse and tends to have relevant low-dimensional features; one task of TDA is to provide a precise characterization of this fact. An illustrative example is a simple predator-prey system governed by the Lotka–Volterra equations, whose trajectory forms a closed loop in state space. TDA provides tools to detect and quantify such recurrent motion.

Many algorithms for data analysis, including those used in TDA, require the choice of various parameters, and without prior domain knowledge the correct collection of parameters for a data set is difficult to choose. The main insight of persistent homology is that one can use the information obtained from all values of a parameter. The insight itself is easy to state; the hard part is encoding this large amount of information in an understandable and easy-to-represent form. With TDA, there is a mathematical interpretation when the information is a homology group. In general, the assumption is that features which persist over a wide range of parameters are 'true' features, while features persisting for only a narrow range are presumed to be noise, although the theoretical justification for this is unclear.
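A minimal computational sketch of these two points (an illustration written for this description, not part of the original exposition): assuming NumPy and the third-party ripser.py package are available, one can sample a Lotka–Volterra trajectory as a point cloud in state space and look for a long-lived class in degree-one persistent homology. The parameter values and step sizes below are arbitrary choices made only for the illustration.

import numpy as np
from ripser import ripser

# Integrate the Lotka-Volterra predator-prey equations with a crude Euler step:
#   dx/dt = a*x - b*x*y,   dy/dt = -c*y + d*x*y
a, b, c, d = 1.0, 0.4, 1.0, 0.1
x, y = 10.0, 5.0
dt, steps = 0.01, 2000
trajectory = []
for _ in range(steps):
    dx = (a * x - b * x * y) * dt
    dy = (-c * y + d * x * y) * dt
    x, y = x + dx, y + dy
    trajectory.append((x, y))

# Subsample the trajectory into a point cloud in state space.
points = np.array(trajectory)[::20]

# Persistent homology up to dimension 1 (connected components and loops).
diagrams = ripser(points, maxdim=1)['dgms']
h1 = diagrams[1]

# A single long-lived interval in H1 signals the recurrent (cyclic) motion;
# short-lived intervals are treated as noise.
lifetimes = h1[:, 1] - h1[:, 0]
print("longest H1 lifetime:", lifetimes.max() if len(lifetimes) else 0.0)

A single interval in degree one with a much longer lifetime than the rest corresponds to the closed orbit; the many short intervals produced by sampling irregularities are the 'noise' referred to above.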
Precursors to the full concept of persistent homology appeared gradually over time. In 1990, Patrizio Frosini introduced the size function, which is equivalent to the 0th persistent homology. Nearly a decade later, Vanessa Robins studied the images of homomorphisms induced by inclusion. Shortly thereafter, Edelsbrunner et al. introduced the concept of persistent homology together with an efficient algorithm and its visualization as a persistence diagram. Carlsson et al. reformulated the initial definition and gave an equivalent visualization method called persistence barcodes, interpreting persistence in the language of commutative algebra. In algebraic topology, persistent homology emerged through the work of Barannikov on Morse theory: the set of critical values of a smooth Morse function was canonically partitioned into 'birth-death' pairs, filtered complexes were classified, and a visualization of their invariants, equivalent to persistence diagrams and persistence barcodes, was given in 1994 by Barannikov's canonical form.
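For 0-dimensional homology, to which Frosini's size functions are equivalent, the barcode admits a particularly elementary computation: as the scale parameter grows, the connected components of the Vietoris–Rips complex merge, and each merge terminates one bar. The following self-contained sketch (illustrative only, not taken from any particular software package) tracks the components with a union-find structure.

# 0-dimensional persistent homology of a Euclidean point cloud, computed with
# a union-find structure over the edges of the Vietoris-Rips filtration.
# Every point is a component born at scale 0; when two components merge at
# edge length r, one bar [0, r) ends.  Illustrative sketch only.
import math
from itertools import combinations

def h0_barcode(points):
    n = len(points)
    # Edges of the filtration, ordered by increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:              # the edge joins two distinct components
            parent[rj] = ri
            bars.append((0.0, r))
    bars.append((0.0, math.inf))  # one component never dies
    return bars

# Two well-separated clusters: the long finite bar reflects the gap between them.
cloud = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2)]
for birth, death in h0_barcode(cloud):
    print(f"[{birth:.2f}, {death:.2f})")

In this small example, the bar that lasts until the two clusters finally join is the dominant feature, while the short bars record merges of nearby points within each cluster.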

[ "Persistent homology", "Persistence (computer science)", "Algebra", "Topology", "persistence diagram" ]
Parent Topic
Child Topic
    No Parent Topic