Topological Data Analysis

Topological Data Analysis

Visualizing Multi-dimensional Persistent Homology Matthew L. Wright in collaboration with Michael Lesnick Institute for Mathematics and its Applications University of Minnesota What is persistent homology? Persistent homology is an algebraic method for discerning topological features of data. e.g. components, holes, graph structure

e.g. set of discrete points, with a metric Persistent homology emerged in the past 20 years due to the work of: Frosini, Ferri, et. al. (Bologna, Italy) Robins (Boulder, Colorado, USA) Edelsbrunner (Duke, North Carolina, USA) Carlsson, de Silva, et. al. (Stanford, California, USA) Zomorodian (Dartmouth, New Hampshire, USA) and others Example: What is the shape of the data? Problem: Discrete points have trivial topology.

Idea: Connect nearby points. 1. Choose a distance . 2. Connect pairs of points that are no further apart than . Problem: A graph captures connectivity, but ignores higher-order features, such as holes.

Idea: Connect nearby points, build a simplicial complex. 1. Choose a distance . 2. Connect pairs of points that are no further apart than .

3. Fill in complete simplices. 4. Homology detects the hole. Problem: How do we choose distance ? If is too small then we detect noise. If is too large then we get a giant simplex (trivial homology).

Problem: How do we choose distance ? This looks good. How do we know this hole is significant and not noise? Idea: Consider all distances . Each hole appears at a particular value of and disappears at

another value of . 1 We can represent the persistence of this hole as a pair . We visualize this pair as a bar from to : : 1 2 A collection of bars is a barcode.

2 Example: Record the barcode: : 0 1 2 3

Example: Short bars represent noise. Long bars represent features. Record the barcode: : 0

1 2 3 A persistence diagram is an alternate depiction of a barcode. Instead of drawing as a bar from to , draw a dot at coordinates . Dots far from the diagonal represent

features. Dots near the diagonal represent noise. A barcode is a visualization of an algebraic structure. Consider the sequence of complexes associated to a point cloud for an sequence of distance values: 1 2

3 A barcode is a visualization of an algebraic structure. Consider the sequence of complexes associated to a point cloud for an sequence of distance values: 1 2 3 4 5 6 7 This sequence of complexes, with maps, is a filtration. A barcode is a visualization of an algebraic structure. Filtration:

1 2 Homology with coefficients from a field : ( 1 ) ( 2 ) ( ) Let . For , the map is induced by the inclusion . Let act on by for any . i.e. acts as a shift map Then is a graded -module, called a persistence module. A barcode is a visualization of an algebraic structure. Let . Then is a graded -module.

The structure theorem for finitely generated modules over PIDs implies: [ ] [ ] [ ] (

( )) homology generators that appear at and persist forever after homology generators that appear at and persist until i.e. bars of the form

i.e. bars of the form Thus, the barcode is a complete discrete invariant. Stability: Persistence barcodes are stable with respect to pertubations of the data. Cohen-Steiner, Edelsbrunner, Harer (2007) Computation: The barcode is computable via linear algebra on the boundary matrix. Runtime is , where is the number of simplices. Zomorodian and Carlsson (2005) Where has persistent homology been used?

Image Processing The space of 3x3 high-contrast patches from digital images has the topology of a Klein bottle. Gunnar Carlsson, Tigran Ishkhanov, Vin de Silva, Afra Zomorodian. On the Local Behavior of Spaces of Natural Images. Journal of Computer Vision. Vol. 76, No. 1, 2008, p. 1 12. Image credit: Robert Ghrist. Barcodes: The Persistent Topology of Data. Bulletin of the American Mathematical Society. Vol. 45, no. 1, 2008, p. 61-75. Where has persistent homology been used? Cancer Research

Topological analysis of very high-dimensional breast cancer data can distinguish between different types of cancer. Monica Nicolau, Arnold J. Levine, Gunnar Carlsson. Topology-Based Data Analysis Identifies a Subgroup of Breast Cancers With a Unique Mutational Profile and Excellent Survival. Proceedings of the National Academy of Sciences. Vol. 108, No. 17, 2011, p. 7265 7270. Problem: Persistent homology is sensitive to outliers. Problem: Persistent homology is sensitive to

outliers. Red points in dense regions Do we have to threshold by density? Purple points in sparse regions Multi-dimensional persistence: Allows us to work with data indexed by two parameters, such as distance and density.

distance We obtain a bifiltration: a set of simplicial complexes indexed by two parameters.

density Example: A bifiltration indexed by curvature and radius . fixed radius fixed Carlsson and Zomorodian (2009) curvature Ordinary persistence requires fixing either or .

Algebraic Structure of Multi-dimensional Persistence The homology of a bifiltered simplicial complex is a finitelygenerated bigraded module: i.e. a 2-graded module over for a field . We call this a 2-dimensional persistence module. Problem: The structure of multi-graded modules is much more complicated than that of graded modules. There is no complete, discrete invariant for multi-dimensional persistence modules (Carlsson and Zomorodian, 2007). Thus, there is no multi-dimensional barcode. Question: How can we visualize multi-dimensional persistence? Concept: Visualize a barcode along any one-dimensional slice of a multi-dimensional parameter space.

Along any onedimensional slice, a barcode exists. distance Example: density Bi-graded Betti numbers and Example: 1st homology (holes) These are functions, indicates

coordinates at which homology appears ( , )= 3 2 1 1 2 3 Bi-graded Betti numbers and

Example: 1st homology (holes) These are functions, indicates coordinates at which homology appears 3 2 1 1 ( , )=

2 3 Bi-graded Betti numbers and Example: 1st homology (holes) These are functions, indicates coordinates at which homology appears values of in green

3 2

1 1 2 3 Bi-graded Betti numbers and Example: 1st homology (holes) These are functions,

indicates coordinates at which homology appears indicates coordinates at which homology disappears ( , )= 3 2 1 1

2 3 Bi-graded Betti numbers and Example: 1st homology (holes) These are functions, indicates coordinates at which homology appears indicates coordinates at

which homology disappears ( , )= 3 2 1 1 2 3 Bi-graded Betti numbers and Example: 1st homology (holes)

These are functions, indicates coordinates at which homology appears indicates coordinates at which homology disappears values of in red 3

2

1 1 2 3 R ank I nvariant V isualization and E xploration Tool Mike Lesnick

and Matthew Wright How RIVET Works RIVET pre-computes a relatively small number of discrete barcodes, from which it draws barcodes in real-time. Endpoints of bars appear in the same order in each of these two barcodes. Endpoints of bars in this barcode have a different order. Endpoints of bars

are the projections of support points of the bigraded Betti numbers onto the slice line. We can identify lines for which these projections agree. Data Structure At the core of RIVET is a line arrangement. Each line corresponds to a point where projections of two support points agree.

Cells correspond to families of lines with the same discrete barcode. When the user selects a slice line, the appropriate cell is found, and its discrete barcode is re-scaled and displayed. point-line duality: Performance Suppose we are interested in th homology. Let be the total number of simplices of dimensions , , and in the bifiltration. Let be the number of multigrades. Then the time required to compute the line

arrangement and all discrete barcodes is Then the time required to find a cell is . For more information: Robert Ghrist. Barcodes: The Persistent Topology of Data. Bulletin of the American Mathematical Society. Vol. 45, no. 1, 2008, p. 61-75. Gunnar Carlsson and Afra Zomorodian. The Theory of Multidimensional Persistence. Discrete and Computational Geometry. Vol. 42, 2009, p. 71-93. Michael Lesnick and Matthew Wright. Efficient Representation and Visualization of 2-D Persistent Homology. in preparation.

Recently Viewed Presentations

  • MODERN PHYSICS By 1932, the basic building blocks

    MODERN PHYSICS By 1932, the basic building blocks

    5.3b Charge is quantized on two levels. On the atomic level, charge is restricted to multiplesof the elementary charge (charge on the electron or proton). On the subnuclearlevel, charge appears as fractional values of the elementary charge (quarks). 5.3j The...
  • 14.1 Church Reform and the Crusades

    14.1 Church Reform and the Crusades

    Founded in 910, this is the Benedictine Abbey of Cluny as it looked in 2004. Coat of Arms of Cluny Abbey . Problems in the Church. ... Says don't follow priests that are married. Fights against simony. Defines Church's position...
  • John Steinbeck's Of Mice and Men - Tri-Valley Local Schools

    John Steinbeck's Of Mice and Men - Tri-Valley Local Schools

    John Steinbeck's Of Mice and Men INTRODUCTION NOTES + Journal / Notes Writing Assignment John Steinbeck 1902 - 1968 Spent teen years working on a CA ranch - the combination of the work and people served as an inspiration for...
  • Cu o politic fiscal permanent prociclic am putea

    Cu o politic fiscal permanent prociclic am putea

    Celelate partide (adică partidele care nu reușesc să guverneze în fazele ascendente), nu reușesc să producă un refuz coordonat de a guverna în recesiuni. Cauza profundă a acestui eșec este dată de faptul că instituțiile din democrația noastră sunt slabe.
  • Técnicas de programación y control de proyectos PERT-CPM

    Técnicas de programación y control de proyectos PERT-CPM

    TÉCNICAS DE PLANEACIÓN DE PROYECTOS OCTUBRE 2012 * TÉCNICAS DE PLANEACIÓN DE PROYECTOS PERT-CPM Program Evaluation and Review Technique (PERT) Programa de Evaluación y Revisión Técnica Critical Path Method (CPM) Método del Camino Crítico Diagrama de Gantt * HISTORIA DEL...
  • Pedagogy, Andragogy, and online course design

    Pedagogy, Andragogy, and online course design

    Plato's idea of paidagogos as "leader" and "custodian" of children ... Progressive Era. Need for more and better education (urbanization) 1930-1950. ... Pedagogy, Andragogy, and online course design
  • The Modern Indian independence Movement 53:00-1:05:00 Indian National

    The Modern Indian independence Movement 53:00-1:05:00 Indian National

    The Modern Indian independence Movement 53:00-1:05:00 Indian National Congress Founded in 1885 - Meant to increase the power of educated Indians in government By the 20th century the party advocated independence Gandhi became president of congress in 1915 - Formed...
  • The Cold War - Rosholt High School

    The Cold War - Rosholt High School

    The Cold War Expands. By the time Truman left office in 1953 the Cold War had taken global proportions. Two goals of the military system: U.S. armed forces should be unified into an integrated military system. New institutions to coordinate...