2nd Workshop on
Topological Methods in Data Analysis

4th - 6th October 2021, Heidelberg University


Topological data analysis (TDA) is a rapidly growing field of applied mathematics that promises an exciting approach to the analysis of highly complex systems. The initial idea of TDA is to study the “shape of data”, which is typically not accessible in standard approaches to data analysis and has the potential to reveal additional deep insights into a given system. Recent years have seen an active back and forth between pure mathematics and applications in the natural sciences, demonstrating that methods of topology, geometry, and beyond offer an effective tool for data exploration and data analysis. Crucially, there are by now several implementations of the main techniques in the TDA toolbox, ready for use by the initiated researcher.

This three-day workshop includes introductions to the powerful data analysis machinery of persistent homology and extensive tutorials on the versatile GUDHI Library. In particular, it features invited Colloquium Talks by well-known experts in the field, aimed at a broader audience. In addition, participants will have the opportunity to give a short presentation on their own TDA-related work.

In response to COVID-19, the workshop will be held online. Zoom links will be provided at a later time, along with further instructions on how to participate. You must register for the workshop to receive the password.

Programme

               Mon 4th    Tue 5th            Wed 6th
11.15 - 12.45  Carrière   Carrière           Carrière
14.15 - 15.45  Carrière   Carrière           Lightning Talks
16.15 - 17.15  Baas       van de Weygaert    Chazal

    All times are CEST (UTC+2). Zoom rooms will open a few minutes in advance.
    Recordings are linked in the schedule; for access get in touch with the organizers.

Mathieu Carrière - Gudhi Tutorials

The purpose of this series of tutorials is to provide an introduction to the basic concepts of Topological Data Analysis (TDA), as well as their practical use with the Python Gudhi library. The sessions will alternate between classes accompanied by Gudhi tutorials, and practical sessions where students can gather and work out small coding and/or data science projects using TDA tools.

Class 1: Simplicial complexes and (Persistent) Homology

The purpose of this class is to introduce the most basic concepts of computational topology. We will present the definition and properties of particular topological spaces, called simplicial complexes, that can be constructed from data sets and/or point clouds. In particular, we will show how they are stored in Gudhi with the so-called simplex trees. Then, we will introduce the main tool of topological data analysis, called the persistence diagram. We will present its definition, its computation algorithm, and its theoretical properties, some of which are still open problems in the literature.

Practical session 1

This practical session is dedicated to the manipulation of simplex trees and persistence computation in Gudhi.

P1: Extended persistent homology
P2: Robust geometric filtrations
P3: ToMATo for protein conformation
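
To give a flavour of the persistence computation covered in Class 1, here is a minimal sketch that computes the 0-dimensional persistence diagram of a Vietoris-Rips filtration with a union-find structure. It uses only the Python standard library and is not Gudhi's API or the workshop's material; the function name zero_dim_persistence and the toy point cloud are assumptions made for illustration.

```python
from itertools import combinations
from math import dist, inf


def zero_dim_persistence(points):
    """Return the (birth, death) pairs of connected components (H0)
    for the Vietoris-Rips filtration of a point cloud."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # An edge enters the filtration at the distance between its endpoints.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this scale.
            parent[root_i] = root_j
            pairs.append((0.0, length))
    if n:
        # One component survives at every scale.
        pairs.append((0.0, inf))
    return pairs


# Hypothetical toy data: three nearby points and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
diagram = zero_dim_persistence(points)
for birth, death in diagram:
    print(birth, death)
```

All points are born at scale 0, so only the death values carry information here. The single long finite bar reflects the gap between the tight cluster and the isolated point.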

Class 2: Topological machine learning

In this class, we will show through several examples and applications how persistence theory can be used to build relevant topological descriptors/signatures from data sets. These signatures encode useful topological information that is often complementary to other usual descriptors. Then, we will show how they can be converted into features for further data analysis and machine learning tasks, by using either finite-dimensional vectorizations or infinite-dimensional embeddings into reproducing kernel Hilbert spaces, i.e., kernel methods. We will finally present different applications of topological optimization, in particular how topological penalties can be introduced in gradient descent and/or loss functions of classifiers.
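
As one possible illustration of a finite-dimensional vectorization, the sketch below turns a persistence diagram into a Betti curve (the number of intervals alive at each value of a fixed grid) and compares two such vectors with a Gaussian kernel. This is a standard-library sketch chosen for brevity, not the method taught in the class; the function names, the grid, and the two toy diagrams are assumptions.

```python
from math import exp


def betti_curve(diagram, grid):
    """For each t in grid, count the intervals [birth, death) that contain t."""
    return [
        sum(1 for birth, death in diagram if birth <= t < death)
        for t in grid
    ]


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian kernel between two feature vectors of equal length."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return exp(-squared_distance / (2 * sigma ** 2))


# Hypothetical diagrams, given as lists of (birth, death) pairs.
diagram_a = [(0.0, 1.0), (0.0, 2.5), (1.5, 2.0)]
diagram_b = [(0.0, 2.5)]
grid = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]

features_a = betti_curve(diagram_a, grid)  # [2, 2, 1, 2, 1, 0]
features_b = betti_curve(diagram_b, grid)  # [1, 1, 1, 1, 1, 0]
similarity = gaussian_kernel(features_a, features_b)
print(features_a, features_b, similarity)
```

Once diagrams are fixed-length vectors like these, they can be passed to any standard classifier or kernel method.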

Practical session 2

This practical session is dedicated to the manipulation of persistence diagrams in machine learning frameworks such as Scikit-Learn and TensorFlow.

P4: Persistence-based 3D shape segmentations
P5: Topological regularization of point clouds and classifiers
P6: Mapper for contact maps

Class 3: Topological exploratory data analysis with Mapper

Visualizing and exploring data sets can be a difficult task, especially when dealing with very high-dimensional data. In this class, we will present Reeb spaces and Mappers, which are simple yet convenient approaches to processing data sets with topological methods and which depend only on the low-dimensional subspace/manifold from which the data is sampled. They are based on grouping together similar observations into patches and providing a description of their intersection patterns. We will demonstrate how to use them in Gudhi, and present some well-founded statistical results about their convergence and confidence properties.
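
The idea of patches and intersection patterns can be sketched in a few lines. The following is a simplified, standard-library Mapper, not Gudhi's implementation: the filter values are covered by overlapping intervals, the points in each interval are clustered by connecting points closer than eps, and two clusters are joined by an edge when they share a point. The parameter names (n_intervals, overlap, eps) and the circle example are assumptions made for illustration.

```python
from math import cos, dist, pi, sin


def cover_intervals(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal-length intervals that overlap by a fraction."""
    length = (hi - lo) / (n_intervals - overlap * (n_intervals - 1))
    step = length * (1 - overlap)
    intervals = [(lo + k * step, lo + k * step + length)
                 for k in range(n_intervals)]
    # Pin the last endpoint to hi to avoid losing points to rounding.
    intervals[-1] = (intervals[-1][0], hi)
    return intervals


def components(indices, points, eps):
    """Connected components of the graph linking points at distance <= eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        stack = [remaining.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, filter_values, n_intervals=3, overlap=0.3, eps=1.0):
    """Return the Mapper nodes (sets of point indices) and edges."""
    lo, hi = min(filter_values), max(filter_values)
    nodes = []
    for a, b in cover_intervals(lo, hi, n_intervals, overlap):
        patch = [i for i, f in enumerate(filter_values) if a <= f <= b]
        nodes.extend(components(patch, points, eps))
    # Two nodes are linked when their clusters share at least one point.
    edges = {(u, v)
             for u in range(len(nodes))
             for v in range(u + 1, len(nodes))
             if nodes[u] & nodes[v]}
    return nodes, edges


# Hypothetical example: 24 points on the unit circle, filtered by x-coordinate.
points = [(cos(2 * pi * k / 24), sin(2 * pi * k / 24)) for k in range(24)]
filter_values = [p[0] for p in points]
nodes, edges = mapper(points, filter_values, n_intervals=3, overlap=0.3, eps=0.3)
print(len(nodes), sorted(edges))
```

On this example the resulting graph is a cycle on four nodes, so the loop of the circle is recovered. In practice the result depends strongly on the choice of filter, cover, and clustering.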

Further Material

Slides: Class 1, Class 2, Class 3
Projects: P1, P2, P3, P4, P5, P6
TDA Tutorials (Jupyter Notebooks)

Mathieu Carrière (INRIA Sophia Antipolis - Méditerranée) is a member of the Editorial Board of the GUDHI Library and an active contributor to its implementation.

Nils Baas - A new approach to higher structures

In this talk, I will motivate by examples the introduction of a new framework for studying higher structures, namely what I call Hyperstructures. Furthermore, I will discuss “The Hyperstructure Principle” and show how it offers a way to organize objects and structures. Finally, I will suggest a generalization of field theories.

Nils Baas (NTNU – Norwegian University of Science and Technology) is a pioneer in the study of higher structures that occur in mathematics and across all sciences and has applied his insights e.g. to systems in biology, chemistry, neuroscience and physics.

Frédéric Chazal - A framework to differentiate persistent homology with applications in Machine Learning and Statistics

Understanding the differentiable structure of persistent homology and solving optimization tasks based on functions and losses with a topological flavor is a very active, growing field of research in data science and Topological Data Analysis, with applications in non-convex optimization, statistics and machine learning. However, the approaches proposed in the literature are usually anchored to a specific application and/or topological construction, and do not come with theoretical guarantees. In this talk, we will study the differentiability of a general map associated with the most common topological construction, that is, the persistence map. Building on real analytic geometry arguments, we propose a general framework that allows one to define and compute gradients for persistence-based functions in a very simple way. As an application, we also provide a simple, explicit and sufficient condition for convergence of stochastic subgradient methods for such functions. If time permits, as another application, we will also show how this framework, combined with standard geometric measure theory arguments, leads to results on the statistical behavior of persistence diagrams of filtrations built on top of random point clouds.

Frédéric Chazal (INRIA Saclay – Île-de-France) is a leading expert in topological data analysis, in particular having made major contributions to the theory and various applications of persistence.

Rien van de Weygaert - The cosmic web: Complexity and connectivity of the largest structure in the universe

The Cosmic Web is the fundamental spatial organization of matter on scales from a few to a hundred million light-years. Galaxies, intergalactic gas and dark matter have aggregated in a wispy weblike network of dense compact clusters, elongated filaments, and sheetlike walls, amidst large near-empty void regions. An important additional aspect of this mass distribution is that it is marked by substructure over a wide range of scales and densities.

A unique aspect of the cosmic web is its connectivity, the way in which its various structural components are spatially organized in a weblike network. Persistent Topology has provided us with the mathematical foundation and concepts to describe and quantify this key aspect of the large scale cosmic matter distribution.

Over the past decades, we have quantified the outcome of computer simulations in terms of their Betti numbers and persistence diagrams. Most of this has been based on the analysis of the multiscale spatial density distribution and/or the distance field in terms of alpha shapes.

In the final part of the presentation I will describe recent developments in which we have coupled the dynamics of cosmic web formation to its multiscale topology. The cosmic web has been formed and shaped by large scale tidal forces induced by the cosmic mass distribution. For the connectivity of the cosmic web, it therefore turns out to be of key importance to understand the persistent topology of the cosmic tidal field. To be able to model this we have turned to a full phase-space analysis to understand the growth of structural complexity in terms of the emergence of singularities and caustics (work in collaboration with J. Feldbrugge, G. Wilding and J. Hidding). To this end, we follow the mathematical description in terms of the caustic structure implied by the corresponding (initial) deformation and tidal field, in the context of the caustic conditions that we derived in Feldbrugge et al. (2018). We discuss how the spatial distribution of the primordial fields of deformation tensor eigenvalues defines the spatial outline - or caustic spine - of the cosmic web. This allows us to analyze the connectivity and hierarchical evolution of the cosmic web in terms of the deformation field persistence diagrams, and their information content on the physics of the structure formation process.

Slides

Rien van de Weygaert (Kapteyn Institute, University of Groningen) is a leading expert in studying cosmic structures and their formation with tools from stochastic and computational geometry as well as persistent homology.

Lightning Talks

On Wednesday, Oct 6th, there is the possibility to give a brief presentation (~ 10 minutes) on your own TDA-related research project. If you want to use this opportunity to present your work to others, please let us know during registration.

14.15h   Bastian Rieck - Topological Graph Neural Networks

14.25h   Oleg Kachan - Persistence Homology-based Projection Pursuit

14.35h   Erik Amezquita - Quantifying the shape of barley using the Euler characteristic

14.45h   Eva Lymberopoulos - Topological data analysis reveals population-level microbiome signatures
                        associated with Parkinson’s disease

14.55h   Dhananjay Bhaskar - TDA of dynamical phase transitions in cancer & developmental biology

15.05h   Kem-Meka Tiotsop Kadzue Péguy - Topological Data Analysis for Time Series: Case of the Temperature
                                 in Cameroon

15.15h   Waqar Hussain Shah - Statistics on the space of persistent diagrams with applications

15.25h   Ximena Fernandez - Density-based persistent homology

15.35h   Rolando Kindelan - Compact simplicial complex representation

Recordings

Mathieu Carrière - Class 1
Mathieu Carrière - Projects 1-3
Mathieu Carrière - Class 2
Mathieu Carrière - Projects 4-6
Mathieu Carrière - Class 3

Nils Baas - A new approach to higher structures
Rien van de Weygaert - The cosmic web: Complexity and connectivity of the largest structure in the universe
Frédéric Chazal - A framework to differentiate persistent homology with applications in Machine Learning and Statistics

Registration

Participation is free. If you wish to attend, please register by Friday, October 1st, 2021.
Registration is now closed.

In case of questions, feel free to contact the organizers via email.

Poster

Organizational Committee

Michael Bleher, Maximilian Schmahl, Daniel Spitz, Anna Wienhard
STRUCTURES Heidelberg Ruprecht-Karls-Universität Heidelberg