+ Summary
+ multipers is a Python library for
+ Topological Data Analysis, focused on Multiparameter
+ Persistence computation and visualizations for Machine
+ Learning. It features efficient computational and
+ visualization tools, together with integrated, easy-to-use, auto-differentiable
+ Machine Learning pipelines that can be seamlessly interfaced with
+ scikit-learn
+ (Pedregosa
+ et al., 2011) and PyTorch
+ (Paszke
+ et al., 2019). The library is designed to be usable by
+ non-experts in Topological or Geometrical Machine Learning.
+ Performance-critical functions are implemented in
+ C++ or in Cython
+ (Behnel
+ et al., 2011), can be parallelized with
+ TBB
+ (Robison,
+ 2011), and are exposed through Python bindings.
+ It can handle a very diverse range of datasets that can be
+ framed into a (finite) multi-filtered simplicial or cell complex,
+ including, e.g., point clouds, graphs, time series, images, etc.
+
+
+ (Left) Topological 2-filtration grid. The color
+ corresponds to the density estimation of the sampling measure of the
+ point cloud. More formally, a point x∈ℝ²
+ belongs to the grid cell with coordinates (r,d)
+ iff d(x,pointcloud)≤r and density(x)≥d.
+ The green background shape corresponds to the lifetime of the
+ annulus in this 2-parameter grid. (Right) A
+ visualization of the lifetimes of geometric structures given by
+ multipers; here each colored shape
+ corresponds to a cycle appearing in the bi-filtration on the left,
+ and the shape represents its lifetime. The biggest green shape on
+ the right is the same as the one on the left.
+
+
+ Some motivation. In the example of Figure
+ [1], a point cloud is given from
+ sampling a probability measure whose mass is, for the most part,
+ located on an annulus, with some diffuse background noise. The goal
+ here is to recover this information in a topological descriptor. For
+ this, the point cloud can be analyzed at some geometric scale r>0
+ and density scale d by centering balls of radius r
+ around each point whose density is above d,
+ and looking at the topology induced by the union of balls. However,
+ neither a fixed geometric scale nor a fixed density scale alone
+ can (canonically) retrieve meaningful information, because of the diffuse
+ noise in the background; this is the main limitation of the prevalent
+ approach. Nevertheless, by considering all possible
+ combinations of geometric and density scales, also called a
+ bi-filtration, it becomes straightforward with
+ multipers to retrieve some of the underlying
+ geometrical structures without relying on any arbitrary scale
+ choice.
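The idea of sweeping over every (r, d) pair can be illustrated with a minimal, self-contained sketch in plain Python. It does not use the multipers API: the helper names (`gaussian_density`, `count_components`, `hilbert_function_h0`), the toy data, and the bandwidth are all assumptions made for this illustration. For simplicity it only counts connected components (degree-0 homology) of the union of balls, rather than the cycles discussed above.

```python
import math
import random


def gaussian_density(points, bandwidth):
    """Unnormalised Gaussian kernel density estimate at each input point."""
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points)
        for p in points
    ]


def count_components(points, radius):
    """Connected components of the union of balls of the given radius.

    Two balls of radius r overlap exactly when their centres are at
    distance at most 2r, so a union-find over those pairs is enough.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= 2 * radius:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def hilbert_function_h0(points, density, radii, density_thresholds):
    """Component counts; rows follow density thresholds d, columns follow radii r."""
    table = []
    for d in density_thresholds:
        # Keep only the points whose estimated density is at least d.
        kept = [p for p, rho in zip(points, density) if rho >= d]
        table.append([count_components(kept, r) for r in radii])
    return table


# Toy data (hypothetical): a noisy circle plus uniform background noise.
rng = random.Random(0)
circle = [
    (math.cos(t) + rng.gauss(0, 0.05), math.sin(t) + rng.gauss(0, 0.05))
    for t in (rng.uniform(0, 2 * math.pi) for _ in range(150))
]
noise = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(30)]
points = circle + noise

density = gaussian_density(points, bandwidth=0.3)
radii = [0.05, 0.1, 0.2, 0.4]
levels = sorted(density)
thresholds = [levels[k] for k in (0, 45, 90, 135)]  # a few increasing density levels
table = hilbert_function_h0(points, density, radii, thresholds)

for d, row in zip(thresholds, table):
    print(f"d >= {d:6.2f}: {row}")
```

Each row of `table` corresponds to one density threshold and each column to one radius, so the whole table is a coarse sample of a degree-0 invariant over the 2-parameter grid. The quadratic pairwise loop is only meant for small toy inputs.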
+ Furthermore, multipers seamlessly integrates
+ several Rust and C++
+ libraries such as Gudhi
+ (TheGudhiProject,
+ 2023), filtration-domination
+ (Alonso
+ et al., 2023), mpfree
+ (Kerber
+ & Rolle, 2020), and
+ function-delaunay
+ (Alonso
+ et al., 2024), and leverages state-of-the-art Machine
+ Learning libraries for fast computations, such as
+ scikit-learn
+ (Pedregosa
+ et al., 2011), Python Optimal Transport
+ (Flamary
+ et al., 2021), PyKeops
+ (Charlier
+ et al., 2021), or PyTorch
+ (Paszke
+ et al., 2019). This makes multipers a
+ very efficient and fully-featured library, showcasing a wide variety
+ of mathematically-grounded multiparameter topological invariants,
+ including, e.g., Multiparameter Module Approximation
+ (Loiseaux
+ et al., 2022), Euler, Hilbert, and Rectangle Signed Barcodes
+ (Botnan
+ et al., 2022;
+ Oudot
+ & Scoccola, 2024), Multiparameter Persistent Landscapes
+ (Vipond,
+ 2020), each of them computable from several multi-filtrations,
+ e.g., Rips-Density-like filtrations, Cubical, Degree-Rips,
+ Function-Delaunay, or any k-critical
+ multi-filtration. These topological descriptors can then directly be
+ used in auto-differentiable Machine Learning pipelines, using the
+ differentiability framework developed in
+ (Scoccola
+ et al., 2024), through several methods, such as
+ Decomposable Module Representations
+ (Loiseaux,
+ Carrière, et al., 2023), Sliced Wasserstein Kernels or
+ Convolutions from Signed Measures
+ (Loiseaux,
+ Scoccola, et al., 2023). As a result,
+ multipers is capable of handling, within a
+ single minute of computation, datasets of ∼50k
+ points with only 5 lines of Python code. See Figures
+ [2] and [3].
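To make the notion of a Hilbert (or Euler) decomposition signed barcode more concrete, the sketch below applies Möbius inversion to a function sampled on a 2-parameter grid. On a grid this reduces to mixed finite differences. This is a toy illustration under stated assumptions, not the multipers implementation: the function names `signed_measure_2d` and `integrate` and the input surface are hypothetical.

```python
def signed_measure_2d(surface):
    """Mixed finite differences of a function sampled on a 2-D grid.

    surface[i][j] is the value at grid point (i, j); values outside the
    grid are treated as 0. Returns (i, j, weight) triples with weight != 0.
    """
    def at(i, j):
        return surface[i][j] if i >= 0 and j >= 0 else 0

    masses = []
    for i, row in enumerate(surface):
        for j, _ in enumerate(row):
            w = at(i, j) - at(i - 1, j) - at(i, j - 1) + at(i - 1, j - 1)
            if w != 0:
                masses.append((i, j, w))
    return masses


def integrate(masses, shape):
    """Recover the surface by summing all masses below each grid point."""
    rows, cols = shape
    return [
        [sum(w for (a, b, w) in masses if a <= i and b <= j) for j in range(cols)]
        for i in range(rows)
    ]


# Hilbert function of one feature alive on rows 1-2 and columns 1-3.
surface = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
masses = signed_measure_2d(surface)
print(masses)  # [(1, 1, 1), (1, 4, -1), (3, 1, -1), (3, 4, 1)]
```

The four signed point masses sit at the corners of the rectangle on which the feature lives, and `integrate(masses, (4, 5))` gives back the original surface, so no information is lost by passing to the signed measure.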
+
+ Typical
+ interpretation of a “Geometric & Density” bi-filtration with
+ multipers. (Left) Point cloud
+ with color induced by density estimation (same as Figure
+ [1]). (Right) A
+ visualization of the topological structure lifetimes computed from a
+ Delaunay-Codensity bi-filtration; here the three cycles can be
+ retrieved using their radii (x-axis) and their co-densities
+ (y-axis). The first cycle is the densest and smallest, and thus
+ corresponds to the one that appears in the
+ bottom-left (high density, small radius) of the bi-filtration. The
+ second is less dense (thus above the first one) and bigger (thus
+ further to the right). The same goes for the last one.
+
+
+
+ Different
+ Signed Barcodes from the same dataset as Figure
+ [2]. (Left) Euler
+ Decomposition Signed Barcode, and the Euler Surface in the
+ background. (Middle) Hilbert Decomposition Signed
+ Barcode, with its Hilbert Function surface. (Right)
+ Rank invariant Signed Barcode, with the Hilbert Function as a
+ background.
+
+
+ The core functions of the Python library are automatically tested
+ on Linux and macOS, using pytest
+ (Krekel
+ et al., 2004) alongside GitHub Actions.
+
+
+ Related work and statement of need
+ There exist several libraries for computation or pre-processing of
+ very specific tasks related to multiparameter persistence. However, to
+ the best of our knowledge, none of them is able to tackle the
+ challenges that multipers addresses,
+ i.e., (1) computing and unifying the computations of
+ multiparameter persistent structures, in a non-expert friendly
+ approach, and (2) providing ready-to-use general tools to
+ use these descriptors in Machine Learning pipelines and projects.
+ Eulearning.
+ This library features different approaches for computing and using the
+ Euler Characteristic of a multiparameter filtration
+ (Hacquard
+ & Lebovici, 2023). Although relying on distinct methods,
+ multipers can also be used to compute Machine
+ Learning descriptors from the Euler Characteristic, i.e., the Euler
+ Decomposition Signed Barcode, or Euler Surfaces. Moreover,
+ multipers computations are faster (especially
+ on point cloud datasets), easier to use, and available on a wider
+ range of multi-filtrations.
+ Multiparameter
+ Persistent Landscape. Implemented on top of
+ Rivet
+ (Lesnick
+ & Wright, 2015), this library computes a multiparameter
+ persistent descriptor by computing 1-parameter persistence landscape
+ vectorizations of slices in multi-filtrations
+ (Vipond,
+ 2020), called Multiparameter Persistent Landscape (MPL). This
+ library also features some multiparameter persistence visualizations.
+ However, it is limited to Rivet capabilities
+ and landscape computations: on the one hand, it does not leverage
+ recently developed optimizations, e.g.,
+ (Alonso
+ et al., 2023), or
+ (Kerber
+ & Rolle, 2020), and on the other hand, it can only work with
+ very specific text file inputs.
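For readers unfamiliar with landscapes, the 1-parameter building block mentioned above can be sketched in a few lines. Given a barcode (a list of birth/death pairs) obtained from one slice, the k-th landscape at t is the k-th largest value of the "tent" functions max(0, min(t − b, d − t)). The function name `landscape` and the example barcode are assumptions for illustration; neither Rivet nor multipers is used here.

```python
def landscape(bars, k, t):
    """k-th persistence landscape (k = 1 is the top one) evaluated at t.

    bars is a list of (birth, death) pairs from a 1-parameter barcode.
    """
    tents = sorted((max(0.0, min(t - b, d - t)) for b, d in bars), reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0


# Hypothetical barcode of a single slice: a long bar containing a short one.
bars = [(0.0, 4.0), (1.0, 3.0)]
grid = [0.0, 1.0, 2.0, 3.0, 4.0]

first = [landscape(bars, 1, t) for t in grid]
second = [landscape(bars, 2, t) for t in grid]
print(first)   # [0.0, 1.0, 2.0, 1.0, 0.0]
print(second)  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

Sampling such functions on a fixed grid yields a fixed-size vector per slice, which is what makes landscapes convenient as Machine Learning features.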
+ GRIL.
+ This library provides code to compute a specific, generalized version
+ of the Multiparameter Persistent Landscapes
+ (Xin et
+ al., 2023), relying on 1-parameter zigzag persistence
+ computations. However, this library is limited to this invariant, can
+ only deal with 2-parameter persistence, and is not as tightly integrated
+ as multipers with other multiparameter
+ persistence and Machine Learning libraries.
+ Elder
+ Rule Staircode. This library features a descriptor
+ for 2-parameter, degree-0 homology, Rips-density-like filtrations
+ (Cai
+ et al., 2021). Once again, this library is very specific and
+ not linked with other libraries.
+ Persistable.
+ This is an interactive GUI library for clustering, using degree-0
+ multiparameter persistence
+ (Rolle
+ & Scoccola, 2020;
+ Scoccola
+ & Rolle, 2023). Although aiming at distinct goals and using
+ very different approaches, multipers can also
+ be used for clustering, by computing (differentiable) descriptors that
+ can be used afterward with standard clustering methods, e.g.,
+ K-means.
+ We contribute to this variety of task-specific libraries by
+ providing a general purpose library,
+ multipers, with novel and efficient topological
+ invariant computations, integrated state-of-the-art Machine Learning
+ topological pipelines, and interfaces to standard Machine Learning and
+ Deep Learning libraries.
+
+