From aafe7799770ae18aabdccc364fd8e79db6f8d9d6 Mon Sep 17 00:00:00 2001 From: The Open Journals editorial robot <89919391+editorialbot@users.noreply.github.com> Date: Thu, 14 Nov 2024 00:00:44 +0000 Subject: [PATCH] Creating 10.21105.joss.06773.jats --- .../paper.jats/10.21105.joss.06773.jats | 771 ++++++++++++++++++ 1 file changed, 771 insertions(+) create mode 100644 joss.06773/paper.jats/10.21105.joss.06773.jats diff --git a/joss.06773/paper.jats/10.21105.joss.06773.jats b/joss.06773/paper.jats/10.21105.joss.06773.jats new file mode 100644 index 0000000000..c073a44d44 --- /dev/null +++ b/joss.06773/paper.jats/10.21105.joss.06773.jats @@ -0,0 +1,771 @@ + + +
Journal of Open Source Software (JOSS), ISSN 2475-9066, Open Journals
Article 6773, DOI 10.21105/joss.06773
multipers: Multiparameter Persistence for Machine Learning
David Loiseaux, Hannah Schreiber
Centre Inria d’Université Côte d’Azur, France
Volume 9, issue 103, 6773
Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Copyright 2024, the article authors.
Keywords: machine learning, topological data analysis

Summary

multipers is a Python library for Topological Data Analysis, focused on Multiparameter Persistence computation and visualizations for Machine Learning. It features several efficient computational and visualization tools, with integrated, easy-to-use, auto-differentiable Machine Learning pipelines that can be seamlessly interfaced with scikit-learn (Pedregosa et al., 2011) and PyTorch (Paszke et al., 2019). This library is meant to be usable by non-experts in Topological or Geometrical Machine Learning. Performance-critical functions are implemented in C++ or in Cython (Behnel et al., 2011), are parallelizable with TBB (Robison, 2011), and are exposed through Python bindings and a Python interface. The library can handle a very diverse range of datasets that can be framed into a (finite) multi-filtered simplicial or cell complex, including, e.g., point clouds, graphs, time series, and images.

+ +

(Left) Topological 2-filtration grid. The color corresponds to the density estimation of the sampling measure of the point cloud. More formally, a point $x \in \mathbb{R}^2$ belongs to the grid cell with coordinates $(r, d)$ iff $d(x, \mathrm{point\ cloud}) \le r$ and $\mathrm{density}(x) \ge d$. The green background shape corresponds to the lifetime of the annulus in this 2-parameter grid. (Right) A visualization of the lifetimes of geometric structures given by multipers; here each colored shape corresponds to a cycle appearing in the bi-filtration on the left, and the shape represents its lifetime. The biggest green shape on the right is the same as the one on the left.

+ +
+

Some motivation. In the example of Figure [1], a point cloud is sampled from a probability measure whose mass is, for the most part, located on an annulus, with some diffuse background noise. The goal here is to recover this information in a topological descriptor. For this, the point cloud can be analyzed at some geometric scale $r > 0$ and density scale $d$ by centering balls of radius $r$ around each point whose density is above $d$, and looking at the topology induced by the union of balls. However, because of the diffuse background noise, neither a fixed geometric scale nor a fixed density scale alone can (canonically) retrieve meaningful information, which is the main limitation of the prevalent approach. Nevertheless, by considering all possible combinations of geometric and density scales, also called a bi-filtration, it becomes straightforward with multipers to retrieve some of the underlying geometrical structures without relying on any arbitrary scale choice.
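To make the two-scale idea concrete, the following is a minimal, pure-Python sketch (it does not use multipers, and every function name and the `bandwidth` parameter are hypothetical choices made for illustration). For each pair of radius and density threshold, it keeps the points whose estimated density is at least the threshold and counts the connected components of the union of balls, using the fact that two balls of radius r intersect exactly when their centers are at distance at most 2r.

```python
import math
from itertools import combinations


def gaussian_density(points, bandwidth):
    """Unnormalized Gaussian kernel density estimate at each input point."""
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points)
        for p in points
    ]


def count_components(points, radius):
    """Connected components of the union of balls of the given radius.

    Two balls of radius r intersect iff their centers are at distance <= 2r,
    so such pairs are merged with a small union-find structure.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= 2 * radius:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def component_grid(points, radii, density_levels, bandwidth=0.5):
    """Number of components for every (radius, density threshold) pair."""
    density = gaussian_density(points, bandwidth)
    grid = {}
    for d in density_levels:
        kept = [p for p, rho in zip(points, density) if rho >= d]
        for r in radii:
            grid[(r, d)] = count_components(kept, r)
    return grid


# Toy data (assumption): 12 points on the unit circle plus one isolated outlier.
circle = [
    (math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
    for k in range(12)
]
points = circle + [(5.0, 5.0)]
grid = component_grid(points, radii=[0.1, 0.3], density_levels=[0.0, 1.5])
```

On this toy input, only the combination of a large enough radius and a high enough density threshold isolates the circle as a single component; either scale alone leaves the outlier or the fragmentation in place. This only tracks connected components, whereas the cycles shown in the figures require higher-degree homology.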

+

Furthermore, multipers seamlessly integrates several Rust and C++ libraries such as Gudhi (TheGudhiProject, 2023), filtration-domination (Alonso et al., 2023), mpfree (Kerber & Rolle, 2020), and function-delaunay (Alonso et al., 2024), and leverages state-of-the-art Machine Learning libraries for fast computations, such as scikit-learn (Pedregosa et al., 2011), Python Optimal Transport (Flamary et al., 2021), PyKeops (Charlier et al., 2021), or PyTorch (Paszke et al., 2019). This makes multipers a very efficient and fully featured library, showcasing a wide variety of mathematically grounded multiparameter topological invariants, including, e.g., Multiparameter Module Approximation (Loiseaux et al., 2022), Euler, Hilbert, and Rectangle Signed Barcodes (Botnan et al., 2022; Oudot & Scoccola, 2024), and Multiparameter Persistent Landscapes (Vipond, 2020), each of them computable from several multi-filtrations, e.g., Rips-Density-like filtrations, Cubical, Degree-Rips, Function-Delaunay, or any $k$-critical multi-filtration. These topological descriptors can then be used directly in auto-differentiable Machine Learning pipelines, using the differentiability framework developed by Scoccola et al. (2024), through several methods, such as Decomposable Module Representations (Loiseaux, Carrière, et al., 2023), Sliced Wasserstein Kernels, or Convolutions from Signed Measures (Loiseaux, Scoccola, et al., 2023). As a result, multipers is capable of handling, within a single minute of computation, datasets of $50k$ points with only 5 lines of Python code. See Figures [2], [3].

+ +

Typical interpretation of a “Geometric & Density” bi-filtration with multipers. (Left) Point cloud with color induced by density estimation (same as Figure [1]). (Right) A visualization of the topological structure lifetimes computed from a Delaunay-Codensity bi-filtration; here the three cycles can be retrieved using their radii (x-axis) and their co-densities (y-axis). The first cycle is the densest and smallest, and thus corresponds to the one that appears in the bottom-left (high density, small radius) of the bi-filtration. The second is less dense (thus above the first one) and bigger (thus further to the right). The same goes for the last one.

+ +
+ +

Different + Signed Barcodes from the same dataset as Figure + [2]. (Left) Euler + Decomposition Signed Barcode, and the Euler Surface in the + background. (Middle) Hilbert Decomposition Signed + Barcode, with its Hilbert Function surface. (Right) + Rank invariant Signed Barcode, with the Hilbert Function as a + background.
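As a rough illustration of the Euler surface and its signed decomposition mentioned in this caption, here is a small pure-Python sketch. It is not the multipers implementation; the function names and the toy bi-filtration are hypothetical. It assumes a 1-critical bi-filtration (each simplex has a single birth grade), in which case the Euler characteristic at a grade is the alternating count of simplices born at or before it, and the corresponding signed measure puts a mass of $(-1)^{\dim \sigma}$ at the birth grade of each simplex $\sigma$.

```python
from collections import Counter


def euler_signed_measure(simplices):
    """Signed point masses: each simplex adds (-1)^dim at its birth grade."""
    measure = Counter()
    for vertices, grade in simplices:
        measure[grade] += (-1) ** (len(vertices) - 1)
    # Drop grades whose contributions cancel out.
    return {g: w for g, w in measure.items() if w != 0}


def euler_surface(measure, grid_x, grid_y):
    """Euler characteristic at each grid point, summing the masses below it."""
    return [
        [
            sum(w for (a, b), w in measure.items() if a <= x and b <= y)
            for y in grid_y
        ]
        for x in grid_x
    ]


# Hypothetical 1-critical bi-filtration of a hollow triangle:
# each entry is (vertex tuple, birth grade).
simplices = [
    ((0,), (0, 0)),
    ((1,), (0, 1)),
    ((2,), (0, 2)),
    ((0, 1), (1, 1)),
    ((1, 2), (1, 2)),
    ((0, 2), (2, 2)),
]
measure = euler_signed_measure(simplices)
surface = euler_surface(measure, grid_x=[0, 1, 2], grid_y=[0, 1, 2])
```

Here `surface[i][j]` is the Euler characteristic at grade `(grid_x[i], grid_y[j])`; at the final grade the hollow triangle has three vertices and three edges, so the value is 0.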

+ +
+

The core functions of the Python library are automatically tested + on Linux and macOS, using pytest + (Krekel + et al., 2004) alongside GitHub Actions.

+
+ + Related work and statement of need +

Several libraries exist for computing or pre-processing very specific tasks related to multiparameter persistence. However, to the best of our knowledge, none of them tackles the challenges that multipers addresses, i.e., (1) computing, and unifying the computations of, multiparameter persistent structures in a way that is friendly to non-experts, and (2) providing ready-to-use, general tools for using these descriptors in Machine Learning pipelines and projects.

+

Eulearning. This library features different approaches for computing and using the Euler Characteristic of a multiparameter filtration (Hacquard & Lebovici, 2023). Although relying on distinct methods, multipers can also be used to compute Machine Learning descriptors from the Euler Characteristic, i.e., the Euler Decomposition Signed Barcode, or Euler Surfaces. Moreover, multipers computations are faster (especially on point cloud datasets), easier to use, and available on a wider range of multi-filtrations.

+

Multiparameter Persistent Landscape. Implemented on top of Rivet (Lesnick & Wright, 2015), this library computes a multiparameter persistent descriptor, called the Multiparameter Persistent Landscape (MPL), by computing 1-parameter persistence landscape vectorizations of slices in multi-filtrations (Vipond, 2020). This library also features some multiparameter persistence visualizations. However, it is limited to Rivet's capabilities and to landscape computations: on the one hand, it does not leverage recently developed optimizations, e.g., (Alonso et al., 2023) or (Kerber & Rolle, 2020), and on the other hand, it only works with very specific text file inputs.
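The slicing idea described above can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions, not code from Rivet, the MPL library, or multipers: the function names are hypothetical, the line direction is assumed to have strictly positive entries, and the 1-parameter barcode on the slice is taken as given rather than computed. A grade of the multi-filtration is pushed onto a line by taking the smallest parameter at which the line dominates it coordinate-wise, and the k-th landscape at t is the k-th largest value of max(0, min(t - b, d - t)) over the bars (b, d).

```python
def push_to_line(grade, basepoint, direction):
    """Smallest t such that basepoint + t * direction >= grade coordinate-wise.

    Assumes every entry of `direction` is strictly positive.
    """
    return max((g - b) / v for g, b, v in zip(grade, basepoint, direction))


def landscape(bars, k, t):
    """k-th persistence landscape (k = 1 is the largest) evaluated at t."""
    tents = sorted(
        (max(0.0, min(t - birth, death - t)) for birth, death in bars),
        reverse=True,
    )
    return tents[k - 1] if k <= len(tents) else 0.0


# Two grades of a bi-filtration restricted to the diagonal through the origin.
basepoint, direction = (0.0, 0.0), (1.0, 1.0)
slice_values = [
    push_to_line(g, basepoint, direction) for g in [(1.0, 3.0), (2.0, 0.5)]
]

# Hypothetical 1-parameter barcode obtained on that slice.
bars = [(0.0, 4.0), (1.0, 3.0)]
values = [landscape(bars, k, 2.0) for k in (1, 2, 3)]
```

Repeating this over a family of lines and sampling each landscape on a grid of t values gives a fixed-size vector, which is what makes the construction usable as a Machine Learning feature.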

+

GRIL. This library provides code to compute a specific, generalized version of the Multiparameter Persistent Landscapes (Xin et al., 2023), relying on 1-parameter zigzag persistence computations. However, this library is limited to this invariant, can only deal with 2-parameter persistence, and is less integrated than multipers with other multiparameter persistence and Machine Learning libraries.

+

Elder Rule Staircode. This library features a descriptor for 2-parameter, degree-0 homology of Rips-density-like filtrations (Cai et al., 2021). Once again, this library is very specific and not linked with other libraries.

+

Persistable. This is a GUI-based interactive library for clustering, using degree-0 multiparameter persistence (Rolle & Scoccola, 2020; Scoccola & Rolle, 2023). Although aiming at distinct goals and using very different approaches, multipers can also be used for clustering, by computing (differentiable) descriptors that can be used afterward with standard clustering methods, e.g., K-means.

+

We contribute to this variety of task-specific libraries by providing a general-purpose library, multipers, with novel and efficient topological invariant computations, integrated state-of-the-art topological Machine Learning pipelines, and interfaces to standard Machine Learning and Deep Learning libraries.

+
+ + Acknowledgements +

David Loiseaux was supported by ANR grant 3IA Côte d’Azur (ANR-19-P3IA-0002). The authors would like to thank Mathieu Carrière and Luis Scoccola for their help with the Sliced Wasserstein and Möbius inversion code.

+
Scoccola, Luis; Rolle, Alexander (2023-03). Persistable: Persistent and stable clustering. Journal of Open Source Software, 8(83), 5022. ISSN 2475-9066. doi:10.21105/joss.05022. Accessed 2024-05-09.

Cai, Chen; Kim, Woojin; Memoli, Facundo; Wang, Yusu (2021-01). Elder-Rule-Staircodes for Augmented Metric Spaces. SIAM Journal on Applied Algebra and Geometry, 5(3), 417-454. ISSN 2470-6566. https://arxiv.org/abs/2003.04523. doi:10.1137/20M1353605. Accessed 2022-05-20.

Hacquard, Olympio; Lebovici, Vadim (2023-03). Euler Characteristic Tools For Topological Data Analysis. arXiv.org. doi:10.48550/arxiv.2303.14040. Accessed 2024-04-16.

Alonso, Ángel Javier; Kerber, Michael; Pritam, Siddharth (2023). Filtration-domination in bifiltered graphs. 2023 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), 27-38. doi:10.1137/1.9781611977561.ch3.

Alonso, Ángel Javier; Kerber, Michael; Lam, Tung; Lesnick, Michael (2024-01). Delaunay Bifiltrations of Functions on Point Clouds. Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 4872-4891. Society for Industrial and Applied Mathematics. doi:10.1137/1.9781611977912.173. Accessed 2024-05-14.

Xin, Cheng; Mukherjee, Soham; Samaga, Shreyas N.; Dey, Tamal K. (2023-09). GRIL: A $2$-parameter Persistence Based Vectorization for Machine Learning. Proceedings of 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML), 313-333. PMLR. ISSN 2640-3498. Accessed 2024-04-16.

TheGudhiProject (2023). GUDHI. GUDHI Editorial Board.

Loiseaux, David; Carrière, Mathieu; Blumberg, Andrew J. (2022-06). Fast, Stable and Efficient Approximation of Multi-parameter Persistence Modules with MMA. doi:10.48550/arXiv.2206.02026. Accessed 2024-04-16.

Loiseaux, David; Carrière, Mathieu; Blumberg, Andrew (2023-12). A Framework for Fast and Stable Representations of Multiparameter Persistent Homology Decompositions. Advances in Neural Information Processing Systems, 36, 35774-35798. Accessed 2024-03-19.

Kerber, Michael; Rolle, Alexander (2020-10). Fast Minimal Presentations of Bi-graded Persistence Modules. arXiv:2010.15623 [cs, math]. https://arxiv.org/abs/2010.15623. doi:10.1137/1.9781611976472.16. Accessed 2021-07-21.

Vipond, Oliver (2020). Multiparameter persistence landscapes. Journal of Machine Learning Research, 21, 61:1-61:38.

Rolle, Alexander; Scoccola, Luis (2020-05). Stable and consistent density-based clustering via multiparameter persistence. arXiv.org. doi:10.48550/arXiv.2005.09048. Accessed 2024-04-16.

Flamary, Rémi; Courty, Nicolas; Gramfort, Alexandre; Alaya, Mokhtar Z.; Boisbunon, Aurélie; Chambon, Stanislas; Chapel, Laetitia; Corenflos, Adrien; Fatras, Kilian; Fournier, Nemo; Gautheron, Léo; Gayraud, Nathalie T. H.; Janati, Hicham; Rakotomamonjy, Alain; Redko, Ievgen; Rolet, Antoine; Schutz, Antony; Seguy, Vivien; Sutherland, Danica J.; Tavenard, Romain; Tong, Alexander; Vayer, Titouan (2021-01). POT: Python optimal transport. The Journal of Machine Learning Research, 22(1), 78:3571-78:3578. ISSN 1532-4435.

Charlier, Benjamin; Feydy, Jean; Glaunès, Joan Alexis; Collin, François-David; Durif, Ghislain (2021). Kernel Operations on the GPU, with Autodiff, without Memory Overflows. Journal of Machine Learning Research, 22(74), 1-6. ISSN 1533-7928. Accessed 2024-04-16.

Paszke, Adam; Gross, Sam; Massa, Francisco; Lerer, Adam; Bradbury, James; Chanan, Gregory; Killeen, Trevor; Lin, Zeming; Gimelshein, Natalia; Antiga, Luca; Desmaison, Alban; Köpf, Andreas; Yang, Edward; DeVito, Zach; Raison, Martin; Tejani, Alykhan; Chilamkurthy, Sasank; Steiner, Benoit; Fang, Lu; Bai, Junjie; Chintala, Soumith (2019-12). PyTorch: An imperative style, high-performance deep learning library. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 8026-8037. Curran Associates Inc., Red Hook, NY, USA. Accessed 2024-04-16.

Lesnick, Michael; Wright, Matthew (2015-12). Interactive visualization of 2-D persistence modules. arXiv:1512.00180 [cs, math]. https://arxiv.org/abs/1512.00180. doi:10.48550/arXiv.1512.00180.

Loiseaux, David; Scoccola, Luis; Carrière, Mathieu; Botnan, Magnus Bakke; Oudot, Steve (2023-12). Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures. Advances in Neural Information Processing Systems, 36, 68316-68342. Accessed 2024-03-19.

Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu; Perrot, Matthieu; Duchesnay, Édouard (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825-2830. ISSN 1533-7928. Accessed 2024-04-16.

Botnan, Magnus Bakke; Oppermann, Steffen; Oudot, Steve (2022). Signed Barcodes for Multi-Parameter Persistence via Rank Decompositions. In Goaoc, Xavier; Kerber, Michael (Eds.), 38th International Symposium on Computational Geometry (SoCG 2022), 224, 19:1-19:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany. ISBN 978-3-95977-227-3. ISSN 1868-8969. doi:10.4230/LIPIcs.SoCG.2022.19. Accessed 2022-06-14.

Oudot, Steve; Scoccola, Luis (2024-03). On the Stability of Multigraded Betti Numbers and Hilbert Functions. SIAM Journal on Applied Algebra and Geometry, 8(1), 54-88. Society for Industrial and Applied Mathematics. doi:10.1137/22M1489150. Accessed 2024-05-14.

Scoccola, Luis; Setlur, Siddharth; Loiseaux, David; Carrière, Mathieu; Oudot, Steve (2024-07). Differentiability and Optimization of Multiparameter Persistent Homology. Proceedings of the 41st International Conference on Machine Learning, 235, 43986-44011. PMLR. ISSN 2640-3498. Accessed 2024-10-02.

Krekel, Holger; Oliveira, Bruno; Pfannschmidt, Ronny; Bruynooghe, Floris; Laugher, Brianna; Bruhin, Florian (2004). Pytest 8.3.

Behnel, S.; Bradshaw, R.; Citro, C.; Dalcin, L.; Seljebotn, D. S.; Smith, K. (2011). Cython: The best of both worlds. Computing in Science Engineering, 13(2), 31-39. ISSN 1521-9615. doi:10.1109/MCSE.2010.118.

Robison, Arch D. (2011). Intel Threading Building Blocks (TBB). In Padua, David (Ed.), Encyclopedia of Parallel Computing, 955-964. Springer US, Boston, MA. ISBN 978-0-387-09766-4. doi:10.1007/978-0-387-09766-4_51. Accessed 2024-10-09.