diff --git a/joss.06467/10.21105.joss.06467.crossref.xml b/joss.06467/10.21105.joss.06467.crossref.xml new file mode 100644 index 0000000000..e6f4a78e34 --- /dev/null +++ b/joss.06467/10.21105.joss.06467.crossref.xml @@ -0,0 +1,215 @@ + + + + 20240520T124657-95fc0cee8540ae947a503a9d8f5a1cc15aba0d25 + 20240520124657 + + JOSS Admin + admin@theoj.org + + The Open Journal + + + + + Journal of Open Source Software + JOSS + 2475-9066 + + 10.21105/joss + https://joss.theoj.org + + + + + 05 + 2024 + + + 9 + + 97 + + + + HealpixMPI.jl: an MPI-parallel implementation of the +Healpix tessellation scheme in Julia + + + + Leo A. + Bianchi + https://orcid.org/0009-0002-6351-5426 + + + + 05 + 20 + 2024 + + + 6467 + + + 10.21105/joss.06467 + + + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + + + + Software archive + 10.5281/zenodo.11192548 + + + GitHub review issue + https://github.com/openjournals/joss-reviews/issues/6467 + + + + 10.21105/joss.06467 + https://joss.theoj.org/papers/10.21105/joss.06467 + + + https://joss.theoj.org/papers/10.21105/joss.06467.pdf + + + + + + Healpix.jl: Julia-only port of the HEALPix +library + Tomasi + 2021 + Tomasi, M., & Li, Z. (2021). +Healpix.jl: Julia-only port of the HEALPix library (Version 3.0, p. +ascl:2109.028). + + + COSMOGLOBE DR1 results: I. Improved Wilkinson +microwave anisotropy probe maps through Bayesian end-to-end +analysis + Watts + Astronomy & +Astrophysics + 679 + 10.1051/0004-6361/202346414 + 1432-0746 + 2023 + Watts, D. J., Basyrov, A., Eskilt, J. +R., Galloway, M., Gjerløw, E., Hergt, L. T., Herman, D., Ihle, H. T., +Paradiso, S., Rahman, F., Thommesen, H., Aurlien, R., Bersanelli, M., +Bianchi, L. A., Brilenkov, M., Colombo, L. P. L., Eriksen, H. K., +Franceschet, C., Fuskeland, U., … Zhou, Y. (2023). COSMOGLOBE DR1 +results: I. Improved Wilkinson microwave anisotropy probe maps through +Bayesian end-to-end analysis. 
Astronomy & Astrophysics, 679, +A143. +https://doi.org/10.1051/0004-6361/202346414 + + + HEALPix: A framework for high-resolution +discretization and fast analysis of data distributed on the +sphere + Gorski + The Astrophysical Journal + 2 + 622 + 10.1086/427976 + 2005 + Gorski, K. M., Hivon, E., Banday, A. +J., Wandelt, B. D., Hansen, F. K., Reinecke, M., & Bartelmann, M. +(2005). HEALPix: A framework for high-resolution discretization and fast +analysis of data distributed on the sphere. The Astrophysical Journal, +622(2), 759–771. https://doi.org/10.1086/427976 + + + Power spectrum estimation from +high-resolution maps by Gibbs sampling + Eriksen + The Astrophysical Journal Supplement +Series + 2 + 155 + 10.1086/425219 + 2004 + Eriksen, H. K., O’Dwyer, I. J., +Jewell, J. B., Wandelt, B. D., Larson, D. L., Gorski, K. M., Levin, S., +Banday, A. J., & Lilje, P. B. (2004). Power spectrum estimation from +high-resolution maps by Gibbs sampling. The Astrophysical Journal +Supplement Series, 155(2), 227–241. +https://doi.org/10.1086/425219 + + + Libsharp spherical harmonic transforms +revisited + Reinecke + Astronomy & Astrophysics + 554 + 10.1051/0004-6361/201321494 + 2013 + Reinecke, M., & Seljebotn, D. S. +(2013). Libsharp spherical harmonic transforms revisited. Astronomy +& Astrophysics, 554, A112. +https://doi.org/10.1051/0004-6361/201321494 + + + DUCC + Reinecke + GitLab repository + 2019 + Reinecke, M. (2019). DUCC. In GitLab +repository. GitLab. +https://gitlab.mpcdf.mpg.de/mtr/ducc + + + MPI.jl: Julia bindings for the message +passing interface + Byrne + Proceedings of the JuliaCon +Conferences + 1 + 1 + 10.21105/jcon.00068 + 2021 + Byrne, S., Wilcox, L. C., & +Churavy, V. (2021). MPI.jl: Julia bindings for the message passing +interface. Proceedings of the JuliaCon Conferences, 1(1), 68. +https://doi.org/10.21105/jcon.00068 + + + Euclid preparation: XXVIII.
Modelling of the +weak lensing angular power spectrum + Euclid Collaboration + 2023 + Euclid Collaboration. (2023). Euclid +preparation: XXVIII. Modelling of the weak lensing angular power +spectrum. https://arxiv.org/abs/2302.04507 + + + Almanac: Weak lensing power spectra and map +inference on the masked sphere + Loureiro + The Open Journal of +Astrophysics + 6 + 10.21105/astro.2210.13260 + 2565-6120 + 2023 + Loureiro, A., Whiteaway, L., +Sellentin, E., Lafaurie, J. S., Jaffe, A. H., & Heavens, A. F. +(2023). Almanac: Weak lensing power spectra and map inference on the +masked sphere. The Open Journal of Astrophysics, 6. +https://doi.org/10.21105/astro.2210.13260 + + + + + + diff --git a/joss.06467/10.21105.joss.06467.pdf b/joss.06467/10.21105.joss.06467.pdf new file mode 100644 index 0000000000..a50bf464df Binary files /dev/null and b/joss.06467/10.21105.joss.06467.pdf differ diff --git a/joss.06467/paper.jats/10.21105.joss.06467.jats b/joss.06467/paper.jats/10.21105.joss.06467.jats new file mode 100644 index 0000000000..4a4dc9f088 --- /dev/null +++ b/joss.06467/paper.jats/10.21105.joss.06467.jats @@ -0,0 +1,495 @@ + + +
+ + + + +Journal of Open Source Software +JOSS + +2475-9066 + +Open Journals + + + +6467 +10.21105/joss.06467 + +HealpixMPI.jl: an MPI-parallel implementation of the +Healpix tessellation scheme in Julia + + + +https://orcid.org/0009-0002-6351-5426 + +Bianchi +Leo A. + + + + + + +Dipartimento di Fisica Aldo Pontremoli, Università degli +Studi di Milano, Milan, Italy + + + + +Institute of Theoretical Astrophysics, University of Oslo, +Blindern, Oslo, Norway + + + + +10 +1 +2024 + +9 +97 +6467 + +Authors of papers retain copyright and release the +work under a Creative Commons Attribution 4.0 International License (CC +BY 4.0) +2022 +The article authors + +Authors of papers retain copyright and release the work under +a Creative Commons Attribution 4.0 International License (CC BY +4.0) + + + +Julia +SHT +Healpix +parallel computing +cosmology + + + + + + Summary +

Spherical Harmonic Transforms (SHTs) can be seen as the spherical,
  two-dimensional counterparts of Fourier transforms, mapping real-space
  data to the spectral domain and vice versa. Just as Fourier analysis
  decomposes a function into a set of amplitude coefficients, an SHT
  decomposes any field defined on the sphere into a set of complex
  harmonic coefficients $a_{\ell m}$, commonly referred to as alms, each
  of which quantifies the contribution of the corresponding spherical
  harmonic function $Y_{\ell m}$.
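For reference, the standard definitions behind this decomposition are sketched below, assuming orthonormal spherical harmonics (normalization and phase conventions differ between libraries). The pixel-sum approximation uses the fact that HEALPix pixels all have the same area, $4\pi/N_{\mathrm{pix}}$.

```latex
% Synthesis: map from harmonic coefficients (alm2map direction)
f(\theta,\phi) = \sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell} a_{\ell m}\, Y_{\ell m}(\theta,\phi)

% Analysis: coefficients from the map, approximated on N_pix equal-area pixels
a_{\ell m} = \int_{S^2} f(\theta,\phi)\, Y^{*}_{\ell m}(\theta,\phi)\, \mathrm{d}\Omega
  \;\approx\; \frac{4\pi}{N_{\mathrm{pix}}} \sum_{p=0}^{N_{\mathrm{pix}}-1} f(\theta_p,\phi_p)\, Y^{*}_{\ell m}(\theta_p,\phi_p)
```

For a real field, $a_{\ell,-m} = (-1)^m a^{*}_{\ell m}$, so only coefficients with $m \ge 0$ need to be stored. The adjoint of the synthesis operator (adjoint_alm2map) corresponds, up to normalization conventions, to the pixel sum above without the $4\pi/N_{\mathrm{pix}}$ weight.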

+

SHTs are important for a wide variety of theoretical and practical
  scientific applications, including particle physics, astrophysics, and
  cosmology. However, they are generally computationally expensive
  operations and thus often constitute the bottleneck of the scientific
  software they are part of. For this reason, much effort has been spent
  over the last couple of decades on fast and efficient SHT
  implementations. In this setting, parallel computing naturally comes
  into play, especially for time-consuming software run on large
  High-Performance Computing (HPC) clusters.
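A rough operation count shows where the cost comes from. HEALPix places its pixels on $N_{\mathrm{ring}} = 4N_{\mathrm{side}} - 1$ iso-latitude rings, so an SHT splits into one FFT per ring plus Legendre sums over all $(\ell, m)$ pairs for every ring. With $N_{\mathrm{pix}} = 12N_{\mathrm{side}}^2$ and the common choice $\ell_{\max} \approx 3N_{\mathrm{side}}$, a back-of-the-envelope estimate (constants omitted) is:

```latex
C_{\mathrm{SHT}} \sim
  \underbrace{N_{\mathrm{ring}} \cdot \frac{\ell_{\max}^{2}}{2}}_{\text{Legendre sums}}
  + \underbrace{\mathcal{O}\!\left(N_{\mathrm{pix}} \log N_{\mathrm{side}}\right)}_{\text{ring FFTs}}
  = \mathcal{O}\!\left(\ell_{\max}^{3}\right)
  = \mathcal{O}\!\left(N_{\mathrm{pix}}^{3/2}\right)
```

Doubling $N_{\mathrm{side}}$ (and with it $\ell_{\max}$) therefore makes each transform roughly eight times more expensive.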

+

The Julia package HealpixMPI.jl is an
  extension package of Healpix.jl
  (Tomasi
  & Li, 2021) that efficiently parallelizes its SHT-related
  functionalities. Healpix.jl is a Julia-only
  implementation of the HEALPix
  (Gorski
  et al., 2005) library, which provides one of the most widely used
  tessellation schemes of the two-sphere, together with a set of
  SHT-related functions.

+

The main goal of the Julia package presented in this paper, + HealpixMPI.jl, is to efficiently employ a large + number of computing cores to perform fast spherical harmonic + transforms. This paper presents the key features implemented to + achieve this, together with a statement of need and the results of a + parallel scaling test.

+

+ +

HealpixMPI.jl’s logo +

+ +
+
+ + Statement of need +

Among their many applications, spherical harmonic transforms are
  highly relevant to several cosmological research topics, e.g.,
  Loureiro et al.
  (2023)
  and Euclid Collaboration
  (2023).
  In particular, SHTs are essential for the analysis of the cosmic
  microwave background (CMB) radiation, one of the most active research
  areas in cosmology. CMB radiation is, in fact, conveniently described
  as a temperature (and polarization) field on the celestial sphere,
  which makes spherical harmonics the most natural mathematical tool for
  analyzing its measured signal. From a computational point of view, on
  the other hand, CMB field measurements must be discretized, which
  requires a mathematically consistent pixelization of the sphere and of
  the functions defined on it. This is exactly what HEALPix was designed
  for when it was released more than two decades ago, and it quickly
  became the standard library for numerical CMB analysis. HEALPix can,
  of course, be used for a much wider variety of applications, but its
  bond with CMB analysis has always been particularly strong, especially
  given the research focus of its main authors. Not surprisingly, the
  cosmic microwave background is also the research context in which
  HealpixMPI.jl was born.

+

SHTs are often the computational bottleneck of CMB data analysis
  pipelines, such as the one implemented by Cosmoglobe
  (Watts
  et al., 2023) based on the software Commander
  (Eriksen
  et al., 2004). Given the rapidly growing amount of data
  produced by the most recent observational experiments, efficient
  algorithms alone are no longer enough to perform SHTs within
  acceptable run times, and a parallel architecture is required.
  In the specific case of Cosmoglobe and Commander, the goal for the
  coming years is to run a full pipeline, and thus the SHTs performed
  in it, efficiently on large HPC clusters employing at least
  $10^{4}$ cores.

+

To achieve this, an implementation of massively parallel spherical
  harmonic transforms that goes beyond the limits of a single machine
  is needed. HealpixMPI.jl was conceived as a
  contribution to Cosmoglobe’s pipeline targeting exactly this
  goal.

+
+ + The latest SHT engine: DUCC +

As of the time this paper was submitted,
  Healpix.jl relied on the SHTs provided by the C
  library libsharp
  (Reinecke
  & Seljebotn, 2013). However,
  libsharp’s development ceased a few years ago,
  and its functionalities have been included as an SHT sub-module in
  DUCC
  (Reinecke,
  2019), an acronym for “Distinctively Useful Code
  Collection.”

+

HealpixMPI.jl was developed at about the same
  time as a Julia interface for DUCC, a fortunate
  timing that allowed HealpixMPI.jl to be up to
  date with the state of the art of spherical harmonic transforms from
  its first release. DUCC’s code is derived
  directly from libsharp, but has been
  significantly enhanced with the latest algorithmic improvements, as
  well as with standard C++ multithreading for the
  shared-memory parallelization of the core operations.

+
+ + Hybrid parallelization of the SHT +

To run SHTs on a large number of cores, i.e., on an HPC cluster, + HealpixMPI.jl provides a hybrid parallel + design, based on simultaneous usage of multithreading and MPI, for + shared- and distributed-memory parallelization respectively, as shown + in + [fig:hybrid].

+ +

Multi-node computing cluster representation. The optimal + way to parallelize operations such as the SHTs on a cluster of + computers is to employ MPI to share the computation + between the available nodes, assigning one MPI task + per node, and multithreading to parallelize within + each node, involving as many CPUs as locally available. Figure taken + from www.comsol.com. +

+ +
+

In the case of HealpixMPI.jl, native C++
  multithreading is provided by default by DUCC
  for its spherical harmonic transforms, while the MPI interface is
  written entirely in Julia and based on the package
  MPI.jl
  (Byrne et
  al., 2021).

+

Moreover, MPI parallelization requires the data to be distributed
  across the MPI tasks. As shown in the usage examples, this is
  implemented by mirroring Healpix.jl’s data types
  with two new distributed types,
  DAlm and DMap, which encode
  the harmonic coefficients and the pixelized representation of the
  spherical field, respectively.
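The exact layout used by DMap is not described here, so the Python sketch below is not HealpixMPI.jl code; it only illustrates, under stated assumptions, one plausible way to split a map across MPI tasks by whole iso-latitude rings. It assumes that each northern ring travels with its southern mirror (the two share $|\cos\theta|$, so their Legendre terms coincide up to a sign) and that ring pairs are dealt out round-robin. The function names are hypothetical and no MPI calls are made.

```python
def ring_npix(nside: int, ring: int) -> int:
    """Pixels on HEALPix ring `ring` (1-based, counted from the north pole)."""
    if not 1 <= ring <= 4 * nside - 1:
        raise ValueError(f"ring must be in 1..{4 * nside - 1}")
    if ring < nside:                  # north polar cap
        return 4 * ring
    if ring <= 3 * nside:             # equatorial belt
        return 4 * nside
    return 4 * (4 * nside - ring)     # south polar cap, mirror of the north


def distribute_rings(nside: int, ntasks: int) -> list[list[int]]:
    """Hypothetical round-robin split of HEALPix rings over `ntasks` MPI tasks.

    Each northern ring r (1..2*nside) is kept together with its southern
    mirror 4*nside - r; the equator (r = 2*nside) is its own mirror.
    """
    assignment: list[list[int]] = [[] for _ in range(ntasks)]
    for k, ring in enumerate(range(1, 2 * nside + 1)):
        task = k % ntasks                 # deal ring pairs out in turn
        assignment[task].append(ring)
        mirror = 4 * nside - ring
        if mirror != ring:
            assignment[task].append(mirror)
    return assignment


def pixels_per_task(nside: int, assignment: list[list[int]]) -> list[int]:
    """Local map size of each task under a given ring assignment."""
    return [sum(ring_npix(nside, r) for r in rings) for rings in assignment]


if __name__ == "__main__":
    layout = distribute_rings(nside=8, ntasks=3)
    print(pixels_per_task(8, layout))
```

Round-robin dealing is one simple way to balance the load: giving each task a contiguous block of rings would leave the tasks holding the polar caps with far fewer pixels than those holding the equatorial belt.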

+
+ + Usage example +

A usage example with all the necessary steps to set up and perform
  an MPI-parallel alm2map SHT can be found on the
  front page of HealpixMPI.jl’s
  repository.

+

In addition, refer to + Jommander, + a parallel and Julia-only CMB Gibbs Sampler, for an example of code + based on HealpixMPI.jl.

+
+ + Scaling results +

This section presents the results of parallel benchmark tests
  of HealpixMPI.jl. In particular, a
  strong-scaling scenario is analyzed: for a problem of fixed size, the
  wall time is measured as the number of cores used in the computation
  increases.

+

Obtaining reliable wall-time measurements of massively parallel
  SHTs is nontrivial, especially for tests employing a large number of
  cores: intermittent operating-system activity (so-called jitter) can
  significantly distort the measurement of short time scales. For this
  reason, the benchmark tests were carried out by timing a batch of 20
  alm2map + adjoint_alm2map SHT pairs. For
  reference, the scaling shown here refers to unpolarized SHTs with
  $N_{\mathrm{side}} = 4096$ and $\ell_{\max} = 12287$, and the tests
  were run on the
  Hyades
  cluster of the University of Oslo. The benchmark results
  are quantified as the wall time multiplied by the total number of
  cores, shown in a 3D plot
  ([fig:bench]) as a
  function of the number of local threads and MPI tasks (always one per
  node).
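To give a sense of the problem size, the Python sketch below computes the number of pixels, rings, and stored alms for the benchmark configuration. Note that $12287 = 3 \cdot 4096 - 1$, the common choice $\ell_{\max} = 3N_{\mathrm{side}} - 1$. The storage types (64-bit real map values, 128-bit complex coefficients) and $m_{\max} = \ell_{\max}$ are assumptions, not details stated in the benchmark description; the function name is hypothetical.

```python
from typing import Optional


def healpix_problem_size(nside: int, lmax: int, mmax: Optional[int] = None) -> dict:
    """Sizes of an unpolarized HEALPix map and its alm set (m >= 0 only)."""
    if mmax is None:
        mmax = lmax
    npix = 12 * nside**2                  # HEALPix pixel count
    nrings = 4 * nside - 1                # iso-latitude rings
    # Real field: a_{l,-m} follows from a_{l,m}, so only m = 0..mmax, l = m..lmax are stored.
    nalm = (mmax + 1) * (lmax + 1) - mmax * (mmax + 1) // 2
    return {
        "npix": npix,
        "nrings": nrings,
        "nalm": nalm,
        # Assumed storage: 8-byte real map values, 16-byte complex coefficients.
        "map_bytes": 8 * npix,
        "alm_bytes": 16 * nalm,
    }


if __name__ == "__main__":
    sizes = healpix_problem_size(nside=4096, lmax=3 * 4096 - 1)  # lmax = 12287
    for key, value in sizes.items():
        print(f"{key:>9}: {value:,}")
```

For $N_{\mathrm{side}} = 4096$ this gives about $2.0 \times 10^{8}$ pixels and $7.6 \times 10^{7}$ stored coefficients per transform, before any MPI distribution.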

+ +

The measured wall time multiplied by the total number of
  cores used, plotted as a function of the number of local threads and
  MPI tasks. The total number of cores corresponding to each column is
  the product of these two quantities.

+ +
+
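Under ideal strong scaling, doubling the number of cores halves the wall time, so the plotted product (wall time times cores) stays constant and the surface is flat. The hypothetical helpers below compute that metric and a relative efficiency against a reference run; they take measured timings as inputs and contain no benchmark data.

```python
def core_seconds(wall_time_s: float, nodes: int, threads_per_node: int) -> float:
    """Metric plotted in the benchmark figure: wall time times total cores.

    Assumes one MPI task per node, each running `threads_per_node` threads.
    """
    if nodes < 1 or threads_per_node < 1:
        raise ValueError("nodes and threads_per_node must be >= 1")
    return wall_time_s * nodes * threads_per_node


def strong_scaling_efficiency(ref_cost: float, cost: float) -> float:
    """Relative efficiency: 1.0 means ideal strong scaling w.r.t. the reference.

    With a fixed problem size, ideal scaling keeps core-seconds constant, so
    any growth in core-seconds over the reference shows up as a value < 1.
    """
    return ref_cost / cost
```

Taking the single-core run as reference, a configuration whose core-seconds are twice the reference value has a strong-scaling efficiency of 0.5.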

When increasing the number of threads on a single node, for which
  no MPI communication is needed, the scaling is nearly ideal up to
  50 cores. For 60 or more local threads we start observing a slight
  slowdown, probably caused by many threads simultaneously accessing
  the same memory and hitting its bandwidth limit.

+

When switching to a multi-node setup we introduce, as expected, an
  overhead due to the necessary MPI communication, whose size
  unfortunately remains constant as the number of local threads
  increases. This produces the ramp-like shape of the plot along the
  “local threads” axis. However, the overhead does shrink, although not
  perfectly, as the number of nodes increases, since the size of the
  locally stored data decreases linearly. This is reflected in the
  relatively flat shape of the plot along the “nodes” axis.
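This ramp-and-plateau shape can be reproduced by a deliberately crude toy model (an illustration, not a fit to the measurements). Assume that, for $n \ge 2$ nodes with $t$ threads each, the SHT work $W$ (in core-seconds) parallelizes perfectly over all $n\,t$ cores, while the MPI communication time is proportional to the locally stored data, hence to $1/n$, and does not shrink with more threads:

```latex
% Toy model: n = nodes (one MPI task each), t = threads per node,
% W = total work in core-seconds, C = communication constant (both hypothetical)
T(n, t) \approx \frac{W}{n\,t} + \frac{C}{n}
\qquad\Longrightarrow\qquad
n\,t\,T(n, t) \approx W + C\,t
```

In this model the plotted quantity grows linearly with the number of local threads (the ramp) and is independent of the number of nodes (the flat direction); the measured overhead shrinks less than perfectly with $n$, so the real surface only approximately follows this picture.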

+
+ + Acknowledgements +

The development of HealpixMPI.jl, which is
  part of my master’s thesis, has been funded by the University of Milan
  through a “Thesis Abroad Grant.” Moreover, I acknowledge significant
  contributions to this project from Maurizio Tomasi, Martin Reinecke,
  Hans Kristian Eriksen, and Sigurd Næss, as well as the support I
  received from all the members of the Cosmoglobe collaboration during
  my stay at the Institute of Theoretical Astrophysics of the University
  of Oslo.

+
+ + + + + + + TomasiMaurizio + LiZack + + Healpix.jl: Julia-only port of the HEALPix library + 202109 + ascl:2109.028 + + + + + + + WattsD. J. + BasyrovA. + EskiltJ. R. + GallowayM. + GjerløwE. + HergtL. T. + HermanD. + IhleH. T. + ParadisoS. + RahmanF. + ThommesenH. + AurlienR. + BersanelliM. + BianchiL. A. + BrilenkovM. + ColomboL. P. L. + EriksenH. K. + FranceschetC. + FuskelandU. + HensleyB. + HoerningG. A. + LeeK. + LundeJ. G. S. + MarinsA. + NervalS. K. + PatelS. K. + RegnierM. + SanM. + SanyalS. + StutzerN.-O. + VermaA. + WehusI. K. + ZhouY. + + COSMOGLOBE DR1 results: I. Improved Wilkinson microwave anisotropy probe maps through Bayesian end-to-end analysis + Astronomy & Astrophysics + EDP Sciences + 202311 + 679 + 1432-0746 + http://dx.doi.org/10.1051/0004-6361/202346414 + 10.1051/0004-6361/202346414 + A143 + + + + + + + GorskiK. M. + HivonE. + BandayA. J. + WandeltB. D. + HansenF. K. + ReineckeM. + BartelmannM. + + HEALPix: A framework for high-resolution discretization and fast analysis of data distributed on the sphere + The Astrophysical Journal + American Astronomical Society + 200504 + 622 + 2 + https://doi.org/10.1086%2F427976 + 10.1086/427976 + 759 + 771 + + + + + + EriksenH. K. + O’DwyerI. J. + JewellJ. B. + WandeltB. D. + LarsonD. L. + GorskiK. M. + LevinS. + BandayA. J. + LiljeP. B. + + Power spectrum estimation from high-resolution maps by Gibbs sampling + The Astrophysical Journal Supplement Series + American Astronomical Society + 200412 + 155 + 2 + https://doi.org/10.1086 + 10.1086/425219 + 227 + 241 + + + + + + ReineckeM. + SeljebotnD. S. + + Libsharp spherical harmonic transforms revisited + Astronomy & Astrophysics + EDP Sciences + 201306 + 554 + https://doi.org/10.1051%2F0004-6361%2F201321494 + 10.1051/0004-6361/201321494 + A112 + + + + + + + ReineckeM. + + DUCC + GitLab repository + GitLab + 2019 + https://gitlab.mpcdf.mpg.de/mtr/ducc + + + + + + ByrneSimon + WilcoxLucas C. 
+ ChuravyValentin + + MPI.jl: Julia bindings for the message passing interface + Proceedings of the JuliaCon Conferences + The Open Journal + 2021 + 1 + 1 + https://doi.org/10.21105/jcon.00068 + 10.21105/jcon.00068 + 68 + + + + + + + Euclid Collaboration + + Euclid preparation: XXVIII. Modelling of the weak lensing angular power spectrum + 2023 + https://arxiv.org/abs/2302.04507 + + + + + + LoureiroArthur + WhiteawayLorne + SellentinElena + LafaurieJavier Silva + JaffeAndrew H. + HeavensAlan F. + + Almanac: Weak lensing power spectra and map inference on the masked sphere + The Open Journal of Astrophysics + Maynooth University + 202302 + 6 + 2565-6120 + http://dx.doi.org/10.21105/astro.2210.13260 + 10.21105/astro.2210.13260 + + + + +
diff --git a/joss.06467/paper.jats/3DBench.png b/joss.06467/paper.jats/3DBench.png new file mode 100644 index 0000000000..3c1fe8a927 Binary files /dev/null and b/joss.06467/paper.jats/3DBench.png differ diff --git a/joss.06467/paper.jats/hybrid_parallel.png b/joss.06467/paper.jats/hybrid_parallel.png new file mode 100644 index 0000000000..a9428ea6e0 Binary files /dev/null and b/joss.06467/paper.jats/hybrid_parallel.png differ diff --git a/joss.06467/paper.jats/logo.png b/joss.06467/paper.jats/logo.png new file mode 100644 index 0000000000..35402c97a8 Binary files /dev/null and b/joss.06467/paper.jats/logo.png differ