diff --git a/joss.06467/10.21105.joss.06467.crossref.xml b/joss.06467/10.21105.joss.06467.crossref.xml
new file mode 100644
index 0000000000..e6f4a78e34
--- /dev/null
+++ b/joss.06467/10.21105.joss.06467.crossref.xml
@@ -0,0 +1,215 @@
+
+
+
+ 20240520T124657-95fc0cee8540ae947a503a9d8f5a1cc15aba0d25
+ 20240520124657
+
+ JOSS Admin
+ admin@theoj.org
+
+ The Open Journal
+
+
+
+
+ Journal of Open Source Software
+ JOSS
+ 2475-9066
+
+ 10.21105/joss
+ https://joss.theoj.org
+
+
+
+
+ 05
+ 2024
+
+
+ 9
+
+ 97
+
+
+
+ HealpixMPI.jl: an MPI-parallel implementation of the
+Healpix tessellation scheme in Julia
+
+
+
+ Leo A.
+ Bianchi
+ https://orcid.org/0009-0002-6351-5426
+
+
+
+ 05
+ 20
+ 2024
+
+
+ 6467
+
+
+ 10.21105/joss.06467
+
+
+ http://creativecommons.org/licenses/by/4.0/
+ http://creativecommons.org/licenses/by/4.0/
+ http://creativecommons.org/licenses/by/4.0/
+
+
+
+ Software archive
+ 10.5281/zenodo.11192548
+
+
+ GitHub review issue
+ https://github.com/openjournals/joss-reviews/issues/6467
+
+
+
+ 10.21105/joss.06467
+ https://joss.theoj.org/papers/10.21105/joss.06467
+
+
+ https://joss.theoj.org/papers/10.21105/joss.06467.pdf
+
+
+
+
+
+ Healpix.jl: Julia-only port of the HEALPix
+library
+ Tomasi
+ 2021
+ Tomasi, M., & Li, Z. (2021).
+Healpix.jl: Julia-only port of the HEALPix library (Version 3.0, p.
+ascl:2109.028).
+
+
+ COSMOGLOBE DR1 results: I. Improved Wilkinson
+microwave anisotropy probe maps through Bayesian end-to-end
+analysis
+ Watts
+ Astronomy &
+Astrophysics
+ 679
+ 10.1051/0004-6361/202346414
+ 1432-0746
+ 2023
+ Watts, D. J., Basyrov, A., Eskilt, J.
+R., Galloway, M., Gjerløw, E., Hergt, L. T., Herman, D., Ihle, H. T.,
+Paradiso, S., Rahman, F., Thommesen, H., Aurlien, R., Bersanelli, M.,
+Bianchi, L. A., Brilenkov, M., Colombo, L. P. L., Eriksen, H. K.,
+Franceschet, C., Fuskeland, U., … Zhou, Y. (2023). COSMOGLOBE DR1
+results: I. Improved Wilkinson microwave anisotropy probe maps through
+Bayesian end-to-end analysis. Astronomy & Astrophysics, 679,
+A143.
+https://doi.org/10.1051/0004-6361/202346414
+
+
+ HEALPix: A framework for high-resolution
+discretization and fast analysis of data distributed on the
+sphere
+ Gorski
+ The Astrophysical Journal
+ 2
+ 622
+ 10.1086/427976
+ 2005
+ Gorski, K. M., Hivon, E., Banday, A.
+J., Wandelt, B. D., Hansen, F. K., Reinecke, M., & Bartelmann, M.
+(2005). HEALPix: A framework for high-resolution discretization and fast
+analysis of data distributed on the sphere. The Astrophysical Journal,
+622(2), 759–771. https://doi.org/10.1086/427976
+
+
+ Power spectrum estimation from
+high-resolution maps by Gibbs sampling
+ Eriksen
+ The Astrophysical Journal Supplement
+Series
+ 2
+ 155
+ 10.1086/425219
+ 2004
+ Eriksen, H. K., O’Dwyer, I. J.,
+Jewell, J. B., Wandelt, B. D., Larson, D. L., Gorski, K. M., Levin, S.,
+Banday, A. J., & Lilje, P. B. (2004). Power spectrum estimation from
+high-resolution maps by Gibbs sampling. The Astrophysical Journal
+Supplement Series, 155(2), 227–241.
+https://doi.org/10.1086/425219
+
+
+ Libsharp spherical harmonic transforms
+revisited
+ Reinecke
+ Astronomy & Astrophysics
+ 554
+ 10.1051/0004-6361/201321494
+ 2013
+ Reinecke, M., & Seljebotn, D. S.
+(2013). Libsharp spherical harmonic transforms revisited. Astronomy
+& Astrophysics, 554, A112.
+https://doi.org/10.1051/0004-6361/201321494
+
+
+ DUCC
+ Reinecke
+ GitLab repository
+ 2019
+ Reinecke, M. (2019). DUCC. In GitLab
+repository. GitLab.
+https://gitlab.mpcdf.mpg.de/mtr/ducc
+
+
+ MPI.jl: Julia bindings for the message
+passing interface
+ Byrne
+ Proceedings of the JuliaCon
+Conferences
+ 1
+ 1
+ 10.21105/jcon.00068
+ 2021
+ Byrne, S., Wilcox, L. C., &
+Churavy, V. (2021). MPI.jl: Julia bindings for the message passing
+interface. Proceedings of the JuliaCon Conferences, 1(1), 68.
+https://doi.org/10.21105/jcon.00068
+
+
+ Euclid preparation: XXVIII. Modelling of the
+weak lensing angular power spectrum
+ Euclid Collaboration
+ 2023
+ Euclid Collaboration. (2023). Euclid
+preparation: XXVIII. Modelling of the weak lensing angular power
+spectrum. https://arxiv.org/abs/2302.04507
+
+
+ Almanac: Weak lensing power spectra and map
+inference on the masked sphere
+ Loureiro
+ The Open Journal of
+Astrophysics
+ 6
+ 10.21105/astro.2210.13260
+ 2565-6120
+ 2023
+ Loureiro, A., Whiteaway, L.,
+Sellentin, E., Lafaurie, J. S., Jaffe, A. H., & Heavens, A. F.
+(2023). Almanac: Weak lensing power spectra and map inference on the
+masked sphere. The Open Journal of Astrophysics, 6.
+https://doi.org/10.21105/astro.2210.13260
+
+
+
+
+
+
diff --git a/joss.06467/10.21105.joss.06467.pdf b/joss.06467/10.21105.joss.06467.pdf
new file mode 100644
index 0000000000..a50bf464df
Binary files /dev/null and b/joss.06467/10.21105.joss.06467.pdf differ
diff --git a/joss.06467/paper.jats/10.21105.joss.06467.jats b/joss.06467/paper.jats/10.21105.joss.06467.jats
new file mode 100644
index 0000000000..4a4dc9f088
--- /dev/null
+++ b/joss.06467/paper.jats/10.21105.joss.06467.jats
@@ -0,0 +1,495 @@
+
+
+
+
+
+
+
+Journal of Open Source Software
+JOSS
+
+2475-9066
+
+Open Journals
+
+
+
+6467
+10.21105/joss.06467
+
+HealpixMPI.jl: an MPI-parallel implementation of the
+Healpix tessellation scheme in Julia
+
+
+
+https://orcid.org/0009-0002-6351-5426
+
+Bianchi
+Leo A.
+
+
+
+
+
+
+Dipartimento di Fisica Aldo Pontremoli, Università degli
+Studi di Milano, Milan, Italy
+
+
+
+
+Institute of Theoretical Astrophysics, University of Oslo,
+Blindern, Oslo, Norway
+
+
+
+
+10
+1
+2024
+
+9
+97
+6467
+
+Authors of papers retain copyright and release the
+work under a Creative Commons Attribution 4.0 International License (CC
+BY 4.0)
+2022
+The article authors
+
+Authors of papers retain copyright and release the work under
+a Creative Commons Attribution 4.0 International License (CC BY
+4.0)
+
+
+
+Julia
+SHT
+Healpix
+parallel computing
+cosmology
+
+
+
+
+
+ Summary
+
Spherical Harmonic Transforms (SHTs) can be seen as the spherical,
+ two-dimensional counterparts of Fourier transforms, casting
+ real-space data to the spectral domain and vice versa. As in Fourier
+ analysis, where a function is decomposed into a set of amplitude
+ coefficients, an SHT allows any field defined on the sphere
+ in real space to be decomposed into a set of complex harmonic
+ coefficients
+
+ aℓ,m,
+ commonly referred to as alms, where each quantifies the contribution
+ of the corresponding spherical harmonic function.
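+
For reference, the decomposition described above can be written
+ explicitly. The notation below is an assumption, since the paper does
+ not fix one: it uses orthonormal spherical harmonics, a band limit
+ ℓmax, and a band-limited field f.
+
```latex
f(\theta, \varphi) = \sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell}
    a_{\ell m}\, Y_{\ell m}(\theta, \varphi),
\qquad
a_{\ell m} = \int_{S^2} f(\theta, \varphi)\,
    Y^{*}_{\ell m}(\theta, \varphi)\, \mathrm{d}\Omega
```
+
The first expression is the synthesis step (harmonic coefficients to
+ real space) and the second is the analysis step (real space to
+ harmonic coefficients).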
+
SHTs are important for a wide variety of theoretical and practical
+ scientific applications, including particle physics, astrophysics, and
+ cosmology. However, SHTs are generally computationally expensive
+ operations and thus often constitute the bottleneck
+ of the scientific software they are part of. For this reason, much
+ effort has been spent over the last couple of decades to obtain fast
+ and efficient SHT implementations. In such a setting, parallel
+ computing naturally comes into play, especially for time-consuming
+ software to be run on large High-Performance Computing (HPC)
+ clusters.
+
The Julia package HealpixMPI.jl constitutes
+ an extension package of Healpix.jl
+ (Tomasi
+ & Li, 2021), efficiently parallelizing its SHT-related
+ functionalities. Healpix.jl is a Julia-only
+ implementation of the HEALPix
+ (Gorski
+ et al., 2005) library, which provides one of the most used
+ two-sphere tessellation schemes and a series of SHT-related
+ functions.
+
The main goal of the Julia package presented in this paper,
+ HealpixMPI.jl, is to efficiently employ a large
+ number of computing cores to perform fast spherical harmonic
+ transforms. This paper presents the key features implemented to
+ achieve this, together with a statement of need and the results of a
+ parallel scaling test.
+
+
+
HealpixMPI.jl’s logo
+
+
+
+
+
+ Statement of need
+
Besides a variety of other applications, spherical harmonic
+ transforms are highly relevant to several cosmological research
+ topics; see, e.g., Loureiro et al.
+ (2023)
+ and Euclid Collaboration
+ (2023).
+ Among those, SHTs are essential for the analysis of cosmic microwave
+ background (CMB) radiation, which is one of the most active cosmology
+ research areas. CMB radiation is, in fact, very conveniently described
+ as a temperature (and polarization) field on the celestial sphere,
+ making spherical harmonics the most natural mathematical tool for
+ analyzing its measured signal. On the other hand, from a computational
+ point of view, CMB field measurements need, of course, to be
+ discretized, requiring a mathematically consistent pixelization of the
+ sphere and the functions defined on it. This is exactly the goal
+ HEALPix was targeting when it was released more than two decades ago;
+ it quickly became the standard library for CMB numerical analysis.
+ HEALPix code can be, of course, used for a wider variety of
+ applications, but its bond with CMB analysis has always been
+ particularly strong, especially given the research focus of its main
+ authors. Not surprisingly, the cosmic microwave background is also the
+ research context wherein HealpixMPI.jl was
+ born.
+
SHTs are often the computational bottleneck of CMB data analysis
+ pipelines, such as the one implemented by Cosmoglobe
+ (Watts
+ et al., 2023) based on the software Commander
+ (Eriksen
+ et al., 2004). Given the significantly increasing amount of
+ data produced by the most recent observational experiments, efficient
+ algorithms alone are no longer enough to perform SHTs within
+ acceptable run times, and a parallel architecture must be implemented.
+ In the specific case of Cosmoglobe and Commander, the goal for the
+ coming years is to be able to run a full pipeline, and thus the SHTs
+ performed in it, on large HPC clusters, efficiently
+ employing at least
+
+ 10^4
+ cores.
+
To achieve this, an implementation of massively parallel spherical
+ harmonic transforms beyond machine-size limitations is unavoidably
+ needed. The concept of HealpixMPI.jl was born
+ as a contribution to Cosmoglobe’s pipeline targeting this exact
+ goal.
+
+
+ The latest SHT engine: DUCC
+
As of the time this paper was submitted,
+ Healpix.jl relied on the SHTs provided by the C
+ library libsharp
+ (Reinecke
+ & Seljebotn, 2013). However,
+ libsharp’s development ceased a few years ago,
+ and its functionalities have been included as an SHT sub-module in
+ DUCC
+ (Reinecke,
+ 2019), an acronym for “Distinctively Useful Code
+ Collection.”
+
The timing between the development of
+ HealpixMPI.jl and a Julia interface for
+ DUCC has been quite fortunate. This allowed
+ HealpixMPI.jl to be up-to-date with the state
+ of the art of spherical harmonics upon its first release. In fact,
+ DUCC’s code is derived directly from
+ libsharp, but has been significantly enhanced
+ with the latest algorithmic improvements as well as the employment
+ of standard C++ multithreading for shared-memory
+ parallelization of the core operations.
+
+
+ Hybrid parallelization of the SHT
+
To run SHTs on a large number of cores, i.e., on an HPC cluster,
+ HealpixMPI.jl provides a hybrid parallel
+ design, based on simultaneous usage of multithreading and MPI, for
+ shared- and distributed-memory parallelization respectively, as shown
+ in
+ [fig:hybrid].
+
+
Multi-node computing cluster representation. The optimal
+ way to parallelize operations such as the SHTs on a cluster of
+ computers is to employ MPI to share the computation
+ between the available nodes, assigning one MPI task
+ per node, and multithreading to parallelize within
+ each node, involving as many CPUs as locally available. Figure taken
+ from www.comsol.com.
+
+
+
+
In the case of HealpixMPI.jl, native C++ multithreading is
+ provided by DUCC for its spherical harmonic
+ transforms by default, while the MPI interface is entirely coded in
+ Julia and based on the package MPI.jl
+ (Byrne et
+ al., 2021).
+
Moreover, the MPI parallelization requires data to be distributed
+ across the MPI tasks. As shown in the usage examples, this is
+ implemented by mirroring Healpix.jl’s classes
+ with two new distributed data types:
+ DAlm and DMap, encoding
+ the harmonic coefficients and a pixelized representation of the
+ spherical field respectively.
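+
How the coefficients are split across MPI tasks is not detailed in
+ this section. As a minimal sketch, assuming a round-robin assignment
+ of the m values to tasks (an illustrative assumption, with
+ hypothetical function names that are not HealpixMPI.jl's API), the
+ following Python snippet shows how such a distribution could be
+ computed and how many coefficients each task would hold.
+
```python
def round_robin_ms(mmax, n_tasks, rank):
    """m values owned by task `rank` under a round-robin split (assumed scheme)."""
    return list(range(rank, mmax + 1, n_tasks))


def local_num_alm(lmax, ms):
    """Number of a_lm coefficients stored locally: for each m, l runs from m to lmax."""
    return sum(lmax - m + 1 for m in ms)


if __name__ == "__main__":
    lmax, n_tasks = 10, 4  # small hypothetical example
    for rank in range(n_tasks):
        ms = round_robin_ms(lmax, n_tasks, rank)
        print(rank, ms, local_num_alm(lmax, ms))
```
+
Interleaving the m values in this way gives each task a mix of low
+ and high m, so the per-task coefficient counts stay roughly balanced
+ even though low m values carry more coefficients than high ones.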
+
+
+ Usage example
+
A usage example with all the necessary steps to set up and perform
+ an MPI-parallel alm2map SHT can be found on the
+ front page of HealpixMPI.jl’s
+ repository.
+
In addition, refer to
+ Jommander,
+ a parallel and Julia-only CMB Gibbs Sampler, for an example of code
+ based on HealpixMPI.jl.
+
+
+ Scaling results
+
This section shows the results of parallel benchmark tests
+ conducted on HealpixMPI.jl. In particular, a
+ strong-scaling scenario is analyzed: given a problem of fixed size,
+ the wall time improvement is measured as the number of cores exploited
+ in the computation is increased.
+
Obtaining a reliable measurement of massively parallel spherical
+ harmonics wall time is certainly nontrivial, especially for tests
+ employing a high number of cores; intermittent operating system
+ activity (also known as jitter) can significantly distort the
+ measurement of short time scales. For this reason, the benchmark
+ tests were carried out by timing a batch of 20 alm2map +
+ adjoint_alm2map SHT pairs. For reference, the
+ tests shown here used unpolarized spherical harmonics with
+
+
+ Nside=4096
+ and
+
+ ℓmax=12287
+ and were carried out on the
+ Hyades
+ cluster of the University of Oslo. The benchmark results
+ are quantified as the wall time multiplied by the total number of
+ cores, shown in a 3D plot
+ ([fig:bench]) as a
+ function of the number of local threads and MPI tasks (always one per
+ node).
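+
To give a sense of the problem size behind these parameters, the
+ following Python sketch computes the number of HEALPix pixels for a
+ given Nside, the number of harmonic coefficients for a given ℓmax
+ (assuming mmax = ℓmax and storage of m ≥ 0 only, which the paper does
+ not state explicitly), and the wall time multiplied by the number of
+ cores used as the benchmark metric. The function names are
+ hypothetical.
+
```python
from typing import Optional


def healpix_npix(nside: int) -> int:
    """Number of pixels in a HEALPix map with resolution parameter nside."""
    return 12 * nside * nside


def num_alm(lmax: int, mmax: Optional[int] = None) -> int:
    """Number of a_lm coefficients with 0 <= m <= mmax and m <= l <= lmax."""
    if mmax is None:
        mmax = lmax  # assumption: no separate cut on m
    return (mmax + 1) * (lmax + 1) - mmax * (mmax + 1) // 2


def core_time(wall_time_s: float, n_tasks: int, n_threads: int) -> float:
    """Wall time multiplied by the total number of cores (tasks x threads)."""
    return wall_time_s * n_tasks * n_threads


if __name__ == "__main__":
    print(healpix_npix(4096))  # pixels for Nside = 4096
    print(num_alm(12287))      # coefficients for lmax = 12287
```
+
With ideal strong scaling the core-time product stays constant as
+ cores are added, so any growth in this quantity directly measures
+ parallel overhead.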
+
+
The measured wall time is multiplied by the total number
+ of cores used and plotted as a function of the number of local
+ threads and MPI tasks used. The total number of cores corresponding
+ to each column is given by the product of these two quantities.
+
+
+
+
When increasing the number of threads on a single node, for which no MPI
+ communication is needed, the scaling is nearly ideal up to
+
+
+ ∼50
+ cores. With 60 or more local threads we start observing a slight
+ slowdown, probably caused by the many threads simultaneously trying to
+ access the same memory and hitting its bandwidth limit.
+
When switching to a multi-node setup, we introduce, as expected,
+ an overhead due to the necessary MPI communication, whose size,
+ unfortunately, remains constant as we increase the number of local
+ threads. This leads to the ramp-like shape along the “local threads”
+ axis shown by the plot. However, the overhead scales down, even
+ if not perfectly, when we increase the number of nodes, as the size of
+ the locally stored data decreases linearly. This is shown by the
+ relatively flat shape of the plot along the “nodes” axis.
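+
The qualitative shape described above can be reproduced with a toy
+ cost model. The Python sketch below is an idealized illustration and
+ not a fit to the measured data: the compute and communication costs
+ are made-up numbers, the communication term is assumed to scale
+ exactly as one over the number of nodes, and memory-bandwidth effects
+ are ignored.
+
```python
def model_wall_time(n_nodes, n_threads, t_compute=100.0, t_comm=5.0):
    """Toy wall time: perfectly parallel compute plus a per-node communication cost.

    t_compute and t_comm are hypothetical values in arbitrary time units.
    """
    compute = t_compute / (n_nodes * n_threads)
    # Communication is only needed with more than one node; it does not
    # shrink with the number of local threads, only with the number of nodes.
    comm = t_comm / n_nodes if n_nodes > 1 else 0.0
    return compute + comm


def model_core_time(n_nodes, n_threads, **kwargs):
    """Wall time multiplied by the total number of cores, as in the benchmark plot."""
    return model_wall_time(n_nodes, n_threads, **kwargs) * n_nodes * n_threads


if __name__ == "__main__":
    for nodes in (1, 2, 4, 8):
        print(nodes, [round(model_core_time(nodes, t), 1) for t in (8, 16, 32, 64)])
```
+
In this model the core-time product for more than one node equals
+ t_compute + t_comm × n_threads: it grows linearly with the number of
+ local threads (the ramp) and does not depend on the number of nodes
+ (the flat direction).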
+
+
+ Acknowledgements
+
The development of HealpixMPI.jl, which is
+ part of my master’s thesis, has been funded by the University of Milan
+ through a “Thesis Abroad Grant.” Moreover, I acknowledge significant
+ contributions to this project from Maurizio Tomasi, Martin Reinecke,
+ Hans Kristian Eriksen, and Sigurd Næss, as well as the support I
+ received from all the members of Cosmoglobe collaboration during my
+ stay at the Institute of Theoretical Astrophysics of the University of
+ Oslo.
+
+
+
+
+
+
+
+ TomasiMaurizio
+ LiZack
+
+ Healpix.jl: Julia-only port of the HEALPix library
+ 202109
+ ascl:2109.028
+
+
+
+
+
+
+ WattsD. J.
+ BasyrovA.
+ EskiltJ. R.
+ GallowayM.
+ GjerløwE.
+ HergtL. T.
+ HermanD.
+ IhleH. T.
+ ParadisoS.
+ RahmanF.
+ ThommesenH.
+ AurlienR.
+ BersanelliM.
+ BianchiL. A.
+ BrilenkovM.
+ ColomboL. P. L.
+ EriksenH. K.
+ FranceschetC.
+ FuskelandU.
+ HensleyB.
+ HoerningG. A.
+ LeeK.
+ LundeJ. G. S.
+ MarinsA.
+ NervalS. K.
+ PatelS. K.
+ RegnierM.
+ SanM.
+ SanyalS.
+ StutzerN.-O.
+ VermaA.
+ WehusI. K.
+ ZhouY.
+
+ COSMOGLOBE DR1 results: I. Improved Wilkinson microwave anisotropy probe maps through Bayesian end-to-end analysis
+
+ EDP Sciences
+ 202311
+ 679
+ 1432-0746
+ http://dx.doi.org/10.1051/0004-6361/202346414
+ 10.1051/0004-6361/202346414
+ A143
+
+
+
+
+
+
+ GorskiK. M.
+ HivonE.
+ BandayA. J.
+ WandeltB. D.
+ HansenF. K.
+ ReineckeM.
+ BartelmannM.
+
+ HEALPix: A framework for high-resolution discretization and fast analysis of data distributed on the sphere
+
+ American Astronomical Society
+ 200504
+ 622
+ 2
+ https://doi.org/10.1086/427976
+ 10.1086/427976
+ 759
+ 771
+
+
+
+
+
+ EriksenH. K.
+ O’DwyerI. J.
+ JewellJ. B.
+ WandeltB. D.
+ LarsonD. L.
+ GorskiK. M.
+ LevinS.
+ BandayA. J.
+ LiljeP. B.
+
+ Power spectrum estimation from high-resolution maps by Gibbs sampling
+
+ American Astronomical Society
+ 200412
+ 155
+ 2
+ https://doi.org/10.1086
+ 10.1086/425219
+ 227
+ 241
+
+
+
+
+
+ ReineckeM.
+ SeljebotnD. S.
+
+ Libsharp spherical harmonic transforms revisited
+
+ EDP Sciences
+ 201306
+ 554
+ https://doi.org/10.1051/0004-6361/201321494
+ 10.1051/0004-6361/201321494
+ A112
+
+
+
+
+
+
+ ReineckeM.
+
+ DUCC
+
+ GitLab
+ 2019
+ https://gitlab.mpcdf.mpg.de/mtr/ducc
+
+
+
+
+
+ ByrneSimon
+ WilcoxLucas C.
+ ChuravyValentin
+
+ MPI.jl: Julia bindings for the message passing interface
+
+ The Open Journal
+ 2021
+ 1
+ 1
+ https://doi.org/10.21105/jcon.00068
+ 10.21105/jcon.00068
+ 68
+
+
+
+
+
+
+ Euclid Collaboration
+
+ Euclid preparation: XXVIII. Modelling of the weak lensing angular power spectrum
+ 2023
+ https://arxiv.org/abs/2302.04507
+
+
+
+
+
+ LoureiroArthur
+ WhiteawayLorne
+ SellentinElena
+ LafaurieJavier Silva
+ JaffeAndrew H.
+ HeavensAlan F.
+
+ Almanac: Weak lensing power spectra and map inference on the masked sphere
+
+ Maynooth University
+ 202302
+ 6
+ 2565-6120
+ http://dx.doi.org/10.21105/astro.2210.13260
+ 10.21105/astro.2210.13260
+
+
+
+
+
diff --git a/joss.06467/paper.jats/3DBench.png b/joss.06467/paper.jats/3DBench.png
new file mode 100644
index 0000000000..3c1fe8a927
Binary files /dev/null and b/joss.06467/paper.jats/3DBench.png differ
diff --git a/joss.06467/paper.jats/hybrid_parallel.png b/joss.06467/paper.jats/hybrid_parallel.png
new file mode 100644
index 0000000000..a9428ea6e0
Binary files /dev/null and b/joss.06467/paper.jats/hybrid_parallel.png differ
diff --git a/joss.06467/paper.jats/logo.png b/joss.06467/paper.jats/logo.png
new file mode 100644
index 0000000000..35402c97a8
Binary files /dev/null and b/joss.06467/paper.jats/logo.png differ