diff --git a/joss.06598/10.21105.joss.06598.crossref.xml b/joss.06598/10.21105.joss.06598.crossref.xml new file mode 100644 index 0000000000..b35bb71381 --- /dev/null +++ b/joss.06598/10.21105.joss.06598.crossref.xml @@ -0,0 +1,307 @@ + + + + 20240621235143-d0b2c26b7a27bbbc4bf3b844b44718707ab5946f + 20240621235143 + + JOSS Admin + admin@theoj.org + + The Open Journal + + + + + Journal of Open Source Software + JOSS + 2475-9066 + + 10.21105/joss + https://joss.theoj.org + + + + + 06 + 2024 + + + 9 + + 98 + + + + Delta-Rice: A HDF5 Compression Plugin optimized for +Digitized Detector Data + + + + D. G. + Mathews + https://orcid.org/0000-0002-4897-4379 + + + C. B. + Crawford + https://orcid.org/0000-0002-1932-4334 + + + S. + Baeßler + https://orcid.org/0000-0001-7732-9873 + + + N. + Birge + https://orcid.org/0000-0003-1894-5494 + + + L. J. + Broussard + https://orcid.org/0000-0001-9182-2808 + + + F. + Gonzalez + https://orcid.org/0000-0002-5954-4155 + + + L. + Hayen + https://orcid.org/0000-0002-9471-0964 + + + A. + Jezghani + https://orcid.org/0000-0002-4302-4227 + + + H. + Li + https://orcid.org/0000-0003-3726-9663 + + + R. + Mammei + https://orcid.org/0009-0005-3481-4832 + + + A. + Mendelsohn + https://orcid.org/0000-0002-4847-2133 + + + G. + Randall + https://orcid.org/0000-0002-9713-8465 + + + G. V. + Riley + https://orcid.org/0000-0001-7323-8448 + + + D. C. 
+ Schaper + https://orcid.org/0000-0002-6219-650X + + + + 06 + 21 + 2024 + + + 6598 + + + 10.21105/joss.06598 + + + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + + + + Software archive + 10.5281/zenodo.11490673 + + + GitHub review issue + https://github.com/openjournals/joss-reviews/issues/6598 + + + + 10.21105/joss.06598 + https://joss.theoj.org/papers/10.21105/joss.06598 + + + https://joss.theoj.org/papers/10.21105/joss.06598.pdf + + + + + + A modular apparatus for use in high-precision +measurements of parity violation in polarized eV neutron +transmission + Schaper + Nuclear Instruments and Methods in Physics +Research Section A: Accelerators, Spectrometers, Detectors and +Associated Equipment + 969 + 10.1016/j.nima.2020.163961 + 0168-9002 + 2020 + Schaper, D. C., Auton, C., +Barrón-Palos, L., Borrego, M., Chavez, A., Cole, L., Crawford, C. B., +Curole, J., Dhahri, H., Dickerson, K. A., Doskow, J., Fox, W., Gervais, +M. H., Goodson, B. M., Knickerbocker, K., Jiang, C., King, P. M., Lu, +H., Mocko, M., … Visser, G. (2020). A modular apparatus for use in +high-precision measurements of parity violation in polarized eV neutron +transmission. Nuclear Instruments and Methods in Physics Research +Section A: Accelerators, Spectrometers, Detectors and Associated +Equipment, 969, 163961. +https://doi.org/10.1016/j.nima.2020.163961 + + + Run-length encodings +(corresp.) + Golomb + IEEE Transactions on Information +Theory + 3 + 12 + 10.1109/TIT.1966.1053907 + 1966 + Golomb, S. (1966). Run-length +encodings (corresp.). IEEE Transactions on Information Theory, 12(3), +399–401. +https://doi.org/10.1109/TIT.1966.1053907 + + + Adaptive variable-length coding for efficient +compression of spacecraft television data + Rice + IEEE Transactions on Communication +Technology + 6 + 19 + 10.1109/TCOM.1971.1090789 + 1971 + Rice, R., & Plaunt, J. (1971). 
+Adaptive variable-length coding for efficient compression of spacecraft +television data. IEEE Transactions on Communication Technology, 19(6), +889–897. +https://doi.org/10.1109/TCOM.1971.1090789 + + + A compression scheme for radio data in high +performance computing + Masui + Astronomy and Computing + 12 + 10.1016/j.ascom.2015.07.002 + 2213-1337 + 2015 + Masui, K., Amiri, M., Connor, L., +Deng, M., Fandino, M., Höfer, C., Halpern, M., Hanna, D., Hincks, A. D., +Hinshaw, G., Parra, J. M., Newburgh, L. B., Shaw, J. R., & +Vanderlinde, K. (2015). A compression scheme for radio data in high +performance computing. Astronomy and Computing, 12, 181–190. +https://doi.org/10.1016/j.ascom.2015.07.002 + + + HDF5-version 1.12.0 + The HDF Group + 10.11578/dc.20180330.1 + 2020 + The HDF Group, N., Koziol, Q., & +Science, U. O. of. (2020). HDF5-version 1.12.0. +https://doi.org/10.11578/dc.20180330.1 + + + High performance data acquisition and +analysis routines for the Nab experiment + Mathews + 10.13023/etd.2022.446 + 2022 + Mathews, D. (2022). High performance +data acquisition and analysis routines for the Nab experiment [PhD +thesis, University of Kentucky]. +https://doi.org/10.13023/etd.2022.446 + + + The Nab experiment: A precision measurement +of unpolarized neutron beta decay + Fry + EPJ Web of Conferences + 219 + 10.1051/epjconf/201921904002 + 2019 + Fry, J., Alarcon, R., Baeßler, S., +Balascuta, S., Palos, L. B., Bailey, T., Bass, K., Birge, N., Blose, A., +Borissenko, D., Bowman, J. D., Broussard, L. J., Bryant, A. T., Byrne, +J., Calarco, J. R., Caylor, J., Chang, K., Chupp, T., Cianciolo, T. V., +… Zeck, B. (2019). The Nab experiment: A precision measurement of +unpolarized neutron beta decay. EPJ Web of Conferences, 219, 04002. +https://doi.org/10.1051/epjconf/201921904002 + + + OpenMP: An industry standard API for +shared-memory programming + Dagum + Computational Science & Engineering, +IEEE + 1 + 5 + 10.1109/99.660313 + 1998 + Dagum, L., & Menon, R. (1998). 
+OpenMP: An industry standard API for shared-memory programming. +Computational Science & Engineering, IEEE, 5(1), 46–55. +https://doi.org/10.1109/99.660313 + + + A new cryogenic apparatus to search for the +neutron electric dipole moment + Ahmed + Journal of Instrumentation + 11 + 14 + 10.1088/1748-0221/14/11/P11017 + 2019 + Ahmed, M. W., Alarcon, R., +Aleksandrova, A., Baeßler, S., Barron-Palos, L., Bartoszek, L. M., Beck, +D. H., Behzadipour, M., Berkutov, I., Bessuille, J., Blatnik, M., +Broering, M., Broussard, L. J., Busch, M., Carr, R., Cianciolo, V., +Clayton, S. M., Cooper, M. D., Crawford, C., … Young, A. R. (2019). A +new cryogenic apparatus to search for the neutron electric dipole +moment. Journal of Instrumentation, 14(11), P11017. +https://doi.org/10.1088/1748-0221/14/11/P11017 + + + + + + diff --git a/joss.06598/10.21105.joss.06598.pdf b/joss.06598/10.21105.joss.06598.pdf new file mode 100644 index 0000000000..25e084a002 Binary files /dev/null and b/joss.06598/10.21105.joss.06598.pdf differ diff --git a/joss.06598/paper.jats/10.21105.joss.06598.jats b/joss.06598/paper.jats/10.21105.joss.06598.jats new file mode 100644 index 0000000000..44886952aa --- /dev/null +++ b/joss.06598/paper.jats/10.21105.joss.06598.jats @@ -0,0 +1,820 @@ + + +
+ + + + +Journal of Open Source Software +JOSS + +2475-9066 + +Open Journals + + + +6598 +10.21105/joss.06598 + +Delta-Rice: A HDF5 Compression Plugin optimized for +Digitized Detector Data + + + +https://orcid.org/0000-0002-4897-4379 + +Mathews +D. G. + + + +* + + +https://orcid.org/0000-0002-1932-4334 + +Crawford +C. B. + + + + +https://orcid.org/0000-0001-7732-9873 + +Baeßler +S. + + + + + +https://orcid.org/0000-0003-1894-5494 + +Birge +N. + + + + +https://orcid.org/0000-0001-9182-2808 + +Broussard +L. J. + + + + +https://orcid.org/0000-0002-5954-4155 + +Gonzalez +F. + + + + +https://orcid.org/0000-0002-9471-0964 + +Hayen +L. + + + + + + +https://orcid.org/0000-0002-4302-4227 + +Jezghani +A. + + + + + +https://orcid.org/0000-0003-3726-9663 + +Li +H. + + + + +https://orcid.org/0009-0005-3481-4832 + +Mammei +R. + + + + + +https://orcid.org/0000-0002-4847-2133 + +Mendelsohn +A. + + + + +https://orcid.org/0000-0002-9713-8465 + +Randall +G. + + + + +https://orcid.org/0000-0001-7323-8448 + +Riley +G. V. + + + + +https://orcid.org/0000-0002-6219-650X + +Schaper +D. C. 
+ + + + + + +Oak Ridge National Laboratory, Oak Ridge, TN, +USA + + + + +Department of Physics and Astronomy, University of +Kentucky, Lexington, KY, USA + + + + +Department of Physics, University of Virginia, +Charlottesville, VA, USA + + + + +Department of Physics, University of Tennessee, Knoxville, +TN, USA + + + + +Department of Physics, North Carolina State University, +Raleigh, NC, USA + + + + +Triangle Universities Nuclear Laboratory, Durham, NC, +USA + + + + +Normandie University, Rouen, France + + + + +Georgia Institute of Technology, Atlanta, GA, +USA + + + + +University of Manitoba, Winnipeg, Canada + + + + +University of Winnipeg, Winnipeg, Canada + + + + +Arizona State University, Tempe, AZ, USA + + + + +Los Alamos National Laboratory, Los Alamos, NM, +USA + + + + +* E-mail: + + +3 +10 +2023 + +9 +98 +6598 + +Authors of papers retain copyright and release the +work under a Creative Commons Attribution 4.0 International License (CC +BY 4.0) +2022 +The article authors + +Authors of papers retain copyright and release the work under +a Creative Commons Attribution 4.0 International License (CC BY +4.0) + + + +h5py +HDF5 +compression +digitization +GPU + + + + + + Summary +

Delta-Rice is an HDF5 (The HDF Group et al., 2020) filter plugin that was developed to compress digitized detector signals recorded by the Nab experiment (Fry et al., 2019), a fundamental neutron physics experiment. Compression is a two-step process in which incoming data is passed through a pre-processing filter and then compressed with Rice coding. A routine for determining the optimal pre-processing filter for a dataset is provided along with an example GPU deployment. When applied to data collected by the Nab data acquisition system, this method produced output files 29% of their initial size, with an average read/write throughput in excess of 2 GB/s on a single CPU. Compared to the widely used Gzip compression routine, Delta-Rice reduces the file size by 33% more with over an order of magnitude increase in read/write throughput. Delta-Rice is available on CPU to users through the HDF5 library.

+
+ + Statement of Need +

Many modern nuclear physics experiments, such as the Nab experiment, will produce petabytes of data. The cost and complexity of storing such datasets motivated the development of a compression routine tailored specifically to the type of signals commonly recorded in these experiments. Any compression routine used in these experiments must be fast enough to support real-time compression while also being lossless, so that the precision of offline analysis is not reduced. Additionally, any candidate routine must be easily accessible to the various members of the collaboration and should not restrict users to a particular programming language, so that a variety of analysis methods remain possible. Delta-Rice was designed to meet these requirements and was implemented as an HDF5 filter plugin to ensure that each user can easily access data from multiple programming languages with minimal additional requirements (Mathews, 2022). While many other filter plugins exist for HDF5 files, such as Bitshuffle (Masui et al., 2015) and Gzip, Delta-Rice offers improved throughput and reduction in data size for many experimental efforts such as the Nab, NOPTREX (Schaper et al., 2020), and nEDM@SNS (Ahmed et al., 2019) efforts.

+
+ + Algorithm Overview +

This algorithm is a two-step process: the digitized signal is first passed through an encoding operation, such as delta encoding, to de-correlate the data and prepare it for the second step of Rice coding (Rice & Plaunt, 1971). These methods were chosen for this compression routine specifically for their simplicity, throughput, and storage efficiency. They also do not require a significant amount of additional information to be stored alongside the compressed data in order for the decompression routine to function, which improves storage efficiency further.

+ + Rice Coding +

Rice coding encodes a value $x$ in two pieces: $q$, the quotient of a division by a tunable parameter $m$, and $r$, the remainder of that division. $q$ is stored in unary coding and $r$ in truncated binary. In this routine, signed values are handled by interleaving positive and negative values as follows: $x \to 2x$ for $x \ge 0$ and $x \to 2|x| - 1$ for $x < 0$. Rice coding is used instead of the more general Golomb coding (Golomb, 1966) because the restriction to powers of $2$ for $m$ allows for more efficient calculations. For information about the optimization of $m$, see Optimization. In the case that $q \ge 8$, the output will be $q = 8$ followed by the original number in 16-bit signed representation. This bounds the amount by which any single value can fail to be compressed. The outputs from this method are packed sequentially into 32-bit containers, so that no bits are wasted in any container except the last one of a dataset.
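The encoding of a single value described above can be sketched in Python as follows. This is a minimal illustration and not the plugin's implementation. The function names are hypothetical, and two details not stated in the text are assumed: the unary code is taken to be $q$ ones followed by a terminating zero, and the escape marker for $q \ge 8$ is taken to be eight ones with no terminator.

```python
def zigzag(x: int) -> int:
    """Interleave signed values: 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ..."""
    return 2 * x if x >= 0 else 2 * abs(x) - 1


def rice_encode(x: int, m: int = 8, q_max: int = 8) -> str:
    """Return the codeword for one signed 16-bit sample as a string of '0'/'1'.

    Assumed conventions (not taken from the source): unary = q ones then a
    zero; escape = q_max ones followed by x in 16-bit two's complement.
    """
    if m < 1 or m & (m - 1):
        raise ValueError("m must be a power of two")
    k = m.bit_length() - 1            # number of remainder bits, m == 2**k
    u = zigzag(x)
    q, r = u >> k, u & (m - 1)        # quotient and remainder of u / m
    if q >= q_max:
        # Escape path: fixed-size marker plus the raw 16-bit value.
        return "1" * q_max + format(x & 0xFFFF, "016b")
    remainder = format(r, f"0{k}b") if k else ""
    return "1" * q + "0" + remainder
```

Under these assumptions, `rice_encode(2)` gives `0100` (4 bits) and `rice_encode(25)` gives `1111110010` (10 bits), while any escaped value costs a fixed 24 bits.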

+ +

A demonstration of Rice coding and bit packing when writing $x=2$ and $x=25$ with $m=8$ for an 8-bit output container with a 16-bit temporary cache. Any remaining data in the temporary buffer is retained for the next write of $x$, or output at the end of the compression when no more values of $x$ are provided.
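The packing step in this demonstration can be sketched as below. The helper `pack_bits` is hypothetical, and the bit order (most significant bit first) and the zero padding of the final container are assumptions made for illustration. The example codewords `0100` and `1111110010` correspond to $x=2$ and $x=25$ with $m=8$ under one possible unary convention (ones terminated by a zero), which may differ from the convention used in the figure.

```python
def pack_bits(codewords, width=8):
    """Pack '0'/'1' strings into `width`-bit integers, most significant bit first."""
    containers = []
    cache = ""                        # temporary buffer of bits not yet written
    for word in codewords:
        cache += word
        while len(cache) >= width:    # flush every completely filled container
            containers.append(int(cache[:width], 2))
            cache = cache[width:]
    if cache:                         # end of data: zero-pad the last container
        containers.append(int(cache.ljust(width, "0"), 2))
    return containers


packed = pack_bits(["0100", "1111110010"], width=8)
```

Here the 14 bits of the two codewords fill one 8-bit container completely, and the remaining 6 bits stay in the cache until the end of the data, when they are padded and written as the final container.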

+ +
+
+ + Preparatory Encoding +

Preparatory encoding adjusts the dataset to a form better suited to Rice coding. By default, this is done with delta encoding, which stores the difference between subsequent values. The image below shows an example of this when applied to a signal from the Nab experiment. A simple optimization routine for determining the ideal filter is discussed in Optimization.
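As a rough illustration of the default delta encoding, the sketch below uses NumPy on 16-bit samples. The function names are hypothetical and the sample values are made up. Storing the first sample unchanged and relying on 16-bit wraparound arithmetic are assumptions, not details taken from the plugin.

```python
import numpy as np


def delta_encode(samples):
    """Replace each sample (after the first) by its difference from the previous one."""
    samples = np.asarray(samples, dtype=np.int16)
    deltas = samples.copy()
    deltas[1:] -= samples[:-1]        # int16 arithmetic wraps modulo 2**16
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running sum in the same 16-bit arithmetic."""
    return np.cumsum(np.asarray(deltas, dtype=np.int16), dtype=np.int16)


waveform = np.array([1000, 1002, 1001, 990, 995], dtype=np.int16)
encoded = delta_encode(waveform)
restored = delta_decode(encoded)
```

Because both directions use the same modular arithmetic, the round trip is lossless even when a difference overflows 16 bits. The small differences cluster near zero, which is the regime where Rice coding produces short codewords.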

+ +

Left: A waveform before and after delta encoding. Applying Rice coding with $m=8$ to the original signal expands the waveform from 14 kB to 18.2 kB. The same Rice coding operation on the delta-encoded waveform compresses it to 4.6 kB, 33% of the original size. Right: A histogram of a sample dataset before and after delta encoding. Note the clear reduction in the distribution width and that the most probable values are centered around 0.

+ +
+
+
+ + Implementation +

Delta-Rice is accessible to users through the HDF5 library (The HDF Group et al., 2020) as filter ID 32025. The user can specify $m$, the encoding filter, and $l$, the length of the smallest axis of the data being stored. If $l$ is specified and OpenMP (Dagum & Menon, 1998) is available, then the algorithm will utilize multiple threads to compress and decompress the data. Note that datasets written in parallel can be read by either serial or parallel decoding operations, but a dataset written serially will be read serially unless $l$ was specified. For performance information and a discussion of using this routine on GPUs and FPGAs, see Performance.
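From Python, a dynamically loaded HDF5 filter is typically selected in h5py by passing its numeric ID as the `compression` argument. The sketch below assumes that the Delta-Rice plugin is installed where HDF5 can find it and that it accepts its default settings when no options are passed. The helper names and the Gzip fallback are illustrative choices and not part of the plugin. The layout of the options for $m$, the encoding filter, and $l$ is not given here, so `compression_opts` is deliberately left out.

```python
DELTA_RICE_FILTER_ID = 32025          # filter ID stated in the text


def dataset_kwargs(filter_available, chunk_shape):
    """Build keyword arguments for h5py's create_dataset (hypothetical helper)."""
    kwargs = {"chunks": chunk_shape}
    if filter_available:
        kwargs["compression"] = DELTA_RICE_FILTER_ID
    else:
        kwargs["compression"] = "gzip"   # illustrative fallback, not from the source
    return kwargs


def write_waveforms(path, waveforms):
    """Write a 2D array of waveforms, one waveform per chunk."""
    import h5py                       # imported here so the helper above has no dependency

    available = h5py.h5z.filter_avail(DELTA_RICE_FILTER_ID)
    kwargs = dataset_kwargs(available, (1, waveforms.shape[1]))
    with h5py.File(path, "w") as f:
        f.create_dataset("waveforms", data=waveforms, **kwargs)
    return available
```

Reading the file back requires no special arguments: once the plugin is discoverable, HDF5 applies the filter transparently when the dataset is accessed.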

+
+ + Acknowledgements +

This research was sponsored by the U.S. Department of Energy (DOE), Office of Science, Office of Nuclear Physics [contracts DE-AC05-00OR22725, DE-SC0014622, DE-FG02-03ER41258] and National Science Foundation (NSF) [award PHY-1812367]. This research was also sponsored by the U.S. Department of Energy, Office of Science, Office of Workforce Development for Teachers and Scientists (WDTS) Graduate Student Research (SCGSR) program. This research was supported in part through research cyberinfrastructure resources and services provided by the Partnership for an Advanced Computing Environment (PACE) at the Georgia Institute of Technology, Atlanta, Georgia, USA.

+
+ + + + + + + + SchaperD. C. + AutonC. + Barrón-PalosL. + BorregoM. + ChavezA. + ColeL. + CrawfordC. B. + CuroleJ. + DhahriH. + DickersonK. A. + DoskowJ. + FoxW. + GervaisM. H. + GoodsonB. M. + KnickerbockerK. + JiangC. + KingP. M. + LuH. + MockoM. + Olivera-VelardeD. + Otero MunozJ. G. + PenttiläS. I. + Pérez-MartínA. + ShortB. + SnowW. M. + SteffenK. + VanderwerpJ. + VisserG. + + A modular apparatus for use in high-precision measurements of parity violation in polarized eV neutron transmission + Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment + 2020 + 969 + 0168-9002 + 10.1016/j.nima.2020.163961 + 163961 + + + + + + + GolombS. + + Run-length encodings (corresp.) + IEEE Transactions on Information Theory + 1966 + 12 + 3 + 10.1109/TIT.1966.1053907 + 399 + 401 + + + + + + RiceR. + PlauntJ. + + Adaptive variable-length coding for efficient compression of spacecraft television data + IEEE Transactions on Communication Technology + 1971 + 19 + 6 + 10.1109/TCOM.1971.1090789 + 889 + 897 + + + + + + MasuiK. + AmiriM. + ConnorL. + DengM. + FandinoM. + HöferC. + HalpernM. + HannaD. + HincksA. D. + HinshawG. + ParraJ. M. + NewburghL. B. + ShawJ. R. + VanderlindeK. + + A compression scheme for radio data in high performance computing + Astronomy and Computing + 2015 + 12 + 2213-1337 + 10.1016/j.ascom.2015.07.002 + 181 + 190 + + + + + + The HDF GroupNone + KoziolQuincey + ScienceUSDOE Office of + + HDF5-version 1.12.0 + 202002 + https://www.osti.gov/biblio/1631295 + 10.11578/dc.20180330.1 + + + + + + MathewsDavid + + High performance data acquisition and analysis routines for the Nab experiment + University of Kentucky + 2022 + 10.13023/etd.2022.446 + + + + + + FryJ. + AlarconR. + BaeßlerS. + BalascutaS. + PalosL. Barrón + BaileyT. + BassK. + BirgeN. + BloseA. + BorissenkoD. + BowmanJ. D. + BroussardL. J. + BryantA. T. + ByrneJ. + CalarcoJ. R. + CaylorJ. + ChangK. + ChuppT. + CiancioloT. V. 
+ CrawfordC. + DingX. + DoyleM. + FanW. + FarrarW. + FominN. + FrležE. + GerickeM. T. + GervaisM. + GlückF. + GreeneG. L. + GrzywaczR. K. + GudkovV. + HamblenJ. + HayesC. + HendrusC. + ItoT. + JezghaniA. + LiH. + MakelaM. + MacsaiN. + MammeiJ. + MammeiR. + MartinezM. + MatthewsD. G. + McCreaM. + McGaugheyP. + McLaughlinC. D. + MuellerP. + PettenD. van + PenttiläS. I. + PerrymanD. E. + PickerR. + PierceJ. + PočanićD. + QianY. + RamseyJ. + RandallG. + RileyG. + RykaczewskiK. P. + Salas-BacciA. + SamieiS. + ScottE. M. + SheltonT. + SjueS. K. + SmithA. + SmithE. + StevensE. + WexlerJ. + WhiteheadR. + WilburnW. S. + YoungA. + ZeckB. + + The Nab experiment: A precision measurement of unpolarized neutron beta decay + EPJ Web of Conferences + + JenkeT. + DegenkolbS. + GeltenbortP. + JentschelM. + NesvizhevskyV. V. + RebreyendD. + RocciaS. + SoldnerT. + StutzA. + ZimmerO. + + EDP Sciences + 2019 + 219 + 10.1051/epjconf/201921904002 + 04002 + + + + + + + DagumLeonardo + MenonRamesh + + OpenMP: An industry standard API for shared-memory programming + Computational Science & Engineering, IEEE + IEEE + 1998 + 5 + 1 + 10.1109/99.660313 + 46 + 55 + + + + + + AhmedM. W. + AlarconR. + AleksandrovaA. + BaeßlerS. + Barron-PalosL. + BartoszekL. M. + BeckD. H. + BehzadipourM. + BerkutovI. + BessuilleJ. + BlatnikM. + BroeringM. + BroussardL. J. + BuschM. + CarrR. + CiancioloV. + ClaytonS. M. + CooperM. D. + CrawfordC. + CurrieS. A. + DaurerC. + DipertR. + DowK. + DuttaD. + EfremenkoY. + EricksonC. B. + FilipponeB. W. + FominN. + GaoH. + GolubR. + GouldC. R. + GreeneG. + HaaseD. G. + HasellD. + HawariA. I. + HaydenM. E. + HolleyA. + HoltR. J. + HuffmanP. R. + IhloffE. + ImamS. K. + ItoT. M. + KarczM. + KelseyJ. + KendellenD. P. + KimY. J. + KorobkinaE. + KorschW. + LamoreauxS. K. + LeggettE. + LeungK. K. H. + LipmanA. + LiuC. Y. + LongJ. + MacDonaldS. W. T. + MakelaM. + MatlashovA. + MaxwellJ. D. + MendenhallM. + MeyerH. O. + MilnerR. G. + MuellerP. E. + NouriN. + O’ShaughnessyC. M. 
+ OsthelderC. + PengJ. C. + PenttilaS. I. + PhanN. S. + PlasterB. + RamseyJ. C. + RaoT. M. + RedwineR. P. + ReidA. + SaftahA. + SeidelG. M. + SilveraI. + SlutskyS. + SmithE. + SnowW. M. + SondheimW. + SosothikulS. + StanislausT. D. S. + SunX. + SwankC. M. + TangZ. + DinaniR. Tavakoli + TsentalovichE. + VidalC. + WeiW. + WhiteC. R. + WilliamsonS. E. + YangL. + YaoW. + YoungA. R. + + A new cryogenic apparatus to search for the neutron electric dipole moment + Journal of Instrumentation + 201911 + 14 + 11 + https://dx.doi.org/10.1088/1748-0221/14/11/P11017 + 10.1088/1748-0221/14/11/P11017 + P11017 + + + + + +
diff --git a/joss.06598/paper.jats/61948a2cdcc3d5f960e9337376a0aca8147bd7fd.png b/joss.06598/paper.jats/61948a2cdcc3d5f960e9337376a0aca8147bd7fd.png new file mode 100644 index 0000000000..2d4184100a Binary files /dev/null and b/joss.06598/paper.jats/61948a2cdcc3d5f960e9337376a0aca8147bd7fd.png differ diff --git a/joss.06598/paper.jats/a9770336c02725b3820d8f20bc9c97235b02cf1c.png b/joss.06598/paper.jats/a9770336c02725b3820d8f20bc9c97235b02cf1c.png new file mode 100644 index 0000000000..06f651e6fd Binary files /dev/null and b/joss.06598/paper.jats/a9770336c02725b3820d8f20bc9c97235b02cf1c.png differ