From 81cb9899ea12a6b28ea2920b59176bd2b5bfd4ab Mon Sep 17 00:00:00 2001
From: The Open Journals editorial robot <89919391+editorialbot@users.noreply.github.com>
Date: Sat, 22 Jun 2024 00:52:05 +0100
Subject: [PATCH] Creating 10.21105.joss.06598.jats

---
 .../paper.jats/10.21105.joss.06598.jats | 820 ++++++++++++++++++
 1 file changed, 820 insertions(+)
 create mode 100644 joss.06598/paper.jats/10.21105.joss.06598.jats

diff --git a/joss.06598/paper.jats/10.21105.joss.06598.jats b/joss.06598/paper.jats/10.21105.joss.06598.jats
new file mode 100644
index 0000000000..44886952aa
--- /dev/null
+++ b/joss.06598/paper.jats/10.21105.joss.06598.jats
@@ -0,0 +1,820 @@
+
+
+ + + + +Journal of Open Source Software +JOSS + +2475-9066 + +Open Journals + + + +6598 +10.21105/joss.06598 + +Delta-Rice: A HDF5 Compression Plugin optimized for +Digitized Detector Data + + + +https://orcid.org/0000-0002-4897-4379 + +Mathews +D. G. + + + +* + + +https://orcid.org/0000-0002-1932-4334 + +Crawford +C. B. + + + + +https://orcid.org/0000-0001-7732-9873 + +Baeßler +S. + + + + + +https://orcid.org/0000-0003-1894-5494 + +Birge +N. + + + + +https://orcid.org/0000-0001-9182-2808 + +Broussard +L. J. + + + + +https://orcid.org/0000-0002-5954-4155 + +Gonzalez +F. + + + + +https://orcid.org/0000-0002-9471-0964 + +Hayen +L. + + + + + + +https://orcid.org/0000-0002-4302-4227 + +Jezghani +A. + + + + + +https://orcid.org/0000-0003-3726-9663 + +Li +H. + + + + +https://orcid.org/0009-0005-3481-4832 + +Mammei +R. + + + + + +https://orcid.org/0000-0002-4847-2133 + +Mendelsohn +A. + + + + +https://orcid.org/0000-0002-9713-8465 + +Randall +G. + + + + +https://orcid.org/0000-0001-7323-8448 + +Riley +G. V. + + + + +https://orcid.org/0000-0002-6219-650X + +Schaper +D. C. 
+
+Oak Ridge National Laboratory, Oak Ridge, TN, USA
+Department of Physics and Astronomy, University of Kentucky, Lexington, KY, USA
+Department of Physics, University of Virginia, Charlottesville, VA, USA
+Department of Physics, University of Tennessee, Knoxville, TN, USA
+Department of Physics, North Carolina State University, Raleigh, NC, USA
+Triangle Universities Nuclear Laboratory, Durham, NC, USA
+Normandie University, Rouen, France
+Georgia Institute of Technology, Atlanta, GA, USA
+University of Manitoba, Winnipeg, Canada
+University of Winnipeg, Winnipeg, Canada
+Arizona State University, Tempe, AZ, USA
+Los Alamos National Laboratory, Los Alamos, NM, USA
+
+* E-mail:
+
+3
+10
+2023
+
+9
+98
+6598
+
+Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0)
+2022
+The article authors
+
+h5py
+HDF5
+compression
+digitization
+GPU
+
+ + Summary +

Delta-Rice is an HDF5 (The HDF Group et al., 2020) filter plugin developed to compress digitized detector signals recorded by the Nab experiment (Fry et al., 2019), a fundamental neutron physics experiment. Compression is a two-step process: incoming data is passed through a pre-processing filter and then compressed with Rice coding. A routine for determining the optimal pre-processing filter for a dataset is provided, along with an example GPU deployment. When applied to data collected by the Nab data acquisition system, this method produced output files 29% of their initial size, with an average read/write throughput in excess of 2 GB/s on a single CPU. Compared to the widely used Gzip compression routine, Delta-Rice reduces the file size by 33% more with over an order of magnitude increase in read/write throughput. Delta-Rice is available on CPU to users through the HDF5 library.

+
+ + Statement of Need +

Many modern nuclear physics experiments, such as the Nab experiment, will produce petabytes of data. The cost and complexity of storing such datasets motivated the development of a compression routine tailored specifically to the type of signals commonly recorded in these experiments. Any such routine must be fast enough to support real-time compression while also being lossless, so that the precision of offline analysis is not reduced. Additionally, any candidate routine must be easily accessible to the various members of the collaboration and should not restrict users to a particular programming language, so that a variety of analysis methods remains possible. Delta-Rice was designed to meet these requirements and was implemented as an HDF5 filter plugin to ensure that each user can easily access data from multiple programming languages with minimal additional requirements (Mathews, 2022). While many other filter plugins exist for HDF5 files, such as Bitshuffle (Masui et al., 2015) and Gzip, Delta-Rice offers improved throughput and reduction in data size for experiments such as Nab, NOPTREX (Schaper et al., 2020), and nEDM@SNS (Ahmed et al., 2019).

+
+ + Algorithm Overview +

This algorithm is a two-step process: the digitized signal is first passed through an encoding operation, such as delta encoding, to de-correlate the data and prepare it for the second step of Rice coding (Rice & Plaunt, 1971). These methods were chosen for this compression routine specifically for their simplicity, throughput, and storage efficiency. They also do not require a significant amount of additional information to be stored alongside the compressed data in order for the decompression routine to function, which improves storage efficiency further.
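To make the benefit of the de-correlation step concrete, the following Python sketch estimates how many bits Rice coding would need for a synthetic signal before and after delta encoding. It is an illustration under stated assumptions, not the plugin's implementation: the function names, the synthetic samples, and the cost model for escaped values (8 marker bits plus a 16-bit raw sample) are hypothetical choices made here, and `k = 3` corresponds to a Rice parameter of 8.

```python
def zigzag(x: int) -> int:
    """Interleave signed values so that 0, -1, 1, -2, ... map to 0, 1, 2, 3, ..."""
    return 2 * x if x >= 0 else 2 * abs(x) - 1


def rice_bits(x: int, k: int, q_max: int = 8, raw_bits: int = 16) -> int:
    """Estimated bit cost of one value for m = 2**k.

    Assumption: a normal value costs q unary bits, one terminator bit and
    k remainder bits; an escaped value costs q_max marker bits plus the raw sample.
    """
    q = zigzag(x) >> k
    if q >= q_max:
        return q_max + raw_bits
    return q + 1 + k


def delta_encode(samples):
    """Differences between consecutive samples (first sample relative to 0)."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


# Hypothetical signal: small fluctuations on top of a large baseline.
samples = [1000 + (i % 5) - 2 for i in range(100)]

raw_cost = sum(rice_bits(s, k=3) for s in samples)
delta_cost = sum(rice_bits(d, k=3) for d in delta_encode(samples))
uncompressed = 16 * len(samples)

print(raw_cost, delta_cost, uncompressed)
```

Under these assumptions every raw sample takes the escape path, so coding the raw signal costs 2400 bits against 1600 bits uncompressed, while the delta-encoded signal costs 420 bits. The absolute numbers depend entirely on the synthetic data and the assumed cost model.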

+ + Rice Coding +

Rice coding functions by encoding a value $x$ in two pieces: $q$, the integer quotient of a division by a tunable parameter $m$, and $r$, the remainder of that division. $q$ is stored in unary coding, with $r$ in truncated binary. In this routine, signed values are handled by interleaving positive and negative values as follows: $x \to 2x$ for $x \geq 0$ and $x \to 2|x| - 1$ for $x < 0$. Rice coding is used instead of the more general Golomb coding (Golomb, 1966) because the restriction to powers of $2$ for $m$ allows for more efficient calculations. For information about the optimization of $m$, see Optimization. In the case that $q \geq 8$, the output will be $q = 8$ followed by the original number in 16-bit signed representation. This bounds the amount by which any single value can fail to be compressed. The outputs from this method are packed sequentially into 32-bit containers, ensuring that no bits are wasted in any container but the last one for a dataset.
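The steps in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's code: the function names are hypothetical, and several bit-level conventions are assumptions made here because the text does not fix them (unary $q$ written as $q$ one-bits followed by a zero, the escape marker written as eight one-bits with no terminator, most-significant bit first, and zero padding of the final container). For $m$ a power of two, truncated binary reduces to a plain fixed-width binary remainder.

```python
def zigzag(x):
    """Map signed x to an unsigned value: x >= 0 -> 2x, x < 0 -> 2|x| - 1."""
    return 2 * x if x >= 0 else 2 * abs(x) - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)


def rice_encode(values, k, q_max=8, raw_bits=16):
    """Return a string of '0'/'1' characters for m = 2**k."""
    pieces = []
    for x in values:
        u = zigzag(x)
        q, r = u >> k, u & ((1 << k) - 1)
        if q >= q_max:
            # Escape: marker bits, then the original value in two's complement.
            pieces.append("1" * q_max)
            pieces.append(format(x & ((1 << raw_bits) - 1), f"0{raw_bits}b"))
        else:
            pieces.append("1" * q + "0")            # unary quotient
            if k:
                pieces.append(format(r, f"0{k}b"))  # k-bit remainder
    return "".join(pieces)


def rice_decode(bits, count, k, q_max=8, raw_bits=16):
    """Decode `count` values from a bit string produced by rice_encode."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while q < q_max and bits[pos] == "1":
            q += 1
            pos += 1
        if q == q_max:
            raw = int(bits[pos:pos + raw_bits], 2)
            pos += raw_bits
            if raw >= 1 << (raw_bits - 1):
                raw -= 1 << raw_bits                # restore the sign
            values.append(raw)
        else:
            pos += 1                                # skip the unary terminator
            r = int(bits[pos:pos + k], 2) if k else 0
            pos += k
            values.append(unzigzag((q << k) | r))
    return values


def pack_words(bits, width=32):
    """Pack the bit string into fixed-width containers, padding only the last."""
    padded = bits + "0" * (-len(bits) % width)
    return [int(padded[i:i + width], 2) for i in range(0, len(padded), width)]


values = [2, 25, -3, 1000]
bits = rice_encode(values, k=3)   # k = 3 corresponds to m = 8
words = pack_words(bits)
decoded = rice_decode(bits, len(values), k=3)
```

In this sketch the last value, 1000, exceeds the quotient limit and is stored through the escape path, so its cost is fixed regardless of its magnitude.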

+ +

A demonstration of Rice coding and bit packing when writing $x=2$ and $x=25$ with $m=8$ for an 8-bit output container with a 16-bit temporary cache. Any data remaining in the temporary cache is retained for the next write of $x$, or output at the end of the compression when no more values of $x$ are provided.

+ +
+
+ + Preparatory Encoding +

Preparatory encoding transforms the dataset into a form better suited to Rice coding. By default, this is done with delta encoding, which stores the difference between consecutive values. The image below shows an example of this when applied to a signal from the Nab experiment. A simple optimization routine for determining the ideal filter is discussed in Optimization.
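As a small illustration of the default preparatory step, the following Python sketch delta encodes a short list of samples and then restores it. The function names and sample values are hypothetical, and storing the first sample as a difference from zero is an assumption of this sketch rather than a documented detail of the plugin.

```python
from itertools import accumulate


def delta_encode(samples):
    """Replace each sample by its difference from the previous one."""
    samples = list(samples)
    previous = [0] + samples[:-1]   # assumption: the first sample is relative to 0
    return [cur - prev for prev, cur in zip(previous, samples)]


def delta_decode(deltas):
    """Undo delta encoding with a running sum."""
    return list(accumulate(deltas))


waveform = [1000, 1001, 1003, 1002, 990, 985]   # hypothetical digitized samples
deltas = delta_encode(waveform)
restored = delta_decode(deltas)
```

Apart from the first entry, the encoded values cluster around zero, which is the property the Rice coding step relies on, and the running sum recovers the input exactly, so the step is lossless.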

+ +

Left: A waveform before and after delta encoding. Applying Rice coding with $m=8$ on the original signal expands the size of the waveform from 14 kB to 18.2 kB. The same Rice coding operation on the delta encoded waveform compresses the waveform to 4.6 kB, 33% of the original size. Right: A histogram of a sample dataset before and after delta encoding. Note the clear reduction in the distribution width and that the most probable values are centered around 0.

+ +
+
+
+ + Implementation +

Delta-Rice is accessible to users through the HDF5 library (The HDF Group et al., 2020) as filter ID $32025$. The user can specify $m$, the encoding filter, and the length of the smallest axis of the data being stored, $l$. If $l$ is specified and OpenMP (Dagum & Menon, 1998) is available, then the algorithm will utilize multiple threads to compress/decompress the data. Note that datasets written in parallel can be read by either serial or parallel decoding operations, but a dataset written serially will be read serially unless $l$ was specified. For performance information and a discussion of using this routine on GPUs and FPGAs, see Performance.
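A minimal sketch of how a registered HDF5 filter such as this one might be requested from Python through h5py follows. Only the filter ID 32025 comes from the text; the helper names, the dataset name, and the choice of one waveform per chunk are hypothetical. The layout of the filter parameters ($m$, the encoding filter, and $l$) is not given here, so the sketch deliberately does not pass `compression_opts` and relies on whatever defaults the plugin provides; the plugin's own documentation should be consulted for that tuple.

```python
DELTA_RICE_FILTER_ID = 32025  # filter ID stated in the text


def delta_rice_dataset_kwargs(waveform_length):
    """Keyword arguments for h5py's create_dataset.

    Assumption: one waveform per chunk; the filter parameters are left at
    the plugin's defaults because their layout is not specified here.
    """
    return {
        "chunks": (1, waveform_length),
        "compression": DELTA_RICE_FILTER_ID,
    }


def write_waveforms(path, waveforms):
    """Write a 2D integer array of shape (n_waveforms, waveform_length)."""
    import h5py  # imported lazily so the helper above works without h5py

    # HDF5 locates dynamically loaded filters via HDF5_PLUGIN_PATH.
    if not h5py.h5z.filter_avail(DELTA_RICE_FILTER_ID):
        raise RuntimeError("Delta-Rice plugin not found; check HDF5_PLUGIN_PATH")

    with h5py.File(path, "w") as handle:
        handle.create_dataset(
            "waveforms",  # hypothetical dataset name
            data=waveforms,
            **delta_rice_dataset_kwargs(waveforms.shape[1]),
        )
```

Reading such a dataset back requires no special arguments: once the plugin is discoverable, HDF5 applies the filter transparently when the dataset is accessed.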

+
+ + Acknowledgements +

This research was sponsored by the U.S. Department of Energy (DOE), + Office of Science, Office of Nuclear Physics [contracts + DE-AC05-00OR22725, DE-SC0014622, DE-FG02-03ER41258] and National + Science Foundation (NSF) [award PHY-1812367]. This research was also + sponsored by the U.S. Department of Energy, Office of Science, Office + of Workforce Development for Teachers and Scientists (WDTS) Graduate + Student Research (SCGSR) program. This research was supported in part + through research cyberinfrastructure resources and services provided + by the Partnership for an Advanced Computing Environment (PACE) at the + Georgia Institute of Technology, Atlanta, Georgia, USA.

+
Schaper, D. C.; Auton, C.; Barrón-Palos, L.; Borrego, M.; Chavez, A.; Cole, L.; Crawford, C. B.; Curole, J.; Dhahri, H.; Dickerson, K. A.; Doskow, J.; Fox, W.; Gervais, M. H.; Goodson, B. M.; Knickerbocker, K.; Jiang, C.; King, P. M.; Lu, H.; Mocko, M.; Olivera-Velarde, D.; Otero Munoz, J. G.; Penttilä, S. I.; Pérez-Martín, A.; Short, B.; Snow, W. M.; Steffen, K.; Vanderwerp, J.; Visser, G. (2020). A modular apparatus for use in high-precision measurements of parity violation in polarized eV neutron transmission. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 969, 163961. ISSN 0168-9002. doi:10.1016/j.nima.2020.163961

Golomb, S. (1966). Run-length encodings (corresp.). IEEE Transactions on Information Theory, 12(3), 399-401. doi:10.1109/TIT.1966.1053907

Rice, R.; Plaunt, J. (1971). Adaptive variable-length coding for efficient compression of spacecraft television data. IEEE Transactions on Communication Technology, 19(6), 889-897. doi:10.1109/TCOM.1971.1090789

Masui, K.; Amiri, M.; Connor, L.; Deng, M.; Fandino, M.; Höfer, C.; Halpern, M.; Hanna, D.; Hincks, A. D.; Hinshaw, G.; Parra, J. M.; Newburgh, L. B.; Shaw, J. R.; Vanderlinde, K. (2015). A compression scheme for radio data in high performance computing. Astronomy and Computing, 12, 181-190. ISSN 2213-1337. doi:10.1016/j.ascom.2015.07.002

The HDF Group; Koziol, Quincey; USDOE Office of Science (2020-02). HDF5-version 1.12.0. https://www.osti.gov/biblio/1631295. doi:10.11578/dc.20180330.1

Mathews, David (2022). High performance data acquisition and analysis routines for the Nab experiment. University of Kentucky. doi:10.13023/etd.2022.446

Fry, J.; Alarcon, R.; Baeßler, S.; Balascuta, S.; Palos, L. Barrón; Bailey, T.; Bass, K.; Birge, N.; Blose, A.; Borissenko, D.; Bowman, J. D.; Broussard, L. J.; Bryant, A. T.; Byrne, J.; Calarco, J. R.; Caylor, J.; Chang, K.; Chupp, T.; Cianciolo, T. V.; Crawford, C.; Ding, X.; Doyle, M.; Fan, W.; Farrar, W.; Fomin, N.; Frlež, E.; Gericke, M. T.; Gervais, M.; Glück, F.; Greene, G. L.; Grzywacz, R. K.; Gudkov, V.; Hamblen, J.; Hayes, C.; Hendrus, C.; Ito, T.; Jezghani, A.; Li, H.; Makela, M.; Macsai, N.; Mammei, J.; Mammei, R.; Martinez, M.; Matthews, D. G.; McCrea, M.; McGaughey, P.; McLaughlin, C. D.; Mueller, P.; Petten, D. van; Penttilä, S. I.; Perryman, D. E.; Picker, R.; Pierce, J.; Počanić, D.; Qian, Y.; Ramsey, J.; Randall, G.; Riley, G.; Rykaczewski, K. P.; Salas-Bacci, A.; Samiei, S.; Scott, E. M.; Shelton, T.; Sjue, S. K.; Smith, A.; Smith, E.; Stevens, E.; Wexler, J.; Whitehead, R.; Wilburn, W. S.; Young, A.; Zeck, B. (2019). The Nab experiment: A precision measurement of unpolarized neutron beta decay. In Jenke, T.; Degenkolb, S.; Geltenbort, P.; Jentschel, M.; Nesvizhevsky, V. V.; Rebreyend, D.; Roccia, S.; Soldner, T.; Stutz, A.; Zimmer, O. (Eds.), EPJ Web of Conferences, 219, 04002. EDP Sciences. doi:10.1051/epjconf/201921904002

Dagum, Leonardo; Menon, Ramesh (1998). OpenMP: An industry standard API for shared-memory programming. Computational Science & Engineering, IEEE, 5(1), 46-55. IEEE. doi:10.1109/99.660313

Ahmed, M. W.; Alarcon, R.; Aleksandrova, A.; Baeßler, S.; Barron-Palos, L.; Bartoszek, L. M.; Beck, D. H.; Behzadipour, M.; Berkutov, I.; Bessuille, J.; Blatnik, M.; Broering, M.; Broussard, L. J.; Busch, M.; Carr, R.; Cianciolo, V.; Clayton, S. M.; Cooper, M. D.; Crawford, C.; Currie, S. A.; Daurer, C.; Dipert, R.; Dow, K.; Dutta, D.; Efremenko, Y.; Erickson, C. B.; Filippone, B. W.; Fomin, N.; Gao, H.; Golub, R.; Gould, C. R.; Greene, G.; Haase, D. G.; Hasell, D.; Hawari, A. I.; Hayden, M. E.; Holley, A.; Holt, R. J.; Huffman, P. R.; Ihloff, E.; Imam, S. K.; Ito, T. M.; Karcz, M.; Kelsey, J.; Kendellen, D. P.; Kim, Y. J.; Korobkina, E.; Korsch, W.; Lamoreaux, S. K.; Leggett, E.; Leung, K. K. H.; Lipman, A.; Liu, C. Y.; Long, J.; MacDonald, S. W. T.; Makela, M.; Matlashov, A.; Maxwell, J. D.; Mendenhall, M.; Meyer, H. O.; Milner, R. G.; Mueller, P. E.; Nouri, N.; O’Shaughnessy, C. M.; Osthelder, C.; Peng, J. C.; Penttila, S. I.; Phan, N. S.; Plaster, B.; Ramsey, J. C.; Rao, T. M.; Redwine, R. P.; Reid, A.; Saftah, A.; Seidel, G. M.; Silvera, I.; Slutsky, S.; Smith, E.; Snow, W. M.; Sondheim, W.; Sosothikul, S.; Stanislaus, T. D. S.; Sun, X.; Swank, C. M.; Tang, Z.; Dinani, R. Tavakoli; Tsentalovich, E.; Vidal, C.; Wei, W.; White, C. R.; Williamson, S. E.; Yang, L.; Yao, W.; Young, A. R. (2019-11). A new cryogenic apparatus to search for the neutron electric dipole moment. Journal of Instrumentation, 14(11), P11017. https://dx.doi.org/10.1088/1748-0221/14/11/P11017. doi:10.1088/1748-0221/14/11/P11017