diff --git a/joss.06372/10.21105.joss.06372.crossref.xml b/joss.06372/10.21105.joss.06372.crossref.xml
new file mode 100644
index 0000000000..cc000669cc
--- /dev/null
+++ b/joss.06372/10.21105.joss.06372.crossref.xml
@@ -0,0 +1,555 @@
+
+
+
+ 20240506T171906-f9de69fcc0c18123fd7efa3ff6c28cf2b5477e00
+ 20240506171906
+
+ JOSS Admin
+ admin@theoj.org
+
+ The Open Journal
+
+
+
+
+ Journal of Open Source Software
+ JOSS
+ 2475-9066
+
+ 10.21105/joss
+ https://joss.theoj.org
+
+
+
+
+ 05
+ 2024
+
+
+ 9
+
+ 97
+
+
+
+ CalibrateEmulateSample.jl: Accelerated Parametric
+Uncertainty Quantification
+
+
+
+ Oliver R. A.
+ Dunbar
+ https://orcid.org/0000-0001-7374-0382
+
+
+ Melanie
+ Bieli
+
+
+ Alfredo
+ Garbuno-Iñigo
+ https://orcid.org/0000-0003-3279-619X
+
+
+ Michael
+ Howland
+ https://orcid.org/0000-0002-2878-3874
+
+
+ Andre Nogueira
+ de Souza
+ https://orcid.org/0000-0002-9906-7824
+
+
+ Laura Anne
+ Mansfield
+ https://orcid.org/0000-0002-6285-6045
+
+
+ Gregory L.
+ Wagner
+ https://orcid.org/0000-0001-5317-2445
+
+
+ N.
+ Efrat-Henrici
+
+
+
+ 05
+ 06
+ 2024
+
+
+ 6372
+
+
+ 10.21105/joss.06372
+
+
+ http://creativecommons.org/licenses/by/4.0/
+ http://creativecommons.org/licenses/by/4.0/
+ http://creativecommons.org/licenses/by/4.0/
+
+
+
+ Software archive
+ 10.5281/zenodo.10946875
+
+
+ GitHub review issue
+ https://github.com/openjournals/joss-reviews/issues/6372
+
+
+
+ 10.21105/joss.06372
+ https://joss.theoj.org/papers/10.21105/joss.06372
+
+
+ https://joss.theoj.org/papers/10.21105/joss.06372.pdf
+
+
+
+
+
+ Julia: A fresh approach to numerical
+computing
+ Bezanson
+ SIAM Review
+ 1
+ 59
+ 10.1137/141000671
+ 2017
+ Bezanson, J., Edelman, A., Karpinski,
+S., & Shah, V. B. (2017). Julia: A fresh approach to numerical
+computing. SIAM Review, 59(1), 65–98.
+https://doi.org/10.1137/141000671
+
+
+ High-Dimensional ABC
+ Nott
+ Handbook of Approximate Bayesian
+Computation
+ 10.1201/9781315117195-8
+ 978-1-315-11719-5
+ 2018
+ Nott, D. J., Ong, V. M.-H., Fan, Y.,
+& Sisson, S. A. (2018). High-Dimensional ABC. In Handbook of
+Approximate Bayesian Computation (pp. 211–241). CRC Press.
+https://doi.org/10.1201/9781315117195-8
+
+
+ Calibrate, emulate, sample
+ Cleary
+ Journal of Computational
+Physics
+ 424
+ 10.1016/j.jcp.2020.109716
+ 0021-9991
+ 2021
+ Cleary, E., Garbuno-Inigo, A., Lan,
+S., Schneider, T., & Stuart, A. M. (2021). Calibrate, emulate,
+sample. Journal of Computational Physics, 424, 109716.
+https://doi.org/10.1016/j.jcp.2020.109716
+
+
+ EnsembleKalmanProcesses.jl: Derivative-free
+ensemble-based model calibration
+ Dunbar
+ Journal of Open Source
+Software
+ 80
+ 7
+ 10.21105/joss.04869
+ 2022
+ Dunbar, O. R. A., Lopez-Gomez, I.,
+Garbuno-Iñigo, A. G.-I., Huang, D. Z., Bach, E., & Wu, J. (2022).
+EnsembleKalmanProcesses.jl: Derivative-free ensemble-based model
+calibration. Journal of Open Source Software, 7(80), 4869.
+https://doi.org/10.21105/joss.04869
+
+
+ An efficient Bayesian approach to learning
+droplet collision kernels: Proof of concept using “Cloudy,” a new
+n-moment bulk microphysics scheme
+ Bieli
+ Journal of Advances in Modeling Earth
+Systems
+ 8
+ 14
+ 10.1029/2022MS002994
+ 2022
+ Bieli, M., Dunbar, O. R. A., Jong, E.
+K. de, Jaruga, A., Schneider, T., & Bischoff, T. (2022). An
+efficient Bayesian approach to learning droplet collision kernels: Proof
+of concept using “Cloudy,” a new n-moment bulk microphysics scheme.
+Journal of Advances in Modeling Earth Systems, 14(8), e2022MS002994.
+https://doi.org/10.1029/2022MS002994
+
+
+ Supervised calibration and uncertainty
+quantification of subgrid closure parameters using ensemble Kalman
+inversion
+ Hillier
+ 1721.1/145140
+ 2022
+ Hillier, A. (2022). Supervised
+calibration and uncertainty quantification of subgrid closure parameters
+using ensemble Kalman inversion [Master’s thesis, Massachusetts
+Institute of Technology. Department of Electrical Engineering; Computer
+Science]. https://doi.org/1721.1/145140
+
+
+ Gaussian processes for machine
+learning
+ Williams
+ 2
+ 10.1142/S0129065704001899
+ 2006
+ Williams, C. K., & Rasmussen, C.
+E. (2006). Gaussian processes for machine learning (Vol. 2). MIT press
+Cambridge, MA.
+https://doi.org/10.1142/S0129065704001899
+
+
+ Ensemble kalman methods for inverse
+problems
+ Iglesias
+ Inverse Problems
+ 4
+ 29
+ 10.1088/0266-5611/29/4/045001
+ 2013
+ Iglesias, M. A., Law, K. J., &
+Stuart, A. M. (2013). Ensemble kalman methods for inverse problems.
+Inverse Problems, 29(4), 045001.
+https://doi.org/10.1088/0266-5611/29/4/045001
+
+
+ Random features for large-scale kernel
+machines.
+ Rahimi
+ NIPS
+ 3
+ 2007
+ Rahimi, A., Recht, B., & others.
+(2007). Random features for large-scale kernel machines. NIPS, 3, 5.
+https://proceedings.neurips.cc/paper_files/paper/2007/file/013a006f03dbc5392effeb8f18fda755-Paper.pdf
+
+
+ Uniform approximation of functions with
+random bases
+ Rahimi
+ 2008 46th annual allerton conference on
+communication, control, and computing
+ 10.1109/allerton.2008.4797607
+ 2008
+ Rahimi, A., & Recht, B. (2008).
+Uniform approximation of functions with random bases. 2008 46th Annual
+Allerton Conference on Communication, Control, and Computing, 555–561.
+https://doi.org/10.1109/allerton.2008.4797607
+
+
+ Random features for kernel approximation: A
+survey on algorithms, theory, and beyond
+ Liu
+ IEEE Transactions on Pattern Analysis and
+Machine Intelligence
+ 10
+ 44
+ 10.1109/TPAMI.2021.3097011
+ 2022
+ Liu, F., Huang, X., Chen, Y., &
+Suykens, J. A. K. (2022). Random features for kernel approximation: A
+survey on algorithms, theory, and beyond. IEEE Transactions on Pattern
+Analysis and Machine Intelligence, 44(10), 7128–7148.
+https://doi.org/10.1109/TPAMI.2021.3097011
+
+
+ MCMC Methods for Functions: Modifying Old
+Algorithms to Make Them Faster
+ Cotter
+ Statistical Science
+ 3
+ 28
+ 10.1214/13-STS421
+ 2013
+ Cotter, S. L., Roberts, G. O.,
+Stuart, A. M., & White, D. (2013). MCMC Methods for Functions:
+Modifying Old Algorithms to Make Them Faster. Statistical Science,
+28(3), 424–446.
+https://doi.org/10.1214/13-STS421
+
+
+ The random walk metropolis: Linking theory
+and practice through a case study
+ Sherlock
+ Statistical Science
+ 2
+ 25
+ 10.1214/10-sts327
+ 2010
+ Sherlock, C., Fearnhead, P., &
+Roberts, G. O. (2010). The random walk metropolis: Linking theory and
+practice through a case study. Statistical Science, 25(2), 172–190.
+https://doi.org/10.1214/10-sts327
+
+
+ Calibration and uncertainty quantification of
+convective parameters in an idealized GCM
+ Dunbar
+ Journal of Advances in Modeling Earth
+Systems
+ 9
+ 13
+ 10.1029/2020MS002454
+ 2021
+ Dunbar, O. R. A., Garbuno-Inigo, A.,
+Schneider, T., & Stuart, A. M. (2021). Calibration and uncertainty
+quantification of convective parameters in an idealized GCM. Journal of
+Advances in Modeling Earth Systems, 13(9), e2020MS002454.
+https://doi.org/10.1029/2020MS002454
+
+
+ Parameter uncertainty quantification in an
+idealized GCM with a seasonal cycle
+ Howland
+ Journal of Advances in Modeling Earth
+Systems
+ 3
+ 14
+ 10.1029/2021MS002735
+ 2022
+ Howland, M. F., Dunbar, O. R. A.,
+& Schneider, T. (2022). Parameter uncertainty quantification in an
+idealized GCM with a seasonal cycle. Journal of Advances in Modeling
+Earth Systems, 14(3), e2021MS002735.
+https://doi.org/10.1029/2021MS002735
+
+
+ Ensemble-based experimental design for
+targeting data acquisition to inform climate models
+ Dunbar
+ Journal of Advances in Modeling Earth
+Systems
+ 9
+ 14
+ 10.1029/2022MS002997
+ 2022
+ Dunbar, O. R. A., Howland, M. F.,
+Schneider, T., & Stuart, A. M. (2022). Ensemble-based experimental
+design for targeting data acquisition to inform climate models. Journal
+of Advances in Modeling Earth Systems, 14(9), e2022MS002997.
+https://doi.org/10.1029/2022MS002997
+
+
+ Scikit-learn: Machine learning in
+Python
+ Pedregosa
+ Journal of Machine Learning
+Research
+ 12
+ 2011
+ Pedregosa, F., Varoquaux, G.,
+Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M.,
+Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
+Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011).
+Scikit-learn: Machine learning in Python. Journal of Machine Learning
+Research, 12, 2825–2830.
+
+
+ GaussianProcesses.jl: A nonparametric Bayes
+package for the Julia language
+ Fairbrother
+ Journal of Statistical
+Software
+ 102
+ 10.18637/jss.v102.i01
+ 2022
+ Fairbrother, J., Nemeth, C.,
+Rischard, M., Brea, J., & Pinder, T. (2022). GaussianProcesses.jl:
+A nonparametric Bayes package for the Julia language. Journal of
+Statistical Software, 102, 1–36.
+https://doi.org/10.18637/jss.v102.i01
+
+
+ GlobalSensitivity.jl: Performant and parallel
+global sensitivity analysis with julia
+ Dixit
+ Journal of Open Source
+Software
+ 76
+ 7
+ 10.21105/joss.04561
+ 2022
+ Dixit, V. K., & Rackauckas, C.
+(2022). GlobalSensitivity.jl: Performant and parallel global sensitivity
+analysis with julia. Journal of Open Source Software, 7(76), 4561.
+https://doi.org/10.21105/joss.04561
+
+
+ Affine invariant interacting Langevin
+dynamics for Bayesian inference
+ Garbuno-Inigo
+ SIAM Journal on Applied Dynamical
+Systems
+ 3
+ 19
+ 10.1137/19M1304891
+ 2020
+ Garbuno-Inigo, A., Nüsken, N., &
+Reich, S. (2020). Affine invariant interacting Langevin dynamics for
+Bayesian inference. SIAM Journal on Applied Dynamical Systems, 19(3),
+1633–1658. https://doi.org/10.1137/19M1304891
+
+
+ GpABC: a Julia package for approximate
+Bayesian computation with Gaussian process emulation
+ Tankhilevich
+ Bioinformatics
+ 10.1093/bioinformatics/btaa078
+ 1367-4803
+ 2020
+ Tankhilevich, E., Ish-Horowicz, J.,
+Hameed, T., Roesch, E., Kleijn, I., Stumpf, M. P. H., & He, F.
+(2020). GpABC: a Julia package for approximate Bayesian computation with
+Gaussian process emulation. Bioinformatics.
+https://doi.org/10.1093/bioinformatics/btaa078
+
+
+ Efficient derivative-free bayesian inference
+for large-scale inverse problems
+ Huang
+ Inverse Problems
+ 12
+ 38
+ 10.1088/1361-6420/ac99fa
+ 2022
+ Huang, D. Z., Huang, J., Reich, S.,
+& Stuart, A. M. (2022). Efficient derivative-free bayesian inference
+for large-scale inverse problems. Inverse Problems, 38(12), 125006.
+https://doi.org/10.1088/1361-6420/ac99fa
+
+
+ Calibration and uncertainty quantification of
+a gravity wave parameterization: A case study of the Quasi-Biennial
+Oscillation in an intermediate complexity climate model
+ Mansfield
+ Journal of Advances in Modeling Earth
+Systems
+ 11
+ 14
+ 10.1029/2022MS003245
+ 1942-2466
+ 2022
+ Mansfield, L. A., & Sheshadri, A.
+(2022). Calibration and uncertainty quantification of a gravity wave
+parameterization: A case study of the Quasi-Biennial Oscillation in an
+intermediate complexity climate model. Journal of Advances in Modeling
+Earth Systems, 14(11).
+https://doi.org/10.1029/2022MS003245
+
+
+ Bayesian history matching applied to the
+calibration of a gravity wave parameterization
+ King
+ 10.22541/essoar.170365299.96491153/v1
+ 2023
+ King, R. C., Mansfield, L. A., &
+Sheshadri, A. (2023). Bayesian history matching applied to the
+calibration of a gravity wave parameterization [Preprint].
+https://doi.org/10.22541/essoar.170365299.96491153/v1
+
+
+ Equation of state calculations by fast
+computing machines
+ Metropolis
+ The journal of chemical
+physics
+ 6
+ 21
+ 10.1063/1.1699114
+ 1953
+ Metropolis, N., Rosenbluth, A. W.,
+Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of
+state calculations by fast computing machines. The Journal of Chemical
+Physics, 21(6), 1087–1092.
+https://doi.org/10.1063/1.1699114
+
+
+ PyVBMC: Efficient bayesian inference in
+python
+ Huggins
+ Journal of Open Source
+Software
+ 86
+ 8
+ 10.21105/joss.05428
+ 2023
+ Huggins, B., Li, C., Tobaben, M.,
+Aarnos, M. J., & Acerbi, L. (2023). PyVBMC: Efficient bayesian
+inference in python. Journal of Open Source Software, 8(86), 5428.
+https://doi.org/10.21105/joss.05428
+
+
+ Fast and robust bayesian inference using
+gaussian processes with GPry
+ Gammal
+ Journal of Cosmology and Astroparticle
+Physics
+ 10
+ 2023
+ 10.1088/1475-7516/2023/10/021
+ 2023
+ Gammal, J. E., Schöneberg, N.,
+Torrado, J., & Fidler, C. (2023). Fast and robust bayesian inference
+using gaussian processes with GPry. Journal of Cosmology and
+Astroparticle Physics, 2023(10), 021.
+https://doi.org/10.1088/1475-7516/2023/10/021
+
+
+ The barker proposal: Combining robustness and
+efficiency in gradient-based MCMC
+ Livingstone
+ Journal of the Royal Statistical Society
+Series B: Statistical Methodology
+ 2
+ 84
+ 10.1111/rssb.12482
+ 2022
+ Livingstone, S., & Zanella, G.
+(2022). The barker proposal: Combining robustness and efficiency in
+gradient-based MCMC. Journal of the Royal Statistical Society Series B:
+Statistical Methodology, 84(2), 496–523.
+https://doi.org/10.1111/rssb.12482
+
+
+ The no-u-turn sampler: Adaptively setting
+path lengths in hamiltonian monte carlo.
+ Hoffman
+ J. Mach. Learn. Res.
+ 1
+ 15
+ 2014
+ Hoffman, M. D., Gelman, A., &
+others. (2014). The no-u-turn sampler: Adaptively setting path lengths
+in hamiltonian monte carlo. J. Mach. Learn. Res., 15(1),
+1593–1623.
+
+
+
+
+
+
diff --git a/joss.06372/10.21105.joss.06372.jats b/joss.06372/10.21105.joss.06372.jats
new file mode 100644
index 0000000000..f467834b6a
--- /dev/null
+++ b/joss.06372/10.21105.joss.06372.jats
@@ -0,0 +1,1052 @@
+
+
+
+
+
+
+
+Journal of Open Source Software
+JOSS
+
+2475-9066
+
+Open Journals
+
+
+
+6372
+10.21105/joss.06372
+
+CalibrateEmulateSample.jl: Accelerated Parametric
+Uncertainty Quantification
+
+
+
+https://orcid.org/0000-0001-7374-0382
+
+Dunbar
+Oliver R. A.
+
+
+*
+
+
+
+Bieli
+Melanie
+
+
+
+
+https://orcid.org/0000-0003-3279-619X
+
+Garbuno-Iñigo
+Alfredo
+
+
+
+
+https://orcid.org/0000-0002-2878-3874
+
+Howland
+Michael
+
+
+
+
+https://orcid.org/0000-0002-9906-7824
+
+de Souza
+Andre Nogueira
+
+
+
+
+https://orcid.org/0000-0002-6285-6045
+
+Mansfield
+Laura Anne
+
+
+
+
+https://orcid.org/0000-0001-5317-2445
+
+Wagner
+Gregory L.
+
+
+
+
+
+Efrat-Henrici
+N.
+
+
+
+
+
+Geological and Planetary Sciences, California Institute of
+Technology
+
+
+
+
+Swiss Re Ltd.
+
+
+
+
+Department of Statistics, Mexico Autonomous Institute of
+Technology
+
+
+
+
+Civil and Environmental Engineering, Massachusetts
+Institute of Technology
+
+
+
+
+Earth, Atmospheric, and Planetary Sciences, Massachusetts
+Institute of Technology
+
+
+
+
+Earth System Science, Doerr School of Sustainability,
+Stanford University
+
+
+
+
+* E-mail:
+
+
+2
+1
+2024
+
+9
+97
+6372
+
+Authors of papers retain copyright and release the
+work under a Creative Commons Attribution 4.0 International License (CC
+BY 4.0)
+2022
+The article authors
+
+Authors of papers retain copyright and release the work under
+a Creative Commons Attribution 4.0 International License (CC BY
+4.0)
+
+
+
+machine learning
+optimization
+bayesian
+data assimilation
+
+
+
+
+
+ Summary
+
A Julia language
+ (Bezanson
+ et al., 2017) package providing a practical and modular
+ implementation of “Calibrate, Emulate, Sample”
+ (Cleary
+ et al., 2021), hereafter CES, an accelerated workflow for
+ obtaining model parametric uncertainty, is presented. This task is
+ also known as Bayesian inversion or uncertainty quantification. To
+ apply CES one requires a computer model (written in any programming
+ language) that depends on free parameters, a prior distribution
+ encoding existing knowledge about the free parameters, and some data
+ with which to constrain this prior. The pipeline has three stages,
+ most easily explained in reverse:
+
+
+
The goal of the workflow is to draw samples (Sample) from the
+ Bayesian posterior distribution, that is, the prior distribution
+ conditioned on the observed data,
+
+
+
To accelerate and regularize sampling we train statistical
+ emulators to represent the user-provided parameter-to-data map
+ (Emulate),
+
+
+
The training points for these emulators are generated by the
+ computer model, and selected adaptively around regions of high
+ posterior mass (Calibrate).
+
+
+
We describe CES as an accelerated workflow, as it is often able to
+ use dramatically fewer evaluations of the computer model when compared
+ with applying sampling algorithms, such as Markov chain Monte Carlo
+ (MCMC), directly.
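To make the three stages concrete, here is a deliberately tiny, self-contained sketch in Python (not the package API; the one-dimensional model, the evaluation budget, and the polynomial surrogate are all illustrative stand-ins):

```python
import numpy as np

# Stand-in "expensive" computer model: scalar parameter -> scalar observable.
def expensive_model(theta):
    return np.sin(theta) + 0.1 * theta

theta_true = 1.5
y_obs, noise_var = expensive_model(theta_true), 0.05 ** 2

# Calibrate: spend a small budget of model runs locating inputs with low
# data misfit (a crude, deterministic stand-in for ensemble Kalman refinement).
candidates = np.linspace(-3.0, 3.0, 31)
outputs = expensive_model(candidates)
keep = np.argsort((outputs - y_obs) ** 2)[:12]
train_x, train_y = candidates[keep], outputs[keep]   # only 31 model calls total

# Emulate: fit a cheap surrogate to the training pairs (a cubic polynomial
# standing in for a Gaussian process or random-feature emulator).
surrogate = np.poly1d(np.polyfit(train_x, train_y, deg=3))

# Sample: evaluate the (unnormalised, flat-prior) posterior with the
# surrogate only; the expensive model is never called again.
grid = np.linspace(train_x.min(), train_x.max(), 1001)
log_post = -0.5 * (surrogate(grid) - y_obs) ** 2 / noise_var
theta_map = grid[np.argmax(log_post)]                # lands near theta_true
```

The point of the sketch is the call pattern: the expensive model is only run during the calibration stage, and the sampling stage touches the surrogate alone.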
+
+
+
Calibration tools: We recommend choosing adaptive training
+ points with ensemble Kalman methods such as ensemble Kalman inversion (EKI)
+ (Iglesias
+ et al., 2013) and its variants
+ (Huang
+ et al., 2022); CES provides explicit utilities from the
+ codebase EnsembleKalmanProcesses.jl
+ (Dunbar,
+ Lopez-Gomez, et al., 2022).
+
+
+
Emulation tools: CES can integrate any statistical emulator;
+ currently implemented are Gaussian Processes (GP)
+ (Williams
+ & Rasmussen, 2006), explicitly provided through
+ packages SciKitLearn.jl
+ (Pedregosa
+ et al., 2011) and GaussianProcesses.jl
+ (Fairbrother
+ et al., 2022), and Random Features
+ (Liu
+ et al., 2022;
+ Rahimi
+ et al., 2007;
+ Rahimi
+ & Recht, 2008), explicitly provided through
+ RandomFeatures.jl,
+ which can provide additional flexibility and scalability,
+ particularly in higher dimensions.
+
+
+
Sampling tools: The regularized and accelerated sampling
+ problem is solved with MCMC, and CES provides the variants of
+ Random Walk Metropolis
+ (Metropolis
+ et al., 1953;
+ Sherlock
+ et al., 2010), and preconditioned Crank-Nicolson
+ (Cotter
+ et al., 2013), using APIs from
+ Turing.jl.
+ Some regular emulator mean functions are differentiable, and
+ incorporating derivative-based MCMC accelerations into CES (e.g.,
+ NUTS,
+ Hoffman
+ et al., 2014; Barker,
+ Livingstone
+ & Zanella, 2022) is an active direction of work.
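For intuition, one preconditioned Crank-Nicolson step can be sketched as follows (illustrative Python, not the package's MarkovChainMonteCarlo API): the proposal leaves a Gaussian prior invariant, so the acceptance ratio involves only the likelihood, which in CES would be the emulated one.

```python
import numpy as np

def pcn_step(u, log_like, beta, rng, prior_sd=1.0):
    """One pCN step for a scalar N(0, prior_sd^2) prior: propose
    sqrt(1 - beta^2) * u + beta * xi with xi drawn from the prior,
    then accept/reject on the likelihood ratio alone."""
    xi = rng.normal(0.0, prior_sd)
    u_prop = np.sqrt(1.0 - beta ** 2) * u + beta * xi
    if np.log(rng.uniform()) < log_like(u_prop) - log_like(u):
        return u_prop, True
    return u, False

# usage: an N(0,1) prior with a Gaussian likelihood centred at 1 (variance
# 1/2) gives an N(2/3, 1/3) posterior, which the chain should recover
rng = np.random.default_rng(0)
log_like = lambda u: -((u - 1.0) ** 2)
u, samples = 0.0, []
for _ in range(20000):
    u, _ = pcn_step(u, log_like, beta=0.5, rng=rng)
    samples.append(u)
```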
+
+
+
To highlight code accessibility, we also provide a suite of
+ detailed scientifically-inspired examples, with documentation that
+ walks users through some use cases. Such use cases not only
+ demonstrate the capability of the CES pipeline, but also teach users
+ about the typical interface and workflow.
+
+
+ Statement of need
+
Computationally expensive computer codes for predictive modelling
+ are ubiquitous across science and engineering disciplines. Free
+ parameter values that exist within these modelling frameworks are
+ typically constrained by observations to produce accurate and robust
+ predictions about the system they are approximating numerically. In a
+ Bayesian setting, this is viewed as evolving an initial parameter
+ distribution (based on prior information) with the input of observed
+ data, to a more informative data-consistent distribution (posterior).
+ Unfortunately, this task is intensely computationally expensive,
+ commonly requiring over
+ 10^5
+ evaluations of the expensive computer code (e.g., Random Walk
+ Metropolis), with accelerations relying on intrusive model
+ information, such as a derivative of the parameter-to-data map. CES is
+ able to approximate and accelerate this process in a non-intrusive
+ fashion, requiring only on the order of
+ 10^2
+ evaluations of the original computer model. This opens the door to
+ quantifying parametric uncertainty for a class of numerically
+ intensive computer codes for which it was previously out of reach.
+
+
+ State of the field
+
In Julia there are a few tools for performing non-accelerated
+ uncertainty quantification, from classical sensitivity analysis
+ approaches, for example,
+ UncertaintyQuantification.jl,
+ GlobalSensitivity.jl
+ (Dixit
+ & Rackauckas, 2022), and MCMC, for example,
+ Mamba.jl
+ or
+ Turing.jl.
+ For computational efficiency, ensemble methods also provide
+ approximate sampling
+ (e.g., the Ensemble Kalman Sampler;
+ Garbuno-Inigo
+ et al., 2020;
+ Dunbar,
+ Lopez-Gomez, et al., 2022), though these only provide Gaussian
+ approximations of the posterior.
+
Accelerated uncertainty quantification tools also exist for the
+ related approach of Approximate Bayesian Computation (ABC), for
+ example, GpABC
+ (Tankhilevich
+ et al., 2020) or
+ ApproxBayes.jl;
+ these tools both approximately sample from the posterior distribution.
+ In ABC, this approximation comes from bypassing the likelihood that is
+ usually required in sampling methods, such as MCMC. Instead, the goal
+ of ABC is to replace the likelihood with a scalar-valued sampling
+ objective that compares model and data. In CES, the approximation
+ comes from learning the parameter-to-data map; an explicit likelihood
+ is then calculated and sampled exactly via MCMC.
+ Some ABC algorithms also make use of statistical emulators to further
+ accelerate sampling (GpABC). Although flexible, ABC encounters
+ challenges due to the subjectivity of summary statistics and distance
+ metrics, which may lead to approximation errors, particularly in
+ high-dimensional settings
+ (Nott
+ et al., 2018). CES is more restrictive due to its use of an
+ explicit Gaussian likelihood, but it also leverages this structure to
+ deal with high-dimensional data.
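With data y, forward-map output G(θ), and noise covariance Γ, the explicit Gaussian log-likelihood in question can be written in a standalone sketch (the function name and signature are illustrative, not the package API):

```python
import numpy as np

def gaussian_log_likelihood(y, g_theta, gamma):
    """log N(y | G(theta), Gamma): the misfit-based Gaussian likelihood."""
    r = y - g_theta
    _, logdet = np.linalg.slogdet(gamma)        # log det(Gamma)
    quad = r @ np.linalg.solve(gamma, r)        # r^T Gamma^{-1} r
    return -0.5 * (quad + logdet + len(y) * np.log(2.0 * np.pi))
```

For a perfect fit (r = 0) with Γ = I this reduces to the Gaussian normalising constant, -d/2 · log(2π).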
+
Several other tools are available in other languages for the purpose
+ of accelerated learning of the posterior distribution or posterior
+ sampling. Two such examples, written in Python, approximate the
+ log-posterior distribution directly with a Gaussian process:
+ PyVBMC
+ (Huggins
+ et al., 2023), which additionally uses variational approximations to
+ calculate the normalization constant, and
+ GPry
+ (Gammal
+ et al., 2023), which iteratively trains the GP with an active
+ training-point selection algorithm. Such algorithms are distinct from
+ CES, which approximates the parameter-to-data map with the Gaussian
+ process, and advocates ensemble Kalman methods to select training
+ points.
+
+
+ A simple example from the code documentation
+
We sketch an end-to-end example of the pipeline, with a
+ fully detailed walkthrough given in the online documentation.
+
We have a model of a sinusoidal signal that is a function of
+ parameters
+
+ θ=(A,v),
+ where
+
+ A
+ is the amplitude of the signal and
+
+ v
+ is the vertical shift of the signal
+
+ f(A,v)=Asin(ϕ+t)+v,∀t∈[0,2π].
+ Here,
+
+ ϕ
+ is the random phase of each signal. The goal is to estimate not just
+ point estimates of the parameters
+
+ θ=(A,v),
+ but entire probability distributions of them, given some noisy
+ observations. We will use the range and mean of a signal as our
+ observable:
+
+ G(θ)=[range(f(θ)),mean(f(θ))]
+ Then, our noisy observations,
+
+ yobs,
+ can be written as:
+
+ yobs=G(θ†)+𝒩(0,Γ)
+ where
+
+ Γ
+ is the observational covariance matrix. We will assume the noise to be
+ independent for each observable, giving us a diagonal covariance
+ matrix.
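The observable map and the noisy observation above can be sketched in Python as follows (the time-grid resolution and the diagonal entries of Γ are illustrative choices, not the documented values):

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.linspace(0, 2 * np.pi, 200)

def f(amplitude, vert_shift, phi):
    # sinusoid with random phase phi, evaluated on the time grid t
    return amplitude * np.sin(phi + t) + vert_shift

def G(theta, phi):
    # observable map: full signal -> (range, mean)
    signal = f(theta[0], theta[1], phi)
    return np.array([signal.max() - signal.min(), signal.mean()])

theta_true = np.array([3.0, 7.0])
Gamma = np.diag([0.2 ** 2, 0.1 ** 2])     # independent noise per observable
phi = rng.uniform(0, 2 * np.pi)           # random phase of the signal
y_obs = G(theta_true, phi) + rng.multivariate_normal(np.zeros(2), Gamma)
```

For θ† = (3.0, 7.0) the noise-free observable is approximately (6.0, 7.0): a range of twice the amplitude, and a mean equal to the vertical shift.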
+
+
The true and observed range and mean.
+
+
+
+
For this experiment
+
+ θ†=(A†,v†)=(3.0,7.0),
+ and the noisy observations are displayed in blue in
+ [fig:signal].
+
We define prior distributions on the two parameters. For the
+ amplitude, we define a prior with mean 2 and standard deviation 1. It
+ is additionally constrained to be nonnegative. For the vertical shift
+ we define a prior with mean 0 and standard deviation 5.
We now adaptively find input-output pairs from our map
+
+
+ G
+ in a region of interest using an inversion method (an ensemble Kalman
+ process). This is the Calibrate stage, which iteratively generates
+ parameter combinations that refine around a region of high posterior
+ mass.
+ const EKP = CalibrateEmulateSample.EnsembleKalmanProcesses
+N_ensemble = 10
+N_iterations = 5
+# draw the initial ensemble from the prior
+initial_ensemble = EKP.construct_initial_ensemble(prior, N_ensemble)
+ensemble_kalman_process = EKP.EnsembleKalmanProcess(
+    initial_ensemble, y_obs, Γ, EKP.Inversion();
+)
+for i in 1:N_iterations
+    params_i = EKP.get_phi_final(prior, ensemble_kalman_process)
+    # evaluate the model at each ensemble member (a column of params_i)
+    G_ens = hcat([G(params_i[:, j]) for j in 1:N_ensemble]...)
+    EKP.update_ensemble!(ensemble_kalman_process, G_ens)
+end
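Mathematically, each ensemble update applies a Kalman-type step built from ensemble covariances. A minimal re-implementation of one EKI step (an illustrative Python sketch, not the EnsembleKalmanProcesses.jl internals) looks like:

```python
import numpy as np

def eki_update(theta, g, y, gamma, rng=None):
    """One EKI step (Iglesias et al., 2013 flavour, with perturbed
    observations). theta: (d, J) parameter ensemble; g: (p, J) model
    outputs; y: (p,) data; gamma: (p, p) observation covariance."""
    if rng is None:
        rng = np.random.default_rng(0)
    J = theta.shape[1]
    theta_c = theta - theta.mean(axis=1, keepdims=True)
    g_c = g - g.mean(axis=1, keepdims=True)
    C_tg = theta_c @ g_c.T / J               # Cov(theta, g)
    C_gg = g_c @ g_c.T / J                   # Cov(g, g)
    K = C_tg @ np.linalg.inv(C_gg + gamma)   # Kalman-like gain
    # perturbed observations keep the ensemble spread honest
    y_pert = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), gamma, J).T
    return theta + K @ (y_pert - g)

# usage: recover theta = 2 from data y = 4 under the linear map g = 2 * theta
rng = np.random.default_rng(1)
theta = rng.normal(0.0, 3.0, size=(1, 50))
y, gamma = np.array([4.0]), np.array([[0.01]])
for _ in range(8):
    theta = eki_update(theta, 2.0 * theta, y, gamma, rng)
```

Iterating the step contracts the ensemble toward parameter values consistent with the data, which is why the resulting members double as good emulator training points.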
+
+
The resulting ensemble from a calibration.
+
+
+
+
The adaptively refined training points from EKP are displayed in
+ [fig:eki]. We now build
+ a basic Gaussian process emulator from the GaussianProcesses.jl
+ package to emulate the map
+
+ G
+ using these points.
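For intuition about what the emulator provides, the GP posterior mean reduces to kernel regression on the calibration pairs. A tiny self-contained version (an illustrative stand-in for GaussianProcesses.jl, not its API) is:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # squared-exponential kernel between row-stacked inputs A (n,d), B (m,d)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior_mean(X_train, y_train, X_test, noise_var=1e-4):
    # GP regression mean: k(X*, X) [k(X, X) + noise I]^{-1} y
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    return rbf_kernel(X_test, X_train) @ np.linalg.solve(K, y_train)

# usage: emulate a scalar map from 20 training evaluations
X = np.linspace(0.0, 3.0, 20)[:, None]
y = np.sin(X[:, 0])
pred = gp_posterior_mean(X, y, np.array([[1.5]]))
```

In the paper's example the training inputs are the two-column (A, v) pairs from the calibration stage, with conceptually one such regression per observable (range and mean).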
The Gaussian process emulator of the range and mean
+ maps, trained on the re-used calibration pairs
+
+
+
+
We evaluate the mean of this emulator on a grid, and also show the
+ value of the true
+
+ G
+ at training point locations in
+ [fig:GP_emulator].
+
We can then sample with this emulator using an MCMC scheme. We
+ first choose a good step size (an algorithm parameter) by running some
+ short sampling runs (of length 2,000 steps). Then we run a
+ 100,000-step sampling run to generate samples of the joint posterior
+ distribution.
+ const MC = CalibrateEmulateSample.MarkovChainMonteCarlo
+mcmc = MC.MCMCWrapper(
+ MC.RWMHSampling(), y_obs, prior, emulator,
+)
+# choose a step size
+new_step = MC.optimize_stepsize(
+ mcmc; init_stepsize = 0.1, N = 2000,
+)
+# Now begin the actual MCMC
+chain = MC.sample(
+ mcmc, 100_000; stepsize = new_step, discard_initial = 2_000,
+)
+
+
The joint posterior distribution histogram
+
+
+
+
A histogram of the samples from the CES algorithm is displayed in
+ [fig:GP_2d_posterior].
+ We see that the posterior distribution contains the true value
+
+
+ (3.0,7.0)
+ with high probability.
+
+
+ Research projects using the package
+
Some research projects that use this codebase, or modifications of
+ it, are
+
+
+
(Dunbar
+ et al., 2021)
+
+
+
(Bieli
+ et al., 2022)
+
+
+
(Hillier,
+ 2022)
+
+
+
(Howland
+ et al., 2022)
+
+
+
(Dunbar,
+ Howland, et al., 2022)
+
+
+
(Mansfield
+ & Sheshadri, 2022)
+
+
+
(King
+ et al., 2023)
+
+
+
+
+ Acknowledgements
+
We acknowledge contributions from several others who played a role
+ in the evolution of this package. These include Adeline Hillier,
+ Ignacio Lopez Gomez and Thomas Jackson. The development of this
+ package was supported by the generosity of Eric and Wendy Schmidt by
+ recommendation of the Schmidt Futures program, National Science
+ Foundation Grant AGS-1835860, the Defense Advanced Research Projects
+ Agency (Agreement No. HR00112290030), the Heising-Simons Foundation,
+ Audi Environmental Foundation, and the Cisco Foundation.
+
+
+
+
+
+
+
+ BezansonJeff
+ EdelmanAlan
+ KarpinskiStefan
+ ShahViral B.
+
+ Julia: A fresh approach to numerical computing
+
+ Society for Industrial & Applied Mathematics (SIAM)
+ 201701
+ 59
+ 1
+ 10.1137/141000671
+ 65
+ 98
+
+
+
+
+
+ NottDavid J.
+ OngVictor M.-H.
+ FanY.
+ SissonS. A.
+
+ High-Dimensional ABC
+
+ CRC Press
+ 2018
+ 978-1-315-11719-5
+ 10.1201/9781315117195-8
+ 211
+ 241
+
+
+
+
+
+ ClearyEmmet
+ Garbuno-InigoAlfredo
+ LanShiwei
+ SchneiderTapio
+ StuartAndrew M.
+
+ Calibrate, emulate, sample
+
+ 2021
+ 424
+ 0021-9991
+ 10.1016/j.jcp.2020.109716
+ 109716
+
+
+
+
+
+
+ DunbarOliver R. A.
+ Lopez-GomezIgnacio
+ Garbuno-IñigoAlfredo Garbuno-Iñigo
+ HuangDaniel Zhengyu
+ BachEviatar
+ WuJin-long
+
+ EnsembleKalmanProcesses.jl: Derivative-free ensemble-based model calibration
+
+ The Open Journal
+ 2022
+ 7
+ 80
+ 10.21105/joss.04869
+ 4869
+
+
+
+
+
+
+ BieliMelanie
+ DunbarOliver R. A.
+ JongEmily K. de
+ JarugaAnna
+ SchneiderTapio
+ BischoffTobias
+
+ An efficient Bayesian approach to learning droplet collision kernels: Proof of concept using “Cloudy,” a new n-moment bulk microphysics scheme
+
+ 2022
+ 14
+ 8
+ 10.1029/2022MS002994
+ e2022MS002994
+
+
+
+
+
+
+ HillierAdeline
+
+ Supervised calibration and uncertainty quantification of subgrid closure parameters using ensemble Kalman inversion
+ Massachusetts Institute of Technology. Department of Electrical Engineering; Computer Science
+ 2022
+ 1721.1/145140
+
+
+
+
+
+ WilliamsChristopher KI
+ RasmussenCarl Edward
+
+
+ MIT press Cambridge, MA
+ 2006
+ 2
+ 10.1142/S0129065704001899
+
+
+
+
+
+ IglesiasMarco A
+ LawKody JH
+ StuartAndrew M
+
+ Ensemble kalman methods for inverse problems
+
+ IOP Publishing
+ 2013
+ 29
+ 4
+ 10.1088/0266-5611/29/4/045001
+ 045001
+
+
+
+
+
+
+ RahimiAli
+ RechtBenjamin
+ others
+
+ Random features for large-scale kernel machines.
+
+ 2007
+ 3
+ https://proceedings.neurips.cc/paper_files/paper/2007/file/013a006f03dbc5392effeb8f18fda755-Paper.pdf
+ 5
+
+
+
+
+
+
+ RahimiAli
+ RechtBenjamin
+
+ Uniform approximation of functions with random bases
+
+ IEEE
+ 2008
+ 10.1109/allerton.2008.4797607
+ 555
+ 561
+
+
+
+
+
+ LiuFanghui
+ HuangXiaolin
+ ChenYudong
+ SuykensJohan A. K.
+
+ Random features for kernel approximation: A survey on algorithms, theory, and beyond
+
+ 2022
+ 44
+ 10
+ 10.1109/TPAMI.2021.3097011
+ 7128
+ 7148
+
+
+
+
+
+ CotterS. L.
+ RobertsG. O.
+ StuartA. M.
+ WhiteD.
+
+ MCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster
+
+ Institute of Mathematical Statistics
+ 2013
+ 28
+ 3
+ 10.1214/13-STS421
+ 424
+ 446
+
+
+
+
+
+ SherlockChris
+ FearnheadPaul
+ RobertsGareth O.
+
+ The random walk metropolis: Linking theory and practice through a case study
+
+ Institute of Mathematical Statistics
+ 2010
+ 25
+ 2
+ 10.1214/10-sts327
+ 172
+ 190
+
+
+
+
+
+ DunbarOliver R. A.
+ Garbuno-InigoAlfredo
+ SchneiderTapio
+ StuartAndrew M.
+
+ Calibration and uncertainty quantification of convective parameters in an idealized GCM
+
+ 2021
+ 13
+ 9
+ 10.1029/2020MS002454
+ e2020MS002454
+
+
+
+
+
+
+ HowlandMichael F.
+ DunbarOliver R. A.
+ SchneiderTapio
+
+ Parameter uncertainty quantification in an idealized GCM with a seasonal cycle
+
+ 2022
+ 14
+ 3
+ 10.1029/2021MS002735
+ e2021MS002735
+
+
+
+
+
+
+ DunbarOliver R. A.
+ HowlandMichael F.
+ SchneiderTapio
+ StuartAndrew M.
+
+ Ensemble-based experimental design for targeting data acquisition to inform climate models
+
+ 2022
+ 14
+ 9
+ 10.1029/2022MS002997
+ e2022MS002997
+
+
+
+
+
+
+ PedregosaF.
+ VaroquauxG.
+ GramfortA.
+ MichelV.
+ ThirionB.
+ GriselO.
+ BlondelM.
+ PrettenhoferP.
+ WeissR.
+ DubourgV.
+ VanderplasJ.
+ PassosA.
+ CournapeauD.
+ BrucherM.
+ PerrotM.
+ DuchesnayE.
+
+ Scikit-learn: Machine learning in Python
+
+ 2011
+ 12
+ 2825
+ 2830
+
+
+
+
+
+ FairbrotherJamie
+ NemethChristopher
+ RischardMaxime
+ BreaJohanni
+ PinderThomas
+
+ GaussianProcesses.jl: A nonparametric Bayes package for the Julia language
+
+ 2022
+ 102
+ 10.18637/jss.v102.i01
+ 1
+ 36
+
+
+
+
+
+ DixitVaibhav Kumar
+ RackauckasChristopher
+
+ GlobalSensitivity.jl: Performant and parallel global sensitivity analysis with julia
+
+ The Open Journal
+ 2022
+ 7
+ 76
+ 10.21105/joss.04561
+ 4561
+
+
+
+
+
+
+ Garbuno-InigoAlfredo
+ NüskenNikolas
+ ReichSebastian
+
+ Affine invariant interacting Langevin dynamics for Bayesian inference
+
+ SIAM
+ 2020
+ 19
+ 3
+ 10.1137/19M1304891
+ 1633
+ 1658
+
+
+
+
+
+ TankhilevichEvgeny
+ Ish-HorowiczJonathan
+ HameedTara
+ RoeschElisabeth
+ KleijnIstvan
+ StumpfMichael P H
+ HeFei
+
+ GpABC: a Julia package for approximate Bayesian computation with Gaussian process emulation
+
+ 202002
+ 1367-4803
+ 10.1093/bioinformatics/btaa078
+
+
+
+
+
+ HuangDaniel Zhengyu
+ HuangJiaoyang
+ ReichSebastian
+ StuartAndrew M
+
+ Efficient derivative-free bayesian inference for large-scale inverse problems
+
+ IOP Publishing
+ 202210
+ 38
+ 12
+ 10.1088/1361-6420/ac99fa
+ 125006
+
+
+
+
+
+
+ MansfieldL. A.
+ SheshadriA.
+
+ Calibration and uncertainty quantification of a gravity wave parameterization: A case study of the Quasi-Biennial Oscillation in an intermediate complexity climate model
+
+ 2022
+ 14
+ 11
+ 1942-2466
+ 10.1029/2022MS003245
+
+
+
+
+
+ KingRobert C
+ MansfieldLaura A
+ SheshadriAditi
+
+ Bayesian history matching applied to the calibration of a gravity wave parameterization
+ Preprints
+ 202312
+ 10.22541/essoar.170365299.96491153/v1
+
+
+
+
+
+ MetropolisNicholas
+ RosenbluthArianna W
+ RosenbluthMarshall N
+ TellerAugusta H
+ TellerEdward
+
+ Equation of state calculations by fast computing machines
+
+ American Institute of Physics
+ 1953
+ 21
+ 6
+ 10.1063/1.1699114
+ 1087
+ 1092
+
+
+
+
+
+ HugginsBobby
+ LiChengkun
+ TobabenMarlon
+ AarnosMikko J.
+ AcerbiLuigi
+
+ PyVBMC: Efficient bayesian inference in python
+
+ The Open Journal
+ 2023
+ 8
+ 86
+ 10.21105/joss.05428
+ 5428
+
+
+
+
+
+
+ GammalJonas El
+ SchönebergNils
+ TorradoJesús
+ FidlerChristian
+
+ Fast and robust bayesian inference using gaussian processes with GPry
+
+ IOP Publishing
+ 202310
+ 2023
+ 10
+ 10.1088/1475-7516/2023/10/021
+ 021
+
+
+
+
+
+
+ LivingstoneSamuel
+ ZanellaGiacomo
+
+ The barker proposal: Combining robustness and efficiency in gradient-based MCMC
+
+ Oxford University Press
+ 2022
+ 84
+ 2
+ 10.1111/rssb.12482
+ 496
+ 523
+
+
+
+
+
+ HoffmanMatthew D
+ GelmanAndrew
+ others
+
+ The no-u-turn sampler: Adaptively setting path lengths in hamiltonian monte carlo.
+
+ 2014
+ 15
+ 1
+ 1593
+ 1623
+
+
+
+
+
diff --git a/joss.06372/10.21105.joss.06372.pdf b/joss.06372/10.21105.joss.06372.pdf
new file mode 100644
index 0000000000..f944f9d61d
Binary files /dev/null and b/joss.06372/10.21105.joss.06372.pdf differ
diff --git a/joss.06372/media/sinusoid_GP_emulator_contours.png b/joss.06372/media/sinusoid_GP_emulator_contours.png
new file mode 100644
index 0000000000..8eb67bf43a
Binary files /dev/null and b/joss.06372/media/sinusoid_GP_emulator_contours.png differ
diff --git a/joss.06372/media/sinusoid_MCMC_hist_GP.png b/joss.06372/media/sinusoid_MCMC_hist_GP.png
new file mode 100644
index 0000000000..10e1f7ef1a
Binary files /dev/null and b/joss.06372/media/sinusoid_MCMC_hist_GP.png differ
diff --git a/joss.06372/media/sinusoid_eki_pairs.png b/joss.06372/media/sinusoid_eki_pairs.png
new file mode 100644
index 0000000000..229f75d1c6
Binary files /dev/null and b/joss.06372/media/sinusoid_eki_pairs.png differ
diff --git a/joss.06372/media/sinusoid_prior.png b/joss.06372/media/sinusoid_prior.png
new file mode 100644
index 0000000000..ae7e41d1f7
Binary files /dev/null and b/joss.06372/media/sinusoid_prior.png differ
diff --git a/joss.06372/media/sinusoid_true_vs_observed_signal.png b/joss.06372/media/sinusoid_true_vs_observed_signal.png
new file mode 100644
index 0000000000..d143ac7c50
Binary files /dev/null and b/joss.06372/media/sinusoid_true_vs_observed_signal.png differ