diff --git a/joss.06156/10.21105.joss.06156.crossref.xml b/joss.06156/10.21105.joss.06156.crossref.xml
new file mode 100644
index 0000000000..175c595b38
--- /dev/null
+++ b/joss.06156/10.21105.joss.06156.crossref.xml
@@ -0,0 +1,390 @@
+
+
+
+ 20240328T112659-923a9a4ed5fa96b8fb2d478d3cbf089ac5b4f3c3
+ 20240328112659
+
+ JOSS Admin
+ admin@theoj.org
+
+ The Open Journal
+
+
+
+
+ Journal of Open Source Software
+ JOSS
+ 2475-9066
+
+ 10.21105/joss
+ https://joss.theoj.org
+
+
+
+
+ 03
+ 2024
+
+
+ 9
+
+ 95
+
+
+
+ simChef: High-quality data science simulations in
+R
+
+
+
+ James
+ Duncan
+ https://orcid.org/0000-0003-3297-681X
+
+
+ Tiffany
+ Tang
+ https://orcid.org/0000-0002-8079-6867
+
+
+ Corrine F.
+ Elliott
+ https://orcid.org/0000-0001-7935-9945
+
+
+ Philippe
+ Boileau
+ https://orcid.org/0000-0002-4850-2507
+
+
+ Bin
+ Yu
+ https://orcid.org/0000-0002-8888-4060
+
+
+
+ 03
+ 28
+ 2024
+
+
+ 6156
+
+
+ 10.21105/joss.06156
+
+
+ http://creativecommons.org/licenses/by/4.0/
+ http://creativecommons.org/licenses/by/4.0/
+ http://creativecommons.org/licenses/by/4.0/
+
+
+
+ Software archive
+ 10.5281/zenodo.10845638
+
+
+ GitHub review issue
+ https://github.com/openjournals/joss-reviews/issues/6156
+
+
+
+ 10.21105/joss.06156
+ https://joss.theoj.org/papers/10.21105/joss.06156
+
+
+ https://joss.theoj.org/papers/10.21105/joss.06156.pdf
+
+
+
+
+
+ Veridical data science
+ Yu
+ Proceedings of the National Academy of
+Sciences
+ 8
+ 117
+ 10.1073/pnas.1901326117
+ 0027-8424
+ 2020
+ Yu, B., & Kumbier, K. (2020).
+Veridical data science. Proceedings of the National Academy of Sciences,
+117(8), 3920–3929.
+https://doi.org/10.1073/pnas.1901326117
+
+
+ batchtools: Tools for R to work on batch
+systems
+ Lang
+ Journal of Open Source
+Software
+ 10
+ 2
+ 10.21105/joss.00135
+ 2475-9066
+ 2017
+ Lang, M., Bischl, B., & Surmann,
+D. (2017). batchtools: Tools for R to work on batch systems. Journal of
+Open Source Software, 2(10), 135.
+https://doi.org/10.21105/joss.00135
+
+
+ Welcome to the Tidyverse
+ Wickham
+ Journal of Open Source
+Software
+ 43
+ 4
+ 10.21105/joss.01686
+ 2475-9066
+ 2019
+ Wickham, H., Averick, M., Bryan, J.,
+Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A.,
+Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S.
+M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., …
+Yutani, H. (2019). Welcome to the Tidyverse. Journal of Open Source
+Software, 4(43), 1686.
+https://doi.org/10.21105/joss.01686
+
+
+ A Unifying Framework for Parallel and
+Distributed Processing in R using Futures
+ Bengtsson
+ The R Journal
+ 2
+ 13
+ 10.32614/RJ-2021-048
+ 2073-4859
+ 2021
+ Bengtsson, H. (2021). A Unifying
+Framework for Parallel and Distributed Processing in R using Futures.
+The R Journal, 13(2), 208.
+https://doi.org/10.32614/RJ-2021-048
+
+
+ R6: Encapsulated Classes with Reference
+Semantics
+ Chang
+ 2022
+ Chang, W. (2022). R6: Encapsulated
+Classes with Reference Semantics.
+https://r6.r-lib.org
+
+
+ Writing Effective and Reliable Monte Carlo
+Simulations with the SimDesign Package
+ Chalmers
+ The Quantitative Methods for
+Psychology
+ 4
+ 16
+ 10.20982/tqmp.16.4.p248
+ 2020
+ Chalmers, R. P., & Adkins, M. C.
+(2020). Writing Effective and Reliable Monte Carlo Simulations
+with the SimDesign Package. The Quantitative Methods for Psychology,
+16(4), 248–280.
+https://doi.org/10.20982/tqmp.16.4.p248
+
+
+ SimEngine: A Modular Framework for
+Statistical Simulations in R
+ Kenny
+ 10.48550/arXiv.2403.05698
+ 2024
+ Kenny, A., & Wolock, C. J.
+(2024). SimEngine: A Modular Framework for Statistical Simulations in R.
+https://doi.org/10.48550/arXiv.2403.05698
+
+
+ simpr: Flexible ’Tidyverse’-Friendly
+Simulations
+ Brown
+ 2023
+ Brown, E. (2023). simpr: Flexible
+’Tidyverse’-Friendly Simulations.
+https://statisfactions.github.io/simpr/
+
+
+ rsimsum: Summarise results from Monte Carlo
+simulation studies
+ Gasparini
+ Journal of Open Source
+Software
+ 26
+ 3
+ 10.21105/joss.00739
+ 2018
+ Gasparini, A. (2018). rsimsum:
+Summarise results from Monte Carlo simulation studies. Journal of Open
+Source Software, 3(26), 739.
+https://doi.org/10.21105/joss.00739
+
+
+ Declaring and Diagnosing Research
+Designs
+ Blair
+ American Political Science
+Review
+ 3
+ 113
+ 10.1017/S0003055419000194
+ 2019
+ Blair, G., Cooper, J., Coppock, A.,
+& Humphreys, M. (2019). Declaring and Diagnosing Research Designs.
+American Political Science Review, 113(3), 838–859.
+https://doi.org/10.1017/S0003055419000194
+
+
+ simhelpers: Helper Functions for Simulation
+Studies
+ Joshi
+ 2024
+ Joshi, M., & Pustejovsky, J.
+(2024). simhelpers: Helper Functions for Simulation Studies.
+https://meghapsimatrix.github.io/simhelpers/index.html
+
+
+ simTool: Conduct Simulation Studies with a
+Minimal Amount of Source Code
+ Scheer
+ 2020
+ Scheer, M. (2020). simTool: Conduct
+Simulation Studies with a Minimal Amount of Source Code.
+https://CRAN.R-project.org/package=simTool
+
+
+ parSim: Parallel Simulation
+Studies
+ Epskamp
+ 2023
+ Epskamp, S. (2023). parSim: Parallel
+Simulation Studies.
+https://CRAN.R-project.org/package=parSim
+
+
+ simitation: Simplified
+Simulations
+ Shilane
+ 2023
+ Shilane, D., Budugutta, S., &
+Bansal, M. (2023). simitation: Simplified Simulations.
+https://CRAN.R-project.org/package=simitation
+
+
+ tidyMC: Monte Carlo Simulations Made Easy and
+Tidy
+ Linner
+ 2022
+ Linner, S., Moreira Lara, I., &
+Lehmann, K. (2022). tidyMC: Monte Carlo Simulations Made Easy and Tidy.
+https://github.com/stefanlinner/tidyMC
+
+
+ simmer: Discrete-event simulation for
+R
+ Ucar
+ Journal of Statistical
+Software
+ 2
+ 90
+ 10.18637/jss.v090.i02
+ 2019
+ Ucar, I., Smeets, B., & Azcorra,
+A. (2019). simmer: Discrete-event simulation for R. Journal of
+Statistical Software, 90(2), 1–30.
+https://doi.org/10.18637/jss.v090.i02
+
+
+ MonteCarloSEM: An R Package to Simulate Data
+for SEM
+ Orcan
+ International Journal of Assessment Tools in
+Education
+ 3
+ 8
+ 10.21449/ijate.804203
+ 2021
+ Orcan, F. (2021). MonteCarloSEM: An R
+Package to Simulate Data for SEM. International Journal of Assessment
+Tools in Education, 8(3), 704–713.
+https://doi.org/10.21449/ijate.804203
+
+
+ simMetric: Metrics (with Uncertainty) for
+Simulation Studies that Evaluate Statistical Methods
+ Parsons
+ 10.25912/RDF_1665114451679
+ 2022
+ Parsons, R. (2022). simMetric:
+Metrics (with Uncertainty) for Simulation Studies that Evaluate
+Statistical Methods. Queensland University of Technology.
+https://doi.org/10.25912/RDF_1665114451679
+
+
+ The Simulator: An Engine to Streamline
+Simulations
+ Bien
+ 10.48550/arXiv.1607.00021
+ 2016
+ Bien, J. (2016). The Simulator: An
+Engine to Streamline Simulations.
+https://doi.org/10.48550/arXiv.1607.00021
+
+
+ infer: An R package for tidyverse-friendly
+statistical inference
+ Couch
+ Journal of Open Source
+Software
+ 65
+ 6
+ 10.21105/joss.03661
+ 2021
+ Couch, S. P., Bray, A. P., Ismay, C.,
+Chasnovski, E., Baumer, B. S., & Çetinkaya-Rundel, M. (2021). infer:
+An R package for tidyverse-friendly statistical inference. Journal of
+Open Source Software, 6(65), 3661.
+https://doi.org/10.21105/joss.03661
+
+
+ Parallel and Other Simulations in R Made
+Easy: An End-to-End Study
+ Hofert
+ Journal of Statistical
+Software
+ 4
+ 69
+ 10.18637/jss.v069.i04
+ 2016
+ Hofert, M., & Mächler, M. (2016).
+Parallel and Other Simulations in R Made Easy: An End-to-End Study.
+Journal of Statistical Software, 69(4), 1–44.
+https://doi.org/10.18637/jss.v069.i04
+
+
+ Designing a data science simulation with
+MERITS: A primer
+ Elliott
+ 10.48550/arXiv.2403.08971
+ 2024
+ Elliott, C. F., Duncan, J., Tang, T.
+M., Behr, M., Kumbier, K., & Yu, B. (2024). Designing a data science
+simulation with MERITS: A primer.
+https://doi.org/10.48550/arXiv.2403.08971
+
+
+
+
+
+
diff --git a/joss.06156/10.21105.joss.06156.jats b/joss.06156/10.21105.joss.06156.jats
new file mode 100644
index 0000000000..8753f32231
--- /dev/null
+++ b/joss.06156/10.21105.joss.06156.jats
@@ -0,0 +1,935 @@
+
+
+
+
+
+
+
+Journal of Open Source Software
+JOSS
+
+2475-9066
+
+Open Journals
+
+
+
+6156
+10.21105/joss.06156
+
+simChef: High-quality data science
+simulations in R
+
+
+
+https://orcid.org/0000-0003-3297-681X
+
+Duncan
+James
+
+
+
+
+https://orcid.org/0000-0002-8079-6867
+
+Tang
+Tiffany
+
+
+*
+
+
+https://orcid.org/0000-0001-7935-9945
+
+Elliott
+Corrine F.
+
+
+
+
+https://orcid.org/0000-0002-4850-2507
+
+Boileau
+Philippe
+
+
+
+
+https://orcid.org/0000-0002-8888-4060
+
+Yu
+Bin
+
+
+
+
+
+
+
+
+Graduate Group in Biostatistics, University of California,
+Berkeley, United States of America
+
+
+
+
+Department of Statistics, University of California,
+Berkeley, United States of America
+
+
+
+
+Department of Electrical Engineering and Computer Sciences,
+University of California, Berkeley, United States of
+America
+
+
+
+
+Center for Computational Biology, University of California,
+Berkeley, United States of America
+
+
+
+
+* E-mail:
+
+
+28
+6
+2023
+
+9
+95
+6156
+
+Authors of papers retain copyright and release the
+work under a Creative Commons Attribution 4.0 International License (CC
+BY 4.0)
+2022
+The article authors
+
+Authors of papers retain copyright and release the work under
+a Creative Commons Attribution 4.0 International License (CC BY
+4.0)
+
+
+
+simulations
+data science
+R
+
+
+
+
+
+ Summary
+
simChef is an R
+ package that empowers data science practitioners to rapidly plan,
+ carry out, and summarize statistical simulation studies in a flexible,
+ efficient, and low-code manner. Drawing substantially from the
+ Predictability, Computability, and Stability (PCS) framework
+ (Yu
+ & Kumbier, 2020), simChef emphasizes
+ the scientific best practices encompassed by PCS by removing many of
+ the administrative burdens of simulation design through: (1) an
+ intuitive
+ tidy
+ grammar of data science simulations; (2) powerful
+ abstractions for distributed simulation processing backed by
+ future
+ (Bengtsson,
+ 2021); and (3) automated generation of interactive
+ R
+ Markdown simulation documentation, situating results next
+ to the workflows needed to reproduce them. Taken together,
+ simChef’s capabilities overcome many of the
+ design, computational, and reproducibility hurdles inherent in nearly
+ every data science simulation study.
+
+
+ Statement of need
+
Data science simulation studies occupy an important role in
+ scientific research as a means to gain insight into new and existing
+ statistical methods. Simulations serve as statistical sandboxes that
+ open a path toward otherwise inaccessible discoveries. For example,
+ they can be used to establish comprehensive benchmarks of existing
+ procedures for a common task; to demonstrate the strengths and
+ weaknesses of novel methodology applied to synthetic and real-world
+ data; or to probe the validity of a theoretical analysis.
+
Creating high-quality simulation studies typically involves a
+ number of repetitive and error-prone coding tasks: implementing
+ data-generating processes (DGPs) and statistical methods; sampling
+ from these DGPs; parallelizing computation of simulation replicates;
+ summarizing metrics; visualizing, documenting, presenting, and saving
+ results; and so on. While this administrative overhead is necessary,
+ it is not sufficient for scientific understanding. Data scientists
+ must navigate a number of important judgment calls such as the choice
+ of DGPs, baseline statistical methods, associated parameters, and
+ evaluation metrics for scientific relevancy.
+
While the scientific context may vary drastically from one study to
+ the next, the simulation scaffolding remains largely similar. Yet
+ simulation code repositories often lack reusability, both for novel
+ settings and when new questions arise in the original context.
+ simChef addresses the need for an intuitive,
+ extensible, and reusable framework for data science simulations,
+ allowing data science practitioners to focus their energies on
+ scientific questions by reducing the burdens of parameterization,
+ parallelization, and documentation.
+
+
+ Core abstractions of data science simulations
+
At its core, simChef breaks down a
+ simulation experiment into four modular components
+ ([fig:api]), each
+ implemented as an R6 class
+ (Chang,
+ 2022):
+
+
+
DGP: the data-generating processes from
+ which to generate data
+
+
+
Method: the methods (or models) to
+ fit in the experiment
+
+
+
Evaluator: the evaluation metrics used
+ to evaluate the methods’ performance
+
+
+
Visualizer: the visualization functions
+ used to visualize outputs from the method fits or
+ evaluation results (can be tables, plots, or even
+ R Markdown snippets to display)
+
+
+
+
Overview of the four core components in a
+ simChef Experiment.
+ simChef provides four classes that implement
+ distinct simulation objects in an intuitive and modular manner:
+ DGP, Method,
+ Evaluator, and
+ Visualizer. Using these classes, users can
+ easily build a simChef
+ Experiment using reusable, customizable
+ functions (i.e., dgp_fun,
+ method_fun, eval_fun,
+ and viz_fun). Optional named parameters can
+ be set in these custom functions via the ...
+ arguments in the create_*() methods.
+
+
+
+
Using these classes, users can create or reuse custom functions
+ (i.e., dgp_fun,
+ method_fun, eval_fun,
+ and viz_fun in
+ [fig:api]) aligned with
+ their scientific goals. The custom functions then can be parameterized
+ and encapsulated in one of the corresponding classes via a
+ create_* method, together with optional named
+ parameters (see
+ [fig:api]).
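
As a rough illustration of what such custom functions might look like, the sketch below defines plain base-R functions for a toy linear-regression setting. The function bodies, the argument names (n, p, noise_sd, coef_hat, coef_true), and the return shapes are assumptions made for illustration only; they do not reproduce the inputs and outputs that simChef itself passes between components, and a viz_fun is omitted for brevity.

```r
# Hypothetical data-generating function: Gaussian linear model.
# The arguments n, p, and noise_sd are illustrative assumptions.
dgp_fun <- function(n = 100, p = 2, noise_sd = 1) {
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  beta <- rep(1, p)
  y <- as.vector(X %*% beta) + rnorm(n, sd = noise_sd)
  list(X = X, y = y)
}

# Hypothetical method function: ordinary least squares via lm().
method_fun <- function(X, y) {
  fit <- lm(y ~ X)
  list(coef = unname(coef(fit))[-1])  # drop the intercept
}

# Hypothetical evaluation function: mean squared error of the estimates.
eval_fun <- function(coef_hat, coef_true) {
  mean((coef_hat - coef_true)^2)
}

# One pass through the pipeline, outside of any simulation framework.
set.seed(1)
data <- dgp_fun(n = 200, p = 2)
fit <- method_fun(data$X, data$y)
mse <- eval_fun(fit$coef, coef_true = c(1, 1))
```

Keeping each function free of simulation bookkeeping is what makes it reusable: the same dgp_fun can be called directly, as above, or handed to a framework that takes care of parameter variation and replication.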
+
A fifth R6 class,
+ Experiment, unites the four components above
+ and serves as a concrete implementation of the user’s intent to answer
+ a specific scientific question. Specifically, the
+ Experiment stores references to the
+ DGP(s), Method(s),
+ Evaluator(s), and
+ Visualizer(s) along with the
+ DGP and Method
+ parameters that should be varied and combined during the simulation
+ run.
+
+
Overview of running a simChef
+ Experiment. The
+ Experiment class handles relationships among
+ the four classes portrayed in
+ [fig:api].
+ Experiments may have multiple DGPs and
+ Methods, which are combined across the
+ Cartesian product of their varying parameters (represented by
+ \*). Once computed, each
+ Evaluator and
+ Visualizer takes in the fitted simulation
+ replicates, while Visualizer additionally
+ receives evaluation summaries.
+
+
+
+
+
+ A powerful grammar of data science simulations
+
Inspired by the tidyverse
+ (Wickham
+ et al., 2019), simChef develops an
+ intuitive grammar for running simulation studies using the
+ aforementioned R6 classes. We provide an
+ illustrative example usage next.
In the example usage, DGP(s),
+ Method(s), Evaluator(s),
+ and Visualizer(s) are first created via
+ create_*(). These simulation objects can then
+ be combined into an Experiment using either
+ create_experiment() and/or
+ add_*().
+
In an Experiment,
+ DGP(s) and Method(s) can
+ also be varied across one or multiple parameters via
+ add_vary_across(). For instance, in the example
+ Experiment, there are two
+ DGP instances, both of which are varied across
+ three values of n and one of which is
+ additionally varied across two values of
+ sparse. This effectively results in nine
+ distinct configurations for data generation (i.e., 3 variations on
+ dgp1 + 3x2 variations on
+ dgp2). For the single
+ Method in the experiment, we use three values
+ of scalar_valued_param, two of
+ vector_valued_param, and another two of
+ list_valued_param, giving 12 distinct
+ configurations. Hence, there are a total of 9x12 = 108
+ DGP-method-parameter combinations in the
+ Experiment.
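
The counting argument above can be checked with a short base-R sketch. The specific values of n and the integer stand-ins for the method parameters are placeholders (the text only states how many values each parameter takes), and n_reps = 100 is taken from the replicate count mentioned for the example.

```r
# Varying parameters for each DGP (the values of n are placeholders).
dgp1_grid <- expand.grid(n = c(100, 1000, 10000))
dgp2_grid <- expand.grid(n = c(100, 1000, 10000), sparse = c(FALSE, TRUE))

# Varying parameters for the single method; the integer indices stand in
# for the actual scalar, vector, and list values.
method_grid <- expand.grid(
  scalar_valued_param = 1:3,
  vector_valued_param = 1:2,
  list_valued_param = 1:2
)

n_dgp_configs <- nrow(dgp1_grid) + nrow(dgp2_grid)  # 3 + 3 * 2
n_method_configs <- nrow(method_grid)               # 3 * 2 * 2
n_total <- n_dgp_configs * n_method_configs         # 9 * 12

# With 100 replicates per combination, the total number of method fits:
n_reps <- 100
n_fits <- n_total * n_reps

cat(n_dgp_configs, n_method_configs, n_total, n_fits, "\n")
# prints: 9 12 108 10800
```

Note that DGP configurations add across DGPs (each DGP has its own grid), whereas DGP and method configurations multiply.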
+
Thus far, we have simply instantiated an
+ Experiment object (akin to creating a recipe
+ for an experiment). To compute and run the simulation experiment, we
+ next call run_experiment with the desired
+ number of replicates. As summarized in
+ [fig:run-exper],
+ running the experiment will (1) fit each
+ Method on each DGP (and
+ for each of the varying parameter configurations), (2)
+ evaluate the experiment according to the given
+ Evaluator(s), and (3)
+ visualize the experiment according to the given
+ Visualizer(s). Furthermore, the number of
+ replicates per combination of DGP,
+ Method, and parameters specified via
+ add_vary_across is determined by the
+ n_reps argument to
+ run_experiment. Because replication happens at
+ the per-combination level, the effective total number of replicates in
+ the Experiment depends on the number of DGPs,
+ methods, and varied parameters. In the given example, there are 108
+ DGP-method-parameter combinations, each of which is replicated 100
+ times. To reduce the computational burden, the
+ Experiment class flexibly handles the
+ computation of simulation replicates in parallel using the
+ future package
+ (Bengtsson,
+ 2021).
+ [fig:exper-schematic]
+ provides a detailed schematic of the
+ run_experiment workflow, along with the
+ expected inputs to and outputs from user-defined functions.
+
+
Detailed schematic of the
+ run_experiment workflow using
+ simChef. Expected inputs to and outputs from
+ user-defined functions are also
+ provided.
+
+
+
+
+ Additional Features
+
In addition to the ease of parallelization,
+ simChef enables caching of results to further
+ alleviate the computational burden. Here, users can choose to save the
+ experiment’s results to disk by passing
+ save = TRUE to
+ run_experiment. Once saved, the user can add
+ new DGP and Method
+ objects to the experiment and compute additional replicates without
+ re-computing existing results via the
+ use_cached option. Considering the example
+ above, when we add new_method and call
+ run_experiment with
+ use_cached = TRUE,
+ simChef finds that the cached results are
+ missing combinations of new_method, existing
+ DGPs, and their associated parameters, giving nine new configurations.
+ Replicates for the new combinations are then appended to the cached
+ results.
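
The bookkeeping behind this kind of cache lookup can be sketched in base R as a set difference over configuration keys. This is not simChef's caching implementation; the data-frame layout, the config_key helper, and the simplification of collapsing the existing method's varied parameters into a single "method1" entry are all assumptions made for illustration.

```r
# Hypothetical DGP configurations from the example: 3 for dgp1, 6 for dgp2.
dgp_configs <- rbind(
  data.frame(dgp = "dgp1", n = c(100, 1000, 10000), sparse = NA,
             stringsAsFactors = FALSE),
  expand.grid(dgp = "dgp2", n = c(100, 1000, 10000),
              sparse = c(FALSE, TRUE), stringsAsFactors = FALSE)
)

# Helper (hypothetical): one string key per configuration row.
config_key <- function(df) do.call(paste, c(df, sep = "|"))

# Combinations already on disk (simplified: method parameters are ignored).
cached <- cbind(method = "method1", dgp_configs)

# Combinations required after adding new_method to the experiment.
required <- rbind(cached, cbind(method = "new_method", dgp_configs))

# Only combinations absent from the cache need to be computed.
missing <- required[!config_key(required) %in% config_key(cached), ]
n_missing <- nrow(missing)
```

Here n_missing equals the nine DGP configurations paired with new_method, matching the count given in the text.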
+
simChef also provides users with a
+ convenient API to automatically generate an R
+ Markdown document. This documentation gathers the scientific details,
+ summary tables, and visualizations side-by-side with the user’s custom
+ source code and parameters for data-generating processes, statistical
+ methods, evaluation metrics, and plots. A call to
+ init_docs generates empty markdown files for
+ the user to populate with their overarching simulation objectives and
+ with descriptions of each of the DGP,
+ Method, Evaluator, and
+ Visualizer objects included in the
+ Experiment. Finally, a call to
+ render_docs prepares the
+ R Markdown document, either for iterative
+ design and analysis of the simulation or to provide a high-quality
+ overview that can be shared easily. We provide an example of the
+ simulation documentation
+ here.
+ Corresponding R source code is available on
+ GitHub.
+
+
+ Related R packages
+
A number of existing R packages and projects
+ address needs related to simChef’s
+ functionality. At a higher level of abstraction, the
+ batchtools package
+ (Lang
+ et al., 2017) includes concepts for “problems”, “algorithms”,
+ and “experiments”, similar to simChef’s
+ DGP, Method, and
+ Experiment objects, respectively, but less
+ tailored to the specific needs of data science simulation experiments.
+ Additionally, batchtools provides a number of
+ utilities for shared-memory and distributed-memory computations,
+ including for interacting with high-performance computing cluster
+ schedulers such as Slurm and Torque. simChef is
+ able to leverage these utilities for distributed computations via the
+ backends provided by the future.batchtools
+ package, which is part of the future ecosystem
+ of R packages
+ (Bengtsson,
+ 2021). Whereas batchtools is a general
+ tool for distributed mapping operations,
+ simChef specializes in data science simulations
+ and provides additional functionality tailored to that setting
+ including its tidy grammar of simulation
+ experiments, the Evaluator and
+ Visualizer concepts, and automated
+ documentation capabilities discussed above.
+
Like simChef, many existing packages
+ specifically aim to simplify the process of creating simulation
+ experiments by reducing coding burden through helpful abstractions,
+ distributed computing helpers, and preset methods for generating,
+ computing, and summarizing simulation replicates. Of particular note
+ are the following:
+
+
+
SimDesign
+ (Chalmers & Adkins,
+ 2020) focuses on Monte Carlo simulation experiments and
+ provides a function runSimulation that
+ accepts user-defined generate,
+ analyse, and
+ summarise functions, with support for
+ distributed computation via the parallel
+ base R package and
+ future.
+
+
+
simulator
+ (Bien,
+ 2016) provides a tidy grammar of
+ simulation experiments and highly modular helpers for evaluating
+ and managing simulation outputs, relying on the
+ parallel package for distributed
+ computation.
+
+
+
simpr
+ (Brown,
+ 2023) defines a tidy simulation
+ framework for generating data, fitting models, varying parameters,
+ and aggregating simulation results with user-defined and
+ purrr-style functions. In addition, it
+ supports distributed computations backed by the
+ future framework.
+
+
+
SimEngine
+ (Kenny
+ & Wolock, 2024) defines and executes simulation
+ ‘levels’ (parameters to vary) and ‘scripts’ (functions to execute
+ a single simulation replicate). It manages the definition and
+ execution of simulations and calculates summary statistics, with
+ support for distributed computations in coordination with
+ high-performance computing cluster schedulers.
+
+
+
A third category of related packages consists of those that share
+ conceptual similarities with simChef in terms of
+ providing helpful abstractions for the design and analysis of
+ simulation experiments, but at a finer level of detail than
+ simChef intends. For example, the package
+ DeclareDesign
+ (Blair
+ et al., 2019) provides various declare_*
+ functions for defining and evaluating statistical research questions,
+ with an emphasis on the social sciences. The package
+ infer
+ (Couch
+ et al., 2021) provides a tidy API for
+ statistical inference, providing the ability to specify random
+ variables and their relationships, define a null hypothesis, generate
+ data under that hypothesis, and calculate distributions of statistics
+ based on that hypothesis. Both of these packages and many of the
+ packages below could be employed in a user’s
+ DGP, Method,
+ Evaluator, or Visualizer
+ and deployed via an Experiment to carry out a
+ large-scale simulation with automated documentation in harmony with
+ simChef.
+
Finally, many packages provide a small number of well-tailored
+ helper functions for specific data-generating processes and simulation
+ settings, with or without distributed computation. In no particular
+ order these include: simitation
+ (Shilane
+ et al., 2023), simhelpers
+ (Joshi
+ & Pustejovsky, 2024), simTool
+ (Scheer,
+ 2020), parSim
+ (Epskamp,
+ 2023), rsimsum
+ (Gasparini,
+ 2018), simsalapar
+ (Hofert
+ & Mächler, 2016), tidyMC
+ (Linner
+ et al., 2022), MonteCarloSEM
+ (Orcan,
+ 2021), simMetric
+ (Parsons,
+ 2022), and simmer
+ (Ucar
+ et al., 2019). To our knowledge, no single existing package
+ includes simChef’s combination of conceptual
+ modularity, tidy grammar, computational
+ flexibility, simulation workflow management, and automated
+ documentation.
+
+
+ Discussion
+
While simChef’s core functionality focuses
+ on computability (C) – encompassing efficient usage of computational
+ resources, ease of user interaction, reproducibility, and
+ documentation – we emphasize the importance of predictability (P) and
+ stability (S) in data science simulations (see
+ Elliott
+ et al., 2024, for an in-depth discussion). The principal goal
+ of simChef is to provide a tool for data
+ scientists to create simulations that incorporate predictability
+ (through fit to real-world data) and stability (through sufficient
+ exploration of uncertainty). In future work, we
+ intend to provide tools that can be flexibly tailored to a user’s
+ particular scientific needs and further these goals through automated
+ predictability and stability summaries and documentation.
+
+
+ Acknowledgements
+
The authors gratefully acknowledge partial support from (a) the NSF
+ under awards DMS-2209975, 1613002, 1953191, 2015341, and IIS 1741340;
+ and grant 2023505 supporting the Foundations of Data Science Institute
+ (FODSI); (b) the Weill Neurohub; and (c) the Chan Zuckerberg Biohub
+ under an Intercampus Research Award. TMT acknowledges support from the
+ NSF Graduate Research Fellowship Program DGE-2146752.
+
+
+
+
+
+
+
+ YuBin
+ KumbierKarl
+
+ Veridical data science
+
+ 202002
+ 20210904
+ 117
+ 8
+ 0027-8424
+ http://www.pnas.org/lookup/doi/10.1073/pnas.1901326117
+ 10.1073/pnas.1901326117
+ 3920
+ 3929
+
+
+
+
+
+ LangMichel
+ BischlBernd
+ SurmannDirk
+
+ batchtools: Tools for R to work on batch systems
+
+ 201702
+ 20230420
+ 2
+ 10
+ 2475-9066
+ https://joss.theoj.org/papers/10.21105/joss.00135
+ 10.21105/joss.00135
+ 135
+
+
+
+
+
+
+ WickhamHadley
+ AverickMara
+ BryanJennifer
+ ChangWinston
+ McGowanLucy D’Agostino
+ FrançoisRomain
+ GrolemundGarrett
+ HayesAlex
+ HenryLionel
+ HesterJim
+ KuhnMax
+ PedersenThomas Lin
+ MillerEvan
+ BacheStephan Milton
+ MüllerKirill
+ OomsJeroen
+ RobinsonDavid
+ SeidelDana Paige
+ SpinuVitalie
+ TakahashiKohske
+ VaughanDavis
+ WilkeClaus
+ WooKara
+ YutaniHiroaki
+
+ Welcome to the Tidyverse
+
+ 201911
+ 20230420
+ 4
+ 43
+ 2475-9066
+ https://joss.theoj.org/papers/10.21105/joss.01686
+ 10.21105/joss.01686
+ 1686
+
+
+
+
+
+
+ BengtssonHenrik
+
+ A Unifying Framework for Parallel and Distributed Processing in R using Futures
+
+ 2021
+ 20230420
+ 13
+ 2
+ 2073-4859
+ https://journal.r-project.org/archive/2021/RJ-2021-048/index.html
+ 10.32614/RJ-2021-048
+ 208
+
+
+
+
+
+
+ ChangWinston
+
+
+ 2022
+ https://r6.r-lib.org
+
+
+
+
+
+ ChalmersR. Philip
+ AdkinsMark C.
+
+ Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package
+
+ TQMP
+ 2020
+ 16
+ 4
+ http://www.tqmp.org/RegularArticles/vol16-4/p248/p248.pdf
+ 10.20982/tqmp.16.4.p248
+ 248
+ 280
+
+
+
+
+
+ KennyAvi
+ WolockCharles J.
+
+ SimEngine: A Modular Framework for Statistical Simulations in R
+ 2024
+ https://doi.org/10.48550/arXiv.2403.05698
+ 10.48550/arXiv.2403.05698
+
+
+
+
+
+ BrownEthan
+
+
+ 2023
+ https://statisfactions.github.io/simpr/
+
+
+
+
+
+ GaspariniAlessandro
+
+ rsimsum: Summarise results from Monte Carlo simulation studies
+
+ The Open Journal
+ 2018
+ 3
+ 26
+ https://joss.theoj.org/papers/10.21105/joss.00739
+ 10.21105/joss.00739
+ 739
+
+
+
+
+
+
+ BlairGraeme
+ CooperJasper
+ CoppockAlexander
+ HumphreysMacartan
+
+ Declaring and Diagnosing Research Designs
+
+ 2019
+ 113
+ 3
+ https://doi.org/10.1017/S0003055419000194
+ 10.1017/S0003055419000194
+ 838
+ 859
+
+
+
+
+
+ JoshiMegha
+ PustejovskyJames
+
+
+ 2024
+ https://meghapsimatrix.github.io/simhelpers/index.html
+
+
+
+
+
+ ScheerMarsel
+
+
+ 2020
+ https://CRAN.R-project.org/package=simTool
+
+
+
+
+
+ EpskampSacha
+
+
+ 2023
+ https://CRAN.R-project.org/package=parSim
+
+
+
+
+
+ ShilaneDavid
+ BuduguttaSrivastav
+ BansalMayur
+
+
+ 2023
+ https://CRAN.R-project.org/package=simitation
+
+
+
+
+
+ LinnerStefan
+ Moreira LaraIgnacio
+ LehmannKonstantin
+
+
+ 2022
+ https://github.com/stefanlinner/tidyMC
+
+
+
+
+
+ UcarIñaki
+ SmeetsBart
+ AzcorraArturo
+
+ simmer: Discrete-event simulation for R
+
+ 2019
+ 90
+ 2
+ https://doi.org/10.18637/jss.v090.i02
+ 10.18637/jss.v090.i02
+ 1
+ 30
+
+
+
+
+
+ OrcanFatih
+
+ MonteCarloSEM: An R Package to Simulate Data for SEM
+
+ 2021
+ 8
+ 3
+ https://dergipark.org.tr/en/download/article-file/1323860
+ 10.21449/ijate.804203
+ 704
+ 713
+
+
+
+
+
+ ParsonsRex
+
+ simMetric: Metrics (with Uncertainty) for Simulation Studies that Evaluate Statistical Methods
+ Queensland University of Technology
+ 2022
+ https://doi.org/10.25912/RDF_1665114451679
+ 10.25912/RDF_1665114451679
+
+
+
+
+
+ BienJacob
+
+ The Simulator: An Engine to Streamline Simulations
+ 2016
+ https://doi.org/10.48550/arXiv.1607.00021
+ 10.48550/arXiv.1607.00021
+
+
+
+
+
+ CouchSimon P.
+ BrayAndrew P.
+ IsmayChester
+ ChasnovskiEvgeni
+ BaumerBenjamin S.
+ Çetinkaya-RundelMine
+
+ infer: An R package for tidyverse-friendly statistical inference
+
+ 2021
+ 6
+ 65
+ https://joss.theoj.org/papers/10.21105/joss.03661
+ 10.21105/joss.03661
+ 3661
+
+
+
+
+
+
+ HofertMarius
+ MächlerMartin
+
+ Parallel and Other Simulations in R Made Easy: An End-to-End Study
+
+ 2016
+ 69
+ 4
+ https://doi.org/10.18637/jss.v069.i04
+ 10.18637/jss.v069.i04
+ 1
+ 44
+
+
+
+
+
+ ElliottCorrine F
+ DuncanJames
+ TangTiffany M
+ BehrMerle
+ KumbierKarl
+ YuBin
+
+ Designing a data science simulation with MERITS: A primer
+ 2024
+ https://arxiv.org/abs/2403.08971
+ 10.48550/arXiv.2403.08971
+
+
+
+
+
diff --git a/joss.06156/10.21105.joss.06156.pdf b/joss.06156/10.21105.joss.06156.pdf
new file mode 100644
index 0000000000..a9c2b2c892
Binary files /dev/null and b/joss.06156/10.21105.joss.06156.pdf differ
diff --git a/joss.06156/media/api_overview.png b/joss.06156/media/api_overview.png
new file mode 100644
index 0000000000..d139dd593c
Binary files /dev/null and b/joss.06156/media/api_overview.png differ
diff --git a/joss.06156/media/fit_eval_viz.png b/joss.06156/media/fit_eval_viz.png
new file mode 100644
index 0000000000..1f471077ce
Binary files /dev/null and b/joss.06156/media/fit_eval_viz.png differ
diff --git a/joss.06156/media/run_experiment.png b/joss.06156/media/run_experiment.png
new file mode 100644
index 0000000000..bbf0aa60a5
Binary files /dev/null and b/joss.06156/media/run_experiment.png differ