diff --git a/joss.07410/10.21105.joss.07410.crossref.xml b/joss.07410/10.21105.joss.07410.crossref.xml new file mode 100644 index 0000000000..9666196e67 --- /dev/null +++ b/joss.07410/10.21105.joss.07410.crossref.xml @@ -0,0 +1,161 @@ + + + + 20241125165450-f57d540279810a5dcb19535c9ed41f4a3b02eb83 + 20241125165450 + + JOSS Admin + admin@theoj.org + + The Open Journal + + + + + Journal of Open Source Software + JOSS + 2475-9066 + + 10.21105/joss + https://joss.theoj.org + + + + + 11 + 2024 + + + 9 + + 103 + + + + Snk: A Snakemake CLI and Workflow Management +System + + + + Wytamma + Wirth + + Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia + + https://orcid.org/0000-0001-7070-0078 + + + Simon + Mutch + + Melbourne Data Analytics Platform, University of Melbourne, Melbourne 3010, Australia + + https://orcid.org/0000-0002-3166-4614 + + + Robert + Turnbull + + Melbourne Data Analytics Platform, University of Melbourne, Melbourne 3010, Australia + + https://orcid.org/0000-0003-1274-6750 + + + + 11 + 25 + 2024 + + + 7410 + + + 10.21105/joss.07410 + + + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + + + + Software archive + 10.5281/zenodo.14214901 + + + GitHub review issue + https://github.com/openjournals/joss-reviews/issues/7410 + + + + 10.21105/joss.07410 + https://joss.theoj.org/papers/10.21105/joss.07410 + + + https://joss.theoj.org/papers/10.21105/joss.07410.pdf + + + + + + Pango lineage designation and assignment +using SARS-CoV-2 spike gene nucleotide sequences + O’Toole + BMC Genomics + 1 + 23 + 10.1186/s12864-022-08358-2 + 1471-2164 + 2022 + O’Toole, Á., Pybus, O. G., Abram, M. +E., Kelly, E. J., & Rambaut, A. (2022). Pango lineage designation +and assignment using SARS-CoV-2 spike gene nucleotide sequences. BMC +Genomics, 23(1), 121. 
+https://doi.org/10.1186/s12864-022-08358-2 + + + Ten simple rules and a template for creating +workflows-as-applications + Roach + PLOS Computational Biology + 12 + 18 + 10.1371/journal.pcbi.1010705 + 1553-7358 + 2022 + Roach, M. J., Pierce-Ward, N. T., +Suchecki, R., Mallawaarachchi, V., Papudeshi, B., Handley, S. A., Brown, +C. T., Watson-Haigh, N. S., & Edwards, R. A. (2022). Ten simple +rules and a template for creating workflows-as-applications. PLOS +Computational Biology, 18(12), e1010705. +https://doi.org/10.1371/journal.pcbi.1010705 + + + Sustainable data analysis with snakemake +[version 2; peer review: 2 approved] + Mölder + F1000Research + 33 + 10 + 10.12688/f1000research.29032.2 + 2021 + Mölder, F., Jablonski, K. Ph., +Letcher, B., Hall, M. B., Tomkins-Tinch, Christopher H, Sochat, V., +Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, +M., Rahmann, S., Nahnsen, S., & Köster, J. (2021). Sustainable data +analysis with snakemake [version 2; peer review: 2 approved]. +F1000Research, 10(33). +https://doi.org/10.12688/f1000research.29032.2 + + + + + + diff --git a/joss.07410/10.21105.joss.07410.pdf b/joss.07410/10.21105.joss.07410.pdf new file mode 100644 index 0000000000..d8d4b85a3d Binary files /dev/null and b/joss.07410/10.21105.joss.07410.pdf differ diff --git a/joss.07410/paper.jats/10.21105.joss.07410.jats b/joss.07410/paper.jats/10.21105.joss.07410.jats new file mode 100644 index 0000000000..3732e072fc --- /dev/null +++ b/joss.07410/paper.jats/10.21105.joss.07410.jats @@ -0,0 +1,320 @@ + + +
+ + + + +Journal of Open Source Software +JOSS + +2475-9066 + +Open Journals + + + +7410 +10.21105/joss.07410 + +Snk: A Snakemake CLI and Workflow Management +System + + + +https://orcid.org/0000-0001-7070-0078 + +Wirth +Wytamma + + + + +https://orcid.org/0000-0002-3166-4614 + +Mutch +Simon + + + + +https://orcid.org/0000-0003-1274-6750 + +Turnbull +Robert + + + + + +Peter Doherty Institute for Infection and Immunity, +University of Melbourne, Australia + + + + +Melbourne Data Analytics Platform, University of Melbourne, +Melbourne 3010, Australia + + + + +25 +9 +2024 + +9 +103 +7410 + +Authors of papers retain copyright and release the +work under a Creative Commons Attribution 4.0 International License (CC +BY 4.0) +2024 +The article authors + +Authors of papers retain copyright and release the work under +a Creative Commons Attribution 4.0 International License (CC BY +4.0) + + + +Python +Snakemake +Workflow +Bioinformatics +Reproducibility +Command Line Interface + + + + + + Summary +

Snk (pronounced “snek”) is a workflow management tool designed to + simplify the use of Snakemake workflows by dynamically generating + command-line interfaces (CLIs). Snk allows complex Snakemake workflows + to be used as modular components in larger systems that can be + executed and managed from the command line with minimal overhead. This + enables researchers and developers to integrate and manage + sophisticated workflows seamlessly. Snk significantly improves the + interoperability and accessibility of Snakemake workflows, making it + easier to use and share computational pipelines in various research + fields.

+
+ + Statement of need +

The integration of bioinformatic analyses into comprehensive + pipelines (aka workflows) has revolutionised the field by improving + the robustness and reproducibility of analyses. One of the most + popular workflow frameworks is Snakemake + (Mölder + et al., 2021). Snakemake is a user-friendly and adaptable + make-style workflow framework with a powerful specification language + built atop the Python programming language. Despite its success, + Snakemake workflows are often developed for specific research analyses + rather than as general-purpose reusable tools. That is, Snakemake + workflows are typically built for the reproducibility of a single + analysis but not necessarily built for flexibility.

+

To improve their utility, Snakemake workflow developers often + encapsulate workflows within CLI tools by producing wrapper code to + abstract the workflow execution, sometimes called + workflows-as-applications or workflow packages + (Roach + et al., 2022). These wrappers serve as intermediaries between + the end-user (via the CLI) and the workflow execution, enabling + developers to tailor the Snakemake experience to specific use cases. + For example, the pangolin CLI tool wraps a Snakemake workflow for + SARS-CoV-2 lineage assignment + (O’Toole + et al., 2022). Initiatives like Snaketool have simplified the + development of Snakemake-based CLIs by offering a template for + developers + (Roach + et al., 2022). Nonetheless, the onus remains on the developer + to create and maintain the CLI wrappers for their workflow.

+

Here we present Snk, a Snakemake workflow management system that + allows users to install Snakemake workflows as dynamically generated + CLIs. Thus users can create a CLI for their (or others’) Snakemake + workflows with minimal to no code changes required. The Snk-generated + CLIs follow best practices and include several features out of the box + that improve user experience. The CLIs can be configured at install + time or via a snk.yaml configuration file. Snk + is readily available for installation via PyPI and Conda, using the + commands pip install snk and + conda install snk, respectively.

+

Snk has two distinct major functions: managing the installation of + workflows, and dynamically generating CLIs from Snakemake configuration + files. To install a workflow as a CLI, users can specify the file + path, URL, or GitHub name (username/repo) of a workflow. Snk copies + (clones) workflows into a managed directory structure, creates a CLI + entry point, and optionally creates an isolated virtual environment + for each workflow. Workflows can be installed from specific commits, + tags, or branches, ensuring reproducibility. Snk allows + users to utilise the Snakemake workflow catalog + (https://snakemake.github.io/snakemake-workflow-catalog) + as a searchable package index of Snk-installable Snakemake tools. The + snk install command is flexible and can be used to install diverse + workflows using installation options. For example, the + dna-seq-gatk-variant-calling + workflow (release tag v2.1.1) can be installed as a CLI + named variant-calling with Snakemake v8.10.8 + and Pandas and NumPy dependencies using the following command:

+ snk install \ + snakemake-workflows/dna-seq-gatk-variant-calling \ + --name variant-calling \ + --snakemake 8.10.8 \ + -d pandas==1.5.3 \ + -d numpy==1.26.4 \ + -t v2.1.1 +

The workflow will then be accessible via the + variant-calling CLI in the terminal (Figure 1). + Additionally, the snk command can be used to + list and uninstall workflows installed with Snk. The complete + documentation for managing workflows can be found at + https://snk.wytamma.com/managing_workflows.
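For scripted or repeated installs, the install command shown above can be assembled programmatically. The following is a minimal sketch, not part of Snk itself: the helper name `build_snk_install_args` and its parameters are hypothetical, and it uses only the flags that appear in the command above (`--name`, `--snakemake`, `-d`, `-t`). It builds the argument list without running anything.

```python
import shlex
from typing import List, Optional, Sequence


def build_snk_install_args(
    workflow: str,
    name: Optional[str] = None,
    snakemake: Optional[str] = None,
    dependencies: Sequence[str] = (),
    tag: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `snk install` (hypothetical helper).

    Only the flags shown in the paper's example are used.
    """
    args = ["snk", "install", workflow]
    if name is not None:
        args += ["--name", name]            # name of the generated CLI
    if snakemake is not None:
        args += ["--snakemake", snakemake]  # Snakemake version to pin
    for dep in dependencies:
        args += ["-d", dep]                 # one -d per extra dependency
    if tag is not None:
        args += ["-t", tag]                 # release tag to install from
    return args


if __name__ == "__main__":
    argv = build_snk_install_args(
        "snakemake-workflows/dna-seq-gatk-variant-calling",
        name="variant-calling",
        snakemake="8.10.8",
        dependencies=["pandas==1.5.3", "numpy==1.26.4"],
        tag="v2.1.1",
    )
    # Print the command as a single shell-quoted string; nothing is executed.
    print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run` would perform the install, assuming `snk` is available on the `PATH`.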

+ +

The variant-calling CLI generated + by Snk.

+ +
+

The core functionality of Snk is the dynamic creation of CLIs. + Internally, Snk uses the Snk-CLI sister package + to generate the CLI. By default, key-value pairs of the Snakemake + configfile are mapped to CLI options. For example, + samples: samples.tsv in the configfile will + generate a --samples option in the CLI with the + default value samples.tsv (Figure 2). The CLI + generated by Snk is highly customisable and can be configured via a + snk.yaml file placed in the workflow directory. The snk.yaml file can + configure many aspects of the CLI including subcommands, ASCII art, help + messages, resource files, default values, and much more. Complete + documentation for the Snk config file can be found at + https://snk.wytamma.com/snk_config_file.
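The mapping from configfile entries to CLI options can be illustrated with a small sketch. This is not Snk-CLI's implementation: it uses Python's standard `argparse` module, handles only flat scalar values, and the function name `build_parser` and the config keys other than `samples` are hypothetical. Each key becomes a `--key` option whose default and type are taken from the config value, and boolean values become on/off flags.

```python
import argparse
from typing import Any, Dict


def build_parser(config: Dict[str, Any]) -> argparse.ArgumentParser:
    """Create an argparse parser from a flat config mapping (illustrative only)."""
    parser = argparse.ArgumentParser(prog="workflow")
    for key, default in config.items():
        flag = "--" + key.replace("_", "-")
        if isinstance(default, bool):
            # Check bool before int, since bool is a subclass of int.
            # Booleans become a pair of flags: --key and --no-key.
            group = parser.add_mutually_exclusive_group()
            group.add_argument(flag, dest=key, action="store_true", default=default)
            group.add_argument(
                "--no-" + key.replace("_", "-"),
                dest=key,
                action="store_false",
                default=default,
            )
        elif isinstance(default, (int, float, str)):
            # Infer the option type from the type of the default value.
            parser.add_argument(flag, dest=key, type=type(default), default=default)
        else:
            # None, lists, and nested mappings are out of scope for this
            # sketch; fall back to a plain string option.
            parser.add_argument(flag, dest=key, default=default)
    return parser


if __name__ == "__main__":
    # `samples` mirrors the example in the text; the other keys are made up.
    config = {"samples": "samples.tsv", "threads": 4, "trim": False}
    args = build_parser(config).parse_args(["--threads", "8", "--trim"])
    print(vars(args))
```

In this sketch the config value serves as both the default and the source of the option's type, so running the generated CLI with no arguments reproduces the original configuration.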

+ +

The run command of the + variant-calling CLI dynamically generated + from the Snakemake configfile. Several standard options are provided + in the Options section, e.g., --dry + (equivalent to Snakemake's --dry-run), + --dag to create a DAG plot of the workflow, + and --cores which defaults to all. The + Workflow Configuration section contains the options dynamically + generated from the configfile. Snk-CLI automatically infers the + defaults and types of the options and creates flags for boolean + options.

+ +
+

Developers can also directly use the Snk-CLI package to generate + CLIs for their Snakemake workflows. By using the + CLI class from Snk-CLI, developers can build a + fully featured workflow package without having to write a Snakemake + wrapper. We provide a guide for using Snk-CLI to build self-contained + workflow packages at + https://snk.wytamma.com/workflow_packages. + The Snk-CLI package is available via PyPI and can be installed using + the command pip install snk-cli.

+

Snk is a powerful tool that simplifies the use of Snakemake + workflows by dynamically generating CLIs. Snk is open-source software + released under the MIT license. Snk documentation, source code, and + issue tracker are available at + https://github.com/Wytamma/snk. + We welcome contributions and feedback from the community to improve + Snk and make it a valuable tool for the Snakemake community and + reproducible research at large.

+
+ + Acknowledgements +

The authors would like to thank Katherine Eaton + (@ktmeaton) for her valuable open-source + contributions. Additionally, we acknowledge all users who opened + issues and submitted pull requests, as their input has been + instrumental in enhancing this project. We also extend our gratitude + to the editors and reviewers at the Journal of Open Source Software + (JOSS) for their support and constructive feedback.

+
+ + + + + + + + O’TooleÁine + PybusOliver G. + AbramMichael E. + KellyElizabeth J. + RambautAndrew + + Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences + BMC Genomics + 202202 + 23 + 1 + 1471-2164 + https://doi.org/10.1186/s12864-022-08358-2 + 10.1186/s12864-022-08358-2 + 121 + + + + + + + RoachMichael J. + Pierce-WardN. Tessa + SucheckiRadoslaw + MallawaarachchiVijini + PapudeshiBhavya + HandleyScott A. + BrownC. Titus + Watson-HaighNathan S. + EdwardsRobert A. + + Ten simple rules and a template for creating workflows-as-applications + PLOS Computational Biology + + MarkelScott + + 202212 + 20241002 + 18 + 12 + 1553-7358 + https://dx.plos.org/10.1371/journal.pcbi.1010705 + 10.1371/journal.pcbi.1010705 + e1010705 + + + + + + + MölderFelix + JablonskiKim PHilipp + LetcherBrice + HallMichael B. + Tomkins-TinchChristopher H + SochatVanessa + ForsterJan + LeeSoohyun + TwardziokSven O. + KanitzAlexander + WilmAndreas + HoltgreweManuel + RahmannSven + NahnsenSven + KösterJohannes + + Sustainable data analysis with snakemake [version 2; peer review: 2 approved] + F1000Research + 2021 + 10 + 33 + 10.12688/f1000research.29032.2 + + + + +
diff --git a/joss.07410/paper.jats/variant-calling-cli-run.png b/joss.07410/paper.jats/variant-calling-cli-run.png new file mode 100644 index 0000000000..7557daf7ef Binary files /dev/null and b/joss.07410/paper.jats/variant-calling-cli-run.png differ diff --git a/joss.07410/paper.jats/variant-calling-cli.png b/joss.07410/paper.jats/variant-calling-cli.png new file mode 100644 index 0000000000..9c8ef91207 Binary files /dev/null and b/joss.07410/paper.jats/variant-calling-cli.png differ