From 6560d3f1aea059040e0a28a1c1e4efa839b4f7cb Mon Sep 17 00:00:00 2001
From: Peter van Heusden
Date: Fri, 20 Sep 2024 20:03:40 +0200
Subject: [PATCH] Add Galaxy to Additional Resources

---
 .../_posts/2024-09-14-additional-resources.md |   2 +-
 .../_posts/2024-09-15-galaxy.md               | 109 ++++++++++++++++++
 .../kleborate/_posts/2024-09-01-kleborate.md  |  10 ++
 serve.sh                                      |   2 +-
 4 files changed, 121 insertions(+), 2 deletions(-)
 create mode 100644 modules/additional-resources/_posts/2024-09-15-galaxy.md
 create mode 100644 modules/kleborate/_posts/2024-09-01-kleborate.md

diff --git a/modules/additional-resources/_posts/2024-09-14-additional-resources.md b/modules/additional-resources/_posts/2024-09-14-additional-resources.md
index 2a692b2..4b0bcf2 100644
--- a/modules/additional-resources/_posts/2024-09-14-additional-resources.md
+++ b/modules/additional-resources/_posts/2024-09-14-additional-resources.md
@@ -1,5 +1,5 @@
 ---
-Title: Additional resources
+title: Additional resources
 ---
diff --git a/modules/additional-resources/_posts/2024-09-15-galaxy.md b/modules/additional-resources/_posts/2024-09-15-galaxy.md
new file mode 100644
index 0000000..83b98a1
--- /dev/null
+++ b/modules/additional-resources/_posts/2024-09-15-galaxy.md
@@ -0,0 +1,109 @@
+---
+title: Galaxy for web-based pathogen data analysis
+---
+### What is Galaxy
+
+[Galaxy](https://galaxyproject.org) is a free, open-source system for analyzing data, authoring workflows and more. The Galaxy server environment can be installed and run locally, but for most uses the free service provided by the public Galaxy servers is sufficient. If you or your institution wants to run a local Galaxy server, consult the [Galaxy for sysadmins](https://training.galaxyproject.org/training-material/topics/admin/index.html) training resources.
+
+Training materials for Galaxy are available on the [Galaxy Training Network](https://training.galaxyproject.org/) portal.
+
+Galaxy can be used to run individual tools like `fastp`, `FastQC` and `snippy`, and in fact all of these tools are available on the public servers mentioned above. To be very productive with Galaxy, however, it is important to learn about two main concepts: collections and workflows.
+
+#### Sharing a history
+
+One of the most useful features of Galaxy is that you can share your history with other Galaxy users if they are using the same server. This lets you do data analysis together with colleagues and is also useful if you encounter an error and want to share your history so that you can get advice. Read this guide on [Sharing your history](https://training.galaxyproject.org/training-material/faqs/galaxy/histories_sharing.html).
+
+#### Collections
+
+In Galaxy, data is stored as datasets that are in histories. Each history can be thought of as a folder and it is best to use a single history for a single analysis. In fact, once primary data (e.g.
reads and reference genomes) is well organised, it is often a good idea to make a copy of the history with the primary data so that if mistakes are made in an analysis it is possible to return to the clean organisation of the primary data. A copy of a history does not use additional disk space as each dataset is stored only once even if referred to multiple times.
+
+Within a history, datasets can be organised into collections. Collections are groups of related datasets: pairs for paired-end read data, lists for groups of datasets (such as a list of datasets from single-end read technologies like Oxford Nanopore), and lists of pairs for collections of paired-end samples. Learn more about collections in the [collections tutorial](https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/collections/tutorial.html).
+
+In general, when run with a collection as an input, Galaxy will run a tool several times, once for every element of the collection. For example, if reads are organised into a list of pairs, `fastp` can be run on the entire list and it will run once for each sample (i.e. each pair of reads).
+
+To organise datasets into collections it is worth learning about the [Rule Based Uploader](https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/upload-rules/tutorial.html). This is a Galaxy feature that can rapidly re-organise your datasets into collections. There is also a second, more [advanced tutorial on the Rule Based Uploader](https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/upload-rules-advanced/tutorial.html).
+
+#### Workflows
+
+Galaxy tools can be combined into workflows. These workflows can provide a complete and sometimes complex recipe for processing data.
In the tutorial [A short introduction to Galaxy](https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-short/tutorial.html) there is an introduction to running a workflow.
+
+At a more advanced level, the [M. tuberculosis variant analysis](https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.html#processing-many-samples-at-once-collections-and-workflows-optional) tutorial also includes a final step using workflows, as shown in this [video](https://youtu.be/-nJPngFk36c?si=uuP7GMiVGxctrMVv&t=3558) which corresponds to [this section of the tutorial](https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.html#processing-many-samples-at-once-collections-and-workflows-optional).
+
+For your use, here are two workflows on the server:
+
+1. [Bacterial Genome Assembly](https://usegalaxy.eu/u/pvanheus/w/bacterial-genome-assembly) which corresponds to the main part of the default Bactopia workflow.
+
+2. [Bacterial Variant Analysis](https://usegalaxy.eu/u/pvanheus/w/bacteria-genome-variant-analysis) which is a snippy-based workflow for reference-based variant analysis and phylogeny building.
+
+You can import these and use them for your analysis.
+
+#### 2024: The Galaxy Training Academy
+
+If you are reading this page before the 11th of October 2024, you might want to participate in the [Galaxy Training Academy](https://training.galaxyproject.org/training-material/events/galaxy-academy-2024.html). This is a global Galaxy training event and a great way to start your journey with Galaxy!
+
+#### Advanced collection building with Rules
+
+Remember that collections are built using a menu at the top of the history panel (this is mentioned in this [Frequently Asked Question](https://training.galaxyproject.org/training-material/faqs/galaxy/collections_build_list.html)).
That menu also has an option called _Build Collection from Rules_.
+
+If you completed the advanced Rule Based Uploader tutorial you'll have seen reference to the JSON Rule Definitions. Together with the _Build Collection from Rules_ option this allows saved "formulas" to be used to organise datasets into collections.
+
+As a hint, if you have selected a set of input datasets where each dataset is named like `sample1_1.fastq.gz` or `sample1_2.fastq.gz`, this rule definition can quickly organise your datasets into a list of pairs:
+
+```json
+{
+  "rules": [
+    {
+      "type": "add_column_metadata",
+      "value": "hid"
+    },
+    {
+      "type": "add_column_metadata",
+      "value": "name"
+    },
+    {
+      "type": "add_column_regex",
+      "target_column": 1,
+      "expression": "(.*)_[12].fastq.gz",
+      "group_count": 1,
+      "replacement": null
+    },
+    {
+      "type": "add_column_regex",
+      "target_column": 1,
+      "expression": ".*_([12]).fastq.gz",
+      "group_count": 1,
+      "replacement": null
+    },
+    {
+      "type": "add_column_regex",
+      "target_column": 2,
+      "expression": "(.*)",
+      "replacement": "#\\1",
+      "group_count": null
+    }
+  ],
+  "mapping": [
+    {
+      "type": "list_identifiers",
+      "columns": [
+        2
+      ],
+      "editing": false
+    },
+    {
+      "type": "paired_identifier",
+      "columns": [
+        3
+      ]
+    },
+    {
+      "type": "tags",
+      "columns": [
+        4
+      ],
+      "editing": false
+    }
+  ]
+}
+```
+
diff --git a/modules/kleborate/_posts/2024-09-01-kleborate.md b/modules/kleborate/_posts/2024-09-01-kleborate.md
new file mode 100644
index 0000000..0140d81
--- /dev/null
+++ b/modules/kleborate/_posts/2024-09-01-kleborate.md
@@ -0,0 +1,10 @@
+---
+title: Introduction
+published: true
+---
+
+## Introduction
+
+
+Write your course content here.
+Here are some markdown [examples](https://course-in-a-box.p2pu.org/modules/content/markdown-and-media/)
diff --git a/serve.sh b/serve.sh
index 2cbc8bf..d289650 100755
--- a/serve.sh
+++ b/serve.sh
@@ -5,4 +5,4 @@ docker run -i -t --rm -u $(id -u):$(id -g) \
     -v $(pwd)/.bundler/:/opt/bundler \
     -e BUNDLE_PATH=/opt/bundler \
     -w /opt/app ruby:2.7 bash \
-    -c "bundle install && bundle exec jekyll serve --watch -H 0.0.0.0"
+    -c "bundle install && bundle exec jekyll serve --watch -H 0.0.0.0 --incremental"