Add Galaxy to Additional Resources
pvanheus committed Sep 20, 2024
1 parent 19f2a60 commit 6560d3f
Showing 4 changed files with 121 additions and 2 deletions.
@@ -1,5 +1,5 @@
---
-Title: Additional resources
+title: Additional resources
---
<br>

109 changes: 109 additions & 0 deletions modules/additional-resources/_posts/2024-09-15-galaxy.md
@@ -0,0 +1,109 @@
---
title: Galaxy for web-based pathogen data analysis
---
### What is Galaxy

[Galaxy](https://galaxyproject.org) is a free, open-source system for analyzing data, authoring workflows and more. The Galaxy server environment can be installed and run locally, but for most uses the free service provided by public servers like <https://usegalaxy.eu>, <https://usegalaxy.org> and <https://usegalaxy.org.au> is sufficient. If you or your institution wants to run a local Galaxy server, consult the [Galaxy for sysadmins](https://training.galaxyproject.org/training-material/topics/admin/index.html) training resources.

Training materials for Galaxy are available on the [Galaxy Training Network](https://training.galaxyproject.org/) portal.

Galaxy can be used to run individual tools like `fastp`, `FastQC` and `snippy`; all of these tools are available on the public servers mentioned above. To be productive with Galaxy, however, it is important to learn about two main concepts: collections and workflows.

#### Sharing a history

One of the most useful features of Galaxy is that you can share your history with other Galaxy users if they are using the same server. This lets you do data analysis together with colleagues and is also useful if you encounter an error and want to share your history so that you can get advice. Read this guide on [Sharing your history](https://training.galaxyproject.org/training-material/faqs/galaxy/histories_sharing.html).

#### Collections

In Galaxy, data is stored as datasets that live in histories. Each history can be thought of as a folder, and it is best to use a single history for a single analysis. Once primary data (e.g. reads and reference genomes) is well organised, it is often a good idea to make a copy of the history holding that primary data, so that if mistakes are made in an analysis it is possible to return to the clean organisation of the primary data. A copy of a history does not use additional disk space, as each dataset is stored only once even if it is referred to multiple times.

Within a history, datasets can be organised into collections. Collections are groups of related datasets: a pair holds the paired-end read data for one sample, a list holds a group of datasets (for example datasets from single-end read technologies like Oxford Nanopore), and a list of pairs holds a set of paired-end samples. Learn more about collections in the [collections tutorial](https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/collections/tutorial.html).

In general, when run with a collection as an input, Galaxy will run a tool several times, once for every element of the collection. For example, if reads are organised into a list of pairs, `fastp` can be run on the entire list and it will run once for each sample (i.e. each pair of reads).
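
As a rough mental model of this "once per element" behaviour, the Python sketch below represents a list of pairs as a dictionary and applies a stand-in function to each element. The names `fake_fastp` and `run_tool_on_collection` are hypothetical and are not part of Galaxy; Galaxy itself schedules the jobs on the server, so this is only an illustration of the idea.

```python
from typing import Callable, Dict, Tuple

# A "list of pairs" modelled as: sample name -> (forward reads, reverse reads).
# This is an illustration only; Galaxy stores collections on the server.
Pair = Tuple[str, str]


def fake_fastp(sample: str, pair: Pair) -> str:
    """Hypothetical stand-in for running a tool on one pair of reads."""
    forward, reverse = pair
    return f"{sample}: trimmed({forward}, {reverse})"


def run_tool_on_collection(
    tool: Callable[[str, Pair], str], collection: Dict[str, Pair]
) -> Dict[str, str]:
    """Run `tool` once per element, keeping the element identifiers."""
    return {sample: tool(sample, pair) for sample, pair in collection.items()}


reads = {
    "sample1": ("sample1_1.fastq.gz", "sample1_2.fastq.gz"),
    "sample2": ("sample2_1.fastq.gz", "sample2_2.fastq.gz"),
}

results = run_tool_on_collection(fake_fastp, reads)
for line in results.values():
    print(line)
```

The output keeps one entry per sample, which mirrors how a tool run on a collection produces an output collection with the same element identifiers as its input.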

To organise datasets into collections it is worth learning about the [Rule Based Uploader](https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/upload-rules/tutorial.html). This is a Galaxy feature that can rapidly re-organise your datasets into collections. There is also a second, more [advanced tutorial on the Rule Based Uploader](https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/upload-rules-advanced/tutorial.html).

#### Workflows

Galaxy tools can be combined into workflows. These workflows can provide a complete and sometimes complex recipe for processing data. In the tutorial [A short introduction to Galaxy](https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-short/tutorial.html) there is an introduction to running a workflow.

At a more advanced level, the [M. tuberculosis variant analysis](https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.html#processing-many-samples-at-once-collections-and-workflows-optional) tutorial also includes a final step using workflows, as shown in this [video](https://youtu.be/-nJPngFk36c?si=uuP7GMiVGxctrMVv&t=3558) which corresponds to [this section of the tutorial](https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.html#processing-many-samples-at-once-collections-and-workflows-optional).

For your use, here are two workflows on the <https://usegalaxy.eu> server:

1. [Bacterial Genome Assembly](https://usegalaxy.eu/u/pvanheus/w/bacterial-genome-assembly) which corresponds to the main part of the default Bactopia workflow.

2. [Bacterial Variant Analysis](https://usegalaxy.eu/u/pvanheus/w/bacteria-genome-variant-analysis) which is a snippy-based workflow for reference-based variant analysis and phylogeny building.

You can import these and use them for your analysis.

#### 2024: The Galaxy Training Academy

If you are reading this page before the 11th of October 2024, you might want to participate in the [Galaxy Training Academy](https://training.galaxyproject.org/training-material/events/galaxy-academy-2024.html). This is a global Galaxy training event and a great way to start your journey with Galaxy!

#### Advanced collection building with Rules

Remember that collections are built using a menu at the top of the history panel (this is mentioned in this [Frequently Asked Question](https://training.galaxyproject.org/training-material/faqs/galaxy/collections_build_list.html)). That menu also has an option called _Build Collection from Rules_.

If you completed the advanced Rule Based Uploader tutorial you'll have seen reference to the JSON Rule Definitions. Together with the _Build Collection from Rules_ option, these allow saved "formulas" to be used to organise datasets into collections.

As a hint, if you have selected a set of input datasets where each dataset is named like `sample1_1.fastq.gz` or `sample1_2.fastq.gz`, this rule definition can quickly organise your datasets into a list of pairs:

```json
{
  "rules": [
    {
      "type": "add_column_metadata",
      "value": "hid"
    },
    {
      "type": "add_column_metadata",
      "value": "name"
    },
    {
      "type": "add_column_regex",
      "target_column": 1,
      "expression": "(.*)_[12].fastq.gz",
      "group_count": 1,
      "replacement": null
    },
    {
      "type": "add_column_regex",
      "target_column": 1,
      "expression": ".*_([12]).fastq.gz",
      "group_count": 1,
      "replacement": null
    },
    {
      "type": "add_column_regex",
      "target_column": 2,
      "expression": "(.*)",
      "replacement": "#\\1",
      "group_count": null
    }
  ],
  "mapping": [
    {
      "type": "list_identifiers",
      "columns": [
        2
      ],
      "editing": false
    },
    {
      "type": "paired_identifier",
      "columns": [
        3
      ]
    },
    {
      "type": "tags",
      "columns": [
        4
      ],
      "editing": false
    }
  ]
}
```
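
To see what the regular expressions in this rule definition extract, the Python sketch below applies the same two patterns to a few example dataset names and groups the results into pairs. It is an illustration only: the function names `build_columns` and `group_into_pairs` are hypothetical, the sample file names are made up, and the sketch uses Python's `re.fullmatch`, which may be stricter than the matching Galaxy performs.

```python
import re

# The two patterns from the rule definition above, copied unchanged.
SAMPLE_PATTERN = re.compile(r"(.*)_[12].fastq.gz")
READ_PATTERN = re.compile(r".*_([12]).fastq.gz")


def build_columns(name):
    """Return (list identifier, paired identifier, tag) for one dataset name.

    Returns None when the name does not follow the expected naming scheme.
    """
    sample = SAMPLE_PATTERN.fullmatch(name)
    read = READ_PATTERN.fullmatch(name)
    if sample is None or read is None:
        return None
    list_identifier = sample.group(1)   # e.g. "sample1"
    paired_identifier = read.group(1)   # "1" or "2"
    tag = "#" + list_identifier         # mirrors the "#\\1" replacement rule
    return list_identifier, paired_identifier, tag


def group_into_pairs(names):
    """Group dataset names by sample, keyed by the paired identifier."""
    pairs = {}
    for name in names:
        columns = build_columns(name)
        if columns is None:
            continue  # skip names that do not match the pattern
        list_identifier, paired_identifier, _tag = columns
        pairs.setdefault(list_identifier, {})[paired_identifier] = name
    return pairs


names = [
    "sample1_1.fastq.gz",
    "sample1_2.fastq.gz",
    "sample2_1.fastq.gz",
    "sample2_2.fastq.gz",
]
print(group_into_pairs(names))
```

The three values returned by `build_columns` correspond to the columns that the `mapping` section assigns to the list identifier, the paired identifier and the tag.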

10 changes: 10 additions & 0 deletions modules/kleborate/_posts/2024-09-01-kleborate.md
@@ -0,0 +1,10 @@
---
title: Introduction
published: true
---

## Introduction


Write your course content here.
<br> Here are some markdown [examples](https://course-in-a-box.p2pu.org/modules/content/markdown-and-media/)
2 changes: 1 addition & 1 deletion serve.sh
@@ -5,4 +5,4 @@ docker run -i -t --rm -u $(id -u):$(id -g) \
-v $(pwd)/.bundler/:/opt/bundler \
-e BUNDLE_PATH=/opt/bundler \
-w /opt/app ruby:2.7 bash \
--c "bundle install && bundle exec jekyll serve --watch -H 0.0.0.0"
+-c "bundle install && bundle exec jekyll serve --watch -H 0.0.0.0 --incremental"
