Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #4 from SANBI-SA/add_galaxy
Add Galaxy to Additional Resources
Showing 4 changed files with 121 additions and 2 deletions.
2 changes: 1 addition & 1 deletion
modules/additional-resources/_posts/2024-09-14-additional-resources.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
--- | ||
Title: Additional resources | ||
title: Additional resources | ||
--- | ||
<br> | ||
|
||
|
109 changes: 109 additions & 0 deletions
modules/additional-resources/_posts/2024-09-15-galaxy.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
--- | ||
title: Galaxy for web-based pathogen data analysis | ||
--- | ||
### What is Galaxy | ||
|
||
[Galaxy](https://galaxyproject.org) is a free, open-source system for analyzing data, authoring workflows and more. The Galaxy server environment can be installed and run locally, but for most uses the free service provided by public servers like <https://usegalaxy.eu>, <https://usegalaxy.org> and <https://usegalaxy.org.au> is sufficient. If you or your institution wants to run a local Galaxy server, consult the [Galaxy for sysadmins](https://training.galaxyproject.org/training-material/topics/admin/index.html) training resources. | ||
|
||
Training materials for Galaxy are available on the [Galaxy Training Network](https://training.galaxyproject.org/) portal. | ||
|
||
Galaxy can be used to run individual tools like `fastp`, `FastQC` and `snippy` and in fact all of these tools are available on the public servers mentioned above. To be very productive with Galaxy, however, it is important to learn about two main concepts: collections and workflows. | ||
|
||
#### Sharing a history | ||
|
||
One of the most useful features of Galaxy is that you can share your history with other Galaxy users if they are using the same server. This lets you do data analysis together with colleagues and is also useful if you encounter an error and want to share your history so that you can get advice. Read this guide on [Sharing your history](https://training.galaxyproject.org/training-material/faqs/galaxy/histories_sharing.html). | ||
|
||
#### Collections | ||
|
||
In Galaxy, data is stored in datasets that are kept in histories. Each history can be thought of as a folder, and it is best to use a single history for a single analysis. In fact, once primary data (e.g. reads and reference genomes) is well organised, it is often a good idea to make a copy of the history with the primary data so that, if mistakes are made in an analysis, it is possible to return to the clean organisation of the primary data. A copy of a history does not use additional disk space, as each dataset is stored only once even if referred to multiple times. | ||
|
||
Within a history, datasets can be organised into collections. Collections are groups of related datasets: pairs for paired-end read data, lists for groups of datasets (such as a list of datasets from single-end read technologies like Oxford Nanopore), and lists of pairs for collections of paired-end samples. Learn more about collections in the [collections tutorial](https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/collections/tutorial.html). | ||
|
||
In general, when run with a collection as an input, Galaxy will run a tool several times, once for every element of the collection. For example, if reads are organised into a list of pairs, `fastp` can be run on the entire list and it will run once for each sample (i.e. each pair of reads). | ||
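
The "once for every element" behaviour can be pictured with a small plain-Python sketch. This is not Galaxy code: the dictionary layout, the file names and the `run_tool` function are hypothetical stand-ins chosen for illustration, with `run_tool` playing the role of a tool such as `fastp`.

```python
from typing import Callable, Dict

# A "list of pairs" collection modelled as: element identifier -> one read pair.
# The sample and file names are made up for this illustration.
Pair = Dict[str, str]
list_of_pairs: Dict[str, Pair] = {
    "sample1": {"forward": "sample1_1.fastq.gz", "reverse": "sample1_2.fastq.gz"},
    "sample2": {"forward": "sample2_1.fastq.gz", "reverse": "sample2_2.fastq.gz"},
}


def run_tool(pair: Pair) -> str:
    """Hypothetical stand-in for one tool job on one pair of reads."""
    return f"trimmed({pair['forward']}, {pair['reverse']})"


def map_over(collection: Dict[str, Pair], tool: Callable[[Pair], str]) -> Dict[str, str]:
    """Run the tool once per element and keep the element identifiers on the outputs."""
    return {identifier: tool(pair) for identifier, pair in collection.items()}


results = map_over(list_of_pairs, run_tool)
for identifier, output in results.items():
    print(identifier, "->", output)
```

The output has the same shape as the input: one result per sample, keyed by the same identifier, which is what makes it possible to feed the output collection into the next tool.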
|
||
To organise datasets into collections it is worth learning about the [Rule Based Uploader](https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/upload-rules/tutorial.html). This is a Galaxy feature that can rapidly re-organise your datasets into collections. There is also a second, more [advanced tutorial on the Rule Based Uploader](https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/upload-rules-advanced/tutorial.html). | ||
|
||
#### Workflows | ||
|
||
Galaxy tools can be combined into workflows. These workflows can provide a complete and sometimes complex recipe for processing data. In the tutorial [A short introduction to Galaxy](https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-short/tutorial.html) there is an introduction to running a workflow. | ||
|
||
At a more advanced level, the [M. tuberculosis variant analysis](https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.html#processing-many-samples-at-once-collections-and-workflows-optional) tutorial also includes a final step using workflows, as shown in this [video](https://youtu.be/-nJPngFk36c?si=uuP7GMiVGxctrMVv&t=3558) which corresponds to [this section of the tutorial](https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.html#processing-many-samples-at-once-collections-and-workflows-optional). | ||
|
||
For your use, here are two workflows on the <https://usegalaxy.eu> server: | ||
|
||
1. [Bacterial Genome Assembly](https://usegalaxy.eu/u/pvanheus/w/bacterial-genome-assembly) which corresponds to the main part of the default Bactopia workflow. | ||
|
||
2. [Bacterial Variant Analysis](https://usegalaxy.eu/u/pvanheus/w/bacteria-genome-variant-analysis) which is a snippy-based workflow for reference based variant analysis and phylogeny building. | ||
|
||
You can import these and use them for your analysis. | ||
|
||
#### 2024: The Galaxy Training Academy | ||
|
||
If you are reading this page before the 11th of October 2024, you might want to participate in the [Galaxy Training Academy](https://training.galaxyproject.org/training-material/events/galaxy-academy-2024.html). This is a global Galaxy training event and a great way to start your journey with Galaxy! | ||
|
||
#### Advanced collection building with Rules | ||
|
||
Remember that collections are built using a menu at the top of the history panel (this is mentioned in this [Frequently Asked Question](https://training.galaxyproject.org/training-material/faqs/galaxy/collections_build_list.html)). That menu also has an option called _Build Collection from Rules_. | ||
|
||
If you completed the advanced Rule Based Uploader tutorial you'll have seen a reference to the JSON Rule Definitions. Together with the _Build Collection from Rules_ option, this allows saved "formulas" to be used to organise datasets into collections. | ||
|
||
As a hint, if you have selected a set of input datasets where each dataset is named like `sample1_1.fastq.gz` or `sample1_2.fastq.gz`, this rule definition can quickly organise your datasets into a list of pairs: | ||
|
||
```json | ||
{ | ||
"rules": [ | ||
{ | ||
"type": "add_column_metadata", | ||
"value": "hid" | ||
}, | ||
{ | ||
"type": "add_column_metadata", | ||
"value": "name" | ||
}, | ||
{ | ||
"type": "add_column_regex", | ||
"target_column": 1, | ||
"expression": "(.*)_[12].fastq.gz", | ||
"group_count": 1, | ||
"replacement": null | ||
}, | ||
{ | ||
"type": "add_column_regex", | ||
"target_column": 1, | ||
"expression": ".*_([12]).fastq.gz", | ||
"group_count": 1, | ||
"replacement": null | ||
}, | ||
{ | ||
"type": "add_column_regex", | ||
"target_column": 2, | ||
"expression": "(.*)", | ||
"replacement": "#\\1", | ||
"group_count": null | ||
} | ||
], | ||
"mapping": [ | ||
{ | ||
"type": "list_identifiers", | ||
"columns": [ | ||
2 | ||
], | ||
"editing": false | ||
}, | ||
{ | ||
"type": "paired_identifier", | ||
"columns": [ | ||
3 | ||
] | ||
}, | ||
{ | ||
"type": "tags", | ||
"columns": [ | ||
4 | ||
], | ||
"editing": false | ||
} | ||
] | ||
} | ||
``` | ||
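
To see what the rules above are meant to produce, here is a rough Python imitation of the column-building steps. It is only a sketch under stated assumptions: it uses Python's `re.match` rather than Galaxy's own rule engine, the dataset names and history ids are invented, and skipping names that do not match is a choice made in this sketch, not documented Galaxy behaviour.

```python
import re

# Invented dataset names following the naming pattern described above.
names = [
    "sample1_1.fastq.gz",
    "sample1_2.fastq.gz",
    "sample2_1.fastq.gz",
    "sample2_2.fastq.gz",
]

# The two expressions are copied verbatim from the rule definition.
SAMPLE_RE = re.compile(r"(.*)_[12].fastq.gz")
PAIR_RE = re.compile(r".*_([12]).fastq.gz")


def build_rows(dataset_names):
    """Build one row per dataset: [hid, name, sample, pair indicator, tag]."""
    rows = []
    for hid, name in enumerate(dataset_names, start=1):  # hid values are invented
        sample = SAMPLE_RE.match(name)
        pair = PAIR_RE.match(name)
        if sample is None or pair is None:
            continue  # assumption of this sketch: ignore names that do not fit
        # Column 4 mirrors the last rule: the sample name prefixed with "#".
        rows.append([hid, name, sample.group(1), pair.group(1), "#" + sample.group(1)])
    return rows


def group_into_pairs(rows):
    """Mimic the mapping: column 2 names the list element, column 3 the pair member."""
    collection = {}
    for _hid, name, sample, pair, _tag in rows:
        collection.setdefault(sample, {})[pair] = name
    return collection


rows = build_rows(names)
pairs = group_into_pairs(rows)
for sample, members in pairs.items():
    print(sample, members)
```

Columns are numbered from 0, so column 2 holds the sample name, column 3 the `1` or `2` pair indicator, and column 4 the tag, matching the column numbers used in the `mapping` section of the rule definition.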
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
--- | ||
title: Introduction | ||
published: true | ||
--- | ||
|
||
## Introduction | ||
|
||
|
||
Write your course content here. | ||
<br> Here are some markdown [examples](https://course-in-a-box.p2pu.org/modules/content/markdown-and-media/) |