updated dryad links and added extra badges

TurakhiaLab · Jul 23, 2024 · 723f604
1 parent 77239bb
Showing 2 changed files with 57 additions and 22 deletions.
diff --git a/README.md b/README.md
@@ -10,6 +10,8 @@
 [<img src="https://img.shields.io/badge/Made with-Snakemake-brightgreen.svg?logo=snakemake">](https://snakemake.readthedocs.io/en/v7.19.1/index.html)
 [<img src="https://img.shields.io/badge/Install with-DockerHub-informational.svg?logo=Docker">](https://hub.docker.com/r/ang037/roadies)
 [<img src="https://img.shields.io/badge/Submitted to-bioRxiv-critical.svg?logo=LOGO">](https://www.biorxiv.org/content/10.1101/2024.05.27.596098v1)
+[<img src="https://img.shields.io/badge/DOI-10.5061/dryad.tht76hf73-brightgreen.svg?logo=LOGO">](https://doi.org/10.5061/dryad.tht76hf73)
+[<img src="https://img.shields.io/badge/Watch it on-Youtube-FF0000.svg?logo=YouTube">](https://youtu.be/1sR741TvZnM?si=vVNAnonvzNEzrLKq)
 
 <div align="center">
 
@@ -27,9 +29,10 @@
     - [Using Installation Script](#script)
 - [Quick Start](#start)
 - [Run ROADIES with your own datasets](#runpipeline)
-- [Contributions and Support](#support)
 - [Citing ROADIES](#citation)
 
+<br>
+
 ## <a name="overview"></a> Introduction
 
 Welcome to the official repository of ROADIES, a novel pipeline designed for phylogenetic tree inference of the species directly from their raw genomic assemblies. ROADIES offers a fully automated, easy-to-use, scalable solution, eliminating any error-prone manual steps and providing unique flexibility in adjusting the tradeoff between accuracy and runtime. 
@@ -47,7 +50,7 @@ Welcome to the official repository of ROADIES, a novel pipeline designed for phy
 
 </div>
 
-
+<br>
 
 ## <a name="usage"></a> Quick Install
 
@@ -82,7 +85,7 @@ docker build -t roadies_image .
 docker run -it roadies_image
 ```
 
-### <a name="script"></a> Using installation script
+### <a name="script"></a> Using installation script (requires sudo access)
 
 First clone the repository:
 
@@ -119,28 +122,38 @@ sudo apt-get install -y wget unzip make g++ python3 python3-pip python3-setuptoo
 
 **Note:** If you encounter issues with the Boost library, add its path to `$CPLUS_LIBRARY_PATH` and save it in `~/.bashrc`.
 
+<br>
+
 ## <a name="start"></a> Quick Start
 
 Once setup is done, you can run the ROADIES pipeline using the provided test dataset. Follow these steps for a 16-core machine:
 
-1. Create a directory for the test data and download the test datasets:
+1. Go to the ROADIES repository directory, if you are not already there:
+
+```
+cd ROADIES
+```
+
+2. Create a directory for the test data and download the test datasets (using the following one-line command):
 
 ```
 mkdir -p test/test_data && cat test/input_genome_links.txt | xargs -I {} sh -c 'wget -O test/test_data/$(basename {}) {}'
 ```
-2. Run the ROADIES pipeline:
+3. Run the pipeline with the following command (from the ROADIES directory):
 
 ```
 python run_roadies.py --cores 16
 ```
 
-The first command will download the 11 Drosophila genomic datasets (links provided in `test/input_genome_links.txt`) and save them in the `test/test_data` directory. The second command will run ROADIES for those 11 Drosophila genomes and save the final newick tree as `roadies.nwk` in a separate `ROADIES/output_files` folder upon completion.
+The second command will download the 11 Drosophila genomic datasets (links provided in `test/input_genome_links.txt`) and save them in the `test/test_data` directory. The third command will run the ROADIES pipeline for those 11 Drosophila genomes and save the final Newick tree as `roadies.nwk` in a separate `output_files` folder upon completion.
+
+<br>
 
 ## <a name="runpipeline"></a> Run ROADIES with your own datasets
 
 To run ROADIES with your own datasets, follow these steps:
 
-1. **Specify Input Genomic Dataset**: Update the `config.yaml` file to include the path to your input datasets under the `GENOMES` parameter. Ensure all input genomic assemblies are in `.fa` or `.fa.gz` format and named according to the species' name (e.g., `Aardvark.fa`). 
+1. **Specify Input Genomic Dataset**: Update the `config.yaml` file (found in the `config` folder of the ROADIES directory) to include the path to your input datasets under the `GENOMES` parameter. Ensure all input genomic assemblies are in `.fa` or `.fa.gz` format and named according to the species' name (e.g., `Aardvark.fa`). 
 
 **Note**: Each file should contain the genome assembly of one unique species. If a file contains multiple species, split it into individual genome files (`faSplit` can be used: `faSplit byname <input_dir> <output_dir>`).
 
@@ -165,18 +178,18 @@ python run_roadies.py --cores 16 --mode balanced
 python run_roadies.py --cores 16 --mode fast
 ```
 
-## <a name="support"></a> Contributions and Support
-
-We welcome contributions from the community. If you encounter any issues or have suggestions for improvement, please open an issue on GitHub. For general inquiries and support, reach out to our team.
+<br>
 
 ## <a name="citation"></a> Citing ROADIES
 
 If you use ROADIES in your research or publications, please cite the following paper:
 
-Gupta A, Mirarab S, Turakhia Y, (2024). Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. _bioRxiv_. https://www.biorxiv.org/content/10.1101/2024.05.27.596098v1
+Gupta A, Mirarab S, Turakhia Y, (2024). Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. _bioRxiv_. [https://www.biorxiv.org/content/10.1101/2024.05.27.596098v1](https://www.biorxiv.org/content/10.1101/2024.05.27.596098v1).
 
 ### Accessing ROADIES output files
 
-The output files with the gene trees and species trees generated by ROADIES are deposited to [Dryad](https://datadryad.org/stash). To access it, please refer to [this](https://datadryad.org/stash/share/Pbbmp5I6AEmJmOHRvNld7FBT2ext-DEemyajkqUQfX0) link (Note: the dataset submission is undergoing review and a permanent link will be posted once available).
+The output files with the gene trees and species trees generated by ROADIES for the manuscript are deposited in [Dryad](https://datadryad.org/stash). To access them, please refer to the following:
+
+Gupta, Anshu; Mirarab, Siavash; Turakhia, Yatish (2024). Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES [Dataset]. Dryad. [https://doi.org/10.5061/dryad.tht76hf73](https://doi.org/10.5061/dryad.tht76hf73).
 
 
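As an illustration of the download step in the README's Quick Start, the `xargs`/`wget` one-liner amounts to mapping each URL listed in `test/input_genome_links.txt` to a file of the same base name under `test/test_data`. A minimal Python sketch of that mapping follows. It is not part of this commit: `plan_downloads` is a hypothetical helper, the `example.org` URLs are placeholders, and the sketch only computes destination paths without downloading anything. One assumed difference from the shell version is that it takes the base name from the URL's path component, so any query string would be dropped.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def plan_downloads(links_text, dest_dir="test/test_data"):
    """Map each non-empty line (one URL per line) to a destination path.

    Hypothetical helper: mirrors the idea of `wget -O dest_dir/$(basename URL) URL`.
    """
    plan = []
    for line in links_text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines in the links file
        # Base name of the URL path, e.g. ".../genomes/a.fa.gz" -> "a.fa.gz"
        name = PurePosixPath(urlsplit(url).path).name
        plan.append((url, f"{dest_dir}/{name}"))
    return plan


if __name__ == "__main__":
    # Placeholder links; the real ones live in test/input_genome_links.txt.
    sample = "https://example.org/genomes/a.fa.gz\n\nhttps://example.org/b.fa\n"
    for url, dest in plan_downloads(sample):
        print(url, "->", dest)
```

Printing the plan before fetching makes it easy to check that no two links collide on the same file name.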
diff --git a/docs/index.md b/docs/index.md
@@ -96,7 +96,7 @@ docker build -t roadies_image .
 docker run -it roadies_image
 ```
 
-### Using installation script
+### Using installation script (requires sudo access)
 
 First clone the repository:
 
@@ -138,22 +138,28 @@ sudo apt-get install -y wget unzip make g++ python3 python3-pip python3-setuptoo
     If you encounter issues with the Boost library, add its path to `$CPLUS_LIBRARY_PATH` and save it in `~/.bashrc`.
 
 
-### Quick start (with provided test dataset)
+## Quick start (with provided test dataset)
 
 Once setup is done, you can run the ROADIES pipeline using the provided test dataset. Follow these steps for a 16-core machine:
 
-1. Create a directory for the test data and download the test datasets:
+1. Go to the ROADIES repository directory, if you are not already there:
+
+```
+cd ROADIES
+```
+
+2. Create a directory for the test data and download the test datasets (using the following one-line command):
 
 ```
 mkdir -p test/test_data && cat test/input_genome_links.txt | xargs -I {} sh -c 'wget -O test/test_data/$(basename {}) {}'
 ```
-2. Run the ROADIES pipeline:
+3. Run the pipeline with the following command (from the ROADIES directory):
 
 ```
 python run_roadies.py --cores 16
 ```
 
-The first command will download the 11 Drosophila genomic datasets (links provided in `test/input_genome_links.txt`) and save them in the `test/test_data` directory. The second command will run ROADIES for those 11 Drosophila genomes and save the final newick tree as `roadies.nwk` in a separate `ROADIES/output_files` folder upon completion.
+The second command will download the 11 Drosophila genomic datasets (links provided in `test/input_genome_links.txt`) and save them in the `test/test_data` directory. The third command will run ROADIES for those 11 Drosophila genomes and save the final Newick tree as `roadies.nwk` in a separate `output_files` folder upon completion.
 
 **Running ROADIES with different modes of operation**: To run ROADIES in its other modes of operation (fast, balanced, accurate), which are described in the [Modes of operation](index.md#modes-of-operation) section, try the following commands:
 
@@ -190,13 +196,13 @@ python run_roadies.py --cores 16 --mode fast --converge
 
 The output files for all iterations will be saved in a separate `converge_files` folder, while `output_files` will hold the results of the last iteration. The species tree for each iteration will be saved in the `converge_files` folder as `iteration_<iteration_number>.nwk`.
 
-## Usage
+## Detailed Usage
 
 This section provides detailed instructions on how to configure the ROADIES pipeline further for various user requirements with your own genomic dataset. Once the required environment setup process is complete, follow the steps below.
 
 ### Step 1: Specify input genomic dataset
 
-After installing the environment, you need to get input genomic sequences for creating the species tree. To run ROADIES with your own dataset, update the `config.yaml` file to include the path to your input datasets under the `GENOMES` parameter.
+After installing the environment, you need to get input genomic sequences for creating the species tree. To run ROADIES with your own dataset, update the `config.yaml` file (found in the `config` folder of the ROADIES directory) to include the path to your input datasets under the `GENOMES` parameter.
 
 !!! Note 
     All input genome assemblies in the path mentioned in `GENOMES` should be in `.fa` or `.fa.gz` format. The genome assembly files should be named according to the species' names (for example, Aardvark's genome assembly is to be named `Aardvark.fa`). Each file should contain the genome assembly of one unique species. If a file contains multiple species, split it into individual genome files (`faSplit` can be used for this: `faSplit byname <input_dir> <output_dir>`). Moreover, the file name should not contain any special characters such as `.` (underscores `_` are allowed). For example, if the file name is `Aardvark.1.fa`, rename it to `Aardvark_1.fa`.
@@ -230,15 +236,15 @@ Adjust other parameters listed in `config.yaml` as per specific user requirement
 
 ### Step 3: Run the ROADIES pipeline
 
-Once the required installations are completed and the parameters are configured in `config/config.yaml` file, execute the following command:
+Once the required installations are completed and the parameters are configured in the `config.yaml` file, execute the following command (from the ROADIES repo home directory):
 
 ```
 python run_roadies.py --cores <number of cores>
 ```
 
 This runs ROADIES in accurate mode (the default) with the specified number of cores. After execution completes, the output species tree in Newick format will be saved as `roadies.nwk` in a separate `output_files` folder.
 
-#### Command line arguments
+### Command line arguments
 
 Several command line arguments let the user change the mode of operation, specify a custom config file path, and so on.
 
@@ -249,6 +255,12 @@ There are multiple command line arguments through which user can change the mode
 | `--converge` | Run ROADIES in [converge](index.md#convergence-mechanism) mode if you do not know the optimal gene count to start with |
 | `--config` | Provide an optional custom YAML file (in the same format as the `config.yaml` provided with this repository). If not given, `config/config.yaml` is used by default.|
 
+For example:
+
+```
+python run_roadies.py --cores 16 --mode balanced --converge --config config/config.yaml
+```
+
 Use `--help` to get the list of command line arguments.
 
 ### Step 4: Analyze output files
@@ -307,8 +319,18 @@ For extensive debugging, other intermediate output files for each stage of the p
 
 We welcome contributions from the community. If you encounter any issues or have suggestions for improvement, please open an issue on GitHub. For general inquiries and support, reach out to our team.
 
+Anshu Gupta - ang037 [at] ucsd [dot] edu
+
+Yatish Turakhia - yturakhia [at] ucsd [dot] edu
+
 ## Citing ROADIES
 
 If you use ROADIES in your research or publications, please cite the following paper:
 
-Gupta A, Mirarab S, Turakhia Y, (2024). Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. _bioRxiv_. https://www.biorxiv.org/content/10.1101/2024.05.27.596098v1
+Gupta A, Mirarab S, Turakhia Y, (2024). Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. _bioRxiv_. [https://www.biorxiv.org/content/10.1101/2024.05.27.596098v1](https://www.biorxiv.org/content/10.1101/2024.05.27.596098v1).
+
+### Accessing ROADIES output files
+
+The output files with the gene trees and species trees generated by ROADIES for the manuscript are deposited in [Dryad](https://datadryad.org/stash). To access them, please refer to the following:
+
+Gupta, Anshu; Mirarab, Siavash; Turakhia, Yatish (2024). Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES [Dataset]. Dryad. [https://doi.org/10.5061/dryad.tht76hf73](https://doi.org/10.5061/dryad.tht76hf73).
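The input naming rules stated in the `docs/index.md` note above (files end in `.fa` or `.fa.gz`, and the name contains no special characters such as `.` apart from `_`) can be checked before a run. The following is a minimal sketch, not part of this commit or of ROADIES itself: `suggest_name` is a hypothetical helper, and reading "no special characters" as "letters, digits, and underscores only" is an assumption.

```python
import re

# Assumption: any character other than a letter, digit, or underscore
# counts as a "special character" in the species part of the file name.
_SPECIAL = re.compile(r"[^A-Za-z0-9_]")


def suggest_name(filename):
    """Return a compliant file name, or None if the extension is unsupported.

    A name that already follows the rules is returned unchanged.
    """
    for ext in (".fa.gz", ".fa"):  # check the longer extension first
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            if not stem:
                return None  # nothing left to use as a species name
            return _SPECIAL.sub("_", stem) + ext
    return None


if __name__ == "__main__":
    for name in ("Aardvark.fa", "Aardvark.1.fa", "Aardvark.fa.gz", "Aardvark.fasta"):
        print(name, "->", suggest_name(name))
```

A file whose suggested name differs from its current name needs renaming, and a `None` result flags a file that is not in `.fa` or `.fa.gz` format.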