Rust toolset for working with readfish data/outputs
First, ensure you have Rust installed. You can install Rust using rustup.
# Clone the repository
git clone https://github.com/LooseLab/rftools.git
cd rftools
# Build the project
cargo build -r
# Run tests
cargo test
This will create an executable, located at target/release/rftools. This can be moved to a directory on your PATH.
The Rust documentation can be built using the cargo doc command. This command generates HTML documentation for this crate and all of its dependencies in the target/doc directory.
# Generate documentation
cargo doc
Once the documentation is generated, you can open it in your web browser by navigating to the generated HTML files. The entry point to the crate's documentation is target/doc/{crate_name}/index.html.
Alternatively, you can use the --open flag with cargo doc to automatically open the documentation in your default web browser:
# Generate documentation and open it in the browser
cargo doc --open
This will build the documentation and open it in your default web browser automatically.
Here's how you can use the project:
rftools --help
rftools 0.1.0
Helper tools for after running readfish
USAGE:
rftools <SUBCOMMAND>
OPTIONS:
-h, --help Print help information
-V, --version Print version information
SUBCOMMANDS:
help Print this message or the help of the given subcommand(s)
split-bam Split BAM files into sequenced and unblocked
split-fq Split FASTQ into sequenced and unblocked
split-ss Split Sequenecing summary into sequenced and unblocked
Warning
These commands will read the whole of unblocked_read_ids.txt into memory!
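To make the memory cost concrete, here is a minimal sketch of loading read IDs into a set. It is not the rftools implementation: the function name load_unblocked_ids is hypothetical, and the sketch assumes the file holds one read ID per line.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead};

/// Collect one read ID per line into a set, skipping blank lines.
/// Hypothetical helper: assumes one ID per line in the input.
/// The whole set is held in memory, so usage grows with the number of IDs.
fn load_unblocked_ids<R: BufRead>(reader: R) -> io::Result<HashSet<String>> {
    let mut ids = HashSet::new();
    for line in reader.lines() {
        let line = line?;
        let id = line.trim();
        if !id.is_empty() {
            ids.insert(id.to_string());
        }
    }
    Ok(ids)
}

fn main() -> io::Result<()> {
    // In-memory stand-in for unblocked_read_ids.txt.
    let data = "read-1\nread-2\n\nread-3\n";
    let ids = load_unblocked_ids(data.as_bytes())?;
    println!("{} unblocked read IDs loaded", ids.len());
    Ok(())
}
```

A set gives constant-time membership checks per read, at the price of keeping every ID resident for the duration of the run.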
Takes in an unblocked_read_ids.txt file and one or more FASTQ files, then splits the reads into sequenced and, optionally, unblocked output.
rftools split-fq --help
rftools-split-fq 0.1.0
Split FASTQ into sequenced and unblocked
USAGE:
rftools split-fq [OPTIONS] <UNBLOCKED_READ_IDS> [INPUT_FASTQ]...
ARGS:
<UNBLOCKED_READ_IDS> Unblocked read ids from readfish
<INPUT_FASTQ>... Input FASTQ files from MinKNOW
OPTIONS:
-a, --write-unblocked Write rejected reads as well (default is false)
-h, --help Print help information
-p, --prefix <PREFIX> Output file prefix [default: ]
-V, --version Print version information
Example:
# Splitting FASTQ files, writing out both sequenced and unblocked
rftools split-fq --write-unblocked --prefix example_split unblocked_read_ids.txt input.fq
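The idea behind the split can be sketched as follows. This is an illustration under stated assumptions, not the code rftools runs: read_id and split_fastq are hypothetical names, records are assumed to be exactly four lines each, and the read ID is assumed to be the header text after '@' up to the first whitespace.

```rust
use std::collections::HashSet;

/// Extract the read ID from a FASTQ header: text after '@' up to the first whitespace.
fn read_id(header: &str) -> Option<&str> {
    header.strip_prefix('@')?.split_whitespace().next()
}

/// Partition 4-line FASTQ records into (sequenced, unblocked) text by ID membership.
/// Records whose header cannot be parsed are kept with the sequenced reads.
fn split_fastq(fastq: &str, unblocked: &HashSet<String>) -> (String, String) {
    let lines: Vec<&str> = fastq.lines().collect();
    let mut sequenced = String::new();
    let mut unblocked_out = String::new();
    for record in lines.chunks(4) {
        if record.len() < 4 {
            continue; // skip a truncated trailing record
        }
        let target = match read_id(record[0]) {
            Some(id) if unblocked.contains(id) => &mut unblocked_out,
            _ => &mut sequenced,
        };
        for line in record {
            target.push_str(line);
            target.push('\n');
        }
    }
    (sequenced, unblocked_out)
}

fn main() {
    let fastq = "@r1 ch=1\nACGT\n+\nIIII\n@r2 ch=2\nTTTT\n+\nIIII\n";
    let mut unblocked = HashSet::new();
    unblocked.insert("r2".to_string());
    let (sequenced, rejected) = split_fastq(fastq, &unblocked);
    print!("sequenced:\n{}unblocked:\n{}", sequenced, rejected);
}
```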
rftools split-ss --help
rftools-split-ss 0.1.0
Split Sequenecing summary into sequenced and unblocked
USAGE:
rftools split-ss [OPTIONS] --unblocked-read-ids <UNBLOCKED_READ_IDS> --sequencing-summary <SEQUENCING_SUMMARY>
OPTIONS:
-h, --help Print help information
-p, --prefix <PREFIX> Output file prefix [default: ]
-s, --sequencing-summary <SEQUENCING_SUMMARY> sequencing_summary.txt file from MinKNOW
-u, --unblocked-read-ids <UNBLOCKED_READ_IDS> Unblocked read ids from readfish
-V, --version Print version information
Example:
# Splitting Sequencing Summary file, only writing out sequenced
rftools split-ss --prefix output --unblocked-read-ids unblocked_read_ids.txt --sequencing-summary sequencing_summary.txt
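As a rough illustration of what splitting a sequencing summary involves, the sketch below partitions tab-separated rows by read ID. It is not taken from rftools: split_summary is a hypothetical name, and the sketch assumes a header row containing a column named read_id.

```rust
use std::collections::HashSet;

/// Split a tab-separated summary into (sequenced, unblocked) tables.
/// Assumes the first line is a header with a `read_id` column; returns None otherwise.
/// The header is repeated at the top of both outputs.
fn split_summary(summary: &str, unblocked: &HashSet<String>) -> Option<(String, String)> {
    let mut lines = summary.lines();
    let header = lines.next()?;
    let id_col = header.split('\t').position(|name| name == "read_id")?;
    let mut sequenced = format!("{}\n", header);
    let mut unblocked_out = format!("{}\n", header);
    for line in lines {
        let is_unblocked = line
            .split('\t')
            .nth(id_col)
            .map_or(false, |id| unblocked.contains(id));
        let target = if is_unblocked { &mut unblocked_out } else { &mut sequenced };
        target.push_str(line);
        target.push('\n');
    }
    Some((sequenced, unblocked_out))
}

fn main() {
    let summary = "channel\tread_id\tduration\n1\tr1\t2.5\n2\tr2\t0.4\n";
    let mut unblocked = HashSet::new();
    unblocked.insert("r2".to_string());
    if let Some((sequenced, rejected)) = split_summary(summary, &unblocked) {
        print!("sequenced:\n{}unblocked:\n{}", sequenced, rejected);
    }
}
```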
rftools split-bam --help
rftools-split-bam 0.1.0
Split BAM files into sequenced and unblocked
USAGE:
rftools split-bam [OPTIONS] --unblocked-read-ids <UNBLOCKED_READ_IDS> --bam-file <BAM_FILE>
OPTIONS:
-b, --bam-file <BAM_FILE>
Bam file containing reads to be split
-c, --compression <COMPRESSION>
Compression type for FASTX output [default: gzipped] [possible values: gzipped,
bgzipped, uncompressed]
--emit-type <EMIT_TYPE>
Write out FASTQ rather than a BAM [default: bam] [possible values: bam, fastq, fasta]
-h, --help
Print help information
-l, --length-thresh <LENGTH_THRESH>
minimum length threshold - If set reads shorter than this threshold will be filtered
out. [default: 0]
-p, --prefix <PREFIX>
Output file prefix [default: ]
-q, --qual-thresh <QUAL_THRESH>
Average read quality threshold. If set, reads below this threshold will be filtered out
-s, --split-type <SPLIT_TYPE>
Write only sequenced reads, unblocked reads, or both. Default is sequenced only
[default: sequenced-only] [possible values: all, unblocked-only, sequenced-only]
-u, --unblocked-read-ids <UNBLOCKED_READ_IDS>
Unblocked read ids file from readfish
-V, --version
Print version information
Examples:
# Splitting BAM file, writing out both unblocked and sequenced records into separate sequenced and unblocked BAM files, filtering out reads with an average quality below 20.
rftools split-bam --unblocked-read-ids ids.txt --bam-file input.bam --split-type all --qual-thresh 20
# Splitting BAM file, writing out only unblocked records into a FASTQ file.
rftools split-bam --unblocked-read-ids ids.txt --bam-file input.bam --split-type unblocked-only --emit-type fastq
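The --qual-thresh and --length-thresh filters can be pictured with the sketch below. How rftools averages quality is not stated here, so this is an assumption: the sketch uses the common convention of averaging per-base error probabilities and converting back to a Phred score, which gives a lower value than a plain arithmetic mean of the scores. The names mean_phred and keep_read are hypothetical.

```rust
/// Mean Phred quality computed via error probabilities (assumed convention).
/// `quals` holds raw Phred scores, not ASCII-encoded characters.
fn mean_phred(quals: &[u8]) -> Option<f64> {
    if quals.is_empty() {
        return None;
    }
    let total_err: f64 = quals
        .iter()
        .map(|&q| 10f64.powf(-(q as f64) / 10.0))
        .sum();
    Some(-10.0 * (total_err / quals.len() as f64).log10())
}

/// Keep a read if it is at least `length_thresh` long and its mean quality
/// is at least `qual_thresh`. Empty reads are dropped.
fn keep_read(quals: &[u8], qual_thresh: f64, length_thresh: usize) -> bool {
    if quals.len() < length_thresh {
        return false;
    }
    match mean_phred(quals) {
        Some(mean) => mean >= qual_thresh,
        None => false,
    }
}

fn main() {
    let quals = [10u8, 30];
    if let Some(mean) = mean_phred(&quals) {
        println!("mean quality: {:.2}", mean);
    }
    println!("kept at Q20: {}", keep_read(&quals, 20.0, 0));
}
```

Under this convention a single low-quality base pulls the average down sharply, so a read with scores 10 and 30 falls well below Q20 even though the arithmetic mean of the two scores is 20.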
By default, all FASTX output is gzipped at the system's default compression level. This can be disabled by setting -c uncompressed.
We welcome contributions! If you'd like to contribute to this project, please follow these guidelines:
- Fork the repository.
- Create a new branch (git checkout -b feature-or-fix).
- Make your changes.
- Ensure tests pass (cargo test).
- Commit your changes (git commit -am 'Add new feature').
- Push to the branch (git push origin feature-or-fix).
- Create a new Pull Request.
Please make sure to update tests and documentation as appropriate.
This project is licensed under the MIT License - see the LICENSE file for details.