Rust implementation of NanoFilt and NanoLyse, both originally written in Python. This tool, intended for long-read sequencing data such as PacBio or ONT, filters and trims a FASTQ file.
Reads are filtered on average read quality and on minimum and maximum read length. A headcrop (start of read) and tailcrop (end of read) can be applied to the reads that pass the filter before they are printed.
The aim is to deliver the same results as the Python implementation, with almost the same functionality, at much faster execution times. At the moment this tool supports neither filtering using a sequencing_summary file nor filtering on GC content. If those features are of interest, please reach out.
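The filtering and trimming steps described above can be illustrated with a minimal sketch. This is not the tool's actual source: the names `Filter`, `average_quality` and `filter_and_trim` are hypothetical, and the sketch assumes Phred+33 encoded qualities, a mean quality computed by averaging error probabilities rather than raw Phred scores, and length limits applied to the untrimmed read.

```rust
/// Hypothetical filter settings, mirroring the command-line options.
struct Filter {
    min_qual: f64,
    min_len: usize,
    max_len: usize,
    headcrop: usize,
    tailcrop: usize,
}

/// Mean Phred quality of a read from its Phred+33 quality string.
/// Assumption: scores are converted to error probabilities, averaged,
/// and converted back, instead of averaging the Phred values directly.
fn average_quality(qual: &[u8]) -> f64 {
    if qual.is_empty() {
        return 0.0;
    }
    let sum: f64 = qual
        .iter()
        .map(|&q| 10f64.powf(-(f64::from(q) - 33.0) / 10.0))
        .sum();
    -10.0 * (sum / qual.len() as f64).log10()
}

/// Returns the trimmed (sequence, quality) pair if the read passes the
/// filter, or None if it is rejected.
/// Assumption: length limits are checked on the untrimmed read.
fn filter_and_trim<'a>(
    seq: &'a [u8],
    qual: &'a [u8],
    f: &Filter,
) -> Option<(&'a [u8], &'a [u8])> {
    let len = seq.len();
    if len != qual.len() || len < f.min_len || len > f.max_len {
        return None;
    }
    if average_quality(qual) < f.min_qual {
        return None;
    }
    // Reject reads that would be empty after cropping.
    if f.headcrop + f.tailcrop >= len {
        return None;
    }
    let end = len - f.tailcrop;
    Some((&seq[f.headcrop..end], &qual[f.headcrop..end]))
}
```

Averaging error probabilities means that a few low-quality bases pull the mean down more than a plain average of Phred scores would, so a read with qualities Q10 and Q20 scores below Q15 in this sketch.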
Most users should download a ready-to-use binary for their system from the releases and place it in a directory on their $PATH.
You may have to change the file permissions to make it executable: chmod +x chopper
Alternatively, install with conda:
conda install -c bioconda chopper
Reads from stdin and writes to stdout.
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
OPTIONS:
--headcrop Trim N nucleotides from the start of a read [default: 0]
--maxlength Sets a maximum read length [default: 2147483647]
-l, --minlength Sets a minimum read length [default: 1]
-q, --quality Sets a minimum Phred average quality score [default: 0]
--tailcrop Trim N nucleotides from the end of a read [default: 0]
--threads Number of parallel threads to use [default: 4]
--contam FASTA file with reference to check potential contaminants against [default: None]
EXAMPLE:
gunzip -c reads.fastq.gz | chopper -q 10 -l 500 | gzip > filtered_reads.fastq.gz
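The pipeline above works because records are read from one stream and written to another, so the tool can sit between gunzip and gzip. The following is a minimal sketch of that streaming pattern, not the tool's actual parser: the function names `stream_fastq` and `next_line` are hypothetical, records are assumed to span exactly four lines, and only a minimum length filter is applied.

```rust
use std::io::{self, BufRead, Write};

/// Reads the next line, or fails if the record is cut short.
fn next_line<R: BufRead>(lines: &mut io::Lines<R>) -> io::Result<String> {
    lines.next().unwrap_or_else(|| {
        Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "truncated FASTQ record",
        ))
    })
}

/// Copies FASTQ records from `input` to `output`, keeping only reads
/// whose sequence is at least `min_len` long. Returns the number kept.
/// Assumption: each record is exactly four lines
/// (header, sequence, separator, quality).
fn stream_fastq<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    min_len: usize,
) -> io::Result<usize> {
    let mut lines = input.lines();
    let mut kept = 0;
    while let Some(header) = lines.next() {
        let header = header?;
        if header.is_empty() {
            continue; // tolerate blank lines between records
        }
        let seq = next_line(&mut lines)?;
        let plus = next_line(&mut lines)?;
        let qual = next_line(&mut lines)?;
        if seq.len() >= min_len {
            writeln!(output, "{}\n{}\n{}\n{}", header, seq, plus, qual)?;
            kept += 1;
        }
    }
    Ok(kept)
}
```

In a real program, `input` would be a locked `std::io::stdin()` and `output` a buffered `std::io::stdout()`; taking generic reader and writer types keeps the function easy to test with in-memory buffers.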
If you use this tool, please consider citing our publication.