Skip to content

Latest commit

 

History

History
88 lines (68 loc) · 2.24 KB

README.md

File metadata and controls

88 lines (68 loc) · 2.24 KB

HECIL Workflow Example

This workflow gives an example of using Makeflow to parallelize the Hybrid Error Correction with Iterative Learning (HECIL) tool.

Citation: "HECIL: A Hybrid Error Correction Algorithm for Long Reads with Iterative Learning", Olivia Choudhury, Ankush Chakrabarty, and Scott Emrich bioRxiv preprint, 2017. https://doi.org/10.1101/162917

The conversion of HECIL into a workflow was accomplished by Connor Howington as part of a summer REU project at Notre Dame.

Installation and Use

Before beginning, make sure you have already cloned the makeflow-examples repository and are in the hecil directory:

git clone https://github.com/cooperative-computing-lab/makeflow-examples
cd makeflow-examples/hecil

First, build the bwa binary for your architecture:

git clone https://github.com/lh3/bwa bwa-src
cd bwa-src
make
cp bwa ..
cd ..

git clone https://github.com/samtools/htslib htslib --branch 1.15.1
cd htslib
git submodule update --init --recursive
autoreconf
make
cd ..

git clone https://github.com/samtools/samtools samtools-src --branch 1.15.1
cd samtools-src
autoreconf
make
cp samtools ..
cd ..

If you do not have real data to work with, then generate some simulated data, which will result in a ~15 minute workflow on a single machine:

./fastq_generate.pl 100000 1000 > ref.fastq
./fastq_generate.pl 100000 100 ref.fastq > query.fastq

The long read file needs to be in fasta format, so you'll need to convert it:

./convert_fastq.py ref.fastq > ref.fasta

Then, generate a workflow to process the data:

./make_hecil_workflow -l ref.fasta -s query.fastq -len 100 -p 100 -ps 2 -rs 1000

Finally, execute the workflow using makeflow locally or using a batch system like Condor, SGE, or Work Queue:

makeflow hecil.mf
makeflow -T condor hecil.mf
makeflow -T sge hecil.mf
makeflow -T wq hecil.mf

Alternatively, it can be run using the JX or JSON representation

makeflow --jx hecil.jx
makeflow --json hecil.json

corr.out (default) contains only the corrected long reads. Corrected_ref.fasta contains all reads, with the corrected reads replacing the old reads (order is not conserved from input fasta file).