Home
Jessica Mattick edited this page Apr 30, 2020
Welcome to the RNA_Editing_Detection_Pipeline wiki!
- Create a tab-delimited file containing the URLs of all required reference data, keeping the first column identical to the example.
Example reference_data.txt:
genome ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz
genome_annotation ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/GRCh37_mapping/gencode.v30lift37.annotation.gtf.gz
strand_detection https://sourceforge.net/projects/rseqc/files/BED/Human_Homo_sapiens/hg19_RefSeq.bed.gz
rmsk http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/rmsk.txt.gz
dbSNP http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/snp151.txt.gz
rediportal_db http://srv00.recas.ba.infn.it/webshare/rediportalDownload/table1_full.txt.gz
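Before starting a long download, it can help to check that the file really is tab-delimited and that no first-column key is missing. The following is a minimal sketch, not part of the pipeline: `read_reference_table` and `missing_keys` are hypothetical helper names, and the key list is simply copied from the example above.

```python
import csv
import io

# First-column keys copied from the example reference_data.txt above.
EXPECTED_KEYS = [
    "genome",
    "genome_annotation",
    "strand_detection",
    "rmsk",
    "dbSNP",
    "rediportal_db",
]


def read_reference_table(handle):
    """Return {key: url} from a tab-delimited file handle (hypothetical helper)."""
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated columns, got {len(row)}: {row}")
        key, url = (field.strip() for field in row)
        table[key] = url
    return table


def missing_keys(table):
    """List the expected keys that are absent from the parsed table."""
    return [key for key in EXPECTED_KEYS if key not in table]


if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    sample = "genome\tftp://example.org/genome.fa.gz\nrmsk\thttp://example.org/rmsk.txt.gz\n"
    print(missing_keys(read_reference_table(io.StringIO(sample))))
```

A line separated by spaces instead of a tab raises a `ValueError` here, which is the most likely mistake when the example is copied from a web page.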
- Run get_ref_data_annotation.py to download all required data into the specified directory and generate annotation files.
Parameters:
  - -i or --input: path to tab-delimited file containing data URLs
  - -o or --output: path to output directory
Example:
nohup python3 get_ref_data_annotation.py -i reference_data.txt -o output_path &
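To preview which files should appear in the output directory, each URL can be mapped to a local path. This is a sketch under the assumption that downloads keep the file name from the URL; the page does not say how get_ref_data_annotation.py names its files, and `local_target` is a hypothetical helper.

```python
import os
from urllib.parse import urlparse


def local_target(url, output_dir):
    """Hypothetical helper: path a URL's file would have inside output_dir."""
    filename = os.path.basename(urlparse(url).path)
    if not filename:
        raise ValueError(f"URL has no file name component: {url}")
    return os.path.join(output_dir, filename)


if __name__ == "__main__":
    url = "http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/rmsk.txt.gz"
    print(local_target(url, "output_path"))
```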
Some reference data may need to be reformatted, following the instructions in Box 7 of Lo Giudice et al. This only needs to be done once per genome release. Formatted reference data is provided in the test dataset.
- Run index_genome_STAR.py to index the genome for STAR.
Parameters:
  - -f or --fasta: path to genome fasta file
  - -a or --gtf_annotation: path to genome gtf annotation
  - -o or --output: path to output directory
Example:
nohup python3 index_genome_STAR.py -f genome.fa -a annotation.gtf -o index_output/ &
- Create a txt file containing a list of SRA accession numbers.
- Run get_SRA_data.py to download the data.
Parameters:
  - -a or --acc_list: path to file containing list of SRA accession numbers
  - -o or --output: path to output directory
Example:
nohup python3 get_SRA_data.py -a acc.txt -o output_path &
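A malformed accession list is easy to catch before the download starts. The sketch below assumes one accession per line, which this page does not state explicitly, and checks each entry against the usual run accession pattern (SRR, ERR, or DRR followed by digits). `read_accessions` is a hypothetical helper, not part of get_SRA_data.py.

```python
import re

# Run accessions: SRR (NCBI), ERR (ENA), or DRR (DDBJ) followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def read_accessions(lines):
    """Split non-blank lines into (valid, rejected) accession lists."""
    valid, rejected = [], []
    for line in lines:
        token = line.strip()
        if not token:
            continue  # ignore blank lines
        (valid if RUN_ACCESSION.match(token) else rejected).append(token)
    return valid, rejected


if __name__ == "__main__":
    # Made-up accession strings, for illustration only.
    valid, rejected = read_accessions(["SRR0000001\n", "\n", "sample_7\n"])
    print(valid, rejected)
```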
- Run fastqc.py to quality check the sequencing reads.
Parameters:
  - -se or --single_end: include at the beginning of the parameters if the data is single-end
  - -f or --fastq_dir: path to fastq directory
  - -o or --output: path to output directory
Example:
PE data
nohup python3 fastqc.py -f fastq_dir -o output_dir &
SE data
nohup python3 fastqc.py -se -f fastq_dir -o output_dir &
- Run fastp.py to trim the RNA-seq reads.
Parameters:
  - -se or --single_end: include at the beginning of the parameters if the data is single-end
  - -f or --fastq_dir: path to fastq directory
  - -o or --output: path to output directory
Example:
PE data
nohup python3 fastp.py -f fastq_dir -o output_dir &
SE data
nohup python3 fastp.py -se -f fastq_dir -o output_dir &
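The option layout shared by fastqc.py and fastp.py can be sketched with `argparse`. This is an illustration of the documented flags only; it assumes the scripts treat `-se` as an on/off switch, and it says nothing about how the real scripts parse their arguments or whether they require `-se` to come first.

```python
import argparse


def build_parser():
    """Sketch of a parser exposing the flags documented above."""
    parser = argparse.ArgumentParser(description="Sketch of the fastqc.py / fastp.py options")
    # Assumption: -se is a boolean switch that takes no value.
    parser.add_argument("-se", "--single_end", action="store_true",
                        help="set if the data is single-end")
    parser.add_argument("-f", "--fastq_dir", required=True,
                        help="path to fastq directory")
    parser.add_argument("-o", "--output", required=True,
                        help="path to output directory")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-se", "-f", "fastq_dir", "-o", "output_dir"])
    print(args.single_end, args.fastq_dir, args.output)
```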
- Make sure the genome has been indexed for STAR.
- Run align_STAR.py to align paired-end data to the genome.
Parameters:
  - -f or --fastq_dir: path to directory containing fastq files
  - -g or --genome_idx: path to STAR genome index
  - -o or --output: path to output directory
Example:
nohup python3 align_STAR.py -f fastq_dir -g genome_index -o output_dir &
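One way to confirm that the genome has been indexed before launching the alignment is to look for the files STAR writes into its index directory. The sketch below checks only a small core subset (`Genome`, `SA`, `SAindex`, `genomeParameters.txt`); `missing_index_files` is a hypothetical helper, and a complete index contains further files.

```python
import os
import tempfile

# Core subset of the files STAR writes when generating a genome index.
CORE_INDEX_FILES = ("Genome", "SA", "SAindex", "genomeParameters.txt")


def missing_index_files(index_dir):
    """Return the core index files that are not present in index_dir."""
    return [
        name for name in CORE_INDEX_FILES
        if not os.path.isfile(os.path.join(index_dir, name))
    ]


if __name__ == "__main__":
    # Demonstrate on an empty temporary directory: every file is reported missing.
    with tempfile.TemporaryDirectory() as empty_dir:
        print(missing_index_files(empty_dir))
```

An empty result suggests the index directory is usable; a non-empty one means index_genome_STAR.py should be run (or rerun) first.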
- Run infer_strand_direction.py to infer the strand direction of the aligned reads.
Parameters:
  - -d or --bam_dir: path to directory containing bams
  - -r or --ref_seq_bed: path to refseq bed file
Example:
nohup python3 infer_strand_direction.py -d bam_dir -r ref_seq_bed &
- Create a text file containing a list of ERR accession numbers.
- Run get_WGS_data.py to download the data.
Parameters:
  - -a or --acc_list: path to file containing a list of ERR accession numbers
  - -o or --output: path to output directory
Example:
nohup python3 get_WGS_data.py -a acc.txt -o output_path &
- Run index_genome_bwa.py to index the genome for BWA.
Parameters:
  - -f or --fasta_dir: path to genome fasta file
Example:
nohup python3 index_genome_bwa.py -f fasta_dir &
- Run align_bwa.py to align paired-end data to the genome.
Parameters:
  - -fq or --fastq_dir: path to directory containing fastq files
  - -fa or --fasta_dir: path to directory containing genome fasta file
Example:
nohup python3 align_bwa.py -fq fastq_dir -fa fasta_dir &
- Run select_map_chr.py to select and map reads to a specific chromosome.
Parameters:
  - -g or --genome_dir: path to directory containing the genome .fai file
  - -f or --fastq_dir: path to directory containing the WGS fastq and sam files
  - -o or --output_dir: path to directory in which to store the output files
  - -chr or --chrNum: the chromosome to select, written as 'chr[Int]' (e.g. -chr chr21)
Example:
nohup python3 select_map_chr.py -g genome_dir -f fastq_dir -o output_dir -chr chrNum &
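The value passed to `-chr` can be checked against both the 'chr[Int]' form and the genome's .fai index before the script is run. This is a minimal sketch that relies only on the standard .fai layout, where the first two tab-separated columns are the sequence name and its length. `parse_fai` and `check_chromosome` are hypothetical helpers, and the sample index line uses toy values.

```python
import re

# 'chr' followed by an integer, as described for the -chr parameter.
CHR_NAME = re.compile(r"^chr\d+$")


def parse_fai(lines):
    """Map sequence name -> length from the first two .fai columns."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def check_chromosome(chr_name, fai_lengths):
    """Return the chromosome length, or raise if the name is unusable."""
    if not CHR_NAME.match(chr_name):
        raise ValueError(f"expected a name like 'chr21', got {chr_name!r}")
    if chr_name not in fai_lengths:
        raise KeyError(f"{chr_name} is not listed in the .fai index")
    return fai_lengths[chr_name]


if __name__ == "__main__":
    # Toy .fai line: name, length, offset, bases per line, bytes per line.
    fai = parse_fai(["chr21\t1000\t7\t60\t61\n"])
    print(check_chromosome("chr21", fai))
```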