Pipeline for Automated Read ANalysis Of iCLIP Data
PARANOiD is a versatile software for the fully automated analysis of iCLIP and iCLIP2 data. It contains all steps necessary for preprocessing and for determining cross-link locations, plus several optional steps that detect specific characteristics, e.g. defined distances between cross-link events or binding motifs. The cross-link sites are reported as WIG files that can be easily visualized, e.g. with IGV, for which a config file is provided. Additionally, results are provided as statistical plots for a quick overview and in standard bioinformatics file formats or TSV files that can be used in further analysis steps.
Basic usage
Inputs
Parameters
Additional analyses
Outputs
nextflow PARANOiD.nf --reads <reads.fastq> --reference <reference_sequence.fasta> --barcodes <barcodes.tsv>
Reads generated by iCLIP experiments. Can be provided as one or more files. To provide more than one file, use a glob pattern (wildcards such as * or {1,2}) enclosed in quotation marks.
Format: FASTQ
--reads reads_file.fastq
--reads "reads_{1,2}.fastq"
--reads "*.fastq"
File containing the reference to which the reads will be mapped.
Format: FASTA
--reference reference_file.fasta
Barcode sequences are used to assign reads to their experiment. The file is provided as a TSV (tab-separated values) file.
The first column contains the experiment name and the second the nucleotide sequence of that experiment's barcode.
One experiment is described per line, and the columns are separated by a tab.
The experiment name should follow this scheme:
<experiment_name>_rep_<replicate-number>
Example:
experiment1_rep_1 GCATTG
experiment1_rep_2 CAGTAA
experiment1_rep_3 GGCCTA
experiment2_rep_1 AATCCG
experiment2_rep_2 CCGTTA
experiment2_rep_3 GTCATT
--barcodes barcode_file.tsv
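The barcode file layout described above can be checked before starting a run. The following is a minimal sketch, not part of PARANOiD itself: the function name `parse_barcodes` and the checks it performs (two columns, a name ending in `_rep_<number>`, only A/C/G/T in the barcode, no duplicate barcodes) are assumptions derived from the description in this section.

```python
import csv
import io
import re

# Naming rule from this README: <experiment_name>_rep_<replicate-number>
NAME_RE = re.compile(r"^(?P<experiment>.+)_rep_(?P<replicate>\d+)$")


def parse_barcodes(handle):
    """Return {barcode: (experiment, replicate)} read from a two-column TSV handle.

    Hypothetical helper for illustration; PARANOiD may validate differently.
    """
    barcodes = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip empty lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated columns, got {len(row)}")
        name = row[0].strip()
        sequence = row[1].strip().upper()
        match = NAME_RE.match(name)
        if match is None:
            raise ValueError(f"line {line_no}: name {name!r} does not end in _rep_<number>")
        if not sequence or set(sequence) - set("ACGT"):
            raise ValueError(f"line {line_no}: barcode {sequence!r} is not a plain nucleotide sequence")
        if sequence in barcodes:
            raise ValueError(f"line {line_no}: barcode {sequence!r} is used twice")
        barcodes[sequence] = (match["experiment"], int(match["replicate"]))
    return barcodes


# Two lines taken from the example above, with a real tab between the columns.
example = "experiment1_rep_1\tGCATTG\nexperiment1_rep_2\tCAGTAA\n"
parsed = parse_barcodes(io.StringIO(example))
print(parsed)
```

For a real file, pass an open file handle instead of the `io.StringIO` object, e.g. `parse_barcodes(open("barcode_file.tsv", newline=""))`.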
File containing annotations of the provided reference. Advised when working with splicing-capable organisms. Necessary for RNA subtype analysis.
Formats: GFF GTF
--annotation annotation_file.gff
A string that adapts the pipeline to other barcode patterns (default is iCLIP2). N represents a position of the random barcode and X represents a position of the experimental barcode.
Default: NNNNNXXXXXXNNNN
Example for iCLIP:
--barcode_pattern NNXXXXNNN
Default:
--barcode_pattern NNNNNXXXXXXNNNN
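To make the meaning of N and X concrete, here is a small sketch that splits the start of a read according to a barcode pattern. It assumes that the pattern is applied to the first bases of the read; the function name `split_barcode` and the example read are invented for illustration and do not come from PARANOiD.

```python
def split_barcode(read_prefix, pattern="NNNNNXXXXXXNNNN"):
    """Split the first len(pattern) bases into (random_barcode, experimental_barcode).

    Assumption: the pattern describes the start of the read, position by position.
    """
    if not pattern or set(pattern) - {"N", "X"}:
        raise ValueError("pattern may only contain the letters N and X")
    if len(read_prefix) < len(pattern):
        raise ValueError("read is shorter than the barcode pattern")
    # zip stops at the end of the pattern, so bases after the barcode are ignored.
    random_part = "".join(base for base, kind in zip(read_prefix, pattern) if kind == "N")
    experimental_part = "".join(base for base, kind in zip(read_prefix, pattern) if kind == "X")
    return random_part, experimental_part


# Made-up read: 5 random bases, 6 experimental bases, 4 random bases, then insert.
read = "ACGTA" + "GCATTG" + "TTCA" + "GGGG"
random_barcode, experimental_barcode = split_barcode(read)
print(random_barcode, experimental_barcode)
```

The same function accepts the iCLIP pattern shown above by passing `pattern="NNXXXXNNN"`.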
Selects the mapping tool. Use eu to enable a splicing-capable mapping tool (STAR) when necessary.
Options:
pro -> Bowtie2 for splicing-incapable organisms or spliced transcripts
eu -> STAR for splicing-capable organisms
Default: pro
--domain eu
Default:
--domain pro
Path to the output directory. Allows outputs to be saved to another location.
Default: ./output
--output /path/to/output
Default:
--output ./output
Minimum length for reads to retain after adapter trimming. All reads that are cut shorter during this step are removed.
Default: 30
--min_length 30
Minimum quality of bases necessary to retain them. Bases below that quality are cut off. Furthermore, reads with a certain percentage of bases below that quality are removed completely (see --min_percent_qual_filter). The value is a Phred score:
Quality score | Error | Accuracy |
---|---|---|
10 | 10% | 90% |
20 | 1% | 99% |
30 | 0.1% | 99.9% |
40 | 0.01% | 99.99% |
Default: 20
--min_qual 20
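The values in the table above follow from the standard Phred definition, error probability = 10^(-Q/10). The sketch below reproduces them and also decodes a FASTQ quality character; the Phred+33 offset is an assumption about the input encoding, and both function names are made up for this example.

```python
def phred_to_error(quality):
    """Convert a Phred quality score into the probability of a wrong base call."""
    return 10 ** (-quality / 10)


def decode_quality_char(char, offset=33):
    """Decode one FASTQ quality character (assumes Phred+33 encoding)."""
    return ord(char) - offset


for quality in (10, 20, 30, 40):
    error = phred_to_error(quality)
    print(f"Q{quality}: error {error:.4%}, accuracy {1 - error:.4%}")
```

With the default of 20, a base is kept only if its estimated error probability is at most 1%.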
Minimum percent of bases above the stated quality (see --min_qual) necessary to retain a read after quality filtering.
Default: 90
--min_percent_qual_filter 90
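How --min_qual and --min_percent_qual_filter interact can be sketched as follows. This is an illustration of the rule described above, not the code the pipeline runs; in particular, treating a base with exactly the minimum quality as passing is an assumption, and `passes_quality_filter` is a hypothetical name.

```python
def passes_quality_filter(qualities, min_qual=20, min_percent=90):
    """Return True if enough bases of a read reach the minimum quality.

    qualities: list of Phred scores, one per base.
    Assumption: a base with quality equal to min_qual counts as good.
    """
    if not qualities:
        return False  # an empty read has nothing to retain
    good_bases = sum(1 for quality in qualities if quality >= min_qual)
    return 100 * good_bases / len(qualities) >= min_percent


# 9 of 10 bases are good (90%): the read is kept with the default settings.
kept = passes_quality_filter([30] * 9 + [5])
# 8 of 10 bases are good (80%): the read is removed.
removed = not passes_quality_filter([30] * 8 + [5, 5])
print(kept, removed)
```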
Number of mismatches allowed in the experimental barcode sequence when assigning reads to experiments. This makes it possible to still assign a read when a sequencing error occurs in its barcode sequence.
Default: 1
--barcode_mismatches 1
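Mismatch-tolerant assignment can be pictured as a Hamming distance comparison between the observed barcode and every known barcode. The sketch below uses the barcodes from the example file above; the function names are invented, and returning no assignment when several experiments match equally well is an assumption, since this section does not say how the pipeline resolves such ties.

```python
def hamming(a, b):
    """Count the positions at which two equally long sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_experiment(observed, barcodes, max_mismatches=1):
    """Return the experiment whose barcode matches within max_mismatches.

    Returns None if no barcode or more than one barcode matches
    (assumed behaviour, chosen to avoid ambiguous assignments).
    """
    hits = [name for name, sequence in barcodes.items()
            if hamming(observed, sequence) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


barcodes = {
    "experiment1_rep_1": "GCATTG",
    "experiment1_rep_2": "CAGTAA",
    "experiment1_rep_3": "GGCCTA",
    "experiment2_rep_1": "AATCCG",
    "experiment2_rep_2": "CCGTTA",
    "experiment2_rep_3": "GTCATT",
}

exact = assign_experiment("GCATTG", barcodes)       # perfect match
one_error = assign_experiment("GCATTA", barcodes)   # last base misread
unknown = assign_experiment("TTTTTT", barcodes)     # too far from every barcode
print(exact, one_error, unknown)
```

Barcode sets in which any two barcodes differ at several positions keep such assignments unambiguous when one mismatch is allowed.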
--mapq 2
--split_fastq_by 1000000
--map_to_transcripts
--number_top_transcripts 10
--mapq 2
--peak_calling
--merge_replicates
--rna_subtypes 3_prime_UTR,transcript,5_prime_UTR
--gene_id ID
--color_barplot "#69b3a2"
--peak_distance
Shared with sequence extraction
--percentile 90
--distance 50
--sequence_extraction
Shared with sequence extraction
--percentile 90
--seq_len 20
--sequence_format_txt
TODO: document streme parameters
params.max_motif_num = 50 // INT max number of motifs to search for
params.min_motif_width = 8 // INT minimum motif width to report, >=3
params.max_motif_width = 15 // INT maximum motif width to report, <= 30