Running the orthology search

Overview

Covariance models

Path to a directory that contains one covariance model for each reference miRNA
The file name of each model (without the ".cm" ending) has to match the identifier in the tab-separated miRNA input file exactly
Each covariance model file has to end in ".cm"
ncOrtho disregards all other files in the directory
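
Before starting a long run, it can help to confirm that every miRNA identifier has a matching model file. The following is a minimal sketch of such a check, not part of ncOrtho itself; the function name `missing_models` and its arguments are hypothetical.

```python
from pathlib import Path


def missing_models(models_dir, mirna_ids):
    """Return the miRNA ids that have no '<id>.cm' file in models_dir.

    Hypothetical helper, not part of ncOrtho.
    """
    # Collect the names of all ".cm" files without the ending.
    # Files with any other ending are ignored, as ncOrtho does.
    available = {
        path.stem
        for path in Path(models_dir).iterdir()
        if path.is_file() and path.suffix == ".cm"
    }
    # Keep the input order so the report is easy to compare with the table.
    return [mirna_id for mirna_id in mirna_ids if mirna_id not in available]
```

An empty list means that every identifier passed in has a model file with a matching name.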

Reference miRNAs

Information about all reference miRNAs has to be supplied in a single tab-separated file with 7 columns:

Unique miRNA id
Contig/chromosome id (has to match the id used in the reference GFF file)
Start
Stop
Strand (+ or -)
pre-miRNA sequence
Mature miRNA sequence (no features that use the mature sequence have been implemented yet, so this column can be filled with a placeholder such as "NA" or "None")

Example:

hsa-mir-552	NC_000001.11	34669599	34669694	-	AACCAUUCAAAUAUACCACAGUUUGUUUAACCUUUUGCCUGUUGGUUGAAGAUGCCUUUCAACAGGUGACUGGUUAGACAAACUGUGGUAUAUACA	NA
hsa-mir-30e	NC_000001.11	40754355	40754446	+	GGGCAGUCUUUGCUACUGUAAACAUCCUUGACUGGAAGCUGUAAGGUGUUCAGAGGAGCUUUCAGUCGGAUGUUUACAGCGGCAGGCUGCCA	NA
hsa-mir-30c-1	NC_000001.11	40757284	40757372	+	ACCAUGCUGUAGUGUGUGUAAACAUCCUACACUCUCAGCUGUGAGCUCAAGGUGGCUGGGAGAGGGUUGUUUACUCCUUCUGCCAUGGA	NA
hsa-mir-6733	NC_000001.11	43171652	43171712	-	GUGCUUGGGAAAGACAAACUCAGAGUUCCCUUCUUGUGAGCUCAGUGUCUGGAUUUCCUAG	NA

You can retrieve this information from popular databases like miRBase.
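
The seven-column layout can be checked with a short script before the file is passed to ncOrtho. This is a sketch under the assumption that the file has no header line; the function name `read_mirna_table` and the dictionary keys are hypothetical, and the checks only mirror the column descriptions in this section, not ncOrtho's own input validation.

```python
import csv

EXPECTED_COLUMNS = 7


def read_mirna_table(path):
    """Parse the tab-separated miRNA file into a list of dicts.

    Hypothetical helper, not part of ncOrtho. Raises ValueError on the
    first line that does not fit the seven-column layout.
    """
    rows = []
    seen_ids = set()
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_no, fields in enumerate(reader, start=1):
            if not fields:
                continue  # skip empty lines
            if len(fields) != EXPECTED_COLUMNS:
                raise ValueError(
                    f"line {line_no}: expected {EXPECTED_COLUMNS} columns, "
                    f"found {len(fields)}"
                )
            mirna_id, contig, start, stop, strand, pre_seq, mature_seq = fields
            if mirna_id in seen_ids:
                raise ValueError(f"line {line_no}: duplicate id {mirna_id}")
            if strand not in ("+", "-"):
                raise ValueError(f"line {line_no}: strand must be + or -")
            seen_ids.add(mirna_id)
            rows.append({
                "id": mirna_id,
                "contig": contig,
                "start": int(start),  # int() raises ValueError if not a number
                "stop": int(stop),
                "strand": strand,
                "pre_mirna": pre_seq,
                "mature": mature_seq,  # may be a placeholder such as "NA"
            })
    return rows
```

The returned ids can then be compared against the files in the covariance model directory.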

Fasta data

Genomic sequences of the reference and the query species in FASTA format (e.g. "genomic.fna" from RefSeq or "dna.toplevel.fa" from Ensembl)

Full help text

#########################################################
###                                                   ###
###   ncOrtho - ortholog search for non-coding RNAs   ###
###                                                   ###
#########################################################

usage: ncSearch [-h] -m <path> -n <path> -o <path> -q <.fa> -r <.fa> [--queryname [str]] [--cpu [int]] [--cm_cutoff [float]] [--minlength [float]] [--heuristic [True/False]]
                [--heur_blast_evalue [float]] [--heur_blast_length [float]] [--cleanup [True/False]] [--refblast [<path>]] [--queryblast [<path>]] [--maxcmhits [int]] [--dust [yes/no]]
                [--checkCoorthologsRef [True/False]]

Find orthologs of reference miRNAs in the genome of a query species.

Required Arguments:
  -m <path>, --models <path>
                        Path to directory containing covariance models (.cm)
  -n <path>, --ncrna <path>
                        Path to Tab separated file with information about the reference miRNAs
  -o <path>, --output <path>
                        Path to the output directory
  -q <.fa>, --query <.fa>
                        Path to query genome in FASTA format
  -r <.fa>, --reference <.fa>
                        Path to reference genome in FASTA format

Optional Arguments:
  --queryname [str]     Name for the output directory (RECOMMENDED)
  --cpu [int]           Number of CPU cores to use (Default: all available)
  --cm_cutoff [float]   CMsearch bit score cutoff, given as ratio of the CMsearch bitscore of the CM against the reference species (Default: 0.5)
  --minlength [float]   CMsearch hit in the query species must have at least the length of this value times the length of the reference pre-miRNA (Default: 0.7)
  --heuristic [True/False]
                        Perform a BLAST search of the reference miRNA in the query genome to identify candidate regions for the CMsearch. Majorly improves speed. (Default: True)
  --heur_blast_evalue [float]
                        Evalue filter for the BLASTn search that determines candidate regions for the CMsearch when running ncOrtho in heuristic mode. (Default: 0.5) (Set to 10 to turn off)
  --heur_blast_length [float]
                        Length cutoff for BLASTN search with which candidate regions for the CMsearch are identified. Cutoff is given as ratio of the reference pre-miRNA length (Default: 0.5) (Set to 0
                        to turn off)
  --cleanup [True/False]
                        Cleanup temporary files (Default: True)
  --refblast [<path>]   Path to BLASTdb of the reference species
  --queryblast [<path>]
                        Path to BLASTdb of the query species
  --maxcmhits [int]     Maximum number of cmsearch hits to examine. Decreases runtime significantly if reference miRNA in genomic repeat region. Set to empty variable to disable (i.e. --maxcmhits=None,
                        default)
  --dust [yes/no]       Use BLASTn dust filter during re-BLAST. Greatly decreases runtime if reference miRNA(s) are located in repeat regions. However ncOrtho will also not identify orthologs for these
                        miRNAs
  --checkCoorthologsRef [True/False]
                        If the re-blast does not identify the original reference miRNA sequence as best hit, ncOrtho will check whether the best blast hit is likely a co-ortholog of the reference miRNA
                        relative to the search taxon. NOTE: Setting this flag will substantially increase the sensitivity of HaMStR but most likely affect also the specificity, especially when the
                        search taxon is evolutionarily only very distantly related to the reference taxon (Default: False)
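
To illustrate how the required arguments fit together, the sketch below assembles an ncSearch call as an argument list. It uses only flags that appear in the help text above; the helper name `build_ncsearch_command` and all paths are hypothetical placeholders.

```python
import shlex


def build_ncsearch_command(models, ncrna, output, query, reference,
                           queryname=None, cpu=None):
    """Assemble an ncSearch call from the five required arguments.

    Hypothetical helper; only flags listed in the help text are used.
    """
    command = [
        "ncSearch",
        "-m", models,      # directory with the .cm files
        "-n", ncrna,       # tab-separated reference miRNA file
        "-o", output,      # output directory
        "-q", query,       # query genome (FASTA)
        "-r", reference,   # reference genome (FASTA)
    ]
    if queryname is not None:
        command += ["--queryname", queryname]
    if cpu is not None:
        command += ["--cpu", str(cpu)]
    return command


# Placeholder paths, for illustration only.
command = build_ncsearch_command(
    "models/", "mirnas.tsv", "results/", "query.fa", "reference.fa",
    queryname="query_species", cpu=4,
)
print(shlex.join(command))
```

The list can be passed to `subprocess.run(command, check=True)` once ncSearch is installed and the paths point to real files.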
