forked from felixlangschied/ncortho
Running the orthology search
felixlangschied edited this page Feb 16, 2023
Covariance models (-m/--models):
- Path to a directory that contains one covariance model for each reference miRNA
- The filename of each model must exactly match the miRNA id used in the tab-separated miRNA input file (e.g. "hsa-mir-30e.cm" for hsa-mir-30e)
- Each model file must have the ending ".cm"
- All other files in the directory are ignored by ncOrtho
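Before a run, you can check the naming rule by comparing the ids in the miRNA input file with the model files on disk. The sketch below is not part of ncOrtho. match_models and check_model_dir are hypothetical helper names. The sketch assumes only the rules listed above: one "<id>.cm" file per miRNA, with all other files ignored.

```python
from pathlib import Path


def match_models(filenames, mirna_ids):
    """Split miRNA ids into (found, missing) based on a list of filenames.

    A model counts as present only if a file named exactly "<id>.cm"
    exists; files with other names or endings are ignored.
    """
    available = {name[:-3] for name in filenames if name.endswith(".cm")}
    found = [m for m in mirna_ids if m in available]
    missing = [m for m in mirna_ids if m not in available]
    return found, missing


def check_model_dir(model_dir, mirna_ids):
    """Apply match_models to the regular files in a (hypothetical) model directory."""
    names = [p.name for p in Path(model_dir).iterdir() if p.is_file()]
    return match_models(names, mirna_ids)
```

If any id comes back in the "missing" list, that miRNA has no correctly named model.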
Information about all reference miRNAs (-n/--ncrna) must be supplied in a single tab-separated file with 7 columns:
- Unique miRNA id
- Contig/Chromosome id (needs to match the one in the reference GFF file!)
- Start
- Stop
- Strand (+ or -)
- pre-miRNA sequence
- Mature miRNA sequence (no ncOrtho feature uses the mature sequence yet, so this column can be filled with a placeholder such as "NA" or "None")
Example (columns are tab-separated):
hsa-mir-552 NC_000001.11 34669599 34669694 - AACCAUUCAAAUAUACCACAGUUUGUUUAACCUUUUGCCUGUUGGUUGAAGAUGCCUUUCAACAGGUGACUGGUUAGACAAACUGUGGUAUAUACA NA
hsa-mir-30e NC_000001.11 40754355 40754446 + GGGCAGUCUUUGCUACUGUAAACAUCCUUGACUGGAAGCUGUAAGGUGUUCAGAGGAGCUUUCAGUCGGAUGUUUACAGCGGCAGGCUGCCA NA
hsa-mir-30c-1 NC_000001.11 40757284 40757372 + ACCAUGCUGUAGUGUGUGUAAACAUCCUACACUCUCAGCUGUGAGCUCAAGGUGGCUGGGAGAGGGUUGUUUACUCCUUCUGCCAUGGA NA
hsa-mir-6733 NC_000001.11 43171652 43171712 - GUGCUUGGGAAAGACAAACUCAGAGUUCCCUUCUUGUGAGCUCAGUGUCUGGAUUUCCUAG NA
You can retrieve this information from popular databases like miRBase.
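To catch formatting mistakes early, you can parse the table and check it against the column layout described above. This is a minimal sketch, not ncOrtho's own parser. The RefMirna class and parse_mirna_table function are hypothetical. The check that start <= stop, including for "-" strand entries, is an assumption based on the example rows.

```python
import csv
from dataclasses import dataclass


@dataclass
class RefMirna:
    """One row of the reference miRNA table (7 tab-separated columns)."""
    mirna_id: str
    contig: str
    start: int
    stop: int
    strand: str
    pre_mirna: str
    mature: str  # may be a placeholder such as "NA" or "None"


def parse_mirna_table(lines):
    """Parse the reference miRNA table from an iterable of text lines."""
    records = []
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip empty lines
        if len(row) != 7:
            raise ValueError(f"expected 7 columns, got {len(row)}: {row}")
        mirna_id, contig, start, stop, strand, pre_mirna, mature = row
        if strand not in ("+", "-"):
            raise ValueError(f"{mirna_id}: strand must be '+' or '-', got {strand!r}")
        start, stop = int(start), int(stop)
        # Assumption: start <= stop on both strands, as in the example rows.
        if start > stop:
            raise ValueError(f"{mirna_id}: start {start} is after stop {stop}")
        records.append(RefMirna(mirna_id, contig, start, stop, strand, pre_mirna, mature))
    return records
```

For a file on disk, pass an open handle, e.g. inside "with open("mirnas.tsv", newline="") as fh:" call parse_mirna_table(fh). The filename "mirnas.tsv" is a placeholder.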
Reference and query genomes (-r/--reference, -q/--query):
- Genomic sequence in FASTA format (e.g. "genomic.fna" from RefSeq or "dna.toplevel.fa" from Ensembl)
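Once the models, the miRNA table and both genomes are ready, ncSearch is started from the command line with the options listed in its help output. If you script your runs, a small Python wrapper can assemble the argument list. This sketch assumes ncSearch is on the PATH. build_ncsearch_cmd is a hypothetical name, and all file paths are placeholders.

```python
import shlex


def build_ncsearch_cmd(models, ncrna, output, query, reference, **options):
    """Assemble an ncSearch argument list.

    Required inputs map to -m, -n, -o, -q and -r. Every keyword argument
    becomes "--<name> <value>" (e.g. cpu=8 -> --cpu 8), so keyword names
    must match the long option names in the help output. Options set to
    None are left out so that ncSearch uses its own default.
    """
    cmd = ["ncSearch", "-m", str(models), "-n", str(ncrna), "-o", str(output),
           "-q", str(query), "-r", str(reference)]
    for name, value in options.items():
        if value is not None:
            cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    # All paths below are placeholders.
    cmd = build_ncsearch_cmd("models/", "mirnas.tsv", "results/",
                             "query_genome.fa", "reference_genome.fa",
                             queryname="query_species", cpu=8, cm_cutoff=0.6)
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Keyword names are passed through verbatim. Flags such as checkCoorthologsRef=True or dust="no" therefore work as long as the name matches the help output.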
Command-line options of ncSearch:
#########################################################
### ###
### ncOrtho - ortholog search for non-coding RNAs ###
### ###
#########################################################
usage: ncSearch [-h] -m <path> -n <path> -o <path> -q <.fa> -r <.fa> [--queryname [str]] [--cpu [int]] [--cm_cutoff [float]] [--minlength [float]] [--heuristic [True/False]]
[--heur_blast_evalue [float]] [--heur_blast_length [float]] [--cleanup [True/False]] [--refblast [<path>]] [--queryblast [<path>]] [--maxcmhits [int]] [--dust [yes/no]]
[--checkCoorthologsRef [True/False]]
Find orthologs of reference miRNAs in the genome of a query species.
Required Arguments:
-m <path>, --models <path>
Path to directory containing covariance models (.cm)
-n <path>, --ncrna <path>
Path to Tab separated file with information about the reference miRNAs
-o <path>, --output <path>
Path to the output directory
-q <.fa>, --query <.fa>
Path to query genome in FASTA format
-r <.fa>, --reference <.fa>
Path to reference genome in FASTA format
Optional Arguments:
--queryname [str] Name for the output directory (RECOMMENDED)
--cpu [int] Number of CPU cores to use (Default: all available)
--cm_cutoff [float] CMsearch bit score cutoff, given as a ratio of the bit score of the CM against the reference species (Default: 0.5)
--minlength [float] A CMsearch hit in the query species must be at least this value times the length of the reference pre-miRNA (Default: 0.7)
--heuristic [True/False]
Perform a BLAST search of the reference miRNA in the query genome to identify candidate regions for the CMsearch. Greatly improves speed. (Default: True)
--heur_blast_evalue [float]
E-value filter for the BLASTn search that determines candidate regions for the CMsearch when running ncOrtho in heuristic mode (Default: 0.5) (Set to 10 to turn off)
--heur_blast_length [float]
Length cutoff for the BLASTn search with which candidate regions for the CMsearch are identified. The cutoff is given as a ratio of the reference pre-miRNA length (Default: 0.5) (Set to 0 to turn off)
--cleanup [True/False]
Cleanup temporary files (Default: True)
--refblast [<path>] Path to BLASTdb of the reference species
--queryblast [<path>]
Path to BLASTdb of the query species
--maxcmhits [int] Maximum number of cmsearch hits to examine. Decreases runtime significantly if the reference miRNA lies in a genomic repeat region. Disabled by default (i.e. --maxcmhits=None)
--dust [yes/no] Use the BLASTn dust filter during the re-BLAST. Greatly decreases runtime if reference miRNA(s) are located in repeat regions. However, ncOrtho will then also not identify orthologs for these miRNAs
--checkCoorthologsRef [True/False]
If the re-BLAST does not identify the original reference miRNA sequence as the best hit, ncOrtho will check whether the best BLAST hit is likely a co-ortholog of the reference miRNA relative to the search taxon. NOTE: Setting this flag will substantially increase the sensitivity of ncOrtho but will most likely also affect the specificity, especially when the search taxon is evolutionarily only very distantly related to the reference taxon (Default: False)
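The ratio-based cutoffs in the options above can be read as simple inequalities. The sketch below expresses --cm_cutoff, --minlength, --heur_blast_evalue and --heur_blast_length that way. It follows the wording of the help text, not ncOrtho's source code. Whether ncOrtho compares with >= or > at the boundary is therefore an assumption, and the function names are hypothetical. The reference bit score of 80 is made up for illustration.

```python
def passes_cm_filters(hit_bitscore, hit_length, ref_bitscore, ref_length,
                      cm_cutoff=0.5, minlength=0.7):
    """--cm_cutoff / --minlength: compare a cmsearch hit with the reference."""
    return (hit_bitscore >= cm_cutoff * ref_bitscore
            and hit_length >= minlength * ref_length)


def passes_heuristic_blast(evalue, aln_length, ref_length,
                           heur_blast_evalue=0.5, heur_blast_length=0.5):
    """--heur_blast_evalue / --heur_blast_length: keep a BLASTn candidate region."""
    return (evalue <= heur_blast_evalue
            and aln_length >= heur_blast_length * ref_length)


if __name__ == "__main__":
    # hsa-mir-30e spans 40754355-40754446 in the example table: a 92 nt pre-miRNA.
    ref_length = 40754446 - 40754355 + 1
    # Hypothetical bit score of the hsa-mir-30e CM against the reference genome.
    ref_bitscore = 80.0
    # With the defaults, a hit needs >= 40 bits and about 64.4 nt.
    print(passes_cm_filters(45.0, 70, ref_bitscore, ref_length))
    print(passes_cm_filters(45.0, 60, ref_bitscore, ref_length))
    print(passes_heuristic_blast(1e-12, 50, ref_length))
```

With the defaults and the 92 nt hsa-mir-30e precursor, a cmsearch hit needs at least 40 bits (0.5 × 80) and at least 64.4 nt (0.7 × 92). A BLASTn candidate needs an alignment of at least 46 nt (0.5 × 92). Setting the E-value cutoff to 10 and the length ratio to 0 lets every reported BLASTn hit through, because 10 is also BLAST's own default E-value threshold.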