How fDOG works
In general, fDOG contains three main steps: (1) core group compilation (steps in black), (2) ortholog search (steps in green) and (3) FAS score calculation (step in red).
First, fDOG searches for orthologs of the seed sequence in all taxa within the coreTaxa_dir folder (--corepath). The reference species of the seed sequence must also be present in this core taxon list. Depending on the user-specified settings, fDOG will try to compile the core ortholog group for the seed with n-1 sequences (n is defined by the option --coreSize, default 6) while maximizing the taxonomic diversity of the core group within the range between the specified minimum and maximum rank (options --minDist and --maxDist, which default to genus and kingdom, respectively). The resulting core ortholog group is saved in the core_orthologs folder (--hmmpath).
In this step, fDOG also uses FAS scores to choose the best candidate to add to the core group. This FAS score evaluation is skipped if the user sets the option --fasoff.
Once the core ortholog group of the seed gene has been compiled, fDOG uses its profile HMM to find orthologs in the search taxa, which are all taxa in the searchTaxa_dir folder (--searchpath). The main output of this step is a multi-FASTA file (jobName.extended.fa), in which the seed sequence appears first, followed by all ortholog sequences found.
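The options named so far can be gathered into an argument list, as in the minimal sketch below. The helper name build_fdog_args is hypothetical and not part of fDOG. The sketch assumes that maxDist is passed as --maxDist, and it leaves out the fDOG command itself and the options that name the seed file and the job, which are not covered here.

```python
from typing import List


def build_fdog_args(
    core_path: str,
    hmm_path: str,
    search_path: str,
    core_size: int = 6,
    min_dist: str = "genus",
    max_dist: str = "kingdom",
    fas_off: bool = False,
) -> List[str]:
    """Collect the options described above into an argument list.

    Hypothetical helper: it only assembles flags mentioned in this page
    and does not call fDOG.
    """
    args = [
        "--corepath", core_path,      # folder with the core taxa
        "--hmmpath", hmm_path,        # folder for the core ortholog groups
        "--searchpath", search_path,  # folder with the search taxa
        "--coreSize", str(core_size),
        "--minDist", min_dist,
        "--maxDist", max_dist,        # assumed spelling of the maxDist flag
    ]
    if fas_off:
        # Skip the FAS evaluation in the core compilation and the final step.
        args.append("--fasoff")
    return args


if __name__ == "__main__":
    print(build_fdog_args("coreTaxa_dir", "core_orthologs", "searchTaxa_dir"))
```

The returned list would be appended to the fDOG command line after the seed and job options.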
If the option --fasoff is used, the last step is skipped, and fDOG creates another output called jobName.phyloprofile, which can be used as input to the PhyloProfile tool for further phylogenetic analysis.
If --fasoff is not set, fDOG performs the FAS score calculation based on the jobName.extended.fa file. The fdogFAS function of the FAS tool is applied to compare the feature architecture of the seed protein against all other sequences in jobName.extended.fa. The outputs of this step are jobName.phyloprofile, jobName_forward.domains and jobName_reverse.domains.
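To summarize which files to expect from a run, the sketch below lists the output names for a given job name, with and without --fasoff. The function expected_outputs is a hypothetical helper for illustration only. It encodes just the file names mentioned on this page and says nothing about the folder they are written to.

```python
from typing import List


def expected_outputs(job_name: str, fas_off: bool) -> List[str]:
    """Return the output file names described on this page (hypothetical helper)."""
    # The ortholog FASTA file and the phyloprofile are produced in both modes.
    outputs = [f"{job_name}.extended.fa", f"{job_name}.phyloprofile"]
    if not fas_off:
        # The domain files are only listed as outputs of the FAS step.
        outputs += [f"{job_name}_forward.domains", f"{job_name}_reverse.domains"]
    return outputs


if __name__ == "__main__":
    print(expected_outputs("jobName", fas_off=False))
    print(expected_outputs("jobName", fas_off=True))
```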
Because fdogFAS takes the first sequence in the jobName.extended.fa file as the seed protein, check that jobName.extended.fa is as expected if you encounter any strange FAS results.
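One quick way to run this check is to read the first header of the file and compare it with the seed identifier. The sketch below is a minimal, hypothetical helper and not part of fDOG or FAS. It assumes the seed identifier appears somewhere in the first FASTA header, since the exact header layout is not described here.

```python
from typing import Optional


def first_fasta_header(path: str) -> Optional[str]:
    """Return the first header line of a FASTA file without the leading '>'."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                return line[1:].strip()
    return None  # no header found, e.g. an empty file


def seed_is_first(path: str, seed_id: str) -> bool:
    """Check whether the first record's header contains the seed identifier.

    Substring matching is an assumption made for this sketch.
    """
    header = first_fasta_header(path)
    return header is not None and seed_id in header
```

For example, seed_is_first("jobName.extended.fa", seed_id) should return True when seed_id is the identifier of your seed protein. If it returns False, the FAS scores were probably computed against the wrong reference sequence.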