Report gene and model coverage from hmmsearch #2186
Hey @mschecht, while testing this I ran into a problem related to the parsing of the dom table. Here is a reproducible workflow using this contigs-db:
# run hmms
anvi-run-hmms -c P_MARINUS_MIT9301-contigs.db \
-I Bacteria_71 \
--domain-hits-table \
--just-do-it \
--hmmer-output-dir HMM_OUTPUT
# filter stuff using the dom output table and get an error
anvi-script-filter-hmm-hits-table -c P_MARINUS_MIT9301-contigs.db \
--domain-hits-table HMM_OUTPUT/hmm.domtable \
--hmm-source Bacteria_71 \
--report-gene-and-model-coverage \
--min-model-coverage 0.9
HMM profiles .................................: 9 sources have been loaded: Bacteria_71 (71 genes, domain: bacteria), Archaea_76 (76 genes, domain: archaea), Ribosomal_RNA_23S (2
genes, domain: None), Ribosomal_RNA_28S (1 genes, domain: None), Ribosomal_RNA_5S (5 genes, domain: None), Ribosomal_RNA_16S (3
genes, domain: None), Ribosomal_RNA_12S (1 genes, domain: None), Protista_83 (83 genes, domain: eukarya), Ribosomal_RNA_18S (1
genes, domain: None)
Database Path ................................: P_MARINUS_MIT9301-contigs.db
Domtblout Path ...............................: HMM_OUTPUT/hmm.domtable
Config Error: Doesn't look like a --domtblout... anvi'o can't even... Please look at this
error message to find out what happened: Error tokenizing data. C error:
Expected 23 fields in line 2, saw 25
This is happening because of these lines:
colnames_coltypes_list = list(zip(*self.dom_table_columns))
colnames_coltypes_dict = dict(zip(colnames_coltypes_list[0], colnames_coltypes_list[1]))
The first line in the dom table has 23 columns, but the next one has 25 if you only consider spaces to split, due to changes in the length of the description (i.e.,
If you change the description for
anvi-script-filter-hmm-hits-table -c P_MARINUS_MIT9301-contigs.db \
--domain-hits-table HMM_OUTPUT/hmm.domtable \
--hmm-source Bacteria_71 \
--report-gene-and-model-coverage \
--min-model-coverage 0.9
HMM profiles .................................: 9 sources have been loaded: Bacteria_71 (71 genes, domain: bacteria), Archaea_76 (76 genes, domain: archaea), Ribosomal_RNA_23S (2
genes, domain: None), Ribosomal_RNA_28S (1 genes, domain: None), Ribosomal_RNA_5S (5 genes, domain: None), Ribosomal_RNA_16S (3
genes, domain: None), Ribosomal_RNA_12S (1 genes, domain: None), Protista_83 (83 genes, domain: eukarya), Ribosomal_RNA_18S (1
genes, domain: None)
Database Path ................................: P_MARINUS_MIT9301-contigs.db
Domtblout Path ...............................: HMM_OUTPUT/hmm.domtable
Config Error: Doesn't look like a --domtblout... anvi'o can't even... Please look at this
error message to find out what happened: Error tokenizing data. C error:
Expected 25 fields in line 3, saw 26
The other instances that work with the dom table know that this is a stupid output file format with a variable-length description column, and use
with open(self.domtblout) as dom_table:
    dom_table_entries = [l.strip('\n').split(maxsplit=len(self.dom_table_columns) - 1) for l in dom_table.readlines()]
FYI :)
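To make the difference concrete, here is a minimal, self-contained sketch of the two splitting strategies. The sample rows, the names FIXED, split_naive and split_domtblout, and the 23-column count are all made up for illustration (22 fixed fields plus one free-text description, which matches the "Expected 23 fields" in the error above). This is not the anvi'o code itself.

```python
# Hypothetical rows shaped like hmmsearch --domtblout output:
# 22 whitespace-separated fixed fields, then a free-text description.
N_COLUMNS = 23  # assumption: 22 fixed fields + 1 description column

FIXED = ("gene_1 - 250 ModelA PF00001.1 240 1e-50 170.2 0.1 1 1 "
         "1e-53 1e-50 170.0 0.1 3 238 5 245 4 247 0.98")

lines = [
    FIXED + " -",                              # one-token description
    FIXED + " hypothetical membrane protein",  # three-token description
]


def split_naive(line):
    """Split on every run of whitespace; the description breaks apart."""
    return line.split()


def split_domtblout(line, n_columns=N_COLUMNS):
    """Split at most n_columns - 1 times so the description stays whole."""
    return line.rstrip("\n").split(maxsplit=n_columns - 1)


naive_counts = [len(split_naive(l)) for l in lines]
safe_counts = [len(split_domtblout(l)) for l in lines]
descriptions = [split_domtblout(l)[-1] for l in lines]

print(naive_counts)   # field count varies with description length
print(safe_counts)    # always N_COLUMNS
print(descriptions)
```

With the naive split the second row yields 25 fields instead of 23, which is the same mismatch the tokenizer complains about. Capping the number of splits keeps every row at 23 fields regardless of how long the description is.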
I think users will find it valuable to quickly explore the distribution of gene and model coverage values of anvi-run-hmms --domain-hits-table. To do this I implemented anvi-script-filter-hmm-hits-table --report-gene-and-model-coverage, which will add two columns to the hmmsearch domtblout (model_coverage and gene_coverage) and report it as hmm_domtabl_alignment_coverage.tsv in the same directory as the domtblout.
You can test the --report-gene-and-model-coverage and see the output file here:
rm -rf metagenomics-full-test; anvi-self-test --suite metagenomics-full -o metagenomics-full-test
head metagenomics-full-test/hmm_domtabl_alignment_coverage.tsv
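The description above does not spell out how model_coverage and gene_coverage are computed. One plausible reading, stated here purely as an assumption, is "aligned span divided by total length", using the hmmsearch domtblout fields qlen / hmm from / hmm to for the model and tlen / ali from / ali to for the gene. The helper name alignment_coverage, the row values, and the inclusive +1 are illustrative choices; the script in this PR may compute the values differently.

```python
def alignment_coverage(start, stop, length):
    """Fraction of a sequence covered by an alignment from start to stop.

    Assumes 1-based inclusive coordinates, as reported by HMMER.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return (stop - start + 1) / length


# Hypothetical values for a single domtblout row.
row = {
    "tlen": 250, "ali_from": 5, "ali_to": 245,   # target = gene
    "qlen": 240, "hmm_from": 3, "hmm_to": 238,   # query = HMM model
}

model_coverage = alignment_coverage(row["hmm_from"], row["hmm_to"], row["qlen"])
gene_coverage = alignment_coverage(row["ali_from"], row["ali_to"], row["tlen"])

# Mirrors the idea behind --min-model-coverage 0.9 from the commands above.
passes_filter = model_coverage >= 0.9

print(round(model_coverage, 3), round(gene_coverage, 3), passes_filter)
```

Under these assumptions the hit covers most of both the model and the gene, so it would survive a 0.9 model-coverage cutoff.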