When you create a taxdump with create-taxdump (from the ICTV taxonomy, for example), the resulting taxids are not consecutive: they "skip" numbers. For example:
$ head ictv-taxdump/names.dmp
1 | root | | scientific name |
287205 | Hoswirudivirus MRV1 | | scientific name |
287935 | Shomudavirus limadaptatum | | scientific name |
1096518 | Sclerotimonavirus betaclarireediae | | scientific name |
1138752 | Potato virus H | | scientific name |
1536674 | Rhopapillomavirus 1 | | scientific name |
1845995 | Monomorium pharaonis virus 1 | | scientific name |
1890985 | Aquamavirus A | | scientific name |
2079526 | Hylipavirus | | scientific name |
2290567 | Fattrevirus | | scientific name |
This is not a problem in itself, as the nodes are still connected. However, it causes a bug when you try to create an MMseqs2 taxonomy database from the custom taxonomy, as MMseqs2 apparently assumes that no numbers are skipped (unless they are listed in delnodes.dmp or merged.dmp, I guess).
I wrote a script that remapped the taxids so that no number is skipped, and that solved the issue. After remapping:
$ head ictv-taxdump/names.dmp
1 | root | | scientific name |
2 | Hoswirudivirus MRV1 | | scientific name |
3 | Shomudavirus limadaptatum | | scientific name |
4 | Sclerotimonavirus betaclarireediae | | scientific name |
5 | Potato virus H | | scientific name |
6 | Rhopapillomavirus 1 | | scientific name |
7 | Monomorium pharaonis virus 1 | | scientific name |
8 | Aquamavirus A | | scientific name |
9 | Hylipavirus | | scientific name |
10 | Fattrevirus | | scientific name |
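For anyone who wants to try the same workaround, here is a minimal sketch of such a remapping in Python. It is not the script mentioned above, just one way it could be done. The function names (`build_mapping`, `remap_line`, `remap_file`) are made up for illustration. It assumes the usual `|`-separated taxdump layout, where the first field of names.dmp is the taxid and the first two fields of nodes.dmp are the taxid and its parent taxid.

```python
def build_mapping(nodes_lines):
    """Map each taxid found in nodes.dmp lines to a consecutive ID starting at 1.

    IDs are assigned in ascending order of the original taxids, so the
    root (taxid 1) keeps the ID 1.
    """
    taxids = set()
    for line in nodes_lines:
        if line.strip():
            taxids.add(int(line.split("|", 1)[0]))
    return {old: new for new, old in enumerate(sorted(taxids), start=1)}


def remap_line(line, mapping, n_id_fields):
    """Rewrite the first n_id_fields '|'-separated fields using the mapping.

    Whitespace around each ID is preserved, so the result keeps the
    original delimiter layout.
    """
    fields = line.split("|")
    for i in range(n_id_fields):
        old = fields[i].strip()
        fields[i] = fields[i].replace(old, str(mapping[int(old)]), 1)
    return "|".join(fields)


def remap_file(src, dst, mapping, n_id_fields):
    """Apply remap_line to every non-empty line of src and write it to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(remap_line(line, mapping, n_id_fields))


if __name__ == "__main__":
    # Hypothetical paths; adjust to your own taxdump directory.
    with open("ictv-taxdump/nodes.dmp") as fh:
        mapping = build_mapping(fh)
    remap_file("ictv-taxdump/nodes.dmp", "remapped/nodes.dmp", mapping, 2)
    remap_file("ictv-taxdump/names.dmp", "remapped/names.dmp", mapping, 1)
```

Note that any file that assigns sequences to taxids would have to be rewritten with the same mapping, and that delnodes.dmp and merged.dmp are not handled here (the sketch assumes they are empty).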
This is not a TaxonKit bug in any way. But because MMseqs2 is pretty popular, I thought it was best to report this here in case anyone else runs into the same issue.