While porting django-genes code to Python 3 for this project, a question came up about the Gene.weight property (see py3-adage-backend/adage/genes/models.py) and how it was used.
After reviewing the original django-genes codebase, it's clear that .weight supports an important search feature. Although it's not initially needed for Adage, it will likely be useful in the future and it will certainly be needed when it comes time to factor the py3 version of django-genes back out into a separate component. Details of my code review are below:
Weight is a search tuning parameter. Although it is not particularly useful for the Pseudomonas data we currently use in Adage, a fair amount of work was done in django-genes (for Tribe, I assume) to add this in because it was needed.
When you search for genes across many data sources, you find the same gene name being used to refer to different genetic locations, even within the same organism. So when a user is searching for a gene to add to a list, there needs to be a way to sort through the duplicates. From the comments in django-genes/genes/search_indexes.py (lines 34-59) and django-genes/genes/management/commands/genes_load_geneinfo.py (lines 213-271), it appears that the weighting is designed so that the “more popular” gene hits rise to the top of the search results. genes_load_geneinfo.py has logic that counts the number of cross-references and aliases a gene has and gives the gene a higher weight when there are many of them. search_indexes.py then converts those weights into a boost parameter, which appears to be what actually changes how prominently that gene shows up in search results.
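To make the idea concrete, here is a minimal sketch of how I understand the weight-then-boost flow. It is not the django-genes code: `GeneRecord`, `compute_weight`, `weight_to_boost`, and `rank_hits` are hypothetical names, and both the counting rule and the logarithmic damping are my assumptions, not something taken from genes_load_geneinfo.py or search_indexes.py.

```python
import math
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical stand-in for a Gene row; not the actual django-genes model."""
    entrez_id: int
    symbol: str
    aliases: list = field(default_factory=list)
    xrefs: list = field(default_factory=list)
    weight: float = 1.0


def compute_weight(gene):
    # Assumed rule: one point per alias or cross-reference, with a floor
    # of 1 so that a sparsely annotated gene is never weighted at zero.
    return float(max(1, len(gene.aliases) + len(gene.xrefs)))


def weight_to_boost(weight):
    # Assumed damping: a logarithm keeps heavily annotated genes from
    # completely swamping the text-match score. A weight of 1 gives a
    # neutral boost of 1.0.
    return 1.0 + math.log10(weight)


def rank_hits(hits):
    # Among hits that matched the same query, list higher boosts first.
    return sorted(hits, key=lambda g: weight_to_boost(g.weight), reverse=True)
```

For example, if two records share a symbol but one has five aliases and cross-references while the other has none, setting `gene.weight = compute_weight(gene)` on each and passing both to `rank_hits` puts the better-annotated record first. In a real search backend the boost would be handed to the index at indexing time rather than applied in a Python sort.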
So I take from this that we will need the weight parameter and the boost logic to return, somehow, before this code is folded back into django-genes.
How soon will we need this sort of logic for Adage? I guess that’s really a question for @cgreene, but I think it’s safe to say that if we expand to other organisms we will hit this duplicate gene name issue eventually.