SBO term for confidence score #5
How exactly will this work if SBO terms are to be unique per reaction? |
At the moment there is no field where we can store the confidence scores. We either need parameters or something new. Parameters aren't appropriate, because we cannot refer to a reaction from them. Local parameters aren't suitable either because we would then need to create a kinetic law whose math element must not be empty. I am currently thinking about what to do with confidence scores. |
Ah, I see. That's still TBD. I think one approach would be to create an evidence type, so you could link to a paper and classify the type of evidence it is. I'm not yet sure whether that will work in every case, though. |
There is an SBML package for distributions; we need to check whether it could help here: http://sourceforge.net/p/sbml/code/HEAD/tree/trunk/specifications/sbml-level-3/version-1/distrib/sbml-level-3-distrib-package-proposal.pdf?format=raw |
What parts in particular would be relevant? This seems to be about sampling from distributions, and I can't see how that's related. |
Yes. I wanted to check whether it also covers confidence scores, but I haven't found anything either. In conclusion, we probably need some additional field where we can put this. |
I think this calls for a new "citations" or "evidence" package |
Good idea! I'll collect all other missing fields and see what else is needed. I'll raise this point in the next SBML team meeting (tomorrow). |
This was further discussed in thread opencobra/schema/issues/4, where @matthiaskoenig had the idea to use more specific terms from the evidenceontology.org. We should check if we can make use of this here. |
In my opinion, an SBO term is the wrong way to do this. The evidence ontology ECO is absolutely sufficient to encode all the evidence today. It is part of the MIRIAM Registry collections, is used in multiple projects, and allows encoding the evidence for projects like UniProt. In addition, it is very easy to use and is supported today by SBML and other standard formats like CellML. One just has to write the annotation, and that is it; no additional package is needed. To annotate the evidence for an SBML element, just write the annotation for the evidence. For instance, to say that a certain reaction/protein is based on "high throughput evidence used in automatic assertion (ECO:0006057)", just do:
Please, no new mechanisms when there are established, working mechanisms to encode all the information, and in a much better way than evidence codes. By using composite annotations, even the original datasets and publications for the evidence can easily be stored in the annotation of the SBML element. Best, Matthias |
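A minimal sketch of what such an annotation can look like, generated with Python's standard `xml.etree.ElementTree`. It writes a MIRIAM-style RDF block of the kind SBML allows inside an element's `<annotation>`, pointing at the element via its `metaid`. This is not the snippet from the comment above: the `bqbiol:isDescribedBy` qualifier, the `https://identifiers.org/ECO:...` URI form and the `meta_R_example` metaid are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL_NS = "http://biomodels.net/biology-qualifiers/"

# Use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("bqbiol", BQBIOL_NS)


def eco_annotation(metaid: str, eco_ids: Iterable[str],
                   qualifier: str = "isDescribedBy") -> str:
    """Build an RDF annotation linking the element with `metaid` to ECO terms.

    The default qualifier is an assumption; use whichever biology
    qualifier your community agrees on for evidence.
    """
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": f"#{metaid}"})
    qual = ET.SubElement(desc, f"{{{BQBIOL_NS}}}{qualifier}")
    bag = ET.SubElement(qual, f"{{{RDF_NS}}}Bag")
    for eco in eco_ids:
        # One rdf:li per ECO term; the identifiers.org URI form is assumed.
        ET.SubElement(bag, f"{{{RDF_NS}}}li",
                      {f"{{{RDF_NS}}}resource": f"https://identifiers.org/{eco}"})
    return ET.tostring(rdf, encoding="unicode")


print(eco_annotation("meta_R_example", ["ECO:0006057"]))
```

Because the evidence is just another annotation, several ECO terms can be listed in the same `rdf:Bag` without any new SBML construct.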
Here is an overview of the scores as these are usually defined in COBRA, where 0 is best and 4 is lowest confidence.
For the export from COBRA/BiGG models to SBML, we will only need to find the closest ECO terms for these five levels. For the other direction, we will also need to define a rule for matching terms between the two. |
Here are some suggestions; please feel free to correct them. If these are not exact enough, additional terms should be added to ECO.
0 = Biochemical: Enzyme has been tested biochemically.
Or a subclass of it to be more specific, e.g.,
1 = Genetic: Gene overexpression and purification, gene deletions.
http://evidenceontology.org/browse/#ECO_0000073
2 = Sequence: There is significant sequence similarity to another gene with known function.
http://evidenceontology.org/browse/#ECO_0000044
3 = Physiological: There is physiological data to support inclusion in the model.
4 = Modeling: Reaction is included to improve simulation results.
|
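The export and import directions discussed above can be sketched as a lookup table plus a matching rule. This Python sketch fills in only the two ECO IDs listed in the suggestion (for scores 1 and 2) and leaves the others as `None` rather than guessing. The import rule (exact ID match, keep the best, i.e. lowest, score when several terms match) is a hypothetical choice for illustration, not an agreed convention.

```python
from typing import Dict, Iterable, Optional

# Scores as described in this thread: 0 = best, 4 = lowest confidence.
# Only the ECO IDs listed in the suggestion are filled in; the rest stay None.
SCORE_TO_ECO: Dict[int, Optional[str]] = {
    0: None,           # Biochemical
    1: "ECO:0000073",  # Genetic
    2: "ECO:0000044",  # Sequence
    3: None,           # Physiological
    4: None,           # Modeling
}

# Reverse table for the import direction (exact ID matches only).
ECO_TO_SCORE: Dict[str, int] = {
    eco: score for score, eco in SCORE_TO_ECO.items() if eco is not None
}


def score_to_eco(score: int) -> Optional[str]:
    """Export direction: COBRA score -> ECO ID (None if no term chosen yet)."""
    if score not in SCORE_TO_ECO:
        raise ValueError(f"confidence score must be 0-4, got {score!r}")
    return SCORE_TO_ECO[score]


def eco_terms_to_score(eco_ids: Iterable[str]) -> Optional[int]:
    """Import direction under a hypothetical rule: among the ECO terms that
    match exactly, keep the best (lowest) score; return None if none match.
    Subclass terms are not resolved here; that would need the ECO hierarchy.
    """
    scores = [ECO_TO_SCORE[e] for e in eco_ids if e in ECO_TO_SCORE]
    return min(scores) if scores else None


print(score_to_eco(2), eco_terms_to_score(["ECO:0000044", "ECO:0000073"]))
```

Matching subclasses (e.g. a more specific term under a suggested one) would require walking the ECO hierarchy, which is exactly the "rule" that still needs to be defined.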
And forgot: |
This looks like a very good start! Thanks @matthiaskoenig. We should also direct @tpfau to this suggestion. |
Just to add to this: for a reaction, one wants to store all the evidence that exists, not only the lowest common denominator.
Suddenly you have the whole collection of evidence and confidence for the reaction, not just a "0". Confidence scores are not something anybody should use in a reconstruction in 2018. |
Supporting this is something @Midnighter and @cdiener may also want to consider when improving the cobrapy parsers. Once this finds its way into cobrapy Model objects, I'm very happy to start writing tests for this in memote. What matters to me is that one can directly link the ECO terms to the literature (DOI, PubMed ID, etc.). But if I understand @matthiaskoenig correctly, composite annotations would allow us to do this! |
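To make the "store all the evidence, not just a score" point concrete, here is a minimal Python sketch of a hypothetical in-memory structure. The class and method names are illustrative only and are not a cobrapy or memote API. Each ECO term keeps its own literature references, so the link between evidence and source survives instead of collapsing into one number.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Evidence:
    """One piece of evidence: an ECO term plus the sources backing it."""
    eco: str                                             # e.g. "ECO:0006057"
    references: List[str] = field(default_factory=list)  # e.g. "pubmed:<id>", "doi:<doi>"


@dataclass
class ReactionEvidence:
    """All evidence recorded for one reaction (hypothetical container)."""
    reaction_id: str
    evidence: List[Evidence] = field(default_factory=list)

    def add(self, eco: str, *references: str) -> None:
        self.evidence.append(Evidence(eco, list(references)))

    def references_by_eco(self) -> Dict[str, List[str]]:
        """Group references per ECO term, keeping the link between them."""
        grouped: Dict[str, List[str]] = {}
        for ev in self.evidence:
            grouped.setdefault(ev.eco, []).extend(ev.references)
        return grouped


# Usage with placeholder identifiers (not real citations).
rxn = ReactionEvidence("R_example")
rxn.add("ECO:0006057", "pubmed:0000000")
rxn.add("ECO:0000044", "doi:10.0000/placeholder")
print(rxn.references_by_eco())
```

A parser could fill such a structure from composite annotations and a test suite could then check, per reaction, that every ECO term has at least one reference.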
In general I think using ECO here is a very good idea.
To the 0-4 levels. |
Please note that COBRA's definition of the confidence scores (0 = best, 4 = lowest confidence) is the inverse of the definition in Ines Thiele and Bernhard Ø. Palsson's "A protocol for generating a high-quality genome-scale metabolic reconstruction", where 4 is the best and 0 the lowest confidence score (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3125167/table/T2/?report=objectonly). Hence, using ECO terms instead of scores from 0 to 4 might help to avoid confusion. |
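Numerically, that inversion on a five-level 0 to 4 scale is s' = 4 - s. The Python function below (its name is hypothetical) applies it in either direction, since the mapping is its own inverse. It reverses only the numbers and assumes the levels correspond one-to-one in reverse order; whether the individual evidence categories line up level by level should be checked against the protocol's table before relying on it.

```python
def flip_confidence_scale(score: int) -> int:
    """Map a 0-4 score between the two opposite conventions:
    COBRA as described in this thread (0 = best, 4 = lowest) and the
    Thiele & Palsson protocol (4 = best, 0 = lowest).

    Assumes both scales have the same five integer levels in reverse
    order; applying the function twice returns the original score.
    """
    if not isinstance(score, int) or not 0 <= score <= 4:
        raise ValueError(f"expected an integer score 0-4, got {score!r}")
    return 4 - score


# A COBRA score of 1 becomes 3 on the protocol scale, and back again.
print(flip_confidence_scale(1), flip_confidence_scale(flip_confidence_scale(1)))
```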
Request a new SBO term to be used for confidence scores.