In cobra.io neither sbml.py nor sbml3.py seem to import or export notes. #4
Comments
From @cdiener on July 6, 2017 18:54 That is a good point and one that pops up every once in a while for discussion. There is some ongoing discussion about the meaning of the SBML spec regarding the notes field. SBML only says:
and comparing to
The interpretation of the cobrapy maintainers in the past was that since notes should not be "consumed by a machine", they would not be written or read by cobrapy except for supporting the SBML Level 2 COBRA annotations. The argument was that all annotation should go into the annotation tag as described in the spec. For the particular use case of DOIs:

<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
    xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
    <rdf:Description rdf:about="#M_h_c">
      <bqbiol:is>
        <rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/kegg.compound/C00080"/>
          <rdf:li rdf:resource="http://identifiers.org/doi/10.1038/nbt1156"/>
        </rdf:Bag>
      </bqbiol:is>
    </rdf:Description>
  </rdf:RDF>
</annotation>

However, that only works for direct annotations and not for adding data. For instance, if I want to add some other quantity to the species or reaction (confidence scores or charge in various conditions, etc.), there is no way to do that with annotations. This is a shortcoming of SBML IMHO. So I would be in favour of reading and writing the notes field. It could be just raw text, or it could be a dictionary that is read and written to
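For the annotation route, here is a minimal Python sketch (standard library only) that pulls the rdf:resource URIs listed under bqbiol:is out of an annotation like the one above and splits identifiers.org URIs into collection and identifier, which is enough to recover the DOI. The helper names resources_by_qualifier and split_identifiers_org are hypothetical, not cobrapy API, and the example annotation is trimmed to the two namespaces the code needs.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the annotation example above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bqbiol": "http://biomodels.net/biology-qualifiers/",
}

ANNOTATION = """<annotation>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
  <rdf:Description rdf:about="#M_h_c">
    <bqbiol:is>
      <rdf:Bag>
        <rdf:li rdf:resource="http://identifiers.org/kegg.compound/C00080"/>
        <rdf:li rdf:resource="http://identifiers.org/doi/10.1038/nbt1156"/>
      </rdf:Bag>
    </bqbiol:is>
  </rdf:Description>
</rdf:RDF>
</annotation>"""


def resources_by_qualifier(annotation_xml, qualifier="bqbiol:is"):
    """Return every rdf:resource URI listed in an rdf:Bag under the qualifier."""
    root = ET.fromstring(annotation_xml)
    rdf_resource = "{%s}resource" % NS["rdf"]
    uris = []
    for qual in root.iterfind(".//" + qualifier, NS):
        for li in qual.iterfind("./rdf:Bag/rdf:li", NS):
            uris.append(li.get(rdf_resource))
    return uris


def split_identifiers_org(uri):
    """Split http://identifiers.org/<collection>/<id> at the first slash after the prefix."""
    prefix = "http://identifiers.org/"
    if not uri.startswith(prefix):
        raise ValueError("not an identifiers.org URI: " + uri)
    collection, _, identifier = uri[len(prefix):].partition("/")
    return collection, identifier


for uri in resources_by_qualifier(ANNOTATION):
    print(split_identifiers_org(uri))
```

Splitting only at the first slash matters for DOIs, whose identifiers themselves contain a slash (here the DOI comes back as 10.1038/nbt1156).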
From @ChristianLieven on July 11, 2017 14:24 #534 Referencing this issue because @draeger, @Midnighter and @hredestig came up with this solution, which I consider quite optimal:
From @draeger on July 19, 2017 14:0 Well, there is, of course, another way of storing confidence scores for reactions in a standard-compliant form. You could use
I fully support the idea of coming up with your own schema to store information in the 'annotation' child of SBML objects; I think this is a great idea. However, there are a couple of things you've mentioned wanting that you could store in SBML packages:
The 'groups' package is released and ready to use today. The 'distrib' package has not yet been finalized, so if there's anything you need that is not yet there, it would be relatively straightforward to add it (I've been in charge of shepherding that package to completion; email me and/or the package working group at [email protected] if you have questions or requests.)
I see pros and cons of having the notes field and the annotations field, and of the fact that one is supposed to be human-readable and the other machine-readable. The thing is... what if you want to have something that is both human-readable and machine-readable? It is very nice and convenient to have the best of both worlds.

I have added support for an extended set of metabolite and reaction attributes in framed and carveme. When reading/writing an SBML file, I parse attributes in the form of "key: value" pairs, which are stored in the notes field. These are then stored inside the Metabolite and Reaction objects, using an attribute called "metadata", which is just a Python dictionary. This metadata includes things like formulas, EC numbers, manual curation notes, etc. I frequently use these attributes to implement different kinds of methods (e.g., delta G values for thermodynamic FBA).

I think that constantly extending SBML with new attributes every time someone needs one is not very sustainable in the long term. You need to wait for a new release of the fbc package, which takes a lot of time, and in the meantime people have already come up with their own workarounds. One possible solution (not ideal, I know) is to have these dictionaries of extended attributes, and the subset of people who want to use a particular attribute (like delta G values), or implement support for it in their simulation libraries, just come together and agree on a suitable identifier name.
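A minimal sketch of the "key: value" convention described above, assuming the pairs are written as XHTML <p> paragraphs inside the notes body and that keys contain no colon. It uses only the Python standard library; the function names are hypothetical and this is not the framed or carveme implementation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML = "http://www.w3.org/1999/xhtml"


def notes_to_metadata(notes_xml):
    """Parse <p>key: value</p> paragraphs of an SBML notes element into a dict.

    Paragraphs without a colon are skipped; all values stay strings.
    """
    root = ET.fromstring(notes_xml)
    metadata = {}
    for p in root.iter("{%s}p" % XHTML):
        text = "".join(p.itertext()).strip()
        key, sep, value = text.partition(":")  # split at the first colon only
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


def metadata_to_notes(metadata):
    """Serialize a flat dict into an SBML notes string, one <p> per entry."""
    paragraphs = "".join(
        "<p>%s: %s</p>" % (escape(str(key)), escape(str(value)))
        for key, value in metadata.items()
    )
    return '<notes><body xmlns="%s">%s</body></notes>' % (XHTML, paragraphs)


meta = {"EC Number": "1.1.1.1", "deltaG": "-12.3", "curation": "checked A & B"}
print(metadata_to_notes(meta))
print(notes_to_metadata(metadata_to_notes(meta)))
```

Free-text paragraphs without a colon are skipped, so a purely descriptive notes block yields an empty dict. Everything comes back as a string, which is where an agreed identifier name (and type) for each key would matter.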
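@luciansmith: One comment about the confidence scores. These are not confidence intervals from a distribution. These are typically discrete numbers (often from 0 to 4) indicating the level of knowledge the model creator has that the component should be in the model. The numbers correspond to categories such as "read in a paper," "experimentally verified," "from a related organism," "computationally inferred," or similar. I, therefore, believe that the distrib package is not the right recommendation for storing this kind of information.

@cdanielmachado: I think it would also be good to create a specific new package for adding additional properties to model components. The SBML extension would only introduce an extension to SBase in the sense that you can add a value pair of an ontology term, some value (either a qualitative value or a quantitative one), and a third attribute for the data type of the value. For instance, an ontology term for Gibbs free energy would be one attribute and the value would be stored as a String. The third attribute would indicate that the value is a floating-point number so that a software package could parse it out. The ontology could be continuously extended and improved, independent from the SBML extension package. In this way, we could systematically add many kinds of values. Best practices should be given in this package's specification to avoid storing information there that should rather go to other (more specific) fields. For instance, EC-numbers should go to MIRIAM annotations.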
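To illustrate the proposed triple (ontology term, value stored as a string, data type), here is a small Python sketch of how a reader could turn such an entry back into a typed value. The class name, the datatype labels, and the term "hypothetical:gibbs_free_energy" are placeholders; the thread only proposes such a package, it does not define one.

```python
from dataclasses import dataclass


def _parse_bool(text):
    """Accept only 'true'/'false' (case-insensitive) as booleans."""
    lowered = text.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError("not a boolean: %r" % text)
    return lowered == "true"


# Hypothetical datatype labels mapped to parsers for the string value.
_PARSERS = {"string": str, "integer": int, "double": float, "boolean": _parse_bool}


@dataclass(frozen=True)
class KeyValue:
    term: str      # ontology term naming the property
    value: str     # always stored as a string, as in the proposal
    datatype: str  # tells the reader how to parse `value`

    def parsed(self):
        parser = _PARSERS.get(self.datatype)
        if parser is None:
            raise ValueError("unknown datatype: %r" % self.datatype)
        return parser(self.value)


dg = KeyValue("hypothetical:gibbs_free_energy", "-12.5", "double")
print(dg.parsed())
```

Keeping the value as a string and the type separate means an unknown datatype fails loudly instead of being silently misread.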
The confidence score is basically an evidence annotation. Personally, I would just annotate this to an evidence ontology, which has much more fine-grained evidence handling (and especially the tree relationship between the different confidence/evidence terms): http://www.evidenceontology.org/browse/

This is a much more universal and reusable solution than using an arbitrary evidence category of 0-4. You could easily map your 0-4 to the respective terms, but at the same time it would allow others to work with your confidence and use it for inferences. Basically, you have everything you need, and

Term id: ECO:0005549
Term name: biological system reconstruction evidence based on homology evidence
Definition: A type of biological system reconstruction where the evidence is inferred by homology based on conservation of sequence, function, and composition from an existing experimentally supported model to a process, pathway, or complex. [ECO:SN, PMID:15660128]
Comment: Inference may be based on paralogy and/or orthology of the genome-encoded components and is made primarily on functional conservation between the two systems. The sequences and number of genome-encoded components are fairly conserved but some divergence is observed. Evidence may originate from a combination of several experiments in the same or another species.

is much cleaner than writing "2 from related organism".

Matthias
On Sun, Nov 19, 2017 at 10:41 AM, Andreas Dräger wrote:
> @luciansmith: One comment about the confidence scores. These are not confidence intervals from a distribution. These are typically discrete numbers (often from 0 to 4) indicating the level of knowledge the model creator has that the component should be in the model. The numbers correspond to categories such as "read in a paper," "experimentally verified," "from a related organism," "computationally inferred," or similar. I, therefore, believe that the distrib package is not the right recommendation for storing this kind of information.
>
> @cdanielmachado: I think it would also be good to create a specific new package for adding additional properties to model components. The SBML extension would only introduce an extension to SBase in the sense that you can add a value pair of an ontology term, some value (either a qualitative value or a quantitative one), and a third attribute for the data type of the value. For instance, an ontology term for Gibbs free energy would be one attribute and the value would be stored as a String. The third attribute would indicate that the value is a floating-point number so that a software package could parse it out. The ontology could be continuously extended and improved, independent from the SBML extension package. In this way, we could systematically add many kinds of values. Best practices should be given in this package's specification to avoid storing information there that should rather go to other (more specific) fields. For instance, EC-numbers should go to MIRIAM annotations.
--
Dr. Matthias König
Junior Group Leader LiSyM - Systems Medicine of the Liver
Humboldt-University Berlin, Institute of Biology, Institute for Theoretical Biology
https://www.livermetabolism.com
[email protected]
Tel: +49 30 20938450
Tel: +49 176 81168480
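Matthias's suggestion to map the 0-4 categories onto ECO terms could look roughly like the sketch below. Only ECO:0005549 (homology-based reconstruction evidence) comes from the comment above; the other categories are deliberately left unmapped, and the http://identifiers.org/eco/ URI prefix is an assumption following the identifiers.org style of the annotation example earlier in the thread. Category labels are used as keys because the thread does not fix which number corresponds to which category; the names CATEGORY_TO_ECO and eco_uri are hypothetical.

```python
import re

# Only ECO:0005549 is taken from the comment above; the other categories
# are intentionally unmapped and must be filled in by the modeller.
CATEGORY_TO_ECO = {
    "from a related organism": "ECO:0005549",
    "read in a paper": None,
    "experimentally verified": None,
    "computationally inferred": None,
}

_ECO_ID = re.compile(r"^ECO:\d{7}$")  # ECO ids are "ECO:" plus seven digits


def eco_uri(category):
    """Return an identifiers.org-style URI for the ECO term mapped to a category."""
    term = CATEGORY_TO_ECO.get(category)
    if term is None:
        raise KeyError("no ECO term mapped for %r" % category)
    if not _ECO_ID.match(term):
        raise ValueError("malformed ECO id: %r" % term)
    # Assumed legacy identifiers.org layout: http://identifiers.org/<collection>/<id>
    return "http://identifiers.org/eco/" + term


print(eco_uri("from a related organism"))
```

The resulting URI could then go into an rdf:li under a suitable qualifier, like the DOI in the annotation example.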
I can get behind using an evidence ontology instead of the rather arbitrary confidence scores that are floating around. Just to get us back on track, however, my initial question was more aimed at finding the best way of connecting any annotation information with both a human-readable note AND a machine-readable DOI. Through this schema, I'd like to consolidate a consistent way of doing this for COBRA models. The reason for this is that, using memote, I want to be able not only to gather information on the number of annotations for any given model component but also to report the amount and quality of evidence backing up these annotations. To take up Matthias's suggestion of ECO again, I could imagine a possible metric to be the ratio of
Looks like the discussion at draeger-lab/ModelPolisher#5 provided an excellent solution for this issue without necessarily needing to reinvent the wheel with a new schema.
Hi all, From what I read above, in associated issues, and in SBML L3V2 documentation, I understand that we can annotate in But I still can't find a way to encode data, such as Gibbs free energy in In my opinion, COBRA should not parse anything within the Note that @draeger proposed to create a SBML package that would be generic enough to solve this kind of problem. |
@bdelepine, thanks for pointing out that notes aren't the right place to store machine-readable information. A few additional comments from my side:
@draeger here we go: http://pysces.sourceforge.net/KeyValueData/ Note that in the "practical example" some of the terms can be written as MIRIAM URIs or are now encoded in FBC; the keys are arbitrary. I've been using this for a few years in my tools, and it is simple to parse as an SBML annotation and extremely flexible. In general, I've found the "type" attribute to be practically redundant. One extension I'm considering is to add a "url" attribute to the element that would act as an optional/supplementary controlled key.
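The general idea of a flat key-value list inside <annotation> can be parsed in a few lines. The namespace, element, and attribute names below (data, id, value, url) are hypothetical and simplified; they are not the exact PySCeS KeyValueData schema, which is documented at the link above. The optional url attribute mirrors the extension considered in this comment, and no type attribute is read.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for a flat key/value list inside <annotation>;
# NOT the exact PySCeS KeyValueData schema.
KV_NS = "http://example.org/hypothetical-keyvalue"


def parse_key_values(annotation_xml):
    """Return {key: {"value": str, "url": str or None}} from <kv:data> elements."""
    root = ET.fromstring(annotation_xml)
    result = {}
    for data in root.iter("{%s}data" % KV_NS):
        key = data.get("id")
        if key is None:
            continue  # entries without a key cannot be addressed, so skip them
        result[key] = {"value": data.get("value"), "url": data.get("url")}
    return result


EXAMPLE = """<annotation>
  <kv:listOfKeyValueData xmlns:kv="http://example.org/hypothetical-keyvalue">
    <kv:data id="confidence" value="2"/>
    <kv:data id="EC" value="1.1.1.1" url="http://identifiers.org/ec-code/1.1.1.1"/>
  </kv:listOfKeyValueData>
</annotation>"""

print(parse_key_values(EXAMPLE))
```

Because the keys are free-form, the optional url is what would let two tools agree that they mean the same property.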
From @ChristianLieven on July 6, 2017 16:6
Problem description
I am currently reconstructing a metabolic model, for which I am adding confidence scores, comments, and literature references in the notes attribute of reactions, metabolites and genes. The importance of confidence scores and related qualitative annotation parameters is discussed in the publications linked above.
I tried importing simple notes by adding the following notes field to the RECON1 model from BiGG.
<notes> <body xmlns="http://www.w3.org/1999/xhtml"> <center><h2>This is a TEST</h2></center> <p>I am wondering if COBRApy is able to import this.</p> </body> </notes>
I was quite surprised that the RECON1 model did not contain the confidence scores on which some of the results of this research are based.
I was not able to find the keywords 'confidence', 'score' or 'confidence_score' in either cobra.io.sbml or cobra.io.sbml3. If I read that correctly, the legacy import looks specifically for charge, GPR, and subsystem in the notes field but doesn't account for the confidence score.
Code Sample
You can find my modified example SBML3+FBC RECON1 file here. The modification is at R_EX_dopa_e.
Discussion
It seems like the community hasn't decided yet what exactly the notes field should contain and how it should be formatted. Personally, I'd find it most useful if there was a clever way of allowing both short human-readable comment entries and optional, but specifically related, machine-readable DOI-style literature references. In the model object, I suppose this could be a nested dictionary looking something like this:
some_model.reactions.SOME_RXN.notes = {"confidence_score": {"value": 4, "reference": "some_doi"}}
Based on the referenced publications above, another useful key of the notes-field/attribute would be a simple 'comment' option, which would be limited in length (50 chars? 70 chars? 80 chars?).
some_model.metabolites.some_metabolite.notes = {"comment": {"value": "Short string outlining a hypothesis or specific decision for this metabolite", "optional_reference": "some_doi"}}
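A sketch of how the proposed nested layout could be checked before it is written out. The 0-4 range for confidence_score follows the discussion in this thread, and the default 80-character comment limit is an assumption picked from the options listed above; the function name validate_notes is hypothetical.

```python
MAX_COMMENT_LENGTH = 80  # the issue leaves 50, 70 or 80 open; 80 is an assumption


def validate_notes(notes, max_comment_length=MAX_COMMENT_LENGTH):
    """Check a notes dict against the nested layout proposed in this issue.

    Returns a list of problem descriptions; an empty list means no problems found.
    """
    problems = []
    for key, entry in notes.items():
        if not isinstance(entry, dict) or "value" not in entry:
            problems.append("%s: expected a dict with a 'value' key" % key)
            continue
        value = entry["value"]
        if key == "confidence_score":
            # bool is a subclass of int, so exclude it explicitly
            if not isinstance(value, int) or isinstance(value, bool) or not 0 <= value <= 4:
                problems.append("confidence_score: value must be an int from 0 to 4")
        elif key == "comment":
            if len(str(value)) > max_comment_length:
                problems.append("comment: longer than %d characters" % max_comment_length)
    return problems


print(validate_notes({"confidence_score": {"value": 4, "reference": "some_doi"}}))
```

With the default limit, the example comment above (76 characters) passes; with max_comment_length=70 it would be reported.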
I don't doubt that there could be a feasible, simple implementation on the Python side of things; however, I am unfamiliar with the options on the XML, specifically the SBML, side. According to the SBML specification, a notes field is allowed to contain...
...which seems pretty straightforward, namely the notes field ...
Hence, I think a solution here could be to use <ul> from HTML? What do you think?
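Using nested HTML lists, the nested dictionary proposed above could be written into and read back from the notes field roughly as follows. This is a standard-library Python sketch that assumes exactly one level of nesting; the function names are hypothetical, and values come back as strings.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML = "http://www.w3.org/1999/xhtml"


def nested_to_notes(notes):
    """Render {key: {field: value}} as nested XHTML <ul> lists inside <notes>."""
    items = []
    for key, fields in notes.items():
        inner = "".join(
            "<li>%s: %s</li>" % (escape(str(f)), escape(str(v))) for f, v in fields.items()
        )
        items.append("<li>%s<ul>%s</ul></li>" % (escape(str(key)), inner))
    return '<notes><body xmlns="%s"><ul>%s</ul></body></notes>' % (XHTML, "".join(items))


def notes_to_nested(notes_xml):
    """Inverse of nested_to_notes; every value is returned as a string."""
    ul, li = "{%s}ul" % XHTML, "{%s}li" % XHTML
    body = ET.fromstring(notes_xml).find("{%s}body" % XHTML)
    result = {}
    for outer in body.find(ul).findall(li):
        fields = {}
        for inner in outer.find(ul).findall(li):
            field, _, value = (inner.text or "").partition(":")
            fields[field.strip()] = value.strip()
        result[(outer.text or "").strip()] = fields
    return result


notes = {"confidence_score": {"value": 4, "reference": "some_doi"}}
xml_text = nested_to_notes(notes)
print(xml_text)
print(notes_to_nested(xml_text))
```

Because the output is plain XHTML, it stays readable when the notes are displayed, which fits the human-readable purpose of notes; the loss of the integer type on the way back is the trade-off.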
Copied from original issue: opencobra/cobrapy#541