-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve mat #1197
Improve mat #1197
Conversation
Codecov Report
@@ Coverage Diff @@
## devel #1197 +/- ##
==========================================
+ Coverage 83.63% 84.23% +0.60%
==========================================
Files 65 65
Lines 5370 5549 +179
Branches 1203 1287 +84
==========================================
+ Hits 4491 4674 +183
+ Misses 569 564 -5
- Partials 310 311 +1
Continue to review full report at Codecov.
|
That's awesome, thanks so much! I think all cobrapy species have annotations, including genes:
Empty in this case but present. |
With the annotated Recon3D it errors:
I think you don't want the |
Fixed. |
Oh yeah, those are additional variables used for enzyme capacity models. Not sure whether we should read them. They can be represented with optlang, but cobrapy has no methods for enzyme capacity models at this point. We may want to support model.C, model.ctrs, and model.ctrNames, which would just be added as additional constraints to the model. But this should probably be its own PR. What we may consider in this PR is to add writing annotations to Matlab as well, or did you want this in a separate PR (would be fine with me)? |
Some of the properties don't seem to get parsed correctly. For instance, EC numbers:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking pretty good. Some improvements to variable names and simplifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Will work on writing to matlab later.
If we can add these as constraints, you can tell me what each one means or how to represent it in optlang, so they can be read in. It can be this PR. |
Some thoughts before mission creep and/or adding stuff to this PR that doesn't belong I've been working on comparing RECON3D SBML and RECON3D mat reading to see what's missing, and I have some helper functions I'm working on and so forth.
|
I would say separate PR and maybe have it in the utils submodule?
We had that discussion before but Notes in SBML are specifically meant to not be parsed and are only there for a human to look at them. No machine readable information should be stored there. So ignoring those is fine. This should be fixed on the cobratoolbox side when creating those models.
Yeah, the subsystem attribute itself is a leftover from legacy cobra models. Groups would definitely be the modern version for this. Probably better in a separate PR but it makes sense to read them as groups and write them from groups with specific names. |
Oh and single annotations don't have to be in a list right now, but they can be. And maybe leave the writing to a new PR because that one is already large. |
Oh and single annotations don't have to be in a list right now, but they can be. Can single annotations be in a list? I wrote the code in a way that both single and multiple annotations will go into a list. The PR has writing. If you want me to separate it to a different PR, I could, but I did it before I saw your comment. |
Yes a list is fine. You mean like And no need to separate the PR. I thought you hadn't started on writing yet, but it's all good. |
Yes. That example is exactly how it is.
…On Wed., Apr. 6, 2022, 11:08 p.m. Christian Diener, < ***@***.***> wrote:
Yes a list is fine. You mean like annotations = {"ec-number": ["1.2.3.4"]},
right? That is fine.
And no need to separate the PR. I thought you hadn't started on writing
yet, but it's all good.
—
Reply to this email directly, view it on GitHub
<#1197 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACQYYZVUYALH3PFQIMZ7XSLVDZGSNANCNFSM5SGWEDZA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Adding test cases, and nearly done. Are rewritten (because of create_mat_metabolite_id) as metabolites that do have compartment in the met id. It means that models that don't have compartments in met_ids do not make a seamless read, write, reread transfer. This seems to be a quirk of RAVEN format, so I don't know if it is even used apart from test cases. |
Yeah, I agree with you that this is odd. I would also favor maintaining IDs the same. There isn't really a rule that compartment has to be in a metabolite ID. It's a convention that many IDs follow, but you usually also want a real attribute for a compartment. I would vote for removing that substitution and using the original ID. My argument would be that it would be surprising if I wrote a COBRApy model to Matlab and suddenly all my metabolites got renamed. |
made failure better added tests to fail when compartments are missing or problematic added test to see that mat models with no genes are imported correctly
It looks like pepole who create mat files are sloppy and/or the behavior of the cobratoolbox changed over time. Specifically, the field names for annotations keep varying a bit on the capitalization Likewise for other fields. Question - do we want cobrapy to be tolerant of this? If not, I'll go with the official description in https://github.com/opencobra/cobratoolbox/blob/master/docs/source/notes/COBRAModelFields.md If so, do you have any idea on how to do case-insensitive matching? |
Yeah, they run no validations at all it seems. You could convert all keys in the mat struct to lower case and match on those only, maybe? |
Okay. I'll output what is supposed to be the official version. |
…ion of model fields (metChebiID, metCHEBIID, metChEBIID) - revised reading to be tolerant of this
…ssed in the CHEBI website. removed unnecessary function create_mat_metabolite_id()
…ames are ignored)
added tests - test_compare_xml_to_written_matlab_model, test_mat_model_with_long_compartment_ids, test_mat_model_wrong_caps
All done. Including tests for all kinds of weird cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, only a bunch of formatting/variable naming changes. The code is getting really hard to review at this point due to large number of commits. So I would probably recommend to leave most of those for a clean-up PR afterwards. The only thing for now would be to remove the code in the docstring that generates the wrong key model. Then we can merge and follow up with a PR to fix the wrong annotations in SBML models and another one for the variable name cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Some questions remaining.
next release corrected removed instructions to gist
@cdiener - reverted the test_sbml.py changes |
Actually, my bad. I read up on it and I think the lower case providers in SBML are now the de facto standard and are allowed by identifiers.org. So annotations["sbo"] is fine and the SBML parser will consolidate into this format (so transform from "SBO"). But we can discuss this more in another PR that addresses this because it also needs some adaptations for IDs. |
@cdiener - I think we're done. Would you like another developer to review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get this in.
Improve reading of matlab models including annotations. See https://github.com/opencobra/cobratoolbox/blob/master/docs/source/notes/COBRAModelFields.md for fields added.
Fields I didn't add, and I'm not sure what they mean
all gene annotations (genes don't have annotations in cobrapy yet)
model.C, model.E. model.e*, model.D, model.d*
I haven't modified tests yet. Not sure what's necessary, but I'm thinking about making a copy of load_model which will load mat models from BIGG/BioModels, and then maybe make a test of load_model, load_model_mat and compare them.
I haven't modified the writing of mat models. If this code makes sense, I can write the inverse function.
Improve mat reading