Improve mat #1197

akaviaLab · 2022-03-31T21:09:47Z

fix Improve the reading of MATLAB models #1040
description of feature/fix
Improve reading of matlab models including annotations. See https://github.com/opencobra/cobratoolbox/blob/master/docs/source/notes/COBRAModelFields.md for fields added.
Fields I didn't add, and I'm not sure what they mean
all gene annotations (genes don't have annotations in cobrapy yet)
model.C, model.E. model.e*, model.D, model.d*

I haven't modified tests yet. Not sure what's necessary, but I'm thinking about making a copy of load_model which will load mat models from BIGG/BioModels, and then maybe make a test of load_model, load_model_mat and compare them.

I haven't modified the writing of mat models. If this code makes sense, I can write the inverse function.

add an entry to the next release
Improve mat reading

codecov-commenter · 2022-03-31T21:15:42Z

Codecov Report

Merging #1197 (49cc180) into devel (ca8e8bd) will increase coverage by 0.60%.
The diff coverage is 84.28%.

@@            Coverage Diff             @@
##            devel    #1197      +/-   ##
==========================================
+ Coverage   83.63%   84.23%   +0.60%     
==========================================
  Files          65       65              
  Lines        5370     5549     +179     
  Branches     1203     1287      +84     
==========================================
+ Hits         4491     4674     +183     
+ Misses        569      564       -5     
- Partials      310      311       +1

Impacted Files	Coverage Δ
src/cobra/io/mat.py	`82.28% <84.28%> (+14.73%)`	⬆️
src/cobra/io/sbml.py	`80.45% <0.00%> (+0.35%)`	⬆️
src/cobra/sampling/core.py	`93.93% <0.00%> (+15.15%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ca8e8bd...49cc180. Read the comment docs.

cdiener · 2022-03-31T22:43:18Z

That's awesome, thanks so much!

I think all cobrapy species have annotations, including genes:

In [1]: from cobra.test import create_test_model

In [2]: mod = create_test_model("textbook")

In [3]: mod.genes[0].annotation
Out[3]: {}

Empty in this case but present.

cdiener · 2022-04-01T18:04:26Z

With the annotated Recon3D it errors:

In [5]: rec = load_matlab_model("Recon3DModel_301.mat")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 rec = load_matlab_model("Recon3DModel_301.mat")

File ~/code/cobrapy/src/cobra/io/mat.py:231, in load_matlab_model(infile_path, variable_name, inf)
    229 for possible_name in possible_names:
    230     try:
--> 231         return from_mat_struct(data[possible_name], model_id=possible_name, inf=inf)
    232     except ValueError:
    233         # TODO: use custom cobra exception to handle exception
    234         pass

File ~/code/cobrapy/src/cobra/io/mat.py:496, in from_mat_struct(mat_struct, model_id, inf)
    494         new_metabolite.formula = met_formulas[i]
    495     new_metabolites.append(new_metabolite)
--> 496 mat_annotations(new_metabolites, m, _d_replace=D_MET)
    497 model.add_metabolites(new_metabolites)
    499 new_genes = []

File ~/code/cobrapy/src/cobra/io/mat.py:368, in mat_annotations(target_list, mat_struct, _d_replace)
    364 annotation_matlab = list(
    365     set(mat_struct.dtype.names).intersection(D_REPLACE[_d_replace].keys())
    366 )
    367 annotation_lists = [[None]] * len(annotation_matlab)
--> 368 annotation_providers = [
    369     D_REPLACE[_d_replace + "_REV"][x] for x in annotation_matlab
    370 ]
    371 for i in range(len(annotation_matlab)):
    372     annotation_lists[i] = _cell_to_str_list(mat_struct[annotation_matlab[i]][0, 0])

File ~/code/cobrapy/src/cobra/io/mat.py:369, in <listcomp>(.0)
    364 annotation_matlab = list(
    365     set(mat_struct.dtype.names).intersection(D_REPLACE[_d_replace].keys())
    366 )
    367 annotation_lists = [[None]] * len(annotation_matlab)
    368 annotation_providers = [
--> 369     D_REPLACE[_d_replace + "_REV"][x] for x in annotation_matlab
    370 ]
    371 for i in range(len(annotation_matlab)):
    372     annotation_lists[i] = _cell_to_str_list(mat_struct[annotation_matlab[i]][0, 0])

KeyError: 'metKEGGID'

I think you don't want the + "_REV" in line 369.

akaviaLab · 2022-04-01T19:01:37Z

Fixed.
If you can figure out what everything lower than
model.E m x evars
on the cobratoolbox page means, and if it is relevant to cobrapy, I'd appreciate it. I think cobrapy doesn't support any of these fields, but if I'm wrong, please let me know.

cdiener · 2022-04-01T21:32:32Z

Oh yeah, those are additional variables used for enzyme capacity models. Not sure whether we should read them. They can be represented with optlang, but cobrapy has no methods for enzyme capacity models at this point. We may want to support model.C, model.ctrs, and model.ctrNames, which would just be added as additional constraints to the model. But this should probably be its own PR. What we may consider in this PR is to add writing annotations to Matlab as well, or did you want this in a separate PR (would be fine with me)?

cdiener · 2022-04-01T22:14:09Z

Some of the properties don't seem to get parsed correctly. For instance, EC numbers:

In [97]: m["Recon3DModel"]["rxnECNumbers"][0][0][12]
Out[97]: array([array(['1.4.3.6'], dtype='<U7')], dtype=object)

In [98]: m["Recon3DModel"]["rxns"][0][0][12]
Out[98]: array([array(['13DAMPPOX'], dtype='<U9')], dtype=object)

In [99]: rec.reactions[12].annotation
Out[99]: {'pubmed': '15035803,8920635,9653080'}

In [100]: rec.reactions[12]
Out[100]: <Reaction 13DAMPPOX at 0x7fb42ae47ca0>

cdiener

Looking pretty good. Some improvements to variable names and simplifications.

src/cobra/io/mat.py

akaviaLab

Okay. Will work on writing to matlab later.

src/cobra/io/mat.py

akaviaLab · 2022-04-01T23:20:20Z

Oh yeah, those are additional variables used for enzyme capacity models. Not sure whether we should read them. They can be represented with optlang, but cobrapy has no methods for enzyme capacity models at this point. We may want to support model.C, model.ctrs, and model.ctrNames, which would just be added as additional constraints to the model. But this should probably be its own PR. What we may consider in this PR is to add writing annotations to Matlab as well, or did you want this in a separate PR (would be fine with me)?

If we can add these as constraints, you can tell me what each one means or how to represent it in optlang, so they can be read in. It can be this PR.
The enzyme capacity that we don't have methods in cobrapy should probably be ignored for now.
As for writing annotations to matlab - I'm working on it.

akaviaLab · 2022-04-04T15:17:28Z

Some thoughts before mission creep and/or adding stuff to this PR that doesn't belong

I've been working on comparing RECON3D SBML and RECON3D mat reading to see what's missing, and I have some helper functions I'm working on and so forth.

Would you like me to try and add a function to compare models? I have a kludgy function I can tidy up. I'm using it to see where the differences are, so I can focus on what to fix. Separate PR is also an option.
Would you like me to improve SBML reading? For example, the RECON3D SBML (which I think was written by cobratoolbox) has things in Notes that should really be in annotations. I can add some changes to SBML notes reading/writing to make sure these kind of annotations end up in annotations, or I can ignore them.
It looks like cobratoolbox uses subsystems for groups, and only uses groups for reactions for now. Would you like me to make the matlab reader build groups based on subsystems? What about writing?

cdiener · 2022-04-06T15:19:01Z

Would you like me to try and add a function to compare models? I have a kludgy function I can tidy up. I'm using it to see where the differences are, so I can focus on what to fix. Separate PR is also an option.

I would say separate PR and maybe have it in the utils submodule?

Would you like me to improve SBML reading? For example, the RECON3D SBML (which I think was written by cobratoolbox) has things in Notes that should really be in annotations. I can add some changes to SBML notes reading/writing to make sure these kind of annotations end up in annotations, or I can ignore them.

We had that discussion before but Notes in SBML are specifically meant to not be parsed and are only there for a human to look at them. No machine readable information should be stored there. So ignoring those is fine. This should be fixed on the cobratoolbox side when creating those models.

It looks like cobratoolbox uses subsystems for groups, and only uses groups for reactions for now. Would you like me to make the matlab reader build groups based on subsystems? What about writing?

Yeah, the subsystem attribute itself is a leftover from legacy cobra models. Groups would definitely be the modern version for this. Probably better in a separate PR but it makes sense to read them as groups and write them from groups with specific names.

cdiener · 2022-04-06T23:31:21Z

Oh and single annotations don't have to be in a list right now, but they can be. And maybe leave the writing to a new PR because that one is already large.

akaviaLab · 2022-04-07T00:56:46Z

Oh and single annotations don't have to be in a list right now, but they can be.

Can single annotations be in a list? I wrote the code in a way that both single and multiple annotations will go into a list.

The PR has writing. If you want me to separate it to a different PR, I could, but I did it before I saw your comment.

cdiener · 2022-04-07T03:08:11Z

Yes a list is fine. You mean like annotations = {"ec-number": ["1.2.3.4"]}, right? That is fine.

And no need to separate the PR. I thought you hadn't started on writing yet, but it's all good.

akaviaLab · 2022-04-07T06:43:08Z

Yes. That example is exactly how it is.

…

On Wed., Apr. 6, 2022, 11:08 p.m. Christian Diener, < ***@***.***> wrote: Yes a list is fine. You mean like annotations = {"ec-number": ["1.2.3.4"]}, right? That is fine. And no need to separate the PR. I thought you hadn't started on writing yet, but it's all good. — Reply to this email directly, view it on GitHub <#1197 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACQYYZVUYALH3PFQIMZ7XSLVDZGSNANCNFSM5SGWEDZA> . You are receiving this because you authored the thread.Message ID: ***@***.***>

src/cobra/io/mat.py

akaviaLab · 2022-04-20T15:36:12Z

Adding test cases, and nearly done.
Question - models that don't have compartment in the met_id (like raven.mat) that match with
if all(var in m.dtype.names for var in ["metComps", "comps", "compNames"]):

Are rewritten (because of create_mat_metabolite_id) as metabolites that do have compartment in the met id.
For example, in raven.mat
'm156' is rewritten as 'm156[Cytoplasm]'

It means that models that don't have compartments in met_ids do not make a seamless read, write, reread transfer.
Should this be happening? Should I change raven.mat or change the functions so it is rewritten the same as it was?

This seems to be a quirk of RAVEN format, so I don't know if it is even used apart from test cases.

cdiener · 2022-04-20T18:04:17Z

Yeah, I agree with you that this is odd. I would also favor maintaining IDs the same. There isn't really a rule that compartment has to be in a metabolite ID. It's a convention that many IDs follow, but you usually also want a real attribute for a compartment. I would vote for removing that substitution and using the original ID. My argument would be that it would be surprising if I wrote a COBRApy model to Matlab and suddenly all my metabolites got renamed.

made failure better added tests to fail when compartments are missing or problematic added test to see that mat models with no genes are imported correctly

akaviaLab · 2022-04-21T18:20:14Z

It looks like pepole who create mat files are sloppy and/or the behavior of the cobratoolbox changed over time. Specifically, the field names for annotations keep varying a bit on the capitalization
metChEBIID, metCHEBIID, and metChebiID are things that I've seen

Likewise for other fields.

Question - do we want cobrapy to be tolerant of this? If not, I'll go with the official description in https://github.com/opencobra/cobratoolbox/blob/master/docs/source/notes/COBRAModelFields.md

If so, do you have any idea on how to do case-insensitive matching?
Right now, I run
annotation_matlab = list(
set(mat_struct.dtype.names).intersection(D_REPLACE[_d_replace].keys())
)
Any ideas on how to do case-insensitive matching equivalent to this? I could do regular expression, but that seems like it would be complicated.

cdiener · 2022-04-21T18:24:04Z

Yeah, they run no validations at all it seems. You could convert all keys in the mat struct to lower case and match on those only, maybe?

akaviaLab · 2022-04-21T18:27:13Z

Okay. I'll output what is supposed to be the official version.

…ion of model fields (metChebiID, metCHEBIID, metChEBIID) - revised reading to be tolerant of this

…ssed in the CHEBI website. removed unnecessary function create_mat_metabolite_id()

…ames are ignored)

added tests - test_compare_xml_to_written_matlab_model, test_mat_model_with_long_compartment_ids, test_mat_model_wrong_caps

akaviaLab · 2022-04-25T14:41:34Z

All done. Including tests for all kinds of weird cases.

cdiener

Okay, only a bunch of formatting/variable naming changes. The code is getting really hard to review at this point due to large number of commits. So I would probably recommend to leave most of those for a clean-up PR afterwards. The only thing for now would be to remove the code in the docstring that generates the wrong key model. Then we can merge and follow up with a PR to fix the wrong annotations in SBML models and another one for the variable name cleanup.

src/cobra/test/test_io/test_mat.py

src/cobra/test/test_io/test_sbml.py

src/cobra/io/mat.py

release-notes/next-release.md

akaviaLab

Fixed. Some questions remaining.

src/cobra/io/mat.py

src/cobra/test/test_io/test_mat.py

src/cobra/test/test_io/test_sbml.py

src/cobra/test/test_io/test_mat.py

next release corrected removed instructions to gist

akaviaLab · 2022-04-26T21:13:26Z

@cdiener - reverted the test_sbml.py changes
Will change the models and everything in a later PR. I'll add a note to the new PR to be merged after this one.
Yuck - this is not just changing the XML files, but changing how they are read. I'll open a discussion for that.

cdiener · 2022-04-27T16:45:50Z

Actually, my bad. I read up on it and I think the lower case providers in SBML are now the de facto standard and are allowed by identifiers.org. So annotations["sbo"] is fine and the SBML parser will consolidate into this format (so transform from "SBO"). But we can discuss this more in another PR that addresses this because it also needs some adaptations for IDs.

akaviaLab · 2022-04-28T16:11:00Z

@cdiener - I think we're done. Would you like another developer to review this?
I have the various improvements and cleanup ready to go in another PR once this is merged.

cdiener

Let's get this in.

cdiener self-requested a review March 31, 2022 22:45

cdiener requested changes Apr 1, 2022

View reviewed changes

akaviaLab commented Apr 1, 2022

View reviewed changes

src/cobra/io/mat.py Outdated Show resolved Hide resolved

akaviaLab mentioned this pull request Apr 6, 2022

problem in opening .mat model with cobraPy #1190

Closed

cdiener reviewed Apr 8, 2022

View reviewed changes

src/cobra/io/mat.py Show resolved Hide resolved

uri.akavia added 12 commits April 8, 2022 16:03

speedup mat reading

e853060

using dictionary comprehension to be a bit faster

52e893d

vectorized metabolite processing and added annotations

06e5039

vectorize reaction processing

e58f9cf

functionalize cell processing

189b869

functionalize cell processing for floats

3e811ec

added docstring

bc52649

added rxn annotations, including pubmed and EC

f9c1b8a

added rxn annotations, including pubmed and EC

12a0c56

tidying

9020a70

functionalize annotations

c3d6faa

fixed minor mistake with REV

ebc58d6

uri.akavia added 4 commits April 20, 2022 09:50

some corrections according to cobratoolbox test models

a7331e6

added subsystem to groups. some tidying

e95b47d

removed unnecessary functions now need to update tests for coverage

a397c98

not using enumerate

02e70ac

added some tests

510bc23

raven style model should be written properly

e684f6f

made failure better added tests to fail when compartments are missing or problematic added test to see that mat models with no genes are imported correctly

uri.akavia and others added 7 commits April 22, 2022 17:25

cobratoolbox created models seem to be inconsistent about capitalizat…

8c75456

…ion of model fields (metChebiID, metCHEBIID, metChEBIID) - revised reading to be tolerant of this

CHEBI: shouldn't be removed from metCHEBIID since that how it is acce…

cd9a0c3

…ssed in the CHEBI website. removed unnecessary function create_mat_metabolite_id()

mat models seem to be read and rewritten correctly

548c17e

read gene names and make sure to add genes to model (otherwise gene n…

3b8d92f

…ames are ignored)

fix for old cobratoolbox fields

ffc57d8

added tests - test_compare_xml_to_written_matlab_model, test_mat_model_with_long_compartment_ids, test_mat_model_wrong_caps

save model name only if it exists

d2a9d4e

Merge branch 'devel' into improve_mat

a15fb54

cdiener requested changes Apr 26, 2022

View reviewed changes

akaviaLab commented Apr 26, 2022

View reviewed changes

uri.akavia added 2 commits April 26, 2022 14:26

variable name fixes

ef20327

next release corrected removed instructions to gist

revert changes in test_sbml.py

49cc180

cdiener approved these changes Apr 28, 2022

View reviewed changes

cdiener merged commit f5a1b96 into opencobra:devel Apr 28, 2022

This was referenced May 2, 2022

Gene-reaction matrix in pymatbridge #1004

Closed

Help! Cobra toolbox with CobraPy doesn't work #517

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve mat #1197

Improve mat #1197

akaviaLab commented Mar 31, 2022

codecov-commenter commented Mar 31, 2022 •

edited

Loading

cdiener commented Mar 31, 2022 •

edited

Loading

cdiener commented Apr 1, 2022 •

edited

Loading

akaviaLab commented Apr 1, 2022

cdiener commented Apr 1, 2022 •

edited

Loading

cdiener commented Apr 1, 2022

cdiener left a comment

akaviaLab left a comment

akaviaLab commented Apr 1, 2022

akaviaLab commented Apr 4, 2022

cdiener commented Apr 6, 2022

cdiener commented Apr 6, 2022 •

edited

Loading

akaviaLab commented Apr 7, 2022

cdiener commented Apr 7, 2022

akaviaLab commented Apr 7, 2022 via email

akaviaLab commented Apr 20, 2022

cdiener commented Apr 20, 2022

akaviaLab commented Apr 21, 2022

cdiener commented Apr 21, 2022

akaviaLab commented Apr 21, 2022

akaviaLab commented Apr 25, 2022

cdiener left a comment

akaviaLab left a comment

akaviaLab commented Apr 26, 2022 •

edited

Loading

cdiener commented Apr 27, 2022

akaviaLab commented Apr 28, 2022

cdiener left a comment

Improve mat #1197

Improve mat #1197

Conversation

akaviaLab commented Mar 31, 2022

codecov-commenter commented Mar 31, 2022 • edited Loading

Codecov Report

cdiener commented Mar 31, 2022 • edited Loading

cdiener commented Apr 1, 2022 • edited Loading

akaviaLab commented Apr 1, 2022

cdiener commented Apr 1, 2022 • edited Loading

cdiener commented Apr 1, 2022

cdiener left a comment

Choose a reason for hiding this comment

akaviaLab left a comment

Choose a reason for hiding this comment

akaviaLab commented Apr 1, 2022

akaviaLab commented Apr 4, 2022

cdiener commented Apr 6, 2022

cdiener commented Apr 6, 2022 • edited Loading

akaviaLab commented Apr 7, 2022

cdiener commented Apr 7, 2022

akaviaLab commented Apr 7, 2022 via email

akaviaLab commented Apr 20, 2022

cdiener commented Apr 20, 2022

akaviaLab commented Apr 21, 2022

cdiener commented Apr 21, 2022

akaviaLab commented Apr 21, 2022

akaviaLab commented Apr 25, 2022

cdiener left a comment

Choose a reason for hiding this comment

akaviaLab left a comment

Choose a reason for hiding this comment

akaviaLab commented Apr 26, 2022 • edited Loading

cdiener commented Apr 27, 2022

akaviaLab commented Apr 28, 2022

cdiener left a comment

Choose a reason for hiding this comment

codecov-commenter commented Mar 31, 2022 •

edited

Loading

cdiener commented Mar 31, 2022 •

edited

Loading

cdiener commented Apr 1, 2022 •

edited

Loading

cdiener commented Apr 1, 2022 •

edited

Loading

cdiener commented Apr 6, 2022 •

edited

Loading

akaviaLab commented Apr 26, 2022 •

edited

Loading