SBOL 3->2 needs to remap sequence encodings and component types #16

Open
jakebeal opened this issue Sep 10, 2021 · 8 comments

@jakebeal

My current workaround in python:

    # remap sequence encodings:
    encoding_remapping = {
        sbol3.IUPAC_DNA_ENCODING: sbol2.SBOL_ENCODING_IUPAC,
        sbol3.IUPAC_PROTEIN_ENCODING: sbol2.SBOL_ENCODING_IUPAC_PROTEIN,
        sbol3.SMILES_ENCODING: sbol3.SMILES_ENCODING
    }
    for s in (o for o in doc3.objects if isinstance(o, sbol3.Sequence)):
        if s.encoding in encoding_remapping:
            s.encoding = encoding_remapping[s.encoding]
    # remap component types:
    type_remapping = {
        sbol3.SBO_DNA: sbol2.BIOPAX_DNA,
        sbol3.SBO_RNA: sbol2.BIOPAX_RNA,
        sbol3.SBO_PROTEIN: sbol2.BIOPAX_PROTEIN,
        sbol3.SBO_SIMPLE_CHEMICAL: sbol2.BIOPAX_SMALL_MOLECULE,
        sbol3.SBO_NON_COVALENT_COMPLEX: sbol2.BIOPAX_COMPLEX
    }
    for c in (o for o in doc3.objects if isinstance(o, sbol3.Component)):
        c.types = [type_remapping.get(t, t) for t in c.types]

@jakebeal
Author

Orientations also appear to be failing in the same manner.
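A rough sketch of how the same kind of remapping might extend to orientations. The URI strings are my reading of the SBOL 2 and SBOL 3 specifications, written out as plain strings rather than taken from either library, and `remap_orientation` is a hypothetical helper, so please check both against the spec before relying on them.

```python
# Orientation URIs as I understand them from the SBOL 3 and SBOL 2 specs
# (assumption: plain strings, not constants from pySBOL2 or pySBOL3).
SBOL3_INLINE = "http://sbols.org/v3#inline"
SBOL3_REVERSE_COMPLEMENT = "http://sbols.org/v3#reverseComplement"
SBOL2_INLINE = "http://sbols.org/v2#inline"
SBOL2_REVERSE_COMPLEMENT = "http://sbols.org/v2#reverseComplement"

orientation_remapping = {
    SBOL3_INLINE: SBOL2_INLINE,
    SBOL3_REVERSE_COMPLEMENT: SBOL2_REVERSE_COMPLEMENT,
}


def remap_orientation(orientation):
    """Return the SBOL 2 orientation URI; None and unknown URIs pass through."""
    if orientation is None:
        return None
    return orientation_remapping.get(orientation, orientation)
```

Unknown URIs are passed through unchanged, matching how the type remapping above treats types that have no entry in the table.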

@isaacguerreiros

I took a look at this issue today, and it looks like we should move the remapping you made into sbolgraph.

The workaround @jakebeal wrote can be found here: https://github.com/iGEM-Engineering/iGEM-distribution/blob/a697bfcb9da4db38da07e19b379968013f284a35/scripts/scriptutils/conversions.py#L121

The encoding constants can be found here and here. Should we move these constants to sbolgraph, or will most of them be unnecessary? bioterms doesn't have the encoding constants either.

Last but not least, I cloned the sbolgraph repo and ran npm install and npm test, and it appears the project doesn't have any tests. A few minutes later I cloned SBOLTestSuite inside the sbolgraph directory and ran npm test again, but the only output I received was this:

[email protected] test
bash test.sh
🔄 Converting file: SBOLTestSuite/GenBank/EF587312.gb  

...and that's it.

I think it would be useful to be able to run and create some tests for this issue, and also to know whether the constants need to move to sbolgraph.
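As a starting point, a test for the remapping might look something like the sketch below. It is written in plain Python without either SBOL library; the helper names (`remap_all`, `leftover_terms`) are hypothetical, and the URIs are my understanding of the SBO, BioPAX and SO terms rather than values copied from any constants file.

```python
# Component type URIs as I understand them from the SBOL 3 (SBO) and
# SBOL 2 (BioPAX) specs; assumptions, written as plain strings.
BIOPAX = "http://www.biopax.org/release/biopax-level3.owl#"
type_remapping = {
    "https://identifiers.org/SBO:0000251": BIOPAX + "DnaRegion",
    "https://identifiers.org/SBO:0000250": BIOPAX + "RnaRegion",
    "https://identifiers.org/SBO:0000252": BIOPAX + "Protein",
}


def remap_all(uris, table):
    """Replace each URI found in table; pass every other URI through unchanged."""
    return [table.get(u, u) for u in uris]


def leftover_terms(uris, table):
    """Return URIs that still match a source-side key after conversion."""
    return [u for u in uris if u in table]


def test_types_are_remapped():
    # Hypothetical input: the DNA type plus a topology URI with no mapping.
    so_linear = "https://identifiers.org/SO:0000987"
    types = ["https://identifiers.org/SBO:0000251", so_linear]
    converted = remap_all(types, type_remapping)
    assert converted == [BIOPAX + "DnaRegion", so_linear]
    assert leftover_terms(converted, type_remapping) == []
```

The `leftover_terms` check is the part that would catch the failure in this issue: any SBOL 3 term that survives a 3->2 conversion shows up in its result.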

@jakebeal
Author

@isaacguerreiros I believe that the conversion tests in sbol-utilities will be a good set of test cases to use here. The same conversions should hold, given that this issue is essentially asking for the corrective RDF changes in that library to be brought upstream into this library.

With regard to the constants: anything that appears in the SBOL specification is, I think, fine to encode in the library. If you disagree, @udp, please comment.

@isaacguerreiros: do you need any other information in order to proceed?

@isaacguerreir

I analyzed some of the code, and apparently the SBOL specification constants in pySBOL3 and bioterms are different. bioterms has the same URI for encoding in SBOL2 and SBOL3 (see permalinks for the exact lines), while pySBOL3 uses different identifiers from identifiers.org.

To me, it looks like if we make the bioterms and pySBOL3 encoding identifiers equal, remapping sequence encodings will no longer be necessary.

My understanding is that because bioterms uses the same encoding URIs for SBOL3 and SBOL2, matching pySBOL2, the conversion currently needs this remapping. If bioterms and pySBOL3 both follow the specification for encoding, the remapping step should become unnecessary.

My pull request in bioterms is my attempt to resolve this.

Also, it would be good to start discussing how I could test this change: #19
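To make the mismatch described above concrete, here is a small sketch that compares two tables of encoding URIs and derives a remapping from them. The URIs are what I understand the SBOL 2 and SBOL 3 specs to use, not values copied from bioterms or pySBOL3, and the table keys and function names are hypothetical.

```python
# Encoding URIs as I understand them from the SBOL 2 and SBOL 3 specs
# (assumptions, written as plain strings; verify before relying on them).
SBOL2_ENCODINGS = {
    "iupac_dna": "http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html",
    "iupac_protein": "http://www.chem.qmul.ac.uk/iupac/AminoAcid/",
}
SBOL3_ENCODINGS = {
    "iupac_dna": "https://identifiers.org/edam:format_1207",
    "iupac_protein": "https://identifiers.org/edam:format_1208",
}


def differing_terms(old, new):
    """Return the sorted names present in both tables whose URIs differ."""
    shared = old.keys() & new.keys()
    return sorted(name for name in shared if old[name] != new[name])


def build_remapping(source, target):
    """Build a source-URI -> target-URI table for names present in both."""
    shared = source.keys() & target.keys()
    return {source[name]: target[name] for name in shared}


print(differing_terms(SBOL2_ENCODINGS, SBOL3_ENCODINGS))
```

If a library defines the same URI for both versions, `differing_terms` returns an empty list for that library's tables, which would explain why the converter never rewrites the encoding.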

@isaacguerreir

Last but not least: I could not find the SMILES encoding in bioterms or sbolgraph. Is this a concern? Judging by the remapping you made, @jakebeal, this could be a problem.
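One way to check for gaps like this might be to compare the terms the remapping needs against what a constants table defines. The names and placeholder URIs below are hypothetical and do not reflect the actual contents of bioterms or sbolgraph.

```python
def missing_terms(required, available):
    """Return the sorted names that a remapping needs but a constants table lacks."""
    return sorted(set(required) - set(available))


# Hypothetical names only: what the remapping needs vs. what a constants
# table might define. The example.org URIs are placeholders.
required = ["iupac_dna", "iupac_protein", "smiles"]
available = {
    "iupac_dna": "http://example.org/dna",
    "iupac_protein": "http://example.org/protein",
}

print(missing_terms(required, available))
```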

@jakebeal
Author

jakebeal commented Feb 3, 2022

@isaacguerreir The pySBOL3 constants follow the SBOL 3.0.1 specification. If I'm understanding the constants file here correctly, it looks like the terms you identify just didn't get updated to their new values yet.

Also agree that it looks like the smiles term just isn't there; I don't see it anywhere in the library with a search.

@isaacguerreir

isaacguerreir commented Feb 3, 2022

Perfect. So the bioterms pull request could resolve the first part of the problem.

@isaacguerreir

I took a look, and the same problem occurs with the SBOL3 specification for types. I added similar changes in the PR to correct the problem with type remapping.
