Map jupyter tutorial on SBOL data model as is
PiotrZakrzewski committed Mar 22, 2024
1 parent 3f1f0f6 commit 9f9e41a
Showing 1 changed file with 260 additions and 40 deletions.
300 changes: 260 additions & 40 deletions docs/sbol_data_model.rst
@@ -23,11 +23,10 @@ It is available on PyPI and can be installed using pip.
This will also install `pySBOL3` and `tyto`, which are dependencies of `sbol_utilities`.

Using the Data Model
====================
Using the SBOLv3 Data Model
===========================

Import Modules
--------------
Import the necessary modules from the `sbol3` and `sbol_utilities` packages.

.. code-block:: python
@@ -37,82 +36,310 @@ Import Modules
from sbol_utilities.helper_functions import url_to_identity
import tyto
Set Default Namespace and Create Document
-----------------------------------------
The examples in this tutorial use a default namespace that ends in the `igem` suffix.

.. code-block:: python
set_namespace('https://synbiohub.org/public/igem/')
doc = Document()
Create and Add Components
-------------------------
GFP Expression Cassette
=======================

Example: Creating a GFP Expression Cassette
Construct a simple part and add it to the Document.

.. code-block:: python
i13504 = Component('i13504', SBO_DNA)
i13504.name = 'iGEM 2016 interlab reporter'
i13504.description = 'GFP expression cassette used for 2016 iGEM interlab study'
i13504.roles.append(tyto.SO.engineered_region)
Add the GFP expression cassette to the document. Notice that the object added is also returned, so this can be used as a pass-through call.

.. code-block:: python
doc.add(i13504)
Construct Part-Subpart Hierarchy
--------------------------------
Expression Cassette parts
==========================

Here we will create a part-subpart hierarchy. We will also start using `SBOL-Utilities <https://github.com/synbiodex/sbol-utilities>`_ to make it easier to create parts and to assemble those parts into a hierarchy.
First, create the RBS component:

.. code-block:: python
b0034, b0034_seq = doc.add(rbs('B0034', sequence='aaagaggagaaa', name='RBS (Elowitz 1999)'))
e0040_sequence = '...'
Next, create the GFP component

.. code-block:: python
e0040_sequence = 'atgcgtaaaggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttcggttatggtgttcaatgctttgcgagatacccagatcatatgaaacagcatgactttttcaagagtgccatgcccgaaggttatgtacaggaaagaactatatttttcaaagatgacgggaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaatagaatcgagttaaaaggtattgattttaaagaagatggaaacattcttggacacaaattggaatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatcaaagttaacttcaaaattagacacaacattgaagatggaagcgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacctgtccacacaatctgccctttcgaaagatcccaacgaaaagagagaccacatggtccttcttgagtttgtaacagctgctgggattacacatggcatggatgaactatacaaataataa'
e0040, _ = doc.add(cds('E0040', sequence=e0040_sequence, name='GFP'))
b0015_sequence = '...'
Finally, create the terminator component

.. code-block:: python
b0015_sequence = 'ccaggcatcaaataaaacgaaaggctcagtcgaaagactgggcctttcgttttatctgttgtttgtcggtgaacgctctctactagagtcacactggctcaccttcgggtgggcctttctgcgtttata'
b0015, _ = doc.add(terminator('B0015', sequence=b0015_sequence, name='double terminator'))
Now construct the part-subpart hierarchy and order the parts: RBS before CDS, then CDS before terminator.

.. code-block:: python
order(b0034, e0040, i13504)
order(e0040, b0015, i13504)
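As a rough illustration of what the two ``order`` calls express, the sketch below treats each call as a pairwise "precedes" constraint and recovers the linear layout with a topological sort. It uses only the Python standard library rather than `sbol_utilities`. The ``precedes`` list and the ``linear_order`` helper are hypothetical names introduced for this example.

```python
from graphlib import TopologicalSorter

# Each tuple mirrors one order(first, second, parent) call from the tutorial:
# the first part must come before the second within i13504.
precedes = [("B0034", "E0040"), ("E0040", "B0015")]


def linear_order(constraints):
    """Return one left-to-right part order that satisfies every constraint."""
    sorter = TopologicalSorter()
    for first, second in constraints:
        # graphlib expects add(node, *predecessors), so `first` precedes `second`.
        sorter.add(second, first)
    return list(sorter.static_order())


layout = linear_order(precedes)
print(" -> ".join(layout))
```

Because the two constraints form a single chain, only one layout satisfies them. With fewer constraints, several layouts could be valid and the sort would return just one of them.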
Linking Components with Interactions
------------------------------------
Location of a SubComponent
==========================

Here we add base coordinates to SubComponents.
First, use compute_sequence to get the full sequence for the BBa_I13504 device
(see http://parts.igem.org/Part:BBa_I13504).

.. code-block:: python
i13504_seq = compute_sequence(i13504)
compute_sequence added Ranges to the subcomponents. Check one of those ranges to see that the values are what we expect.
The expected range of the terminator is (733, 861).

.. code-block:: python
b0015_subcomponent = next(f for f in i13504.features if f.instance_of == b0015.identity)
b0015_range = b0015_subcomponent.locations[0]
print(f'Range of {b0015.display_name}: ({b0015_range.start}, {b0015_range.end})')
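To see where the expected terminator range comes from, the following sketch recomputes 1-based, inclusive ranges for parts laid end to end. It is a standard-library illustration under stated assumptions and is not how ``compute_sequence`` is implemented. The helper ``concatenated_ranges`` is a hypothetical name. The lengths 720 and 129 are assumed values chosen to be consistent with the expected range, and 12 matches the RBS sequence shown earlier.

```python
def concatenated_ranges(parts):
    """Map each part name to its 1-based, inclusive (start, end) range."""
    ranges = {}
    position = 0  # number of bases already placed to the left
    for name, length in parts:
        ranges[name] = (position + 1, position + length)
        position += length
    return ranges


# Assumed lengths: 12 for the RBS sequence shown earlier; 720 and 129 are the
# values consistent with the expected terminator range of (733, 861).
part_lengths = [("B0034", 12), ("E0040", 720), ("B0015", 129)]
ranges = concatenated_ranges(part_lengths)
print(ranges["B0015"])
```

Under these assumptions the terminator starts right after base 732 (12 + 720) and spans 129 bases, which matches the range checked above.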
GFP production from expression cassette
=======================================

In this example, we will create a system representation that includes DNA, proteins, and interactions.
First, create the system representation. functional_component creates this for us.

.. code-block:: python
i13504_system = functional_component('i13504_system')
doc.add(i13504_system)
The system has two physical subcomponents: the expression construct and the expressed GFP protein. We already created the expression construct, so now we create the GFP protein. ed_protein creates an "externally defined protein".

.. code-block:: python
gfp = add_feature(i13504_system, ed_protein('https://www.fpbase.org/protein/gfpmut3/', name='GFP'))
Now create the part-subpart hierarchy.

.. code-block:: python
i13504_subcomponent = add_feature(i13504_system, i13504)
Use a ComponentReference to link SubComponents in a multi-level hierarchy.

.. code-block:: python
e0040_subcomponent = next(f for f in i13504.features if f.instance_of == e0040.identity)
e0040_reference = ComponentReference(i13504_subcomponent, e0040_subcomponent)
i13504_system.features.append(e0040_reference)
Make the Interaction.
Interaction type: SBO:0000589 (genetic production)
Participation roles: SBO:0000645 (template), SBO:0000011 (product)

.. code-block:: python
add_interaction(tyto.SBO.genetic_production,
participants={gfp: tyto.SBO.product, e0040_reference: tyto.SBO.template})
participants={gfp: tyto.SBO.product, e0040_reference: tyto.SBO.template})
Concatenating and Reusing Components
====================================

Connecting the i13504_system with promoters to drive expression is much like building i13504: selecting features and ordering them.
First, we create the two promoters:

.. code-block:: python
J23101_sequence = 'tttacagctagctcagtcctaggtattatgctagc'
J23101, _ = doc.add(promoter('J23101', sequence=J23101_sequence))
J23106_sequence = 'tttacggctagctcagtcctaggtatagtgctagc'
J23106, _ = doc.add(promoter('J23106', sequence=J23106_sequence))
Then we connect them to ComponentReference objects that reference the i13504 SubComponents.

.. code-block:: python
device1 = doc.add(functional_component('interlab16device1'))
device1_i13504_system = add_feature(device1, SubComponent(i13504_system))
order(J23101, ComponentReference(device1_i13504_system, i13504_subcomponent), device1)
device2 = doc.add(functional_component('interlab16device2'))
device2_i13504_system = add_feature(device2, SubComponent(i13504_system))
order(J23106, ComponentReference(device2_i13504_system, i13504_subcomponent), device2)
print(f'Device 1 second subcomponent points to {device1.constraints[0].object.lookup().refers_to.lookup().instance_of}')
Making a Collection
===================

We will just add the two devices that we built here, not all five on the slide.

.. code-block:: python
interlab16 = doc.add(Collection('interlab16',members=[device1, device2]))
print(f'Members are {", ".join(m.lookup().display_id for m in interlab16.members)}')
Creating Strains
================

Describing an engineered strain is much like the other components we have defined, just with different types.
First, we create Component objects for the DH5-alpha E. coli strain and the backbone vector we will use for the transformation.

.. code-block:: python
ecoli = doc.add(strain('Ecoli_DH5_alpha'))
pSB1C3 = doc.add(Component('pSB1C3', SBO_DNA, roles=[tyto.SO.plasmid_vector]))
Now create the engineered strain

Working with SubComponent Locations
-----------------------------------------
.. code-block:: python
device1_ecoli = doc.add(strain('device1_ecoli'))
Create a local description of the vector as the combination of Device 1 and pSB1C3.

.. code-block:: python
plasmid = LocalSubComponent(SBO_DNA, roles=[tyto.SO.plasmid_vector], name="Interlab Device 1 in pSB1C3")
device1_ecoli.features.append(plasmid)
device1_subcomponent = contains(plasmid, device1)
contains(plasmid, pSB1C3)
order(device1, pSB1C3, device1_ecoli)
Finally, put the vector into the transformed strain:

.. code-block:: python
contains(ecoli, plasmid, device1_ecoli)
Defining an abstract interface
==============================

To refer to the GFP, we need to peer down two levels of hierarchy:

.. code-block:: python
gfp_in_i13504_system = add_feature(device1_ecoli, ComponentReference(in_child_of=device1_i13504_system, refers_to=gfp))
gfp_in_strain = add_feature(device1_ecoli, ComponentReference(in_child_of=device1_subcomponent, refers_to=gfp_in_i13504_system))
device1_ecoli.interface = Interface(outputs=[gfp_in_strain])
Linking to a Model
==================

.. code-block:: python
ode_model = doc.add(Model('my_iBioSIM_ODE', 'https://synbiohub...', tyto.EDAM.SBML, tyto.SBO.continuous_framework))
device1_ecoli.models.append(ode_model)
Describing an experimental condition
====================================

First, define M9 media from its recipe. In this case, unfortunately, tyto has a hard time with ambiguities in the catalog, so we have to look up the PubChem compound IDs directly.

.. code-block:: python
pubchem_water = 'https://identifiers.org/pubchem.compound:962'
pubchem_glucose = 'https://identifiers.org/pubchem.compound:5793'
pubchem_MgSO4 = 'https://identifiers.org/pubchem.compound:24083'
pubchem_CaCl2 = 'https://identifiers.org/pubchem.compound:5284359'
The media recipe can be expressed using a map from ingredients to Measure objects:

.. code-block:: python
m9_minimal_media_recipe = {
LocalSubComponent(SBO_FUNCTIONAL_ENTITY, name="M9 salts"): (20, tyto.OM.milliliter),
ed_simple_chemical(pubchem_water): (78, tyto.OM.milliliter),
ed_simple_chemical(pubchem_glucose): (2, tyto.OM.milliliter),
ed_simple_chemical(pubchem_MgSO4): (200, tyto.OM.microliter),
ed_simple_chemical(pubchem_CaCl2): (10, tyto.OM.microliter)
}
m9_media = doc.add(media("M9_media", m9_minimal_media_recipe))
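As a quick sanity check on the recipe, the sketch below adds up the listed volumes after converting them to a common unit. It is plain Python with no SBOL objects. The ingredient labels, the ``MICROLITERS_PER_UNIT`` table, and ``total_microliters`` are hypothetical names used only for this illustration.

```python
# Conversion factors to microliters for the two volume units used in the recipe.
MICROLITERS_PER_UNIT = {"milliliter": 1000, "microliter": 1}

# Same amounts as m9_minimal_media_recipe, keyed by plain ingredient labels.
recipe = {
    "M9 salts": (20, "milliliter"),
    "water": (78, "milliliter"),
    "glucose": (2, "milliliter"),
    "MgSO4": (200, "microliter"),
    "CaCl2": (10, "microliter"),
}


def total_microliters(ingredients):
    """Sum all ingredient volumes, expressed in microliters."""
    return sum(amount * MICROLITERS_PER_UNIT[unit] for amount, unit in ingredients.values())


total_ul = total_microliters(recipe)
print(f"Total volume: {total_ul} microliters ({total_ul / 1000} milliliters)")
```

The three milliliter-scale ingredients add up to 100 mL, and the two salt solutions add a further 210 microliters.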
Then we do the same to describe the sample as a mixture of cells, media, and additional carbon source:

In order to specify the exact range (start and end positions) on the parent component sequence where the child
component is located, use the ``Range`` class. The ``Range`` class takes two required arguments, ``start`` and
``end``, which are the start and end positions of the child component on the parent component sequence.
The ``Range`` class also takes an optional argument, ``sequence``, which is the sequence of the child component.
The ``Range`` class is then used as the value of the ``locations`` attribute of the ``SubComponent``.
Example for a DNA component with a DNA SubComponent:
.. code-block:: python
sample1 = doc.add(functional_component("Sample1"))
add_feature(sample1, m9_media).measures.append(Measure(200, tyto.OM.microliter, types=tyto.SBO.volume))
add_feature(sample1, device1_ecoli).measures.append(Measure(10000, tyto.OM.count, types=tyto.SBO.number_of_entity_pool_constituents))
add_feature(sample1, ed_simple_chemical(pubchem_glucose)).measures.append(Measure(2.5, tyto.OM.milligram, types=tyto.SBO.mass_of_an_entity_pool))
Designing a multi-factor experiment
===================================

Here we will use a CombinatorialDerivation.

First, we create the template Component, using LocalSubComponent placeholders for the variables to fill in, following much the same pattern as for the single sample:

.. code-block:: python
template = doc.add(functional_component("SampleSpec"))
add_feature(template, m9_media).measures.append(Measure(200, tyto.OM.microliter, types=tyto.SBO.volume))
sample_strain = add_feature(template, LocalSubComponent(tyto.NCIT.Strain))
sample_strain.measures.append(Measure(10000, tyto.OM.count, types=tyto.SBO.number_of_entity_pool_constituents))
sample_carbon_source = add_feature(template, LocalSubComponent(SBO_SIMPLE_CHEMICAL))
sample_carbon_source.measures.append(Measure(2.5, tyto.OM.milligram, types=tyto.SBO.mass_of_an_entity_pool))
.. code:: python
For this, we need our sugars to be Component objects that can be referenced independently from the CombinatorialDerivation, rather than Features:

.. code-block:: python
pubchem_arabinose = 'https://identifiers.org/pubchem.compound:5460291'
pubchem_maltose = 'https://identifiers.org/pubchem.compound:6255'
pubchem_lactose = 'https://identifiers.org/pubchem.compound:6134'
arabinose = doc.add(Component(url_to_identity(pubchem_arabinose), SBO_SIMPLE_CHEMICAL))
glucose = doc.add(Component(url_to_identity(pubchem_glucose), SBO_SIMPLE_CHEMICAL))
maltose = doc.add(Component(url_to_identity(pubchem_maltose), SBO_SIMPLE_CHEMICAL))
lactose = doc.add(Component(url_to_identity(pubchem_lactose), SBO_SIMPLE_CHEMICAL))
Then we create the derivation itself as a combination of alternatives:

.. code-block:: python
start = 1
end = 4
sub_sequence = sbol3.Sequence("LysineCodon", elements=b0034_seq.elements[start - 1 : end - 1])
range_location = sbol3.Range(start=start, end=end, sequence=sub_sequence)
subcomponent = sbol3.SubComponent(gfp, name="LysineCodon", roles=[tyto.SO.codon], locations=range_location)
carbon_source_experiment = CombinatorialDerivation("VaryCarbon", template, strategy=SBOL_ENUMERATE)
carbon_source_experiment.variable_features = [
VariableFeature(cardinality=SBOL_ONE, variable=sample_strain, variant_collections=[interlab16]),
VariableFeature(cardinality=SBOL_ONE, variable=sample_carbon_source, variants=[arabinose, glucose, maltose, lactose])
]
.. end
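To make the size of the design space concrete, the sketch below lists every combination that an enumerate strategy would be expected to cover when each variable takes exactly one value. It is a standard-library illustration, not a call into pySBOL3. The string labels stand in for the two devices in the ``interlab16`` collection and the four sugar Components, and the ``samples`` list is a hypothetical name.

```python
from itertools import product

# Stand-ins for the members of the interlab16 collection (sample_strain variants).
strains = ["interlab16device1", "interlab16device2"]
# Stand-ins for the sugar Components (sample_carbon_source variants).
carbon_sources = ["arabinose", "glucose", "maltose", "lactose"]

# One value per variable (cardinality "one"), so the full set of derived
# samples is the Cartesian product of the two variant lists.
samples = [
    {"strain": strain, "carbon_source": sugar}
    for strain, sugar in product(strains, carbon_sources)
]

for sample in samples:
    print(f"{sample['strain']} grown on {sample['carbon_source']}")
```

With two strains and four carbon sources, this yields eight sample specifications.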
Samples in Triplicate
=====================

Document Validation
-------------------
Each sample is represented by an Implementation, to which we attach an FCS file with flow cytometry data from the sample.

.. code-block:: python
replicate1 = doc.add(Implementation("Replicate1", built=sample1))
replicate1.attachments.append(doc.add(Attachment("Replicate1_cytometry_fcs", "https://...")))
replicate2 = doc.add(Implementation("Replicate2", built=sample1))
replicate2.attachments.append(doc.add(Attachment("Replicate2_cytometry_fcs", "https://...")))
replicate3 = doc.add(Implementation("Replicate3", built=sample1))
replicate3.attachments.append(doc.add(Attachment("Replicate3_cytometry_fcs", "https://...")))
Using Provenance to Connect Design, Build and Test
==================================================

We will show how to do one representative link here:

.. code-block:: python
measure_sample_1 = doc.add(Activity("measure_sample_1", types=tyto.NCIT.flow_cytometry, usage=Usage(replicate1.identity)))
doc.find("Replicate1_cytometry_fcs").generated_by.append(measure_sample_1)
Validation
==========

Document.validate returns a validation report. If the report is empty, the document is valid.

.. code-block:: python
@@ -123,10 +350,3 @@ Document Validation
print(f'Document has {len(report.warnings)} warnings')
else:
print('Document is valid')
Exporting the Document
----------------------

.. code-block:: python
doc.write('i13504.nt', file_format=SORTED_NTRIPLES)
