diff --git a/docs/sbol_data_model.rst b/docs/sbol_data_model.rst index 5b76d7a..94f3d80 100644 --- a/docs/sbol_data_model.rst +++ b/docs/sbol_data_model.rst @@ -23,11 +23,10 @@ It is available on PyPI and can be installed using pip. This will also install `pySBOL3` and `tyto`, which are dependencies of `sbol_utilities`. -Using the Data Model -==================== +Using the SBOLv3 Data Model +=========================== -Import Modules --------------- +Import the necessary modules from the `sbol3` and `sbol_utilities` packages. .. code-block:: python @@ -37,18 +36,17 @@ Import Modules from sbol_utilities.helper_functions import url_to_identity import tyto -Set Default Namespace and Create Document ------------------------------------------ +We will use the `igem` suffix as the default namespace for the examples in this tutorial. .. code-block:: python set_namespace('https://synbiohub.org/public/igem/') doc = Document() -Create and Add Components -------------------------- +GFP Expression Cassette +======================= -Example: Creating a GFP Expression Cassette +Construct a simple part and add it to the Document. .. code-block:: python @@ -56,63 +54,292 @@ Example: Creating a GFP Expression Cassette i13504.name = 'iGEM 2016 interlab reporter' i13504.description = 'GFP expression cassette used for 2016 iGEM interlab study' i13504.roles.append(tyto.SO.engineered_region) + +Add the GFP expression cassette to the document. Notice that the object added is also returned, so this can be used as a pass-through call. + +.. code-block:: python + doc.add(i13504) -Construct Part-Subpart Hierarchy --------------------------------- +Expression Cassette parts +========================== + +Here we will create a part-subpart hierarchy. We will also start using `SBOL-Utilities ` _ to make it easier to create parts and to assemble those parts into a hierarchy. +First, create the RBS component... .. 
code-block:: python b0034, b0034_seq = doc.add(rbs('B0034', sequence='aaagaggagaaa', name='RBS (Elowitz 1999)')) - e0040_sequence = '...' + +Next, create the GFP component + +.. code-block:: python + + e0040_sequence = 'atgcgtaaaggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttcggttatggtgttcaatgctttgcgagatacccagatcatatgaaacagcatgactttttcaagagtgccatgcccgaaggttatgtacaggaaagaactatatttttcaaagatgacgggaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaatagaatcgagttaaaaggtattgattttaaagaagatggaaacattcttggacacaaattggaatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatcaaagttaacttcaaaattagacacaacattgaagatggaagcgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacctgtccacacaatctgccctttcgaaagatcccaacgaaaagagagaccacatggtccttcttgagtttgtaacagctgctgggattacacatggcatggatgaactatacaaataataa' e0040, _ = doc.add(cds('E0040', sequence=e0040_sequence, name='GFP')) - b0015_sequence = '...' + +Finally, create the terminator component + +.. code-block:: python + + b0015_sequence = 'ccaggcatcaaataaaacgaaaggctcagtcgaaagactgggcctttcgttttatctgttgtttgtcggtgaacgctctctactagagtcacactggctcaccttcgggtgggcctttctgcgtttata' b0015, _ = doc.add(terminator('B0015', sequence=b0015_sequence, name='double terminator')) +Now construct the part-subpart hierarchy and order the parts: RBS before CDS, CDS before terminator + +.. code-block:: python + order(b0034, e0040, i13504) order(e0040, b0015, i13504) -Linking Components with Interactions ------------------------------------- +Location of a SubComponent +========================== + +Here we add base coordinates to SubComponents. +But first, use compute_sequence to get the full sequence for the BBa_I13504 device +See http://parts.igem.org/Part:BBa_I13504 + +.. code-block:: python + + i13504_seq = compute_sequence(i13504) + +compute_sequence added Ranges to the subcomponents. 
Check one of those ranges to see that the values are what we expect. +The expected range of the terminator is (733, 861). + +.. code-block:: python + + b0015_subcomponent = next(f for f in i13504.features if f.instance_of == b0015.identity) + b0015_range = b0015_subcomponent.locations[0] + print(f'Range of {b0015.display_name}: ({b0015_range.start}, {b0015_range.end})') + +GFP production from expression cassette +======================================= + +In this example, we will create a system representation that includes DNA, proteins, and interactions. +First, create the system representation. functional_component creates this for us. .. code-block:: python i13504_system = functional_component('i13504_system') doc.add(i13504_system) +The system has two physical subcomponents, the expression construct and the expressed GFP protein. We already created the expression construct. Now create the GFP protein. ed_protein creates an "externally defined protein" + +.. code-block:: python + gfp = add_feature(i13504_system, ed_protein('https://www.fpbase.org/protein/gfpmut3/', name='GFP')) + +Now create the part-subpart hierarchy. + +.. code-block:: python + i13504_subcomponent = add_feature(i13504_system, i13504) +Use a ComponentReference to link SubComponents in a multi-level hierarchy. + +.. code-block:: python + e0040_subcomponent = next(f for f in i13504.features if f.instance_of == e0040.identity) e0040_reference = ComponentReference(i13504_subcomponent, e0040_subcomponent) i13504_system.features.append(e0040_reference) +Make the Interaction. +Interaction type: SBO:0000589 (genetic production) +Participation roles: SBO:0000645 (template), SBO:0000011 (product) + +.. 
code-block:: python + add_interaction(tyto.SBO.genetic_production, - participants={gfp: tyto.SBO.product, e0040_reference: tyto.SBO.template}) + participants={gfp: tyto.SBO.product, e0040_reference: tyto.SBO.template}) + +Concatenating and Reusing Components +==================================== + +Connecting the i13504_system with promoters to drive expression is much like building i13504: selecting features and ordering them. +First, we create the two promoters: + +.. code-block:: python + + J23101_sequence = 'tttacagctagctcagtcctaggtattatgctagc' + J23101, _ = doc.add(promoter('J23101', sequence=J23101_sequence)) + J23106_sequence = 'tttacggctagctcagtcctaggtatagtgctagc' + J23106, _ = doc.add(promoter('J23106', sequence=J23106_sequence)) + +Then we connect them to ComponentReference objects that reference the i13504 SubComponents. + +.. code-block:: python + + device1 = doc.add(functional_component('interlab16device1')) + device1_i13504_system = add_feature(device1, SubComponent(i13504_system)) + order(J23101, ComponentReference(device1_i13504_system, i13504_subcomponent), device1) + device2 = doc.add(functional_component('interlab16device2')) + device2_i13504_system = add_feature(device2, SubComponent(i13504_system)) + order(J23106, ComponentReference(device2_i13504_system, i13504_subcomponent), device2) + print(f'Device 1 second subcomponent points to {device1.constraints[0].object.lookup().refers_to.lookup().instance_of}') + +Making a Collection +=================== + +We will just add the two devices that we built here, not all five on the slide. + +.. code-block:: python + + interlab16 = doc.add(Collection('interlab16',members=[device1, device2])) + print(f'Members are {", ".join(m.lookup().display_id for m in interlab16.members)}') + +Creating Strains +================ + +Describing an engineered strain is much like the other components we have defined, just with different types. +First, we create Component objects for the DH5-a E. 
coli strain and the backbone vector we will use for the transformation. + +.. code-block:: python + + ecoli = doc.add(strain('Ecoli_DH5_alpha')) + pSB1C3 = doc.add(Component('pSB1C3', SBO_DNA, roles=[tyto.SO.plasmid_vector])) +Now create the engineered strain -Working with SubComponent Locations ------------------------------------------ +.. code-block:: python + + device1_ecoli = doc.add(strain('device1_ecoli')) + +Create a local description of the vector as the combination of Device 1 and pSB1C3. + +.. code-block:: python + + plasmid = LocalSubComponent(SBO_DNA, roles=[tyto.SO.plasmid_vector], name="Interlab Device 1 in pSB1C3") + device1_ecoli.features.append(plasmid) + device1_subcomponent = contains(plasmid, device1) + contains(plasmid, pSB1C3) + order(device1, pSB1C3, device1_ecoli) + +And put the vector into the transformed strain + +.. code-block:: python + + contains(ecoli, plasmid, device1_ecoli) + +Defining an abstract interface +============================== + +To refer to the GFP, we need to peer down two levels of hierarchy + +.. code-block:: python + + gfp_in_i13504_system = add_feature(device1_ecoli, ComponentReference(in_child_of=device1_i13504_system, refers_to=gfp)) + gfp_in_strain = add_feature(device1_ecoli, ComponentReference(in_child_of=device1_subcomponent, refers_to=gfp_in_i13504_system)) + device1_ecoli.interface = Interface(outputs=[gfp_in_strain]) + +Linking to a Model +================== + +.. code-block:: python + + ode_model = doc.add(Model('my_iBioSIM_ODE', 'https://synbiohub...', tyto.EDAM.SBML, tyto.SBO.continuous_framework)) + device1_ecoli.models.append(ode_model) + +Describing an experimental condition +==================================== + +First, define M9 media from its recipe. In this case, unfortunately, tyto has a hard time with ambiguities in the catalog, so we have to look up the PubChem compound IDs directly. + +.. 
code-block:: python + + pubchem_water = 'https://identifiers.org/pubchem.compound:962' + pubchem_glucose = 'https://identifiers.org/pubchem.compound:5793' + pubchem_MgSO4 = 'https://identifiers.org/pubchem.compound:24083' + pubchem_CaCl2 = 'https://identifiers.org/pubchem.compound:5284359' + +The media recipe can be expressed using a map from ingredients to Measure objects: + +.. code-block:: python + + m9_minimal_media_recipe = { + LocalSubComponent(SBO_FUNCTIONAL_ENTITY, name="M9 salts"): (20, tyto.OM.milliliter), + ed_simple_chemical(pubchem_water): (78, tyto.OM.milliliter), + ed_simple_chemical(pubchem_glucose): (2, tyto.OM.milliliter), + ed_simple_chemical(pubchem_MgSO4): (200, tyto.OM.microliter), + ed_simple_chemical(pubchem_CaCl2): (10, tyto.OM.microliter) + } + m9_media = doc.add(media("M9_media", m9_minimal_media_recipe)) + +Then we do the same to describe the sample as a mixture of cells, media, and additional carbon source: -In order to specify the exact range (start and end positions) on the parent component sequence where the child -component is located, use the ``Range`` class. The ``Range`` class takes two required arguments, ``start`` and -``end``, which are the start and end positions of the child component on the parent component sequence. -The ``Range`` class also takes an optional argument, ``sequence``, which is the sequence of the child component. -The ``Range`` class is then used as the value of the ``locations`` attribute of the ``SubComponent``. -Example for a DNA component with a DNA SubComponent: +.. 
code-block:: python + + sample1 = doc.add(functional_component("Sample1")) + add_feature(sample1, m9_media).measures.append(Measure(200, tyto.OM.microliter, types=tyto.SBO.volume)) + add_feature(sample1, device1_ecoli).measures.append(Measure(10000, tyto.OM.count, types=tyto.SBO.number_of_entity_pool_constituents)) + add_feature(sample1, ed_simple_chemical(pubchem_glucose)).measures.append(Measure(2.5, tyto.OM.milligram, types=tyto.SBO.mass_of_an_entity_pool)) + +Designing a multi-factor experiment +=================================== + +Here we will use a CombinatorialDerivation + +First, we create the template Component, using LocalSubComponent placeholders for the variables to fill in, following much the same pattern as for the single sample: + +.. code-block:: python + + template = doc.add(functional_component("SampleSpec")) + add_feature(template, m9_media).measures.append(Measure(200, tyto.OM.microliter, types=tyto.SBO.volume)) + sample_strain = add_feature(template, LocalSubComponent(tyto.NCIT.Strain)) + sample_strain.measures.append(Measure(10000, tyto.OM.count, types=tyto.SBO.number_of_entity_pool_constituents)) + sample_carbon_source = add_feature(template, LocalSubComponent(SBO_SIMPLE_CHEMICAL)) + sample_carbon_source.measures.append(Measure(2.5, tyto.OM.milligram, types=tyto.SBO.mass_of_an_entity_pool)) -.. code:: python +For this, we need our sugars to be Component objects that can be referenced independently from the CombinatorialDerivation, rather than Features: + +.. 
code-block:: python + + pubchem_arabinose = 'https://identifiers.org/pubchem.compound:5460291' + pubchem_maltose = 'https://identifiers.org/pubchem.compound:6255' + pubchem_lactose = 'https://identifiers.org/pubchem.compound:6134' + + arabinose = doc.add(Component(url_to_identity(pubchem_arabinose), SBO_SIMPLE_CHEMICAL)) + glucose = doc.add(Component(url_to_identity(pubchem_glucose), SBO_SIMPLE_CHEMICAL)) + maltose = doc.add(Component(url_to_identity(pubchem_maltose), SBO_SIMPLE_CHEMICAL)) + lactose = doc.add(Component(url_to_identity(pubchem_lactose), SBO_SIMPLE_CHEMICAL)) + +Then we create the derivation itself as a combination of alternatives: + +.. code-block:: python - start = 1 - end = 4 - sub_sequence = sbol3.Sequence("LysineCodon", elements=b0034_seq.elements[start - 1 : end - 1]) - range_location = sbol3.Range(start=start, end=end, sequence=sub_sequence) - subcomponent = sbol3.SubComponent(gfp, name="LysineCodon", roles=[tyto.SO.codon], locations=range_location) + carbon_source_experiment = CombinatorialDerivation("VaryCarbon", template, strategy=SBOL_ENUMERATE) + carbon_source_experiment.variable_features = [ + VariableFeature(cardinality=SBOL_ONE, variable=sample_strain, variant_collections=[interlab16]), + VariableFeature(cardinality=SBOL_ONE, variable=sample_carbon_source, variants=[arabinose, glucose, maltose, lactose]) + ] -.. end +Samples in Triplicate +===================== -Document Validation -------------------- +Each sample is represented by an Implementation, to which we attach an FCS file with flow cytometry data from the sample. + +.. 
code-block:: python + + replicate1 = doc.add(Implementation("Replicate1", built=sample1)) + replicate1.attachments.append(doc.add(Attachment("Replicate1_cytometry_fcs", "https://..."))) + replicate2 = doc.add(Implementation("Replicate2", built=sample1)) + replicate2.attachments.append(doc.add(Attachment("Replicate2_cytometry_fcs", "https://..."))) + replicate3 = doc.add(Implementation("Replicate3", built=sample1)) + replicate3.attachments.append(doc.add(Attachment("Replicate3_cytometry_fcs", "https://..."))) + +Using Provenance to Connect Design, Build and Test +================================================== + +We will show how to do one representative link here: + +.. code-block:: python + + measure_sample_1 = doc.add(Activity("measure_sample_1", types=tyto.NCIT.flow_cytometry, usage=Usage(replicate1.identity))) + doc.find("Replicate1_cytometry_fcs").generated_by.append(measure_sample_1) + +Validation +========== + +Document.validate returns a validation report. If the report is empty, the document is valid. .. code-block:: python @@ -123,10 +350,3 @@ Document Validation print(f'Document has {len(report.warnings)} warnings') else: print('Document is valid') - -Exporting the Document ----------------------- - -.. code-block:: python - - doc.write('i13504.nt', file_format=SORTED_NTRIPLES)