Specify insertion of sequences (sub-parts) in larger Sequences (e.g. plasmids, genomes) #6
Comments
I think you might be able to do this. Assume you want to insert a sequence of 4 bases within another of 8 bases in the middle. You could do this as follows:
CD insert
  Sequence agct
CD original
  Sequence ggggcccc
CD final
  Component insert
  Component original
  SA
    Location 1..4
    Location 9..12
    Component original
  SA
    Location 5..8
    Component insert
  Sequence ggggagctcccc
This specifies that the original sequence can be found split over Locations 1..4 and 9..12, while the insert is found at Location 5..8.
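The coordinates in this example can be checked with a few lines of Python. This is only a sketch: `extract` is a hypothetical helper, not part of any SBOL library, and it assumes that Locations are 1-based, inclusive ranges as written above.

```python
def extract(sequence, ranges):
    """Concatenate the 1-based, inclusive ranges of `sequence`, in order."""
    return "".join(sequence[start - 1:end] for start, end in ranges)

final = "ggggagctcccc"

# The original sequence is split across two Locations on the final construct.
original = extract(final, [(1, 4), (9, 12)])
# The insert occupies the Location in between.
insert = extract(final, [(5, 8)])

print(original)  # ggggcccc
print(insert)    # agct
```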
… On Jan 12, 2017, at 4:44 PM, Raik Grünberg ***@***.***> wrote:
Currently there is no clear way to specify that a sub-part (ComponentDefinition) is inserted at a specific location within another ComponentDefinition.
This would be important for describing genome editing experiments as well as regular plasmid constructs, and it would allow referencing the larger original genome without copying it.
Insertion should also include the possibility of replacing part of the original genome.
|
Nice example. I would re-order things a little bit. According to spec,
Location always needs a sequence on the *parent* ComponentDefinition that
it refers to. (Though I guess we allow that this sequence is missing.) If
we formulate your example from parent to daughters, I think the current
model works:
CD final
  Sequence ggggagctcccc

  Component backbone
  SA
    Location 1..4
    Location 9..12
    Component backbone

  Component insert
  SA
    Location 5..8
    Component insert

CD insert
  Sequence agct (optional)

CD backbone
  Sequence ggggcccc (optional)
The question is: Can we leave out the sequence of "final" at the top and
still assume tools will know exactly how to assemble it from the
sub-sequences? I guess this also relates to the discussion of fully
defined sequences.
BTW, for any features or sub-parts within the backbone, Locations will map
to the ggggcccc "backbone" sequence. So that sequence is not really all that optional.
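To illustrate the last point: a feature annotated on the backbone uses backbone coordinates, and a tool has to translate them through the backbone's Locations on "final". The following is a minimal sketch, assuming 1-based, inclusive ranges listed in the order in which they cover the sub-part. `to_parent` is a hypothetical helper, not a call from any SBOL library.

```python
def to_parent(position, ranges):
    """Map a 1-based position on a sub-part to the parent's coordinates.

    `ranges` are the sub-part's Locations on the parent (1-based, inclusive),
    listed in the order in which they cover the sub-part's sequence.
    """
    if position < 1:
        raise ValueError("positions are 1-based")
    offset = 0  # number of sub-part bases covered by earlier ranges
    for start, end in ranges:
        length = end - start + 1
        if position <= offset + length:
            return start + (position - offset - 1)
        offset += length
    raise ValueError("position lies outside the sub-part")

backbone_on_final = [(1, 4), (9, 12)]

print(to_parent(4, backbone_on_final))  # 4: last 'g' before the insert
print(to_parent(5, backbone_on_final))  # 9: first 'c' after the insert
```

Without the backbone sequence (or at least its length), the ranges alone still allow this translation, but nothing can confirm that a feature's backbone coordinates are valid.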
--
___________________________________
Raik Grünberg
http://www.raiks.de/contact.html
___________________________________
|
I tend to like to declare before use, which is why I put the “daughters” before “parents”, but it is fine either way. I also list Components together and SAs together, since that is how they are serialized, but again, not required.
All sequences are optional, though omitting them would violate some best-practice and completeness rules. I would assume that tooling would be able to create the top sequence for you given the daughter sequences. We do this in SBOLDesigner for abutted parts. Of course, this only works if the daughter sequences are provided.
Chris
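A rough sketch of what such tooling might do follows. This is not SBOLDesigner's code; `assemble` and the tuple layout are assumptions made for illustration. Each daughter contributes its bases to the Locations it occupies on the parent, and the result is only complete if every daughter sequence is provided.

```python
def assemble(placements):
    """Build a parent sequence from daughter sequences and their Locations.

    `placements` is a list of (daughter_sequence, ranges) pairs, where
    `ranges` are 1-based, inclusive Locations on the parent, given in the
    order in which they consume the daughter sequence.
    """
    length = max(end for _, ranges in placements for _, end in ranges)
    parent = [None] * length
    for sequence, ranges in placements:
        covered = sum(end - start + 1 for start, end in ranges)
        if covered != len(sequence):
            raise ValueError("Locations do not match the daughter's length")
        consumed = 0
        for start, end in ranges:
            size = end - start + 1
            # Copy the next `size` bases of the daughter into the parent.
            parent[start - 1:end] = sequence[consumed:consumed + size]
            consumed += size
    if None in parent:
        raise ValueError("parent has positions not covered by any daughter")
    return "".join(parent)

final = assemble([
    ("ggggcccc", [(1, 4), (9, 12)]),  # backbone, split around the insert
    ("agct", [(5, 8)]),               # insert
])
print(final)  # ggggagctcccc
```

The length check also shows why a changed daughter record is a problem: if the insert grows by one base, its Location no longer matches and the parent's Locations have to be recomputed.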
|
OK, so does that mean this is basically a solved issue? In which case we
should put a paragraph into the spec showing how it is done.
Which best practices or validation rules would be violated?
|
Left a comment on sbol-dev. The problems I still see are (1) how to formulate replacements, and (2) that the lengths of all components need to be known, which makes it likely that things break if sub-part records change. |
Yes, I think this case is covered, and adding an example to the specification would certainly be useful.
As to validation rules, I’m not sure now. I looked, and it appears that the language might be such that they are not violated. For example,
sbol-10523: If the sequences property of a ComponentDefinition refers to a Sequence with an IUPAC encoding from Table 1, then each SequenceAnnotation that includes a Range and/or Cut in the sequenceAnnotations property of the ComponentDefinition SHOULD specify a region on the elements of this Sequence.
Reference: Section 7.7 on page 22
Since the rule says this is only checked if the sequence exists, I think it is okay for the case where one is not provided. You could always mock up an example and run it through the validator.
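The conditional nature of the rule can be sketched as follows. This is an illustrative reading of sbol-10523, not the validator's implementation: `range_is_acceptable` is a hypothetical name, Ranges are assumed to be 1-based and inclusive, and the IUPAC-encoding condition is reduced to "a sequence is present".

```python
from typing import Optional

def range_is_acceptable(elements: Optional[str], start: int, end: int) -> bool:
    """Return True if a Range passes a check in the spirit of sbol-10523."""
    if elements is None:
        # No Sequence on the ComponentDefinition: the rule does not apply.
        return True
    # Otherwise the Range should specify a region on the sequence elements.
    return 1 <= start <= end <= len(elements)

print(range_is_acceptable("ggggagctcccc", 9, 12))  # True
print(range_is_acceptable("ggggagctcccc", 9, 13))  # False
print(range_is_acceptable(None, 9, 13))            # True
```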
|
If you want to do a replacement, the means for doing that is MapsTo. Namely, you would break up the object you are working with into Components, one of which is the Component being replaced. You would then use a MapsTo to replace it with a new Component. The one thing I’m not certain of is how the Locations are worked out. I think the notion of flattening has never been precisely defined, so this might need a bit of thought to come up with a coherent algorithm for executing this, but I believe the pieces are there.
As to changing sub-records, this is not exactly permitted. If you are using published records as sub-components, they are REQUIRED to never change. If you change them, you are supposed to mint a new version with a new URI. The one caveat to this is when the reference is by persistentIdentity. However, open issues with using persistentIdentities for references still exist.
Anyway, in both cases, I think clarifications and examples in the specification are clearly needed. I think things are certainly ill-defined enough that we likely can resolve your issues with better definitions. However, we should try a few concrete use cases first to see where it stands.
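As one such concrete use case, here is a sketch of how Locations could be recomputed after a replacement. Since flattening is not precisely defined, this is only one possible reading: it works on plain strings and ranges, does not model MapsTo, and `replace_region` is a hypothetical helper rather than part of any SBOL library. Ranges are assumed to be 1-based and inclusive.

```python
def replace_region(sequence, start, end, replacement, annotations):
    """Replace the 1-based, inclusive region start..end and shift annotations.

    `annotations` maps a name to a (start, end) range on the old sequence.
    Annotations overlapping the replaced region are dropped in this sketch.
    """
    new_sequence = sequence[:start - 1] + replacement + sequence[end:]
    shift = len(replacement) - (end - start + 1)
    shifted = {}
    for name, (a_start, a_end) in annotations.items():
        if a_end < start:        # entirely upstream: unchanged
            shifted[name] = (a_start, a_end)
        elif a_start > end:      # entirely downstream: moved by `shift`
            shifted[name] = (a_start + shift, a_end + shift)
    return new_sequence, shifted

new_seq, new_annotations = replace_region(
    "ggggagctcccc", 5, 8, "tt",
    {"left": (1, 4), "old_insert": (5, 8), "right": (9, 12)},
)
print(new_seq)          # ggggttcccc
print(new_annotations)  # {'left': (1, 4), 'right': (7, 10)}
```

The shift depends on the lengths of both the replaced region and the replacement, which is the length dependency raised earlier in the thread.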
|
Delaying this one until sourceLocation is added, as well as Location on Component (2.3 or 3.0). These changes will make this more flexible. |
Should we restart the conversation on this issue now that it looks like sourceLocation will become part of the spec? |
The needed changes for insertions, deletions, and edits are all there in SBOL 3, but we still need nice examples for people. |
Move to the examples repository |