Collection of classes and functions for working with biological sequences.
There are two public classes:
SequencePoint, useful for emulating mutations, SNPs, PTMs, etc. Its two most important attributes are:
    SequencePoint.pos: the human-readable position, counting from 1
    SequencePoint.index: the Python index, counting from 0
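Here is a sketch of how the two attributes relate, assuming `pos` is always `index + 1` as described above. The class name `Point` and its `from_index` constructor are hypothetical stand-ins, not the sequtils implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for SequencePoint: stores the 1-based position."""

    pos: int  # human-readable position, counting from 1

    @property
    def index(self) -> int:
        # Python index, counting from 0
        return self.pos - 1

    @classmethod
    def from_index(cls, index: int) -> "Point":
        # Build a point from a 0-based index
        return cls(index + 1)


sequence = "MKTIY"
p = Point(3)
print(p.index, sequence[p.index])  # 2 T
print(Point.from_index(0))  # Point(pos=1)
```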
SequenceRange, useful for emulating proteins, domains, secondary structure, etc. Its three most important attributes are:
    SequenceRange.start: a SequencePoint pointing to the first amino acid
    SequenceRange.stop: a SequencePoint pointing to the last amino acid
    SequenceRange.slice: the Python slice object, for indexing strings
It also has the following two properties for easy conversion to tuples:
    SequenceRange.pos: tuple containing (self.start.pos, self.stop.pos)
    SequenceRange.index: tuple containing (self.start.index, self.stop.index)
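Here is a minimal sketch of how these attributes fit together, assuming `start` and `stop` are inclusive 1-based positions. A Python slice excludes its end, so the slice runs from `start.index` to `stop.index + 1`, which is simply `stop.pos`. The `Range` class below is a hypothetical simplification that stores plain integers instead of `SequencePoint` objects.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Range:
    """Hypothetical stand-in for SequenceRange using inclusive 1-based positions."""

    start: int  # position of the first residue
    stop: int  # position of the last residue

    @property
    def slice(self):
        # 0-based start, exclusive end, so the last residue is included
        return slice(self.start - 1, self.stop)

    @property
    def pos(self) -> Tuple[int, int]:
        return (self.start, self.stop)

    @property
    def index(self) -> Tuple[int, int]:
        return (self.start - 1, self.stop - 1)


r = Range(2, 4)
print(r.slice, r.pos, r.index)  # slice(1, 4, None) (2, 4) (1, 3)
print("MKTIY"[r.slice])  # KTI
```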
Example code: let's make glucagon.
>>> from sequtils import SequenceRange, SequencePoint
>>> glucagon_sequence = ("MKTIYFVAGLLIMLVQGSWQHALQDTEENPRSFPASQTEAHEDPDEMNEDKRHSQGTFTS"
... "DYSKYLDSRRAQDFVQWLMNTKRNRNNIAKRHDEFERHAEGTFTSDVSSYLEGQAAKEFI"
... "AWLVKGRGRRDFPEEVAIAEELGRRHADGSFSDEMSTILDNLATRDFINWLIQTKITDKK")
>>> glucagon = SequenceRange(1, seq=glucagon_sequence)
>>> glucagon
SequenceRange(1, 180, seq="MKTIY..ITDKK")
We now have a protein object whose stop was inferred from the sequence. GLP-1 is a peptide within glucagon:
>>> glp1 = SequenceRange(98, 127, full_sequence=glucagon_sequence)
>>> glp1
SequenceRange(98, 127, seq="HAEGTFTSDVSSYLEGQAAKEFIAWLVKGR")
This creates a SequenceRange from 98 to 127, with the peptide sequence inferred from the protein sequence.
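Both inferences above reduce to simple arithmetic on inclusive positions. The sketch below shows one plausible way to do them in plain Python. The helper names `infer_stop` and `infer_seq` are hypothetical and are not part of sequtils.

```python
# Same sequence as in the doctest above
glucagon_sequence = (
    "MKTIYFVAGLLIMLVQGSWQHALQDTEENPRSFPASQTEAHEDPDEMNEDKRHSQGTFTS"
    "DYSKYLDSRRAQDFVQWLMNTKRNRNNIAKRHDEFERHAEGTFTSDVSSYLEGQAAKEFI"
    "AWLVKGRGRRDFPEEVAIAEELGRRHADGSFSDEMSTILDNLATRDFINWLIQTKITDKK"
)


def infer_stop(start: int, seq: str) -> int:
    """Hypothetical helper: inclusive 1-based stop of a range starting at `start`."""
    return start + len(seq) - 1


def infer_seq(start: int, stop: int, full_sequence: str) -> str:
    """Hypothetical helper: cut an inclusive 1-based range out of a longer sequence."""
    # Position `start` is index start - 1; the slice end is exclusive,
    # so ending it at `stop` includes index stop - 1
    return full_sequence[start - 1:stop]


print(infer_stop(1, glucagon_sequence))  # 180
print(infer_seq(98, 127, glucagon_sequence))  # HAEGTFTSDVSSYLEGQAAKEFIAWLVKGR
```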
Let's look at the start and stop attributes of the peptide:
>>> glp1.start
SequencePoint(98)
>>> glucagon_sequence[glp1.start.index] == glp1.seq[0]
True
>>> glp1.stop
SequencePoint(127)
>>> glucagon_sequence[glp1.stop.index] == glp1.seq[-1]
True
Let's use the slice object to cut the peptide sequence out of the protein:
>>> glp1.slice
slice(97, 127, None)
>>> glucagon_sequence[glp1.slice]
'HAEGTFTSDVSSYLEGQAAKEFIAWLVKGR'
>>> glp1.seq == glucagon.seq[glp1.slice]
True
GLP-1 is known for its canonical G[KR][KR] motif, formed by the three amino acids flanking its C-terminus. Let's find it:
>>> motif = SequenceRange(1 + glp1.stop.pos, 3 + glp1.stop.pos)
>>> glucagon.seq[motif.slice]
'GRR'
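The motif check itself does not need the sequence classes. This plain-Python sketch writes the motif as the regular expression `G[KR][KR]` and tests the three residues after position 127 with `re.fullmatch`. The variable `glp1_stop_pos` is just an illustrative name.

```python
import re

glucagon_sequence = (
    "MKTIYFVAGLLIMLVQGSWQHALQDTEENPRSFPASQTEAHEDPDEMNEDKRHSQGTFTS"
    "DYSKYLDSRRAQDFVQWLMNTKRNRNNIAKRHDEFERHAEGTFTSDVSSYLEGQAAKEFI"
    "AWLVKGRGRRDFPEEVAIAEELGRRHADGSFSDEMSTILDNLATRDFINWLIQTKITDKK"
)

glp1_stop_pos = 127  # last residue of GLP-1, 1-based
# Positions 128..130 are indexes 127..129, i.e. the slice [127:130]
flank = glucagon_sequence[glp1_stop_pos:glp1_stop_pos + 3]
print(flank)  # GRR
print(re.fullmatch(r"G[KR][KR]", flank) is not None)  # True
```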
The objects also support math, so let's redo the above with math, but first an explanation.
All math on these objects is performed on the indexes, thus:
>>> SequencePoint(1) + SequencePoint(1)
SequencePoint(1)
>>> SequenceRange(1, 1) + SequenceRange(1, 1)
SequenceRange(1, 1, seq=None)
This is because SequencePoint(1).index is 0, and 0 + 0 = 0.
The above code is equivalent to the following:
>>> SequencePoint.from_index(SequencePoint(1).index + SequencePoint(1).index)
SequencePoint(1)
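Because addition works on indexes, adding two points is not the same as adding their positions: the resulting position is `p1 + p2 - 1`. This small sketch uses plain integers to make the arithmetic explicit. The function name `add_points` is hypothetical.

```python
def add_points(pos_a: int, pos_b: int) -> int:
    """Add two 1-based positions the index-based way; returns a 1-based position."""
    index_sum = (pos_a - 1) + (pos_b - 1)  # convert both to indexes and add
    return index_sum + 1  # convert the result back to a position


print(add_points(1, 1))  # 1, matching SequencePoint(1) + SequencePoint(1)
print(add_points(127, 2))  # 128
```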
For scalars the math is straightforward: both ends move by the same amount, and seq is kept:
>>> SequenceRange(2, 5) + 2
SequenceRange(4, 7, seq=None)
>>> SequenceRange(2, 5, seq="EVIL") + 2
SequenceRange(4, 7, seq="EVIL")
It also works for non-scalars, but then seq becomes None, because the length has changed:
>>> SequenceRange(2, 5, seq="EVIL") + SequenceRange(3, 6)
SequenceRange(4, 10, seq=None)
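The two rules above can be sketched with plain `(start, stop)` tuples of 1-based positions. A scalar shifts both ends and keeps the sequence. Adding another range adds index to index and drops the sequence. The functions `shift_range` and `add_ranges` are hypothetical and only mirror the behaviour shown in these doctests.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # (start, stop) as inclusive 1-based positions


def shift_range(span: Span, seq: Optional[str], offset: int) -> Tuple[Span, Optional[str]]:
    # A scalar moves both ends by the same number of residues (the same shift
    # whether counted in indexes or positions), so the length is unchanged
    # and the sequence can be kept
    return (span[0] + offset, span[1] + offset), seq


def add_ranges(a: Span, b: Span) -> Tuple[Span, None]:
    # Convert both ranges to indexes, add end-wise, convert back to positions.
    # The length generally changes, so no sequence is carried over.
    start_idx = (a[0] - 1) + (b[0] - 1)
    stop_idx = (a[1] - 1) + (b[1] - 1)
    return (start_idx + 1, stop_idx + 1), None


print(shift_range((2, 5), "EVIL", 2))  # ((4, 7), 'EVIL')
print(add_ranges((2, 5), (3, 6)))  # ((4, 10), None)
```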
If you add numbers or tuples, they are assumed to be indexes. Thus all of the following give the GRR motif by moving glp1.stop by (1, 3).
Create a new object by moving glp1.stop:
>>> SequenceRange(glp1.stop + 1, glp1.stop + 3)
SequenceRange(128, 130, seq=None)
Create a new object via math; here we perform SequencePoint + SequenceRange:
>>> glp1.stop + SequenceRange.from_index(1, 3)
SequenceRange(128, 130, seq=None)
>>> glp1.stop + SequenceRange(2, 4)
SequenceRange(128, 130, seq=None)
Convert the SequencePoint to a SequenceRange and then add an offset tuple. Note that SequencePoint only knows scalar math, so we have to either convert it to a SequenceRange, as here, or convert the (1, 3) tuple to a SequenceRange, as we did above:
>>> SequenceRange(glp1.stop) + (1, 3)
SequenceRange(128, 130, seq=None)
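This sketch reproduces the last approach with plain integers. It assumes, as stated above, that a bare tuple is read as index offsets. It also assumes that converting a point gives a one-residue range whose start and stop are the same position. The helper names `point_to_range` and `add_index_offsets` are hypothetical.

```python
from typing import Tuple


def point_to_range(pos: int) -> Tuple[int, int]:
    # A single point becomes a one-residue range: start == stop
    return (pos, pos)


def add_index_offsets(span: Tuple[int, int], offsets: Tuple[int, int]) -> Tuple[int, int]:
    # Offsets are raw index deltas; index and position differ by a constant 1,
    # so adding the deltas directly to the positions gives the same shift
    return (span[0] + offsets[0], span[1] + offsets[1])


glp1_stop = 127
motif = add_index_offsets(point_to_range(glp1_stop), (1, 3))
print(motif)  # (128, 130)
```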