Sequtils Tutorial

A collection of classes and functions for working with biological sequences.

Overview

There are two public classes:

- SequencePoint, useful for representing mutations, SNPs, PTMs, etc. Its two most important attributes are:
  - SequencePoint.pos, the human-readable position, counting from 1
  - SequencePoint.index, the Python-readable index, counting from 0
- SequenceRange, useful for representing proteins, domains, secondary structure, etc.
  - Its three most important attributes are:
    - SequenceRange.start, a SequencePoint pointing to the first amino acid
    - SequenceRange.stop, a SequencePoint pointing to the last amino acid
    - SequenceRange.slice, the Python slice object covering start to stop, for indexing strings
  - It also has the following two properties for easy conversion to tuples:
    - SequenceRange.pos, a tuple containing (self.start.pos, self.stop.pos)
    - SequenceRange.index, a tuple containing (self.start.index, self.stop.index)
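To make the pos/index convention concrete, here is a minimal sketch of how two such classes could be modelled. MiniPoint and MiniRange are hypothetical stand-ins written for this tutorial, not the actual sequtils implementation. They assume that the slice runs from start.index to stop.index + 1, which matches the slice(97, 127, None) output shown for positions 98 to 127 in the examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniPoint:
    """Hypothetical stand-in for SequencePoint: stores the 1-based position."""

    pos: int  # human-readable, counting from 1

    @property
    def index(self):
        # Python-readable, counting from 0
        return self.pos - 1


@dataclass(frozen=True)
class MiniRange:
    """Hypothetical stand-in for SequenceRange: inclusive start and stop."""

    start: MiniPoint
    stop: MiniPoint

    @property
    def slice(self):
        # Python slices exclude their end, so add 1 to include the stop residue
        return slice(self.start.index, self.stop.index + 1)

    @property
    def pos(self):
        return (self.start.pos, self.stop.pos)

    @property
    def index(self):
        return (self.start.index, self.stop.index)


peptide = MiniRange(MiniPoint(98), MiniPoint(127))
print(peptide.slice)  # slice(97, 127, None)
print(peptide.pos, peptide.index)  # (98, 127) (97, 126)
```

The only conversion rule needed is index = pos - 1; everything else (the slice and both tuples) is derived from it.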

Example Usage

As an example, let's model glucagon:

>>> from sequtils import SequenceRange, SequencePoint
>>> glucagon_sequence = ("MKTIYFVAGLLIMLVQGSWQHALQDTEENPRSFPASQTEAHEDPDEMNEDKRHSQGTFTS"
...                      "DYSKYLDSRRAQDFVQWLMNTKRNRNNIAKRHDEFERHAEGTFTSDVSSYLEGQAAKEFI"
...                      "AWLVKGRGRRDFPEEVAIAEELGRRHADGSFSDEMSTILDNLATRDFINWLIQTKITDKK")
>>> glucagon = SequenceRange(1, seq=glucagon_sequence)
>>> glucagon
SequenceRange(1, 180, seq="MKTIY..ITDKK")

We now have a protein object whose stop was inferred from the sequence. GLP-1 is a peptide within this protein:

>>> glp1 = SequenceRange(98, 127, full_sequence=glucagon_sequence)
>>> glp1
SequenceRange(98, 127, seq="HAEGTFTSDVSSYLEGQAAKEFIAWLVKGR")

This creates a SequenceRange from 98 to 127, with the peptide sequence inferred from the protein sequence.
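Both inferences (the stop from the sequence length, and the peptide sequence from the full sequence) come down to simple arithmetic. The two helper functions below are hypothetical illustrations of that arithmetic using plain strings and integers, not part of sequtils: the stop is start + len(seq) - 1, and the peptide is the full sequence sliced from start - 1 up to (but excluding) index stop.

```python
def infer_stop(start, seq):
    """Hypothetical helper: last 1-based position covered by seq."""
    return start + len(seq) - 1


def infer_peptide(start, stop, full_sequence):
    """Hypothetical helper: cut an inclusive 1-based range out of a sequence."""
    return full_sequence[start - 1:stop]


glucagon_sequence = (
    "MKTIYFVAGLLIMLVQGSWQHALQDTEENPRSFPASQTEAHEDPDEMNEDKRHSQGTFTS"
    "DYSKYLDSRRAQDFVQWLMNTKRNRNNIAKRHDEFERHAEGTFTSDVSSYLEGQAAKEFI"
    "AWLVKGRGRRDFPEEVAIAEELGRRHADGSFSDEMSTILDNLATRDFINWLIQTKITDKK"
)

print(infer_stop(1, glucagon_sequence))  # 180
print(infer_peptide(98, 127, glucagon_sequence))  # HAEGTFTSDVSSYLEGQAAKEFIAWLVKGR
```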

Let's look at the start and stop attributes of the peptide:

>>> glp1.start
SequencePoint(98)

>>> glucagon_sequence[glp1.start.index] == glp1.seq[0]
True

>>> glp1.stop
SequencePoint(127)

>>> glucagon_sequence[glp1.stop.index] == glp1.seq[-1]
True

Let's use the slice object to cut the peptide sequence out of the protein:

>>> glp1.slice
slice(97, 127, None)

>>> glucagon_sequence[glp1.slice]
'HAEGTFTSDVSSYLEGQAAKEFIAWLVKGR'

>>> glp1.seq == glucagon.seq[glp1.slice]
True

GLP-1 is known for its canonical G[KR][KR] motif, the three amino acids flanking its C-terminus. Let's find it:

>>> motif = SequenceRange(1 + glp1.stop.pos, 3 + glp1.stop.pos)
>>> glucagon.seq[motif.slice]
'GRR'
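Instead of computing the motif position from glp1.stop, you can also scan the whole sequence for the G[KR][KR] pattern with Python's standard re module. This sketch uses plain strings only (no sequtils) and reports each hit as an inclusive 1-based (start, stop) range; the first hit starts at 128, directly after GLP-1's stop at 127, and the scan also reports any other occurrence of the pattern.

```python
import re

glucagon_sequence = (
    "MKTIYFVAGLLIMLVQGSWQHALQDTEENPRSFPASQTEAHEDPDEMNEDKRHSQGTFTS"
    "DYSKYLDSRRAQDFVQWLMNTKRNRNNIAKRHDEFERHAEGTFTSDVSSYLEGQAAKEFI"
    "AWLVKGRGRRDFPEEVAIAEELGRRHADGSFSDEMSTILDNLATRDFINWLIQTKITDKK"
)

hits = []
for match in re.finditer(r"G[KR][KR]", glucagon_sequence):
    start_pos = match.start() + 1  # 0-based index of the first residue -> 1-based position
    stop_pos = match.end()  # exclusive 0-based end equals the inclusive 1-based stop
    hits.append((start_pos, stop_pos, match.group()))

print(hits)  # [(128, 130, 'GRR'), (143, 145, 'GRR')]
```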

Math Examples

The objects also support arithmetic, so let's redo the motif search above using arithmetic. First, an explanation.

All arithmetic on these objects is performed on the indexes, so:

>>> SequencePoint(1) + SequencePoint(1)
SequencePoint(1)

>>> SequenceRange(1, 1) + SequenceRange(1, 1)
SequenceRange(1, 1, seq=None)

This is because SequencePoint(1).index is 0, and 0 + 0 = 0.

The above code is equivalent to the following:

>>> SequencePoint.from_index(SequencePoint(1).index + SequencePoint(1).index)
SequencePoint(1)
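Because addition happens on indexes, adding two points at positions a and b gives position (a - 1) + (b - 1) + 1 = a + b - 1, not a + b. The function below is a hypothetical plain-integer sketch of that rule, not a sequtils function.

```python
def add_positions(a, b):
    """Hypothetical sketch: add two 1-based positions via their 0-based indexes."""
    index_sum = (a - 1) + (b - 1)  # convert both positions to indexes and add
    return index_sum + 1  # convert the resulting index back to a 1-based position


print(add_positions(1, 1))  # 1, matching SequencePoint(1) + SequencePoint(1)
print(add_positions(3, 4))  # 6
```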

Arithmetic with scalars is intuitive; adding n shifts the object by n positions:

>>> SequenceRange(2, 5) + 2
SequenceRange(4, 7, seq=None)

>>> SequenceRange(2, 5, seq="EVIL") + 2
SequenceRange(4, 7, seq="EVIL")

It also works for non-scalars, but then seq becomes None because the length has changed:

>>> SequenceRange(2, 5, seq="EVIL") + SequenceRange(3, 6)
SequenceRange(4, 10, seq=None)
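The scalar and range cases can be sketched with plain (start, stop, seq) tuples. shift and add_ranges are hypothetical helpers written for this tutorial, not sequtils functions. They assume, as the outputs above suggest, that a scalar shift keeps seq because the length is unchanged, while adding two ranges always drops it in this sketch.

```python
from typing import Optional, Tuple

Range = Tuple[int, int, Optional[str]]  # (start pos, stop pos, seq)


def shift(rng: Range, offset: int) -> Range:
    """Hypothetical helper: move a range by a scalar number of residues."""
    start, stop, seq = rng
    # The length is unchanged, so the sequence can be kept
    return (start + offset, stop + offset, seq)


def add_ranges(a: Range, b: Range) -> Range:
    """Hypothetical helper: add two ranges element-wise on their indexes."""
    start = (a[0] - 1) + (b[0] - 1) + 1  # add start indexes, convert back to a position
    stop = (a[1] - 1) + (b[1] - 1) + 1  # add stop indexes, convert back to a position
    # The new length generally differs from the old one, so drop the sequence
    return (start, stop, None)


print(shift((2, 5, "EVIL"), 2))  # (4, 7, 'EVIL')
print(add_ranges((2, 5, "EVIL"), (3, 6, None)))  # (4, 10, None)
```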

If you add numbers or tuples, they are assumed to be indexes. Thus all of the following give the GRR motif by moving glp1.stop by (1, 3).

Create a new object by moving glp1.stop:

>>> SequenceRange(glp1.stop + 1, glp1.stop + 3)
SequenceRange(128, 130, seq=None)

Create a new object via arithmetic; here we compute SequencePoint + SequenceRange:

>>> glp1.stop + SequenceRange.from_index(1, 3)
SequenceRange(128, 130, seq=None)

>>> glp1.stop + SequenceRange(2, 4)
SequenceRange(128, 130, seq=None)

Convert the SequencePoint to a SequenceRange and then add an offset tuple. Note that SequencePoint only supports scalar arithmetic, so we have to either convert it to a SequenceRange, as here, or convert the (1, 3) offset to a SequenceRange, as we did above:

>>> SequenceRange(glp1.stop) + (1, 3)
SequenceRange(128, 130, seq=None)
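All of these variants come down to the same index arithmetic, which can be checked with plain integers and string slicing. The sketch below does not use sequtils; it takes GLP-1's stop position (127) from the example above, applies the (1, 3) index offset, and slices out the motif.

```python
glucagon_sequence = (
    "MKTIYFVAGLLIMLVQGSWQHALQDTEENPRSFPASQTEAHEDPDEMNEDKRHSQGTFTS"
    "DYSKYLDSRRAQDFVQWLMNTKRNRNNIAKRHDEFERHAEGTFTSDVSSYLEGQAAKEFI"
    "AWLVKGRGRRDFPEEVAIAEELGRRHADGSFSDEMSTILDNLATRDFINWLIQTKITDKK"
)

glp1_stop_pos = 127
glp1_stop_index = glp1_stop_pos - 1  # 126

# Apply the (1, 3) offset to the index, giving inclusive indexes (127, 129)
motif_start_index = glp1_stop_index + 1
motif_stop_index = glp1_stop_index + 3

# Convert back to 1-based positions for display: (128, 130)
motif_pos = (motif_start_index + 1, motif_stop_index + 1)

# Slices exclude their end, so add 1 to include the last motif residue
motif_seq = glucagon_sequence[motif_start_index:motif_stop_index + 1]

print(motif_pos, motif_seq)  # (128, 130) GRR
```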
