---
jupyter:
  jupytext:
    cell_metadata_filter: name,-all
    formats: ipynb,Rmd
    text_representation:
      extension: .Rmd
      format_name: rmarkdown
      format_version: '1.2'
      jupytext_version: 1.4.2
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
---
# String manipulation
String manipulation is one of Python's strong suits. Strings come with many built-in methods, and the standard-library `re` module (for *regular expressions*) multiplies that power many times over.
Strings are objects written between quotes (single or double). We can also check whether a variable is a string.
```{python 06-python-appl-1}
a = 'Les Miserable'
type(a)
```
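`type` tells you the class of an object. For checks inside your own code, `isinstance` is often more convenient because it returns a boolean. A minimal sketch:

```python
a = 'Les Miserable'

# type() reports the class of the object
print(type(a))              # <class 'str'>

# isinstance() returns True/False, handy inside an if statement
print(isinstance(a, str))   # True
print(isinstance(42, str))  # False
```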
Strings are a little funny. They look like a single value, but they can act like lists: a string is really an ordered container of characters. So we can find its length and slice it.
```{python 06-python-appl-2}
len(a)
```
```{python 06-python-appl-3}
a[:4]
```
```{python 06-python-appl-4}
a[3:6]
```
The indexing rules are the same as for lists. To make this explicit, consider the word 'bare' and the position of each of its characters.

| string         | b   | a   | r   | e   |
| -------------- | --- | --- | --- | --- |
| index          | 0   | 1   | 2   | 3   |
| negative index | -4  | -3  | -2  | -1  |
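The table translates directly into indexing and slicing. A short sketch using the same word:

```python
word = 'bare'

# positive indices count from the left, starting at 0
print(word[0])    # b
print(word[3])    # e

# negative indices count from the right, starting at -1
print(word[-1])   # e
print(word[-4])   # b

# a slice [start:stop] includes start but excludes stop
print(word[1:3])  # ar
print(word[-3:])  # are
```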
We can also slice strings (and lists, for that matter) with a step. Going back to `a`,
```{python 06-python-appl-5}
a[::2]
```
returns every other character, starting with the first.
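The general form is `[start:stop:step]`, where omitted values default to covering the whole string. A small sketch of a shifted start and of a negative step, which walks the string backwards:

```python
a = 'Les Miserable'

print(a[::2])    # every other character from index 0: LsMsrbe
print(a[1::2])   # every other character from index 1: e ieal
print(a[::-1])   # negative step reverses the string: elbaresiM seL
```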
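Strings come with many built-in methods for manipulating them.
```{python 06-python-appl-6}
# print() each result so every line shows up, not just the last one
print('White Knight'.capitalize())               # first letter upper, rest lower
print("It's just a flesh wound".count('u'))      # number of occurrences of 'u'
print('Almond'.endswith('nd'))                   # True or False
print('White Knight'.lower())
print('White Knight'.upper())
print('flesh wound'.replace('flesh', 'bullet'))
print(' This is my song '.strip())               # drops leading/trailing whitespace
print('Hello, hello, hello'.split(','))          # note the leading spaces in the pieces
```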
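None of these methods change the original string. Python strings are immutable, so each method returns a new string. A minimal sketch:

```python
title = 'White Knight'

# upper() returns a new string; the original is unchanged
shouted = title.upper()
print(shouted)   # WHITE KNIGHT
print(title)     # White Knight

# item assignment is not allowed on strings
try:
    title[0] = 'w'
except TypeError as err:
    print('TypeError:', err)

# to "change" a string, rebind the name to a new string
title = title.replace('White', 'Red')
print(title)     # Red Knight
```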
One of the most powerful string methods is `join`. It takes a list (or any iterable) of strings and concatenates them, inserting the string it is called on as the separator.
```{python 06-python-appl-7}
' '.join(['This','is','my','song'])
```
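`join` pairs naturally with `split`, which goes the other way. One caveat: every item must already be a string, so numbers need converting first. A short sketch:

```python
words = 'This is my song'.split()   # split on runs of whitespace
print(words)                         # ['This', 'is', 'my', 'song']

# join undoes the split, with whatever separator we choose
print(' '.join(words))               # This is my song
print('-'.join(words))               # This-is-my-song

# join only accepts strings, so convert numbers first
numbers = [1, 2, 3]
print(', '.join(str(n) for n in numbers))   # 1, 2, 3
```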
Also recall that strings support a form of "string arithmetic": `+` concatenates and `*` repeats.
```{python 06-python-appl-8}
'g' + 'a' + 'f' + 'f' + 'e'
'a '*5
```
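`+` does not convert other types for you, so combining a string with a number raises a `TypeError`. A minimal sketch of the error and the usual fix with `str()`:

```python
try:
    'Track ' + 7          # no automatic conversion of the int
except TypeError as err:
    print('TypeError:', err)

label = 'Track ' + str(7)  # convert explicitly instead
print(label)               # Track 7
```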
## String formatting
In older code, you will often see the `str.format` method, which fills the `{}` placeholders in a string.
```{python 06-python-appl-9}
var = 'horse'
var2 = 'car'
s = 'Get off my {}!'
s.format(var)
s.format(var2)
```
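Placeholders can also be numbered, so values can be reused or reordered, and a format specification after a colon controls how a value is rendered. A short sketch:

```python
# numbered placeholders can be reused or reordered
swapped = '{0} and {1}, {1} and {0}'.format('horse', 'car')
print(swapped)   # horse and car, car and horse

# a format spec after the colon controls how the value is shown
rounded = '{:.2f}'.format(3.14159)
print(rounded)   # 3.14

padded = '{:>8}|'.format('car')
print(padded)    # 'car' right-aligned in a field 8 characters wide
```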
This is great for templates, since named placeholders can be filled in by keyword.
```{python 06-python-appl-10}
template_string = """
{country}, our native village
There was a {species} tree.
We used to sleep under it.
"""
print(template_string.format(country='India', species='banyan'))
print(template_string.format(country='Canada', species='maple'))
```
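When the values already live in dictionaries (for example, one per record), `**` unpacks each one into keyword arguments for `format`. A sketch, where the list `places` is hypothetical data made up for illustration:

```python
template_string = """
{country}, our native village
There was a {species} tree.
We used to sleep under it.
"""

# hypothetical records, e.g. rows read from a file
places = [
    {'country': 'India', 'species': 'banyan'},
    {'country': 'Canada', 'species': 'maple'},
]

# ** turns each dict into keyword arguments: country=..., species=...
for place in places:
    print(template_string.format(**place))
```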
Python 3.6 introduced *f-strings* (formatted string literals), which let you put variables directly inside the braces. They are often easier to read than `str.format`, and they are typically faster too.
```{python 06-python-appl-11}
country = 'USA'
f"This is my {country}!"
```
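The braces in an f-string can hold any expression, and they accept the same format specifications as `str.format`. A short sketch; the `=` form at the end needs Python 3.8 or later:

```python
import math

name = 'banyan'

# any expression can go inside the braces
print(f"{name.capitalize()} has {len(name)} letters")   # Banyan has 6 letters

# format specs work as with str.format
print(f"pi is roughly {math.pi:.3f}")                   # pi is roughly 3.142

# Python 3.8+: '=' prints the expression alongside its value
print(f"{len(name)=}")                                  # len(name)=6
```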
## Regular expressions
Regular expressions are amazingly powerful tools for string search and manipulation. They are available in pretty much every
computer language in some form or other. I'll provide a short and far from comprehensive introduction here. The website [regex101.com](https://regex101.com) is a really good resource for learning and testing your regular expressions.
### Pattern matching
| Syntax  | Description                                                                        |
| ------- | ---------------------------------------------------------------------------------- |
| `.`     | Matches any single character (except a newline)                                    |
| `^`     | Matches at the beginning of a string                                               |
| `$`     | Matches at the end of a string                                                     |
| `*`     | Matches 0 or more repetitions of the previous character                            |
| `+`     | Matches 1 or more repetitions of the previous character                            |
| `?`     | Matches 0 or 1 repetitions of the previous character                               |
| `{m}`   | Matches exactly `m` repetitions of the previous character                          |
| `{m,n}` | Matches from `m` to `n` repetitions of the previous character                      |
| `\`     | Escape character: makes a special character literal (e.g. `\.` matches a period)  |
| `[ ]`   | A set of characters (e.g. `[A-Z]` will match any capital letter)                   |
| `( )`   | Groups a pattern so it can be repeated or captured as a unit                       |
| `\|`    | OR: matches the pattern on either side                                             |
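A sketch of several of these pieces with Python's `re` module. The sample text is made up for illustration, and patterns are written as raw strings (`r'...'`) so that backslashes reach the regex engine unchanged:

```python
import re

text = 'Call 555-1234 or 555-9876 before 5pm.'

# [ ] and {m}: three digits, a literal '-', then four digits
phone = r'[0-9]{3}-[0-9]{4}'
print(re.findall(phone, text))                 # ['555-1234', '555-9876']

# +: one or more digits in a row
print(re.findall(r'[0-9]+', text))             # ['555', '1234', '555', '9876', '5']

# ^ anchors at the start; re.search returns a Match object or None
print(re.search(r'^Call', text) is not None)  # True
print(re.search(r'^555', text))               # None

# \. escapes the dot; $ anchors at the end
print(re.search(r'pm\.$', text) is not None)  # True

# ( ) groups: capture each part of the number separately
for m in re.finditer(r'([0-9]{3})-([0-9]{4})', text):
    print(m.group(1), m.group(2))

# | means OR; re.sub replaces every match
print(re.sub(r'Call|before', '_', text))      # _ 555-1234 or 555-9876 _ 5pm.

# ? makes the preceding character optional
print(re.findall(r'colou?r', 'color colour colr'))   # ['color', 'colour']
```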
# BioPython
BioPython is a package for bioinformatics work. As with many Python packages, its design reflects the needs of its developers, so it might not meet everyone's needs.
You can install BioPython using `conda install biopython` (or `pip install biopython`).
Here is a short example:
```{python 06-python-appl-12}
from Bio.Seq import Seq
#create a sequence object
my_seq = Seq("CATGTAGACTAG")
#print out some details about it
print("seq %s is %i bases long" % (my_seq, len(my_seq)))
print("reverse complement is %s" % my_seq.reverse_complement())
print("protein translation is %s" % my_seq.translate())
```
BioPython can also query databases such as NCBI's `Entrez`, read and write sequence files (e.g. FASTA), perform sequence alignments, and much more.