<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Language" content="en-us" />
<title>/swcadelaide</title>
</head>
<body><br
/><b>Day 1</b><br
/><br
/>9:00-10:30 Python basics<br
/><br
/>1. The Zen of Python, by Tim Peters<br
/>>>>import this<br
/><br
/>2. Convoluted counterpart of good old 'pwd' <br
/>>>>import os<br
/>>>>os.getcwd()<br
/>'/home/swc_trainee/Desktop/scripts'<br
/><br
/>3. To extend range() to floats you can use a list comprehension:<br
/>>>> [x + 0.5 for x in range(0, 10)]<br
/>[0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]<br
/><br
/><br
/><i>(10:30-11:00 break)</i><br
/>11:00-12:30 Python data structures<br
/><br
/><br
/><br
/><i>(12:30-1:00 lunch)</i><br
/>1:00-2:00 Python control flow<br
/>2:00-3:00 Python functions and modules<br
/><i>(3:00-3:30 break) </i><br
/>3:30-4:30 classes and objects<br
/><br
/><br
/><b>Day 2</b><br
/><br
/>9:00-10:30 testing<br
/><i>(10:30-11:00 break)</i><br
/>11:00-12:30 testing<br
/><i>(12:30-1:00 lunch)</i><br
/>1:00-3:00 version control<br
/><i>(3:00-3:30 break)</i><br
/>3:30-4:30 documentation<br
/><br
/> <br
/>Python cheatsheet:<br
/> # How to comment:<br
/># One line comment is with "#"<br
/># Multi-line comments start and end with """ or '''<br
/><br
/># How to define variables<br
/># string<br
/>dog_dragon = "falcon" # double-quote or single-quote, doesn't matter<br
/># integer<br
/>number_of_seasons = 4<br
/>#be careful!<br
/>number_of_seasons / 5<br
/># Python 2 returns 0 (integer division truncates the result)!<br
/># instead:<br
/>number_of_seasons / float(5)<br
/>#= 0.8. That's better.<br
/># float<br
/>half = 0.5<br
/># boolean<br
/>is_world_ending = False<br
/><br
/><br
/># How to get help<br
/>help(str)<br
/><br
/># How to get the type of a variable<br
/>type(dog_dragon)<br
/># returns &lt;type 'str'&gt;<br
/><br
/># How to print<br
/>print "hello world"<br
/># or<br
/>print("hello world")<br
/><br
/># get your current working directory<br
/>import os<br
/>os.getcwd()<br
/><br
/># How to create a list containing the arithmetic progression from 0-9<br
/>range(10)<br
/># How to generate the list from 1-9<br
/>range(1,10)<br
/># how to change the step size of the progression from 1 to 5<br
/>range(1,10,5)<br
/><br
/># How to define a function<br
/>def a_name(first_argument, second_argument, argument_with_default = 5):<br
/> """ This is a simple function, and here is a simple description.<br
/> Remember to always indent! <br
/> """<br
/> print first_argument, second_argument, argument_with_default<br
/> x = 4<br
/> return x<br
/><br
/># How to define a class<br
/>class Person:<br
/>    # The initialization method<br
/>    def __init__(self, person_name):<br
/>        self.name = person_name<br
/>    # An example method<br
/>    def introduce(self):<br
/>        return "Hello, my name is " + self.name<br
/><br
/># How to open a file for reading<br
/>fh = open("hello.txt", "r")<br
/># Identical<br
/>fh = open("hello.txt")<br
/># How to read a line from a file<br
/>fh.readline()<br
/># How to read three characters from a file<br
/>fh.read(3)<br
/># How to write to a file<br
/>write_file = open("example.txt", "w")<br
/># How to write to that file<br
/>write_file.write("Hello, I'm a line\n")<br
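/><br
/>The file handles opened above are never closed. Here is a minimal sketch of the same write and read steps using a <i>with</i> block, which closes the file for you when the block ends. The temporary directory and the second line of text are assumptions made only for this sketch, and it is written so that it also runs under Python 3:<br
/><pre>
```python
import os
import tempfile

# Work in a throwaway directory (an assumption for this sketch only).
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# "w" opens the file for writing; the with-block closes it when it ends.
with open(path, "w") as write_file:
    write_file.write("Hello, I'm a line\n")
    write_file.write("And I'm a second line\n")

# Reopen for reading: readline() returns one line including its newline,
# and read(3) returns the next three characters after that line.
with open(path) as fh:
    first_line = fh.readline()
    next_three = fh.read(3)

print(first_line.rstrip())
print(next_three)
```
</pre>Note that read(3) continues from wherever the previous read stopped, so here it returns the start of the second line.<br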
/><br
/># How to import a module<br
/>import Bio<br
/># How to import a specific name (here the SeqIO submodule) from a package<br
/>from Bio import SeqIO<br
/><br
/># Some important modules for bioinformaticians:<br
/>#biopython for all kinds of sequence manipulations, phylogenetic trees, FASTQ/FASTA parsing etc.<br
/>#matplotlib for plotting graphs<br
/>#numpy for matrix manipulations<br
/>#pysam for reading and writing SAM and BAM-files<br
/>#scipy for all kinds of statistics<br
/><br
/><br
/>#Start python<br
/>python<br
/>#quit the python console<br
/>quit()<br
/>#Run python script<br
/>python example.py<br
/><br
/><br
/>Try defining a string, an integer, a float<br
/>For extra fun, try adding each type together. Add an integer to a string, a float to an integer, a float to a string. See what happens.<br
/><br
/>Hint: try converting your numeric types to strings to get the concatenations to work<br
/>Convert-methods: str(), float(), int()<br
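/><br
/>One way the experiment above might play out, as a small sketch. The variable names are made up for illustration, and the code is written so that it also runs under Python 3:<br
/><pre>
```python
name = "season"          # string
count = 4                # integer
half = 0.5               # float

# Numbers mix freely: int + float gives a float.
total = count + half

# A string and a number do not mix: "season" + 4 raises TypeError.
try:
    name + count
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Convert explicitly to make the concatenation work.
label = name + " " + str(count)
parsed = int("4") + float("0.5")

print(total)
print(mixed_ok)
print(label)
print(parsed)
```
</pre>The conversion has to go in the direction you want: str() turns the number into text for concatenation, while int() and float() turn text into numbers for arithmetic.<br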
/><br
/># use dir() on an object to peek inside that object to see what methods are available<br
/>dir('string')<br
/><br
/><br
/># indices in Python are zero-based<br
/>my_list = [1,2,3,4,5,4]<br
/># the first element in the list<br
/>my_list[0]<br
/># the last element in the list<br
/>my_list[-1]<br
/><br
/>...<br
/>import antigravity<br
/>import this<br
/>from __future__ import braces<br
/><br
/> <br
/><br
/><b>Exercise 1</b><br
/><br
/>Write a Python script that takes a list of three<br
/>or more words as arguments and prints those words<br
/>separated by commas and sorted alphabetically, with<br
/>the final word preceded by "and", with a period at<br
/>the end.<br
/><br
/>1. get the list of arguments<br
/>2. sort them<br
/>3. join all of the words except the last one with a comma<br
/>[4. capitalize the first letter]<br
/>5. append the word "and" and the final word, and a period<br
/>6. print the result<br
/><br
/>Extra credit: capitalize the first letter.<br
/><br
/>For example:<br
/><br
/>python my_script.py apple strawberry banana<br
/>apple, banana, and strawberry.<br
/><br
/>Sample solution:<br
/><i>import sys</i><br
/><i># 1. get the list of arguments</i><br
/><i># 2. sort them</i><br
/><i>sorted_list = sorted(sys.argv[1:])</i><br
/><i># [4. capitalize the first letter] </i><br
/><i>sorted_list[0] = sorted_list[0].capitalize()</i><br
/><br
/><i>#3. join all of the words except the last one with a comma</i><br
/><i>#5. append the word "and" and the final word, and a period</i><br
/><i>#6. print the result</i><br
/><i>print ', '.join(sorted_list[:-1]) + ', and ' + sorted_list[-1] + '.'</i><br
/><br
/><br
/>You can use the Ctrl + Alt + K key combination to enable/disable the catching of Alt+Tab and Print Screen keys within the NX session. <- brilliant to know for future setups!<br
/><br
/><br
/><br
/># SETS<br
/><br
/># list set functions<br
/>dir(set)<br
/><br
/><b>Exercise 2</b><br
/><br
/>given a string (a sentence), find out how many<br
/>unique letters A-Z it contains - capital and<br
/>lower case shouldn't be double-counted<br
/><br
/>'AaAa'<br
/><br
/>input: some string<br
/><br
/>input_string = 'some string here'<br
/>...<br
/>print (the number of unique letters in the string)<br
/><br
/><br
/><b>Exercise 2 Solution</b><br
/><br
/>input_string = input_string.lower()<br
/><br
/># keep only the letters, so spaces and punctuation are not counted<br
/>letters = set(c for c in input_string if c.isalpha())<br
/>length = len(letters)<br
/>print letters<br
/>print length<br
/><br
/>Alternate, slightly advanced solution:<br
/>def count_unique(sentence):<br
/>    #create a string containing every character that is not a letter<br
/>    delchars = ''.join(c for c in map(chr, range(256)) if not c.isalpha())<br
/>    #use .upper() to ensure we don't double count<br
/>    #use the two-argument form of str.translate() (Python 2) to remove the non-letter characters that we looked up before: <a href="http://docs.python.org/2/library/string.html#string.translate">http://docs.python.org/2/library/string.html#string.translate</a><br
/>    sentence = set(sentence.upper().translate(None, delchars))<br
/>    print sentence<br
/>    return len(sentence)<br
/><br
/>extra credit:<br
/><br
/>given two sentences, find the # of letters they have in common, and<br
/>the number of letters that are unique to each<br
/><br
/><br
/>e.g.:<br
/><br
/>string1 = 'AAAAaaaa'<br
/>string2 = 'AAAAaaaaBBBB'<br
/><br
/>string1 = string1.lower()<br
/>string2 = string2.lower()<br
/><br
/><br
/><br
/>set1 = set(string1)<br
/>set2 = set(string2)<br
/><br
/><br
/><br
/>print (# of letters in common)<br
/>print (# of letters unique to sentence 1)<br
/>print (# of letters unique to sentence 2)<br
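/><br
/>One possible way to fill in the three print statements above, sketched with the set methods that dir(set) lists. The names <i>common</i>, <i>only_in_1</i> and <i>only_in_2</i> are made up for this sketch, and the isalpha() filter is an assumption about how to ignore spaces and punctuation:<br
/><pre>
```python
string1 = 'AAAAaaaa'
string2 = 'AAAAaaaaBBBB'

# Lower-case first so that 'A' and 'a' count as the same letter,
# and keep only alphabetic characters.
set1 = set(c for c in string1.lower() if c.isalpha())
set2 = set(c for c in string2.lower() if c.isalpha())

common = set1.intersection(set2)      # letters in both sentences
only_in_1 = set1.difference(set2)     # letters unique to sentence 1
only_in_2 = set2.difference(set1)     # letters unique to sentence 2

print(len(common))
print(len(only_in_1))
print(len(only_in_2))
```
</pre>For these two strings the only shared letter is 'a', and 'b' is the only letter unique to the second sentence.<br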
/><br
/><br
/><br
/># dictionaries<br
/><br
/>my_dict = {'a' : 1, 'b' :2, 'c' :3}<br
/><br
/># 'a' key<br
/># 1 value<br
/><br
/><br
/>my_dict['a'] = 4<br
/><br
/>my_dict.keys()<br
/><br
/>my_dict['a']<br
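/><br
/>A few more everyday dictionary operations, as a short sketch building on my_dict. The extra key 'd' and the names <i>has_b</i> and <i>missing</i> are invented for illustration, and the code is written so that it also runs under Python 3:<br
/><pre>
```python
my_dict = {'a': 1, 'b': 2, 'c': 3}

# Assigning to an existing key overwrites its value ...
my_dict['a'] = 4
# ... and assigning to a new key adds an entry.
my_dict['d'] = 5

# 'in' tests for a key; get() returns a default instead of raising KeyError.
has_b = 'b' in my_dict
missing = my_dict.get('z', 0)

# Loop over the keys in sorted order and look up each value.
for key in sorted(my_dict):
    print(key + ' = ' + str(my_dict[key]))
```
</pre>Looking up a missing key with my_dict['z'] raises KeyError, which is why get() with a default is handy when counting things.<br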
/><br
/><br
/><br
/><b>Exercise 3</b><br
/><br
/> <br
/><br
/>create a dictionary in the following format:<br
/>{'G': (# of occurrences in the string),<br
/>'A': ...<br
/>}<br
/><br
/>print the dictionary<br
/><br
/>hint: strings have a "count" method - see the help function to find out how to use it<br
/><br
/><br
/>extra credit: print the GC content (the proportion of the string that is either G's or C's, from 0 to 1)<br
/><br
/><br
/><b>Exercise 3 Solution</b><br
/><br
/>help(str.count)<br
/><br
/>input_string = "GATCAGTCGATCGACTGCTAGCTAGCTAGTACGGCGTATA"<br
/>countA = input_string.count('A')<br
/>countC = input_string.count('C')<br
/>countT = input_string.count('T')<br
/>countG = input_string.count('G')<br
/><br
/>dna_dict = {'A' : countA, 'G' : countG, 'C' : countC, 'T' : countT}<br
/><br
/>dna_dict['A']<br
/><br
/># GC content<br
/>print float(dna_dict['G'] + dna_dict['C']) / len(input_string)<br
/><br
/>Control Flow:<br
/><br
/>Selection:<br
/><br
/>import sys<br
/><br
/>number = int(sys.argv[1])<br
/><br
/>if number % 2 == 0:<br
/>    print 'EVEN'<br
/>else:<br
/>    print 'ODD'<br
/> <br
/>python checkoddeven.py 3<br
/><br
/>For Loop:<br
/><br
/>fruits = ['apple', 'orange', 'peach']<br
/><br
/>for fruit in fruits:<br
/> print "I am a " + fruit + "."<br
/> <br
/> # print fruit, len(fruit)<br
/> <br
/> <br
/>#Exercise: For each item in fruits, print the content of the item and the length of the item<br
/> <br
/><br
/># print fruit, len(fruit)<br
/><br
/><br
/>While Loop:<br
/><br
/>Exercise 1<br
/><br
/>reader = open('fruits.txt', 'r')<br
/>line = reader.readline()<br
/><br
/># readline() returns an empty string at the end of the file<br
/>while line != '':<br
/>    print line<br
/>    line = reader.readline()<br
/><br
/>Exercise 2<br
/><br
/>wget -U firefox <a href="http://www.gutenberg.org/cache/epub/76/pg76.txt"><u>http://www.gutenberg.org/cache/epub/76/pg76.txt</u></a><br
/>1) Read the contents of the file pg76.txt<br
/>2) get the length of each line and sum the lines as you go<br
/>3) count the total number of lines in the file<br
/><br
/><br
/>reader = open('pg76.txt', 'r')<br
/>line = reader.readline()<br
/><br
/>total_length = 0<br
/>line_count = 0<br
/><br
/>while line != '':<br
/>    total_length += len(line)<br
/>    line_count += 1<br
/>    line = reader.readline()<br
/><br
/>print total_length<br
/>print line_count<br
/><br
/>Exercise 3<br
/><br
/>wget <a href="http://seanlahman.com/files/database/lahman2012-csv.zip"><u>http://seanlahman.com/files/database/lahman2012-csv.zip</u></a><br
/>unzip lahman2012-csv.zip<br
/>Pitching.csv<br
/><br
/># Open our input file.<br
/>reader = open('Pitching.csv', 'r')<br
/><br
/># Read the header line.<br
/>line = reader.readline()<br
/><br
/># Get the index of the 'IPouts' column.<br
/>header = line.split(',')<br
/>ipout_index = header.index('IPouts')<br
/><br
/># Go to the first data line.<br
/>line = reader.readline()<br
/><br
/># Define our variables.<br
/>total_outs = 0<br
/>line_count = 0<br
/><br
/># Read the rest of the data line by line.<br
/>while line != '':<br
/> row = line.split(',')<br
/> value = row[ipout_index]<br
/> total_outs += float(value)<br
/> line_count += 1<br
/> line = reader.readline()<br
/><br
/># Print our results<br
/>average = total_outs / line_count<br
/>print 'Total Outs: ' + str(total_outs)<br
/>print 'Line Count: ' + str(line_count)<br
/>print 'Average: ' + str(average)<br
/><br
/>#The alternate method of finding the index of a string in a string or list is the index() method:<br
/>string = 'Hi, my name is bob, and I am great!'<br
/>string.index('bob') # 15<br
/>string.split().index('bob,') # 4<br
/><br
/>Thanks for this { .index() } -- that's nice<br
/><br
/>#So then we can loop through the file using either a while or a for loop:<br
/>ind = -1<br
/>total = 0<br
/>count = 0<br
/>for each_line in open('Pitching.csv','r'): # for-loops to loop through files in python are lovely<br
/>    row = each_line.split(',')<br
/>    if ind == -1:<br
/>        ind = row.index('IPouts') # the first line is the header<br
/>        continue<br
/>    total += float(row[ind]) #sums all the ipouts<br
/>    count += 1<br
/>    print row[ind] #prints all the IPouts<br
/>print total/float(count) #prints the average IPout<br
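/><br
/>Splitting on commas by hand breaks as soon as a field contains a quoted comma. The standard library's <i>csv</i> module handles that, and a rough sketch of the same average follows. The three data rows and the player IDs are made up so the sketch runs without the download (only the IPouts column name comes from the exercise), and it is written for Python 3:<br
/><pre>
```python
import csv
import io

# Made-up stand-in for Pitching.csv so the sketch runs without the download.
sample = io.StringIO(
    u"playerID,yearID,IPouts\n"
    u"pitcher_a,2012,30\n"
    u"pitcher_b,2012,60\n"
    u"pitcher_c,2012,90\n"
)

total_outs = 0.0
line_count = 0

# DictReader reads the header line itself and yields one dict per data row,
# so there is no need to look up the column index by hand.
for row in csv.DictReader(sample):
    total_outs += float(row['IPouts'])
    line_count += 1

average = total_outs / line_count
print('Average: ' + str(average))
```
</pre>To use the real file, replace <i>sample</i> with an open file handle for Pitching.csv.<br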
/><br
/>The Python preferred style guide is called PEP 8: <a href="http://www.python.org/dev/peps/pep-0008/">http://www.python.org/dev/peps/pep-0008/</a><br
/><br
/>#The python iterator interface<br
/>#It's worth noting that any object or type in python that implements the iterator interface can be looped through using the <br
/>for x in iterator:<br
/> x.do_stuff()<br
/>syntax. Examples include text files, lists, dictionaries, sets and strings. <br
/><br
/>for x in 'qwertyuiop':<br
/>... print x<br
/>... <br
/>q<br
/>w<br
/>e<br
/>r<br
/>t<br
/>y<br
/>u<br
/>i<br
/>o<br
/>p<br
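/><br
/>To make the point above concrete, here is a small sketch that loops over several built-in types and over a generator function. The generator <i>countdown</i> is an addition that is not part of the workshop notes, and the code is written so that it also runs under Python 3:<br
/><pre>
```python
def countdown(start):
    """Generator: yields start, start - 1, ..., 1."""
    while start > 0:
        yield start
        start -= 1

# Lists, strings, dictionaries and generators all work with for.
seen = []
for item in ['apple', 'orange']:
    seen.append(item)
for letter in 'qw':
    seen.append(letter)
for key in {'G': 2}:          # looping over a dict yields its keys
    seen.append(key)
for number in countdown(3):
    seen.append(number)

print(seen)
```
</pre>Any function containing <i>yield</i> returns an object that implements the iterator interface, which is often the easiest way to make your own data loopable.<br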
/><br
/>Functions and modules:<br
/><br
/>Exercise 1:<br
/><br
/>#given a string 'dna', remove all 'N', return the GC-content<br
/><br
/><br
/>dna = 'ATGCNNNNNNNN'<br
/>dna2 = 'NGGGGGGGGGGGC'<br
/>dna3 = 'GTGTGTGTGTGTTT'<br
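/><br
/>One possible solution to this exercise, as a sketch rather than the official answer. The function name <i>gc_content</i> and the choice to return 0.0 for a string made only of Ns are assumptions, and the code is written so that it also runs under Python 3:<br
/><pre>
```python
def gc_content(dna):
    """Return the GC fraction of dna after removing every 'N'."""
    cleaned = dna.upper().replace('N', '')
    if len(cleaned) == 0:
        return 0.0  # assumption: report 0.0 when only Ns remain
    gc = cleaned.count('G') + cleaned.count('C')
    # float() keeps Python 2 from doing integer division
    return float(gc) / len(cleaned)

print(gc_content('ATGCNNNNNNNN'))
print(gc_content('NGGGGGGGGGGGC'))
print(gc_content('GTGTGTGTGTGTTT'))
```
</pre>Removing the Ns before dividing matters: dividing by the full length would understate the GC-content of sequences with many unknown bases.<br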
/>Exercise 2:<br
/><br
/><br
/>Given a string 'filename', write a function which opens that file, iterates over all sequences, and writes a bit of stats about each sequence:<br
/><br
/>- print the name of each sequence<br
/>- Count of Ns<br
/>- GC-content without Ns<br
/><br
/>Print the number of sequences in that file.<br
/><br
/>Tips: <br
/>- if line.startswith('>') - give the name<br
/><br
/>>Sequence 1<br
/>ATGGGGGTGTGTGNNNNNNTGA<br
/>>Sequence 2<br
/>ATGCCCGCGCGCGCTGA<br
/>>Sequence 3<br
/>GGGTGGTGTGTGACAAAAAAAA<br
/><br
/>Example-output:<br
/><br
/>The sequence has name 'Sequence 1'<br
/>It has 6 Ns, 0.5625 GC-content<br
/>The sequence has name 'Sequence 2'<br
/>It has 0 Ns, 0.7647058823529411 GC-content<br
/>The sequence has name 'Sequence 3'<br
/>It has 0 Ns, 0.4090909090909091 GC-content<br
/><br
/>There are three sequences in the file.<br
/><br
/>def give_stats(filename):<br
/> # do stuff<br
/> <br
/>give_stats('example.fasta')<br
/><br
/>Solution:<br
/><br
/>def give_dna_stats(filename):<br
/>    fh = open(filename, 'r')<br
/>    line = fh.readline()<br
/>    sequence_counter = 0<br
/>    while line != '':<br
/>        line = line.rstrip()<br
/>        if line.startswith('>'):<br
/>            print 'The name of the sequence is ' + line<br
/>        else:<br
/>            line = line.upper()<br
/>            gc_count = float(line.count('G') + line.count('C'))/len(line.replace('N', ''))<br
/>            n_count = line.count('N')<br
/>            print 'GC-content is ' + str(gc_count)<br
/>            print 'There are ' + str(n_count) + ' Ns.'<br
/>            sequence_counter += 1<br
/>        line = fh.readline()<br
/>    print 'There are ' + str(sequence_counter) + ' sequences in the file.'<br
/> <br
/>give_dna_stats('example.fasta')<br
/><br
/>Note: the 'pass' keyword means to do nothing. Why do we have it? Because the python interpreter needs to know where to check for indenting:<br
/><i>def my_function():</i><br
/><i> #not implemented yet</i><br
/><i> </i><br
/><i>def my_second_function():</i><br
/><i> do_stuff()</i><br
/> <br
/>Will raise a syntax error - the interpreter needs to see some code in the first function.<br
/>So we add pass:<br
/><br
/><i>def my_function():</i><br
/><i> #not implemented</i><br
/><i> pass</i><br
/><br
/>#New-style python classes inherit from object. You don't need to know what that means necessarily, but add (object) after your class declaration as below. #<a href="http://docs.python.org/2/reference/datamodel.html#newstyle">http://docs.python.org/2/reference/datamodel.html#newstyle</a> if you're curious. It's not important for most applications, but it can be a bit of a 'gotcha'.<br
/> <br
/>class Rodent(object):<br
/>    def __init__(self, tag_id, size):<br
/>        self.tag_id = tag_id<br
/>        self.size = size<br
/>        self.sightings_per_month = {}<br
/><br
/>    def is_large(self):<br
/>        # return True if size is > 5oz<br
/>        return (self.size > 5)<br
/><br
/>    def is_small(self):<br
/>        # return True if size is < 3oz<br
/>        return (self.size < 3)<br
/><br
/>    def plot(self):<br
/>        # return the letter of the plot at which<br
/>        # this rodent was first captured<br
/>        return self.tag_id[0]<br
/><br
/><br
/>    def capture(self, month):<br
/>        # we captured this rodent once in this month<br
/>        if month not in self.sightings_per_month:<br
/>            self.sightings_per_month[month] = 0<br
/>        self.sightings_per_month[month] += 1<br
/><br
/><br
/># dna_string.py<br
/>class DNAString(object):<br
/>    def __init__(self, sequence):<br
/>        self.sequence = sequence<br
/><br
/>    def base_count(self, base):<br
/>        return self.sequence.count(base)<br
/><br
/>    def gc_content(self):<br
/>        g = self.base_count('G')<br
/>        c = self.base_count('C')<br
/>        return float(g+c)/len(self.sequence)<br
/><br
/><br
/>import dna_string<br
/><br
/>x = dna_string.DNAString('GATC')<br
/>x.reverse_complement() # works once reverse_complement is defined, as in NucleotideString below<br
/><br
/><br
/><br
/><br
/>class NucleotideString(object):<br
/>    base_complement = {'G': 'C', 'C':'G',<br
/>                       'A': 'T', 'T': 'A'}<br
/><br
/>    def __init__(self, sequence):<br
/>        self.sequence = sequence<br
/>        self.bases = {}<br
/><br
/>    def base_count(self, base):<br
/>        if base in self.bases:<br
/>            return self.bases[base]<br
/>        else:<br
/>            self.bases[base] = self.sequence.count(base)<br
/>            return self.bases[base]<br
/><br
/>    def gc_content(self):<br
/>        g = self.base_count('G')<br
/>        c = self.base_count('C')<br
/>        return float(g+c)/len(self.sequence)<br
/><br
/>    def reverse_complement(self):<br
/>        complement = ''<br
/>        for base in self.sequence:<br
/>            complement = self.base_complement[base] + complement<br
/><br
/>        return complement<br
/><br
/><br
/>class DNAString(NucleotideString):<br
/>    pass<br
/><br
/><br
/>class RNAString(NucleotideString):<br
/>    base_complement = {'G': 'C', 'C':'G',<br
/>                       'A': 'U', 'U': 'A'}<br
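/><br
/>A trimmed-down, self-contained sketch of how the inheritance above behaves when you use it. Only reverse_complement is kept, and the input sequences 'GGAT' and 'GGAU' are made up for illustration. The code is written so that it also runs under Python 3:<br
/><pre>
```python
class NucleotideString(object):
    base_complement = {'G': 'C', 'C': 'G', 'A': 'T', 'T': 'A'}

    def __init__(self, sequence):
        self.sequence = sequence

    def reverse_complement(self):
        complement = ''
        for base in self.sequence:
            # Prepending each complemented base reverses the order.
            complement = self.base_complement[base] + complement
        return complement


class DNAString(NucleotideString):
    pass  # inherits everything unchanged


class RNAString(NucleotideString):
    # Only the lookup table differs; the method itself is inherited.
    base_complement = {'G': 'C', 'C': 'G', 'A': 'U', 'U': 'A'}


dna_result = DNAString('GGAT').reverse_complement()
rna_result = RNAString('GGAU').reverse_complement()
print(dna_result)
print(rna_result)
```
</pre>Because the method looks up <i>self.base_complement</i>, each subclass picks up its own table without any change to the shared code.<br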
/><br
/><br
/><br
/><a href="http://software-carpentry.org/blog/2013/09/how-much-testing-is-enough.html">http://software-carpentry.org/blog/2013/09/how-much-testing-is-enough.html</a><br
/><br
/><br
/>def nucleotideContent(dnaString):<br
/>    '''This function must return the contribution<br
/>    of nucleotides ATCG (as uppercase) from a given DNA<br
/>    string inside a dictionary, where each key refers to<br
/>    a nucleotide<br
/>    '''<br
/>    dnaString = dnaString.upper() # count on the uppercased string so lowercase input is handled<br
/>    dnaDict = {}<br
/>    uniques = set(dnaString).intersection(set('ACTG'))<br
/>    for nucleotide in uniques:<br
/>        dnaDict[nucleotide] = dnaString.count(nucleotide)<br
/><br
/>    return dnaDict<br
/><br
/># Test cases: [input sequence, expected result]<br
/>Tests = [<br
/>    ['gtcagtc', {'G':2, 'T':2, 'C':2, 'A':1}],<br
/>    ['gtagt', {'G':2, 'T':2, 'A':1}],<br
/>    ['GTCNGAT', {'G': 2, 'T':2, 'C':1,'A':1}]<br
/>]<br
/><br
/># Run and report<br
/>passes = 0<br
/>for (i, (seq, expected)) in enumerate(Tests):<br
/>    if nucleotideContent(seq) == expected:<br
/>        passes += 1<br
/>    else:<br
/>        print('test %d failed' % i)<br
/><br
/>print('%d/%d tests passed' % (passes, len(Tests)))<br
/><br
/><b><u>On importing:</u></b><br
/>Yesterday we covered the <i>import </i>keyword very briefly. Some import tricks:<br
/><i>import x</i><br
/>Will import the module x (x.py) into your file. To access any of the functions and classes of x you will need to refer to them in the x namespace like so:<br
/><i>x.y()</i><br
/><i>x.z()</i><br
/>However, there is another way of importing that we covered, the <i>from </i>keyword:<br
/><i>from x import y</i><br
/>This will <b>only</b> import y into your namespace. So:<br
/><i>y()</i><br
/>Will work, but<br
/><i>z()</i><br
/><i>x.z()</i><br
/>Will not. There is one final way of importing code from other modules:<br
/><i>from x import *</i><br
/>This will import everything from the x module, so<br
/><i>y()</i><br
/><i>z()</i><br
/>Will both work. <b>Use this with care</b> - more than one programmer has been burned by importing too much, and having similarly named functions/classes/methods to the ones they define.<br
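/><br
/>The same rules, sketched with standard-library modules in place of the placeholder module x. The <i>import ... as</i> form is an addition not covered above, and the code is written so that it also runs under Python 3:<br
/><pre>
```python
import math                   # names stay inside the math namespace
from math import sqrt         # only sqrt is added to our namespace
import os.path as osp         # 'as' gives the module a shorter alias

root_a = math.sqrt(16)
root_b = sqrt(16)
joined = osp.join('scripts', 'example.py')

# floor was never imported by name, so the bare name is undefined here.
try:
    floor(2.5)
    bare_floor_works = True
except NameError:
    bare_floor_works = False

print(root_a == root_b)
print(bare_floor_works)
```
</pre>After <i>import math</i> the function is still reachable as math.floor(2.5); only the bare name is missing.<br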
/><br
/><br
/><br
/><br
/>#creating my test_dna_starts.py file<br
/>def dna_starts_with(st1, st2):<br
/> return st1[0:len(st2)]==st2<br
/> <br
/>def test_dna_starts_with_itself():<br
/> dna='acgtgtcgat'<br
/> assert dna_starts_with(dna, dna)<br
/> <br
/>def test_dna_starts_with_one():<br
/> assert dna_starts_with('cgtgc', 'c')<br
/> <br
/>def test_dna_starts_with_bigger():<br
/> dna='acgtgtcgat'<br
/> assert not dna_starts_with(dna, dna+dna)<br
/> <br
/>test_dna_starts_with_itself() <br
/>test_dna_starts_with_one()<br
/>test_dna_starts_with_bigger()<br
/><br
/>#end of file<br
/><br
/>on the command line (NOT ON THE PYTHON INTERPRETER), type: nosetests<br
/><br
/>#more about nose<br
/><a href="http://nose.readthedocs.org/en/latest/usage.html">http://nose.readthedocs.org/en/latest/usage.html</a><br
/># for more advanced testers, you might want to look at running nosetests like this:<br
/>nosetests --with-coverage --cover-tests --cover-html<br
/># what this does is run the tests and produces a HTML report (./cover/index.html) of what bits of code were covered by your tests. For more info see: <a href="http://nose.readthedocs.org/en/latest/plugins/cover.html">http://nose.readthedocs.org/en/latest/plugins/cover.html</a><br
/># This is really useful to identify and design tests to cover portions of your code which are not yet covered by existing tests<br
/><br
/><br
/><br
/>Another simple way to check whether a number is an integer:<br
/><br
/>number = 5<br
/>type(number) == int # returns True<br
/>type(number) == float # returns False<br
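/><br
/>A more flexible check uses the built-in isinstance(), which also accepts a tuple of types. The sketch below shows it next to the type() comparison, including one gotcha with booleans. The variable names are invented for illustration, and the code is written so that it also runs under Python 3:<br
/><pre>
```python
number = 5

# isinstance() accepts a tuple of types and also matches subclasses.
is_int = isinstance(number, int)
is_number = isinstance(5.0, (int, float))

# Gotcha: bool is a subclass of int, so True passes an isinstance check
# even though type(True) == int is False.
bool_is_int = isinstance(True, int)
bool_type_is_int = type(True) == int

# A float with no fractional part can be detected with is_integer().
whole = (5.0).is_integer()

print([is_int, is_number, bool_is_int, bool_type_is_int, whole])
```
</pre>Which check you want depends on whether values such as 5.0 or True should count as integers in your function.<br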
/><br
/><br
/><br
/>In nose, if you expect a function to fail (you give it invalid input, for example) you can test whether you get the exception you expected:<br
/><br
/>from nose.tools import assert_raises<br
/>assert_raises(ValueError, factorial, -1) # the expected exception, then the function, then its arguments<br
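/><br
/>If nose is not installed, the standard library's <i>unittest</i> module offers the same kind of check through assertRaises. The sketch below uses a simplified stand-in for factorial so that it is self-contained; the class name TestFactorial and the test names are made up for illustration:<br
/><pre>
```python
import unittest


def factorial(n):
    """Simplified stand-in for the factorial function in these notes."""
    if n != int(n) or not n >= 0:
        raise ValueError('n must be an integer >= 0')
    result = 1
    for factor in range(2, int(n) + 1):
        result *= factor
    return result


class TestFactorial(unittest.TestCase):
    def test_known_value(self):
        self.assertEqual(factorial(4), 24)

    def test_negative_input_raises(self):
        # Pass the exception class, then the callable, then its arguments.
        self.assertRaises(ValueError, factorial, -1)

    def test_fraction_raises(self):
        # The same check written as a context manager.
        with self.assertRaises(ValueError):
            factorial(2.5)


suite = unittest.TestLoader().loadTestsFromTestCase(TestFactorial)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```
</pre>The function is passed uncalled: writing factorial(-1) inside the assertRaises call would raise the error before the test machinery could catch it.<br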
/><br
/><br
/><br
/><br
/>def factorial(n):<br
/>    '''Return the factorial of n, an integer >= 0<br
/><br
/>    >>> factorial(4)<br
/>    24<br
/>    '''<br
/>    import math<br
/>    if not n >= 0:<br
/>        raise ValueError('n must be >= 0')<br
/><br
/>    if math.floor(n) != n:<br
/>        raise ValueError('n must be integer')<br
/><br
/>    result = 1<br
/>    factor = 2<br
/>    while factor <= n:<br
/>        result *= factor<br
/>        factor += 1<br
/><br
/><br
/>    return result<br
/><br
/>if __name__ == '__main__':<br
/>    # when importing this file, the doctests aren't run<br
/>    # but when you run the file with python, the doctests are run<br
/>    import doctest<br
/>    doctest.testmod()<br
/><br
/><br
/><br
/>#User-friendly unit testing in R:<br
/><a href="http://www.johnmyleswhite.com/notebook/2010/08/17/unit-testing-in-r-the-bare-minimum/">http://www.johnmyleswhite.com/notebook/2010/08/17/unit-testing-in-r-the-bare-minimum/</a><br
/><br
/><br
/><br
/><b><u>HARDCORE MODE:</u></b><br
/>So, out of this workshop, you've decided that you really like Python. Of course you have, it's great. But you spend a lot of time doing <b><u>HARDCORE</u> </b>calculations with huge amounts of data, far beyond that sissy netbook that Nathan carries around. Here are some things to keep in mind:<br
/><b>Parallelisation</b><br
/>Parallelising code is tricky at the best of times, and the way to do it in Python is using the <br
/><i>multiprocessing </i>module. <a href="http://docs.python.org/2/library/multiprocessing.html">http://docs.python.org/2/library/multiprocessing.html</a> is the place to find information about that. If you're doing something that you expect will need parallelisation, you should read the docs <u>first. </u>There are some things that you should keep in mind at the design phase, so do your homework.<br
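/><br
/>To give a feel for what that looks like, here is a rough sketch that spreads a GC-content calculation over a pool of worker processes. The function names <i>gc_content</i> and <i>parallel_gc</i> and the default of two workers are assumptions made for this sketch, not something from the workshop:<br
/><pre>
```python
from multiprocessing import Pool


def gc_content(sequence):
    """Fraction of G and C bases in one sequence."""
    sequence = sequence.upper()
    return float(sequence.count('G') + sequence.count('C')) / len(sequence)


def parallel_gc(sequences, workers=2):
    """Compute gc_content for every sequence using a pool of processes."""
    pool = Pool(processes=workers)
    try:
        # map() splits the list across the workers and keeps the input order.
        return pool.map(gc_content, sequences)
    finally:
        pool.close()
        pool.join()
```
</pre>Call parallel_gc(['GATC', 'GGCC', 'ATAT']) from a script, underneath an <i>if __name__ == '__main__':</i> guard, so that worker processes which import your file do not start pools of their own. The worker function has to be defined at the top level of a module because it is sent to the workers by name. This is one of the design constraints the docs warn about.<br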
/><b>Speed and Memory Use</b><br
/>Python is pretty good for casual usage, and premature optimisation is the doom of all programmers. However, the time will come when you need to speed things up, or reduce that dictionary using all 128GB of your memory. You can write superfast C extensions to python, but for most of your huge bioinformatic data needs, I think NumPy and SciPy will be great. Go look them up. <br
/><b>Super Hardcore Mode</b><br
/>Most Python users use CPython, the reference implementation of Python. BUT other implementations and compilers exist, and for some workloads they are faster. Check out PyPy, Cython and Jython! Be warned, these things tend to be somewhat enthusiast, and I don't label them <b>Super Hardcore </b>for no reason.<br
/><br
/><br
/><b>More Info</b><br
/>There was a really great talk on using Python in scientific computing at Pycon-AU 2013 in Hobart. Luckily for you, they recorded it for the benefit of all those without time machines:<br
/><a href="https://www.youtube.com/watch?v=hqOsfS3dP9w&list=PLs4CJRBY5F1KDIN6pv6daYWN_RnFOYvt0&index=18">https://www.youtube.com/watch?v=hqOsfS3dP9w&list=PLs4CJRBY5F1KDIN6pv6daYWN_RnFOYvt0&index=18</a><br
/>It's even in tutorial format, so great for learning!<br
/><br
/><br
/># You only need to perform git configuration once on each computer from which you are using git<br
/># These will be used to attribute your commits to you and to display nice readable names associated with those commits<br
/>git config --global user.name bendmorris<br
/>git config --global user.email [email protected]<br
/>git init # initialize a new repository i.e. the current working directory will become a repository. If you are outside that directory and run a git command, you will get an error<br
/>git add *.py # adds all files with extension .py into stage<br
/>git add . # add everything (use with care!!) into stage<br
/>git add dir # add directory "dir" and all its contents<br
/>git add -u # stage only files that have been updated and are already being tracked<br
/>git status # Check current status, which files are new, which ones aren't yet committed<br
/>git commit -m "adding the python files that we created up till now in Adelaide workshop" # moves all files from staging to the commit<br
/># "git commit" without the "-m" will open your default text editor. This may be nano, vi, vim, gedit<br
/>git log # see all commits with commit messages, dates etc. (use 'q' to exit)<br
/>git diff # shows the differences between your working directory and the staging area, i.e. the changes you have not yet staged with git add<br
/><br
/><br
/><b>Git exercise 1</b><br
/><br
/>Create a README file, and commit it to your local repository<br
/><br
/>Extra credit: export the Etherpad, and commit that too<br
/><br
/></body>
</html>