Skip to content

Latest commit

 

History

History
298 lines (274 loc) · 18 KB

schedule.org

File metadata and controls

298 lines (274 loc) · 18 KB

Inbox

Learn to use oursql

conn = oursql.connect(host=’127.0.0.1’,user=’cooksonmicro’,passwd=’helicobacter’,db=’mdx’,port=3307) curs = conn.cursor(oursql.DictCursor) # (or many others) then use cursor as usual

Install oursql on mephistopheles

Install mysql on mephistopheles

Create a test database with a single table, insert some rows

Work through oursql tutorial to select those rows in Python

http://packages.python.org/oursql/tutorial.html

Set up stable port bridge to mdx database

ssh -NfL 3307:localhost:3306 [email protected] Connected with: mysql –protocol=tcp –host=localhost –port=3307 –user=cooksonmicro -p mdx

Set up known_hosts and rsa authentication for port on mephistopheles to mdx

Extract sequence_report function from existing code

Takes the paths to two ab1 files and a dictionary like {“accession”: “H34908”, “workup”: null, “pat_name”: “MORRALL,MARK R”, “amp_name”: “alt_16s”, “seq_key”: 22708} returns a string of the HTML to be written.

Extract daily_report function from existing code

Write function to move files into place, connecting to a database with workup table - place_file

Get a few examples from the real data and put them in an SQLite database to use as the connection.

Write command line dispatch of three functions and document it

Write a daemon around sequence_report function

pyinotify monitors a tree. On creation of an ab1 or json file in the tree:

  • if there are two ab1s and a json in that directory, and the json is properly formed, kick off function; break
  • if the json is malformed, log an error to that effect
  • if there is only one ab1 file, log an info message
  • break

Write a daemon around daily_report function

  • on creation of a strandwise_report.html or assembly_report.html in monitored tree, kick off daily_report function on that day.

Write a daemon around place_file

  • call function on file on pyinotify statement that something has happened
  • on SIGUSR1, push everything in the monitored directory (suspend pyinotify)

Files moved while they are being written to work just fine on Linux (at least while moving within a filesystem).

Deployment

Install oursql on microvm

Infrastructure

15min Move codebase and development over to Linux workstation

30min Install blast aligner on server

30min Deployment scripts and instructions written

Set up swatch to monitor syslog and email when there is a problem

http://assets.nagios.com/downloads/nagiosxi/docs/Log_Monitoring_With_Swatch.pdf

45min Set up swatch on meph, try monitoring syslog for errors

90min Set up postfix on meph and figure out how to forward elsewhere

http://www.postfix.org/BASIC_CONFIGURATION_README.html#relay_from

30min Cron job that relinks ‘today’ set up on server

Shares mounted on server and paths set up

This is to be done by Joe or Jerry

Shortcuts to shares set up on techs’ desktops

15min User for daemon set up on server

MDX database migrated to server

http://dev.mysql.com/tech-resources/articles/mysql-administrator-best-practices.html

Install MySQL on server

15min MySQL user for daemon set up on server

15min Syslog and database permissions set up on server

15min Daemon installed on server

15min Configuration file for daemon in place

15min Daemon symlinked into init.d and rc?.d

1.5h Daemon action function written

Instantiate configuration reference Install signal handlers Connect to syslog with level INFO Call seqlablib.conf.read_configuration on /etc/seqlabd.conf to get configuration, assign it to reference Connect to MDX database Set up all queues: newly arrived file queue, analysis queue, HTML regeneration queue Attach behavior to queues Initialize inotify monitoring Check for files already in inbox and enqueue them all Set up missed file checking

Factor out behavior from setting configuration path

Set up testing environment to run daemon’s main in test/data

Hook up serviced call to daemon’s action function

Update setup.py to build and install daemon

Configuration

seqlablib.config: read_configuration(handle) -> dict

Signal handling

seqlablib.signals: set_signal_handlers(conf_ref) -> None

5min 15h46-15h48 Add exit_event argument to set_signal_handlers and propogate to SIGTERM; make SIGTERM handler set the exit event.

MDX database interface

Provides a lookup and an update_path function. The lookup should return a named tuple with all necessary fields. The named tuple should have the fields: path, filename, accession, workup, pat_name, amp_name

lookup_by_sequence_key lookup_by_workup update_by_sequence_key

Mock MDX interface for testing purposes created

Fake filenames set up with their data in a dictionary, and two functions provided that read and write the dictionary. In test code, write MockMDX object with a set of workups for files and lookup and update functions.

30min 14h05-14h12 Generate some example file information and the files to use with them

30min Implement the mock

Real MDX interface created and tested

Two queries (from https://web.labmed.washington.edu/micro/PathsAndIDsForFiles): SELECT mdx.`Accession` as accession, mdx.`Workup Number` as workup, mdx.`Patient Name` as pat_name, `amp categories`.`Amp Name` as amp_name, sr.path as path FROM `seq result` AS sr INNER JOIN `amp categories` AS ac USING (`Amp Category ID`) INNER JOIN mdx USING (`MDX ID`) WHERE sr.`Seq Result ID`=’…sequence_key…’

UPDATE `seq result` SET `path“=’…path…’ where “Seq Result ID“=`’…sequence_key…’

The actual object should allow only one connection at a time and block until the connection is available. It could allow more, but this way I can just put a simple event in instead of something more complicated.

60min Look through Python MySQL bindings and decide which one to use - install it

https://launchpad.net/myconnpy http://packages.python.org/oursql/

45min Write an object with lookup and update functions that use a shared event to coordinate access; don’t put database logic in; write tests

Be sure it releases event on any error in update or lookup

45min Subclass object and add database logic

Check for liveness of the connection.

Regeneration of HTML daily work summary

Order by the creation time of the report in the directories.

30min 14h15-14h50 Function to assemble list of subdirs along with the creation time of *report.html inside and the workup information from the database, and whether it was a strandwise or assembled report.

90min 14h50-15h14 Mock up the display of a set of patient records for a given day in HTML

20min Turn mockup into templet function

20min 15h14-15h29 Write function that takes a path, does lookups, and writes summary into daily_summary.html in that path (including tests)

generate_summary_report(path, lookup_fun=id, format_fun=id, summary_filename=None) For each folder in path, runs path_key to get a key and lookup_fun to get info for that key. Then calls format_fun on the list of all such keys to produce a string. If summary_filename == None, return a string. Otherwise, write the results there.

30min 15h29-15h40 Write map_queue (and tests)

map_queue(queue, fun, exit_event) - pops something off the queue and runs fun on it. When fun returns, repeats. Blocks while queue is empty. Handle exit_event.

Processing pairs of AB1 files into reports

15min 15h40-15h44 Integrate AB1 reading and tests from seqviewer

15min 15h44-15h46 Integrate contig construction from seqviewer, and tests

15min Integrate seqviewer’s alignment rendering code

30min 8h35-8h58 Function to BLAST and write XML to disk, and parse it in memory

blast_seq(seq, xml_path, ncbi_db=’nr’) Takes a string or SeqRecord (seq), returns the path to the XML it writes (in ‘xml_path’) and the parsed BLAST results.

15min 8h58-9h01 Get a pair of AB1 files and extract their sequences

tmpzRpKiy-1.ab1: ‘CAGGGGCATCTATAATGCAGTCGAGCGAACAGATAAGGAGCTTGCTCCTTTGACGTTAGCGGCGGACGGGTGAGTAACACGTGGGTAACCTACCTATAAGACTGGGACAACTTCGGGAAACCGGAGCTAATACCGGATAATATGTTGAACCGCATGGTTCAATAGTGAAAGATGGTTTTGCTATCACTTATAGATGGACCCGCGCCGTATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCAACGATACGTAGCCGACCTGAGAGGGTGATCGGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGCGGGCAGCAGA’

tmpzRpKiy-2.ab1 ‘CGTCGTTCGATGTGGCCGATCACCCTCTCAGGGTCGGCTACGTATCGTTGCCTAGGTGAACCATTACCTCACCAACTAGCTAATACGGCGCGGGTCCATCTATAAGTGATAGCAAAACCATCTTTCACTATTGAACCATGCGGTTCAACATATTATCCGGTATTAGCTCCGGTTTCCCGAAGTTGTCCCAGTCTTATAGGTAGGTTACCCACGTGTTACTCACCCGTCCGCCGCTAACGTCAAAGGAGCAAGCTCCTTATCTGTTCGCTCGACTTGCATGTATTAGGCACGCCGCCAGCGTTCATCCTGAGCCAAATCCAAACTCAAAACGAAGGTATTCTAAAATTTGAAGTCGAGAGAACAGATAAGGAGCTTGCTCCTTTGACGTTTGCGGCGGAGGGGTGAGTAACGCATGGGTTACCTACTAATAATACGGGAACAATTGCGAAATTTGATTTTTGGATAAAAAAAAA’

45min 10h15-10h40 generate_report written and tested

generate_report(workup_info, ab1_file1, ab1_file2, lookup_fun, assembled_render, strandwise_render) workup_info is a named tuple as described in the MDX database section Read the AB1 files Try to contig them If success, lookup_fun the assembly in contig and pass the workup_info, the full contig result, the result, and the AB1 tracks to assembled_render If failure, lookup_fun the two strands and pass the workup_info, the results and the AB1 tracks to strandwise_render assembled_render and strandwise_render return strings of data, which are returned by generate_report For testing, use a pair of AB1 files, just a nop as lookup_fun, and assembled_render and strandwise_render return just the first 10 characters of sequences from the AB1 files.

15min 9h01-9h03 Blast a sequence from BioPython, and pickle the parsed result for testing purposes

Use the sequence TAGGATCAACATGCGTTTCAGCAAACAACCCATCAATCCCCACCGCCGCCGCAGCTCTCGCTAAAATAGGGGCAAAAGAGCTGTCTCCTGAACTTTTCCCGTTCGCTCCCCCTGGCATTTGCACGCTATGGGTAGCGTCAAAAATCACAGGGGCAAATTCTCGCATGATTTTT Goes to H. Pylori It’s in data/blast.pickle. XML is data/pylori_blast.xml

10min 15h48-15h49 Add templet to codebase

25min 9h03-9h13 Write a function to render an AB1 file as alignment (no offsets)

render_ab1(seq, conf, trace) -> HTML that can be embedded in a page Call the seqviewer stuff

15min 9h13-9h26 Write a function to pretty print DNA in an easily copyable way

pprint_seq(seq) -> HTML that can be embedded in a page pprint_seq_css() -> <style></style> block Handle gaps and IUPAC codes

Mock up in basic HTML with a multiline sequence

Write a templet function to return CSS required for this

Write a templet function to return the HTML for a sequence

5min Write a templet function to return seqviewer alignment CSS block

30min 9h26-9h38 Write a function to render two AB1s and the result of contig to HTML

render_alignment(contig, seq1, conf1, trace1, seq2, conf2, trace2) -> HTML that can be embedded in a page Take from the assemble function in seqviewer

Formatting of BLAST results as templet function

90min 9h38-10h35 Use example BLAST in project dir and make a mockup of results in raw HTML

45min 10h35-11h49 Convert raw mockup to a templet function using the pickled BLAST results

5min Write a BLAST CSS block and BLAST JavaScript block function

Tabs in raw JavaScript and CSS in Firefox set up as a template

20min 15h49-15h55 Look up how to set CSS properties on a div by name in Firefox

<html><head> <script type=”text/javascript”> function make_red(n) { document.getElementById(n).style.color = “#f00”; } </script> </head><body> <p><span id=”boris”>Hi!</span> <a onclick=”make_red(‘boris’)”>Make red</a></p> </body></html>

1h 15h55-16h17 Mock up a tab set with links that say to hide the direct children of #tab_body and show the one specified by the link (name specified)

20min 16h17-16h47 Add an h1 header and style the tabs to fill the whole screen - set baselines

30min 16h47-17h05 Make into a templet function which takes additional CSS blocks and a dict of tab names and content for each tab

40min 10h40-11h21 Assembled report designed and implemented

assembled_render(contig_result, blast_result, seq1, conf1, trace1, seq2, conf2, trace2) Returns a string of HTML Workup info at top, link to ../daily_summary.html, then two tabs: Assembly and BLAST Assembly shows the seqviewer alignment followed by the pretty printed assembly. BLAST shows the formatted BLAST results

40min 11h21-11h26 Strandwise report designed and implemented

strandwise_render(seq1, conf1, trace1, blast_result1, seq2, conf2, trace2, blast_result2) Returns a string of HTML Same as assembled, but shows both strands separately, both sequences separately, and has two tabs for BLAST results

Change ‘Assembly’ to ‘Strands’ in strandwise report

Enqueueing, pairing and checking newly queued files

Look at CREATE events and enqueue a structure indicating this onto a specified queue. All that has to be enqueued is the full path to the file.

Defined as a function queue_events(queue, path, mask, fun=lambda x: x)

  • queue - the queue to push to
  • path - the path to monitor
  • mask - the inotify mask to use
  • filter - a regex that the filename in the event must match to be enqueued
  • fun - a function that receives the event and produces a value that is actually enqueued (Defaults to id)

Add ‘exit’ event to queue_events that shuts down thread

15min 17h05-17h10 Look up polling myself instead of calling loop for inotify (as for gtk example)

ThreadedNotifier, and a thread monitoring exit_event that calls stop on the notifier. https://github.com/seb-m/pyinotify/wiki/Tutorial

notifier = pyinotify.ThreadedNotifier(wm, EventHandler()) notifier.start() wdd = wm.add_watch(‘/tmp’, mask, rec=True) wm.rm_watch(wdd.values()) notifier.stop()

10min 11h35-11h44 Write a test to check the exit event

5min 17h10-17h14 Make queue_events properly handle exit event

Add ‘exit’ to batched_unique that shuts down thread

5min 17h14-17h29 Switch test for batched_unique to used exit event instead of n_batches

Need to switch to Linux with winpdb to figure out what’s going on

10min Replace n_batches with exit event in batched_unique

15min 11h46-12h05 Add ‘filter’ argument to queue_events (and test for it)

30min Tests set up for queue_events

Use a directory in test/data Check that files are properly enqueued on creation Check that filter works

Generic process function written

process(pair_by, unmatched_fun, pair_fun)

pair_by needs to look up from the mock object. The key it returns is returned, so the key should be just the workup information.

process(lookup_fun,pairing_key_fun, unmatched_file_path_ref, unmatched_queue, share_path_ref, workup_path_ref, pair_queue, post_enqueue_fun)(filepaths_set): if any of the filepaths don’t exist, syslog a warning and drop them look up files with lookup_fun to get full data try to pair the files with pairing_key_fun for each unpaired file, if its n_retries = 0, move it to unmatched_file_path and delete it from n_retries dict; else reenqueue it and decrement its n_retries. for each pair, ensure that its target directory exists, move the files there, and queue the pair on analysis_queue. Run post_enqueue_fun

30min 12h08-12h21 Set up test framework for process

lookup_fun = id, pairing_key_fun = id, unmatched_file_path_ref = test/data/unmatched, unmatched_queue = a queue, share_path_ref = test/data/process_share, workup_path_ref = workup, pair_queue = a queue, post_enqueue_fun = set an event Write a skeleton of process that does nothing and fails Create files to process in test/data/to_process, set up queues and event, call process on them, check that files go in the right places and that the queues have the correct values.

30min 12h21-12h25 Write pair_up

Takes a key function and an iterable and produces a list of all pairs that it can, and a list of the unpaired items. It’s groupby plus a filter

15min 12h25-12h29 Write ensure_isdir

Takes a path and: if path is a directory -> do nothing if path does not exist -> create directory else -> raise an error

1.5h 12h29-12h57 13h35-13h56 12h15-13h08 Write guts of process

Most of the behavior has been moved to the functions unmatched_fun and pair_fun.

unmatched_fun should be something like requeue_n_times(unmatched_queue, n_retries_ref, function to move file to unmatched on final failure)

pair_fun(mdx_obj, share_path_ref, workup_path_ref, pair_queue)(file1, file2) pair_fun should: Set up the target directory Move the files thence Enqueue the pair on a given queue Call update on the MDX object

Tests aren’t passing…

Catch any errors and syslog them. Drop the file the error occurred for. process(lookup_fun,pairing_key_fun, unmatched_file_path_ref, unmatched_queue, share_path_ref, workup_path_ref, pair_queue, post_enqueue_fun)(filepaths_set): if any of the filepaths don’t exist, syslog a warning and drop them look up files with lookup_fun to get full data try to pair the files with pairing_key_fun for each unpaired file, if its n_retries = 0, move it to unmatched_file_path and delete it from n_retries dict; else reenqueue it and decrement its n_retries. for each pair, ensure that its target directory exists, move the files there, and queue the pair on analysis_queue. Run post_enqueue_fun

Intermittent enqueuing of all files

intermittently and enqueue_files

15min 13h56-14h05 Add filter to enqueue_files to match filenames with a regex

30min Write intermittently function and tests

intermittently(fun, delay_ref, exit_event) runs fun with a fixed delay between runs.

  • fun - function to run
  • delay - Ref-like to the delay in seconds between runs.
  • exit_event - don’t run if this is true

Create a run_now event, and pass a function that sets it to a delay timer. Then loop in the intermittently function until the run_now event is set or the exit_event is set. If the run_now event, get the value from delay_ref and start a new timer. If exit_event, return immediately.