conn = oursql.connect(host='127.0.0.1', user='cooksonmicro', passwd='helicobacter', db='mdx', port=3307)
curs = conn.cursor(oursql.DictCursor) # (or many others)
then use the cursor as usual
http://packages.python.org/oursql/tutorial.html
ssh -NfL 3307:localhost:3306 [email protected]
Connected with: mysql --protocol=tcp --host=localhost --port=3307 --user=cooksonmicro -p mdx
Takes the paths to two ab1 files and a dictionary like {"accession": "H34908", "workup": null, "pat_name": "MORRALL,MARK R", "amp_name": "alt_16s", "seq_key": 22708}, and returns a string of the HTML to be written.
Get a few examples from the real data and put them in an SQLite database to use as the connection.
pyinotify monitors a tree. On creation of an ab1 or json file in the tree:
- if there are two ab1s and a json in that directory, and the json is properly formed, kick off the analysis function (sketched after this list); break
- if the json is malformed, log an error to that effect
- if there is only one ab1 file, log an info message
- break
- on creation of a strandwise_report.html or assembly_report.html in the monitored tree, kick off the daily_report function for that day
- call the function on the file when pyinotify reports that something has happened
- on SIGUSR1, push everything already in the monitored directory (suspending pyinotify while doing so)
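A minimal sketch of the per-file decision logic above; kick_off_analysis and log are hypothetical parameters standing in for the real analysis entry point and the syslog wrapper.

    import json, os

    def handle_created(path, kick_off_analysis, log):
        """Apply the rules above when a new ab1 or json file appears."""
        d = os.path.dirname(path)
        ab1s = sorted(f for f in os.listdir(d) if f.endswith('.ab1'))
        jsons = [f for f in os.listdir(d) if f.endswith('.json')]
        if len(ab1s) == 2 and len(jsons) == 1:
            try:
                workup = json.load(open(os.path.join(d, jsons[0])))
            except ValueError:
                log.error("Malformed JSON in %s" % d)
                return
            kick_off_analysis(os.path.join(d, ab1s[0]),
                              os.path.join(d, ab1s[1]), workup)
        elif len(ab1s) == 1:
            log.info("Only one ab1 file so far in %s" % d)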
Files moved while they are being written to work just fine on Linux (at least while moving within a filesystem).
http://assets.nagios.com/downloads/nagiosxi/docs/Log_Monitoring_With_Swatch.pdf
http://www.postfix.org/BASIC_CONFIGURATION_README.html#relay_from
This is to be done by Joe or Jerry
http://dev.mysql.com/tech-resources/articles/mysql-administrator-best-practices.html
Startup sequence:
- Instantiate configuration reference
- Install signal handlers
- Connect to syslog with level INFO
- Call seqlablib.conf.read_configuration on /etc/seqlabd.conf to get the configuration, assign it to the reference
- Connect to MDX database
- Set up all queues: newly arrived file queue, analysis queue, HTML regeneration queue
- Attach behavior to queues
- Initialize inotify monitoring
- Check for files already in inbox and enqueue them all
- Set up missed file checking
seqlablib.config: read_configuration(handle) -> dict
seqlablib.signals: set_signal_handlers(conf_ref) -> None
5min 15h46-15h48 Add exit_event argument to set_signal_handlers and propagate to SIGTERM; make the SIGTERM handler set the exit event.
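A minimal sketch of that handler, assuming exit_event is a threading.Event shared with the worker loops:

    import signal

    def set_signal_handlers(conf_ref, exit_event):
        def on_sigterm(signum, frame):
            exit_event.set()   # worker loops watch this and shut down
        signal.signal(signal.SIGTERM, on_sigterm)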
Provides a lookup and an update_path function. The lookup should return a named tuple with all necessary fields. The named tuple should have the fields: path, filename, accession, workup, pat_name, amp_name
lookup_by_sequence_key, lookup_by_workup, update_by_sequence_key
Fake filenames set up with their data in a dictionary, and two functions provided that read and write the dictionary. In test code, write MockMDX object with a set of workups for files and lookup and update functions.
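A sketch of that mock, using the namedtuple fields listed above; the MockMDX name and its method names are placeholders for whatever the tests actually call:

    import collections

    Workup = collections.namedtuple(
        'Workup', ['path', 'filename', 'accession', 'workup',
                   'pat_name', 'amp_name'])

    class MockMDX(object):
        def __init__(self, workups):
            self.workups = dict(workups)   # key -> Workup
        def lookup(self, key):
            return self.workups[key]
        def update_path(self, key, path):
            self.workups[key] = self.workups[key]._replace(path=path)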
Two queries (from https://web.labmed.washington.edu/micro/PathsAndIDsForFiles):

SELECT mdx.`Accession` AS accession, mdx.`Workup Number` AS workup,
       mdx.`Patient Name` AS pat_name, ac.`Amp Name` AS amp_name,
       sr.path AS path
FROM `seq result` AS sr
INNER JOIN `amp categories` AS ac USING (`Amp Category ID`)
INNER JOIN mdx USING (`MDX ID`)
WHERE sr.`Seq Result ID`='…sequence_key…'
UPDATE `seq result` SET `path`='…path…' WHERE `Seq Result ID`='…sequence_key…'
The actual object should allow only one connection at a time and block until the connection is available. It could allow more, but this way I can just put a simple event in instead of something more complicated.
https://launchpad.net/myconnpy http://packages.python.org/oursql/
45min Write an object with lookup and update functions that use a shared event to coordinate access; don’t put database logic in; write tests
Be sure it releases event on any error in update or lookup
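A sketch using the simple event described above as the "connection available" flag; the try/finally is what guarantees release on any error (a Lock would close the small race between wait and clear, at the cost of being less simple). do_lookup stands in for the database logic, which stays elsewhere; update would follow the same pattern.

    import threading

    class MDX(object):
        def __init__(self, do_lookup):
            self.do_lookup = do_lookup       # database logic lives elsewhere
            self.available = threading.Event()
            self.available.set()             # the one connection starts free
        def lookup(self, sequence_key):
            self.available.wait()            # block until the connection is free
            self.available.clear()
            try:
                return self.do_lookup(sequence_key)
            finally:
                self.available.set()         # release on success or error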
Check for liveness of the connection.
Order by the creation time of the report in the directories.
30min 14h15-14h50 Function to assemble a list of subdirs along with the creation time of the *report.html inside each, the workup information from the database, and whether it was a strandwise or assembled report.
20min 15h14-15h29 Write function that takes a path, does lookups, and writes summary into daily_summary.html in that path (including tests)
generate_summary_report(path, lookup_fun=id, format_fun=id, summary_filename=None)
For each folder in path, runs path_key to get a key and lookup_fun to get the info for that key. Then calls format_fun on the list of all such lookups to produce a string. If summary_filename is None, returns the string; otherwise, writes the results there.
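A sketch under the assumption that path_key lives in the same module; identity lambdas stand in for the "id" in the signature, since Python's builtin id is not an identity function:

    import os

    def generate_summary_report(path, lookup_fun=lambda x: x,
                                format_fun=lambda x: x, summary_filename=None):
        folders = sorted(d for d in os.listdir(path)
                         if os.path.isdir(os.path.join(path, d)))
        infos = [lookup_fun(path_key(d)) for d in folders]
        text = format_fun(infos)
        if summary_filename is None:
            return text
        with open(os.path.join(path, summary_filename), 'w') as h:
            h.write(text)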
map_queue(queue, fun, exit_event) - pops an item off the queue and runs fun on it; when fun returns, repeats. Blocks while the queue is empty. Handles exit_event.
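A sketch of that loop; the one-second timeout is an assumption so the loop can notice exit_event instead of blocking forever:

    import Queue

    def map_queue(queue, fun, exit_event):
        while not exit_event.is_set():
            try:
                item = queue.get(timeout=1)   # wake periodically to check exit
            except Queue.Empty:
                continue
            fun(item)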
blast_seq(seq, xml_path, ncbi_db='nr') Takes a string or SeqRecord (seq); returns the path to the XML it writes (in xml_path) and the parsed BLAST results.
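A sketch using Biopython's qblast, assuming blastn and network access to NCBI:

    from Bio.Blast import NCBIWWW, NCBIXML

    def blast_seq(seq, xml_path, ncbi_db='nr'):
        if hasattr(seq, 'seq'):              # accept a SeqRecord
            seq = str(seq.seq)
        handle = NCBIWWW.qblast('blastn', ncbi_db, seq)
        with open(xml_path, 'w') as out:
            out.write(handle.read())
        with open(xml_path) as result:
            return xml_path, NCBIXML.read(result)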
tmpzRpKiy-1.ab1: 'CAGGGGCATCTATAATGCAGTCGAGCGAACAGATAAGGAGCTTGCTCCTTTGACGTTAGCGGCGGACGGGTGAGTAACACGTGGGTAACCTACCTATAAGACTGGGACAACTTCGGGAAACCGGAGCTAATACCGGATAATATGTTGAACCGCATGGTTCAATAGTGAAAGATGGTTTTGCTATCACTTATAGATGGACCCGCGCCGTATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCAACGATACGTAGCCGACCTGAGAGGGTGATCGGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGCGGGCAGCAGA'
tmpzRpKiy-2.ab1: 'CGTCGTTCGATGTGGCCGATCACCCTCTCAGGGTCGGCTACGTATCGTTGCCTAGGTGAACCATTACCTCACCAACTAGCTAATACGGCGCGGGTCCATCTATAAGTGATAGCAAAACCATCTTTCACTATTGAACCATGCGGTTCAACATATTATCCGGTATTAGCTCCGGTTTCCCGAAGTTGTCCCAGTCTTATAGGTAGGTTACCCACGTGTTACTCACCCGTCCGCCGCTAACGTCAAAGGAGCAAGCTCCTTATCTGTTCGCTCGACTTGCATGTATTAGGCACGCCGCCAGCGTTCATCCTGAGCCAAATCCAAACTCAAAACGAAGGTATTCTAAAATTTGAAGTCGAGAGAACAGATAAGGAGCTTGCTCCTTTGACGTTTGCGGCGGAGGGGTGAGTAACGCATGGGTTACCTACTAATAATACGGGAACAATTGCGAAATTTGATTTTTGGATAAAAAAAAA'
generate_report(workup_info, ab1_file1, ab1_file2, lookup_fun, assembled_render, strandwise_render)
workup_info is a named tuple as described in the MDX database section.
- Read the AB1 files
- Try to contig them
- On success, lookup_fun the assembly in the contig and pass the workup_info, the full contig result, the lookup result, and the AB1 tracks to assembled_render
- On failure, lookup_fun the two strands and pass the workup_info, the results, and the AB1 tracks to strandwise_render
assembled_render and strandwise_render return strings of data, which generate_report returns (see the sketch below).
For testing, use a pair of AB1 files, a nop as lookup_fun, and assembled_render/strandwise_render that return just the first 10 characters of the sequences from the AB1 files.
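A sketch of the control flow; read_ab1 and attempt_contig are hypothetical stand-ins for the real AB1-reading and assembly code:

    def generate_report(workup_info, ab1_file1, ab1_file2,
                        lookup_fun, assembled_render, strandwise_render):
        track1, track2 = read_ab1(ab1_file1), read_ab1(ab1_file2)
        contig = attempt_contig(track1, track2)   # None on failure
        if contig is not None:
            hit = lookup_fun(contig.assembly)
            return assembled_render(workup_info, contig, hit, track1, track2)
        else:
            hit1, hit2 = lookup_fun(track1.seq), lookup_fun(track2.seq)
            return strandwise_render(workup_info, hit1, hit2, track1, track2)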
Use the sequence TAGGATCAACATGCGTTTCAGCAAACAACCCATCAATCCCCACCGCCGCCGCAGCTCTCGCTAAAATAGGGGCAAAAGAGCTGTCTCCTGAACTTTTCCCGTTCGCTCCCCCTGGCATTTGCACGCTATGGGTAGCGTCAAAAATCACAGGGGCAAATTCTCGCATGATTTTT, which BLASTs to H. pylori. It's in data/blast.pickle; the XML is data/pylori_blast.xml.
render_ab1(seq, conf, trace) -> HTML that can be embedded in a page. Calls the seqviewer stuff.
pprint_seq(seq) -> HTML that can be embedded in a page; pprint_seq_css() -> <style></style> block. Handles gaps and IUPAC codes.
render_alignment(contig, seq1, conf1, trace1, seq2, conf2, trace2) -> HTML that can be embedded in a page. Take from the assemble function in seqviewer.
<html><head>
<script type="text/javascript">
  function make_red(n) {
    document.getElementById(n).style.color = "#f00";
  }
</script>
</head><body>
<p><span id="boris">Hi!</span> <a onclick="make_red('boris')">Make red</a></p>
</body></html>
1h 15h55-16h17 Mock up a tab set with links that hide the direct children of #tab_body and show the one named by the link
30min 16h47-17h05 Make it into a templet function which takes additional CSS blocks and a dict of tab names and the content for each tab (sketched below)
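A plain-string sketch of that function (the real one would use templet); show_tab is the hypothetical JS function from the mockup above:

    def tab_set(css_blocks, tabs):
        """Render tab links plus one hidden/shown body per tab."""
        links = ''.join('<a onclick="show_tab(%r)">%s</a>' % (str(name), name)
                        for name in tabs)
        bodies = ''.join('<div id="%s">%s</div>' % (name, body)
                         for name, body in tabs.items())
        return ('<style>%s</style>\n<div id="tab_links">%s</div>\n'
                '<div id="tab_body">%s</div>'
                % ('\n'.join(css_blocks), links, bodies))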
assembled_render(contig_result, blast_result, seq1, conf1, trace1, seq2, conf2, trace2)
Returns a string of HTML. Workup info at top, a link to ../daily_summary.html, then two tabs: Assembly and BLAST. Assembly shows the seqviewer alignment followed by the pretty-printed assembly; BLAST shows the formatted BLAST results.
strandwise_render(seq1, conf1, trace1, blast_result1, seq2, conf2, trace2, blast_result2)
Returns a string of HTML. Same as assembled, but shows both strands separately, both sequences separately, and has two tabs for BLAST results.
Look at CREATE events and enqueue a structure indicating this onto a specified queue. All that has to be enqueued is the full path to the file.
Defined as a function queue_events(queue, path, mask, filter, fun=lambda x: x) (sketched after the parameter list):
- queue - the queue to push to
- path - the path to monitor
- mask - the inotify mask to use
- filter - a regex that the filename in the event must match to be enqueued
- fun - a function that receives the event and produces the value that is actually enqueued (defaults to the identity function)
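A sketch on top of pyinotify; the filter is applied to the event's filename, and stopping the notifier on exit_event is left to the caller as described below:

    import re
    import pyinotify

    def queue_events(queue, path, mask, filter, fun=lambda x: x):
        class Handler(pyinotify.ProcessEvent):
            def process_IN_CREATE(self, event):
                if event.name and re.search(filter, event.name):
                    queue.put(fun(event))
        wm = pyinotify.WatchManager()
        notifier = pyinotify.ThreadedNotifier(wm, Handler())
        notifier.start()
        wm.add_watch(path, mask, rec=True)
        return notifier   # caller calls notifier.stop() when exit_event fires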
ThreadedNotifier, and a thread monitoring exit_event that calls stop on the notifier. https://github.com/seb-m/pyinotify/wiki/Tutorial
notifier = pyinotify.ThreadedNotifier(wm, EventHandler())
notifier.start()
wdd = wm.add_watch('/tmp', mask, rec=True)
wm.rm_watch(wdd.values())
notifier.stop()
Need to switch to Linux with winpdb to figure out what’s going on
Use a directory in test/data. Check that files are properly enqueued on creation. Check that the filter works.
process(pair_by, unmatched_fun, pair_fun)
pair_by needs to look up from the mock object. The key it returns is passed through, so the key should be just the workup information.
process(lookup_fun, pairing_key_fun, unmatched_file_path_ref, unmatched_queue, share_path_ref, workup_path_ref, pair_queue, post_enqueue_fun)(filepaths_set):
- if any of the filepaths don't exist, syslog a warning and drop them
- look up files with lookup_fun to get full data
- try to pair the files with pairing_key_fun
- for each unpaired file: if its n_retries == 0, move it to unmatched_file_path and delete it from the n_retries dict; else re-enqueue it and decrement its n_retries
- for each pair: ensure that its target directory exists, move the files there, and queue the pair on analysis_queue
- run post_enqueue_fun
lookup_fun = id, pairing_key_fun = id, unmatched_file_path_ref = test/data/unmatched, unmatched_queue = a queue, share_path_ref = test/data/process_share, workup_path_ref = workup, pair_queue = a queue, post_enqueue_fun = set an event.
Write a skeleton of process that does nothing and fails. Create files to process in test/data/to_process, set up the queues and event, call process on them, and check that files go in the right places and that the queues have the correct values.
Takes a key function and an iterable and produces a list of all the pairs it can, plus a list of the unpaired items. It's groupby plus a filter (sketched below).
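A sketch of that, assuming any group that isn't exactly two items counts as unpaired:

    import itertools

    def pair_up(key_fun, items):
        pairs, unpaired = [], []
        for _, group in itertools.groupby(sorted(items, key=key_fun), key_fun):
            group = list(group)
            if len(group) == 2:
                pairs.append(tuple(group))
            else:
                unpaired.extend(group)
        return pairs, unpaired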
Takes a path and (see the sketch below):
- if path is a directory -> do nothing
- if path does not exist -> create the directory
- else -> raise an error
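A sketch; the function name here is a guess:

    import os

    def ensure_isdir(path):
        if os.path.isdir(path):
            return
        elif not os.path.exists(path):
            os.makedirs(path)
        else:
            raise ValueError("%s exists but is not a directory" % path)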
Most of the behavior has been moved to the functions unmatched_fun and pair_fun.
unmatched_fun should be something like requeue_n_times(unmatched_queue, n_retries_ref, function to move file to unmatched on final failure)
pair_fun(mdx_obj, share_path_ref, workup_path_ref, pair_queue)(file1, file2)
pair_fun should (see the sketch below):
- Set up the target directory
- Move the files thence
- Enqueue the pair on a given queue
- Call update on the MDX object
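A sketch of the refactored process built on unmatched_fun and pair_fun; per the note above, the n_retries bookkeeping and the file moves live inside those two functions (requeue_n_times and the pair_fun closure), so process itself only splits and dispatches:

    import os, syslog

    def process(lookup_fun, pairing_key_fun, unmatched_fun, pair_fun,
                post_enqueue_fun):
        def run(filepaths_set):
            present = [p for p in filepaths_set if os.path.exists(p)]
            for p in set(filepaths_set) - set(present):
                syslog.syslog(syslog.LOG_WARNING, "dropping missing file %s" % p)
            data = [lookup_fun(p) for p in present]
            pairs, unpaired = pair_up(pairing_key_fun, data)
            for item in unpaired:
                unmatched_fun(item)
            for a, b in pairs:
                pair_fun(a, b)
            post_enqueue_fun()
        return run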
Tests aren’t passing…
Catch any errors and syslog them. Drop the file the error occurred for.
intermittently and enqueue_files
intermittently(fun, delay_ref, exit_event) runs fun with a fixed delay between runs.
- fun - function to run
- delay_ref - Ref-like holding the delay in seconds between runs
- exit_event - stop running once this is set
Create a run_now event, and pass a function that sets it to a delay timer. Then loop in intermittently until the run_now event or the exit_event is set. On run_now, get the current value from delay_ref and start a new timer; on exit_event, return immediately.
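A sketch of that loop, assuming delay_ref is Ref-like with a get() method:

    import threading

    def intermittently(fun, delay_ref, exit_event):
        while not exit_event.is_set():
            run_now = threading.Event()
            timer = threading.Timer(delay_ref.get(), run_now.set)
            timer.start()
            while not (run_now.is_set() or exit_event.is_set()):
                run_now.wait(0.5)   # short wait so exit_event is noticed
            timer.cancel()
            if exit_event.is_set():
                return
            fun()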