Round trip WYSIWYG LabOP editor, LLMs, Microsoft Word #195

Open
photocyte opened this issue Apr 25, 2023 · 1 comment

Comments

@photocyte
Member

Making a note of this idea, which was brainstormed at the weekly meeting on 2023-04-25:

(The idea may have been inspired by this blog post, which I read recently and which might be handy: https://ben.balter.com/2014/03/31/word-versus-markdown-more-than-mere-semantics/)

A key functionality of LabOP is that it should be an easily editable, as well as executable, format for laboratory protocols. Currently, we have the LabOP editor (https://github.com/Bioprotocols/laboped), which provides a GUI for editing with the full power of LabOP, to represent graph-like and otherwise non-linear or parallelized protocols with condition-based execution.

However, many protocols (or subprotocols) are simply linear (serial execution of steps without conditions or feedback), and may not need the full power of the LabOP editor. Furthermore, many laboratory users would be terminally discouraged by having to turn to editing Python or RDF, or even raw Markdown LabOP representations.

The export/executable format that users would most likely prefer for such a linear protocol is a numbered list (optionally with indented sub-lists representing, for example, notes on a step or substeps).

Currently, LabOP can export to Markdown (and via Markdown, to a read-only PDF), which is a great solution for exporting protocols that are well represented as numbered lists. Markdown, however, is not a good solution for editing such protocols, as most users do not have a WYSIWYG Markdown editor on their computer (many are freely available, but installing one is an additional step and barrier to entry). Editing the raw Markdown text, e.g. in whatever plaintext editor they already have installed, may also be terminally discouraging to users.

Here I suggest a few things:

1. LabOP should have a Microsoft Word .docx specialization. (This could be as simple as a Pandoc or OpenOffice conversion from Markdown to .docx.)

  • However, if LabOP instead converted directly into .docx, there would be an option to stash informative LabOP metadata in the hidden .docx XML. See how Zotero interfaces with .docx when inserting citations for an example.
  • .docx also has the advantage that it can be imported into and edited in Google Docs, although it is unclear whether any stashed LabOP metadata would survive that.
  • One could also consider HTML as an alternative marked-up document format, but most users do not have a WYSIWYG HTML editor on their computer, whereas most already have the Microsoft Office suite or a competing equivalent (Open Office, Apple Pages, etc.) that can edit .docx files. See also: HTML Specialization in addition to Markdown? #158
  • Alternatively, the LabOP RDF file itself could be stashed somewhere in the .docx ZIP archive. (https://en.wikipedia.org/wiki/Office_Open_XML#:~:text=zipped)
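
As a rough sketch of that last bullet, the snippet below copies a .docx ZIP archive and adds the RDF as one extra file, using only the Python standard library. The part name `customXml/labop_protocol.ttl` and both function names are made up for illustration; nothing in LabOP defines them. I would also assume that Word or Google Docs may drop a file that is not registered in the package's content types and relationships when the document is re-saved, so this only shows the mechanics, not a proven round trip.

```python
import io
import zipfile
from typing import Optional

# Hypothetical part name: not defined by LabOP or by the Office Open XML spec.
LABOP_PART = "customXml/labop_protocol.ttl"


def stash_rdf(docx_bytes: bytes, rdf_text: str) -> bytes:
    """Return a copy of the .docx archive with the RDF added as an extra file."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                if item.filename != LABOP_PART:  # skip any stale copy
                    dst.writestr(item, src.read(item.filename))
            dst.writestr(LABOP_PART, rdf_text)
    return out.getvalue()


def recover_rdf(docx_bytes: bytes) -> Optional[str]:
    """Return the stashed RDF text, or None if the archive does not carry it."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        if LABOP_PART not in archive.namelist():
            return None
        return archive.read(LABOP_PART).decode("utf-8")
```

If `recover_rdf` returns `None` after a user has edited the file, the back-translator would know the stashed context was lost and could fall back to the original RDF it exported from.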

2. LabOP should have the ability to ingest / back-translate a suitable .docx into LabOP RDF.

  • In a simple case, a user might edit a single parameter (e.g. change 50 µL -> 100 µL in a single place). That type of change should be easy to back-translate into LabOP RDF. The change could also be propagated forward to all the other places that value is used in the protocol (or not, if the user does not want it propagated).
  • Presumably, this back-translation would need the context of the original LabOP RDF protocol from which the .docx was exported, or would need that context to be stashed in the hidden metadata of the .docx.

3. LabOP should take advantage of the ability to diff .docx files (i.e. the "Compare Documents" functionality) to better perform the back-translation. Suggestions 1, 2, and 3 taken together could then make for a round trip WYSIWYG LabOP editor for simple edits.
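
To make suggestions 2 and 3 a bit more concrete, here is a minimal sketch that diffs the step text of the exported and the edited document and separates pure parameter changes from everything else. It works on plain lists of step strings rather than on Word's "Compare Documents" output, and the quantity pattern, the function name, and the `parameter`/`structural` labels are my own assumptions, not anything LabOP defines. The `step` value is a 0-based index into the original list.

```python
import difflib
import re

# Assumed pattern for a volume such as "100 µL" (micro sign or Greek mu).
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*([µμm]?L)\b")


def classify_edits(original, edited):
    """Compare two lists of step strings and label each change."""
    edits = []
    matcher = difflib.SequenceMatcher(a=original, b=edited, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old, new = original[i1:i2], edited[j1:j2]
        if tag == "replace" and len(old) == len(new) == 1:
            # If the steps are identical once quantities are masked out,
            # only a parameter value changed.
            if QUANTITY.sub("<Q>", old[0]) == QUANTITY.sub("<Q>", new[0]):
                edits.append({
                    "kind": "parameter",
                    "step": i1,
                    "old": QUANTITY.findall(old[0]),
                    "new": QUANTITY.findall(new[0]),
                })
                continue
        # Anything else needs a human or an LLM to interpret.
        edits.append({"kind": "structural", "step": i1, "old": old, "new": new})
    return edits
```

A `parameter` edit could be written straight back into the RDF, while a `structural` edit would be the kind of change handed to the LLM step described in suggestion 4.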

4. If a user makes a complex edit in Microsoft Word, e.g. by writing in natural language, a large language model (LLM) could then convert their edit into a LabOP-compliant document.

  • An improper LLM edit would be caught either by the LabOP .docx -> LabOP RDF translator or by the human user (i.e. a nonsensical edit by the LLM would be obviously wrong to one of the two parties).
  • An example is given below:

(Original LabOP exported .docx)

4. Put on pipette tip on pipette M300 left, from tipbox in Slot 5
5. Transfer 100 µL from Well A1 into Well A2
6. Drop pipette tip into trash at location slot 12

(User makes an edit)

4. Put on pipette tip on pipette M300 left, from tipbox in Slot 5
5. Transfer 100 µL from Well A1 into Well A2, and then mix
6. Drop pipette tip into trash at location slot 12

->(LLM converts near-LabOP-docx into LabOP-backtranslateable-docx)->
(This may or may not actually work with stashed metadata / encapsulated RDF on the .docx XML backend, if it exists.)

4. Put on pipette tip on pipette M300 left, from tipbox in Slot 5
5. Transfer 100 µL from Well A1 into Well A2
6. Mix Well A1 
   a. Aspirate 80 µL from Well A1
   b. Dispense 80 µL into Well A1
   c. Goto substep a. 4 more times (5 times total)
7. Drop pipette tip into trash at location slot 12

(Example .docx showing this formatting)
LabOP_example.docx
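
One way the .docx -> RDF translator could catch an improper LLM edit is to accept only steps that match a known vocabulary. The sketch below is a toy version of that check. The patterns are copied from the wording of the example above and are purely illustrative; they are not LabOP primitives, and a real translator would presumably validate against the LabOP libraries instead.

```python
import re

# Illustrative step vocabulary taken from the example; not LabOP primitives.
STEP_PATTERNS = [
    re.compile(r"Put on pipette tip on pipette .+, from tipbox in Slot \d+"),
    re.compile(r"Transfer \d+ [µμ]L from Well [A-H]\d+ into Well [A-H]\d+"),
    re.compile(r"Mix Well [A-H]\d+"),
    re.compile(r"Aspirate \d+ [µμ]L from Well [A-H]\d+"),
    re.compile(r"Dispense \d+ [µμ]L into Well [A-H]\d+"),
    re.compile(r"Goto substep [a-z]\. \d+ more times \(\d+ times total\)"),
    re.compile(r"Drop pipette tip into trash at location slot \d+"),
]

# Leading list label such as "6." or "a.", with optional indentation.
LABEL = re.compile(r"^\s*(?:\d+|[a-z])\.\s+")


def unrecognized_steps(lines):
    """Return the lines whose text matches none of the known step patterns."""
    bad = []
    for line in lines:
        text = LABEL.sub("", line).strip()
        if text and not any(p.fullmatch(text) for p in STEP_PATTERNS):
            bad.append(line)
    return bad
```

With these patterns, the user's free-text step "Transfer 100 µL from Well A1 into Well A2, and then mix" is flagged, while every line of the LLM-converted list passes. A check like this only catches malformed steps; a well-formed but wrong step would still need the human reviewer.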

5. A well-functioning LLM assistant for converting near-LabOP-compliant-docx into LabOP-compliant-docx (and therefore onwards into RDF) might be a mechanism for supervised or automated conversion of generic natural language laboratory protocols, found publicly or provided privately by users, into LabOP RDF protocols.

@photocyte
Member Author

I pulled out the .xml file that represents the numbered list with lettered sublists within the above LabOP_example.docx, in case people are interested in what it actually looks like in the backend of the .docx:
document.xml.zip

(This is a .docx that was created as a blank file in Microsoft Word, uploaded to Google Docs and edited there to include the numbered list with lettered sublists, then downloaded and unzipped. So it might be a Google-preferred subset of the .docx XML markup that plays a bit more nicely with interoperable use.)
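
For anyone who wants to poke at that file programmatically, here is a minimal sketch that pulls the list nesting out of a `word/document.xml` string. It assumes the standard WordprocessingML layout, where a list paragraph carries `w:pPr/w:numPr/w:ilvl` with the nesting level in `w:val`; paragraphs whose numbering is inherited from a style would be missed. The function name is made up, and I have not run it against the attachment above.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def list_items(document_xml):
    """Return (level, text) for each paragraph that directly declares a list level."""
    root = ET.fromstring(document_xml)
    items = []
    for para in root.iter(f"{{{W}}}p"):
        ilvl = para.find("w:pPr/w:numPr/w:ilvl", NS)
        if ilvl is None:
            continue  # not a list paragraph, or numbering comes from a style
        level = int(ilvl.get(f"{{{W}}}val"))
        # A paragraph's visible text is split across one or more w:t runs.
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        items.append((level, text))
    return items
```

Level 0 would correspond to the numbered steps and level 1 to the lettered substeps, which is the structure a back-translator would need to rebuild.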
