Adding more docs on how to use LLMs with OAK
cmungall committed Oct 22, 2024
1 parent 64fd0cf commit fb327b4
Showing 7 changed files with 788 additions and 470 deletions.
168 changes: 168 additions & 0 deletions docs/examples/Adapters/LLM/LLM-Tutorial.ipynb
@@ -0,0 +1,168 @@
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": [
"# LLM Tutorial\n",
"\n",
"This tutorial walks through using OAK via an LLM wrapper.\n",
"\n",
"See also the [How-to guide](https://incatools.github.io/ontology-access-kit/howtos/use-llms.html).\n",
"\n",
"Note that for this to work, you must either install OAK with the `llm` extras or install the\n",
"`llm` tool separately (e.g. `pipx install llm`).\n",
"\n",
"You will also need the API keys for an LLM service, or a proxy to a local model.\n",
"\n",
" "
],
"id": "cf2572dda785deed"
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Annotate Command\n",
"\n",
"Note that the first run may be slow, as it needs to perform an initial embedding.\n",
"\n",
"Here we use the standard OAK `annotate` command, but instead of the usual adapter (e.g. `sqlite:obo:cl`), we pass in a wrapped adapter, using the `gpt-4o` model.\n",
"\n",
"We strongly recommend passing in categories, as this helps the model ground the kinds of terms you are interested in."
],
"id": "95ff062ec749f629"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2024-10-22T02:00:12.384637Z",
"start_time": "2024-10-22T01:59:41.305531Z"
}
},
"cell_type": "code",
"source": "!runoak --stacktrace -i llm:{gpt-4o}:sqlite:obo:cl annotate \"sequencing was performed on splenic and thymic macrophages\" --category CellType \n",
"id": "8044c89577c9625",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"object_id: CL:0000871\r\n",
"object_label: splenic macrophage\r\n",
"object_categories:\r\n",
"- CellType\r\n",
"subject_label: splenic macrophages\r\n",
"\r\n",
"---\r\n",
"object_id: CL:0000866\r\n",
"object_label: thymic macrophage\r\n",
"object_categories:\r\n",
"- CellType\r\n",
"subject_label: thymic macrophages\r\n",
"start: 40\r\n",
"end: 58\r\n"
]
}
],
"execution_count": 3
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"Currently, specific span coordinates are only returned for concepts that can be clearly mapped back to the text.\n",
"\n",
"You can also use the standard `--whole-text` (`-W`) option to match the entire text span, rather than to annotate segments:"
],
"id": "a014e2a04badc986"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2024-10-22T02:04:59.758809Z",
"start_time": "2024-10-22T02:04:43.271571Z"
}
},
"cell_type": "code",
"source": "!runoak --stacktrace -i llm:{gpt-4o}:sqlite:obo:cl annotate -W \"macrophage found in the thymus\" --category CellType ",
"id": "4cdcabaad7e6268e",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"object_id: CL:0000866\r\n",
"object_label: thymic macrophage\r\n",
"subject_label: macrophage found in the thymus\r\n"
]
}
],
"execution_count": 6
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Suggesting Definitions\n",
"\n"
],
"id": "7f06875f274fbd06"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2024-10-22T02:02:45.880910Z",
"start_time": "2024-10-22T02:02:27.963766Z"
}
},
"cell_type": "code",
"source": [
"!runoak -i llm:sqlite:obo:uberon generate-definitions \\\n",
" finger toe \\\n",
" --style-hints \"write definitions in formal genus-differentia form\""
],
"id": "3a9f92f9e258b301",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"add definition 'A manual digit is a type of anatomical structure characterized as one of the distal appendages found on the human hand, distinct from those structures on other limbs, and is primarily comprised of phalanges, a metacarpal bone, and associated soft tissue.' to UBERON:0002389\r\n",
"add definition 'A pedal digit is a type of anatomical structure that is a subdivision of the limb and is specifically located at the distal end of the pes, commonly known as the foot, in vertebrates.' to UBERON:0001466\r\n"
]
}
],
"execution_count": 5
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": "",
"id": "3f64b6dc3ae0a288"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
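The `annotate` cells above print one YAML-style record per match, separated by `---`. As a minimal sketch (not part of the commit), the snippet below parses that output into Python dictionaries using only the standard library. The function name `parse_annotations` is hypothetical, and the parser only handles the flat `key: value` and single-level list shapes shown in the notebook output; a real YAML parser would be more robust for anything else.

```python
from typing import Dict, List, Union

Value = Union[str, int, List[str]]


def parse_annotations(output: str) -> List[Dict[str, Value]]:
    """Parse `runoak annotate` text output into a list of records.

    Hypothetical helper: assumes the flat shape shown in the notebook
    (scalar fields, one-level lists, records separated by `---`).
    """
    records: List[Dict[str, Value]] = []
    current: Dict[str, Value] = {}
    list_key = None  # key of the list currently being filled, if any
    for raw in output.splitlines():  # splitlines also handles \r\n
        line = raw.strip()
        if not line:
            continue
        if line == "---":
            # Record separator: flush the record collected so far.
            if current:
                records.append(current)
            current, list_key = {}, None
        elif line.startswith("- ") and list_key is not None:
            current[list_key].append(line[2:].strip())
        else:
            # Split on the first colon only, so CURIEs like CL:0000871 survive.
            key, _, value = line.partition(":")
            value = value.strip()
            if value:
                current[key] = int(value) if value.isdigit() else value
                list_key = None
            else:
                current[key] = []
                list_key = key
    if current:
        records.append(current)
    return records


# Sample text copied from the notebook output above.
SAMPLE = (
    "object_id: CL:0000871\r\n"
    "object_label: splenic macrophage\r\n"
    "object_categories:\r\n"
    "- CellType\r\n"
    "subject_label: splenic macrophages\r\n"
    "\r\n"
    "---\r\n"
    "object_id: CL:0000866\r\n"
    "object_label: thymic macrophage\r\n"
    "object_categories:\r\n"
    "- CellType\r\n"
    "subject_label: thymic macrophages\r\n"
    "start: 40\r\n"
    "end: 58\r\n"
)

if __name__ == "__main__":
    for record in parse_annotations(SAMPLE):
        # Span coordinates may be absent, as in the first record.
        print(record["object_id"], record.get("start"), record.get("end"))
```

Using `dict.get` for `start` and `end` reflects the notebook's remark that span coordinates are not always present.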
1 change: 1 addition & 0 deletions docs/examples/Adapters/index.rst
@@ -7,3 +7,4 @@ Adapter Examples
:maxdepth: 2

Ubergraph/Ubergraph-Tutorial
LLM/LLM-Tutorial
4 changes: 3 additions & 1 deletion docs/howtos/use-llms.rst
@@ -66,6 +66,8 @@ LLM CLI tools such as the datasette ``llm`` tool pair naturally
OAK LLM Adapter
---------------

See also the `LLM Notebook <https://incatools.github.io/ontology-access-kit/examples/Adapters/LLM/LLM-Tutorial.html>`_.

OAK provides a number of different adapters (implementations) for each of its interfaces.
Some adapters provide direct access to an ontology or collection of ontologies; others act as *wrappers*
onto another adapter, and inject additional functionality.
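To illustrate how a wrapper selector is composed, here is a minimal sketch that builds the two selector forms used in the notebook from this commit (`llm:{gpt-4o}:sqlite:obo:cl` and `llm:sqlite:obo:uberon`) and assembles a matching `runoak annotate` command line. The helper names `build_selector` and `annotate_argv` are hypothetical and not part of OAK; only the flags that appear in the notebook (`-i`, `-W`, `--category`) are used.

```python
import shlex
from typing import List, Optional


def build_selector(inner: str, model: Optional[str] = None) -> str:
    """Wrap an inner adapter selector in the `llm:` wrapper.

    Hypothetical helper. With a model name the result looks like
    `llm:{gpt-4o}:sqlite:obo:cl`; without one, `llm:sqlite:obo:cl`.
    """
    if model:
        # Doubled braces emit literal `{` and `}` around the model name.
        return f"llm:{{{model}}}:{inner}"
    return f"llm:{inner}"


def annotate_argv(
    selector: str,
    text: str,
    category: Optional[str] = None,
    whole_text: bool = False,
) -> List[str]:
    """Assemble the argument list for a `runoak ... annotate` call."""
    argv = ["runoak", "-i", selector, "annotate"]
    if whole_text:
        argv.append("-W")  # match the entire text rather than segments
    argv.append(text)
    if category:
        argv += ["--category", category]
    return argv


if __name__ == "__main__":
    argv = annotate_argv(
        build_selector("sqlite:obo:cl", model="gpt-4o"),
        "macrophage found in the thymus",
        category="CellType",
        whole_text=True,
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` instead of printing it avoids shell quoting issues with the braces in the selector.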
@@ -246,7 +248,7 @@ Then you can use the model in OAK:
Mixtral via groq and LiteLLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`groq <https://groq.com/>` provides an API over souped-up hardware running Llama2 and Mixtral.
`groq <https://groq.com/>`_ provides an API over souped-up hardware running Llama2 and Mixtral.
You can configure in a similar way to ollama above, but here we are proxying to a remote server:

.. code-block:: bash