From ca14a4ef3b81e96baa7e2201aed117fca74b11b1 Mon Sep 17 00:00:00 2001
From: William FH <13333726+hinthornw@users.noreply.github.com>
Date: Tue, 21 Nov 2023 19:26:19 -0800
Subject: [PATCH] Wfh/rerun notebooks (#69)

Add links to the notebooks
---
 docs/source/notebooks/datasets.ipynb | 2 +-
 docs/source/notebooks/extraction/email.ipynb | 4 +-
 docs/source/notebooks/getting_started.ipynb | 42 +-
 .../retrieval/comparing_techniques.ipynb | 43 +-
 .../retrieval/langchain_docs_qa.ipynb | 392 ++++++++++++++----
 5 files changed, 376 insertions(+), 107 deletions(-)

diff --git a/docs/source/notebooks/datasets.ipynb b/docs/source/notebooks/datasets.ipynb
index 949971b6..f09cd528 100644
--- a/docs/source/notebooks/datasets.ipynb
+++ b/docs/source/notebooks/datasets.ipynb
@@ -195,7 +195,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.9.6"
+ "version": "3.11.2"
  }
  },
  "nbformat": 4,
diff --git a/docs/source/notebooks/extraction/email.ipynb b/docs/source/notebooks/extraction/email.ipynb
index f04da650..e4e9b4a8 100644
--- a/docs/source/notebooks/extraction/email.ipynb
+++ b/docs/source/notebooks/extraction/email.ipynb
@@ -30,8 +30,8 @@
  "import os\n",
  "\n",
  "# Get your API key from https://smith.langchain.com/settings\n",
- "# os.environ[\"LANGCHAIN_API_KEY\"] = \"sk-...\"\n",
- "# os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
+ "os.environ[\"LANGCHAIN_API_KEY\"] = \"sk-...\"\n",
+ "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
  ]
  },
  {
diff --git a/docs/source/notebooks/getting_started.ipynb b/docs/source/notebooks/getting_started.ipynb
index 1e3912d7..a037432b 100644
--- a/docs/source/notebooks/getting_started.ipynb
+++ b/docs/source/notebooks/getting_started.ipynb
@@ -66,7 +66,7 @@
  },
  "outputs": [],
  "source": [
- "# %pip install -U --quiet langchain_benchmarks langchain langsmith"
+ "%pip install -U --quiet langchain_benchmarks langchain langsmith"
  ]
  },
  {
@@ -81,7 +81,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 4,
+ "execution_count": 1,
  "id": "c516c725-c968-422b-aedf-e360d4f7774c",
  "metadata": {
  "tags": []
@@ -92,26 +92,28 @@
  "text/html": [
  "
Name | Type | Dataset ID | Description |
---|---|---|---|
Name | Type | Dataset ID | Description |
Tool Usage - Typewriter (1 func) | ToolUsageTask | placeholder | Environment with a single function that accepts a single letter as input, and "prints" it on a piece of paper.\n", + " |
Tool Usage - Typewriter (1 tool) | ToolUsageTask | 59577193-8938-4ccf-92a7-e8a96bcf4f86 | Environment with a single tool that accepts a single letter as input, and prints it on a piece of virtual paper.\n", "\n", - "The objective of this task is to evaluate the ability to use the provided tools to repeat a given input string.\n", + "The objective of this task is to evaluate the ability of the model to use the provided tools to repeat a given input string.\n", "\n", - "For example, if the string is 'abc', the tools 'a', 'b', and 'c' must be invoked in that order.\n", + "For example, if the string is 'abc', the tools 'a', 'b', and 'c' must be invoked in that order.\n", "\n", "The dataset includes examples of varying difficulty. The difficulty is measured by the length of the string. |
Tool Usage - Typewriter | ToolUsageTask | placeholder | Environment with 26 functions each representing a letter of the alphabet.\n", + " |
Tool Usage - Typewriter (26 tools) | ToolUsageTask | 128af05e-aa00-4e3b-a958-d166dd450581 | Environment with 26 tools each tool represents a letter of the alphabet.\n", "\n", - "In this variation of the typewriter task, there are 26 parameterless functions, where each function represents a letter of the alphabet (instead of a single function that takes a letter as an argument).\n", + "The objective of this task is to evaluate the model's ability the use tools\n", + "for a simple repetition task.\n", "\n", - "The object is to evaluate the ability of use the functions to repeat the given string.\n", + "For example, if the string is 'abc', the tools 'a', 'b', and 'c' must be invoked in that order.\n", "\n", - "For example, if the string is 'abc', the tools 'a', 'b', and 'c' must be invoked in that order.\n", + "The dataset includes examples of varying difficulty. The difficulty is measured by the length of the string.\n", "\n", - "The dataset includes examples of varying difficulty. The difficulty is measured by the length of the string. |
Tool Usage - Relational Data | ToolUsageTask | e95d45da-aaa3-44b3-ba2b-7c15ff6e46f5 | Environment with fake data about users and their locations and favorite foods.\n", + "This is a variation of the typer writer task, where 26 parameterless tools are\n", + "given instead of a single tool that takes a letter as an argument. |
Tool Usage - Relational Data | ToolUsageTask | 1d89f4b3-5f73-48cf-a127-2fdeb22f6d84 | Environment with fake data about users and their locations and favorite foods.\n", "\n", "The environment provides a set of tools that can be used to query the data.\n", "\n", @@ -122,25 +124,25 @@ "Each example is composed of a question, a reference answer, and information about the sequence in which tools should be used to answer the question.\n", "\n", "Success is measured by the ability to answer the question correctly, and efficiently. |
Multiverse Math | ToolUsageTask | placeholder | An environment that contains a few basic math operations, but with altered results.\n", + " |
Multiverse Math | ToolUsageTask | 594f9f60-30a0-49bf-b075-f44beabf546a | An environment that contains a few basic math operations, but with altered results.\n", "\n", "For example, multiplication of 5*3 will be re-interpreted as 5*3*1.1. The basic operations retain some basic properties, such as commutativity, associativity, and distributivity; however, the results are different than expected.\n", "\n", "The objective of this task is to evaluate the ability to use the provided tools to solve simple math questions and ignore any innate knowledge about math. |
Email Extraction | ExtractionTask | https://smith.langchain.com/public/36bdfe7d-3cd1-4b36-b957-d12d95810a2b/d | A dataset of 42 real emails deduped from a spam folder, with semantic HTML tags removed, as well as a script for initial extraction and formatting of other emails from an arbitrary .mbox file like the one exported by Gmail.\n", + " |
Email Extraction | ExtractionTask | a1742786-bde5-4f51-a1d8-e148e5251ddb | A dataset of 42 real emails deduped from a spam folder, with semantic HTML tags removed, as well as a script for initial extraction and formatting of other emails from an arbitrary .mbox file like the one exported by Gmail.\n", "\n", "Some additional cleanup of the data was done by hand after the initial pass.\n", "\n", "See https://github.com/jacoblee93/oss-model-extraction-evals. |
LangChain Docs Q&A | RetrievalTask | 452ccafc-18e1-4314-885b-edd735f17b9d | Questions and answers based on a snapshot of the LangChain python docs.\n", + " |
LangChain Docs Q&A | RetrievalTask | 452ccafc-18e1-4314-885b-edd735f17b9d | Questions and answers based on a snapshot of the LangChain python docs.\n", "\n", "The environment provides the documents and the retriever information.\n", "\n", "Each example is composed of a question and reference answer.\n", "\n", "Success is measured based on the accuracy of the answer relative to the reference answer.\n", - "We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
Semi-structured Earnings | RetrievalTask | c47d9617-ab99-4d6e-a6e6-92b8daf85a7d | Questions and answers based on PDFs containing tables and charts.\n", + "We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
Semi-structured Reports | RetrievalTask | c47d9617-ab99-4d6e-a6e6-92b8daf85a7d | Questions and answers based on PDFs containing tables and charts.\n", "\n", "The task provides the raw documents as well as factory methods to easily index them\n", "and create a retriever.\n", @@ -148,15 +150,15 @@ "Each example is composed of a question and reference answer.\n", "\n", "Success is measured based on the accuracy of the answer relative to the reference answer.\n", - "We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
Name | Type | Dataset ID | Description |
---|---|---|---|
LangChain Docs Q&A | RetrievalTask | 452ccafc-18e1-4314-885b-edd735f17b9d | Questions and answers based on a snapshot of the LangChain python docs. The environment provides the documents and the retriever information. Each example is composed of a question and reference answer. Success is measured based on the accuracy of the answer relative to the reference answer. We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
Semi-structured Reports | RetrievalTask | c47d9617-ab99-4d6e-a6e6-92b8daf85a7d | Questions and answers based on PDFs containing tables and charts. The task provides the raw documents as well as factory methods to easily index them and create a retriever. Each example is composed of a question and reference answer. Success is measured based on the accuracy of the answer relative to the reference answer. We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
Name | Type | Dataset ID | Description |
---|---|---|---|
Name | Type | Dataset ID | Description |
LangChain Docs Q&A | RetrievalTask | 452ccafc-18e1-4314-885b-edd735f17b9d | Questions and answers based on a snapshot of the LangChain python docs.\n", + " |
LangChain Docs Q&A | RetrievalTask | 452ccafc-18e1-4314-885b-edd735f17b9d | Questions and answers based on a snapshot of the LangChain python docs.\n", "\n", "The environment provides the documents and the retriever information.\n", "\n", "Each example is composed of a question and reference answer.\n", "\n", "Success is measured based on the accuracy of the answer relative to the reference answer.\n", - "We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
Semi-structured Earnings | RetrievalTask | c47d9617-ab99-4d6e-a6e6-92b8daf85a7d | Questions and answers based on PDFs containing tables and charts.\n", + "We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
Semi-structured Reports | RetrievalTask | c47d9617-ab99-4d6e-a6e6-92b8daf85a7d | Questions and answers based on PDFs containing tables and charts.\n", "\n", "The task provides the raw documents as well as factory methods to easily index them\n", "and create a retriever.\n", @@ -107,15 +107,15 @@ "Each example is composed of a question and reference answer.\n", "\n", "Success is measured based on the accuracy of the answer relative to the reference answer.\n", - "We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
Name | LangChain Docs Q&A |
Type | RetrievalTask |
Dataset ID | 452ccafc-18e1-4314-885b-edd735f17b9d |
Name | LangChain Docs Q&A |
Type | RetrievalTask |
Dataset ID | 452ccafc-18e1-4314-885b-edd735f17b9d |
Description | Questions and answers based on a snapshot of the LangChain python docs.\n", "\n", "The environment provides the documents and the retriever information.\n", @@ -148,18 +148,18 @@ "Each example is composed of a question and reference answer.\n", "\n", "Success is measured based on the accuracy of the answer relative to the reference answer.\n", - "We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
Retriever Factories | basic, parent-doc, hyde |
Architecture Factories | conversational-retrieval-qa |
get_docs | <function load_cached_docs at 0x102d17240> |
Retriever Factories | basic, parent-doc, hyde |
Architecture Factories | conversational-retrieval-qa |
get_docs |
 | inputs.question | feedback.embedding_cosine_distance | feedback.score_string:accuracy | feedback.faithfulness | error | execution_time |
---|---|---|---|---|---|---|
count | 86 | 86.000000 | 86.000000 | 82.000000 | 0 | 86.000000 |
unique | 86 | NaN | NaN | NaN | 0 | NaN |
top | in code, how can i add a system message at the... | NaN | NaN | NaN | NaN | NaN |
freq | 1 | NaN | NaN | NaN | NaN | NaN |
mean | NaN | 0.190418 | 0.177907 | 0.939024 | NaN | 9.605034 |
std | NaN | 0.045291 | 0.176503 | 0.199231 | NaN | 3.323173 |
min | NaN | 0.074583 | 0.100000 | 0.100000 | NaN | 4.748375 |
25% | NaN | 0.154158 | 0.100000 | 1.000000 | NaN | 7.521995 |
50% | NaN | 0.190138 | 0.100000 | 1.000000 | NaN | 8.637612 |
75% | NaN | 0.222883 | 0.100000 | 1.000000 | NaN | 10.116563 |
max | NaN | 0.289047 | 1.000000 | 1.000000 | NaN | 18.631366 |