Merge pull request #151 from cohere-ai/llmu-updates

Update LLMU notebooks - RAG
cohere-ai · Apr 16, 2024 · b066c6c
2 parents cffb10f + 9d80f2c
commit b066c6c
Showing 9 changed files with 488 additions and 798 deletions.
diff --git a/README.md b/README.md
@@ -2,10 +2,23 @@
 
 Welcome to the Cohere Cookbook! This repository provides a collection of examples to help you get started with the Cohere API. These examples contain step-by-step guides, with code examples and explanations, to help you understand and use the API effectively.
 
-# Getting Started
-The cookbook is grouped into two categories. To get started, go to any of the categories below. You will find more details there, but here's a summary:
+# Categories
+The cookbook is grouped into two categories. To get started, go to any of the categories below.
 
 | Category | Description |
 | --- | --- |
 | [Guides](notebooks/guides/) | Tutorials and step-by-step guides covering a range of topics, providing practical guidance and code examples.
-| [LLM University](notebooks/llmu/) | Guides for getting started with Cohere, starting with basic usage and progressing to advanced topics. The code companion to the full [LLM University course](https://llm.university/).|
+| [LLM University](notebooks/llmu/) | Guides for getting started with Cohere, starting with basic usage and progressing to advanced topics. The code companion to the full [LLM University course](https://llm.university/).|
+
+# Getting Started
+If you are looking for a quick tour of the Cohere API, the following notebooks will help you get up and running.
+
+- [**Text Generation**](notebooks/llmu/Building_a_Chatbot.ipynb): Get started with the Command R+ model by building a chatbot using Cohere’s Chat endpoint. Build a chatbot that can respond to user messages and maintain the context of the conversation.
+
+- [**Text Embeddings**](notebooks/llmu/Introduction_Text_Embeddings.ipynb): Get started with the Embed model by generating text embeddings for a dataset. Observe graphically the relationships between documents and explore how to leverage embeddings for semantic search and clustering.
+
+- [**Retrieval-Augmented Generation**](notebooks/llmu/RAG_with_Chat_Embed_and_Rerank.ipynb): Build a RAG-powered chatbot that can extract relevant information from external documents and produce verifiable, inline citations in its responses. This leverages the Chat endpoint as well as the Embed v3 and Rerank 3 models.
+
+- [**Tool Use**](notebooks/Vanilla_Tool_Use.ipynb): Tool use allows you to connect LLMs to external tools like search engines, APIs, functions, databases, etc. In this example, build an assistant that can query sales reports and a product catalog and provide its analysis.
+
+- [**Multi-Step Tool Use**](notebooks/Data_Analyst_Agent_Cohere_and_Langchain.ipynb): Multi-step tool use allows an LLM to call more than one tool in a sequence of steps, using the results from one tool call in a subsequent step. In this example, build a simple data analyst agent that is able to search the web and run code in a Python interpreter. This agent uses Cohere's Command R+ model and Langchain.
diff --git a/notebooks/llmu/Embed_Endpoint.ipynb b/notebooks/llmu/Embed_Endpoint.ipynb
@@ -17,7 +17,7 @@
       "source": [
         "# The Embed Endpoint\n",
         "\n",
-        "In this lab, we'll learn how to analyze a text dataset using Cohere's Embed cohere endpoint. This colab accompanies the [Classify endpoint lesson](https://docs.cohere.com/docs/embed-endpoint/) of LLM University."
+        "In this lab, we'll learn how to analyze a text dataset using Cohere's Embed endpoint. This colab accompanies the [Embed endpoint lesson](https://docs.cohere.com/docs/embed-endpoint/) of LLM University."
       ]
     },
     {

diff --git a/...ks/llmu/Visualizing_Text_Embeddings.ipynb → ...s/llmu/Introduction_Text_Embeddings.ipynb b/...ks/llmu/Visualizing_Text_Embeddings.ipynb → ...s/llmu/Introduction_Text_Embeddings.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/cohere-ai/notebooks/blob/main/notebooks/llmu/Visualizing_Text_Embeddings.ipynb\">\n",
+    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/cohere-ai/notebooks/blob/main/notebooks/llmu/Introduction_Text_Embeddings.ipynb\">\n",
     "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
     "</a>"
    ]
@@ -15,23 +15,11 @@
     "id": "psRggLwvhi1E"
    },
    "source": [
-    "# Visualizing Text Embeddings\n",
+    "# Introduction to Text Embeddings\n",
     "\n",
     "Text embeddings are a useful way to turn text into numbers that capture its meaning and context. In this notebook, you'll learn how to put them into practice using Cohere's [Embed endpoint](https://docs.cohere.com/reference/embed). You'll calculate embeddings for a dataset of sentences, and plot them in the plane to observe graphically that indeed similar sentences are mapped to close points in the embedding. You'll also explore how to leverage embeddings for semantic search and clustering."
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Overview\n",
-    "\n",
-    "This notebook has three main sections, each with a corresponding blog post:\n",
-    "- **Introduction to Text Embeddings** - Understand the intuition behind text embeddings. _Read the accompanying [blog post here](https://txt.cohere.ai/introduction-to-text-embeddings/)._ \n",
-    "- **Introduction to Semantic Search** - Learn how to use embeddings to build a search capability that surfaces relevant information based on the semantic meaning of a query. _Read the accompanying [blog post here](https://txt.cohere.ai/introduction-to-semantic-search/)._\n",
-    "- **Clustering with Embeddings** - Learn how to use embeddings to group similar documents into clusters, to discover emerging patterns in the documents. _Read the accompanying [blog post here](https://docs.cohere.com/docs/clustering-with-embeddings)._"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -69,13 +57,6 @@
     "from sklearn.cluster import KMeans"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Fill in your Cohere API key in the next cell. To do this, begin by [signing up to Cohere](https://os.cohere.ai/) (for free!) if you haven't yet. Then get your API key [here](https://dashboard.cohere.com/api-keys)."
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": 3,
@@ -94,8 +75,6 @@
     "\n",
     "In this section, we understand the intuition behind text embeddings.\n",
     "\n",
-    "_Read the accompanying [blog post here](https://txt.cohere.ai/introduction-to-text-embeddings/)._\n",
-    "\n",
     "### Step 1: Prepare the Dataset\n",
     "\n",
     "We'll work with a subset of the Airline Travel Information System (ATIS) dataset ([source](https://aclanthology.org/H90-1021/)), created based on customer inquiries related to flight bookings, flight departures, arrivals, delays, and cancellations. In the next code cell, we create and preview a dataframe `df` containing 91 queries."

diff --git a/notebooks/llmu/Introduction_to_RAG.ipynb b/notebooks/llmu/Introduction_to_RAG.ipynb
@@ -215,7 +215,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": null,
+      "execution_count": 2,
       "metadata": {
         "id": "CdxeI3XW4yIH"
       },
@@ -256,7 +256,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": null,
+      "execution_count": 3,
       "metadata": {
         "colab": {
           "base_uri": "https://localhost:8080/"
@@ -269,18 +269,15 @@
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "The tallest living species of penguin is the emperor penguin (Apteryx australis), which can measure up to 1.6 to 1.8 m (5.2 to 6 ft) when fully grown.\n",
+            "The tallest living penguins are emperor penguins, which are found only in Antarctica.\n",
             "\n",
             "CITATIONS:\n",
-            "start=45 end=60 text='emperor penguin' document_ids=['doc_0']\n",
-            "start=61 end=80 text='(Apteryx australis)' document_ids=['doc_0']\n",
-            "start=106 end=118 text='1.6 to 1.8 m' document_ids=['doc_0']\n",
-            "start=119 end=132 text='(5.2 to 6 ft)' document_ids=['doc_0']\n",
+            "start=32 end=48 text='emperor penguins' document_ids=['doc_0']\n",
+            "start=66 end=85 text='only in Antarctica.' document_ids=['doc_1']\n",
             "\n",
             "DOCUMENTS:\n",
             "{'id': 'doc_0', 'text': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'}\n",
-            "{'id': 'doc_1', 'text': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'}\n",
-            "{'id': 'doc_2', 'text': 'Animals are different from plants.', 'title': 'What are animals?'}\n"
+            "{'id': 'doc_1', 'text': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'}\n"
           ]
         }
       ],
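
The citation spans in the new output are character offsets into the generated text (`start=32 end=48` covers `emperor penguins`). As a minimal sketch of how those spans could be used, the snippet below collects a stream into text, citations, and documents, then appends the cited document ids after each span. It makes no API call: the events are hypothetical stand-ins built with `SimpleNamespace`, the `text-generation` event name is an assumption (it is not visible in this diff), and `collect_stream` and `annotate` are illustrative helper names, not part of the notebook or the Cohere SDK.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Accumulate text, citations, and documents from a sequence of stream events."""
    text_parts, citations, documents = [], [], []
    for event in events:
        if event.event_type == "text-generation":  # assumed event name
            text_parts.append(event.text)
        elif event.event_type == "citation-generation":
            citations.extend(event.citations)
        elif event.event_type == "stream-end":
            documents = event.response.documents
    return "".join(text_parts), citations, documents


def annotate(text, citations):
    """Append [doc ids] after each cited span.

    Spans are processed right to left so that inserting a marker
    never shifts the offsets of spans still to be handled.
    """
    for c in sorted(citations, key=lambda c: c.start, reverse=True):
        marker = "[" + ",".join(c.document_ids) + "]"
        text = text[:c.end] + marker + text[c.end:]
    return text


# Hypothetical events mirroring the notebook output shown in the diff.
docs = [
    {"id": "doc_0", "text": "Emperor penguins are the tallest.", "title": "Tall penguins"},
    {"id": "doc_1", "text": "Emperor penguins only live in Antarctica.", "title": "Penguin habitats"},
]
fake_events = [
    SimpleNamespace(event_type="text-generation",
                    text="The tallest living penguins are emperor penguins,"),
    SimpleNamespace(event_type="text-generation",
                    text=" which are found only in Antarctica."),
    SimpleNamespace(event_type="citation-generation",
                    citations=[SimpleNamespace(start=32, end=48,
                                               text="emperor penguins",
                                               document_ids=["doc_0"])]),
    SimpleNamespace(event_type="citation-generation",
                    citations=[SimpleNamespace(start=66, end=85,
                                               text="only in Antarctica.",
                                               document_ids=["doc_1"])]),
    SimpleNamespace(event_type="stream-end",
                    response=SimpleNamespace(documents=docs)),
]

text, citations, documents = collect_stream(fake_events)
annotated = annotate(text, citations)
print(annotated)
```

Working right to left is the one design choice that matters here: inserting markers from the start of the string would invalidate every later offset.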
@@ -290,9 +287,9 @@
         "\n",
         "# Generate the response\n",
         "response = co.chat_stream(message=message,\n",
+        "                          model=\"command-r-plus\",\n",
         "                          documents=documents)\n",
         "\n",
-        "\n",
         "# Display the response\n",
         "citations = []\n",
         "cited_documents = []\n",
@@ -302,8 +299,8 @@
         "        print(event.text, end=\"\")\n",
         "    elif event.event_type == \"citation-generation\":\n",
         "        citations.extend(event.citations)\n",
-        "    elif event.event_type == \"search-results\":\n",
-        "        cited_documents = event.documents\n",
+        "    elif event.event_type == \"stream-end\":\n",
+        "        cited_documents = event.response.documents\n",
         "\n",
         "# Display the citations and source documents\n",
         "if citations:\n",
@@ -326,7 +323,16 @@
       "name": "python3"
     },
     "language_info": {
-      "name": "python"
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.11.4"
     }
   },
   "nbformat": 4,