support for png and jpgs without having to swap encodings
vinid committed Jul 7, 2024
1 parent ee27d5b commit f44350a
Showing 12 changed files with 638 additions and 73 deletions.
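The commit title says PNGs and JPEGs are now supported "without having to swap encodings", i.e. the caller no longer has to declare the image format up front. A common way to implement that is to sniff the format from the file's leading magic bytes. The sketch below is illustrative only; the helper name `guess_image_mime` is hypothetical and the commit's actual detection logic may differ.

```python
def guess_image_mime(data: bytes) -> str:
    """Guess an image MIME type from its leading magic bytes.

    Hypothetical sketch of format sniffing; not TextGrad's actual code.
    """
    # PNG files always begin with the fixed 8-byte signature below.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    # JPEG files begin with the SOI marker FF D8 followed by FF.
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    raise ValueError("unsupported image format")
```

With detection like this, the same code path can accept either format and pick the right encoding automatically.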
18 changes: 10 additions & 8 deletions README.md
@@ -103,17 +103,19 @@ We have many more examples around how TextGrad can optimize all kinds of variabl

### Tutorials

We have prepared a couple of tutorials to get you started with TextGrad.
You can run them directly in Google Colab by clicking on the links below.
We have prepared a couple of tutorials to get you started with TextGrad. We recommend beginners follow them in the order listed. You can run them directly in Google Colab by clicking on the links below (you will need an OpenAI/Anthropic API key to run the LLMs).

<div align="center">

| Example | Colab Link |
|-------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Introduction to TextGrad Primitives | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Primitives.ipynb) |
| Optimizing a Code Snippet and Define a New Loss | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/textgrad/blob/main/examples/notebooks/Tutorial-Test-Time-Loss-for-Code.ipynb) |
| Prompt Optimization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Prompt-Optimization.ipynb) |
| Solution Optimization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-Solution-Optimization.ipynb) |
| Tutorial | Difficulty | Colab Link |
|----------------------------------------------------|-----------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1. Introduction to TextGrad Primitives | ![](https://img.shields.io/badge/Level-Beginner-green.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-Primitives.ipynb) |
| 2. Solution Optimization | ![](https://img.shields.io/badge/Level-Beginner-green.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-Solution-Optimization.ipynb) |
| 3. Optimizing a Code Snippet and Define a New Loss | ![](https://img.shields.io/badge/Level-Beginner-green.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/textgrad/blob/main/examples/notebooks/Tutorial-Test-Time-Loss-for-Code.ipynb) |
| 4. Prompt Optimization | ![](https://img.shields.io/badge/Level-Intermediate-yellow.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-Prompt-Optimization.ipynb) |
| 5. MultiModal Optimization | ![](https://img.shields.io/badge/Level-Beginner-green.svg) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Tutorial-MultiModal.ipynb) |

</div>

6 changes: 3 additions & 3 deletions examples/notebooks/Local-Model-With-LMStudio.ipynb
@@ -182,7 +182,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "textgrad",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -196,9 +196,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
388 changes: 388 additions & 0 deletions examples/notebooks/Tutorial-MultiModal-DeepDive.ipynb

Large diffs are not rendered by default.

@@ -1,20 +1,35 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "6e5bfa16-5124-452c-bc56-3427e453751a",
"cell_type": "markdown",
"id": "0023a2ae-72fe-490b-b715-4dddb2539c38",
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"# TextGrad Tutorials: MultiModal Optimization\n",
"\n",
"![TextGrad](https://github.com/vinid/data/blob/master/logo_full.png?raw=true)\n",
"\n",
"An autograd engine -- for textual gradients!\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Prompt-Optimization.ipynb)\n",
"[![GitHub license](https://img.shields.io/badge/License-MIT-blue.svg)](https://lbesson.mit-license.org/)\n",
"[![Arxiv](https://img.shields.io/badge/arXiv-2406.07496-B31B1B.svg)](https://arxiv.org/abs/2406.07496)\n",
"[![Documentation Status](https://readthedocs.org/projects/textgrad/badge/?version=latest)](https://textgrad.readthedocs.io/en/latest/?badge=latest)\n",
"[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/textgrad)](https://pypi.org/project/textgrad/)\n",
"[![PyPI](https://img.shields.io/pypi/v/textgrad)](https://pypi.org/project/textgrad/)\n",
"\n",
"**Objectives for this tutorial:**\n",
"\n",
"* Introduce you to multimodal optimization with TextGrad\n",
"\n",
"%autoreload 2"
"**Requirements:**\n",
"\n",
"* You need to have an OpenAI API key to run this tutorial. This should be set as an environment variable as OPENAI_API_KEY.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "8dd3140c-45d0-478e-b184-ec5faed66964",
"metadata": {},
"outputs": [],
@@ -48,26 +63,48 @@
},
{
"cell_type": "code",
"execution_count": 87,
"execution_count": 2,
"id": "3ca4c62b-2d83-412d-b410-0ed5272a6f06",
"metadata": {},
"outputs": [],
"source": [
"import textgrad as tg\n",
"\n",
"# differently from the past tutorials, we now need a multimodal LLM call instead of a standard one!\n",
"from textgrad.autograd import MultimodalLLMCall\n",
"from textgrad.loss import ImageQALoss"
]
},
{
"cell_type": "code",
"execution_count": 88,
"execution_count": 6,
"id": "2b06474c-491d-48ff-aef1-62cb0e525473",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from dotenv import load_dotenv\n",
"load_dotenv(\".env\", override=True)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "d1986a01-3afd-46a8-a99e-e977b1141768",
"metadata": {},
"outputs": [],
"source": [
"from dotenv import load_dotenv\n",
"load_dotenv(override=True)\n",
"tg.set_backward_engine(\"claude-3-haiku-20240307\")"
"tg.set_backward_engine(\"gpt-4o\")"
]
},
{
@@ -78,29 +115,71 @@
"# Simply answering questions about images"
]
},
{
"cell_type": "markdown",
"id": "efa853c2-2703-4304-a9c5-a3bde675b532",
"metadata": {},
"source": [
"We now download an image from the web."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "950b3502-97ce-4581-87c7-4c47421beafc",
"metadata": {},
"outputs": [],
"source": [
"import httpx\n",
"\n",
"image_url = \"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\n",
"image_data = httpx.get(image_url).content"
]
},
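The notebook passes the raw bytes from `httpx` straight into a `tg.Variable`, with no manual encoding step. Multimodal chat APIs such as OpenAI's typically expect images as base64-encoded data URLs with a MIME prefix, so presumably the packaging happens internally. The sketch below shows that packaging step under that assumption; `to_data_url` is a hypothetical helper, not TextGrad's actual API.

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Wrap raw image bytes in a base64 data URL, the format many
    multimodal chat APIs accept for image inputs.

    Hypothetical sketch of the packaging step; not TextGrad's actual code.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

Combined with format sniffing, this is what lets the same call handle PNG and JPEG inputs without the user swapping encodings by hand.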
{
"cell_type": "markdown",
"id": "1ad925fa-c1e6-482d-af4c-df6f8dcb2c2f",
"metadata": {},
"source": [
"As usual, in TextGrad we have to transform our object of interest into a Variable object. In the previous tutorials we did this with text data; now we will do it with images."
]
},
{
"cell_type": "code",
"execution_count": 89,
"execution_count": 11,
"id": "e0629de4-9fcf-4df4-9316-cc86455929e6",
"metadata": {},
"outputs": [],
"source": [
"image_variable = tg.Variable(image_data, role_description=\"image to answer a question about\", requires_grad=False)"
]
},
{
"cell_type": "markdown",
"id": "1fcbfaaf-8aa4-4bfa-82af-bf5b7aef0f94",
"metadata": {},
"source": [
"Let's now ask a question!"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "1d940cf3-a461-43f4-bc4a-6103589b159e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Variable(value=This image shows a close-up of an ant. The ant appears to be black and is standing on a surface, possibly the ground. The image is highly detailed, showing the ant's body segments, legs, and antennae. The background is blurred, which helps to focus attention on the ant., role=response from the language model, grads=set())"
"Variable(value=This image shows a close-up of an ant. The ant appears to be black and is standing on a surface, possibly a ground or a floor. The image is highly detailed, showing the ant's body segments, legs, and antennae. The background is blurred, which helps to focus attention on the ant., role=response from the language model, grads=set())"
]
},
"execution_count": 89,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import httpx\n",
"\n",
"image_url = \"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\n",
"image_data = httpx.get(image_url).content\n",
"image_variable = tg.Variable(image_data, role_description=\"image to answer a question about\", requires_grad=False)\n",
"question_variable = tg.Variable(\"What do you see in this image?\", role_description=\"question\", requires_grad=False)\n",
"response = MultimodalLLMCall(\"gpt-4o\")([image_variable, question_variable])\n",
"response"
@@ -131,57 +210,53 @@
},
{
"cell_type": "code",
"execution_count": 90,
"execution_count": 15,
"id": "29affc0a-cedc-40fd-bec4-6bf5178409cf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Variable(value=This answer, while providing some accurate information, falls short of being a complete and good response for the image. Let's critically evaluate it:\n",
"\n",
"1. Incompleteness: The answer fails to mention several key details visible in the image. For instance, it doesn't describe the ant's posture (reared up on its hind legs), the texture of its exoskeleton, or the fine hairs visible on its body.\n",
"Variable(value=The provided answer is quite detailed and covers many aspects of the image. However, there are a few points that could be improved or clarified:\n",
"\n",
"2. Lack of precision: The description of the ant as \"black\" is imprecise. The ant appears to have a dark, metallic sheen that could be better described as gunmetal or dark gray.\n",
"1. **Species Identification**: The answer mentions \"likely a species of black ant,\" which is a bit vague. While it's understandable that the exact species might not be identifiable, it could be better to simply state that it is a black ant without speculating on the species.\n",
"\n",
"3. Missing context: The answer doesn't comment on the exceptional quality of the macro photography, which is a significant aspect of this image.\n",
"2. **Surface Description**: The answer states the ant is on a \"textured surface, possibly concrete or soil.\" This is a reasonable guess, but it could be more concise by just mentioning a textured surface without speculating on the material.\n",
"\n",
"4. Overlooked details: The response fails to mention the ant's mandibles, which are clearly visible and an important feature of the image.\n",
"3. **Ant's Posture**: The description of the ant's posture as \"alert or defensive\" is speculative. While the ant's posture is indeed notable, it might be better to describe it without attributing a specific behavior unless it is clearly evident.\n",
"\n",
"5. Lack of depth: There's no attempt to describe the ant's behavior or posture, which appears to be in an alert or defensive stance.\n",
"4. **Background Description**: The explanation of the background being blurred due to a shallow depth of field is accurate and well-explained.\n",
"\n",
"6. Vague background description: While the answer mentions a blurred background, it doesn't describe the colors visible (reddish and green tones), which contribute to the overall composition.\n",
"5. **Detail Description**: The mention of the texture of the ant's body, the shine on its exoskeleton, and the fine hairs on its legs is excellent and adds to the vividness of the description.\n",
"\n",
"7. Surface description: The answer is uncertain about the surface the ant is on, when it's clearly a textured, light-colore, role=evaluation of the response from the language model, grads=set())"
"Overall, the answer is comprehensive and well-articulated but could benefit from slightly less speculation and more straightforward descriptions., role=evaluation of the response from the language model, grads=set())"
]
},
"execution_count": 90,
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loss_fn = ImageQALoss(\n",
" evaluation_instruction=\"Does this seem like a complete and good answer for the image? Criticize.\",\n",
" engine=\"claude-3-5-sonnet-20240620\"\n",
" evaluation_instruction=\"Does this seem like a complete and good answer for the image? Criticize. Do not provide a new answer.\",\n",
" engine=\"gpt-4o\"\n",
")\n",
"loss = loss_fn(question=question_variable, image=image_variable, response=response)\n",
"loss"
]
},
{
"cell_type": "code",
"execution_count": 91,
"execution_count": 16,
"id": "38c2d4ff-1458-459d-8915-3d1a254564fb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This image showcases an exceptional close-up view of a metallic, gunmetal-gray ant standing alert on a textured, light-colored surface. The ant's exoskeleton has a striking, almost iridescent sheen, and its body is covered in fine hairs that are clearly visible. The ant's prominent mandibles are positioned in a defensive stance, with the creature reared up on its hind legs, conveying a sense of vigilance and aggression.\n",
"\n",
"The exceptional macro-level detail and focus of the photography isolates the ant and draws the viewer's attention to its intricate features. The blurred, reddish and green-toned background further emphasizes the ant, creating a dramatic, almost cinematic quality to the image. This close-up perspective provides a rare glimpse into the biology and behavior of this small but fascinating creature, revealing insights into its role within the broader natural world.\n"
"This image shows a close-up of a black ant, captured using macro photography. The ant is standing on a textured surface. The image is highly detailed, showcasing the ant's body segments, legs, and antennae with great clarity. The ant's head is raised, and its antennae are extended. The background is blurred, employing a shallow depth of field to focus attention on the ant and highlight its intricate details. The texture of the ant's body segments, the shine on its exoskeleton, and the fine hairs on its legs are all clearly visible, adding to the vividness of the image.\n"
]
}
],
@@ -225,7 +300,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
"version": "3.10.12"
}
},
"nbformat": 4,
