diff --git a/docs/docs/how_to/index.mdx b/docs/docs/how_to/index.mdx
index d0ba3154f523a..7829a2edd3940 100644
--- a/docs/docs/how_to/index.mdx
+++ b/docs/docs/how_to/index.mdx
@@ -172,6 +172,7 @@ LangChain Tools contain a description of the tool (to pass to the language model
 - [How to: add a human in the loop to tool usage](/docs/how_to/tools_human)
 - [How to: do parallel tool use](/docs/how_to/tools_parallel)
 - [How to: handle errors when calling tools](/docs/how_to/tools_error)
+- [How to: call tools using multi-modal data](/docs/how_to/tool_calls_multi_modal)

 ### Agents
diff --git a/docs/docs/how_to/tool_calls_multi_modal.ipynb b/docs/docs/how_to/tool_calls_multi_modal.ipynb
new file mode 100644
index 0000000000000..1550d843a923b
--- /dev/null
+++ b/docs/docs/how_to/tool_calls_multi_modal.ipynb
@@ -0,0 +1,160 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "4facdf7f-680e-4d28-908b-2b8408e2a741",
+   "metadata": {},
+   "source": [
+    "# How to call tools with multi-modal data\n",
+    "\n",
+    "Here we demonstrate how to call tools with multi-modal data, such as images.\n",
+    "\n",
+    "Some multi-modal models, such as those that can reason over images or audio, support [tool calling](/docs/concepts/#functiontool-calling) features as well.\n",
+    "\n",
+    "To call tools using such models, bind the tools to them in the [usual way](/docs/how_to/tool_calling) and invoke the model with content blocks of the desired type (e.g., blocks containing image data).\n",
+    "\n",
+    "Below, we demonstrate examples using [OpenAI](/docs/integrations/platforms/openai) and [Anthropic](/docs/integrations/platforms/anthropic). We will use the same image and tool for both providers. Let's first select an image and define a placeholder tool that expects the string \"sunny\", \"cloudy\", or \"rainy\" as input. We will then ask the models to describe the weather in the image."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "0d9fd81a-b7f0-445a-8e3d-cfc2d31fdd59",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from typing import Literal\n",
+    "\n",
+    "from langchain_core.tools import tool\n",
+    "\n",
+    "image_url = \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\"\n",
+    "\n",
+    "\n",
+    "@tool\n",
+    "def weather_tool(weather: Literal[\"sunny\", \"cloudy\", \"rainy\"]) -> None:\n",
+    "    \"\"\"Describe the weather\"\"\"\n",
+    "    pass"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8656018e-c56d-47d2-b2be-71e87827f90a",
+   "metadata": {},
+   "source": [
+    "## OpenAI\n",
+    "\n",
+    "For OpenAI, we can pass the image URL directly in a content block of type \"image_url\":"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "a8819cf3-5ddc-44f0-889a-19ca7b7fe77e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'call_mRYL50MtHdeNuNIjSCm5UPmB'}]\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_core.messages import HumanMessage\n",
+    "from langchain_openai import ChatOpenAI\n",
+    "\n",
+    "model = ChatOpenAI(model=\"gpt-4o\").bind_tools([weather_tool])\n",
+    "\n",
+    "message = HumanMessage(\n",
+    "    content=[\n",
+    "        {\"type\": \"text\", \"text\": \"describe the weather in this image\"},\n",
+    "        {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n",
+    "    ],\n",
+    ")\n",
+    "response = model.invoke([message])\n",
+    "print(response.tool_calls)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e5738224-1109-4bf8-8976-ff1570dd1d46",
+   "metadata": {},
+   "source": [
+    "Note that the model response contains tool calls with parsed arguments in LangChain's [standard format](/docs/how_to/tool_calling)."
+   ]
+  },
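+  {
+   "cell_type": "markdown",
+   "id": "b7c1f2a4-3d5e-4f6a-9b8c-1e2d3f4a5b6c",
+   "metadata": {},
+   "source": [
+    "OpenAI also accepts images passed inline as base64-encoded data URLs, which is handy when the image is only available locally. Below is a minimal, unexecuted sketch of this variant; it assumes the image is JPEG-encoded and reuses the tool-bound `model` from above:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c8d2e3b5-4e6f-4a7b-8c9d-2f3e4a5b6c7d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import base64\n",
+    "\n",
+    "import httpx\n",
+    "\n",
+    "# Fetch the image and base64-encode it (assumes a JPEG image)\n",
+    "image_data = base64.b64encode(httpx.get(image_url).content).decode(\"utf-8\")\n",
+    "\n",
+    "message = HumanMessage(\n",
+    "    content=[\n",
+    "        {\"type\": \"text\", \"text\": \"describe the weather in this image\"},\n",
+    "        {\n",
+    "            \"type\": \"image_url\",\n",
+    "            \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image_data}\"},\n",
+    "        },\n",
+    "    ],\n",
+    ")\n",
+    "response = model.invoke([message])\n",
+    "print(response.tool_calls)"
+   ]
+  },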
+  {
+   "cell_type": "markdown",
+   "id": "0cee63ff-e09f-4dd8-8323-912edbde94f6",
+   "metadata": {},
+   "source": [
+    "## Anthropic\n",
+    "\n",
+    "For Anthropic, we format a base64-encoded image into a content block of type \"image\":"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "d90c4590-71c8-42b1-99ff-03a9eca8082e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'toolu_016m9KfknJqx5fVRYk4tkF6s'}]\n"
+     ]
+    }
+   ],
+   "source": [
+    "import base64\n",
+    "\n",
+    "import httpx\n",
+    "from langchain_anthropic import ChatAnthropic\n",
+    "\n",
+    "image_data = base64.b64encode(httpx.get(image_url).content).decode(\"utf-8\")\n",
+    "\n",
+    "model = ChatAnthropic(model=\"claude-3-sonnet-20240229\").bind_tools([weather_tool])\n",
+    "\n",
+    "message = HumanMessage(\n",
+    "    content=[\n",
+    "        {\"type\": \"text\", \"text\": \"describe the weather in this image\"},\n",
+    "        {\n",
+    "            \"type\": \"image\",\n",
+    "            \"source\": {\n",
+    "                \"type\": \"base64\",\n",
+    "                \"media_type\": \"image/jpeg\",\n",
+    "                \"data\": image_data,\n",
+    "            },\n",
+    "        },\n",
+    "    ],\n",
+    ")\n",
+    "response = model.invoke([message])\n",
+    "print(response.tool_calls)"
+   ]
+  },
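+  {
+   "cell_type": "markdown",
+   "id": "d9e3f4c6-5f7a-4b8c-9dae-3a4b5c6d7e8f",
+   "metadata": {},
+   "source": [
+    "In both cases, the tool calls arrive in the same provider-agnostic format, so the code that executes them does not need to know which model produced them. As a rough, unexecuted sketch (our placeholder `weather_tool` simply returns `None`), we can look up each tool by name and invoke it with the parsed arguments:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "eaf4a5d7-6a8b-4c9d-aebf-4b5c6d7e8f9a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Map tool names to tools so any parsed tool call can be dispatched.\n",
+    "tools_by_name = {\"weather_tool\": weather_tool}\n",
+    "\n",
+    "for tool_call in response.tool_calls:\n",
+    "    selected_tool = tools_by_name[tool_call[\"name\"]]\n",
+    "    # Invoke the tool with the arguments the model supplied\n",
+    "    print(selected_tool.invoke(tool_call[\"args\"]))"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}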