forked from langchain-ai/langchain
docs: add how-to on multi-modal tool calling (langchain-ai#21667)
Can move this to a dedicated multi-modal section if desired.
Showing 2 changed files with 161 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,160 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "4facdf7f-680e-4d28-908b-2b8408e2a741", | ||
"metadata": {}, | ||
"source": [ | ||
"# How to call tools with multi-modal data\n", | ||
"\n", | ||
"Here we demonstrate how to call tools with multi-modal data, such as images.\n", | ||
"\n", | ||
"Some multi-modal models, such as those that can reason over images or audio, support [tool calling](/docs/concepts/#functiontool-calling) features as well.\n", | ||
"\n", | ||
"To call tools using such models, simply bind tools to them in the [usual way](/docs/how_to/tool_calling), and invoke the model using content blocks of the desired type (e.g., containing image data).\n", | ||
"\n", | ||
"Below, we demonstrate examples using [OpenAI](/docs/integrations/platforms/openai) and [Anthropic](/docs/integrations/platforms/anthropic). We will use the same image and tool in all cases. Let's first select an image, and build a placeholder tool that expects as input the string \"sunny\", \"cloudy\", or \"rainy\". We will ask the models to describe the weather in the image." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"id": "0d9fd81a-b7f0-445a-8e3d-cfc2d31fdd59", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from typing import Literal\n", | ||
"\n", | ||
"from langchain_core.tools import tool\n", | ||
"\n", | ||
"image_url = \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\"\n", | ||
"\n", | ||
"\n", | ||
"@tool\n", | ||
"def weather_tool(weather: Literal[\"sunny\", \"cloudy\", \"rainy\"]) -> None:\n", | ||
" \"\"\"Describe the weather\"\"\"\n", | ||
" pass" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "8656018e-c56d-47d2-b2be-71e87827f90a", | ||
"metadata": {}, | ||
"source": [ | ||
"## OpenAI\n", | ||
"\n", | ||
"For OpenAI, we can feed the image URL directly in a content block of type \"image_url\":" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"id": "a8819cf3-5ddc-44f0-889a-19ca7b7fe77e", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'call_mRYL50MtHdeNuNIjSCm5UPmB'}]\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"from langchain_core.messages import HumanMessage\n", | ||
"from langchain_openai import ChatOpenAI\n", | ||
"\n", | ||
"model = ChatOpenAI(model=\"gpt-4o\").bind_tools([weather_tool])\n", | ||
"\n", | ||
"message = HumanMessage(\n", | ||
" content=[\n", | ||
" {\"type\": \"text\", \"text\": \"describe the weather in this image\"},\n", | ||
" {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n", | ||
" ],\n", | ||
")\n", | ||
"response = model.invoke([message])\n", | ||
"print(response.tool_calls)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "e5738224-1109-4bf8-8976-ff1570dd1d46", | ||
"metadata": {}, | ||
"source": [ | ||
"The model response contains the tool calls, with arguments already parsed, in LangChain's [standard format](/docs/how_to/tool_calling)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "0cee63ff-e09f-4dd8-8323-912edbde94f6", | ||
"metadata": {}, | ||
"source": [ | ||
"## Anthropic\n", | ||
"\n", | ||
"For Anthropic, we can format a base64-encoded image into a content block of type \"image\", as below:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"id": "d90c4590-71c8-42b1-99ff-03a9eca8082e", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'toolu_016m9KfknJqx5fVRYk4tkF6s'}]\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"import base64\n", | ||
"\n", | ||
"import httpx\n", | ||
"from langchain_anthropic import ChatAnthropic\n", | ||
"\n", | ||
"image_data = base64.b64encode(httpx.get(image_url).content).decode(\"utf-8\")\n", | ||
"\n", | ||
"model = ChatAnthropic(model=\"claude-3-sonnet-20240229\").bind_tools([weather_tool])\n", | ||
"\n", | ||
"message = HumanMessage(\n", | ||
" content=[\n", | ||
" {\"type\": \"text\", \"text\": \"describe the weather in this image\"},\n", | ||
" {\n", | ||
" \"type\": \"image\",\n", | ||
" \"source\": {\n", | ||
" \"type\": \"base64\",\n", | ||
" \"media_type\": \"image/jpeg\",\n", | ||
" \"data\": image_data,\n", | ||
" },\n", | ||
" },\n", | ||
" ],\n", | ||
")\n", | ||
"response = model.invoke([message])\n", | ||
"print(response.tool_calls)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.4" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
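
The two provider sections in the notebook differ mainly in the shape of the image content block. The following is a minimal sketch, not part of the commit, that builds both shapes from raw bytes using only the standard library, so the formats can be compared without calling either model. The helper names (`openai_image_block`, `anthropic_image_block`, `build_content`) and the placeholder bytes are assumptions made for illustration. Sending a base64 `data:` URL in the OpenAI-style block is an alternative to the remote URL used in the notebook.

```python
import base64


def openai_image_block(image_bytes: bytes, media_type: str = "image/jpeg") -> dict:
    """Hypothetical helper: OpenAI-style "image_url" block carrying a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{media_type};base64,{encoded}"},
    }


def anthropic_image_block(image_bytes: bytes, media_type: str = "image/jpeg") -> dict:
    """Hypothetical helper: Anthropic-style "image" block with a base64 source."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": encoded},
    }


def build_content(prompt: str, image_block: dict) -> list:
    """Combine a text block and an image block into one message content list."""
    return [{"type": "text", "text": prompt}, image_block]


# Placeholder bytes stand in for a downloaded image (assumption for illustration).
fake_image = b"\xff\xd8\xff\xe0 placeholder, not a real JPEG"
prompt = "describe the weather in this image"

openai_content = build_content(prompt, openai_image_block(fake_image))
anthropic_content = build_content(prompt, anthropic_image_block(fake_image))
```

Either list can then be passed as `HumanMessage(content=...)` to a model with tools bound, as in the notebook cells above.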