Commit
Merge branch 'master' into DOCS-1062
J2-D2-3PO authored Dec 5, 2024
2 parents 1a00368 + a259e4a commit 24157bf
Showing 140 changed files with 3,379 additions and 1,142 deletions.
1 change: 1 addition & 0 deletions .github/workflows/test.yaml
@@ -240,6 +240,7 @@ jobs:
'mistral1',
'notdiamond',
'openai',
'vertexai',
'scorers_tests',
'pandas-test',
]
25 changes: 16 additions & 9 deletions docs/docs/guides/integrations/google-gemini.md
@@ -16,13 +16,28 @@ import os
import google.generativeai as genai
import weave

weave.init(project_name="google_ai_studio-test")
weave.init(project_name="google-ai-studio-test")

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Write a story about an AI and magic")
```

Weave also automatically captures traces for [Vertex APIs](https://cloud.google.com/vertexai/docs). To start tracking, call `weave.init(project_name="<YOUR-WANDB-PROJECT-NAME>")` and use the library as normal.

```python
import vertexai
import weave
from vertexai.generative_models import GenerativeModel

weave.init(project_name="vertex-ai-test")
vertexai.init(project="<YOUR-VERTEXAI-PROJECT-NAME>", location="<YOUR-VERTEXAI-PROJECT-LOCATION>")
model = GenerativeModel("gemini-1.5-flash-002")
response = model.generate_content(
"What's a good name for a flower shop specialising in selling dried flower bouquets?"
)
```

## Track your own ops

Wrapping a function with `@weave.op` starts capturing inputs, outputs and app logic so you can debug how data flows through your app. You can deeply nest ops and build a tree of functions that you want to track. This also starts automatically versioning code as you experiment to capture ad-hoc details that haven't been committed to git.
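Conceptually, `@weave.op` works like a tracing decorator: each call records the function's inputs and outputs. The pure-Python sketch below illustrates only that recording idea — the `op` decorator and `traces` list are hypothetical stand-ins, not Weave's actual implementation (which also versions code and nests ops into a call tree):

```python
import functools

traces = []  # stand-in for Weave's trace store


def op(fn):
    """Toy tracing decorator: records inputs and outputs on every call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        traces.append({"op": fn.__name__, "inputs": (args, kwargs), "output": result})
        return result
    return wrapper


@op
def extract_first_line(text: str) -> str:
    return text.split("\n")[0].strip()


extract_first_line("A story about AI\nOnce upon a time...")
print(traces[-1]["op"], "->", traces[-1]["output"])
```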
@@ -97,11 +112,3 @@ Given a weave reference to any `weave.Model` object, you can spin up a FastAPI server
```shell
weave serve weave:///your_entity/project-name/YourModel:<hash>
```

## Vertex API

Full Weave support for the `Vertex AI SDK` Python package is currently in development; however, there is a way to integrate Weave with the Vertex API.

Vertex API supports OpenAI SDK compatibility ([docs](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/call-gemini-using-openai-library)). If this is how you build your application, Weave will automatically track your LLM calls via our [OpenAI](/guides/integrations/openai) SDK integration.

\* Please note that some features may not work fully, because the Vertex API doesn't implement all of the OpenAI SDK's capabilities.
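As a sketch of the OpenAI-compatibility route: the helper below builds Vertex's OpenAI-compatible base URL. Treat the exact path (including the `v1beta1` API version) as an assumption to verify against Google's docs; `my-project` and `us-central1` are placeholder values, and the commented client usage assumes an access token from `gcloud auth print-access-token`:

```python
def vertex_openai_base_url(project: str, location: str) -> str:
    """Build the OpenAI-compatible base URL for a Vertex AI project/location.

    The path format is taken from Google's OpenAI-compatibility docs and may
    change; verify it before relying on it.
    """
    return (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project}/locations/{location}/endpoints/openapi"
    )


base_url = vertex_openai_base_url("my-project", "us-central1")
print(base_url)

# With the OpenAI SDK (picked up by Weave's OpenAI integration), usage would
# look roughly like this -- GCLOUD_ACCESS_TOKEN is a placeholder for the value
# returned by `gcloud auth print-access-token`:
#
#   import weave
#   from openai import OpenAI
#
#   weave.init(project_name="vertex-openai-test")
#   client = OpenAI(base_url=base_url, api_key=GCLOUD_ACCESS_TOKEN)
#   response = client.chat.completions.create(
#       model="google/gemini-1.5-flash-002",
#       messages=[{"role": "user", "content": "Hello!"}],
#   )
```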
125 changes: 125 additions & 0 deletions docs/docs/guides/tools/comparison.md
@@ -0,0 +1,125 @@
# Comparison

The Weave Comparison feature allows you to visually compare and diff code, traces, prompts, models, and model configurations. You can compare two objects side-by-side or analyze a larger set of objects to identify differences, patterns, and trends.

This guide covers the steps to start a comparison and the available actions to tailor your comparison view, including baseline comparisons, numeric diff formatting, and more.

## Access the Comparison view

1. In the sidebar, select the type of object you'd like to compare (e.g. **Traces**, **Models**, etc.).
2. Select the objects that you want to compare. The selection method varies depending on the type of object you are comparing:
- For **Traces**, select traces to compare by checking the checkboxes in the appropriate rows in the Traces column.
- For objects such as **Models**, navigate to the model Versions page and check the checkboxes next to the versions that you want to compare.
3. Click **Compare** to open the Comparison view. Now, you can refine your view using the [available actions](#available-actions).

## Available actions

In the Comparison view, you have multiple actions available, depending on how many objects are being compared. Make sure to look at the [usage notes](#usage-notes).

- [Change the diff display](#change-the-diff-display)
- [Display side-by-side](#display-side-by-side)
- [Display in a unified view](#display-in-a-unified-view)
- [Set a baseline](#set-a-baseline)
- [Remove a baseline](#remove-a-baseline)
- [Change the comparison order](#change-the-comparison-order)
- [Change numeric diff display format](#change-numeric-diff-display-format)
- [Compare with baseline or previous](#compare-with-baseline-or-previous)
- [Compare a pair from a multi-object comparison](#compare-a-pair-from-a-multi-object-comparison)
- [Remove an object from comparison](#remove-an-object-from-comparison)

### Change the diff display

By default, **Diff only** is set to off. To filter the table rows so that only changed rows are displayed, toggle **Diff only** on.

### Display side-by-side

> This option is only available when comparing two objects, or a [pair from a multi-object comparison](#compare-a-pair-from-a-multi-object-comparison).

To compare each object side-by-side in separate columns, select **Side-by-side**.

![Side-by-side Comparison view of two objects](imgs/comparison-2objs-sidebyside.png)

### Display in a unified view

> This option is only available when comparing two objects, or a [pair from a multi-object comparison](#compare-a-pair-from-a-multi-object-comparison).

To compare each object in a unified view, select **Unified**.

![Unified Comparison view of two objects](imgs/comparison-2objs-unified.png)

### Set a baseline

By default, each object in the Comparison view is compared to the object to its left. Alternatively, you can set one object as the _baseline_, which means that every other object is compared to it. The baseline object is moved to the leftmost position in the view.
To set an object as baseline, do the following:

1. In the Comparison view topbar, mouse over the object that you want to set as the baseline.
2. Click the three dots to the right of the ID.
![Make baseline option displayed.](imgs/comparison-2objs-baseline.png)
3. In the dropdown, select **Make baseline**. The UI refreshes so that the baseline object is furthest left in the topbar, and `Baseline` displays next to the ID.
![Baseline set.](imgs/comparison-2objs-baseline-set.png)

### Remove a baseline

To remove an object as baseline, do the following:

1. In the Comparison view topbar, mouse over the baseline object.
2. Click the three dots to the right of the ID.
3. In the dropdown, select **Remove baseline**. Now, `Baseline` no longer displays next to the call ID.

### Change the comparison order

To change the comparison order, do the following:

1. In the Comparison view topbar, mouse over the ID that you want to reorder.
2. Click the six dots to the left of the ID.
![Setting the order.](imgs/comparison-2objs-reorder.png)
3. Drag the ID to the left or the right, depending on which object was selected.
4. Place the ID in the desired ordering. The UI refreshes with an updated comparison ordering.

### Change numeric diff display format

For numeric values such as `completion_tokens` and `total_tokens`, you can view the diff as either an integer or a percentage. Additionally, positive numeric values can be viewed as a multiplier. To change a numeric diff's display format, do the following:

1. In the Comparison table, find the numeric value that you want to update the diff display format for.
![A numeric value displayed as an integer.](imgs/comparison-2objs-numericdiffformat.png)
2. Click the diff value. The format automatically updates to either an integer or a percentage.
![A numeric value updated to a percentage.](imgs/comparison-2objs-numericdiffformat-updated.png)
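The display formats are different presentations of the same underlying difference. The arithmetic below is illustrative (hypothetical token counts, not the UI's implementation):

```python
# Hypothetical total_tokens values for two compared calls.
baseline, value = 1200, 1500

diff_int = value - baseline                      # integer diff: +300
diff_pct = 100 * (value - baseline) / baseline   # percentage diff: +25.0%
multiplier = value / baseline                    # multiplier (positive values only): 1.25x

print(diff_int, diff_pct, multiplier)
```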

### Compare with baseline or previous

> This option is only available when comparing 3 or more objects.
> You can also [set](#set-a-baseline) or [remove](#remove-a-baseline) an existing baseline by clicking the three dots to the right of the ID.

To perform a baseline comparison with 3 or more objects, do the following:

1. In the right hand corner of the Comparison view, click the dropdown. Depending on your current view configuration, the dropdown is either titled **Compare with previous** or **Compare with baseline**.
2. Depending on your current view configuration, select either **Compare with previous** or **Compare with baseline**.
- **Compare with baseline**: Sets the leftmost object as the baseline. The table updates so that the leftmost column is the baseline.
- **Compare with previous**: No object is set as baseline.

### Compare a pair from a multi-object comparison

> This option is only available when comparing 3 or more objects.

When comparing 3 or more objects, you can compare a single object to a previous object or baseline. This changes the Comparison table view so that the view is identical to a two-object comparison. To compare a pair of objects from a multi-object comparison, do the following:

1. In the Comparison view topbar, find the ID that you want to compare to previous or baseline.
2. To select the item, click the ID. The UI refreshes with a two-way comparison table.
![Comparing a pair from a multi-object comparison.](imgs/comparsion-7objs-diffonly-subset.png)

To reset the view so that the first 6 objects selected for comparison are displayed in the table, click the ID again.

### Remove an object from comparison

> This option is only available when comparing 3 or more objects.

To remove an object from comparison, do the following:

1. In the Comparison view topbar, find the object that you want to remove from comparison.
2. Click the three dots to the right of the ID.
3. In the dropdown, select **Remove object from comparison**. The UI refreshes with an updated table that no longer includes the removed object.

## Usage notes

- The Comparison feature is only available in the UI.
- You can compare as many objects as you'd like. However, the UI displays a maximum of 6 at a time. To view an object that is not visible when comparing more than 6 objects, either [change the comparison order](#change-the-comparison-order) so that the object is one of the first 6 objects from left to right, or [compare a pair from the multi-object comparison](#compare-a-pair-from-a-multi-object-comparison) for easy viewing.
193 changes: 167 additions & 26 deletions docs/docs/guides/tools/playground.md
@@ -1,48 +1,189 @@
# Playground

Evaluating LLM prompts and responses is challenging. The Playground tool enables you to quickly iterate on prompts: editing, retrying, and deleting messages. The LLM Playground is currently in preview.
> **The LLM Playground is currently in preview.**

Evaluating LLM prompts and responses is challenging. The Weave Playground simplifies iterating on LLM prompts and responses, making it easier to experiment with different models and prompts. With features like prompt editing, message retrying, and model comparison, Playground helps you quickly test and improve your LLM applications. Playground currently supports models from OpenAI, Anthropic, Gemini, Groq, and Amazon Bedrock.

## Features

- **Quick access:** Open the Playground from the W&B sidebar for a fresh session or from the Call page to test an existing project.
- **Message controls:** Edit, retry, or delete messages directly within the chat.
- **Flexible messaging:** Add new messages as either user or system inputs, and send them to the LLM.
- **Customizable settings:** Configure your preferred LLM provider and adjust model settings.
- **Multi-LLM support:** Switch between models, with team-level API key management.
- **Compare models:** Compare how different models respond to prompts.

Get started with the Playground to optimize your LLM interactions, streamline your prompt engineering process, and speed up LLM application development.

- [Prerequisites](#prerequisites)
- [Add provider credentials and information](#add-provider-credentials-and-information)
- [Access the Playground](#access-the-playground)
- [Select an LLM](#select-an-llm)
- [Adjust LLM parameters](#adjust-llm-parameters)
- [Add a function](#add-a-function)
- [Retry, edit, and delete messages](#retry-edit-and-delete-messages)
- [Add a new message](#add-a-new-message)
- [Compare LLMs](#compare-llms)

## Prerequisites

Before you can use Playground, you must [add provider credentials](#add-provider-credentials-and-information) and [open the Playground UI](#access-the-playground).

### Add provider credentials and information

Playground currently supports OpenAI, Anthropic, Gemini, Groq, and Amazon Bedrock models.
To use one of the available models, add the appropriate information to your team secrets in W&B settings.

- OpenAI: `OPENAI_API_KEY`
- Anthropic: `ANTHROPIC_API_KEY`
- Gemini: `GOOGLE_API_KEY`
- Groq: `GEMMA_API_KEY`
- Amazon Bedrock:
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_REGION_NAME`

### Access the Playground

There are two ways to access the Playground:

1. From the sidebar, click **Playground**. This will open a fresh Playground page with a simple system prompt.
2. From the Call page, click the **Open chat in playground** button from the call page's chat view.
1. *Open a fresh Playground page with a simple system prompt*: In the sidebar, select **Playground**. Playground opens in the same tab.
2. *Open Playground for a specific call*:
1. In the sidebar, select the **Traces** tab. A list of traces displays.
2. In the list of traces, click the name of the call that you want to view. The call's details page opens.
3. Click **Open chat in playground**. Playground opens in a new tab.

![Screenshot of Open in Playground button](imgs/open_chat_in_playground.png)

## Retry, edit, and delete messages
## Select an LLM

You can switch the LLM using the dropdown menu in the top left. Currently, the available models are:

- gpt-4o-mini
- claude-3-5-sonnet-20240620
- claude-3-5-sonnet-20241022
- claude-3-haiku-20240307
- claude-3-opus-20240229
- claude-3-sonnet-20240229
- gemini/gemini-1.5-flash-001
- gemini/gemini-1.5-flash-002
- gemini/gemini-1.5-flash-8b-exp-0827
- gemini/gemini-1.5-flash-8b-exp-0924
- gemini/gemini-1.5-flash-exp-0827
- gemini/gemini-1.5-flash-latest
- gemini/gemini-1.5-flash
- gemini/gemini-1.5-pro-001
- gemini/gemini-1.5-pro-002
- gemini/gemini-1.5-pro-exp-0801
- gemini/gemini-1.5-pro-exp-0827
- gemini/gemini-1.5-pro-latest
- gemini/gemini-1.5-pro
- gemini/gemini-pro
- gpt-3.5-turbo-0125
- gpt-3.5-turbo-1106
- gpt-3.5-turbo-16k
- gpt-3.5-turbo
- gpt-4-0125-preview
- gpt-4-0314
- gpt-4-0613
- gpt-4-1106-preview
- gpt-4-32k-0314
- gpt-4-turbo-2024-04-09
- gpt-4-turbo-preview
- gpt-4-turbo
- gpt-4
- gpt-4o-2024-05-13
- gpt-4o-2024-08-06
- gpt-4o-mini-2024-07-18
- gpt-4o
- groq/gemma-7b-it
- groq/gemma2-9b-it
- groq/llama-3.1-70b-versatile
- groq/llama-3.1-8b-instant
- groq/llama3-70b-8192
- groq/llama3-8b-8192
- groq/llama3-groq-70b-8192-tool-use-preview
- groq/llama3-groq-8b-8192-tool-use-preview
- groq/mixtral-8x7b-32768
- o1-mini-2024-09-12
- o1-mini
- o1-preview-2024-09-12
- o1-preview
- ai21.j2-mid-v1
- ai21.j2-ultra-v1
- amazon.titan-text-lite-v1
- amazon.titan-text-express-v1
- mistral.mistral-7b-instruct-v0:2
- mistral.mixtral-8x7b-instruct-v0:1
- mistral.mistral-large-2402-v1:0
- mistral.mistral-large-2407-v1:0
- anthropic.claude-3-sonnet-20240229-v1:0
- anthropic.claude-3-5-sonnet-20240620-v1:0
- anthropic.claude-3-haiku-20240307-v1:0
- anthropic.claude-3-opus-20240229-v1:0
- anthropic.claude-v2
- anthropic.claude-v2:1
- anthropic.claude-instant-v1
- cohere.command-text-v14
- cohere.command-light-text-v14
- cohere.command-r-plus-v1:0
- cohere.command-r-v1:0
- meta.llama2-13b-chat-v1
- meta.llama2-70b-chat-v1
- meta.llama3-8b-instruct-v1:0
- meta.llama3-70b-instruct-v1:0
- meta.llama3-1-8b-instruct-v1:0
- meta.llama3-1-70b-instruct-v1:0
- meta.llama3-1-405b-instruct-v1:0

## Adjust LLM parameters

You can experiment with different parameter values for your selected model. To adjust parameters, do the following:

1. In the upper right corner of the Playground UI, click **Chat settings** to open the parameter settings dropdown.
2. In the dropdown, adjust parameters as desired. You can also toggle Weave call tracking on or off, and [add a function](#add-a-function).
3. Click **Chat settings** to close the dropdown and save your changes.

Once in the Playground, you can see the chat history.
When hovering over a message, you will see three buttons: **Edit**, **Retry**, and **Delete**.
![Screenshot of Playground settings](imgs/playground_settings.png)

![Screenshot of Playground message buttons](imgs/playground_message_buttons.png)
## Add a function

1. **Retry**: Deletes all subsequent messages and retries the chat from the selected message.
2. **Delete**: Removes the message from the chat.
3. **Edit**: Allows you to modify the message content.
You can test how different models use functions based on the input they receive from the user. To add a function for testing in Playground, do the following:

![Screenshot of Playground editing](imgs/playground_message_editor.png)
1. In the upper right corner of the Playground UI, click **Chat settings** to open the parameter settings dropdown.
2. In the dropdown, click **+ Add function**.
3. In the pop-up, add your function information.
4. To save your changes and close the function pop-up, click the **x** in the upper right corner.
5. Click **Chat settings** to close the settings dropdown and save your changes.

## Adding new messages
## Retry, edit, and delete messages

To add a new message to the chat without sending it to the LLM, select the role (e.g., **User**) and click **Add**.
To send a new message to the LLM, click the **Send** button or press **Command + Enter**.
With Playground, you can retry, edit, and delete messages. To use this feature, hover over the message you want to edit, retry, or delete. Three buttons display: **Delete**, **Edit**, and **Retry**.

![Screenshot of Playground sending a message](imgs/playground_chat_input.png)
- **Delete**: Remove the message from the chat.
- **Edit**: Modify the message content.
- **Retry**: Delete all subsequent messages and retry the chat from the selected message.

## Configuring the LLM
![Screenshot of Playground message buttons](imgs/playground_message_buttons.png)
![Screenshot of Playground editing](imgs/playground_message_editor.png)

We currently support 4 LLM providers.
To use each LLM, your team admin needs to add the relevant API key to your team's settings (found at **wandb.ai/[team-name]/settings**):
## Add a new message

- OpenAI: `OPENAI_API_KEY`
- Anthropic: `ANTHROPIC_API_KEY`
- Gemini: `GOOGLE_API_KEY`
- Groq: `GEMMA_API_KEY`
To add a new message to the chat, do the following:

### Choosing the LLM and its settings
1. In the chat box, select one of the available roles (**Assistant** or **User**).
2. Click **+ Add**.
3. To send a new message to the LLM, click the **Send** button. Alternatively, press the **Command** and **Enter** keys.

Click the **Settings** button to open the settings drawer.
![Screenshot of Playground sending a message](imgs/playground_chat_input.png)

![Screenshot of Playground settings](imgs/playground_settings.png)
## Compare LLMs

Playground allows you to compare LLMs. To perform a comparison, do the following:

You can also switch the LLM using the dropdown menu in the top left.
1. In the Playground UI, click **Compare**. A second chat opens next to the original chat.
2. In the second chat, you can:
- [Select the LLM to compare](#select-an-llm)
- [Adjust parameters](#adjust-llm-parameters)
- [Add functions](#add-a-function)
3. In the message box, enter a message that you want to test with both models and press **Send**.