chore(weave): Regen latest docs #3265
tssweeney authored Dec 17, 2024
1 parent f542138 commit 322d7fb
Showing 28 changed files with 2,411 additions and 688 deletions.
14 changes: 7 additions & 7 deletions docs/docs/reference/gen_notebooks/01-intro_notebook.md
@@ -83,7 +83,7 @@ weave.init(PROJECT)

client = OpenAI()
response = client.chat.completions.create(
model="gpt-3.5-turbo-1106",
model="gpt-4o-mini",
messages=[
{
"role": "system",
@@ -153,7 +153,7 @@ def correct_grammar(user_input):

stripped = strip_user_input(user_input)
response = client.chat.completions.create(
model="gpt-3.5-turbo-1106",
model="gpt-4o-mini",
messages=[
{
"role": "system",
@@ -198,7 +198,7 @@ def correct_grammar(user_input):

stripped = strip_user_input(user_input)
response = client.chat.completions.create(
model="gpt-3.5-turbo-1106",
model="gpt-4o-mini",
messages=[
{
"role": "system",
@@ -282,7 +282,7 @@ class OpenAIGrammarCorrector(weave.Model):


corrector = OpenAIGrammarCorrector(
openai_model_name="gpt-3.5-turbo-1106",
openai_model_name="gpt-4o-mini",
system_message="You are a grammar checker, correct the following user input.",
)

@@ -307,8 +307,8 @@ dataset = weave.Dataset(
},
{"user_input": " I write good ", "expected": "I write well"},
{
"user_input": " GPT-3 is smartest AI model. ",
"expected": "GPT-3 is the smartest AI model.",
"user_input": " GPT-4 is smartest AI model. ",
"expected": "GPT-4 is the smartest AI model.",
},
],
)
@@ -331,7 +331,7 @@ import weave
weave.init(PROJECT)

corrector = OpenAIGrammarCorrector(
openai_model_name="gpt-3.5-turbo-1106",
openai_model_name="gpt-4o-mini",
system_message="You are a grammar checker, correct the following user input.",
)

98 changes: 58 additions & 40 deletions docs/docs/reference/gen_notebooks/audio_with_weave.md
@@ -1,6 +1,6 @@
- ---
- title: Log Audio With Weave
- ---
+ ## title: Log Audio With Weave



:::tip[This is a notebook]
@@ -12,35 +12,36 @@ title: Log Audio With Weave
:::



##
<!--- @wandbcode{feedback-colab} -->


# How to use Weave with Audio Data: An OpenAI Example

This demo uses the OpenAI chat completions API with GPT 4o Audio Preview to generate audio responses to text prompts and track these in Weave.


<img src="https://i.imgur.com/OUfsZ2x.png"></img>

For the advanced use case, we leverage the OpenAI Realtime API to stream audio in realtime. Click the following thumbnail to view the video demonstration, or click [here](https://www.youtube.com/watch?v=lnnd73xDElw).

[![Everything Is AWESOME](https://img.youtube.com/vi/lnnd73xDElw/0.jpg)](https://www.youtube.com/watch?v=lnnd73xDElw "Everything Is AWESOME")



## Setup

Start by installing the OpenAI (`openai`) and Weave (`weave`) dependencies, as well as the API key management dependency `set-env`.



```python
%%capture
!pip install openai
!pip install weave
!pip install set-env-colab-kaggle-dotenv -q # for env var
```

Next, load the required API keys for OpenAI and Weave. Here, we use `set_env`, which is compatible with Google Colab's secret key manager and is an alternative to Colab's own `google.colab.userdata`. See [here](https://pypi.org/project/set-env-colab-kaggle-dotenv/) for usage instructions.
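The key-loading cell is collapsed in this diff view; a minimal sketch of the pattern it follows (assuming the `set_env` helper exposed by the package above):

```python
# Minimal sketch: pull each key from Colab secrets, a .env file, or the
# existing environment, and export it to os.environ for the SDKs to find.
from set_env import set_env

_ = set_env("OPENAI_API_KEY")
_ = set_env("WANDB_API_KEY")
```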



```python
@@ -54,6 +55,7 @@ _ = set_env("WANDB_API_KEY")
And finally import the required libraries.



```python
import base64
import os
@@ -69,9 +71,11 @@ import weave

## Audio Streaming and Storage Example


Now we will set up a call to OpenAI's completions endpoint with audio modality enabled. First, create the OpenAI client and initialize a Weave project.



```python
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
weave.init("openai-audio-chat")
@@ -80,15 +84,18 @@ weave.init("openai-audio-chat")
Now we will define our OpenAI completions request and add our Weave decorator (op).

Here, we define the function `prompt_endpont_and_log_trace`. This function has three primary steps:

1. We make a completion object using the `GPT 4o Audio Preview` model, which supports text and audio inputs and outputs.

   - We prompt the model to count to 13 slowly with varying accents.
   - We set the completion to "stream".

2. We open a new output file to which the streamed data is written chunk by chunk.

3. We return an open file handle to the audio file so Weave logs the audio data in the trace (a sketch of this flow follows).
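Since the full cell is collapsed in this diff view, here is a minimal sketch of that flow, reusing `client`, `weave`, and `SAMPLE_RATE` from the cells above. The streamed delta shape (`delta.audio["data"]`) and the parameter values are assumptions, not the notebook's exact code:

```python
import base64
import wave


@weave.op()
def prompt_endpont_and_log_trace(system_prompt: str, user_prompt: str):
    # 1. Request a streamed completion with text + audio modalities.
    completion = client.chat.completions.create(
        model="gpt-4o-audio-preview",
        modalities=["text", "audio"],
        audio={"voice": "alloy", "format": "pcm16"},
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        stream=True,
    )

    # 2. Write each streamed audio chunk to the output file as it arrives.
    with wave.open("output.wav", "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # pcm16 -> 2 bytes per sample
        wav_file.setframerate(SAMPLE_RATE)
        for chunk in completion:
            audio = getattr(chunk.choices[0].delta, "audio", None)
            if audio and audio.get("data"):  # assumed delta shape
                wav_file.writeframes(base64.b64decode(audio["data"]))

    # 3. Return an open handle so Weave logs the audio in the trace.
    return wave.open("output.wav", "rb")
```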



```python
SAMPLE_RATE = 22050

@@ -146,6 +153,7 @@ Run the following cell. The system and user prompt will be stored in a Weave trace
After running the cell, click the link next to the "🍩" emoji to view your trace.



```python
from IPython.display import Audio, display

@@ -160,22 +168,25 @@ display(Audio("output.wav", rate=SAMPLE_RATE, autoplay=True))
```

# Advanced Usage: Realtime Audio API with Weave

<img src="https://i.imgur.com/ZiW3IVu.png"/>
<details>
<summary> (Advanced) Realtime Audio API with Weave </summary>
OpenAI's Realtime API is a highly functional and reliable conversational API for building realtime audio and text assistants.

Please note:

- Review the cells in [Microphone Configuration](#microphone-configuration)
- Due to limitations of the Google Colab execution environment, **this must be run on your host machine** as a Jupyter Notebook. It cannot be run in the browser.
- On macOS you will need to install `portaudio` via Brew (see [here](https://formulae.brew.sh/formula/portaudio)) for PyAudio to function.
- OpenAI's Python SDK does not yet provide Realtime API support. We implement the complete OAI Realtime API schema in Pydantic for greater legibility, and may deprecate it once official support is released (see the connection sketch below).
- The `enable_audio_playback` toggle enables playback of the assistant's audio output. Please note that **headphones are required if this is enabled**, as echo detection would require a highly complex implementation.
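For orientation, the underlying connection is just an authenticated WebSocket session. A minimal sketch (the `websocket-client` dependency and the model name in the URL are assumptions, not the notebook's exact code):

```python
import json
import os

import websocket  # pip install websocket-client

ws = websocket.create_connection(
    "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
    header=[
        f"Authorization: Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta: realtime=v1",
    ],
)
print(json.loads(ws.recv())["type"])  # expect "session.created"
ws.close()
```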


## Requirements Setup



```python
%%capture
!pip install numpy==2.0
@@ -195,7 +206,7 @@ import os
import threading
import time
import wave
- from typing import Dict, List, Optional
+ from typing import Optional

import numpy as np
import pyaudio
@@ -219,6 +230,7 @@ _ = set_env("WANDB_API_KEY")
Run the following cell to find all available audio devices. Then populate `INPUT_DEVICE_INDEX` and `OUTPUT_DEVICE_INDEX` based on the devices listed. Your input device will have at least one input channel, and your output device at least one output channel.
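That cell is collapsed in this diff view; enumerating devices with PyAudio looks roughly like the following sketch (the index values shown are placeholders):

```python
import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(
        f"{i}: {info['name']} "
        f"(in: {info['maxInputChannels']}, out: {info['maxOutputChannels']})"
    )
p.terminate()

# Choose indices from the printout above (placeholder values).
INPUT_DEVICE_INDEX = 0
OUTPUT_DEVICE_INDEX = 1
```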



```python
# Get device list from pyaudio so we can configure the next cell
p = pyaudio.PyAudio()
@@ -259,9 +271,10 @@ The OpenAI Python SDK does not yet provide Realtime API support. We implement the complete OAI Realtime API schema in Pydantic for greater legibility.
<summary> Pydantic Schema for OpenAI Realtime API (OpenAI's SDK lacks Realtime API support) </summary>



```python
from enum import Enum
- from typing import Any, Dict, List, Literal, Optional, Union
+ from typing import Any, Literal, Optional, Union

from pydantic import BaseModel, Field, ValidationError

@@ -315,8 +328,8 @@ class ToolParameterProperty(BaseModel):

class ToolParameter(BaseModel):
type: str
- properties: Dict[str, ToolParameterProperty]
- required: List[str]
+ properties: dict[str, ToolParameterProperty]
+ required: list[str]


class Tool(BaseModel):
@@ -327,14 +340,14 @@ class Tool(BaseModel):


class Session(BaseModel):
- modalities: Optional[List[str]] = None
+ modalities: Optional[list[str]] = None
instructions: Optional[str] = None
voice: Optional[str] = None
input_audio_format: Optional[str] = None
output_audio_format: Optional[str] = None
input_audio_transcription: Optional[InputAudioTranscription] = None
turn_detection: Optional[TurnDetection] = None
- tools: Optional[List[Tool]] = None
+ tools: Optional[list[Tool]] = None
tool_choice: Optional[str] = None
temperature: Optional[float] = None
max_output_tokens: Optional[int] = None
@@ -393,7 +406,7 @@ class ConversationItem(BaseModel):
type: Literal["message", "function_call", "function_call_output"]
status: Optional[Literal["completed", "in_progress", "incomplete"]] = None
role: Literal["user", "assistant", "system"]
- content: List[
+ content: list[
Union[ConversationItemContent, FunctionCallContent, FunctionCallOutputContent]
]
call_id: Optional[str] = None
@@ -545,7 +558,7 @@ class ConversationItemInputAudioTranscriptionFailed(BaseEvent):
] = ServerEventTypes.CONVERSATION_ITEM_INPUT_AUDIO_TRANSCRIPTION_FAILED
item_id: str
content_index: int
- error: Dict[str, Any]
+ error: dict[str, Any]


class ConversationItemTruncated(BaseEvent):
@@ -569,8 +582,8 @@ class ResponseUsage(BaseModel):
total_tokens: int
input_tokens: int
output_tokens: int
- input_token_details: Optional[Dict[str, int]] = None
- output_token_details: Optional[Dict[str, int]] = None
+ input_token_details: Optional[dict[str, int]] = None
+ output_token_details: Optional[dict[str, int]] = None


class ResponseOutput(BaseModel):
@@ -579,7 +592,7 @@ class ResponseOutput(BaseModel):
type: str
status: str
role: str
- content: List[Dict[str, Any]]
+ content: list[dict[str, Any]]


class ResponseContentPart(BaseModel):
@@ -603,15 +616,15 @@ class ResponseOutputItem(BaseModel):
type: str
status: str
role: str
- content: List[ResponseOutputItemContent]
+ content: list[ResponseOutputItemContent]


class Response(BaseModel):
id: str
object: Literal["realtime.response"]
status: str
status_details: Optional[ResponseStatusDetails] = None
- output: List[ResponseOutput]
+ output: list[ResponseOutput]
usage: Optional[ResponseUsage]


@@ -788,7 +801,7 @@ class RateLimitsUpdated(BaseEvent):
type: Literal[ServerEventTypes.RATE_LIMITS_UPDATED] = (
ServerEventTypes.RATE_LIMITS_UPDATED
)
- rate_limits: List[RateLimit]
+ rate_limits: list[RateLimit]


ServerEvent = Union[
@@ -866,9 +879,11 @@ def parse_server_event(event_data: dict) -> ServerEvent:

</details>


## Audio Stream Writer (To Disk and In Memory)
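The full class is collapsed in this diff view; a minimal sketch of such a writer (the constructor arguments and sample width are assumptions):

```python
import wave

import numpy as np


class StreamingWavWriterSketch:
    """Sketch: append int16 numpy arrays or raw byte chunks to a WAV file."""

    def __init__(self, path: str, channels: int = 1, sample_rate: int = 24000):
        self.wav_file = wave.open(path, "wb")
        self.wav_file.setnchannels(channels)
        self.wav_file.setsampwidth(2)  # 16-bit PCM
        self.wav_file.setframerate(sample_rate)

    def add_chunk(self, chunk) -> None:
        # Accept either numpy int16 arrays or raw bytes.
        if isinstance(chunk, np.ndarray):
            chunk = chunk.astype(np.int16).tobytes()
        self.wav_file.writeframes(chunk)

    def close(self) -> None:
        self.wav_file.close()
```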



```python
class StreamingWavWriter:
"""Writes audio integer or byte array chunks to a WAV file."""
@@ -915,24 +930,25 @@ class StreamingWavWriter:

The realtime (RT) audio model uses a websocket to send events to OpenAI's Realtime audio API. This works as follows:

1. **init:** We initialize local buffers (input audio) and streams (assistant playback stream, user audio disk writer stream) and open a connection to the Realtime API.
2. **receive_messages_thread:** A thread handles receiving messages from the API. Four primary event types are handled (see the dispatch sketch after this list):

   - RESPONSE_AUDIO_TRANSCRIPT_DONE:

     The server indicates the assistant's response is completed and provides the transcript.

   - CONVERSATION_ITEM_INPUT_AUDIO_TRANSCRIPTION_COMPLETED:

     The server indicates the user's audio has been transcribed, and sends the transcript of the user's audio. We log the transcript to Weave and print it for the user.

   - RESPONSE_AUDIO_DELTA:

     The server sends a new chunk of assistant response audio. We append this to the ongoing response data via the response ID, and add this to the output stream for playback.

   - RESPONSE_DONE:

     The server indicates completion of an assistant response. We get all audio chunks associated with the response, as well as the transcript, and log these in Weave.

3. **send_audio:** A handler appends user audio chunks to a buffer, and sends chunks of audio when the audio buffer reaches a certain size.
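A minimal sketch of that dispatch, using `parse_server_event` and `ServerEventTypes` from the schema section (the `handle_event` callback is a stand-in for the model's real methods):

```python
import json


def receive_messages_sketch(ws, handle_event) -> None:
    # Parse each incoming server event and dispatch on its type.
    while True:
        event = parse_server_event(json.loads(ws.recv()))
        if event.type == ServerEventTypes.RESPONSE_AUDIO_TRANSCRIPT_DONE:
            handle_event("assistant_transcript", event)
        elif (
            event.type
            == ServerEventTypes.CONVERSATION_ITEM_INPUT_AUDIO_TRANSCRIPTION_COMPLETED
        ):
            handle_event("user_transcript", event)  # log to Weave + print
        elif event.type == ServerEventTypes.RESPONSE_AUDIO_DELTA:
            handle_event("audio_delta", event)  # buffer by response ID + playback
        elif event.type == ServerEventTypes.RESPONSE_DONE:
            handle_event("response_done", event)  # log audio + transcript to Weave
```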


```python
@@ -948,7 +964,7 @@ class RTAudioModel(weave.Model):
None # Stream for writing user output to file
)
input_audio_buffer: Optional[np.ndarray] = None # Buffer for user audio chunks
- assistant_outputs: Dict[str, StreamingWavWriter] = (
+ assistant_outputs: dict[str, StreamingWavWriter] = (
None # Assistant outputs aggregated to send to weave
)
playback_stream: Optional[pyaudio.Stream] = (
@@ -1134,6 +1150,7 @@ class RTAudioModel(weave.Model):
We use a pyaudio input stream with a handler linked to the `send_audio` method of the RTAudio model. The stream is returned to the main thread so it can be safely exited upon program completion.
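That cell is collapsed in this diff view; wiring a PyAudio callback to `send_audio` looks roughly like this sketch (the rate and frame size are assumptions):

```python
def record_audio_sketch(realtime_model: RTAudioModel) -> pyaudio.Stream:
    # Forward each captured chunk of microphone audio to the model.
    def callback(in_data, frame_count, time_info, status):
        realtime_model.send_audio(in_data)
        return (None, pyaudio.paContinue)

    p = pyaudio.PyAudio()
    return p.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=24000,  # assumed; must match the session's input audio format
        input=True,
        input_device_index=INPUT_DEVICE_INDEX,
        frames_per_buffer=1024,
        stream_callback=callback,
    )
```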



```python
# Audio capture stream
def record_audio(realtime_model: RTAudioModel) -> pyaudio.Stream:
@@ -1164,6 +1181,7 @@ def record_audio(realtime_model: RTAudioModel) -> pyaudio.Stream:
The main thread initializes a Realtime Audio Model with Weave integrated. Next, a recording is opened and we wait for a keyboard interrupt from the user.
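The collapsed cell follows the usual wait-for-interrupt pattern; a minimal sketch with assumed names:

```python
rt_model = RTAudioModel()        # assumed to open the Realtime connection
stream = record_audio(rt_model)  # input stream from the cell above

try:
    while stream.is_active():    # block until the user presses Ctrl+C
        time.sleep(0.1)
except KeyboardInterrupt:
    stream.stop_stream()
    stream.close()
```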



```python
weave.init(project_name="realtime-oai-audio-testing")

@@ -1194,5 +1212,5 @@ else:
)
```


</details>
