docs(integrations): Add cookbook images to intro, summarization, prompt optimization (#2553)

* chain_of_summary pics

* intro notebook images

* dspy images

---------

Co-authored-by: Scott Condron <[email protected]>
ash0ts and scottire authored Oct 3, 2024
1 parent ffc8b82 commit dba4bad
Showing 26 changed files with 260 additions and 27 deletions.
Binary file added docs/docs/media/dspy_optimization/1.png
Binary file added docs/docs/media/dspy_optimization/2.png
Binary file added docs/docs/media/dspy_optimization/3.png
Binary file added docs/docs/media/dspy_optimization/4.png
Binary file added docs/docs/media/dspy_optimization/5.png
Binary file added docs/docs/media/intro/1.png
Binary file added docs/docs/media/intro/10.png
Binary file added docs/docs/media/intro/2.png
Binary file added docs/docs/media/intro/3.png
Binary file added docs/docs/media/intro/4.png
Binary file added docs/docs/media/intro/5.png
Binary file added docs/docs/media/intro/6.png
Binary file added docs/docs/media/intro/7.png
Binary file added docs/docs/media/intro/8.png
Binary file added docs/docs/media/intro/9.png
Binary file added docs/docs/media/summarization/dataset.png
Binary file added docs/docs/media/summarization/eval_dash.png
Binary file added docs/docs/media/summarization/model.png
Binary file added docs/docs/media/summarization/summarization_trace.png
20 changes: 20 additions & 0 deletions docs/docs/reference/gen_notebooks/01-intro_notebook.md
@@ -72,6 +72,8 @@ weave.init('project-name') # initialize tracking for a specific W&B project

Add the `@weave.op` decorator to the functions you want to track.

![](../../media/intro/1.png)
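
As a minimal sketch (the project and function names here are placeholders, not from the notebook), decorating any Python function is enough for its calls to be traced:

```python
import weave

weave.init("intro-example")  # placeholder project name


@weave.op()
def extract_fruit(sentence: str) -> str:
    # any plain Python function works; Weave logs its inputs and outputs
    return sentence.strip().split()[0]


extract_fruit("apples are in season right now")
```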


```python
from openai import OpenAI
@@ -102,6 +104,8 @@ You can find your interactive dashboard by clicking any of the 👆 wandb links

Here, we're automatically tracking all calls to `openai`. We automatically track a lot of LLM libraries, but it's really easy to add support for whatever LLM you're using, as you'll see below.

![](../../media/intro/2.png)
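
If your LLM isn't auto-tracked, a rough pattern is to wrap the raw call in an op yourself (the client class below is a made-up stand-in):

```python
import weave


class MyCustomClient:
    """Stand-in for any LLM SDK that Weave doesn't auto-track."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


@weave.op()
def call_my_llm(client: MyCustomClient, prompt: str) -> str:
    # the op wrapper is what gets this call traced in Weave
    return client.generate(prompt)


call_my_llm(MyCustomClient(), "hello")
```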


```python
import weave
@@ -128,6 +132,8 @@ Now that you've seen the basics, let's combine all of the above and track some d



![](../../media/intro/3.png)


```python
from openai import OpenAI
@@ -169,6 +175,8 @@ print(result)

Whenever your code crashes, Weave will highlight what caused the issue. This is especially useful for finding things like JSON parsing issues that can occasionally happen when parsing data from LLM responses.

![](../../media/intro/4.png)
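
As a made-up illustration of that failure mode, an exception raised inside an op is captured on its trace:

```python
import json

import weave


@weave.op()
def parse_llm_json(raw: str) -> dict:
    # a json.JSONDecodeError raised here shows up highlighted in the trace UI
    return json.loads(raw)


# parse_llm_json('{"fruit": "apple"')  # malformed LLM output -> traced failure
```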


```python
import json
@@ -221,6 +229,8 @@ Organizing experimentation is difficult when there are many moving pieces. You c

It is often useful to track and version data, just like you track and version code. For example, here we define a `SystemPrompt(weave.Object)` object that can be shared between teammates

![](../../media/intro/5.png)
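
A minimal sketch, assuming `SystemPrompt` needs only a single string field:

```python
import weave


class SystemPrompt(weave.Object):
    prompt: str


system_prompt = SystemPrompt(
    prompt="You are a grammar checker; correct the user's text."
)
weave.publish(system_prompt)  # versions the object so teammates can fetch it
```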


```python
import weave
@@ -242,6 +252,8 @@ weave.publish(system_prompt)

Models are such a common object type that we have a special class to represent them: `weave.Model`. The only requirement is that we define a `predict` method.

![](../../media/intro/6.png)
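
In outline (the field name, prompt, and model choice are illustrative), a model declares its configuration as attributes and implements `predict` as an op:

```python
import weave
from openai import OpenAI


class GrammarCorrector(weave.Model):
    system_message: str

    @weave.op()
    def predict(self, user_input: str) -> str:
        client = OpenAI()
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[
                {"role": "system", "content": self.system_message},
                {"role": "user", "content": user_input},
            ],
        )
        return response.choices[0].message.content
```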


```python
from openai import OpenAI
@@ -283,6 +295,8 @@ print(result)

Similar to models, a `weave.Dataset` object exists to help track, organize, and operate on datasets

![](../../media/intro/7.png)
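
For example (the rows are made up), a dataset is a named collection of dict rows:

```python
import weave

dataset = weave.Dataset(
    name="grammar-correction",  # hypothetical name
    rows=[
        {"user_input": "That was so easy, it was a piece of pie!"},
        {"user_input": "I write good"},
    ],
)
weave.publish(dataset)  # versions the rows alongside your code and models
```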


```python
dataset = weave.Dataset(
@@ -309,6 +323,8 @@ Notice that we saved a versioned `GrammarCorrector` object that captures the con

You can publish objects and then retrieve them in your code. You can even call functions from your retrieved objects!

![](../../media/intro/8.png)
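
Roughly, the round trip looks like this (`corrector` is the model from above, and the URI shape in the comment is abbreviated):

```python
import weave

ref = weave.publish(corrector)
print(ref.uri())  # e.g. weave:///<entity>/<project>/object/GrammarCorrector:<digest>

# later, possibly from a different process:
fetched = weave.ref(ref.uri()).get()
print(fetched.predict("I write good"))
```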


```python
import weave
@@ -324,6 +340,8 @@ ref = weave.publish(corrector)
print(ref.uri())
```

![](../../media/intro/9.png)


```python
import weave
@@ -346,6 +364,8 @@ Evaluation-driven development helps you reliably iterate on an application. The

See a preview of the API below:

![](../../media/intro/10.png)
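
A compact sketch (the scorer, row, and argument names are illustrative; check the current `weave.Evaluation` docs for exact scorer signatures):

```python
import weave


@weave.op()
def exact_match(expected: str, model_output: str) -> dict:
    # scorer parameters are matched by name to dataset columns and the model output
    return {"match": expected == model_output}


evaluation = weave.Evaluation(
    dataset=[{"user_input": "I write good", "expected": "I write well."}],
    scorers=[exact_match],
)
# await evaluation.evaluate(corrector)  # in notebooks; use asyncio.run(...) in scripts
```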


```python
import weave
12 changes: 11 additions & 1 deletion docs/docs/reference/gen_notebooks/chain_of_density.md
@@ -20,6 +20,8 @@ title: Chain of Density Summarization

Summarizing complex technical documents while preserving crucial details is a challenging task. The Chain of Density (CoD) summarization technique offers a solution by iteratively refining summaries to be more concise and information-dense. This guide demonstrates how to implement CoD using Weave for tracking and evaluating the application.

![](../../media/summarization/eval_dash.png)

## What is Chain of Density Summarization?

[![arXiv](https://img.shields.io/badge/arXiv-2309.04269-b31b1b.svg)](https://arxiv.org/abs/2309.04269)
@@ -139,6 +141,8 @@ def load_pdf(pdf_url: str) -> str:

Now, let's implement the core CoD summarization logic using Weave operations:

![](../../media/summarization/summarization_trace.png)
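
Conceptually, the core loop looks like this (a sketch with a stand-in for the real LLM call, not the notebook's exact code):

```python
import weave


@weave.op()
def densify(text: str, prior_summary: str) -> str:
    # stand-in for an LLM call that rewrites `prior_summary` to pack in
    # more entities from `text` while keeping roughly the same length
    return (prior_summary + " " + text[:80]).strip()


@weave.op()
def chain_of_density(text: str, iterations: int = 3) -> str:
    summary = ""
    for _ in range(iterations):
        summary = densify(text, summary)  # each pass is traced as a child call
    return summary
```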


```python
# Chain of Density Summarization
@@ -231,6 +235,8 @@ By using `@weave.op()` decorators, we ensure that Weave tracks the inputs, outpu

Now, let's wrap our summarization pipeline in a Weave Model:

![](../../media/summarization/model.png)


```python
# Weave Model
@@ -240,7 +246,7 @@ class ArxivChainOfDensityPipeline(weave.Model):

@weave.op()
def predict(self, paper: ArxivPaper, instruction: str) -> dict:
text = load_pdf(paper["pdf_url"])
text = load_pdf(paper.pdf_url)
result = chain_of_density_summarization(
text,
instruction,
@@ -320,6 +326,8 @@ These evaluation functions use the Claude model to assess the quality of the gen

To evaluate our pipeline, we'll create a Weave Dataset and run an evaluation:

![](../../media/summarization/dataset.png)
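
Sketched with a made-up row (the real notebook builds rows from `ArxivPaper` objects):

```python
import weave

eval_dataset = weave.Dataset(
    name="arxiv-papers",  # hypothetical name
    rows=[
        {
            "paper": {"pdf_url": "https://arxiv.org/pdf/2309.04269"},  # example
            "instruction": "Summarize the key contributions of this paper.",
        }
    ],
)
weave.publish(eval_dataset)
```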


```python
# Create a Weave Dataset
@@ -340,6 +348,8 @@ For our evaluation, we'll use an LLM-as-a-judge approach. This technique involve

[![arXiv](https://img.shields.io/badge/arXiv-2306.05685-b31b1b.svg)](https://arxiv.org/abs/2306.05685)

![](../../media/summarization/eval_dash.png)
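
In rough form (the prompt, output key, and judge model are simplified stand-ins for the notebook's scorer), an LLM judge scores a summary and returns a dict that Weave can aggregate:

```python
import json

import weave
from anthropic import Anthropic


@weave.op()
def quality_scorer(instruction: str, model_output: dict) -> dict:
    client = Anthropic()
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative judge model
        max_tokens=200,
        messages=[
            {
                "role": "user",
                "content": (
                    "Rate this summary from 1-5 for relevance to the instruction "
                    f"{instruction!r}. Reply with JSON like {{\"relevance\": 3}}.\n\n"
                    f"{model_output['final_summary']}"  # assumed output key
                ),
            }
        ],
    )
    return json.loads(message.content[0].text)
```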


```python
# Define the scorer function
10 changes: 10 additions & 0 deletions docs/docs/reference/gen_notebooks/dspy_prompt_optimization.md
@@ -125,6 +125,8 @@ def get_dataset(metadata: Metadata):
dspy_train_examples, dspy_val_examples = get_dataset(metadata)
```

![](../../media/dspy_optimization/1.png)

## The DSPy Program

[DSPy](https://dspy-docs.vercel.app) is a framework that pushes building new LM pipelines away from manipulating free-form strings and closer to programming (composing modular operators to build text transformation graphs) where a compiler automatically generates optimized LM invocation strategies and prompts from a program.
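
As a small taste of that style (the signature below is illustrative, not the notebook's exact program):

```python
import dspy


class CausalQA(dspy.Signature):
    """Answer a causal reasoning question."""

    question = dspy.InputField()
    answer = dspy.OutputField()


# a module pairs the signature with a prompting strategy
baseline = dspy.ChainOfThought(CausalQA)
# after dspy.settings.configure(lm=...), call: baseline(question="...")
```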
@@ -189,6 +191,8 @@ prediction = baseline_module(dspy_train_examples[0]["question"])
rich.print(prediction)
```

![](../../media/dspy_optimization/2.png)

## Evaluating our DSPy Program

Now that we have a baseline prompting strategy, let's evaluate it on our validation set using [`weave.Evaluation`](../../guides/core-types/evaluations.md) on a simple metric that matches the predicted answer with the ground truth. Weave will take each example, pass it through your application and score the output on multiple custom scoring functions. By doing this, you'll have a view of the performance of your application, and a rich UI to drill into individual outputs and scores.
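
Such a metric can be as small as this sketch (the parameter names assume the dataset exposes an `answer` column):

```python
import weave


@weave.op()
def answer_match(answer: str, model_output: dict) -> dict:
    # exact match between the ground truth and the program's predicted answer
    predicted = str(model_output.get("answer", "")).strip().lower()
    return {"match": predicted == answer.strip().lower()}
```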
@@ -219,6 +223,8 @@ evaluation = weave.Evaluation(
await evaluation.evaluate(baseline_module.forward)
```

![](../../media/dspy_optimization/3.png)

:::note
If you're running from a Python script, you can use the following code to run the evaluation:

@@ -258,6 +264,8 @@ def get_optimized_program(model: dspy.Module, metadata: Metadata) -> dspy.Module
optimized_module = get_optimized_program(baseline_module, metadata)
```
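
For orientation, optimization with a DSPy teleprompter can look roughly like this (a sketch assuming `BootstrapFewShot` and a metric like the one above; the notebook's `get_optimized_program` may differ):

```python
from dspy.teleprompt import BootstrapFewShot


def optimize(module, trainset, metric):
    # compile() bootstraps few-shot demonstrations that maximize `metric`
    teleprompter = BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
    return teleprompter.compile(module, trainset=trainset)
```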

![](../../media/dspy_optimization/4.png)

:::warning
Running the evaluation on the causal reasoning dataset will cost approximately $0.04 in OpenAI credits.
:::
Expand All @@ -275,6 +283,8 @@ evaluation = weave.Evaluation(
await evaluation.evaluate(optimized_module.forward)
```

![](../../media/dspy_optimization/5.png)

Comparing the evaluation of the baseline program with the optimized one shows that the optimized program answers the causal reasoning questions with significantly more accuracy.

## Conclusion
93 changes: 68 additions & 25 deletions docs/docs/reference/gen_notebooks/online_monitoring.md
@@ -62,19 +62,25 @@ MODEL_NAMES = [
("gpt-4o-mini", 0.03, 0.06),
("gpt-4-turbo", 0.03, 0.06),
("claude-3-haiku-20240307", 0.01, 0.03),
("gpt-4o", 0.03, 0.06)
("gpt-4o", 0.03, 0.06),
]


def init_weave_client(project_name):
try:
client = weave.init(project_name)
for model, prompt_cost, completion_cost in MODEL_NAMES:
client.add_cost(llm_id=model, prompt_token_cost=prompt_cost, completion_token_cost=completion_cost)
client.add_cost(
llm_id=model,
prompt_token_cost=prompt_cost,
completion_token_cost=completion_cost,
)
return client
except Exception as e:
print(f"Failed to initialize Weave client for project '{project_name}': {e}")
return None



client = init_weave_client(PROJECT_NAME)
```

@@ -96,9 +102,11 @@ The first option to access data from Weave is to retrieve a list of filtered cal

```python
import itertools
import pandas as pd
from datetime import datetime, timedelta

import pandas as pd


def fetch_calls(client, project_id, start_time, trace_roots_only, limit):
filter_params = {
"project_id": project_id,
Expand All @@ -110,13 +118,16 @@ def fetch_calls(client, project_id, start_time, trace_roots_only, limit):
}
try:
calls_stream = client.server.calls_query_stream(filter_params)
calls = list(itertools.islice(calls_stream, limit)) # limit the number of calls to fetch if too many
calls = list(
itertools.islice(calls_stream, limit)
) # limit the number of calls to fetch if too many
print(f"Fetched {len(calls)} calls.")
return calls
except Exception as e:
print(f"Error fetching calls: {e}")
return []



calls = fetch_calls(client, PROJECT_NAME, datetime.now() - timedelta(days=1), True, 100)
```

@@ -131,28 +142,40 @@ Processing the calls is very easy with the return from Weave - we'll extract the

```python
import json
import pandas as pd
from datetime import datetime

import pandas as pd


def process_calls(calls):
records = []
for call in calls:
feedback = call.summary.get("weave", {}).get("feedback", [])
thumbs_up = sum(1 for item in feedback if isinstance(item, dict) and item.get("payload", {}).get("emoji") == "👍")
thumbs_down = sum(1 for item in feedback if isinstance(item, dict) and item.get("payload", {}).get("emoji") == "👎")
thumbs_up = sum(
1
for item in feedback
if isinstance(item, dict) and item.get("payload", {}).get("emoji") == "👍"
)
thumbs_down = sum(
1
for item in feedback
if isinstance(item, dict) and item.get("payload", {}).get("emoji") == "👎"
)
latency = call.summary.get("weave", {}).get("latency_ms", 0)

records.append({
"Call ID": call.id,
"Trace ID": call.trace_id, # this is a unique ID for the trace that can be used to retrieve it
"Display Name": call.display_name, # this is an optional name you can set in the UI or programatically
"Latency (ms)": latency,
"Thumbs Up": thumbs_up,
"Thumbs Down": thumbs_down,
"Started At": pd.to_datetime(getattr(call, 'started_at', datetime.min)),
"Inputs": json.dumps(call.inputs, default=str),
"Outputs": json.dumps(call.output, default=str)
})

records.append(
{
"Call ID": call.id,
"Trace ID": call.trace_id, # this is a unique ID for the trace that can be used to retrieve it
"Display Name": call.display_name, # this is an optional name you can set in the UI or programatically
"Latency (ms)": latency,
"Thumbs Up": thumbs_up,
"Thumbs Down": thumbs_down,
"Started At": pd.to_datetime(getattr(call, "started_at", datetime.min)),
"Inputs": json.dumps(call.inputs, default=str),
"Outputs": json.dumps(call.output, default=str),
}
)
return pd.DataFrame(records)
```

@@ -171,7 +194,9 @@ For example, for the cost, we'll use the `query_costs` API to fetch the costs of
# Use cost API to get costs
costs = client.query_costs()
df_costs = pd.DataFrame([cost.dict() for cost in costs])
df_costs['total_cost'] = df_costs['prompt_token_cost'] + df_costs['completion_token_cost']
df_costs["total_cost"] = (
df_costs["prompt_token_cost"] + df_costs["completion_token_cost"]
)

# only show the first row for every unique llm_id
df_costs
@@ -185,17 +210,35 @@ Next, we can generate the visualizations using plotly. This is the most basic da
import plotly.express as px
import plotly.graph_objects as go


def plot_feedback_pie_chart(thumbs_up, thumbs_down):
fig = go.Figure(data=[go.Pie(labels=['Thumbs Up', 'Thumbs Down'], values=[thumbs_up, thumbs_down], marker=dict(colors=['#66b3ff', '#ff9999']), hole=.3)])
fig.update_traces(textinfo='percent+label', hoverinfo='label+percent')
fig = go.Figure(
data=[
go.Pie(
labels=["Thumbs Up", "Thumbs Down"],
values=[thumbs_up, thumbs_down],
marker=dict(colors=["#66b3ff", "#ff9999"]),
hole=0.3,
)
]
)
fig.update_traces(textinfo="percent+label", hoverinfo="label+percent")
fig.update_layout(showlegend=False, title="Feedback Summary")
return fig


def plot_model_cost_distribution(df):
fig = px.bar(df, x="llm_id", y="total_cost", color="llm_id", title="Cost Distribution by Model")
fig = px.bar(
df,
x="llm_id",
y="total_cost",
color="llm_id",
title="Cost Distribution by Model",
)
fig.update_layout(xaxis_title="Model", yaxis_title="Cost (USD)")
return fig


# See the source code for all the plots
```

