docs(integrations): Add cookbook images to intro, summarization, prompt optimization (#2553)

* chain_of_summary pics

* intro notebook images

* dspy images

---------

Co-authored-by: Scott Condron <[email protected]>
ash0ts and scottire authored Oct 3, 2024
1 parent ffc8b82 commit dba4bad
Showing 26 changed files with 260 additions and 27 deletions.
Binary file added docs/docs/media/dspy_optimization/1.png
Binary file added docs/docs/media/dspy_optimization/2.png
Binary file added docs/docs/media/dspy_optimization/3.png
Binary file added docs/docs/media/dspy_optimization/4.png
Binary file added docs/docs/media/dspy_optimization/5.png
Binary file added docs/docs/media/intro/1.png
Binary file added docs/docs/media/intro/10.png
Binary file added docs/docs/media/intro/2.png
Binary file added docs/docs/media/intro/3.png
Binary file added docs/docs/media/intro/4.png
Binary file added docs/docs/media/intro/5.png
Binary file added docs/docs/media/intro/6.png
Binary file added docs/docs/media/intro/7.png
Binary file added docs/docs/media/intro/8.png
Binary file added docs/docs/media/intro/9.png
Binary file added docs/docs/media/summarization/dataset.png
Binary file added docs/docs/media/summarization/eval_dash.png
Binary file added docs/docs/media/summarization/model.png
Binary file added docs/docs/media/summarization/summarization_trace.png
20 changes: 20 additions & 0 deletions docs/docs/reference/gen_notebooks/01-intro_notebook.md
@@ -72,6 +72,8 @@ weave.init('project-name') # initialize tracking for a specific W&B project

Add the `@weave.op` decorator to the functions you want to track.

![](../../media/intro/1.png)
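
As a minimal sketch (the project and function names here are placeholders, not from the notebook), decorating any Python function is enough for its calls to be traced:

```python
import weave

weave.init("intro-example")  # placeholder project name


@weave.op()
def extract_fruit(sentence: str) -> str:
    # any plain Python function works; Weave logs its inputs and outputs
    return sentence.strip().split()[0]


extract_fruit("apples are in season right now")
```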


```python
from openai import OpenAI
@@ -102,6 +104,8 @@ You can find your interactive dashboard by clicking any of the 👆 wandb links

Here, we're automatically tracking all calls to `openai`. We automatically track a lot of LLM libraries, but it's really easy to add support for whatever LLM you're using, as you'll see below.

![](../../media/intro/2.png)
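
If your LLM isn't auto-tracked, a rough pattern is to wrap the raw call in an op yourself (the client class below is a made-up stand-in):

```python
import weave


class MyCustomClient:
    """Stand-in for any LLM SDK that Weave doesn't auto-track."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


@weave.op()
def call_my_llm(client: MyCustomClient, prompt: str) -> str:
    # the op wrapper is what gets this call traced in Weave
    return client.generate(prompt)


call_my_llm(MyCustomClient(), "hello")
```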


```python
import weave
@@ -128,6 +132,8 @@ Now that you've seen the basics, let's combine all of the above and track some d



![](../../media/intro/3.png)


```python
from openai import OpenAI
@@ -169,6 +175,8 @@ print(result)

Whenever your code crashes, Weave will highlight what caused the issue. This is especially useful for finding things like JSON parsing issues that can occasionally happen when parsing data from LLM responses.

![](../../media/intro/4.png)
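
As a made-up illustration of that failure mode, an exception raised inside an op is captured on its trace:

```python
import json

import weave


@weave.op()
def parse_llm_json(raw: str) -> dict:
    # a json.JSONDecodeError raised here shows up highlighted in the trace UI
    return json.loads(raw)


# parse_llm_json('{"fruit": "apple"')  # malformed LLM output -> traced failure
```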


```python
import json
@@ -221,6 +229,8 @@ Organizing experimentation is difficult when there are many moving pieces. You c

It is often useful to track and version data, just like you track and version code. For example, here we define a `SystemPrompt(weave.Object)` object that can be shared between teammates

![](../../media/intro/5.png)
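
A minimal sketch, assuming `SystemPrompt` needs only a single string field:

```python
import weave


class SystemPrompt(weave.Object):
    prompt: str


system_prompt = SystemPrompt(
    prompt="You are a grammar checker; correct the user's text."
)
weave.publish(system_prompt)  # versions the object so teammates can fetch it
```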


```python
import weave
@@ -242,6 +252,8 @@ weave.publish(system_prompt)

Models are such a common object type that we have a special class to represent them: `weave.Model`. The only requirement is that we define a `predict` method.

![](../../media/intro/6.png)
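
In outline (the field name, prompt, and model choice are illustrative), a model declares its configuration as attributes and implements `predict` as an op:

```python
import weave
from openai import OpenAI


class GrammarCorrector(weave.Model):
    system_message: str

    @weave.op()
    def predict(self, user_input: str) -> str:
        client = OpenAI()
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[
                {"role": "system", "content": self.system_message},
                {"role": "user", "content": user_input},
            ],
        )
        return response.choices[0].message.content
```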


```python
from openai import OpenAI
@@ -283,6 +295,8 @@ print(result)

Similar to models, a `weave.Dataset` object exists to help track, organize, and operate on datasets

![](../../media/intro/7.png)
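
For example (the rows are made up), a dataset is a named collection of dict rows:

```python
import weave

dataset = weave.Dataset(
    name="grammar-correction",  # hypothetical name
    rows=[
        {"user_input": "That was so easy, it was a piece of pie!"},
        {"user_input": "I write good"},
    ],
)
weave.publish(dataset)  # versions the rows alongside your code and models
```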


```python
dataset = weave.Dataset(
@@ -309,6 +323,8 @@ Notice that we saved a versioned `GrammarCorrector` object that captures the con

You can publish objects and then retrieve them in your code. You can even call functions from your retrieved objects!

![](../../media/intro/8.png)
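
Roughly, the round trip looks like this (`corrector` is the model from above, and the URI shape in the comment is abbreviated):

```python
import weave

ref = weave.publish(corrector)
print(ref.uri())  # e.g. weave:///<entity>/<project>/object/GrammarCorrector:<digest>

# later, possibly from a different process:
fetched = weave.ref(ref.uri()).get()
print(fetched.predict("I write good"))
```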


```python
import weave
@@ -324,6 +340,8 @@ ref = weave.publish(corrector)
print(ref.uri())
```

![](../../media/intro/9.png)


```python
import weave
@@ -346,6 +364,8 @@ Evaluation-driven development helps you reliably iterate on an application. The

See a preview of the API below:

![](../../media/intro/10.png)
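
A compact sketch (the scorer, row, and argument names are illustrative; check the current `weave.Evaluation` docs for exact scorer signatures):

```python
import weave


@weave.op()
def exact_match(expected: str, model_output: str) -> dict:
    # scorer parameters are matched by name to dataset columns and the model output
    return {"match": expected == model_output}


evaluation = weave.Evaluation(
    dataset=[{"user_input": "I write good", "expected": "I write well."}],
    scorers=[exact_match],
)
# await evaluation.evaluate(corrector)  # in notebooks; use asyncio.run(...) in scripts
```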


```python
import weave
12 changes: 11 additions & 1 deletion docs/docs/reference/gen_notebooks/chain_of_density.md
@@ -20,6 +20,8 @@ title: Chain of Density Summarization

Summarizing complex technical documents while preserving crucial details is a challenging task. The Chain of Density (CoD) summarization technique offers a solution by iteratively refining summaries to be more concise and information-dense. This guide demonstrates how to implement CoD using Weave for tracking and evaluating the application.

![](../../media/summarization/eval_dash.png)

## What is Chain of Density Summarization?

[![arXiv](https://img.shields.io/badge/arXiv-2309.04269-b31b1b.svg)](https://arxiv.org/abs/2309.04269)
@@ -139,6 +141,8 @@ def load_pdf(pdf_url: str) -> str:

Now, let's implement the core CoD summarization logic using Weave operations:

![](../../media/summarization/summarization_trace.png)
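
Conceptually, the core loop looks like this (a sketch with a stand-in for the real LLM call, not the notebook's exact code):

```python
import weave


@weave.op()
def densify(text: str, prior_summary: str) -> str:
    # stand-in for an LLM call that rewrites `prior_summary` to pack in
    # more entities from `text` while keeping roughly the same length
    return (prior_summary + " " + text[:80]).strip()


@weave.op()
def chain_of_density(text: str, iterations: int = 3) -> str:
    summary = ""
    for _ in range(iterations):
        summary = densify(text, summary)  # each pass is traced as a child call
    return summary
```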


```python
# Chain of Density Summarization
@@ -231,6 +235,8 @@ By using `@weave.op()` decorators, we ensure that Weave tracks the inputs, outpu

Now, let's wrap our summarization pipeline in a Weave Model:

![](../../media/summarization/model.png)


```python
# Weave Model
@@ -240,7 +246,7 @@ class ArxivChainOfDensityPipeline(weave.Model):

@weave.op()
def predict(self, paper: ArxivPaper, instruction: str) -> dict:
text = load_pdf(paper["pdf_url"])
text = load_pdf(paper.pdf_url)
result = chain_of_density_summarization(
text,
instruction,
@@ -320,6 +326,8 @@ These evaluation functions use the Claude model to assess the quality of the gen

To evaluate our pipeline, we'll create a Weave Dataset and run an evaluation:

![](../../media/summarization/dataset.png)
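
Sketched with a made-up row (the real notebook builds rows from `ArxivPaper` objects):

```python
import weave

eval_dataset = weave.Dataset(
    name="arxiv-papers",  # hypothetical name
    rows=[
        {
            "paper": {"pdf_url": "https://arxiv.org/pdf/2309.04269"},  # example
            "instruction": "Summarize the key contributions of this paper.",
        }
    ],
)
weave.publish(eval_dataset)
```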


```python
# Create a Weave Dataset
@@ -340,6 +348,8 @@ For our evaluation, we'll use an LLM-as-a-judge approach. This technique involve

[![arXiv](https://img.shields.io/badge/arXiv-2306.05685-b31b1b.svg)](https://arxiv.org/abs/2306.05685)

![](../../media/summarization/eval_dash.png)
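
In rough form (the prompt, output key, and judge model are simplified stand-ins for the notebook's scorer), an LLM judge scores a summary and returns a dict that Weave can aggregate:

```python
import json

import weave
from anthropic import Anthropic


@weave.op()
def quality_scorer(instruction: str, model_output: dict) -> dict:
    client = Anthropic()
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative judge model
        max_tokens=200,
        messages=[
            {
                "role": "user",
                "content": (
                    "Rate this summary from 1-5 for relevance to the instruction "
                    f"{instruction!r}. Reply with JSON like {{\"relevance\": 3}}.\n\n"
                    f"{model_output['final_summary']}"  # assumed output key
                ),
            }
        ],
    )
    return json.loads(message.content[0].text)
```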


```python
# Define the scorer function
10 changes: 10 additions & 0 deletions docs/docs/reference/gen_notebooks/dspy_prompt_optimization.md
@@ -125,6 +125,8 @@ def get_dataset(metadata: Metadata):
dspy_train_examples, dspy_val_examples = get_dataset(metadata)
```

![](../../media/dspy_optimization/1.png)

## The DSPy Program

[DSPy](https://dspy-docs.vercel.app) is a framework that pushes building new LM pipelines away from manipulating free-form strings and closer to programming (composing modular operators to build text transformation graphs) where a compiler automatically generates optimized LM invocation strategies and prompts from a program.
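
As a small taste of that style (the signature below is illustrative, not the notebook's exact program):

```python
import dspy


class CausalQA(dspy.Signature):
    """Answer a causal reasoning question."""

    question = dspy.InputField()
    answer = dspy.OutputField()


# a module pairs the signature with a prompting strategy
baseline = dspy.ChainOfThought(CausalQA)
# after dspy.settings.configure(lm=...), call: baseline(question="...")
```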
@@ -189,6 +191,8 @@ prediction = baseline_module(dspy_train_examples[0]["question"])
rich.print(prediction)
```

![](../../media/dspy_optimization/2.png)

## Evaluating our DSPy Program

Now that we have a baseline prompting strategy, let's evaluate it on our validation set using [`weave.Evaluation`](../../guides/core-types/evaluations.md) on a simple metric that matches the predicted answer with the ground truth. Weave will take each example, pass it through your application and score the output on multiple custom scoring functions. By doing this, you'll have a view of the performance of your application, and a rich UI to drill into individual outputs and scores.
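
Such a metric can be as small as this sketch (the parameter names assume the dataset exposes an `answer` column):

```python
import weave


@weave.op()
def answer_match(answer: str, model_output: dict) -> dict:
    # exact match between the ground truth and the program's predicted answer
    predicted = str(model_output.get("answer", "")).strip().lower()
    return {"match": predicted == answer.strip().lower()}
```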
@@ -219,6 +223,8 @@ evaluation = weave.Evaluation(
await evaluation.evaluate(baseline_module.forward)
```

![](../../media/dspy_optimization/3.png)

:::note
If you're running from a Python script, you can use the following code to run the evaluation:

@@ -258,6 +264,8 @@ def get_optimized_program(model: dspy.Module, metadata: Metadata) -> dspy.Module
optimized_module = get_optimized_program(baseline_module, metadata)
```
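
For orientation, optimization with a DSPy teleprompter can look roughly like this (a sketch assuming `BootstrapFewShot` and a metric like the one above; the notebook's `get_optimized_program` may differ):

```python
from dspy.teleprompt import BootstrapFewShot


def optimize(module, trainset, metric):
    # compile() bootstraps few-shot demonstrations that maximize `metric`
    teleprompter = BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
    return teleprompter.compile(module, trainset=trainset)
```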

![](../../media/dspy_optimization/4.png)

:::warning
Running the evaluation on the causal reasoning dataset will cost approximately $0.04 in OpenAI credits.
:::
Expand All @@ -275,6 +283,8 @@ evaluation = weave.Evaluation(
await evaluation.evaluate(optimized_module.forward)
```

![](../../media/dspy_optimization/5.png)

Comparing the evaluation of the baseline program with the optimized one shows that the optimized program answers the causal reasoning questions with significantly more accuracy.

## Conclusion
93 changes: 68 additions & 25 deletions docs/docs/reference/gen_notebooks/online_monitoring.md
@@ -62,19 +62,25 @@ MODEL_NAMES = [
("gpt-4o-mini", 0.03, 0.06),
("gpt-4-turbo", 0.03, 0.06),
("claude-3-haiku-20240307", 0.01, 0.03),
("gpt-4o", 0.03, 0.06)
("gpt-4o", 0.03, 0.06),
]


def init_weave_client(project_name):
try:
client = weave.init(project_name)
for model, prompt_cost, completion_cost in MODEL_NAMES:
client.add_cost(llm_id=model, prompt_token_cost=prompt_cost, completion_token_cost=completion_cost)
client.add_cost(
llm_id=model,
prompt_token_cost=prompt_cost,
completion_token_cost=completion_cost,
)
return client
except Exception as e:
print(f"Failed to initialize Weave client for project '{project_name}': {e}")
return None



client = init_weave_client(PROJECT_NAME)
```

@@ -96,9 +102,11 @@ The first option to access data from Weave is to retrieve a list of filtered cal

```python
import itertools
import pandas as pd
from datetime import datetime, timedelta

import pandas as pd


def fetch_calls(client, project_id, start_time, trace_roots_only, limit):
filter_params = {
"project_id": project_id,
Expand All @@ -110,13 +118,16 @@ def fetch_calls(client, project_id, start_time, trace_roots_only, limit):
}
try:
calls_stream = client.server.calls_query_stream(filter_params)
calls = list(itertools.islice(calls_stream, limit)) # limit the number of calls to fetch if too many
calls = list(
itertools.islice(calls_stream, limit)
) # limit the number of calls to fetch if too many
print(f"Fetched {len(calls)} calls.")
return calls
except Exception as e:
print(f"Error fetching calls: {e}")
return []



calls = fetch_calls(client, PROJECT_NAME, datetime.now() - timedelta(days=1), True, 100)
```

@@ -131,28 +142,40 @@ Processing the calls is very easy with the return from Weave - we'll extract the

```python
import json
import pandas as pd
from datetime import datetime

import pandas as pd


def process_calls(calls):
records = []
for call in calls:
feedback = call.summary.get("weave", {}).get("feedback", [])
thumbs_up = sum(1 for item in feedback if isinstance(item, dict) and item.get("payload", {}).get("emoji") == "👍")
thumbs_down = sum(1 for item in feedback if isinstance(item, dict) and item.get("payload", {}).get("emoji") == "👎")
thumbs_up = sum(
1
for item in feedback
if isinstance(item, dict) and item.get("payload", {}).get("emoji") == "👍"
)
thumbs_down = sum(
1
for item in feedback
if isinstance(item, dict) and item.get("payload", {}).get("emoji") == "👎"
)
latency = call.summary.get("weave", {}).get("latency_ms", 0)

records.append({
"Call ID": call.id,
"Trace ID": call.trace_id, # this is a unique ID for the trace that can be used to retrieve it
"Display Name": call.display_name, # this is an optional name you can set in the UI or programatically
"Latency (ms)": latency,
"Thumbs Up": thumbs_up,
"Thumbs Down": thumbs_down,
"Started At": pd.to_datetime(getattr(call, 'started_at', datetime.min)),
"Inputs": json.dumps(call.inputs, default=str),
"Outputs": json.dumps(call.output, default=str)
})

records.append(
{
"Call ID": call.id,
"Trace ID": call.trace_id, # this is a unique ID for the trace that can be used to retrieve it
"Display Name": call.display_name, # this is an optional name you can set in the UI or programatically
"Latency (ms)": latency,
"Thumbs Up": thumbs_up,
"Thumbs Down": thumbs_down,
"Started At": pd.to_datetime(getattr(call, "started_at", datetime.min)),
"Inputs": json.dumps(call.inputs, default=str),
"Outputs": json.dumps(call.output, default=str),
}
)
return pd.DataFrame(records)
```

@@ -171,7 +194,9 @@ For example, for the cost, we'll use the `query_costs` API to fetch the costs of
# Use cost API to get costs
costs = client.query_costs()
df_costs = pd.DataFrame([cost.dict() for cost in costs])
df_costs['total_cost'] = df_costs['prompt_token_cost'] + df_costs['completion_token_cost']
df_costs["total_cost"] = (
df_costs["prompt_token_cost"] + df_costs["completion_token_cost"]
)

# only show the first row for every unique llm_id
df_costs
@@ -185,17 +210,35 @@ Next, we can generate the visualizations using plotly. This is the most basic da
import plotly.express as px
import plotly.graph_objects as go


def plot_feedback_pie_chart(thumbs_up, thumbs_down):
fig = go.Figure(data=[go.Pie(labels=['Thumbs Up', 'Thumbs Down'], values=[thumbs_up, thumbs_down], marker=dict(colors=['#66b3ff', '#ff9999']), hole=.3)])
fig.update_traces(textinfo='percent+label', hoverinfo='label+percent')
fig = go.Figure(
data=[
go.Pie(
labels=["Thumbs Up", "Thumbs Down"],
values=[thumbs_up, thumbs_down],
marker=dict(colors=["#66b3ff", "#ff9999"]),
hole=0.3,
)
]
)
fig.update_traces(textinfo="percent+label", hoverinfo="label+percent")
fig.update_layout(showlegend=False, title="Feedback Summary")
return fig


def plot_model_cost_distribution(df):
fig = px.bar(df, x="llm_id", y="total_cost", color="llm_id", title="Cost Distribution by Model")
fig = px.bar(
df,
x="llm_id",
y="total_cost",
color="llm_id",
title="Cost Distribution by Model",
)
fig.update_layout(xaxis_title="Model", yaxis_title="Cost (USD)")
return fig


# See the source code for all the plots
```

