merge

wandb · Aug 21, 2024 · 17cb4cc
2 parents 7de57e1 + b75182d
commit 17cb4cc
Showing 146 changed files with 819 additions and 357 deletions.
diff --git a/docs/docs/guides/tracking/feedback.md b/docs/docs/guides/tracking/feedback.md
@@ -71,6 +71,40 @@ call.feedback.add_note("this is a note")
 call.feedback.add("correctness", { "value": 5 })
 ```
 
+### Retrieving the Call UUID
+
+To add feedback immediately after a call, you can retrieve the call UUID programmatically, either during or after the call's execution. To get the UUID from within the operation:
+
+```python
+
+import weave
+weave.init("uuid")
+
+@weave.op()
+def simple_operation(input_value):
+    # Perform some simple operation
+    output = f"Processed {input_value}"
+    # Get the current call ID
+    current_call = weave.get_current_call()
+    call_id = current_call.id
+    return output, call_id
+```
+
+Additionally, you can use the `call()` method to execute the operation and retrieve the call ID after the function has run:
+
+```python
+import weave
+weave.init("uuid")
+
+@weave.op()
+def simple_operation(input_value):
+    return f"Processed {input_value}"
+
+# Execute the operation and retrieve the result and call ID
+result, call = simple_operation.call("example input")
+call_id = call.id
+```
+
 ### Querying feedback on a call
 
 ```python

diff --git a/...reference/gen_notebooks/intro_notebook.md → ...erence/gen_notebooks/01-intro_notebook.md b/...reference/gen_notebooks/intro_notebook.md → ...erence/gen_notebooks/01-intro_notebook.md
diff --git a/docs/docs/reference/gen_notebooks/chain_of_density.md b/docs/docs/reference/gen_notebooks/chain_of_density.md
@@ -13,11 +13,12 @@ title: Chain of Density Summarization
 
 
 
-# Summarization using Chain of Density
+<img src="http://wandb.me/logo-im-png" width="400" alt="Weights & Biases" />
+<!--- @wandbcode{cod-notebook} -->
 
-Summarizing complex technical documents while preserving crucial details is a challenging task. The Chain of Density (CoD) summarization technique offers a solution by iteratively refining summaries to be more concise and information-dense. This guide demonstrates how to implement CoD using Weave, a powerful framework for building, tracking, and evaluating LLM applications. By combining CoD's effectiveness with Weave's robust tooling, you'll learn to create a summarization pipeline that produces high-quality, entity-rich summaries of technical content while gaining insights into the summarization process.
+# Summarization using Chain of Density
 
-![Final Evaluation](./media/chain_of_density/eval_comparison.gif)
+Summarizing complex technical documents while preserving crucial details is a challenging task. The Chain of Density (CoD) summarization technique offers a solution by iteratively refining summaries to be more concise and information-dense. This guide demonstrates how to implement CoD using Weave for tracking and evaluating the application. 
 
 ## What is Chain of Density Summarization?
 
@@ -59,24 +60,26 @@ First, let's set up our environment and import the necessary libraries:
 
 
 ```python
+import io
 import os
-import anthropic
-import weave
 from datetime import datetime, timezone
-from pydantic import BaseModel
+
+import anthropic
 import requests
-import io
+from pydantic import BaseModel
 from PyPDF2 import PdfReader
 from set_env import set_env
 
+import weave
+
 set_env("WANDB_API_KEY")
 set_env("ANTHROPIC_API_KEY")
 
 weave.init("summarization-chain-of-density-cookbook")
 anthropic_client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
 ```
 
-We're using Weave to track our experiment and Anthropic's Claude model for text generation. The `weave.init()` call sets up a new Weave project for our summarization task.
+We're using Weave to track our experiment and Anthropic's Claude model for text generation. The `weave.init(<project name>)` call sets up a new Weave project for our summarization task.
 
 ## Define the ArxivPaper model
 
@@ -94,6 +97,7 @@ class ArxivPaper(BaseModel):
     summary: str
     pdf_url: str
 
+
 # Create sample ArxivPaper
 arxiv_paper = ArxivPaper(
     entry_id="http://arxiv.org/abs/2406.04744v1",
@@ -102,14 +106,12 @@ arxiv_paper = ArxivPaper(
     title="CRAG -- Comprehensive RAG Benchmark",
     authors=["Xiao Yang", "Kai Sun", "Hao Xin"],  # Truncated for brevity
     summary="Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution...",  # Truncated
-    pdf_url="https://arxiv.org/pdf/2406.04744"
+    pdf_url="https://arxiv.org/pdf/2406.04744",
 )
 ```
 
 This class encapsulates the metadata and content of an ArXiv paper, which will be the input to our summarization pipeline.
 
-![Arxiv Paper](./media/chain_of_density/arxiv_paper.gif)
-
 ## Load PDF content
 
 To work with the full paper content, we'll add a function to load and extract text from PDFs:
@@ -121,15 +123,15 @@ def load_pdf(pdf_url: str) -> str:
     # Download the PDF
     response = requests.get(pdf_url)
     pdf_file = io.BytesIO(response.content)
-    
+
     # Read the PDF
     pdf_reader = PdfReader(pdf_file)
-    
+
     # Extract text from all pages
     text = ""
     for page in pdf_reader.pages:
         text += page.extract_text()
-    
+
     return text
 ```
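
One caveat with the page loop in `load_pdf`: pages that contain only images may yield little or no text, and concatenating without a separator can fuse the last word of one page with the first word of the next. A minimal sketch of a more defensive join, assuming only that each page object exposes an `extract_text()` method as the `PdfReader` pages do; the helper name `join_page_text` is hypothetical and not part of the notebook:

```python
from typing import Iterable, Protocol


class SupportsExtractText(Protocol):
    """Anything with an extract_text() method, e.g. a PdfReader page."""

    def extract_text(self) -> str: ...


def join_page_text(pages: Iterable[SupportsExtractText], separator: str = "\n") -> str:
    """Concatenate page texts, skipping pages that produce no text."""
    chunks = []
    for page in pages:
        # Defensive guard (an assumption): treat empty or missing text as "".
        page_text = (page.extract_text() or "").strip()
        if page_text:
            chunks.append(page_text)
    return separator.join(chunks)
```

Inside `load_pdf`, this could replace the `for` loop with `text = join_page_text(pdf_reader.pages)`.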
 
@@ -141,7 +143,13 @@ Now, let's implement the core CoD summarization logic using Weave operations:
 ```python
 # Chain of Density Summarization
 @weave.op()
-def summarize_current_summary(document: str, instruction: str, current_summary: str = "", iteration: int = 1, model: str = "claude-3-sonnet-20240229"):
+def summarize_current_summary(
+    document: str,
+    instruction: str,
+    current_summary: str = "",
+    iteration: int = 1,
+    model: str = "claude-3-sonnet-20240229",
+):
     prompt = f"""
     Document: {document}
     Current summary: {current_summary}
@@ -151,36 +159,57 @@ def summarize_current_summary(document: str, instruction: str, current_summary:
     Generate an increasingly concise, entity-dense, and highly technical summary from the provided document that specifically addresses the given instruction.
     """
     response = anthropic_client.messages.create(
-        model=model,
-        max_tokens=4096,
-        messages=[{"role": "user", "content": prompt}]
+        model=model, max_tokens=4096, messages=[{"role": "user", "content": prompt}]
     )
     return response.content[0].text
 
+
 @weave.op()
-def iterative_density_summarization(document: str, instruction: str, current_summary: str, density_iterations: int, model: str = "claude-3-sonnet-20240229"):
+def iterative_density_summarization(
+    document: str,
+    instruction: str,
+    current_summary: str,
+    density_iterations: int,
+    model: str = "claude-3-sonnet-20240229",
+):
     iteration_summaries = []
     for iteration in range(1, density_iterations + 1):
-        current_summary = summarize_current_summary(document, instruction, current_summary, iteration, model)
+        current_summary = summarize_current_summary(
+            document, instruction, current_summary, iteration, model
+        )
         iteration_summaries.append(current_summary)
     return current_summary, iteration_summaries
 
+
 @weave.op()
-def final_summary(instruction: str, current_summary: str, model: str = "claude-3-sonnet-20240229"):
+def final_summary(
+    instruction: str, current_summary: str, model: str = "claude-3-sonnet-20240229"
+):
     prompt = f"""
     Given this summary: {current_summary}
     And this instruction to focus on: {instruction}
     Create an extremely dense, final summary that captures all key technical information in the most concise form possible, while specifically addressing the given instruction.
     """
-    return anthropic_client.messages.create(
-        model=model,
-        max_tokens=4096,
-        messages=[{"role": "user", "content": prompt}]
-    ).content[0].text
+    return (
+        anthropic_client.messages.create(
+            model=model, max_tokens=4096, messages=[{"role": "user", "content": prompt}]
+        )
+        .content[0]
+        .text
+    )
+
 
 @weave.op()
-def chain_of_density_summarization(document: str, instruction: str, current_summary: str = "", model: str = "claude-3-sonnet-20240229", density_iterations: int = 2):
-    current_summary, iteration_summaries = iterative_density_summarization(document, instruction, current_summary, density_iterations, model)
+def chain_of_density_summarization(
+    document: str,
+    instruction: str,
+    current_summary: str = "",
+    model: str = "claude-3-sonnet-20240229",
+    density_iterations: int = 2,
+):
+    current_summary, iteration_summaries = iterative_density_summarization(
+        document, instruction, current_summary, density_iterations, model
+    )
     final_summary_text = final_summary(instruction, current_summary, model)
     return {
         "final_summary": final_summary_text,
@@ -197,7 +226,6 @@ Here's what each function does:
 
 By using `@weave.op()` decorators, we ensure that Weave tracks the inputs, outputs, and execution of these functions.
 
-![Chain of Density](./media/chain_of_density/chain_of_density.gif)
 
 ## Create a Weave Model
 
@@ -213,9 +241,13 @@ class ArxivChainOfDensityPipeline(weave.Model):
     @weave.op()
     def predict(self, paper: ArxivPaper, instruction: str) -> dict:
         text = load_pdf(paper["pdf_url"])
-        result = chain_of_density_summarization(text, instruction, model=self.model, density_iterations=self.density_iterations)
+        result = chain_of_density_summarization(
+            text,
+            instruction,
+            model=self.model,
+            density_iterations=self.density_iterations,
+        )
         return result
-
 ```
 
 This `ArxivChainOfDensityPipeline` class encapsulates our summarization logic as a Weave Model, providing several key benefits:
@@ -226,8 +258,6 @@ This `ArxivChainOfDensityPipeline` class encapsulates our summarization logic as
 4. Hyperparameter management: Model attributes (like `model` and `density_iterations`) are clearly defined and tracked across different runs, facilitating experimentation.
 5. Integration with Weave ecosystem: Using `weave.Model` allows seamless integration with other Weave tools, such as evaluations and serving capabilities.
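
Because `model` and `density_iterations` are plain attributes, comparing runs comes down to enumerating attribute combinations and building one pipeline per combination. A minimal sketch of the enumeration step only, using the standard library; the helper `config_grid` and the second model name are illustrative assumptions, not part of the notebook:

```python
from itertools import product


def config_grid(**options):
    """Return one dict per combination of the given attribute values."""
    names = list(options)
    return [dict(zip(names, values)) for values in product(*options.values())]


configs = config_grid(
    # The second model name is an assumption added for illustration.
    model=["claude-3-sonnet-20240229", "claude-3-haiku-20240307"],
    density_iterations=[2, 3],
)
for config in configs:
    # Each dict could be passed as ArxivChainOfDensityPipeline(**config).
    print(config)
```

Keeping the grid as plain dictionaries means the same list can be logged or reused without touching the pipeline class itself.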
 
-![Arxiv Chain of Density Pipeline](./media/chain_of_density/model.gif)
-
 ## Implement evaluation metrics
 
 To assess the quality of our summaries, we'll implement simple evaluation metrics:
@@ -236,8 +266,11 @@ To assess the quality of our summaries, we'll implement simple evaluation metric
 ```python
 import json
 
+
 @weave.op()
-def evaluate_summary(summary: str, instruction: str, model: str = "claude-3-sonnet-20240229") -> dict:
+def evaluate_summary(
+    summary: str, instruction: str, model: str = "claude-3-sonnet-20240229"
+) -> dict:
     prompt = f"""
     Summary: {summary}
     Instruction: {instruction}
@@ -266,27 +299,23 @@ def evaluate_summary(summary: str, instruction: str, model: str = "claude-3-sonn
     Ensure that the scores are integers between 1 and 5, and that the explanations are concise.
     """
     response = anthropic_client.messages.create(
-        model=model,
-        max_tokens=1000,
-        messages=[{"role": "user", "content": prompt}]
+        model=model, max_tokens=1000, messages=[{"role": "user", "content": prompt}]
     )
     print(response.content[0].text)
-    
+
     eval_dict = json.loads(response.content[0].text)
-    
+
     return {
-        "relevance": eval_dict['relevance']['score'],
-        "conciseness": eval_dict['conciseness']['score'],
-        "technical_accuracy": eval_dict['technical_accuracy']['score'],
-        "average_score": sum(eval_dict[k]['score'] for k in eval_dict) / 3,
-        "evaluation_text": response.content[0].text
+        "relevance": eval_dict["relevance"]["score"],
+        "conciseness": eval_dict["conciseness"]["score"],
+        "technical_accuracy": eval_dict["technical_accuracy"]["score"],
+        "average_score": sum(eval_dict[k]["score"] for k in eval_dict) / 3,
+        "evaluation_text": response.content[0].text,
     }
 ```
 
 These evaluation functions use the Claude model to assess the quality of the generated summaries based on relevance, conciseness, and technical accuracy.
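
Two assumptions in `evaluate_summary` are worth noting: `json.loads` expects the response to be bare JSON, and the average divides by a fixed 3. A minimal sketch of a more tolerant parser, assuming the response contains a single JSON object whose values each carry a `score` field, as the indexing in `evaluate_summary` implies; the helper name `parse_scores` is hypothetical:

```python
import json


def parse_scores(response_text: str) -> dict:
    """Extract per-criterion scores and their mean from a judge response."""
    # Take the span from the first "{" to the last "}" so that any
    # prose surrounding the JSON object is ignored.
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    eval_dict = json.loads(response_text[start : end + 1])

    scores = {name: entry["score"] for name, entry in eval_dict.items()}
    if not scores:
        raise ValueError("JSON object contains no criteria")
    # Divide by the number of criteria actually returned, not a constant.
    scores["average_score"] = sum(scores.values()) / len(scores)
    return scores
```

With a helper like this, a judge response that wraps its JSON in a sentence or two of prose would still parse, and adding a fourth criterion to the prompt would not skew the average.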
 
-![Evaluation](./media/chain_of_density/evals_main_screen.gif)
-
 ## Create a Weave Dataset and run evaluation
 
 To evaluate our pipeline, we'll create a Weave Dataset and run an evaluation:
@@ -299,16 +328,14 @@ dataset = weave.Dataset(
     rows=[
         {
             "paper": arxiv_paper,
-            "instruction": "What was the approach to experimenting with different data mixtures?"
+            "instruction": "What was the approach to experimenting with different data mixtures?",
         },
-    ]
+    ],
 )
 
 weave.publish(dataset)
 ```
 
-![Dataset](./media/chain_of_density/eval_dataset.gif)
-
 For our evaluation, we'll use an LLM-as-a-judge approach. This technique involves using a language model to assess the quality of outputs generated by another model or system. It leverages the LLM's understanding and reasoning capabilities to provide nuanced evaluations, especially for tasks where traditional metrics may fall short.
 
 [![arXiv](https://img.shields.io/badge/arXiv-2306.05685-b31b1b.svg)](https://arxiv.org/abs/2306.05685)
@@ -330,8 +357,6 @@ arxiv_chain_of_density_pipeline = ArxivChainOfDensityPipeline()
 results = await evaluation.evaluate(arxiv_chain_of_density_pipeline)
 ```
 
-![Final Evaluation](./media/chain_of_density/eval_comparison.gif)
-
 This code creates a dataset with our sample ArXiv paper, defines a quality scorer, and runs an evaluation of our summarization pipeline.
 
 ## Conclusion
@@ -344,8 +369,7 @@ In this example, we've demonstrated how to implement a Chain of Density summariz
 4. Create a dataset and run an evaluation of the pipeline
 
 Weave's seamless integration allows us to track inputs, outputs, and intermediate steps throughout the summarization process, making it easier to debug, optimize, and evaluate our LLM application.
-
-For more information on Weave and its capabilities, check out the [Weave documentation](https://docs.wandb.ai/weave). You can extend this example to handle larger datasets, implement more sophisticated evaluation metrics, or integrate with other LLM workflows.
+You can extend this example to handle larger datasets, implement more sophisticated evaluation metrics, or integrate with other LLM workflows.
 
 <a 
   href="https://wandb.ai/wandb_fc/arxiv-reader/reports/Building-a-bot-to-summarize-arXiv-papers-as-PDFs-using-Anthrophic-and-W-B-Weave--Vmlldzo4Nzg0ODI4"