Commit

merge
gtarpenning committed Aug 21, 2024
2 parents 7de57e1 + b75182d commit 17cb4cc
Showing 146 changed files with 819 additions and 357 deletions.
34 changes: 34 additions & 0 deletions docs/docs/guides/tracking/feedback.md
@@ -71,6 +71,40 @@ call.feedback.add_note("this is a note")
```python
call.feedback.add("correctness", { "value": 5 })
```

### Retrieving the Call UUID

If you need to add feedback immediately after a call, you can retrieve the call UUID programmatically, either during or after the call executes. To get the UUID from within the operation, use `weave.get_current_call()`:

```python

import weave
weave.init("uuid")

@weave.op()
def simple_operation(input_value):
    # Perform some simple operation
    output = f"Processed {input_value}"
    # Get the current call ID
    current_call = weave.get_current_call()
    call_id = current_call.id
    return output, call_id
```

Alternatively, use the op's `call()` method, which executes the operation and returns both its result and the call object, so you can read the call ID after the function runs:

```python
import weave
weave.init("uuid")

@weave.op()
def simple_operation(input_value):
    return f"Processed {input_value}"

# Execute the operation and retrieve the result and call ID
result, call = simple_operation.call("example input")
call_id = call.id
```
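To make the two patterns above concrete without a Weave project, here is a toy, stdlib-only model of them. It is a sketch under stated assumptions, not how Weave implements `weave.op`, `weave.get_current_call()`, or `.call()`: `toy_op`, `ToyCall`, and `toy_get_current_call` are hypothetical names. Each invocation gets a fresh UUID, the running call is stored in a `contextvars.ContextVar` so code inside the operation can read it, and `.call()` hands back the result together with the call record.

```python
import contextvars
import functools
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ToyCall:
    """Hypothetical stand-in for a tracked call record (not Weave's call class)."""

    op_name: str
    inputs: dict
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    output: Any = None


# The call that is currently executing; None outside any toy operation.
_current_call = contextvars.ContextVar("current_call", default=None)


def toy_get_current_call() -> Optional[ToyCall]:
    return _current_call.get()


def toy_op(fn: Callable) -> Callable:
    """Hypothetical decorator: give every invocation a UUID and expose it while fn runs."""

    def _run(*args, **kwargs):
        call = ToyCall(op_name=fn.__name__, inputs={"args": args, "kwargs": kwargs})
        token = _current_call.set(call)  # visible to toy_get_current_call() inside fn
        try:
            call.output = fn(*args, **kwargs)
        finally:
            _current_call.reset(token)  # restore the enclosing call, if any
        return call.output, call

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result, _call = _run(*args, **kwargs)  # plain invocation returns only the result
        return result

    wrapper.call = _run  # mirrors `.call()` above: returns (result, call)
    return wrapper


@toy_op
def simple_operation(input_value):
    # Read the ID of the call that is running right now.
    call_id = toy_get_current_call().id
    return f"Processed {input_value}", call_id


result, call = simple_operation.call("example input")
```

Here `result[1]`, read inside the operation, is the same ID as `call.id`, read after it returns, which is why either pattern works for attaching feedback later.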

### Querying feedback on a call

```python
...
```
132 changes: 78 additions & 54 deletions docs/docs/reference/gen_notebooks/chain_of_density.md
@@ -13,11 +13,12 @@ title: Chain of Density Summarization



<img src="http://wandb.me/logo-im-png" width="400" alt="Weights & Biases" />
<!--- @wandbcode{cod-notebook} -->

# Summarization using Chain of Density

Summarizing complex technical documents while preserving crucial details is a challenging task. The Chain of Density (CoD) summarization technique offers a solution by iteratively refining summaries to be more concise and information-dense. This guide demonstrates how to implement CoD using Weave for tracking and evaluating the application.

## What is Chain of Density Summarization?

@@ -59,24 +60,26 @@ First, let's set up our environment and import the necessary libraries:

```python
import io
import os

import anthropic
import requests
from pydantic import BaseModel
from PyPDF2 import PdfReader
from set_env import set_env

import weave

set_env("WANDB_API_KEY")
set_env("ANTHROPIC_API_KEY")

weave.init("summarization-chain-of-density-cookbook")
anthropic_client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
```

We're using Weave to track our experiment and Anthropic's Claude model for text generation. The `weave.init(<project name>)` call sets up a new Weave project for our summarization task.
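`set_env` is imported from a separate `set_env` module rather than from `weave`. If that module is not available in your environment, a minimal stand-in could look like the sketch below. It is hypothetical: the original helper may do more (for example, read notebook secrets), and this version only reuses an exported variable or prompts for one.

```python
import getpass
import os


def set_env(name: str) -> str:
    """Hypothetical stand-in for the notebook's set_env helper.

    Keeps a value that is already exported; otherwise prompts for it without
    echoing it to the screen and exports it for the current process.
    """
    value = os.environ.get(name)
    if not value:
        value = getpass.getpass(f"Enter {name}: ")
        os.environ[name] = value
    return value
```

With this defined in place of the import, `set_env("WANDB_API_KEY")` and `set_env("ANTHROPIC_API_KEY")` run unchanged, and `os.getenv("ANTHROPIC_API_KEY")` then finds the value when the Anthropic client is created.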

## Define the ArxivPaper model

@@ -94,6 +97,7 @@ class ArxivPaper(BaseModel):
```python
    summary: str
    pdf_url: str


# Create sample ArxivPaper
arxiv_paper = ArxivPaper(
    entry_id="http://arxiv.org/abs/2406.04744v1",
@@ -102,14 +106,12 @@ arxiv_paper = ArxivPaper(
    title="CRAG -- Comprehensive RAG Benchmark",
    authors=["Xiao Yang", "Kai Sun", "Hao Xin"],  # Truncated for brevity
    summary="Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution...",  # Truncated
    pdf_url="https://arxiv.org/pdf/2406.04744",
)
```

This class encapsulates the metadata and content of an ArXiv paper, which will be the input to our summarization pipeline.

## Load PDF content

To work with the full paper content, we'll add a function to load and extract text from PDFs:
@@ -121,15 +123,15 @@ def load_pdf(pdf_url: str) -> str:
```python
    # Download the PDF
    response = requests.get(pdf_url)
    pdf_file = io.BytesIO(response.content)

    # Read the PDF
    pdf_reader = PdfReader(pdf_file)

    # Extract text from all pages
    text = ""
    for page in pdf_reader.pages:
        text += page.extract_text()

    return text
```
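`load_pdf` concatenates page texts directly, so the last word of one page can run into the first word of the next, and pages with no extractable text (for example, scanned images) contribute nothing useful. A minimal variant, sketched below with the hypothetical helper `join_page_texts`, separates pages with newlines and skips empty ones. It only assumes each page object has an `extract_text()` method, as the pages from `PdfReader.pages` used above do; fake pages stand in for a real PDF so the sketch runs offline.

```python
from typing import Iterable, Protocol


class HasExtractText(Protocol):
    def extract_text(self) -> str: ...


def join_page_texts(pages: Iterable[HasExtractText]) -> str:
    """Join per-page text with newlines, skipping pages that yield no text."""
    texts = []
    for page in pages:
        text = (page.extract_text() or "").strip()  # tolerate None or empty results
        if text:
            texts.append(text)
    return "\n".join(texts)


class FakePage:
    """Offline stand-in for a PDF page object."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


pages = [FakePage("first page ends with a word"), FakePage(""), FakePage("next page")]
text = join_page_texts(pages)
```

Inside `load_pdf`, the loop could then become `return join_page_texts(pdf_reader.pages)`; with the fake pages, plain concatenation would have produced `"...a wordnext page"`.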

@@ -141,7 +143,13 @@ Now, let's implement the core CoD summarization logic using Weave operations:
```python
# Chain of Density Summarization
@weave.op()
def summarize_current_summary(
    document: str,
    instruction: str,
    current_summary: str = "",
    iteration: int = 1,
    model: str = "claude-3-sonnet-20240229",
):
    prompt = f"""
    Document: {document}
    Current summary: {current_summary}
@@ -151,36 +159,57 @@ def summarize_current_summary(document: str, instruction: str, current_summary:
    Generate an increasingly concise, entity-dense, and highly technical summary from the provided document that specifically addresses the given instruction.
    """
    response = anthropic_client.messages.create(
        model=model, max_tokens=4096, messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text


@weave.op()
def iterative_density_summarization(
    document: str,
    instruction: str,
    current_summary: str,
    density_iterations: int,
    model: str = "claude-3-sonnet-20240229",
):
    iteration_summaries = []
    for iteration in range(1, density_iterations + 1):
        current_summary = summarize_current_summary(
            document, instruction, current_summary, iteration, model
        )
        iteration_summaries.append(current_summary)
    return current_summary, iteration_summaries


@weave.op()
def final_summary(
    instruction: str, current_summary: str, model: str = "claude-3-sonnet-20240229"
):
    prompt = f"""
    Given this summary: {current_summary}
    And this instruction to focus on: {instruction}
    Create an extremely dense, final summary that captures all key technical information in the most concise form possible, while specifically addressing the given instruction.
    """
    return (
        anthropic_client.messages.create(
            model=model, max_tokens=4096, messages=[{"role": "user", "content": prompt}]
        )
        .content[0]
        .text
    )


@weave.op()
def chain_of_density_summarization(
    document: str,
    instruction: str,
    current_summary: str = "",
    model: str = "claude-3-sonnet-20240229",
    density_iterations: int = 2,
):
    current_summary, iteration_summaries = iterative_density_summarization(
        document, instruction, current_summary, density_iterations, model
    )
    final_summary_text = final_summary(instruction, current_summary, model)
    return {
        "final_summary": final_summary_text,
```
@@ -197,7 +226,6 @@ Here's what each function does:

By using `@weave.op()` decorators, we ensure that Weave tracks the inputs, outputs, and execution of these functions.
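To see the Chain of Density data flow without calling Claude, the sketch below keeps the loop shape of `iterative_density_summarization` but injects the summarizer as a parameter and uses a deterministic stub, `fake_summarize` (hypothetical, not part of the notebook). Each pass receives the previous pass's summary, and every intermediate summary is kept next to the final one.

```python
from typing import Callable, List, Tuple

# (document, instruction, current_summary, iteration) -> new summary
SummarizeFn = Callable[[str, str, str, int], str]


def iterate_density(
    document: str,
    instruction: str,
    summarize: SummarizeFn,
    density_iterations: int,
    current_summary: str = "",
) -> Tuple[str, List[str]]:
    """Same loop shape as iterative_density_summarization, with the LLM call injected."""
    iteration_summaries = []
    for iteration in range(1, density_iterations + 1):
        # Each pass refines the summary produced by the previous pass.
        current_summary = summarize(document, instruction, current_summary, iteration)
        iteration_summaries.append(current_summary)
    return current_summary, iteration_summaries


def fake_summarize(document: str, instruction: str, current_summary: str, iteration: int) -> str:
    # Hypothetical stub: appends a marker so the chaining between passes is visible.
    return f"{current_summary}[pass {iteration}]"


final, steps = iterate_density("doc", "focus", fake_summarize, density_iterations=3)
```

Passing the summarizer in is only a testing convenience for this sketch; in the notebook, `iterative_density_summarization` calls the `@weave.op()`-decorated `summarize_current_summary` directly, which is what lets Weave record each iteration as its own call.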

## Create a Weave Model

@@ -213,9 +241,13 @@ class ArxivChainOfDensityPipeline(weave.Model):
```python
    @weave.op()
    def predict(self, paper: ArxivPaper, instruction: str) -> dict:
        text = load_pdf(paper["pdf_url"])
        result = chain_of_density_summarization(
            text,
            instruction,
            model=self.model,
            density_iterations=self.density_iterations,
        )
        return result
```

This `ArxivChainOfDensityPipeline` class encapsulates our summarization logic as a Weave Model, providing several key benefits:
@@ -226,8 +258,6 @@ This `ArxivChainOfDensityPipeline` class encapsulates our summarization logic as
4. Hyperparameter management: Model attributes (like `model` and `density_iterations`) are clearly defined and tracked across different runs, facilitating experimentation.
5. Integration with the Weave ecosystem: Using `weave.Model` allows seamless integration with other Weave tools, such as evaluations and serving capabilities.

## Implement evaluation metrics

To assess the quality of our summaries, we'll implement simple evaluation metrics:
@@ -236,8 +266,11 @@ To assess the quality of our summaries, we'll implement simple evaluation metrics:
```python
import json


@weave.op()
def evaluate_summary(
    summary: str, instruction: str, model: str = "claude-3-sonnet-20240229"
) -> dict:
    prompt = f"""
    Summary: {summary}
    Instruction: {instruction}
@@ -266,27 +299,23 @@ def evaluate_summary(summary: str, instruction: str, model: str = "claude-3-sonn
    Ensure that the scores are integers between 1 and 5, and that the explanations are concise.
    """
    response = anthropic_client.messages.create(
        model=model, max_tokens=1000, messages=[{"role": "user", "content": prompt}]
    )
    print(response.content[0].text)

    eval_dict = json.loads(response.content[0].text)

    return {
        "relevance": eval_dict["relevance"]["score"],
        "conciseness": eval_dict["conciseness"]["score"],
        "technical_accuracy": eval_dict["technical_accuracy"]["score"],
        "average_score": sum(eval_dict[k]["score"] for k in eval_dict) / 3,
        "evaluation_text": response.content[0].text,
    }
```

This evaluation function uses Claude as a judge to score each generated summary from 1 to 5 on relevance, conciseness, and technical accuracy, and it also reports the average of the three scores.
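`evaluate_summary` hands the judge's reply straight to `json.loads`, which raises if Claude wraps the JSON in any prose, and its average sums the `score` of every key in the parsed object while always dividing by 3. A more defensive parser is sketched below as the hypothetical `parse_judge_scores`. It assumes the reply contains one JSON object shaped like `{"relevance": {"score": ...}, ...}` (the shape the code above indexes into), decodes from the first `{` with `json.JSONDecoder.raw_decode`, checks the 1 to 5 range the prompt asks for, and averages only the three named criteria.

```python
import json
from typing import Dict

CRITERIA = ("relevance", "conciseness", "technical_accuracy")


def parse_judge_scores(text: str) -> Dict[str, float]:
    """Extract the first JSON object from a judge reply and validate its scores."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in judge response")
    # raw_decode stops at the end of the first complete JSON value,
    # so prose after the object is ignored.
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    scores: Dict[str, float] = {}
    for name in CRITERIA:
        score = int(obj[name]["score"])
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score out of range: {score}")
        scores[name] = score
    # Average only the named criteria, regardless of any extra keys in the reply.
    scores["average_score"] = sum(scores[name] for name in CRITERIA) / len(CRITERIA)
    return scores


reply = (
    "Here is my evaluation:\n"
    '{"relevance": {"score": 4, "explanation": "on topic"},'
    ' "conciseness": {"score": 5, "explanation": "tight"},'
    ' "technical_accuracy": {"score": 3, "explanation": "one slip"},'
    ' "overall_comment": "an extra key the sketch ignores"}\n'
    "Let me know if you need more detail."
)
scores = parse_judge_scores(reply)
```

Inside `evaluate_summary`, the `json.loads(...)` line and the hand-built score fields could be replaced by a call to this helper, keeping `evaluation_text` as before. Note that it still takes the first `{` in the reply, so a brace inside leading prose would trip it.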

## Create a Weave Dataset and run evaluation

To evaluate our pipeline, we'll create a Weave Dataset and run an evaluation:
@@ -299,16 +328,14 @@ dataset = weave.Dataset(
```python
    rows=[
        {
            "paper": arxiv_paper,
            "instruction": "What was the approach to experimenting with different data mixtures?",
        },
    ],
)

weave.publish(dataset)
```

For our evaluation, we'll use an LLM-as-a-judge approach. This technique involves using a language model to assess the quality of outputs generated by another model or system. It leverages the LLM's understanding and reasoning capabilities to provide nuanced evaluations, especially for tasks where traditional metrics may fall short.

[![arXiv](https://img.shields.io/badge/arXiv-2306.05685-b31b1b.svg)](https://arxiv.org/abs/2306.05685)
@@ -330,8 +357,6 @@ arxiv_chain_of_density_pipeline = ArxivChainOfDensityPipeline()
```python
results = await evaluation.evaluate(arxiv_chain_of_density_pipeline)
```

This code creates a dataset with our sample ArXiv paper, defines a quality scorer, and runs an evaluation of our summarization pipeline.
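Conceptually, an evaluation run calls the model's `predict` once per dataset row, hands the output (plus the row's fields) to each scorer, and summarizes the scores across rows. The stdlib-only loop below sketches that flow with a hypothetical `toy_evaluate` and stub `predict` and scorer functions. It is an illustration, not Weave's `Evaluation` implementation, whose scorer argument names and summary format may differ. Weave's own `evaluate` is asynchronous, which is why the snippet above uses `await`; that works at the top level of a notebook, while a plain script would wrap it as `asyncio.run(evaluation.evaluate(...))`.

```python
from statistics import mean
from typing import Any, Callable, Dict, List


def toy_evaluate(
    rows: List[Dict[str, Any]],
    predict: Callable[..., Any],
    scorers: List[Callable[..., Dict[str, Any]]],
) -> Dict[str, float]:
    """Hypothetical sketch: predict on each row, score each output, average numeric scores."""
    per_row: List[Dict[str, Any]] = []
    for row in rows:
        output = predict(**row)  # row keys are matched to predict's parameter names
        scores: Dict[str, Any] = {}
        for scorer in scorers:
            scores.update(scorer(output=output, **row))
        per_row.append(scores)

    if not per_row:
        return {}
    summary: Dict[str, float] = {}
    for key in per_row[0]:
        values = [row_scores[key] for row_scores in per_row]
        # Only average numeric fields; text such as judge explanations is skipped.
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            summary[key] = mean(values)
    return summary


def stub_predict(paper: dict, instruction: str) -> dict:
    # Stand-in for ArxivChainOfDensityPipeline.predict: no PDF download, no LLM call.
    return {"final_summary": f"{paper['title']}: {instruction}"}


def stub_scorer(output: dict, instruction: str, **_: Any) -> dict:
    # Stand-in for an LLM judge: a word count plus a text field that is not averaged.
    return {
        "summary_words": len(output["final_summary"].split()),
        "evaluation_text": f"checked against: {instruction}",
    }


rows = [
    {"paper": {"title": "CRAG"}, "instruction": "data mixtures"},
    {"paper": {"title": "Other"}, "instruction": "benchmarks"},
]
summary = toy_evaluate(rows, stub_predict, [stub_scorer])
```

The stub summaries have 3 and 2 words, so `summary_words` averages to 2.5, while the `evaluation_text` field is left out of the summary because it is not numeric.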

## Conclusion
@@ -344,8 +369,7 @@ In this example, we've demonstrated how to implement a Chain of Density summariz
4. Create a dataset and run an evaluation of the pipeline

Weave's seamless integration allows us to track inputs, outputs, and intermediate steps throughout the summarization process, making it easier to debug, optimize, and evaluate our LLM application.

You can extend this example to handle larger datasets, implement more sophisticated evaluation metrics, or integrate with other LLM workflows.

<a
href="https://wandb.ai/wandb_fc/arxiv-reader/reports/Building-a-bot-to-summarize-arXiv-papers-as-PDFs-using-Anthrophic-and-W-B-Weave--Vmlldzo4Nzg0ODI4"