feat(weave): Add initial suite of scorers, refactor weave/flow #2662

morganmcg1 · 2024-10-10T14:18:22Z

Scorers

Create opinionated scorers for users who want off-the-shelf evaluators for common LLM issues.

TODO

Restructure scorers
Create initial set of scorers
Write tests for all new scorers
Add google.generativeai
Write documentation
Cleanup, remove any stray TODOs, comments, prints, commented-out code

New Scorers

This PR introduces several scorers for evaluating different aspects of model outputs. They are designed to work with the existing Scorer base class and can be integrated into the evaluation pipeline. The LLM-based scorers support multiple LLM providers (OpenAI, Anthropic, Google Generative AI, and Mistral) through a unified interface, and use the instructor library for a consistent API and for structured outputs.

LLMScorer: A base class for LLM-based scorers.
InstructorLLMScorer: A scorer for instructor-powered LLMs.

HallucinationScorer: Given a model output and an input, checks for hallucinations.
SummarizationScorer: Grades an output summary and also returns a measure of the entity density of the summary.
EmbeddingScorer: Computes cosine similarity between embeddings of model output and target.
OpenAIModerationScorer: Uses OpenAI's moderation API to check if the model output is safe.
JSONScorer: Validates if the model output is a valid JSON string.
XMLScorer: Checks if the model output is a valid XML string.
PydanticScorer: Checks if the model output is valid for a given Pydantic model.
ContextEntityRecallScorer: Estimates context recall, from the RAGAS library.
ContextRelevancyScorer: Evaluates the relevancy of the provided context, from the RAGAS library.
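
To make the simpler checks in the list above concrete, here is a minimal standard-library sketch of what JSON validation, XML validation, and embedding cosine similarity compute. The function names and the returned dictionary keys are hypothetical and chosen for illustration; this is not the code added in this PR, and the real scorers are classes built on the Scorer base class.

```python
import json
import math
import xml.etree.ElementTree as ET
from typing import Sequence


def is_valid_json(text: str) -> dict:
    """Return whether `text` parses as JSON (hypothetical result key)."""
    try:
        json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return {"json_valid": False}
    return {"json_valid": True}


def is_valid_xml(text: str) -> dict:
    """Return whether `text` parses as well-formed XML (hypothetical result key)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return {"xml_valid": False}
    return {"xml_valid": True}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity is undefined for a zero vector; this sketch returns 0.0.
        return 0.0
    return dot / (norm_a * norm_b)
```

In an embedding-based scorer, the two vectors would come from an embedding model applied to the model output and the target; that step is omitted here.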

User-facing API changes

outputs can now be used as a parameter of the score function, as an alternative to model_outputs.
Added column_map as an optional attribute of Scorer, giving more flexibility when scorer parameter names differ from dataset column names.
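
The two changes above can be illustrated with a small sketch. It assumes that column_map maps scorer parameter names to dataset column names, and that the model output is passed under whichever of the two parameter names the score function declares. The helper build_score_kwargs and the scorer exact_match are hypothetical and are not part of weave.

```python
import inspect
from typing import Any, Callable, Dict, Optional

# Parameter names under which the model output may be passed, per the notes above.
OUTPUT_PARAM_NAMES = ("outputs", "model_outputs")


def build_score_kwargs(
    score_fn: Callable[..., Any],
    row: Dict[str, Any],
    model_output: Any,
    column_map: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for a score function.

    Assumes column_map maps scorer parameter name -> dataset column name.
    """
    column_map = column_map or {}
    kwargs: Dict[str, Any] = {}
    for name in inspect.signature(score_fn).parameters:
        if name in OUTPUT_PARAM_NAMES:
            kwargs[name] = model_output
            continue
        # Fall back to the parameter name itself when no mapping is given.
        column = column_map.get(name, name)
        if column not in row:
            raise KeyError(f"dataset row has no column {column!r} for parameter {name!r}")
        kwargs[name] = row[column]
    return kwargs


def exact_match(outputs: str, target: str) -> dict:
    """Toy score function that declares `outputs` rather than `model_outputs`."""
    return {"match": outputs == target}


row = {"question": "What is 2 + 2?", "expected_answer": "4"}
kwargs = build_score_kwargs(
    exact_match, row, model_output="4", column_map={"target": "expected_answer"}
)
result = exact_match(**kwargs)
```

Here the dataset column is named expected_answer while the scorer expects target, so the mapping bridges the two without renaming either.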

Structural repo changes

Created weave/flow/scorers and moved most core scoring functionality from weave/flow/scorer.py to weave/flow/scorers/base_scorer.py.
- weave/flow/scorer.py is kept for now for backward compatibility.
Created the weave/scorers directory for high-level imports, to make importing scorers more developer-friendly. The user shouldn't have to know about flow:
- from: from weave.flow.scorers.json_scorer import ValidJSONScorer
- to: from weave.scorers import ValidJSONScorer

Documentation

circle-job-mirror · 2024-10-10T14:30:23Z

Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=142eca3bb9565e81853eba3b20bf0d455199c1fb

docs/docs/guides/evaluation/scorers.md

add inital scorere, refactor

c7ac51a

morganmcg1 requested a review from a team as a code owner October 10, 2024 14:18

morganmcg1 and others added 4 commits October 10, 2024 15:34

fixes

08c83bb

add json and xml scorers

7928270

add keys

f2c69cd

Embed Scorer

3c398f9

tcapelle force-pushed the add_more_scorers branch from b8dcef6 to 3c398f9 Compare October 10, 2024 16:49

tcapelle added 4 commits October 10, 2024 19:00

add openai moderation

07502df

missing import

3d2e352

re-invent the wheel

f1f604a

simplify moderation output

291363f

tcapelle changed the title ~~add inital scorere, refactor~~ add inital scorers, refactor Oct 10, 2024

tcapelle added 2 commits October 10, 2024 21:14

handle system message

97242ec

simple prompt scorer

ad5f021

morganmcg1 marked this pull request as draft October 10, 2024 19:27

tcapelle added 14 commits October 10, 2024 21:30

clean test

125d583

pydantic validator

200c115

move classification out

18d3d81

rename embedding

ddd1dfd

add ragas support

49cb13f

ref

9a22490

update init

674080b

lint

0f16c19

pass through dataset row

a22c76b

add string match

8050e8b

hallucination and llm refactor

bed6017

fix embed

5b7d7e5

fix ragas

8cd5957

rename

5dcde73

andrewtruong reviewed Oct 28, 2024

docs/docs/guides/evaluation/scorers.md (resolved)

andrewtruong reviewed Oct 28, 2024

docs/docs/guides/evaluation/scorers.md (outdated, resolved)

andrewtruong reviewed Oct 28, 2024

docs/docs/guides/evaluation/scorers.md (outdated, resolved)

andrewtruong reviewed Oct 28, 2024

docs/docs/guides/evaluation/scorers.md (resolved)

tcapelle added 21 commits October 28, 2024 11:43

remove unused kwargs

f9911c3

set default parameters for temp and max_tokens

5f7388d

improve docstrings

cc12cf1

add default model_id

f65928e

typos and link to instructor

cde2681

add default model

fd8c8d5

fixed code snippets

c47be95

typos

4a80d5e

update docs

8f1acce

add default embedding model

d66e9a8

remove unused col

b1932bc

Merge remote-tracking branch 'origin/master' into add_more_scorers

0220499

lint

e212a03

remove ignore type

390fd6c

remove ignores

394e39e

type

d67611c

remove ignore type

3dd3dce

remove ignore type

30a11b3

make copy-pastable

77f7863

remove missing ignores

c23c95f

missing imports

2ac6f53

tcapelle force-pushed the add_more_scorers branch from 152ae34 to 2ac6f53 Compare October 28, 2024 17:13

andrewtruong approved these changes Oct 28, 2024

scottire merged commit f79fbcc into master Oct 28, 2024
218 checks passed

scottire deleted the add_more_scorers branch October 28, 2024 18:29

github-actions bot locked and limited conversation to collaborators Oct 28, 2024
