feat(weave): Add initial suite of scorers, refactor weave/flow #2662

Merged
merged 163 commits into from
Oct 28, 2024
Changes from 142 commits
Commits
c7ac51a
add inital scorere, refactor
morganmcg1 Oct 10, 2024
08c83bb
fixes
morganmcg1 Oct 10, 2024
7928270
add json and xml scorers
tcapelle Oct 10, 2024
f2c69cd
add keys
tcapelle Oct 10, 2024
3c398f9
Embed Scorer
tcapelle Oct 10, 2024
07502df
add openai moderation
tcapelle Oct 10, 2024
3d2e352
missing import
tcapelle Oct 10, 2024
f1f604a
re-invent the wheel
tcapelle Oct 10, 2024
291363f
simplify moderation output
tcapelle Oct 10, 2024
97242ec
handle system message
tcapelle Oct 10, 2024
ad5f021
simple prompt scorer
tcapelle Oct 10, 2024
125d583
clean test
tcapelle Oct 10, 2024
200c115
pydantic validator
tcapelle Oct 10, 2024
18d3d81
move classification out
tcapelle Oct 10, 2024
ddd1dfd
rename embedding
tcapelle Oct 11, 2024
49cb13f
add ragas support
tcapelle Oct 11, 2024
9a22490
ref
tcapelle Oct 11, 2024
674080b
update init
tcapelle Oct 11, 2024
0f16c19
lint
tcapelle Oct 11, 2024
a22c76b
pass through dataset row
tcapelle Oct 11, 2024
8050e8b
add string match
tcapelle Oct 11, 2024
bed6017
hallucination and llm refactor
tcapelle Oct 11, 2024
5b7d7e5
fix embed
tcapelle Oct 11, 2024
8cd5957
fix ragas
tcapelle Oct 11, 2024
5dcde73
rename
tcapelle Oct 11, 2024
a367096
refactor LLMScorer, move stuff around
tcapelle Oct 11, 2024
d78f7cf
rename ragas
tcapelle Oct 11, 2024
3fdaade
add summarization (sort of)
tcapelle Oct 11, 2024
4fd3c22
levenshtein
tcapelle Oct 11, 2024
3df1839
rename
tcapelle Oct 11, 2024
672eed8
model_output -> output
tcapelle Oct 11, 2024
5fb442a
model_output -> output
tcapelle Oct 11, 2024
4b903e3
unify naming
tcapelle Oct 11, 2024
9e6e3be
let's go with tests!
tcapelle Oct 11, 2024
dddb6cf
rename model_output to output
tcapelle Oct 11, 2024
aa4f588
fix eval tests
tcapelle Oct 11, 2024
3e13e57
add LLM services tests
tcapelle Oct 11, 2024
04496cc
enable test
tcapelle Oct 11, 2024
8bafa5d
lint
tcapelle Oct 11, 2024
06dfb7f
ruff
tcapelle Oct 11, 2024
20d164a
fix most tests and linting
tcapelle Oct 11, 2024
ed94dbd
Merge branch 'master' into add_more_scorers
tcapelle Oct 11, 2024
7359469
missing distance
tcapelle Oct 12, 2024
831db85
check instructor instal
tcapelle Oct 12, 2024
23dcd6a
rename model_output -> output in tests
tcapelle Oct 12, 2024
343b86d
wrong test path
tcapelle Oct 12, 2024
aa0f675
Merge branch 'master' into add_more_scorers
tcapelle Oct 12, 2024
6a0abaf
don't mock oai
tcapelle Oct 12, 2024
a4920cf
Update from scorer to scorers dir, small tidy ups
morganmcg1 Oct 12, 2024
29e6f78
Merge branch 'add_more_scorers' of https://github.com/wandb/weave int…
morganmcg1 Oct 12, 2024
2114c4f
re-order llms from most popular to least
morganmcg1 Oct 12, 2024
0b2bbf2
feat(weave): fixes tests, summarization scorer re-write, re-names flo…
morganmcg1 Oct 12, 2024
2f479f5
feat(weave): re-add weave/flow/scorer.py for backward compatibiliy
morganmcg1 Oct 12, 2024
4044597
lint
morganmcg1 Oct 12, 2024
fdf55ea
more lint
morganmcg1 Oct 12, 2024
87a25c3
add gemini support
tcapelle Oct 14, 2024
a3c6617
remove regex
tcapelle Oct 14, 2024
f99f190
feat(docs): start adding Scorers docs, remove fast_model_id
morganmcg1 Oct 14, 2024
5d871ec
Merge branch 'add_more_scorers' of https://github.com/wandb/weave int…
morganmcg1 Oct 14, 2024
bd62f0d
update outputs of hallucination scorers, add docstrings, add docs
morganmcg1 Oct 14, 2024
7c129f3
clear gemini message
tcapelle Oct 14, 2024
ef4e3ca
add full eval test
tcapelle Oct 14, 2024
bd45db6
deal with gemini kwargs
tcapelle Oct 14, 2024
fae45d6
better column map error
tcapelle Oct 14, 2024
748d3f4
test LLM integrations
tcapelle Oct 14, 2024
2445c97
update reqa
tcapelle Oct 14, 2024
16ec03f
fix test
tcapelle Oct 14, 2024
1eefa31
lint
tcapelle Oct 15, 2024
b47b4a6
Merge remote-tracking branch 'origin/master' into add_more_scorers
tcapelle Oct 16, 2024
30e9d01
pass API env vars
tcapelle Oct 16, 2024
a52fc83
rename google key
tcapelle Oct 16, 2024
633be15
remove 3.13
tcapelle Oct 16, 2024
3ca802d
lint
tcapelle Oct 16, 2024
5dfed1f
whitespaces :/
tcapelle Oct 16, 2024
adfad10
re-write scorer docs for grade 7
morganmcg1 Oct 16, 2024
13cee24
finish scorers docs edits
morganmcg1 Oct 16, 2024
6dcbbbe
small docs fixes
morganmcg1 Oct 16, 2024
8a1d366
small docs fix
morganmcg1 Oct 16, 2024
b7058e1
delete prints from eval.py
morganmcg1 Oct 16, 2024
7c1f50a
Merge branch 'master' into add_more_scorers
morganmcg1 Oct 16, 2024
c05630a
Update docs/docs/guides/evaluation/scorers.md
morganmcg1 Oct 17, 2024
fd9e001
Update docs/docs/guides/evaluation/scorers.md
morganmcg1 Oct 17, 2024
88fbdc6
remove unused kwargs
tcapelle Oct 17, 2024
1476c19
add similarity scorer test
tcapelle Oct 17, 2024
4b63fe4
add test for stringify
tcapelle Oct 17, 2024
873b90c
add eval test
tcapelle Oct 17, 2024
a847395
add column_map warnings, fix docs, make create and embed available
morganmcg1 Oct 17, 2024
37028ea
test for column_map cases
tcapelle Oct 17, 2024
00f7868
lint
tcapelle Oct 17, 2024
d508f76
remove skip
tcapelle Oct 17, 2024
9a22a22
check with isinstance
tcapelle Oct 17, 2024
94a1229
lint again
tcapelle Oct 17, 2024
a511130
remove useless test
tcapelle Oct 17, 2024
4a92a89
back to <- map
tcapelle Oct 17, 2024
e690eb5
lint
tcapelle Oct 17, 2024
5cc14e3
typing
tcapelle Oct 17, 2024
644d327
renove unused
tcapelle Oct 17, 2024
1555ee9
ruff
tcapelle Oct 17, 2024
9eb2e04
just use list
tcapelle Oct 17, 2024
8e247f7
reverse arrow
tcapelle Oct 17, 2024
a91fd67
reverse again
tcapelle Oct 17, 2024
782bd0e
back compat should work
tcapelle Oct 17, 2024
b76e8ec
move out of try
tcapelle Oct 17, 2024
2f704c5
another edge case...
tcapelle Oct 17, 2024
9d27bc2
rename `has_hallucination`
tcapelle Oct 17, 2024
8b1ec76
fix ragas
tcapelle Oct 17, 2024
6cfc6fa
lint
tcapelle Oct 17, 2024
448139b
Apply scorers docs suggestions from Andrew's review
morganmcg1 Oct 18, 2024
1b7d26f
update scorer docs, fix similarityscore threshold
morganmcg1 Oct 18, 2024
4893a3c
update similarity scorer and context entity scorer docs
morganmcg1 Oct 18, 2024
eaa360c
Move all scorers from flow into weave.scorers
morganmcg1 Oct 18, 2024
c5ba6dc
simplify JSON scorer
tcapelle Oct 21, 2024
d1e748c
warn new scorers path
tcapelle Oct 21, 2024
bdb95e6
remove TODO
tcapelle Oct 21, 2024
85384d1
split into scorers + scorers_test
tcapelle Oct 21, 2024
2d08123
make more real
tcapelle Oct 21, 2024
f29b305
duh, don't tet the `scorers` shard
tcapelle Oct 21, 2024
49d29b6
lint
tcapelle Oct 21, 2024
9e7ec30
rename integrations
tcapelle Oct 21, 2024
2a0fd55
missing update to test.yaml
tcapelle Oct 21, 2024
3450317
Merge remote-tracking branch 'origin/master' into add_more_scorers
tcapelle Oct 21, 2024
c8f7e15
Merge remote-tracking branch 'origin/master' into add_more_scorers
tcapelle Oct 21, 2024
e69757a
add Instructor req to scorers deps
morganmcg1 Oct 22, 2024
e8a974e
Updating column error messages to be more consistent eval.py
morganmcg1 Oct 22, 2024
475eaf9
Apply docs suggestions from code review
morganmcg1 Oct 22, 2024
ec5826e
Modify scorers error message to be more consistent and precise eval.py
morganmcg1 Oct 22, 2024
652ae75
add deprecation warning to flow/scorer.py
morganmcg1 Oct 22, 2024
e4fe1f8
Merge branch 'add_more_scorers' of https://github.com/wandb/weave int…
morganmcg1 Oct 22, 2024
dc4242e
remove temp weave.flow.scorers dir
morganmcg1 Oct 22, 2024
0b61f7c
Fix code formatting in scorers docs
morganmcg1 Oct 22, 2024
afc1ab1
scorers.md formatting
morganmcg1 Oct 22, 2024
ff05051
scorers docs update
morganmcg1 Oct 22, 2024
c6aac64
rename scorers_integrations to scorers_tests, fix scorers imports in …
morganmcg1 Oct 22, 2024
ac127e8
scorers tests linting
morganmcg1 Oct 22, 2024
1730b57
parameterize scorer tests
morganmcg1 Oct 22, 2024
1c0ba4e
Merge branch 'master' into add_more_scorers
morganmcg1 Oct 22, 2024
8187547
add full evaluation examples for each scorer
tcapelle Oct 23, 2024
742b15a
Merge remote-tracking branch 'origin/master' into add_more_scorers
tcapelle Oct 24, 2024
df7a4a1
subclass to map columns
tcapelle Oct 25, 2024
8f8abec
scorer summarization
tcapelle Oct 25, 2024
d89ebcc
add missing summarize
tcapelle Oct 25, 2024
afcac81
Merge remote-tracking branch 'origin/master' into add_more_scorers
tcapelle Oct 25, 2024
f9911c3
remove unused kwargs
tcapelle Oct 28, 2024
5f7388d
set default parameters for temp and max_tokens
tcapelle Oct 28, 2024
cc12cf1
improve docstrings
tcapelle Oct 28, 2024
f65928e
add default model_id
tcapelle Oct 28, 2024
cde2681
typos and link to instructor
tcapelle Oct 28, 2024
fd8c8d5
add default model
tcapelle Oct 28, 2024
c47be95
fixed code snippets
tcapelle Oct 28, 2024
4a80d5e
typos
tcapelle Oct 28, 2024
8f1acce
update docs
tcapelle Oct 28, 2024
d66e9a8
add default embedding model
tcapelle Oct 28, 2024
b1932bc
remove unused col
tcapelle Oct 28, 2024
0220499
Merge remote-tracking branch 'origin/master' into add_more_scorers
tcapelle Oct 28, 2024
e212a03
lint
tcapelle Oct 28, 2024
390fd6c
remove ignore type
tcapelle Oct 28, 2024
394e39e
remove ignores
tcapelle Oct 28, 2024
d67611c
type
tcapelle Oct 28, 2024
3dd3dce
remove ignore type
tcapelle Oct 28, 2024
30a11b3
remove ignore type
tcapelle Oct 28, 2024
77f7863
make copy-pastable
tcapelle Oct 28, 2024
c23c95f
remove missing ignores
tcapelle Oct 28, 2024
2ac6f53
missing imports
tcapelle Oct 28, 2024
4 changes: 4 additions & 0 deletions .github/workflows/test.yaml
@@ -240,6 +240,7 @@ jobs:
'mistral1',
'notdiamond',
'openai',
'scorers_tests',
'pandas-test',
]
fail-fast: false
@@ -292,6 +293,9 @@ jobs:
WF_CLICKHOUSE_HOST: weave_clickhouse
WEAVE_SERVER_DISABLE_ECOSYSTEM: 1
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
nox -e "tests-${{ matrix.python-version-major }}.${{ matrix.python-version-minor }}(shard='${{ matrix.nox-shard }}')"
trace-tests-matrix-check: # This job does nothing and is only used for the branch protection
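
To reproduce the new `scorers_tests` shard outside CI, the sketch below shows how the `run` step's nox session name expands from the matrix values, and which of the API keys listed in the `env` block are missing locally. The helper names `nox_session` and `missing_keys` are hypothetical and are not part of the weave repository. The key list is copied from the hunk above.

```python
import os

# API keys exposed to the test job in the workflow's env block (from the diff above).
REQUIRED_KEYS = [
    "GOOGLE_API_KEY",
    "ANTHROPIC_API_KEY",
    "MISTRAL_API_KEY",
    "OPENAI_API_KEY",
]


def nox_session(major: str, minor: str, shard: str) -> str:
    """Build the session name the workflow passes to `nox -e`.

    Mirrors the template in the `run` step:
    tests-<major>.<minor>(shard='<shard>')
    """
    return f"tests-{major}.{minor}(shard='{shard}')"


def missing_keys(env=None) -> list:
    """Return the required API keys that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    # Hypothetical local invocation for Python 3.12 and the new shard.
    print("nox -e", repr(nox_session("3", "12", "scorers_tests")))
    print("missing keys:", missing_keys())
```

The session string has to be quoted on the command line because it contains parentheses and single quotes, which is why the workflow wraps it in double quotes.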