Commit
finish (#1022)
Co-authored-by: Max Marion <[email protected]>
bmosaicml and maxisawesome authored Mar 11, 2024
1 parent d61c53d commit 4e43792
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions scripts/eval/local_data/EVAL_GAUNTLET.md
@@ -1,4 +1,4 @@
-# Mosaic Eval Gauntlet v0.1.0 - Evaluation Suite
+# Mosaic Eval Gauntlet v0.3.0 - Evaluation Suite


<!-- SETUPTOOLS_LONG_DESCRIPTION_HIDE_BEGIN -->
@@ -24,7 +24,7 @@ At evaluation time, we run all the benchmarks, average the subscores within each

For example, if benchmark A has a random baseline accuracy of 25%, and the model achieved 30%, we would report this as (0.3 - 0.25)/(1-0.25) = 0.0667. This can be thought of as the accuracy above chance rescaled so the max is 1. For benchmarks in which the random guessing baseline accuracy is ~0 we report the accuracy as is. Note that with this rescaling, a model could technically score below 0 on a category as a whole, but we haven’t found this to occur with any of the models we’ve tested.

-This is version v0.1.0 of the Eval Gauntlet.
+This is version v0.3.0 of the Eval Gauntlet.

### Reading Comprehension

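The rescaling described in the diff's context paragraph can be sketched in Python. This is a minimal illustration of the formula given in the text; the function name `rescaled_accuracy` is hypothetical and not taken from the repository:

```python
def rescaled_accuracy(accuracy: float, baseline: float) -> float:
    """Rescale raw accuracy so random-chance performance maps to 0
    and perfect performance maps to 1, per the Eval Gauntlet text."""
    return (accuracy - baseline) / (1 - baseline)


# Example from the text: a model scores 30% on a benchmark
# whose random-guessing baseline is 25%.
print(round(rescaled_accuracy(0.30, 0.25), 4))  # 0.0667
```

Note that, as the text observes, a model scoring below the random baseline yields a negative value under this rescaling.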
