[WIP] in-loop task evaluator compatible from oe-eval-internal #72

lihaoxin2020 · 2024-10-24T18:35:49Z

This PR adds in-loop task evaluator callbacks, mostly borrowed from the OLMo repo and compatible with oe-eval-internal.

I implemented this for an ongoing project and think it would be useful for the community, so I am publishing this draft to show the framework. If the team thinks the feature is worth adding, I would be glad to help refine it.

To use the evaluator, add the callback just like LMEvaluator:

downstream_evaluators = [
    "pubmedqa_mc",
    "scifact_rc",
]

trainer_config = (
    TrainerConfig(
        ...
    ).with_callback(
        "evaluator",
        LMEvaluatorCallbackConfig(
            eval_dataset=NumpyDatasetConfig(
                paths=["/net/nfs/allennlp/llm-data/c4/en/c4-validation.00000-00008.npy"],
                metadata=[{"label": "c4-validation"}],
                name=NumpyDatasetType.padded_fsl,
                sequence_length=1024,
                tokenizer=tokenizer_config,
                work_dir="/tmp/dataset-cache",
            ),
            eval_interval=250,
            eval_duration=Duration.steps(10),
        ),
    ).with_callback(
        "downstream",
        DownstreamEvaluatorCallbackConfig(
            labels=downstream_evaluators,
            eval_batch_size=4,
            tokenizer="meta-llama/Llama-2-7b-hf",  # tokenizer implementation from OLMo
            eval_interval=10,
            # eval_duration=Duration.steps(10),
        ),
    )
)
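
For readers unfamiliar with in-loop evaluation, here is a minimal, self-contained sketch of the idea behind `eval_interval`: a callback that scores each labelled task every N training steps. Every name here (`IntervalEvalCallback`, `post_step`, `run_training`, the `evaluate` function) is hypothetical and for illustration only. It is not the API of this PR or of the OLMo repo.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class IntervalEvalCallback:
    """Hypothetical stand-in for an in-loop downstream evaluator callback."""

    labels: List[str]                  # task names, e.g. "pubmedqa_mc"
    eval_interval: int                 # run the evals every this many steps
    evaluate: Callable[[str], float]   # assumed scoring function: label -> metric
    history: Dict[int, Dict[str, float]] = field(default_factory=dict)

    def post_step(self, step: int) -> None:
        # Skip every step that is not a multiple of the interval.
        if step % self.eval_interval != 0:
            return
        # Score every configured task and record the results for this step.
        self.history[step] = {label: self.evaluate(label) for label in self.labels}


def run_training(num_steps: int, callbacks: List[IntervalEvalCallback]) -> None:
    """Toy training loop: the real optimizer step is omitted."""
    for step in range(1, num_steps + 1):
        # ... a forward/backward/optimizer step would go here ...
        for callback in callbacks:
            callback.post_step(step)
```

With `eval_interval=10` and 25 training steps, this sketch evaluates at steps 10 and 20 only, which is why a small interval like the one in the config above is convenient for quick smoke tests but costly for long runs.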

Let me know if you think it's useful to have this in main.

Replaces #72. Gives us complete parity with the original downstream evals in the OLMo repo.

add before test

164153a

epwalsh mentioned this pull request Oct 29, 2024

Add a callback for downstream evals, update Docker builds #73

Merged

epwalsh added a commit that referenced this pull request Oct 30, 2024

Add a callback for downstream evals, update Docker builds (#73)

3fe59b6

