
load_best_model_at_end is inconsistent with evaluation (and save) logic at end of training #28539

Closed · antoine-lizee opened this issue Jan 16, 2024 · 7 comments · Fixed by #30160
System Info

  • transformers version: 4.36.2
  • Platform: Linux-5.10.201-191.748.amzn2.x86_64-x86_64-with-glibc2.26
  • Python version: 3.10.13
  • Huggingface_hub version: 0.20.2
  • Safetensors version: 0.3.3
  • Accelerate version: 0.26.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

@muellerzr @pacman100 @sgugger

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Shortened script below:


from transformers import (
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
    XLMRobertaForTokenClassification,
)

# label_list, tokenizer, train_ds, test_ds, compute_metrics, output_dir_name,
# run_dir and epochs are all defined earlier in the full (unshortened) script.

model_checkpoint = "xlm-roberta-large"
model_name = model_checkpoint.split("/")[-1]
model = XLMRobertaForTokenClassification.from_pretrained(model_checkpoint, num_labels=len(label_list))

batch_size = 32
learning_rate = 2e-5
eval_steps = 0.1  # a float in (0, 1) is interpreted as a ratio of the total training steps

# The data + batch size lead to 11277 training steps in total

training_args = TrainingArguments(
    output_dir_name,
    logging_dir=run_dir,
    logging_strategy="steps",
    logging_steps=eval_steps / 5,
    evaluation_strategy="steps",
    eval_steps=eval_steps,
    save_strategy="steps",
    save_steps=eval_steps,
    learning_rate=learning_rate,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=epochs,
    weight_decay=0.01,
    push_to_hub=False,
    save_total_limit=4,
    load_best_model_at_end=True
)

data_collator = DataCollatorForTokenClassification(tokenizer)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

# Train the model
trainer.train()

Expected behavior

I would expect my model to be evaluated (and saved!) at the last step.

It is not; in most example scripts we see trainer.evaluate() called after trainer.train().

As a result, when we set load_best_model_at_end=True we effectively discard any training that happened after the last checkpoint, which seems wrong. In my case, the last 10% of training is discarded.

My understanding of what's happening:

  • In the trainer callback, we check (here) whether global_step is a multiple of eval_steps. If the total number of steps is not a multiple of it, this condition is not met at the last step (see the sketch after this list).
  • If we load_best_model_at_end, the last accessible evaluation does not include the performance of the latest stages of training.
  • As a side note, running trainer.evaluate() by hand after training only re-evaluates the past checkpoint that was selected as the best.
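
For reference, the step-based trigger lives in DefaultFlowCallback and boils down to a modulo check roughly like the following (a paraphrase, not the exact transformers source):

# Rough paraphrase of the checks in DefaultFlowCallback.on_step_end (transformers 4.36):
if args.evaluation_strategy == "steps" and state.global_step % state.eval_steps == 0:
    control.should_evaluate = True
if args.save_strategy == "steps" and state.global_step % state.save_steps == 0:
    control.should_save = True

With 11277 total steps and an eval/save interval of roughly 10% of that, the final step is not a multiple of the interval, so neither flag is set and the last stretch of training is never evaluated or checkpointed.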
antoine-lizee (Author) commented Jan 16, 2024

Notes

I realize that this issue probably doesn't arise if the strategy is epoch.

It seems that using N + epsilon as num_train_epochs would work around this problem in a very hacky way (and evaluate / save the model corresponding to the first step after the desired epoch count that is a multiple of eval_steps). Would that be your recommendation?

edit: OK, digging a bit more, it seems the proper way to fix this would be to add a callback to the trainer that enforces saving at the end of training; a sketch of such a callback is below.
I will do this, but the default behaviour is still "wrong", I believe (and would warrant at least a clear disclaimer in the docs?).
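
A minimal sketch of such a callback (my own workaround, not something shipped with transformers), forcing an evaluation and a checkpoint on the final step so that load_best_model_at_end also considers the fully trained model:

from transformers import TrainerCallback

class EvalAndSaveAtEndCallback(TrainerCallback):
    # Force one last evaluation and checkpoint on the final training step,
    # so the best-model selection also sees the fully trained weights.
    def on_step_end(self, args, state, control, **kwargs):
        if state.global_step >= state.max_steps:
            control.should_evaluate = True
            control.should_save = True
        return control

# usage:
# trainer.add_callback(EvalAndSaveAtEndCallback())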

amyeroberts (Collaborator)

Gentle ping @muellerzr @pacman100

amyeroberts (Collaborator)

Another ping @pacman100 @muellerzr

pacman100 (Contributor)

Hello, Zach will be looking into this.

muellerzr (Contributor)

Done, #30160 will address this by making it the default to always save the model at the end of training.

ymoslem (Contributor) commented Apr 10, 2024

Hello! I have a relevant question, please. If both load_best_model_at_end and push_to_hub are True, is it the best model or the last model that gets uploaded, and how can I verify which one? I am asking because when the model card is updated, it shows the "results on the evaluation set" of the last model, not the best model. Thanks for clarifying!

UPDATE
Answering my own question: I downloaded the model from the Hub and compared the checksum of its model file with that of the local one (sha256sum model.safetensors). They match, so it appears the best model is indeed the one uploaded; this is just not reflected when the evaluation on the model card is updated automatically. I can manually edit the card.
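
For reference, a quick way to do that comparison in Python (the repo id and local path below are placeholders for my own):

import hashlib
from huggingface_hub import hf_hub_download

def sha256(path):
    # Hash the file in chunks to avoid loading large model files into memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

hub_file = hf_hub_download("my-username/my-model", "model.safetensors")  # placeholder repo id
print(sha256(hub_file) == sha256("output_dir/model.safetensors"))  # placeholder local path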

github-actions bot commented May 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
