One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)
Reproduction
I have a LoRA finetuning pipeline for mistralai/Mistral-7B-v0.1, where I have been using accelerate==0.25.0
and deepspeed==0.13.2 to train the model.
I tried pip install --upgrade accelerate to 0.26.0 or 0.27.2, but noticed that the accuracy drops by ~4.5% when doing so. It is hard to tell from the release notes of the latest versions which change might be causing this behaviour.
My script is based on this bash script from the simlm repo, which calls this Python script.
The model accuracy drops by ~4.5% if I pip install --upgrade from accelerate==0.25.0 to 0.26.0 or 0.27.2
Here is additional info on my environment and config files.
The issue happens both when I launch with the accelerate and with the deepspeed commands; my Accelerate config (default_config.yaml) and DeepSpeed config (ds_config.json) are attached.
This should not happen, thanks for reporting this issue.
> but noticed that the accuracy drops by ~4.5% when doing so
Is this train or validation/test accuracy? Are those absolute or relative percentage points?
Generally, this type of finding is very hard to debug without being able to run the code. If it is possible for you, could you check whether the same issue occurs without DeepSpeed? Could you also boil the problem down to something that runs quickly, so that we can pinpoint the source of the issue with git bisect? Absent that, it is going to be hard to identify what exactly causes the drop.
There is one thing that comes to mind: in accelerate 0.25, we had enabled the random sampler to be seedable for reproducibility (#2057), but users reported issues, so from 0.26 we went back to the previous behavior (#2319). Maybe this change had the opposite effect for you? If so, you could try passing use_seedable_sampler=True to Accelerator and check whether that fixes things.
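For intuition on what that flag changes: a seedable sampler derives the shuffle order from a fixed seed (combined with the epoch number), so the data order is identical across runs, whereas the default sampler draws from the global RNG state. A minimal stdlib sketch of the idea — the function name and seeding scheme here are illustrative, not accelerate's actual implementation:

```python
import random

# With accelerate, the suggestion above would be (not run here):
#   accelerator = Accelerator(use_seedable_sampler=True)
#
# Conceptual sketch: reseed with a fixed base seed plus the epoch, so the
# shuffled order is reproducible across runs but still varies per epoch.
def seedable_epoch_order(num_samples, seed, epoch):
    rng = random.Random(seed + epoch)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order

# Two runs with the same seed and epoch produce the same order:
assert seedable_epoch_order(50, seed=42, epoch=0) == seedable_epoch_order(50, seed=42, epoch=0)
# Different epochs still reshuffle differently:
assert seedable_epoch_order(50, seed=42, epoch=0) != seedable_epoch_order(50, seed=42, epoch=1)
```

Without such seeding, the batch order (and hence gradient trajectory) can differ between accelerate versions even with identical code, which is one plausible source of small accuracy deltas.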
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Expected behavior
The model accuracy should not drop significantly when upgrading the accelerate version.