
Model accuracy drops when upgrading from accelerate==0.25.0 to 0.26.0 or 0.27.2 #2476

Closed
gabrielspmoreira opened this issue Feb 21, 2024 · 2 comments

@gabrielspmoreira
System Info

- `Accelerate` version: 0.27.2
- Platform: Linux-5.15.0-1032-oracle-x86_64-with-glibc2.29
- Python version: 3.8.10
- Numpy version: 1.22.2
- PyTorch version (GPU?): 2.1.0a0+fe05266 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 2015.68 GB
- GPU type: NVIDIA A100-SXM4-80GB
- `Accelerate` default config:
        Not found

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

I have a LoRA fine-tuning pipeline for mistralai/Mistral-7B-v0.1, where I have been using accelerate==0.25.0
and deepspeed==0.13.2 to train the model.
I tried to pip install --upgrade accelerate to 0.26.0 or 0.27.2, but noticed that the accuracy drops by ~4.5% when doing so. It is hard to tell from the release notes of the newer versions which change might be causing this behaviour.
My script is based on this bash script from the simlm repo, which calls this Python script.

Here is additional info on my environment and config files.

The issue happens whether I launch with the accelerate or the deepspeed command:

accelerate launch --config_file default_config_ranker.yaml ./src/train_model.py \
    --model_name_or_path mistralai/Mistral-7B-v0.1 \
    --use_accelerator True \
    ...
deepspeed ./src/train_model.py --deepspeed ds_config.json \
    --model_name_or_path mistralai/Mistral-7B-v0.1 \
    ...

Accelerate Config (default_config.yaml)

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_config_file: ./ds_config.json
  zero3_init_flag: true
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

DeepSpeed config (ds_config.json)

{
    "bf16": {
        "enabled": false
    },
    "_fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 12,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": 1000,
            "total_num_steps": "auto"
        }
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": true,
        "allgather_bucket_size": 2e8,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 2e8,
        "contiguous_gradients": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 5000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

Expected behavior

The model accuracy should not drop significantly when upgrading the accelerate version.

@BenjaminBossan
Member

This should not happen, thanks for reporting this issue.

but noticed that the accuracy drops by ~4.5% when doing so

Is this train or validation/test accuracy? Are those absolute or relative percentage points?

Generally, this type of finding is very hard to debug without being able to run the code. If it is possible for you, could you check if the same issue occurs without using DeepSpeed? Is it possible to boil down the problem to something that can be run quickly so that we can pinpoint the source of the issue with git bisect? Without this, it's going to be hard to identify what exactly causes the drop.
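
To make the comparison concrete, here is a minimal sketch of the kind of quickly-runnable, DeepSpeed-free reproduction meant above. Everything in it (the synthetic regression task, the tiny model, the seed) is illustrative and not taken from the original pipeline; the idea is just to run the same script under accelerate==0.25.0 and 0.26.0/0.27.2 and compare the printed losses.

# Hypothetical minimal comparison script, not the reporter's actual pipeline.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from accelerate.utils import set_seed

set_seed(42)  # fix all RNGs so runs are comparable across versions

# Tiny synthetic regression task standing in for the real LoRA fine-tuning job
x = torch.randn(1024, 16)
y = x @ torch.randn(16, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

model = torch.nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

accelerator = Accelerator()  # plain Accelerator, no DeepSpeed plugin
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for epoch in range(3):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(xb), yb)
        accelerator.backward(loss)
        optimizer.step()
    accelerator.print(f"epoch {epoch}: final batch loss {loss.item():.6f}")

If the losses diverge between versions for a script like this, running git bisect over the accelerate repository between v0.25.0 and v0.26.0 could help pinpoint the commit responsible.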

There is one thing that comes to mind from memory: in accelerate 0.25, we had enabled the random sampler to be seedable for reproducibility (#2057), but users reported issues, so from 0.26 we went back to the previous behavior (#2319). Maybe this change had the opposite effect for you? If this applies to you, you could try passing use_seedable_sampler=True to Accelerator and check if that fixes things.
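
As a concrete sketch of that last suggestion (assuming accelerate 0.26/0.27, where use_seedable_sampler is accepted directly by Accelerator; in newer releases this may have moved to DataLoaderConfiguration):

from accelerate import Accelerator
from accelerate.utils import set_seed

set_seed(42)  # fix RNG state so runs are reproducible

# Opt back in to the seedable random sampler behaviour that accelerate 0.25 used.
accelerator = Accelerator(use_seedable_sampler=True)

# Then prepare the model/optimizer/dataloaders exactly as before, e.g.:
# model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)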

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Apr 1, 2024.