
Facing issues in training composed model #5

Open
RadiantCrystal opened this issue Nov 6, 2024 · 2 comments

Comments

@RadiantCrystal

Hi,

When I use the command `accelerate launch --config_file accelerate_config.yaml train.py --anchor_model_dir google/gemma-7b --aug_model_dir google/gemma-7b --num_heads 2 --num_connections 2 --learning_rate 3e-4 --batch_size 2 --output_dir './tmp'` to train the composed model, I get the error below. Could you please help me resolve this issue?
Also, even after trying an FSDP setup, training does not run on multiple GPUs.


[screenshot of the error attached]

@berserank (Collaborator) commented Nov 8, 2024

Can you share your accelerate_config.yaml file and the number of GPUs you were using to train the model?

@RadiantCrystal (Author)

Hi Aditya,

Thanks for your response.

I am using the same YAML file provided in the repo, and I am training on 4 GPUs. I managed to fix the issue by modifying the config parameters. That said, I would appreciate it if the authors could document in the repo which parameters are specifically responsible for GPU allocation; that way, one can run the code without much hassle.
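For anyone who hits the same problem: in an accelerate config, GPU allocation is governed mainly by `num_processes` (one per GPU), `num_machines`, `gpu_ids`, and `distributed_type`. Below is a minimal multi-GPU FSDP sketch; the values are illustrative rather than this repo's shipped config, and the exact `fsdp_config` key names vary across accelerate versions.

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: FSDP        # or MULTI_GPU for plain DDP
num_machines: 1
machine_rank: 0
num_processes: 4              # one process per GPU
gpu_ids: all
mixed_precision: bf16
main_training_function: main
use_cpu: false
fsdp_config:                  # key names differ between accelerate versions
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_offload_params: false
  fsdp_state_dict_type: FULL_STATE_DICT
```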

Also, I am now facing a problem loading the saved model from the './tmp' folder after fine-tuning. Could you please provide explicit code for loading the model from that saved folder?

Generic model-loading code does not work and throws errors.
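To clarify what I am attempting, here is roughly the pattern I would expect to work: rebuild the composed architecture with the same constructor arguments used for training, then load the saved weights into it. Every name below (`ComposedModel`, the `model` module path, the checkpoint filename) is a hypothetical stand-in, since I don't know this repo's actual classes:

```python
import torch

# Hypothetical: replace ComposedModel and this import path with the repo's
# actual composed-model class -- these names are assumptions, not the real API.
from model import ComposedModel

# Rebuild the architecture with the same arguments used at training time.
model = ComposedModel(
    anchor_model_dir="google/gemma-7b",
    aug_model_dir="google/gemma-7b",
    num_heads=2,
    num_connections=2,
)

# Load the fine-tuned weights saved under --output_dir into the rebuilt
# architecture. The checkpoint filename is also an assumption.
state_dict = torch.load("./tmp/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```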
