
Facing issues in training composed model #5

Open
RadiantCrystal opened this issue Nov 6, 2024 · 2 comments

Comments

@RadiantCrystal

Hi,

When I use the command `accelerate launch --config_file accelerate_config.yaml train.py --anchor_model_dir google/gemma-7b --aug_model_dir google/gemma-7b --num_heads 2 --num_connections 2 --learning_rate 3e-4 --batch_size 2 --output_dir './tmp'` to train the composed model, I get the error below. Could you please help me resolve this issue?
Also, even after trying an FSDP setup, training does not run on multiple GPUs.


[screenshot of the error attached]

@berserank (Collaborator) commented Nov 8, 2024

Can you share your accelerate_config.yaml file and the number of GPUs you were using to train the model?

@RadiantCrystal (Author)

Hi Aditya,

Thanks for your response.

I am using the same YAML file provided in the repo, and I am training on 4 GPUs. I managed to fix the issue by modifying the config parameters. That said, I would appreciate it if the authors could document in the repo which parameters are specifically responsible for GPU allocation; that way, one can run the code without much hassle.
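For anyone who hits the same problem: in an accelerate config, GPU allocation is governed mainly by `num_processes` (one per GPU), `num_machines`, `gpu_ids`, and `distributed_type`. Below is a minimal multi-GPU FSDP sketch; the values are illustrative rather than this repo's shipped config, and the exact `fsdp_config` key names vary across accelerate versions.

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: FSDP        # or MULTI_GPU for plain DDP
num_machines: 1
machine_rank: 0
num_processes: 4              # one process per GPU
gpu_ids: all
mixed_precision: bf16
main_training_function: main
use_cpu: false
fsdp_config:                  # key names differ between accelerate versions
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_offload_params: false
  fsdp_state_dict_type: FULL_STATE_DICT
```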

Also, I am now facing a problem loading the saved model from the './tmp' folder after fine-tuning. Could you please provide explicit code for loading the model from that saved folder?

Generic model-loading code does not work and throws errors.
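To clarify what I am attempting, here is roughly the pattern I would expect to work: rebuild the composed architecture with the same constructor arguments used for training, then load the saved weights into it. Every name below (`ComposedModel`, the `model` module path, the checkpoint filename) is a hypothetical stand-in, since I don't know this repo's actual classes:

```python
import torch

# Hypothetical: replace ComposedModel and this import path with the repo's
# actual composed-model class -- these names are assumptions, not the real API.
from model import ComposedModel

# Rebuild the architecture with the same arguments used at training time.
model = ComposedModel(
    anchor_model_dir="google/gemma-7b",
    aug_model_dir="google/gemma-7b",
    num_heads=2,
    num_connections=2,
)

# Load the fine-tuned weights saved under --output_dir into the rebuilt
# architecture. The checkpoint filename is also an assumption.
state_dict = torch.load("./tmp/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```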
