When training a unified model, TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'is_causal' #41
Comments
If you are certain you are using https://github.com/ContextualAI/gritlm/blob/main/scripts/modeling_mistral_gritlm.py or https://huggingface.co/GritLM/GritLM-7B/blob/main/modeling_gritlm7b.py, then I am not sure what the problem is. Maybe try …
Yes, maybe there are some small problems. I tried to print the type of the model in training/model.py:
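(reconstructed below as a minimal sketch; the variable name `model` is an assumption)

```python
# Check which class the loaded model actually is.
print(type(model))
```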
and it shows: <class 'peft.peft_model.PeftModel'>. Maybe LoRA influences the model type? I am not clear about it. Do you train the model in a full fine-tuning manner?
When I remove --lora, it shows CUDA out of memory. Maybe it really is due to LoRA. Maybe I could use more GPUs, but why does LoRA influence the model type?
I see, yes, it could be because of LoRA. I think the PEFT library wraps the transformer model, and this could change the kwargs that are passed through. You may need to change something in https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/model.py to pass it through. We do full fine-tuning; I haven't really tried LoRA with GRIT.
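For illustration, a minimal sketch of how the PEFT wrapper hides the patched class (the LoraConfig values and the workaround at the end are assumptions, not something we have tested with GRIT):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the patched model; auto_map in config.json resolves this to
# modeling_gritlm7b.MistralForCausalLM when trust_remote_code=True.
base_model = AutoModelForCausalLM.from_pretrained(
    "../models/Mistral-7B", trust_remote_code=True
)

# Hypothetical LoRA settings, for illustration only.
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(base_model, lora_config)

print(type(model))                   # a peft.peft_model.* wrapper class
print(type(model.get_base_model()))  # the patched MistralForCausalLM

# Possible workaround (untested with GRIT): call the unwrapped model so
# extra kwargs such as is_causal reach the patched forward() directly;
# the LoRA layers are injected in place, so they remain active.
# outputs = model.get_base_model()(input_ids=input_ids, is_causal=False)
```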
I see. Thank you for your reply!
Hello!
I ran into a problem when training the model in unified mode.
First, I would like to share that when I evaluate several models from the artifacts (for example bbcc-mean, cccc-lasttoken, and cccc-wmean), the same error appears: TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'is_causal'.
To tackle the problem, my understanding is that the is_causal argument is only meaningful when the model is loaded with the MistralForCausalLM class from modeling_gritlm7b.py. Otherwise, if I do not put modeling_gritlm7b.py in the model directory, the model is loaded as the MistralForCausalLM from the transformers library, whose forward() does not accept is_causal. Besides, I think the model config file should also be modified by adding:
"auto_map": {
"AutoModel": "modeling_gritlm7b.MistralModel",
"AutoModelForCausalLM": "modeling_gritlm7b.MistralForCausalLM",
"AutoModelForSequenceClassification": "modeling_gritlm7b.MistralForSequenceClassification"
},
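One thing worth noting (a minimal sketch, assuming the modified config.json and modeling_gritlm7b.py sit in ../models/Mistral-7B): the auto_map entry only takes effect when the checkpoint is loaded with trust_remote_code=True; otherwise transformers falls back to its built-in MistralForCausalLM, whose forward() has no is_causal parameter.

```python
from transformers import AutoModelForCausalLM

# auto_map is only honored when custom code is allowed.
model = AutoModelForCausalLM.from_pretrained(
    "../models/Mistral-7B",
    trust_remote_code=True,
)
print(type(model))  # should be ...modeling_gritlm7b.MistralForCausalLM
```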
I fixed this for evaluation by taking the steps above, and it works. However, I hit the same error when I train the model. I downloaded Mistral-7B, added modeling_gritlm7b.py, and modified the config file, but it still shows TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'is_causal'.
I guessed that maybe the model is not loaded correctly, so I printed the type of the model in run.py after loading it:
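(again a minimal sketch; the variable name `model` is an assumption)

```python
# Same one-line check as above, in training/run.py after loading.
print(type(model))
```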
The result is <class 'transformers_modules.Mistral-7B.modeling_gritlm7b.MistralForCausalLM'>, which is correct. So what is the problem? How can I modify the code to make it work?
The training command:
```bash
torchrun --nproc_per_node 1 \
    -m training.run \
    --output_dir output_dir \
    --model_name_or_path ../models/Mistral-7B \
    --train_data ../data/unified_data \
    --learning_rate 1e-5 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 5 \
    --per_device_generative_bs 1 \
    --dataloader_drop_last True \
    --normalized True \
    --temperature 0.02 \
    --query_max_len 32 \
    --passage_max_len 128 \
    --train_group_size 2 \
    --mode unified \
    --max_steps 1253 \
    --attn cccc \
    --overwrite_output_dir \
    --lora
```
Waiting for your kind reply! :)