
When training a unified model, TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'is_causal' #41

Open
zillion-zhao opened this issue Jun 14, 2024 · 5 comments

@zillion-zhao

Hello!

I ran into a problem when training the model in unified mode.

First, I would like to mention that when I evaluate several of the models from the artifacts (for example bbcc-mean, cccc-lasttoken, and cccc-wmean), I also get: TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'is_causal'.

My understanding is that the is_causal argument is only meaningful when the model is loaded with the MistralForCausalLM class from modeling_gritlm7b.py. If I do not put modeling_gritlm7b.py in the model directory, the model is loaded as the MistralForCausalLM from the transformers library, whose forward() does not accept is_causal. In addition, I think the model config file should also be modified by adding:
"auto_map": {
"AutoModel": "modeling_gritlm7b.MistralModel",
"AutoModelForCausalLM": "modeling_gritlm7b.MistralForCausalLM",
"AutoModelForSequenceClassification": "modeling_gritlm7b.MistralForSequenceClassification"
},
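
As a sanity check, something like the following sketch (the local path is an assumption; adjust it to your setup) confirms the auto_map entry is picked up when the model is loaded with trust_remote_code=True:

# Hedged sketch: check that the auto_map entry in config.json routes loading
# to the custom MistralForCausalLM in modeling_gritlm7b.py.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "../models/Mistral-7B",   # assumed local directory with config.json + modeling_gritlm7b.py
    trust_remote_code=True,   # required so the auto_map classes are used
)
print(type(model))  # expect transformers_modules...modeling_gritlm7b.MistralForCausalLM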

Doing the above fixes the issue for evaluation. However, I run into the same error when training the model: I downloaded Mistral-7B, added modeling_gritlm7b.py, and modified the config file, but training still fails with TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'is_causal'.

I guessed that the model might not be loaded correctly, so I printed the type of the model in run.py after loading it:

model = GritLMTrainModel(
    model_name_or_path=model_args.model_name_or_path,
    normalized=model_args.normalized,
    pooling_method=model_args.pooling_method,
    negatives_cross_device=training_args.negatives_cross_device,
    temperature=training_args.temperature,
    mode=training_args.mode,
    projection=model_args.projection,
    attn=model_args.attn,
    attn_implementation=model_args.attn_implementation,
    torch_dtype=args_to_dtype(training_args),
    loss_gen_type=training_args.loss_gen_type,
    loss_gen_factor=training_args.loss_gen_factor,
    use_cache=False,
    # Critical to make Mixtral work
    low_cpu_mem_usage=True,
    quantization_config=quantization_config,
    load_in_4bit=load_in_4bit,
)
print(type(model.model))

The result is <class 'transformers_modules.Mistral-7B.modeling_gritlm7b.MistralForCausalLM'>, which is correct. So what is the problem? What do I need to change to make it work?

The training command:

torchrun --nproc_per_node 1 \
    -m training.run \
    --output_dir output_dir \
    --model_name_or_path ../models/Mistral-7B \
    --train_data ../data/unified_data \
    --learning_rate 1e-5 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 5 \
    --per_device_generative_bs 1 \
    --dataloader_drop_last True \
    --normalized True \
    --temperature 0.02 \
    --query_max_len 32 \
    --passage_max_len 128 \
    --train_group_size 2 \
    --mode unified \
    --max_steps 1253 \
    --attn cccc \
    --overwrite_output_dir \
    --lora

Looking forward to your reply! :)

@Muennighoff
Collaborator

If you are certain you are using https://github.com/ContextualAI/gritlm/blob/main/scripts/modeling_mistral_gritlm.py or https://huggingface.co/GritLM/GritLM-7B/blob/main/modeling_gritlm7b.py, then I am not sure what the problem is. Maybe run pip show transformers to locate your installation and replace its modeling_mistral.py with one of those files (see the sketch below). Otherwise this seems like a simple issue that can be solved by debugging with print statements.
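
For example, one way to do the replacement (a sketch only; the source path scripts/modeling_mistral_gritlm.py is an assumption about where your local checkout lives):

# Hedged sketch: find where the installed transformers package keeps
# modeling_mistral.py and overwrite it with the GritLM variant.
import shutil
import transformers.models.mistral.modeling_mistral as modeling_mistral

print(modeling_mistral.__file__)  # installed location inside site-packages
shutil.copy("scripts/modeling_mistral_gritlm.py", modeling_mistral.__file__)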

@zillion-zhao
Author

Yes, maybe there is some small issue. I printed the type of the model in training/model.py:

def encode(self, features):
    print(type(self.model))

and it shows: <class 'peft.peft_model.PeftModel'>

Maybe LoRA changes the model type? I am not sure about that. Do you train the model with full fine-tuning?
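
For reference, a quick look under the wrapper (my own sketch; print_wrapped_type is just a hypothetical helper for illustration) shows the custom class is still there:

# Hedged sketch: peft.PeftModel wraps the original model; get_base_model()
# returns the underlying custom MistralForCausalLM whose forward() accepts is_causal.
from peft import PeftModel

def print_wrapped_type(model):  # hypothetical helper, not from the repo
    print(type(model))  # <class 'peft.peft_model.PeftModel'>
    if isinstance(model, PeftModel):
        print(type(model.get_base_model()))  # the wrapped custom class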

@zillion-zhao
Author

When I remove --lora, I get CUDA out of memory ^ ^. So it really does seem to be related to LoRA. Maybe I could just use more GPUs, but why does LoRA change the model type?

@Muennighoff
Collaborator

I see, yes it could be because of LoRA. The PEFT library wraps the transformer model, and that can change which kwargs get passed through. You may need to change something in https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/model.py to pass it through, or bypass the wrapper as in the sketch below.
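
One possible shape of such a workaround (untested with GRIT; forward_with_is_causal is a hypothetical helper, not code from either repo):

# Hedged sketch: instead of patching PEFT, call the underlying model directly
# when an extra kwarg like is_causal has to reach the custom forward().
# With LoRA, the adapter modules are injected into the base model's layers,
# so calling the base model still exercises them.
from peft import PeftModel

def forward_with_is_causal(model, is_causal, **inputs):
    base = model.get_base_model() if isinstance(model, PeftModel) else model
    return base(**inputs, is_causal=is_causal)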

We do full fine-tuning; I haven't really tried Lora with GRIT.

@zillion-zhao
Author

I see. Thank you for your reply!
