fix jit model #566
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
thanks for the fix!
optimum/intel/ipex/modeling_base.py
Outdated
self.model_dtype = kwargs.get("model_dtype", self.dtype)
self._dtype = self.model_dtype
I think we should deprecate the model_dtype attribute (redundant with _dtype), adding a warning that it will be removed after v1.18.0:
@property
def model_dtype(self):
    # add a warning
    return self._dtype
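(For reference, a minimal sketch of what that deprecation property could look like; the stand-in class name, warning category, and exact wording here are assumptions, not the final implementation.)

import warnings

import torch


class IPEXModel:  # illustrative stand-in for the real wrapper class
    def __init__(self):
        self._dtype = torch.float32

    @property
    def model_dtype(self):
        # assumption: a FutureWarning pointing users at _dtype
        warnings.warn(
            "model_dtype is deprecated and will be removed after v1.18.0, use _dtype instead.",
            FutureWarning,
        )
        return self._dtype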
optimum/intel/ipex/modeling_base.py
Outdated
self._reorder_cache = self.model_cls._reorder_cache.__get__(self)
self.prepare_inputs_for_generation = self.model_cls.prepare_inputs_for_generation.__get__(self)
I think it makes sense to keep it, as _reorder_cache and prepare_inputs_for_generation can be different depending on the modeling.
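(Side note on the mechanism, with illustrative names only: functions are descriptors, so calling __get__(instance) on a function taken from the modeling class returns a method bound to the wrapper instance, which is what the lines above rely on.)

class _Wrapper:  # hypothetical stand-in, not the real IPEX class
    pass


def _reorder_cache(self, past_key_values, beam_idx):
    # placeholder body; in practice this comes from the modeling class
    return tuple(past_key_values)


wrapper = _Wrapper()
wrapper._reorder_cache = _reorder_cache.__get__(wrapper)  # now a bound method
print(wrapper._reorder_cache(("kv",), beam_idx=None))  # ('kv',)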
optimum/intel/ipex/modeling_base.py
Outdated
if hasattr(self.model_cls, "_convert_to_standard_cache"):
    self._convert_to_standard_cache = self.model_cls._convert_to_standard_cache
if hasattr(self.model_cls, "_convert_to_bloom_cache"):
    self._convert_to_bloom_cache = self.model_cls._convert_to_bloom_cache
if warmup:
    self._init_warmup()

def prepare_inputs_for_generation(self, input_ids, past_key_values=None, **kwargs):
I think we can remove it.
optimum/intel/ipex/modeling_base.py
Outdated
@@ -215,14 +216,6 @@ def to(self, device: Union[torch.device, str]):
    def can_generate(self):
        return isinstance(self, GenerationMixin)

    def _call_model(self, *args, **kwargs):
also might make sense to add a test to verify this doesn't break support for models traced with autocasting enabled cc @ofirzaf do you know if there is any tiny model on the hub we can use for this (https://huggingface.co/Intel/q8_tiny_starcoder_py/ ?)
You can use this one, yes
we actually should have a test for this to verify no modifications will break support, would you mind adding it @jiqing-feng in https://github.com/huggingface/optimum-intel/blob/v1.15.2/tests/ipex/test_modeling.py#L201 ? Also, given @ofirzaf's explanations above, removing _call_model will likely break support for q8_tiny_starcoder_py
What is this PR trying to fix? The autocasting was added here because autocasting with a traced model is not optional: either the model runs with autocasting or it runs without, it is an attribute of the model and not something the user can decide upon. Regarding …
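(A rough sketch of that point, not the actual export code: when a model is traced inside an autocast context, the casts are recorded in the graph, so the traced model always runs as if autocast were on.)

import torch


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.linear(x)


model = TinyModel().eval()
example = torch.randn(1, 8)

# Tracing under autocast bakes the bf16 casts into the TorchScript graph;
# the caller can no longer opt out of them at inference time.
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    traced = torch.jit.trace(model, example)
    traced = torch.jit.freeze(traced)

print(traced(example).dtype)  # expected to be torch.bfloat16 on recent PyTorch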
Hi @echarlaix @ofirzaf. What I did is based on the latest version of transformers, to try to enable the llama jit model.
Here is my script and traceback:

import torch
from transformers import AutoTokenizer, pipeline
from optimum.intel import IPEXModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"
model = IPEXModelForCausalLM.from_pretrained(model_id, export=True, torch_dtype=torch.bfloat16)
generation_kwargs = {"do_sample": False, "max_new_tokens": 32, "num_beams": 4}
tokenizer = AutoTokenizer.from_pretrained(model_id)
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(text_generator("I am happy today because ", **generation_kwargs))

traceback
@jiqing-feng I think the problem might be in the tracing process. the …
Hi @echarlaix @ofirzaf, thanks for your review.
Hope you understand, thx!
Hi @echarlaix @ofirzaf. Thanks for your fix by adding … I have some minor changes to support the low-precision (bf16) model; would you please help me review it? Thx!
thanks for iterating on it @jiqing-feng !
optimum/intel/ipex/modeling_base.py
Outdated
position_ids = position_ids[:, -1].unsqueeze(-1)
position_ids = position_ids[:, -input_ids.shape[-1] :]
why is this modification needed?
position_ids should always have the same size as input_ids; we cannot assume the length is 1 while pkv exists (for example, assisted decoding).
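(A tiny illustration of the shape mismatch, with made-up tensors: under assisted decoding several candidate tokens are verified in one forward pass, so input_ids can be longer than 1 even though past_key_values already exist.)

import torch

input_ids = torch.tensor([[101, 102, 103]])   # 3 new tokens verified in one step
position_ids = torch.arange(8).unsqueeze(0)   # positions for the whole sequence so far

old = position_ids[:, -1].unsqueeze(-1)        # shape (1, 1): assumes a single new token
new = position_ids[:, -input_ids.shape[-1]:]   # shape (1, 3): matches input_ids

print(old.shape, new.shape)  # torch.Size([1, 1]) torch.Size([1, 3])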
could you add a test that would fail without this fix / pass with it then?
Shouldn't this be taken care of by prepare_inputs_for_generation?
It's also an option, WDYT @echarlaix ?
it's already integrated in prepare_inputs_for_generation for each modeling (https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L1237), so it could make sense to remove it all here actually
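(For reference, a rough paraphrase of the position_ids handling in the linked llama prepare_inputs_for_generation; this is a sketch, not a verbatim copy of the transformers code.)

import torch


def prepare_position_ids(input_ids, attention_mask, position_ids=None, past_key_values=None):
    # Paraphrased from transformers' llama prepare_inputs_for_generation (see link above).
    if attention_mask is not None and position_ids is None:
        # create position_ids on the fly from the attention mask
        position_ids = attention_mask.long().cumsum(-1) - 1
        position_ids.masked_fill_(attention_mask == 0, 1)
        if past_key_values is not None:
            position_ids = position_ids[:, -input_ids.shape[1]:]
    return position_ids


print(prepare_position_ids(
    torch.tensor([[7, 8, 9]]),            # 3 new tokens
    torch.ones(1, 8, dtype=torch.long),   # full-sequence attention mask
    past_key_values=object(),             # stand-in for a non-empty cache
))  # tensor([[5, 6, 7]])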
Hi @echarlaix, referring to this comment, do you mind making it clear what kind of tests I should add? Thx!

with torch.autocast(device.type, data_type):
    model = IPEXModelForCausalLM.from_pretrained(model_id, export=True, torch_dtype=torch.bfloat16)
    model(**inputs)

Adding …
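(One possible shape for such a test, sketched under assumptions: the tiny model id and the assertion are placeholders, not the PR's final test.)

import torch
from transformers import AutoTokenizer
from optimum.intel import IPEXModelForCausalLM


def test_bf16_jit_model_forward():
    # placeholder model id; substitute whatever tiny model the test suite settles on
    model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    inputs = tokenizer("I am happy today because", return_tensors="pt")

    # export=True traces the model; torch_dtype exercises the low-precision path
    model = IPEXModelForCausalLM.from_pretrained(model_id, export=True, torch_dtype=torch.bfloat16)
    with torch.no_grad():
        outputs = model(**inputs)

    # smoke check only: the bf16 traced model should run end to end
    assert outputs.logits.shape[1] == inputs["input_ids"].shape[1]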
Hi @echarlaix, I have reverted "_call_model"; would you please help to review and merge it? Thx!
Hi @echarlaix. Do you mind having a look at this PR? I think it is ready to merge.
Hi @ofirzaf, do you mind confirming whether it could be merged? Thx!
LGTM
Hi @echarlaix. I have added the test for assisted decoding, which will input both …
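(A sketch of what such an assisted-decoding test could look like; the model ids and token counts are assumptions, not the PR's exact test.)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.intel import IPEXModelForCausalLM


def test_assisted_decoding():
    # placeholder ids: the main and assistant model choices are assumptions
    model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    inputs = tokenizer("I am happy today because", return_tensors="pt")

    model = IPEXModelForCausalLM.from_pretrained(model_id, export=True)
    assistant = AutoModelForCausalLM.from_pretrained(model_id)

    # assisted decoding verifies several candidate tokens per step, so the forward
    # receives input_ids longer than 1 together with past_key_values
    output = model.generate(**inputs, assistant_model=assistant, do_sample=False, max_new_tokens=4)
    assert output.shape[1] > inputs["input_ids"].shape[1]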
Hi @echarlaix @ofirzaf. You were right, we don't need to prepare position_ids in the forward because it is already handled in prepare_inputs_for_generation.
Hi @echarlaix. This PR fixes the jit model: we don't need autocast in the class as users can use it outside, and we shouldn't use it if the model dtype is bf16 or fp16.
Would you please help me to review it? Thx!