Fixing device mismatch for InternVL2_5-78B rotary embeddings #35312
Fixing problem with Multi-GPU management of InternVL2_5-78B (https://huggingface.co/OpenGVLab/InternVL2_5-78B)
Hi @MorenoLaQuatra, thanks for the fix! Please run make fix-copies to update the code for other models.
Here is some context:
python utils/check_copies.py --fix_and_overwrite
Detected changes, rewriting src/transformers/models/qwen2/modeling_qwen2.py.
python utils/check_modular_conversion.py --fix_and_overwrite
No differences found for src/transformers/models/gemma/configuration_gemma.py.
No differences found for src/transformers/models/gemma/tokenization_gemma.py.
No differences found for src/transformers/models/gemma/modeling_gemma.py.
I'm not sure the same problem happens when splitting Llama 3.2 or other kinds of models across multiple GPUs, and I want to avoid breaking other models. Could someone check?
You are right, so you have to make the changes in the original model that the others are copied from. I suppose it should be safe for other models as well. We can also run the slow tests in CI to make sure nothing is broken. To do this, please push an empty commit with
Hey! Would you mind providing a reproducer? The exact same function is used for Llama, and we have not had issues there! 🤗
It seems like an issue does exist; a reproducing example would be great.
I reproduced the issue with "minimal" code:
import torch
import torchvision.transforms as T
from PIL import Image
import requests
from io import BytesIO
from transformers import AutoModel, AutoTokenizer
from torchvision.transforms.functional import InterpolationMode
import math
from typing import Dict, List, Tuple


class ImageProcessor:
    def __init__(self):
        self.input_size = 448
        self.max_num_patches = 12
        self.min_num_patches = 1
        self.use_thumbnail = True
        self.transform = self._build_transform()

    def _build_transform(self):
        return T.Compose([
            T.Lambda(lambda img: img.convert('RGB') if img.mode != 'RGB' else img),
            T.Resize((self.input_size, self.input_size), interpolation=InterpolationMode.BICUBIC),
            T.ToTensor(),
            T.Normalize(
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]
            )
        ])

    def _find_closest_aspect_ratio(
        self,
        aspect_ratio: float,
        width: int,
        height: int,
        image_size: int
    ) -> Tuple[int, int]:
        target_ratios = set(
            (i, j)
            for n in range(self.min_num_patches, self.max_num_patches + 1)
            for i in range(1, n + 1)
            for j in range(1, n + 1)
            if i * j <= self.max_num_patches and i * j >= self.min_num_patches
        )
        target_ratios = sorted(target_ratios, key=lambda x: x[0] * x[1])
        best_ratio_diff = float('inf')
        best_ratio = (1, 1)
        area = width * height
        for ratio in target_ratios:
            target_aspect_ratio = ratio[0] / ratio[1]
            ratio_diff = abs(aspect_ratio - target_aspect_ratio)
            if ratio_diff < best_ratio_diff:
                best_ratio_diff = ratio_diff
                best_ratio = ratio
            elif ratio_diff == best_ratio_diff:
                if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
                    best_ratio = ratio
        return best_ratio

    def _dynamic_preprocess(self, image: Image.Image) -> List[Image.Image]:
        orig_width, orig_height = image.size
        aspect_ratio = orig_width / orig_height
        image_size = self.input_size
        target_aspect_ratio = self._find_closest_aspect_ratio(
            aspect_ratio, orig_width, orig_height, image_size
        )
        target_width = image_size * target_aspect_ratio[0]
        target_height = image_size * target_aspect_ratio[1]
        blocks = target_aspect_ratio[0] * target_aspect_ratio[1]
        resized_img = image.resize((target_width, target_height))
        processed_images = []
        for i in range(blocks):
            box = (
                (i % (target_width // image_size)) * image_size,
                (i // (target_width // image_size)) * image_size,
                ((i % (target_width // image_size)) + 1) * image_size,
                ((i // (target_width // image_size)) + 1) * image_size
            )
            split_img = resized_img.crop(box)
            processed_images.append(split_img)
        if self.use_thumbnail and len(processed_images) != 1:
            thumbnail_img = image.resize((image_size, image_size))
            processed_images.append(thumbnail_img)
        return processed_images

    def process_image(self, image_url: str) -> torch.Tensor:
        response = requests.get(image_url)
        image = Image.open(BytesIO(response.content)).convert('RGB')
        processed_images = self._dynamic_preprocess(image)
        pixel_values = [self.transform(img) for img in processed_images]
        return torch.stack(pixel_values).to(torch.bfloat16).cuda()


class ModelManager:
    def __init__(self):
        self.model_name = "OpenGVLab/InternVL2_5-78B"
        self.model = None
        self.tokenizer = None
        self.generation_config = {
            'max_new_tokens': 1024,
            'do_sample': True
        }

    def _split_model(self) -> Dict[str, int]:
        device_map = {}
        world_size = torch.cuda.device_count()
        print(f"Found {world_size} GPUs")
        num_layers = 80  # InternVL2_5-78B has 80 layers
        num_layers_per_gpu = math.ceil(num_layers / (world_size - 0.5))
        num_layers_per_gpu = [num_layers_per_gpu] * world_size
        num_layers_per_gpu[0] = math.ceil(num_layers_per_gpu[0] * 0.5)
        layer_cnt = 0
        for i, num_layer in enumerate(num_layers_per_gpu):
            for _ in range(num_layer):
                device_map[f'language_model.model.layers.{layer_cnt}'] = i
                layer_cnt += 1
        base_components = {
            'vision_model': 0,
            'mlp1': 0,
            'language_model.model.tok_embeddings': 0,
            'language_model.model.embed_tokens': 0,
            'language_model.output': 0,
            'language_model.model.norm': 0,
            'language_model.lm_head': 0
        }
        device_map.update(base_components)
        device_map[f'language_model.model.layers.{num_layers - 1}'] = 0
        return device_map

    def initialize_model(self):
        device_map = self._split_model()
        self.model = AutoModel.from_pretrained(
            self.model_name,
            torch_dtype=torch.bfloat16,
            load_in_8bit=False,
            low_cpu_mem_usage=True,
            use_flash_attn=True,
            trust_remote_code=True,
            device_map=device_map
        ).eval()
        self.tokenizer = AutoTokenizer.from_pretrained(
            self.model_name,
            trust_remote_code=True,
            use_fast=False
        )
        self.model.generation_config.pad_token_id = self.tokenizer.eos_token_id

    def generate_caption(self, pixel_values: torch.Tensor) -> str:
        prompt = "Please describe this image."
        return self.model.chat(
            tokenizer=self.tokenizer,
            pixel_values=pixel_values,
            question=prompt,
            generation_config=self.generation_config
        )


def main():
    # Initialize components
    image_processor = ImageProcessor()
    model_manager = ModelManager()
    # Initialize model
    print("Initializing model...")
    model_manager.initialize_model()
    # Process image
    image_url = "https://images.unsplash.com/photo-1507146426996-ef05306b995a?fm=jpg&q=60&w=3000&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxzZWFyY2h8M3x8cHVwcHl8ZW58MHx8MHx8fDA%3D"
    print("Processing image...")
    pixel_values = image_processor.process_image(image_url)
    # Generate caption
    print("Generating caption...")
    caption = model_manager.generate_caption(pixel_values)
    print(f"\nGenerated caption: {caption}")


if __name__ == "__main__":
    main()
I ran this on a machine with 4x A100 80GB.
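The layer split in _split_model treats GPU 0 as half a GPU, because it also hosts the vision model and the other shared modules. A pure-Python sketch of just that arithmetic (the helper name layers_per_gpu is hypothetical, not part of the model card) makes the per-GPU layer counts easy to check without loading the model:

```python
import math
from typing import List


def layers_per_gpu(num_layers: int, world_size: int) -> List[int]:
    """Hypothetical helper mirroring the split arithmetic in _split_model."""
    # GPU 0 also holds the vision model, so it is counted as half a GPU.
    per_gpu = math.ceil(num_layers / (world_size - 0.5))
    counts = [per_gpu] * world_size
    # GPU 0 gets roughly half the layers of every other GPU.
    counts[0] = math.ceil(per_gpu * 0.5)
    return counts


if __name__ == "__main__":
    counts = layers_per_gpu(80, 4)
    print(counts, sum(counts))
```

For the 4-GPU setup above this gives 12 layers on GPU 0 and 23 on each of the others. Because of the two ceil calls the counts sum to 81, one more than the 80 layers, so the loop in _split_model also writes a key for a layer index that does not exist in the model.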
Hey! Please have a look at the comment I posted here about the same issue; it should fix it!
In the end, we decided internally not to fix this automatically. I did update the snippet on the Hub to avoid future issues for users, though. For more details, see #35583!
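A sketch of the kind of user-side workaround this points to: explicitly placing the language model's rotary embedding module on GPU 0 together with the other shared modules. The rotary module holds no weights, only a non-persistent inv_freq buffer, so a hand-written device map that never mentions it can presumably leave that buffer on the CPU. This is not a copy of the updated Hub snippet; it assumes the Qwen2-based language model exposes the module as language_model.model.rotary_emb, and pin_shared_modules is a hypothetical helper.

```python
from typing import Dict

# Modules that _split_model in the reproducer keeps on GPU 0, plus the
# rotary embedding module, which that device map does not list.
SHARED_MODULES = (
    "vision_model",
    "mlp1",
    "language_model.model.tok_embeddings",
    "language_model.model.embed_tokens",
    "language_model.output",
    "language_model.model.norm",
    "language_model.model.rotary_emb",  # assumed module path (Qwen2Model.rotary_emb)
    "language_model.lm_head",
)


def pin_shared_modules(device_map: Dict[str, int], device: int = 0) -> Dict[str, int]:
    """Hypothetical helper: return a copy of device_map with shared modules on `device`."""
    pinned = dict(device_map)  # do not mutate the caller's map
    for name in SHARED_MODULES:
        pinned[name] = device
    return pinned
```

The resulting dict would be passed as device_map to AutoModel.from_pretrained in place of the original one; the decoder layer assignments are left untouched.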
What does this PR do?
Fixes # (issue)
There is no specific open issue for this. I was running inference following the multi-GPU instructions in the official model card of InternVL2_5-78B here. I got a device mismatch error between GPU:0 and CPU, and I traced the error back to this line.
It may also happen with other models, perhaps the newer Llama 3.2 vision models, but I don't have access to those models in Europe (see the CircleCI "copies" error).
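The mismatch comes down to a buffer living on one device while the activations live on another. A minimal sketch of a device-safe rotary embedding, assuming the standard RoPE formulation with inv_freq = 1 / base^(2i/dim), moves the buffer to the input's device inside forward. It illustrates the idea only: it is not the code changed in this PR, and DeviceSafeRotaryEmbedding is a hypothetical name.

```python
import torch
from torch import nn


class DeviceSafeRotaryEmbedding(nn.Module):
    """Hypothetical minimal RoPE module; not the transformers implementation."""

    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        # Non-persistent buffer: not stored in the checkpoint, so a device map
        # that loads weights only may never move it off the CPU.
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    @torch.no_grad()
    def forward(self, x: torch.Tensor, position_ids: torch.Tensor):
        # Follow the activations' device instead of trusting the buffer's device.
        inv_freq = self.inv_freq.to(x.device)
        position_ids = position_ids.to(x.device)
        # (batch, dim/2, 1) @ (batch, 1, seq) -> (batch, dim/2, seq)
        inv_freq_expanded = inv_freq[None, :, None].expand(position_ids.shape[0], -1, 1)
        position_ids_expanded = position_ids[:, None, :].float()
        freqs = (inv_freq_expanded @ position_ids_expanded).transpose(1, 2)
        emb = torch.cat((freqs, freqs), dim=-1)  # (batch, seq, dim)
        return emb.cos().to(x.dtype), emb.sin().to(x.dtype)
```

With this pattern the cos/sin tables are always computed on x.device, at the cost of a small per-call transfer when the buffer really does sit on another device.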
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@amyeroberts, @qubvel, @ArthurZucker (since this is Text+Vision, I tagged all the related reviewers)