Prompt Image Alignment Experiment #6

Open
yigu1008 opened this issue Oct 20, 2023 · 4 comments
Comments

@yigu1008

Hi Kevin, I'm trying to reproduce the Prompt Alignment Experiment. I first downloaded the llava_server codebase and used the weights from "liuhaotian/llava-v1.5-7b". When I run

gunicorn "app:create_app()"

I get KeyError: 'llava' while the weights are loading.

To work around this, I cloned the latest LLaVA from https://github.com/haotian-liu/LLaVA and modified llava_server/llava.py:

from typing import Iterable, List
from transformers import AutoTokenizer, AutoConfig, LlamaConfig
import torch
import numpy as np
from llava.utils import disable_torch_init
from transformers import CLIPImageProcessor
from PIL import Image
from llava.conversation import simple_conv_multimodal
from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM


DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"

MAX_TOKENS = 64

PROMPT = simple_conv_multimodal.get_prompt() + "Human: "

def load_llava(params_path):
    # load model
    params_path = "liuhaotian/llava-v1.5-7b"
    disable_torch_init()
    tokenizer = AutoTokenizer.from_pretrained(params_path)
    class LlavaConfig(LlamaConfig):
        model_type = "llava"

    AutoConfig.register("llava", LlavaConfig)

    model = LlavaLlamaForCausalLM.from_pretrained(
        params_path, torch_dtype=torch.float16
    ).cuda()

I'm now testing this code on a machine with 3 A100 GPUs. It loads the weights and sets up the servers with app.py.

However, when I use 2 GPUs for LLaVA inference and run train.py on the third, I get:
images = images.to("cuda", dtype=torch.float16)
RuntimeError: CUDA error: device-side assert triggered

I also checked with nvidia-smi that my processes were indeed running on three separate GPUs. Could you help me with this? Thank you!
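
A minimal sketch of one way to pin each process to its own GPUs while chasing this error, assuming the server and the training script are started as separate processes. The helper name `make_gpu_env` and the 0,1 / 2 split are hypothetical and not taken from LLaVA-server. CUDA reports device-side asserts asynchronously, so the line shown in the traceback is not necessarily the operation that failed; setting CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous so the traceback is more likely to point at the real culprit.

```python
import os
from typing import Dict, Mapping, Optional, Sequence


def make_gpu_env(
    gpu_ids: Sequence[int],
    blocking: bool = True,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build an environment that pins a child process to the given GPUs.

    Hypothetical helper for illustration.
    CUDA_VISIBLE_DEVICES limits which GPUs the process can see.
    CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, which
    helps locate the operation behind a device-side assert.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    if blocking:
        env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Assumed split for a 3-GPU machine: GPUs 0 and 1 for the LLaVA server,
# GPU 2 for training. An empty base_env keeps the printed result small.
server_env = make_gpu_env([0, 1], base_env={})
train_env = make_gpu_env([2], base_env={})
print(server_env)
print(train_env)
```

Each dictionary could then be passed as the `env=` argument of `subprocess.Popen` when launching the gunicorn command and train.py. Synchronous launches slow everything down, so this is only meant for debugging runs.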

@alnaeini

This approach throws an error on the line:
from llava.conversation import simple_conv_multimodal

I checked their repo, and simple_conv_multimodal does not exist there. How did you work around that?

@jeeyung

jeeyung commented May 6, 2024

I got the same "device-side assert triggered" error.

@stanleyshen2003

For the "from llava.conversation import simple_conv_multimodal" error, you can simply use

PROMPT = """You are LLaVA, a large language and vision assistant trained by UW Madison WAIV Lab.You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.Follow the instructions carefully and explain your answers in detail.###Human: Hi!###Assistant: Hi there!  How can I help you today?
###Human:"""

instead of importing simple_conv_multimodal.

@Lil-Shake

Regarding “CUDA error: device-side assert triggered”: this happens when torch.embedding() tries to convert tokens into embeddings. LLaVA-server/llava_server/llava.py adds special tokens to the tokenizer but does not enlarge the model's embedding matrix, so the new token IDs fall outside it. Resizing the embedding matrix after adding the special tokens may solve it. Try adding the following line at line 38 of LLaVA-server/llava_server/llava.py:
model.resize_token_embeddings(len(tokenizer))
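
A minimal sketch of the check behind this fix, assuming the token IDs and the embedding row count have already been pulled out as plain Python values. The helper name `find_out_of_range_ids` is hypothetical and not part of LLaVA-server, and the numbers are illustrative rather than measured from llava-v1.5-7b. It only shows why a tokenizer that has grown past the embedding matrix makes the lookup fail.

```python
from typing import Iterable, List


def find_out_of_range_ids(token_ids: Iterable[int], num_embeddings: int) -> List[int]:
    """Return the token IDs that have no row in an embedding matrix of this size.

    Hypothetical helper for illustration. Valid indices are
    0 .. num_embeddings - 1; anything else would trip the embedding lookup.
    """
    return [tid for tid in token_ids if tid < 0 or tid >= num_embeddings]


# Illustrative numbers only: the model has 32000 embedding rows, and the
# tokenizer was given 3 extra special tokens with IDs 32000, 32001 and 32002.
num_embeddings = 32000
token_ids = [1, 450, 32000, 9047, 32002]

bad = find_out_of_range_ids(token_ids, num_embeddings)
print(bad)  # [32000, 32002]

# After resizing to cover the enlarged vocabulary, every ID has a row.
resized = 32003
print(find_out_of_range_ids(token_ids, resized))  # []
```

With the real objects, the row count would presumably come from `model.get_input_embeddings().num_embeddings` and the IDs from the tokenizer output. If the helper returns anything, the resize has either not been applied or was applied before the special tokens were added.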
