Querying PaliGemma VLMs #123

Open

kanishkamisra opened this issue Nov 21, 2024 · 2 comments

@kanishkamisra

My collaborators and I are trying to use your very useful containers to deploy and use Google's PaliGemma models on GCS/Vertex. What is the best way to query the model with images, especially when the images are stored locally? I see that there is an example showing this for Llama Vision, but it seems you have to pass in the images as URLs, which may not be feasible for us.

We're getting some success by doing something like this, but unsure if that's the right way:

import base64

image_path = "/PATH/rabbit.png"

# Read the local image and encode it as a base64 data URI
with open(image_path, "rb") as f:
    image = base64.b64encode(f.read()).decode("utf-8")

image = f"data:image/png;base64,{image}"

# Embed the data URI in the prompt using markdown image syntax
output = deployed_model.predict(
    instances=[
        {
            "inputs": f"![]({image})What is the animal wearing?",
            "parameters": {"max_new_tokens": 100, "do_sample": False},
        }
    ]
)
#> space suit
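
(For reference, `deployed_model` above is the Vertex AI endpoint handle; a minimal sketch of how it might be obtained, assuming the `google-cloud-aiplatform` SDK with placeholder project, region, and endpoint ID values:)

from google.cloud import aiplatform

# Placeholder project, region, and endpoint ID -- substitute your own
aiplatform.init(project="my-project", location="us-central1")
deployed_model = aiplatform.Endpoint("1234567890123456789")

# deployed_model.predict(instances=[...]) can then be called as above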

Please let me know if you need more details! Any assistance would be much appreciated!

@alvarobartt
Member

Hi @kanishkamisra, sorry I only got around to replying just now!

But you are right, that's how it's supposed to be done programmatically if you don't have the URL of the image but just the image itself; see the Text Generation Inference documentation on Visual Language Models.

Here's the full example as shown in the documentation linked above:

import base64
from huggingface_hub import InferenceClient

# Point the client at the locally running TGI container
client = InferenceClient("http://127.0.0.1:8080")

# Read the local image and encode it as a base64 data URI
image_path = "rabbit.png"
with open(image_path, "rb") as f:
    image = base64.b64encode(f.read()).decode("utf-8")

image = f"data:image/png;base64,{image}"
prompt = f"![]({image})What is this a picture of?\n\n"

# Stream the generated tokens one by one
for token in client.text_generation(prompt, max_new_tokens=10, stream=True):
    print(token)
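
If streaming isn't needed, the same method also returns the full generated text at once (a minimal variant of the block above, using the same client and prompt):

# Non-streaming: text_generation returns the generated text as a single string
generated = client.text_generation(prompt, max_new_tokens=10)
print(generated)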

I'll also try to add an example to this repository so that we have a working example covering the different alternatives!
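
In the meantime, one such alternative is to call the TGI REST API directly; a minimal sketch, assuming a TGI container running locally at http://127.0.0.1:8080 (on Vertex AI the same payload is instead forwarded through the endpoint's predict route):

import requests

# TGI's /generate route accepts the same inputs/parameters payload
# that each Vertex AI instance carries
response = requests.post(
    "http://127.0.0.1:8080/generate",
    json={
        "inputs": prompt,  # the same data-URI prompt built above
        "parameters": {"max_new_tokens": 100, "do_sample": False},
    },
)
print(response.json()["generated_text"])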

alvarobartt self-assigned this Nov 28, 2024
@alvarobartt
Member

P.S. I just realised that you are missing the two trailing line breaks (i.e. \n\n), and PaliGemma is known to be quite sensitive to prompt formatting, so your code should look like the following instead:

import base64

image_path = "/PATH/rabbit.png"

with open(image_path, "rb") as f:
    image = base64.b64encode(f.read()).decode("utf-8")

image = f"data:image/png;base64,{image}"

# Note the trailing "\n\n" -- PaliGemma expects it after the question
output = deployed_model.predict(
    instances=[
        {
            "inputs": f"![]({image})What is the animal wearing?\n\n",
            "parameters": {"max_new_tokens": 100, "do_sample": False},
        }
    ]
)
#> space suit
