Clean up VLMPipeline #68

Closed
wants to merge 46 commits into from
Changes from 3 commits
46 commits
9177756
convert.py script deprecation and llm-bench README update (#916)
andrei-kochin Oct 3, 2024
b11f0d9
StaticLLMPipeline: Enable DQ (#878)
TolyaTalamanov Oct 7, 2024
41f1e7b
LoRA in Text2ImagePipeline (#911)
slyalin Oct 7, 2024
d3bb229
Clean up VLMPipeline
Wovchena Oct 7, 2024
a5a58c2
Remove error handling
Wovchena Oct 7, 2024
7441a18
Allow [NHWC] and [HWC]
Wovchena Oct 8, 2024
eaaa971
Move subtract_chat_tokenized_inputs' implementation to .cpp
Wovchena Oct 8, 2024
5eb7011
Revert test to drop mac
Wovchena Oct 8, 2024
db14fd0
WWB: Add comparison for SD models (#901)
AlexKoff88 Oct 8, 2024
ff38f90
Remove excess comma in src/cpp/CMakeLists.txt (#927)
ilya-lavrenov Oct 8, 2024
ff27cf7
Fix layout description
Wovchena Oct 8, 2024
a0b78c0
Update README.md
rkazants Oct 8, 2024
8c9a240
Update samples/cpp/text2image/README.md
rkazants Oct 8, 2024
abbc695
Fix the misprint (#928)
andrei-kochin Oct 8, 2024
93927b5
MiniCPM-V-2_6 with image input (#912)
Wovchena Oct 8, 2024
a5fb3a6
Fix misprint (#929)
Wovchena Oct 8, 2024
aa7bfd6
fix cb llm bench for gpu, allow string config
eaidova Oct 8, 2024
117d790
fix cb llm bench for gpu, allow string config (#931)
andrei-kochin Oct 8, 2024
aaf731c
fix linting issue in llm bench (#932)
eaidova Oct 8, 2024
14df316
disable md5 check assert for CB
eaidova Oct 8, 2024
09c5742
disable md5 check assert for CB (#933)
andrei-kochin Oct 8, 2024
4465727
Use older MSVC toolchain version
ilya-lavrenov Oct 8, 2024
7f9a579
Use OpenVINO runners
ilya-lavrenov Oct 8, 2024
9d1e7e3
Apply suggestions from code review
ilya-lavrenov Oct 8, 2024
2c56899
Try to fix Windows
ilya-lavrenov Oct 8, 2024
4aa01ca
Added &
ilya-lavrenov Oct 8, 2024
0afb553
Update stable_diffusion_1_5_cpp.yml
ilya-lavrenov Oct 8, 2024
4ae6b18
Update stable_diffusion_1_5_cpp.yml
ilya-lavrenov Oct 8, 2024
3e772fd
Apply suggestions from code review
ilya-lavrenov Oct 8, 2024
0b55cd0
Apply suggestions from code review
ilya-lavrenov Oct 8, 2024
22c573d
Apply suggestions from code review
ilya-lavrenov Oct 8, 2024
fcd6670
Apply suggestions from code review
ilya-lavrenov Oct 8, 2024
e7c1371
Apply suggestions from code review
ilya-lavrenov Oct 8, 2024
78bbf62
Update .github/workflows/stable_diffusion_1_5_cpp.yml
ilya-lavrenov Oct 8, 2024
581e2c1
Apply suggestions from code review
ilya-lavrenov Oct 8, 2024
d43773c
SDXL Pipeline, Euler Discrete scheduler
likholat Sep 27, 2024
0663533
text2image Readme update
likholat Oct 7, 2024
f4e90aa
num_images_per_prompt>1 for demo, unet reshape fix, num_hidden_layers…
likholat Oct 8, 2024
ea8b9fa
Readme update
likholat Oct 8, 2024
df2161d
codestyle fixes
likholat Oct 8, 2024
8db0301
Apply suggestions from code review
ilya-lavrenov Oct 8, 2024
b9eed0a
Added VLM bindings and a Python sample. (#914)
popovaan Oct 9, 2024
a1feff9
Prevent overwriting of the sampling strategy. (#937)
andreyanufr Oct 9, 2024
6d2763a
Multiple images miniCPM-V-2_6 (#919)
Wovchena Oct 9, 2024
770f7ed
Merge branch 'master' into clean-up
Wovchena Oct 9, 2024
6609a08
Remove py constructor
Wovchena Oct 9, 2024
4 changes: 2 additions & 2 deletions .github/workflows/causal_lm_cpp.yml
@@ -711,10 +711,10 @@ jobs:
       - run: >
           source ./ov/setupvars.sh
           && python ./samples/cpp/visual_language_chat/export_MiniCPM-V-2_6.py ./miniCPM-V-2_6/
-      - run: wget https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11
+      - run: wget https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11 --output-document cat.jpg
       - run: >
           source ./ov/setupvars.sh
-          && ./build/samples/cpp/visual_language_chat/visual_language_chat ./miniCPM-V-2_6/ d5fbbd1a-d484-415c-88cb-9986625b7b11
+          && ./build/samples/cpp/visual_language_chat/visual_language_chat ./miniCPM-V-2_6/ cat.jpg
           <<< $'What is on the image?\nWhat is special on the image?'
         timeout-minutes: 110

2 changes: 1 addition & 1 deletion samples/cpp/visual_language_chat/README.md
@@ -15,7 +15,7 @@ export_MiniCPM-V-2_6.py miniCPM-V-2_6
 
 ## Run
 
-https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11 can be used as a sample image.
+[This image](https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11) can be used as a sample image.
 
 `visual_language_chat miniCPM-V-2_6 319483352-d5fbbd1a-d484-415c-88cb-9986625b7b11.jpg`
6 changes: 2 additions & 4 deletions samples/cpp/visual_language_chat/visual_language_chat.cpp
@@ -26,12 +26,10 @@ int main(int argc, char* argv[]) try {
 
     pipe.start_chat();
     std::cout << "question:\n";
-    if (!std::getline(std::cin, prompt)) {
-        throw std::runtime_error("std::cin failed");
-    }
+    std::getline(std::cin, prompt);
     pipe.generate(
         prompt,
-        ov::genai::image(std::move(image)),
+        ov::genai::image(image),
         ov::genai::streamer(print_subword)
     );
     std::cout << "\n----------\n"
4 changes: 3 additions & 1 deletion src/cpp/CMakeLists.txt
@@ -51,7 +51,9 @@ file(GLOB_RECURSE SOURCE_FILES "${CMAKE_CURRENT_SOURCE_DIR}/src/*.cpp" "${CMAKE_
 
 set(TARGET_NAME openvino_genai)
 add_library(${TARGET_NAME} SHARED ${SOURCE_FILES})
-add_dependencies(${TARGET_NAME} openvino_tokenizers)
+if(TARGET openvino_tokenizers)
+    add_dependencies(${TARGET_NAME} openvino_tokenizers)
+endif()
 add_library(openvino::genai ALIAS ${TARGET_NAME})
 
 target_include_directories(${TARGET_NAME}
8 changes: 4 additions & 4 deletions src/cpp/include/openvino/genai/vision_encoder.hpp
@@ -8,7 +8,7 @@
 
 namespace ov::genai {
 /// @brief A pair describing image size.
-struct HeightWidth {
+struct ImageSize {
     /// @brief Height of a corresponding image.
     size_t height;
     /// @brief Width of a corresponding image.
@@ -25,16 +25,16 @@ struct EncodedImage {
     ov::Tensor resized_source;
     /// @brief A size of an image used to compute embeddings for
     /// divided by ProcessorConfig's patch_size.
-    HeightWidth resized_source_size;
+    ImageSize resized_source_size;
     /// @brief Embeddings of images obtained from a source image by
     /// slicing at no more than max_slice_nums pieces and resizing.
     /// The tensor's shape is
     /// [slice_y, slice_x, number_of_embeddings, embedding_size].
     /// slices_sizes.size() == slice_y * slice_x.
     ov::Tensor slices;
-    /// @brief Flattened sizes of images used to compute embeddings
+    /// @brief A size of images used to compute embeddings
     /// stored in slices member divided by ProcessorConfig's patch_size.
-    std::vector<HeightWidth> slices_sizes;
+    ImageSize slices_size;
 };
 
 /// @brief A class used to infer embeddings of an image using
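The doc comments above describe each `ImageSize` as a pixel extent divided by `ProcessorConfig`'s `patch_size`, i.e. a size in patch units. A minimal sketch of that relationship, using a standalone stand-in struct rather than the actual header:

```cpp
#include <cassert>
#include <cstddef>

// Standalone stand-in for ov::genai::ImageSize: a height/width pair
// expressed in patches rather than pixels.
struct ImageSize {
    size_t height;
    size_t width;
};

// Convert a pixel-size image to its patch-grid size, as the doc comments
// describe (pixel extent divided by the processor's patch_size; the
// function name is hypothetical, not part of the library).
ImageSize to_patch_grid(size_t height_px, size_t width_px, size_t patch_size) {
    return ImageSize{height_px / patch_size, width_px / patch_size};
}
```

For example, a 448x448 image with 14-pixel patches maps to a 32x32 patch grid.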
31 changes: 4 additions & 27 deletions src/cpp/include/openvino/genai/vlm_pipeline.hpp
@@ -65,37 +65,14 @@ class OPENVINO_GENAI_EXPORTS VLMPipeline {
     explicit VLMPipeline(
         const std::filesystem::path& model_dir,
         const std::string& device="CPU",
-        const ov::AnyMap device_config={},
-        ov::Core core=ov::Core{}
-    ) : VLMPipeline{
-        model_dir,
-        Tokenizer(model_dir.string(), device_config),
-        device,
-        device_config,
-        core
-    } {}
-
-    /// @brief Construct a pipeline form a folder containing model IRs
-    /// and from a Tokenizer instance.
-    /// @param model_dir A folder to read model IRs.
-    /// @param tokenizer An instance of Tokenizer to use.
-    /// @param device Inference device.
-    /// @param device_config A config to pass to ov::Core.set_property()
-    /// and ov::Core::compile_model().
-    /// @param core ov::Core instance to use.
-    VLMPipeline(
-        const std::filesystem::path& model_dir,
-        const ov::genai::Tokenizer& tokenizer,
-        const std::string& device="CPU",
-        const ov::AnyMap device_config={},
-        ov::Core core=ov::Core{}
+        const ov::AnyMap device_config={}
     );
 
     /// @brief Default destructor.
     ~VLMPipeline();
 
     /// @brief Generate a response given a prompt and any number of
-    /// uint8 RGB images.
+    /// uint8 RGB images with [NHWC] or [HWC] layout.
     /// @param prompt A prompt to respond to.
     /// @param images Images to be prepended to a prompt.
     /// @param generation_config A config to follow for text generation.
@@ -120,7 +97,7 @@ class OPENVINO_GENAI_EXPORTS VLMPipeline {
     /// @brief Generate a response given a prompt and arbitrary number
     /// of ov::Property instances.
     /// Example:
-    /// generate("text", image(std::move(rgb)), do_sample(true));
+    /// generate("text", image(rgb), do_sample(true));
     /// @param prompt A prompt to respond to.
     /// @param ...properties ov::Property instances to be combined into
     /// ov::AnyMap.
@@ -166,7 +143,7 @@ class OPENVINO_GENAI_EXPORTS VLMPipeline {
 
     /*
      * utils that allow to use generate() in the following way:
-     * pipe.generate(prompt, ov::genai::image(std::move(image_tensor))).
+     * pipe.generate(prompt, ov::genai::image(image_tensor)).
      */
     static constexpr ov::Property<ov::Tensor> image{"image"};
     static constexpr ov::Property<std::vector<ov::Tensor>> images{"images"};
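The updated doc comment says `generate()` now accepts uint8 RGB images in either [NHWC] or [HWC] layout (per the "Allow [NHWC] and [HWC]" commit). A hypothetical sketch of the shape normalization such a relaxation implies, not the pipeline's actual code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Normalize an image tensor shape to [N, H, W, C]: a 3D [H, W, C] shape
// is treated as a single image (N = 1), a 4D [N, H, W, C] shape is
// passed through, and anything else is rejected.
std::array<size_t, 4> to_nhwc(const std::vector<size_t>& shape) {
    if (shape.size() == 3) {
        return {1, shape[0], shape[1], shape[2]};
    }
    if (shape.size() == 4) {
        return {shape[0], shape[1], shape[2], shape[3]};
    }
    throw std::runtime_error("expected an [NHWC] or [HWC] image shape");
}
```

With such a helper, a single `{224, 224, 3}` image and a batch `{2, 224, 224, 3}` can be handled by one code path.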
3 changes: 0 additions & 3 deletions src/cpp/src/clip.cpp
@@ -6,9 +6,6 @@
 // I'll gradually clean and extend it
 // Note: Even when using identical normalized image inputs (see normalize_image_u8_to_f32()) we have a significant difference in resulting embeddings compared to pytorch
 
-#define STB_IMAGE_IMPLEMENTATION
-#include "stb_image.hpp"
-
 #include <cassert>
 #include <cmath>
 #include <cstdlib>
4 changes: 1 addition & 3 deletions src/cpp/src/clip.hpp
@@ -1,8 +1,7 @@
 // Copyright (C) 2023-2024 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 
-#ifndef CLIP_H
-#define CLIP_H
+#pragma once
 
 #include <vector>
 #include <numeric>
@@ -53,4 +52,3 @@ bool bicubic_resize(const clip_image_u8& img, clip_image_u8& dst, int target_wid
 
 /** preprocess img and store the result in res_imgs, pad_to_square may be overriden to false depending on model configuration */
 clip_image_f32 clip_image_preprocess(struct clip_ctx& ctx, const clip_image_u8& img);
-#endif // CLIP_H