Integrate image understanding pipeline #192

dolfim-ibm · 2024-11-01T12:02:39Z

As part of the enrichment pipeline we want to leverage multi-modal vision models for the analysis of images in documents.

For example:

Charts
UML diagrams
and more

An initial prototype is in #25, which will be re-implemented on top of the stronger v2 pipelines.

Runtime

The system will support:

Prompting a model served as API, e.g. using the openai vision api
Launching a local model, e.g. using vllm

dolfim-ibm added the enhancement New feature or request label Nov 1, 2024

dolfim-ibm self-assigned this Nov 6, 2024

This was referenced Nov 6, 2024

How can I annotate/caption the image and display it when exporting it to markdown or text file? #256

Closed

feat: picture description models #259

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integrate image understanding pipeline #192

Integrate image understanding pipeline #192

dolfim-ibm commented Nov 1, 2024

Integrate image understanding pipeline #192

Integrate image understanding pipeline #192

Comments

dolfim-ibm commented Nov 1, 2024

Runtime