From ad7ffb5e324f5bb6b8b62d4016b0b4043f8cc53e Mon Sep 17 00:00:00 2001
From: Kyryl Truskovskyi
Date: Mon, 2 Dec 2024 17:35:02 -0500
Subject: [PATCH] Update README.en.md

---
 ai-search-demo/README.en.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ai-search-demo/README.en.md b/ai-search-demo/README.en.md
index 720892a..9cfecb0 100644
--- a/ai-search-demo/README.en.md
+++ b/ai-search-demo/README.en.md
@@ -6,7 +6,7 @@ This is a small demo showing how to build AI search on top of visual data (PDFs,
 
 ## Why
 
-The classic way to handle visual documents (PDFs, forms, images, etc.) is to use OCR, Layout Detection, Table Recognition, etc. See [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit) [Tesseract](https://github.com/tesseract-ocr/tesseract) or [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) for example. However, we are going to split PDFs by page and embed each as an image to avoid complexity. The main models we are going to use are [Qwen2-VL](https://arxiv.org/abs/2409.12191) for visual understanding and ColPali.
+The classic way to handle visual documents (PDFs, forms, images, etc.) is to use OCR, Layout Detection, Table Recognition, etc. See [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit) [Tesseract](https://github.com/tesseract-ocr/tesseract) or [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) for example. However, we are going to split PDFs by page and embed each as an image to avoid complexity. The main models we are going to use are [Qwen2-VL](https://arxiv.org/abs/2409.12191) for visual understanding and [ColPali](https://github.com/illuin-tech/colpali).
 
 ## Evaluation
 
@@ -162,4 +162,4 @@ Deploy models
 ```
 modal deploy llm-inference/llm_serving.py
 modal deploy llm-inference/llm_serving_colpali.py
-```
\ No newline at end of file
+```