This repository contains the source code for my post "Scale Up Your RAG: A Rust-Powered Indexing Pipeline with LanceDB and Candle", published in Towards Data Science. The post explains how to build a high-performance Retrieval-Augmented Generation (RAG) indexing pipeline in Rust, and demonstrates how to efficiently read, chunk, embed, and store text documents as vectors using Hugging Face's Candle framework and LanceDB.
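To give a feel for the chunking stage mentioned above, here is a minimal sketch that splits text into overlapping word windows using only the standard library. The function name `chunk_words` and its `chunk_size` and `overlap` parameters are hypothetical and chosen for illustration; the pipeline in this repository may chunk documents differently (for example by tokens or sentences).

```rust
/// Split `text` into chunks of at most `chunk_size` whitespace-separated words,
/// with `overlap` words shared between consecutive chunks.
/// Hypothetical helper for illustration; not taken from this repository.
fn chunk_words(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    let words: Vec<&str> = text.split_whitespace().collect();
    // Each new chunk starts `step` words after the previous one.
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < words.len() {
        let end = usize::min(start + chunk_size, words.len());
        chunks.push(words[start..end].join(" "));
        // Stop once the final word has been included in a chunk.
        if end == words.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    let text = "Rust makes it practical to index large document collections quickly";
    for (i, chunk) in chunk_words(text, 4, 1).iter().enumerate() {
        println!("chunk {i}: {chunk}");
    }
}
```

Overlapping windows are one common way to keep context that straddles a chunk boundary available to the embedding model; each resulting chunk would then be embedded and written to the vector store.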
Features:
- Fast document processing and chunking
- Efficient text embedding using Candle
- Vector storage with LanceDB
- Scalable design for handling large volumes of data
- Standalone application deployable in various environments
Prerequisites:
- Rust (latest stable version)
- Cargo (Rust's package manager)
Clone this repository:
git clone https://github.com/your-username/rag-indexing-pipeline-rust.git
cd rag-indexing-pipeline-rust
To run the app on test data, use the following command:
cargo run --release -- --input-directory embedding_files_test --db-uri data/vecdb1