This project demonstrates efficient vector database querying using GPU acceleration, applied to a large-scale dataset of Wikipedia articles. The process involves embedding generation, clustering, and GPU-accelerated approximate nearest neighbor search to retrieve relevant articles for input queries.
We use the Wikipedia plain-text dataset from November 2020. You can download it from Kaggle:
Plain Text Wikipedia 202011 Dataset
(The preprocessing below has already been done and saved on CUDA5 under `/scratch/pvg2018/`. If you don't have access, email [email protected], [email protected], or [email protected].)
- Embedding Generation: Use `embedding.py` to generate vector embeddings for the Wikipedia articles.
- Clustering: Apply K-means clustering with `cluster.py` to group the embeddings into 128 clusters. Save the clustered data in a designated folder.
- Data Conversion: Convert the `.npy` files to `.bin` format using `convert_npy_bin.py` to ensure compatibility with C++ (a sketch of these preprocessing steps follows this list).
- Query Processing: Use the pre-generated embeddings in the `queries_data` folder or create custom ones (see the Queries subsection for more info).
- Approximate Nearest Neighbor Search: Compile and execute `IVF.cpp` to find the closest matching article to the query.
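The repository's scripts contain the authoritative implementations; as a rough sketch of the preprocessing pipeline, assuming a sentence-transformers embedding model and scikit-learn's K-means (the model name, file layout, and raw-float32 `.bin` format below are assumptions, not the repo's actual choices):

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Embedding model is an assumption; check embedding.py for the one actually used.
model = SentenceTransformer("all-MiniLM-L6-v2")
articles = [f"placeholder article {i}" for i in range(256)]  # the Wikipedia dump in practice
embeddings = model.encode(articles).astype(np.float32)       # shape: (n_articles, dim)
np.save("embeddings.npy", embeddings)

# K-means into 128 clusters, as in cluster.py.
kmeans = KMeans(n_clusters=128, random_state=0).fit(embeddings)
np.save("centroids.npy", kmeans.cluster_centers_.astype(np.float32))

# Per-cluster .bin files: raw float32 with no header, so the C++ side can read
# the values directly (mirroring convert_npy_bin.py; filenames are illustrative).
for c in range(kmeans.n_clusters):
    embeddings[kmeans.labels_ == c].tofile(f"cluster_{c}.bin")
```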
The following arguments need to be passed to the executable (a NumPy sketch of the two-stage search they control follows this list):

- `--n_probe`: Value from 1 to 128 denoting how many top clusters the coarse search selects for the fine search to scan.
- `--mode`: Which CUDA kernel runs, either `Atomic` or `NonAtomic`; these are the two kernel variants we use to compute the coarse and fine searches.
- `--sequential_fine_search`: `true` or `false`. With `true` (sequential search), each cluster is handled by a separate kernel; with `false`, all the clusters are combined into one and a single kernel handles them all.
- `--use_cuda_coarse`: `true` or `false`. Whether the coarse search (finding the top `n_probe` cluster centroids) runs on the GPU or the CPU.
- `--use_cuda_fine`: `true` or `false`. Whether the fine search (finding the top-k closest elements in the top `n_probe` clusters) runs on the GPU or the CPU.
- `--threadsperBlock`: Any multiple of 32; the number of threads per CUDA block.
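These flags configure a standard IVF (inverted file) two-stage search. A minimal NumPy sketch of the logic that the CUDA kernels parallelize (function names and the 384-dim toy data are illustrative, not the repo's actual API):

```python
import numpy as np

def cos_sim(q, X):
    """Cosine similarity of query q (dim,) against each row of X (n, dim)."""
    return (X @ q) / (np.linalg.norm(X, axis=1) * np.linalg.norm(q) + 1e-8)

def ivf_search(query, centroids, clusters, n_probe=30):
    # Coarse search: rank all 128 centroids, keep the n_probe best.
    top = np.argsort(-cos_sim(query, centroids))[:n_probe]
    # Fine search: exhaustively scan every vector in the probed clusters.
    best_score, best_id = -np.inf, None
    for c in top:
        sims = cos_sim(query, clusters[c])
        i = int(np.argmax(sims))
        if sims[i] > best_score:
            best_score, best_id = sims[i], (int(c), i)
    return best_id, best_score

# Toy usage with random data standing in for the real embeddings.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(128, 384)).astype(np.float32)
clusters = [rng.normal(size=(50, 384)).astype(np.float32) for _ in range(128)]
query = rng.normal(size=384).astype(np.float32)
print(ivf_search(query, centroids, clusters, n_probe=30))
```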
We have pre-generated embeddings for the following queries, saved as `.bin` files in the `queries_data` folder:

- "What is learning rate in gradient descent?" (`query1.bin`)
- "What is Microbial biogeography?" (`query2.bin`)
- "Give me details about The Arch of Cabanes." (`query3.bin`)
- "Give me details about the history of the Taj Mahal." (`query4.bin`)
- "Tell me something about the labelling used on aid packages created and sent under the Marshall Plan." (`query5.bin`)
You can use these queries by updating the query number in the code where the query file is specified.
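If you want to inspect one of these files, a quick check (assuming the `.bin` files are raw little-endian float32 vectors with no header, consistent with the `.npy` to `.bin` conversion above):

```python
import numpy as np

# Assumption: raw float32 values, no header.
q = np.fromfile("queries_data/query1.bin", dtype=np.float32)
print(q.shape, q[:5])  # embedding dimension and a few leading values
```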
If you want to process a new question:

- Run the `test.py` script with the following command (a sketch of what such a script plausibly does follows this list):
  ```bash
  python test.py --query "Your new question here"
  ```
- This will generate a `.bin` file for your query and save it in the `queries_data` folder.
- Update the query number in the code to point to the new query file.
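For reference, a minimal sketch of what a script like `test.py` might do, assuming the same sentence-transformers model as the article embeddings (the model name and output filename are assumptions):

```python
import argparse
import numpy as np
from sentence_transformers import SentenceTransformer

parser = argparse.ArgumentParser()
parser.add_argument("--query", required=True)
args = parser.parse_args()

# Must match the model used to embed the articles; the name here is an assumption.
model = SentenceTransformer("all-MiniLM-L6-v2")
vec = model.encode([args.query])[0].astype(np.float32)

# Write a raw float32 .bin, the format IVF.cpp reads; the filename is illustrative.
out_path = "queries_data/query_custom.bin"
vec.tofile(out_path)
print(f"Saved {out_path} ({vec.shape[0]} dims)")
```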
In your `IVF.cpp` code, make sure the path to the query file points to the desired query (e.g., `query1.bin` for the first pre-uploaded query, or your new query file):

```cpp
std::string query_path = "queries_data/query1.bin"; // Update the file as needed
```
To compile and run the program on a CUDA-enabled machine:

```bash
ssh cuda5.cims.nyu.edu
git clone https://github.com/PranavGrandhi/GPU_Accelerated_Vector_Indexing.git
cd GPU_Accelerated_Vector_Indexing
module load cuda-12.4
nvcc IVF.cpp cosine_similarity.cu -o IVF
./IVF --n_probe=30 --mode=Atomic --sequential_fine_search=false --use_cuda_coarse=true --use_cuda_fine=true --threadsperBlock=128 --print_results=true
```

Update the flag values as needed.
Upon execution, the program displays the title and content of the Wikipedia article most relevant to the input query, whether one of the pre-generated queries (e.g., "What is learning rate in gradient descent?") or a custom one.
We can use the `run_multiple_configs.sh` script to run specific configurations multiple times and compute the average CPU/GPU running time for each configuration.
To execute an experiment:
- Prepare the Configuration File: Each experiment has a corresponding `.txt` file listing all parameter configurations (e.g., `experiment1_config.txt`).
- Run the Script: Use the following command:
  ```bash
  ./run_multiple_configs.sh <configuration_file> [<num_runs>]
  ```
  Replace `<configuration_file>` with the name of the experiment configuration file (e.g., `experiment1_config.txt`) and `<num_runs>` with the number of runs for averaging (default is 5).
- Output: The script will display the configuration details, individual run times, and the average time for each configuration.
For example:

```bash
./run_multiple_configs.sh experiment1_config.txt 10
```
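The configuration file format isn't spelled out here; given the flags above, a plausible (hypothetical) `experiment1_config.txt` would hold one flag set per line:

```
--n_probe=30 --mode=Atomic --sequential_fine_search=false --use_cuda_coarse=true --use_cuda_fine=true --threadsperBlock=128
--n_probe=64 --mode=NonAtomic --sequential_fine_search=true --use_cuda_coarse=true --use_cuda_fine=false --threadsperBlock=256
```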
For more details, refer to the source code and comments within each script.