This repository contains the code and resources for the 2-day NVIDIA TensorRT Deployment Workshop. The workshop focuses on deploying deep learning models efficiently using NVIDIA TensorRT, a high-performance deep learning inference library.
During this hands-on workshop, participants will:
- Learn how to optimize deep learning models for inference.
- Understand the benefits of using TensorRT for deployment in real-world applications.
- Explore various model formats (ONNX, PyTorch, TensorFlow) and their conversion to TensorRT (see the export sketch after this list).
- Work with practical examples involving neural networks and their deployment on NVIDIA GPUs.
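As a minimal sketch of the model-format point above, the snippet below exports a PyTorch model to ONNX so it can later be consumed by TensorRT. The network, input shape, file name, and opset are illustrative assumptions, not the workshop's actual lab code.

```python
import torch
import torchvision

# Placeholder model and input shape; the workshop labs may use a different network.
model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX so the model can later be parsed by TensorRT.
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)
```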
- Introduction to the Hugging Face Framework: Loading pretrained models on the fly with its libraries and adapting them to our use case (see the loading sketch after this list).
- Introduction to NVIDIA TensorRT: Understanding its components and optimization strategies.
- Model Conversion: Converting trained models into TensorRT engines (see the conversion sketch after this list).
- Inference Optimization: Techniques for improving inference speed and efficiency.
- Hands-on Labs: Implementing real-world examples, including object detection and classification tasks.
- Performance Benchmarks: Measuring speed and accuracy across different hardware setups.
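For the Hugging Face item above, here is a minimal sketch of loading a pretrained model on the fly with the `transformers` library. The checkpoint and task are illustrative assumptions, not the specific models used in the workshop.

```python
from transformers import pipeline

# Illustrative checkpoint; any Hugging Face Hub model suited to the task can be substituted.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("TensorRT makes inference fast."))
```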
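The model-conversion and inference-optimization items can be sketched with the TensorRT Python API. This is a simplified outline assuming a TensorRT 8.x-style API and the `resnet18.onnx` file from the export sketch above; it is not the workshop's lab code.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

# Explicit-batch network definition, as required for ONNX models.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# Parse the ONNX file produced in the export step.
with open("resnet18.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model")

# Enable FP16 as one example of an inference optimization (requires GPU support).
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)

# Build and save the serialized engine for later deployment.
serialized_engine = builder.build_serialized_network(network, config)
with open("resnet18.engine", "wb") as f:
    f.write(serialized_engine)
```

The same conversion can also be done from the command line with `trtexec --onnx=resnet18.onnx --saveEngine=resnet18.engine --fp16`, which additionally reports latency and throughput figures useful for the benchmarking exercises.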
This repository also contains a demo of an NVIDIA NIM RAG-based chatbot that uses the LLAMA 3.1B model and a vector store database (pickled for up to a 10x efficiency gain), hosted on Streamlit.
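A minimal sketch of how a vector store can be pickled and reloaded to avoid re-embedding documents on every app start, which is where a speed-up of this kind comes from. The cache file name and the `build_fn` callback are hypothetical; the actual demo code may differ.

```python
import os
import pickle

VSTORE_PATH = "vector_store.pkl"  # hypothetical cache file name

def load_or_build_vector_store(build_fn):
    """Load a pickled vector store if present; otherwise build it and cache it."""
    if os.path.exists(VSTORE_PATH):
        # Reuse the cached store instead of re-embedding the documents.
        with open(VSTORE_PATH, "rb") as f:
            return pickle.load(f)
    store = build_fn()  # build_fn is a placeholder that embeds and indexes the documents
    with open(VSTORE_PATH, "wb") as f:
        pickle.dump(store, f)
    return store
```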
Before executing `NIM_NVIDIA_CHATBOT.py`, ensure you have the required dependencies installed. Run the following command:
pip install -r requirements.txt
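Once the dependencies are installed, the app can presumably be launched with `streamlit run NIM_NVIDIA_CHATBOT.py` (assuming the script takes no extra arguments), since the demo is hosted on Streamlit.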
- Prerequisite: basic knowledge of deep learning models (PyTorch or TensorFlow).