In this example you'll use Olive to:
- Fine-tune a LoRA adapter to classify phrases into Sad, Joy, Fear, Surprise.
- Merge the adapter weights into the base model.
- Optimize and quantize the model into `int4`.
We'll also show you how to run inference on the fine-tuned model using the ONNX Runtime (ORT) Generate API.
⚠️ For fine-tuning, you'll need a suitable GPU available - for example, an A10, V100, or A100.
Create a new Python virtual environment (for example, using `conda`):

```bash
conda create -n olive-ai python=3.11
conda activate olive-ai
```
Next, install Olive and the dependencies for a fine-tuning workflow:
```bash
cd Phi-3CookBook/code/04.Finetuning/olive-ort-example
pip install olive-ai[gpu]
pip install -r requirements.txt
```
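Before kicking off fine-tuning, it's worth confirming that a CUDA-capable GPU is actually visible from Python. This is an optional sanity check, assuming PyTorch has been installed by the requirements above (it's needed for LoRA fine-tuning):

```python
# Optional sanity check that a CUDA GPU is visible.
# Assumes PyTorch was installed by requirements.txt above.
import torch

if torch.cuda.is_available():
    print(f"CUDA device found: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA device visible - fine-tuning will be very slow or fail.")
```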
The Olive configuration file contains a workflow with the following passes:
`Phi3 -> LoRA -> MergeAdapterWeights -> ModelBuilder`
At a high level, this workflow will:
- Fine-tune Phi3 (for 150 steps, which you can modify) using the `dataset/data-classification.json` data.
- Merge the LoRA adapter weights into the base model. This will give you a single model artifact in the ONNX format.
- Model Builder will optimize the model for the ONNX Runtime and quantize it into `int4`.
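The actual pass graph lives in `phrase-classification.json`. As a rough illustration only - the field names below are abbreviated, vary between Olive versions, and the base model path is a placeholder, so treat this as an assumption and refer to the file in the repo for the real configuration - an Olive workflow config maps each pass to an entry like this:

```json
{
  "input_model": { "type": "HfModel", "model_path": "<path or Hugging Face id of the Phi-3 base model>" },
  "passes": {
    "lora": { "type": "LoRA" },
    "merge": { "type": "MergeAdapterWeights" },
    "builder": { "type": "ModelBuilder", "precision": "int4" }
  },
  "output_dir": "models/lora-merge-mb"
}
```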
To execute the workflow, run:
```bash
olive run --config phrase-classification.json
```
When Olive has completed, your optimized, `int4`-quantized, fine-tuned Phi3 model is available at: `code/04.Finetuning/olive-ort-example/models/lora-merge-mb/gpu-cuda_model`.
To run the app:
```bash
python app/app.py --phrase "cricket is a wonderful sport!" --model-path models/lora-merge-mb/gpu-cuda_model
```
The response should be a single-word classification of the phrase (Sad/Joy/Fear/Surprise).
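Under the hood, `app/app.py` uses the ONNX Runtime Generate API (the `onnxruntime-genai` package) to load the optimized model and generate the classification. The sketch below is illustrative rather than a copy of the repo's script: the exact API surface has shifted between `onnxruntime-genai` releases, and the prompt template and `max_length` value here are assumptions.

```python
# Minimal sketch of generating with the ONNX Runtime Generate API.
# Assumes an onnxruntime-genai ~0.3-style API; see app/app.py in the repo
# for the authoritative implementation.
import onnxruntime_genai as og

model_path = "models/lora-merge-mb/gpu-cuda_model"
phrase = "cricket is a wonderful sport!"

model = og.Model(model_path)      # load the optimized int4 ONNX model
tokenizer = og.Tokenizer(model)

# Phi-3-style chat prompt; the fine-tuned model's exact template may differ.
prompt = f"<|user|>\n{phrase}<|end|>\n<|assistant|>\n"
input_tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.set_search_options(max_length=64)   # a classification needs only a short output
params.input_ids = input_tokens

output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))  # expect one of: Sad / Joy / Fear / Surprise
```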