# Fine-tuning example - from the OK-VQA dataset

Ref: https://wandb.ai/byyoung3/ml-news/reports/How-to-Fine-Tune-LLaVA-on-a-Custom-Dataset--Vmlldzo2NjUwNTc1

1. Set up LLaVA, with the extra libraries needed for training.

   On Unix:

   a. Install the base packages:

   ```shell
   conda create -n llava python=3.10 -y
   conda activate llava
   pip install --upgrade pip  # enable PEP 660 support
   pip install -e .
   ```

   b. Install the additional packages needed for training:

   ```shell
   pip install -e ".[train]"
   pip install flash-attn --no-build-isolation
   ```
2. Install git lfs:

   ```shell
   chmod +x *.sh
   ./install-lfs__sr.sh
   ```
3. Get/create the data (change this step to use your own data!):

   ```shell
   pip install datasets
   python prep_data__OK-VQA__sr.py
   ```
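
   For reference, here is a minimal Python sketch of what a prep step like this typically does: load OK-VQA via the `datasets` library and write out a LLaVA-style conversation JSON plus an image folder. The dataset id, field names, and output paths below are assumptions - check `prep_data__OK-VQA__sr.py` for the actual conversion.

   ```python
   # Illustrative sketch only -- not the repo's script. Dataset id and field
   # names ("question", "answers", "image") are assumptions; verify with
   # print(ds.features) before relying on them.
   import json
   import os

   from datasets import load_dataset

   ds = load_dataset("HuggingFaceM4/OK-VQA", split="train")

   os.makedirs("data/images", exist_ok=True)
   records = []
   for i, ex in enumerate(ds):
       image_name = f"okvqa_{i:06d}.jpg"
       ex["image"].convert("RGB").save(os.path.join("data/images", image_name))
       records.append({
           "id": str(i),
           "image": image_name,
           "conversations": [
               {"from": "human", "value": "<image>\n" + ex["question"]},
               # take the first annotated answer as the target response
               {"from": "gpt", "value": str(ex["answers"][0])},
           ],
       })

   # LLaVA's training code expects one JSON list of conversation records.
   with open("data/train.json", "w") as f:
       json.dump(records, f, indent=2)
   ```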
4. Download the base model weights:

   ```shell
   ./download_llava_weights__sr.sh
   ```
5. Review the training script below, including its comments section. You need to change the values marked `xxx` to match your hardware:

   ```shell
   cat ./train_qlora__wandb.sh
   ```
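
   If you are not sure what your hardware can handle, a quick illustrative check of each visible GPU and its total memory (this only needs the PyTorch install from step 1):

   ```python
   # Illustrative helper: print each CUDA device and its total memory, to help
   # choose the per-device batch size / quantization settings in the script.
   import torch

   if torch.cuda.is_available():
       for i in range(torch.cuda.device_count()):
           props = torch.cuda.get_device_properties(i)
           print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
   else:
       print("No CUDA GPU visible.")
   ```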
6. Execute the script:

   ```shell
   ./train_qlora__wandb.sh
   ```
7. Monitor the progress - if the quality is already high enough, you can stop the training early.
   - If you run out of GPU memory, adjust the script to offload more work to the CPU (see the sketch below).
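
   If the training script drives DeepSpeed (as LLaVA's stock scripts do), CPU offload is usually enabled through the ZeRO config passed via `--deepspeed`. A minimal sketch that writes a ZeRO-3 config with optimizer/parameter offload to CPU - the filename and exact fields are assumptions; merge with whatever config the script already references:

   ```python
   # Sketch: generate a DeepSpeed ZeRO-3 config with CPU offload. Assumes the
   # training script passes a --deepspeed <config.json> argument; adjust or
   # merge with the config it already uses.
   import json

   zero3_offload = {
       "zero_optimization": {
           "stage": 3,
           "offload_optimizer": {"device": "cpu", "pin_memory": True},
           "offload_param": {"device": "cpu", "pin_memory": True},
       },
       "bf16": {"enabled": "auto"},
       "train_micro_batch_size_per_gpu": "auto",
       "gradient_accumulation_steps": "auto",
   }

   with open("zero3_offload.json", "w") as f:
       json.dump(zero3_offload, f, indent=2)
   ```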
8. To test inference with the QLoRA adapter, run this script:

   ```shell
   ./infer_example.sh
   ```

   To infer with a given prompt and image:

   ```shell
   ./infer_qlora_v1.5__wandb.sh <path to image> "my prompt"
   ```

   To infer WITHOUT the LoRA adapter (to see the behaviour BEFORE fine-tuning):

   ```shell
   ./infer_example__no_lora.sh
   ```
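
   Under the hood these wrapper scripts call LLaVA's Python API. A minimal sketch of loading the base weights plus the QLoRA adapter directly - the two paths are assumptions, so check them against your own checkpoints:

   ```python
   # Illustrative sketch -- paths are assumptions, not the repo's defaults.
   from llava.mm_utils import get_model_name_from_path
   from llava.model.builder import load_pretrained_model

   lora_path = "./checkpoints/llava-v1.5-7b-qlora"  # assumed training output dir
   base_path = "liuhaotian/llava-v1.5-7b"           # assumed base model weights

   # load_pretrained_model applies the LoRA weights on top of the base model
   # when the model name contains "lora" and a model_base is given.
   tokenizer, model, image_processor, context_len = load_pretrained_model(
       model_path=lora_path,
       model_base=base_path,
       model_name=get_model_name_from_path(lora_path),
   )
   ```

   From here, prompting with an image follows LLaVA's usual eval flow (e.g. `llava.eval.run_llava`), which is what the scripts above wrap.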