
If you like our project, please give us a star ⭐ on GitHub to stay up to date with the latest releases.

TL;DR

We introduce the Egocentric Video Understanding Dataset (EVUD), an instruction-tuning dataset for training VLMs on video captioning and question answering tasks specific to egocentric videos.

News

  • The AlanaVLM paper is now on arXiv!
  • All the checkpoints developed for this project are available on Hugging Face.
  • The EVUD dataset is available on Hugging Face (see the loading sketch below).
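
As a rough sketch (not the project's documented usage), the dataset can typically be loaded with the Hugging Face datasets library; the repository ID below is a placeholder, so check the project's Hugging Face page for the actual one:

from datasets import load_dataset

# Placeholder repository ID; replace it with the actual EVUD dataset ID
# listed on the project's Hugging Face page.
dataset = load_dataset("your-org/EVUD")

# Inspect the first training example (split names may differ).
print(dataset["train"][0])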

Prerequisites

Create and activate a virtual environment, then install the dependencies:

python -m venv env
source env/bin/activate
pip install -r requirements.txt

Data generation

Together with the generated data released on Hugging Face, we also release all the scripts needed to reproduce our data generation pipeline.

The generated data follows the LLaVA JSON format.
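
For reference, here is a minimal sketch of what a single entry in the LLaVA-style conversation format may look like; the field names (e.g. the "video" key) and example values are assumptions and may differ from the released EVUD files:

import json

# Illustrative instruction-tuning entry in the LLaVA-style conversation format.
# Field names such as "video" and the example values are assumptions and may
# differ from the released EVUD files.
example_entry = {
    "id": "evud_000001",
    "video": "videos/kitchen_clip_0001.mp4",
    "conversations": [
        {"from": "human", "value": "<video>\nWhat am I doing in this video?"},
        {"from": "gpt", "value": "You are chopping vegetables on a cutting board."},
    ],
}

# Such datasets are typically stored as a JSON list of entries.
with open("evud_example.json", "w") as f:
    json.dump([example_entry], f, indent=2)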