
If you like our project, please give us a star ⭐ on GitHub to stay up to date with the latest releases.

TL;DR

We introduce the Egocentric Video Understanding Dataset (EVUD), an instruction-tuning dataset for training VLMs on video captioning and question answering tasks specific to egocentric videos.

News

  • The AlanaVLM paper is now on arXiv!
  • All the checkpoints developed for this project are available on Hugging Face.
  • The EVUD dataset is available on Hugging Face (see the loading sketch below).
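
As a rough sketch (not the project's documented usage), the dataset can typically be loaded with the Hugging Face datasets library; the repository ID below is a placeholder, so check the project's Hugging Face page for the actual one:

from datasets import load_dataset

# Placeholder repository ID; replace it with the actual EVUD dataset ID
# listed on the project's Hugging Face page.
dataset = load_dataset("your-org/EVUD")

# Inspect the first training example (split names may differ).
print(dataset["train"][0])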

Prerequisites

Create and activate a virtual environment, then install the dependencies:

python -m venv env
source env/bin/activate
pip install -r requirements.txt

Data generation

Together with the generated data released on Hugging Face, we also release all the scripts needed to reproduce our data generation pipeline.

The generated data follows the LLaVA JSON format.
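
For reference, here is a minimal sketch of what a single entry in the LLaVA-style conversation format may look like; the field names (e.g. the "video" key) and example values are assumptions and may differ from the released EVUD files:

import json

# Illustrative instruction-tuning entry in the LLaVA-style conversation format.
# Field names such as "video" and the example values are assumptions and may
# differ from the released EVUD files.
example_entry = {
    "id": "evud_000001",
    "video": "videos/kitchen_clip_0001.mp4",
    "conversations": [
        {"from": "human", "value": "<video>\nWhat am I doing in this video?"},
        {"from": "gpt", "value": "You are chopping vegetables on a cutting board."},
    ],
}

# Such datasets are typically stored as a JSON list of entries.
with open("evud_example.json", "w") as f:
    json.dump([example_entry], f, indent=2)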