fork of "TIM: A Time Interval Machine for Audio-Visual Action Recognition"

arjunrs1/AudioGround

 
 

TIM: A Time Interval Machine for Audio-Visual Action Recognition

This repository provides the code used to implement the model proposed in the paper:

Jacob Chalk*, Jaesung Huh*, Evangelos Kazakos, Andrew Zisserman, Dima Damen, TIM: A Time Interval Machine for Audio-Visual Action Recognition, CVPR, 2024

(* indicates equal contribution.)

Project Webpage

ArXiv Paper

Citing

When using this code, please reference:

@InProceedings{Chalk2024TIM,
    author    = {Chalk, Jacob and Huh, Jaesung and Kazakos, Evangelos and Zisserman, Andrew and Damen, Dima},
    title     = {{TIM}: {A} {T}ime {I}nterval {M}achine for {A}udio-{V}isual {A}ction {R}ecognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024}
}

Requirements

The requirements for TIM can be installed in a separate conda environment by running the following command in your terminal:

    conda env create -f environment.yml

You can then activate the environment with:

    conda activate TIM

NOTE: This environment only applies to the recognition and detection folders. Separate requirements are listed for the backbones in the feature_extractors folder.

Features

The features used for this project can be extracted by following the instructions in the feature_extractors folder.

Pre-trained models

You can find links to the relevant pre-trained models in the recognition, feature_extractors and detection folders.

Ground-Truth

We provide the necessary ground-truth files for all datasets here.

The linked zip archive contains ground-truth data for each dataset, consisting of:

  • The training split ground truth
  • The validation split ground truth
  • The video metadata of the dataset
  • The feature time intervals for the training and validation splits

NOTE: These annotation files have been cleaned to be compatible with the TIM codebase.

Training and Evaluating TIM

We provide instructions on how to train and evaluate TIM for both recognition and detection in the respective folders.

License

The code is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, found here.
