We use the ActionFormer detection pipeline as our baseline and replace its I3D features with features extracted by VideoMAE V2-g.
| Dataset | Backbone | Head | mAP | Features |
|---|---|---|---|---|
| THUMOS14 | VideoMAE V2-g | ActionFormer | 69.6 | th14_mae_g_16_4.tar.gz |
| FineAction | VideoMAE V2-g | ActionFormer | 18.2 | fineaction_mae_g.tar.gz |
Use extract_tad_feature.py to extract features from the datasets. For example, to extract the THUMOS14 features, run the following command:
python extract_tad_feature.py \
--data_set THUMOS14 \
--data_path YOUR_PATH/thumos14_videos \
--save_path YOUR_PATH/th14_vit_g_16_4 \
--model vit_giant_patch14_224 \
--ckpt_path YOUR_PATH/vit_g_hyrbid_pt_1200e_k710_ft.pth
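If you prefer to launch the extraction from Python (for example, to script several runs), the command above can be assembled as an argument list. The following is a minimal sketch, not part of the repository: `build_extract_command` and `run_extraction` are hypothetical helper names, and the only flags used are the ones shown in the command above. The `YOUR_PATH` placeholders are kept as-is and must be replaced with real paths.

```python
import subprocess
import sys


def build_extract_command(data_set, data_path, save_path, ckpt_path,
                          model="vit_giant_patch14_224",
                          script="extract_tad_feature.py"):
    """Build the argument list for one extract_tad_feature.py run.

    Hypothetical helper: it only mirrors the flags shown in the
    shell command above and adds nothing new.
    """
    return [
        sys.executable, script,
        "--data_set", data_set,
        "--data_path", data_path,
        "--save_path", save_path,
        "--model", model,
        "--ckpt_path", ckpt_path,
    ]


def run_extraction(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if the script fails.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths copied from the example above.
    cmd = build_extract_command(
        data_set="THUMOS14",
        data_path="YOUR_PATH/thumos14_videos",
        save_path="YOUR_PATH/th14_vit_g_16_4",
        ckpt_path="YOUR_PATH/vit_g_hyrbid_pt_1200e_k710_ft.pth",
    )
    run_extraction(cmd, dry_run=True)
```

With `dry_run=True` the helper only prints the command, which makes it easy to check the paths before starting a long extraction job.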