I spent last 2 hour finally get video retrieval working.... here is the mistake I made! (in case anyone else make the same mistake as me -_-) #181

Open
zmy1116 opened this issue Sep 16, 2024 · 1 comment

Comments

zmy1116 commented Sep 16, 2024

Hello,

First, I want to thank the authors for this work. I think I saw your poster at ICLR in May. Back then I was not working on anything related to video retrieval. Oh well.

So yeah, I had some trouble getting the Jupyter notebook in the demo folder working, because there are a few errors here and there.
I then made some adjustments to make the code run, but I noticed that the outputs were gibberish. In particular, the text embeddings seemed to be wrong. After some investigation, I found my error:

I first got an error when I ran this line:
tokenizer = BertTokenizer.from_pretrained(config.model.text_encoder.pretrained, local_files_only=True)
I assumed it was because I didn't have the right files locally, so I removed local_files_only=True. Then I got an error about the vocab, caused by my having a newer version of transformers installed.

I then noticed that the vocab error goes away if I change from models.backbones.bert.tokenization_bert import BertTokenizer to from transformers import BertTokenizer, so I made that change. Apparently this is what causes the generated text embeddings to be incorrect.

Downgrading transformers to 4.28.1 fixes the error.
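For anyone who wants to catch this before the embeddings go wrong, here is a minimal sketch of a version guard that could sit at the top of the notebook. It assumes 4.28.1 is the version to pin, which is only what is reported in this issue; the helper names (parse_version, is_known_good, installed_transformers_version) are made up for illustration and are not part of InternVideo2 or transformers.

```python
import re
from importlib import metadata

KNOWN_GOOD = "4.28.1"  # version reported to work in this issue (assumption)


def parse_version(version: str) -> tuple:
    """Return the first three numeric components, e.g. '4.28.1' -> (4, 28, 1)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def is_known_good(installed: str, expected: str = KNOWN_GOOD) -> bool:
    """True when the installed version matches the pinned one."""
    return parse_version(installed) == parse_version(expected)


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is missing."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


version = installed_transformers_version()
if version is None:
    print("transformers is not installed")
elif not is_known_good(version):
    print(f"warning: transformers {version} found, this issue reports {KNOWN_GOOD} as working")
```

The downgrade itself would be pip install transformers==4.28.1. The guard only warns; it does not prove that other versions are broken.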

Thanks

@qingy1337
Contributor

I also noticed a lot of people running into the relative imports error. You can fix it by running this code from InternVideo/InternVideo2/multi_modality/:

import sys
import os

# Make the demo package importable; this relies on the current working
# directory being InternVideo/InternVideo2/multi_modality/.
sys.path.append(os.getcwd())

import io

import cv2
import numpy as np
import torch

from demo.config import Config, eval_dict_leaf
from demo.utils import retrieve_text, _frame_from_video, setup_internvideo2

You will still need to edit some imports in demo/utils.py and demo/config.py, but other than that, it should work.
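If the notebook is not started from that directory, os.getcwd() will point somewhere else and the imports will still fail. A rough sketch of a variant that takes the directory explicitly is below. The helper add_to_sys_path is hypothetical, and the relative path assumes the repo was cloned into the current directory, so adjust it to wherever your checkout lives. Note that it inserts at the front of sys.path rather than appending, which is a deliberate difference from the snippet above so that the repo's demo package wins over any other package with the same name.

```python
import os
import sys


def add_to_sys_path(directory: str) -> str:
    """Put `directory` at the front of sys.path (only once) and return its absolute path."""
    path = os.path.abspath(directory)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# Assumed checkout location; change this to match your setup.
MULTI_MODALITY_DIR = os.path.join("InternVideo", "InternVideo2", "multi_modality")
add_to_sys_path(MULTI_MODALITY_DIR)
```

After this runs, the from demo.config and from demo.utils imports should resolve the same way as when running from inside multi_modality/.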
