GPU version support? #14
I think you should be able to. The model itself is pretty simple, so you should be able to load the pre-trained weights without the last layer, but that requires some manual work with forking this repo.
VGGish strictly extracts features every 0.96 seconds, but my image features are extracted every 1 second. Do you have a good way to align the features? I look forward to your suggestions.
You should be able to just crop the 1-second audio to 0.96 seconds.
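A minimal sketch of that cropping step, assuming 16 kHz mono audio (the `crop_to_096` helper and the sample rate are my own assumptions, not part of this repo):

```python
import numpy as np

def crop_to_096(waveform, sample_rate=16000):
    """Keep only the first 0.96 s of every 1-second chunk of audio.

    `waveform` is a 1-D sample array; any trailing partial second is
    dropped. Returns an array of shape (n_seconds, 0.96 * sample_rate).
    """
    keep = int(0.96 * sample_rate)  # 15360 samples at 16 kHz
    n_seconds = len(waveform) // sample_rate
    chunks = waveform[: n_seconds * sample_rate].reshape(n_seconds, sample_rate)
    return chunks[:, :keep]

# Example: 3.5 s of audio at 16 kHz -> 3 cropped chunks of 15360 samples.
audio = np.random.randn(int(3.5 * 16000)).astype(np.float32)
print(crop_to_096(audio).shape)  # (3, 15360)
```

Each cropped chunk can then be fed to VGGish independently, giving one 128-d embedding per video second.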
I'm sorry, I may have described the problem poorly. For example, my video is half an hour long. I select one frame per second and extract image features with ResNet-18, while the audio features come from VGGish. The image features have dimension [30 * 60, 512], but the audio features have dimension [30 * 60 / 0.96, 128]. I want to align the features along the time dimension. What should I do?
I found that a 4-second video does not have this problem, because [4, 512] and [floor(4 / 0.96), 128] == [4, 128] have the same length in time. Any suggestion is welcome, thanks very much.
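One common way to handle the mismatch above is to resample the audio features onto the video's 1-second timeline with linear interpolation. This sketch (the `align_audio_to_video` helper is hypothetical, not from this repo) assumes audio features with a 0.96 s hop and one video frame per second:

```python
import numpy as np

def align_audio_to_video(audio_feats, n_video_frames):
    """Linearly interpolate audio features (hop = 0.96 s) onto a
    1-frame-per-second video timeline.

    audio_feats: (T_audio, D) array. Returns (n_video_frames, D).
    """
    t_audio = np.arange(audio_feats.shape[0]) * 0.96  # audio timestamps (s)
    t_video = np.arange(n_video_frames) * 1.0         # video timestamps (s)
    # np.interp works on 1-D arrays, so interpolate each feature dim.
    aligned = np.stack(
        [np.interp(t_video, t_audio, audio_feats[:, d])
         for d in range(audio_feats.shape[1])],
        axis=1,
    )
    return aligned

# 30 minutes: ~1875 audio frames vs. 1800 video frames.
audio = np.random.randn(int(1800 / 0.96), 128)
print(align_audio_to_video(audio, 1800).shape)  # (1800, 128)
```

Cropping each second to 0.96 s before running VGGish, as suggested above, achieves the same alignment without interpolation.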
@leemengxing this repo is just a port of VGGish to PyTorch. I suggest you ask this question on https://groups.google.com/forum/#!forum/audioset-users - you're more likely to get a useful response from those guys 😄 I'm not really sure how to help with that particular problem other than cropping the audio per second to 0.96 s, as @stevenguh suggested. As the GPU support has been resolved upthread, I'm closing this issue now. Thanks.
I know this is closed, but when I try to send the model to CUDA, I use:

```python
def forward(self, x, fs=None):
    if self.preprocess:
        x = self._preprocess(x, fs)
    # start added code
    if next(self.parameters()).is_cuda:
        x = x.cuda()
    # end added code
    x = VGG.forward(self, x)
    if self.postprocess:
        x = self._postprocess(x)
    return x
```

It's not the most elegant solution, but I am just checking whether the model weights are on CUDA and, if so, moving the input data there too. From my tests so far it seems to work, but please let me know if there is something wrong with this.
@botkevin nothing wrong with that if it works! However, I have realised that the offending line is torchvggish/torchvggish/vggish.py line 148 in e1e2273.
Sending the model to the GPU works fine, but PyTorch will complain. That said, the speedup is not dramatic because most of the time is spent in pre-processing: for a 2-second audio clip that I tested on CPU, 70 milliseconds went to pre-processing the audio file into an array of spectrogram patches, and 20 milliseconds went to inference itself.
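The pre-processing vs. inference split above can be measured with a simple timer. This sketch uses stand-in functions with hypothetical shapes (96x64 log-mel patches, as VGGish uses); the `timed` helper and both stage functions are illustrative assumptions, not the repo's code:

```python
import time
import numpy as np

def timed(fn, *args):
    """Return (result, elapsed_ms) for a single call."""
    t0 = time.perf_counter()
    out = fn(*args)
    return out, (time.perf_counter() - t0) * 1000.0

def preprocess(waveform):
    # Stand-in for log-mel spectrogram patching: two 96x64 patches.
    return np.tile(waveform[: 96 * 64].reshape(1, 96, 64), (2, 1, 1))

def infer(patches):
    # Stand-in for the network's forward pass.
    return patches.mean(axis=(1, 2))

wave = np.random.randn(32000).astype(np.float32)  # ~2 s at 16 kHz
patches, pre_ms = timed(preprocess, wave)
embed, inf_ms = timed(infer, patches)
print(f"preprocess: {pre_ms:.2f} ms, inference: {inf_ms:.2f} ms")
```

With real VGGish pre-processing, `pre_ms` dominating `inf_ms` on short clips is consistent with the numbers reported above.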
Hi, you can check my configuration based on v0.1 at https://github.com/nhattruongpham/torchvggish-gpu. It worked for me because I converted the PCA params tensor to CUDA. Good luck!
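One robust way to avoid the PCA-params device mismatch is to register those tensors as module buffers, so `model.to('cuda')` moves them together with the weights. A hypothetical sketch (this `Postprocessor` class and its shapes are illustrative, not torchvggish's actual code):

```python
import torch
import torch.nn as nn

class Postprocessor(nn.Module):
    """Store PCA matrix and means as buffers so .to()/.cuda() relocates
    them automatically along with the model's parameters."""

    def __init__(self, dim=128):
        super().__init__()
        # register_buffer (rather than plain attributes) is what lets
        # .to(device) and state_dict() track these tensors.
        self.register_buffer("pca_matrix", torch.eye(dim))
        self.register_buffer("pca_means", torch.zeros(dim, 1))

    def forward(self, embeddings):
        # (dim, dim) @ (dim, batch) -> (dim, batch) -> (batch, dim)
        return torch.mm(self.pca_matrix, embeddings.t() - self.pca_means).t()

post = Postprocessor()
out = post(torch.randn(5, 128))
print(out.shape)  # torch.Size([5, 128])
```

After `post.to('cuda')`, both buffers live on the GPU, so no manual `.cuda()` call on the PCA tensors is needed.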
Thank you for your work. Could you add a GPU support option in torch.hub? Another question: does the embedding size have to be fixed at 128, or is there a way to convert it to 2048 dimensions?
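VGGish's final layer fixes the embedding at 128 dimensions, so getting 2048-d features means either tapping an earlier activation or learning a projection on top. A minimal sketch of the projection-head option (the `project` layer is my own addition, not part of the model):

```python
import torch
import torch.nn as nn

# Hypothetical projection head on top of the 128-d VGGish embedding.
# Note this layer is randomly initialised; it would need training
# (or at least a task-specific fine-tune) to produce useful features.
project = nn.Linear(128, 2048)

embedding = torch.randn(10, 128)  # stand-in for a batch of VGGish outputs
features = project(embedding)
print(features.shape)  # torch.Size([10, 2048])
```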