question about the momentum encoder #125

Open
muaz1994 opened this issue Apr 10, 2022 · 1 comment

muaz1994 commented Apr 10, 2022

Hi. Thanks for your work. This is a question related to the paper, not the code. It may be a stupid question, but I would love to hear an explanation from you.

Your main reason for using a momentum encoder is to achieve consistency: if you don't update the key encoder as a slow moving average of the query encoder and instead simply use a copy of the query encoder, the features of previous mini-batches stored in the queue become inconsistent, because the query encoder changes rapidly with each update. So the momentum encoder basically keeps the inconsistency among all these different mini-batch features small.
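
For reference, the momentum update described above is just an exponential moving average of the query encoder's weights. A minimal PyTorch sketch (not the repo's exact code; `encoder_q`, `encoder_k`, and `m=0.999` are placeholder names and the default value from the paper):

```python
import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    # theta_k <- m * theta_k + (1 - m) * theta_q
    # The key encoder drifts slowly, so the embeddings already sitting in
    # the queue stay roughly consistent with newly computed keys.
    for param_q, param_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        param_k.data.mul_(m).add_(param_q.data, alpha=1.0 - m)
```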

My question is: why can't you simply store the input image views themselves as keys (before running them through the key encoder) rather than their features, and then for every update (when you need the negative samples) run these stored image views through a copy of the current query encoder? All of the output features you get would be consistent as well.

@gergopool

Hi!

I am not part of the Facebook AI team, but maybe I can answer your question.
So you suggest using just one network and running it on a large batch of images? I think you are describing SimCLR, but correct me if I'm wrong. The main reason behind using the FIFO queue is that storing 65k embeddings is very cheap. To get a new batch of negative examples, you only need to run a frozen network without calculating gradients: you run it once, and the results require a very small amount of memory (2048 x 4 bytes per sample). On the other hand, storing images requires a lot of memory. Even if you generate them on the fly - just like SimCLR does - you need a very large batch size, and therefore a lot of VRAM and computation, to get an adequate number of negative samples. So to sum up, using an embedding queue is computationally efficient.
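
To put rough numbers on that trade-off, here is a back-of-envelope sketch (the 65,536-entry queue and 2048-dim float32 embeddings follow the figures above; the 224x224x3 uint8 image views are an illustrative assumption, not exact repo settings):

```python
# Rough memory comparison of the two queue designs discussed above.
# Assumptions (illustrative): 65,536 queue entries,
# 2048-dim float32 embeddings vs. 224x224x3 uint8 image views.
QUEUE_SIZE = 65_536

embedding_bytes = QUEUE_SIZE * 2048 * 4      # float32 feature vectors
image_bytes = QUEUE_SIZE * 3 * 224 * 224     # raw uint8 pixels

print(f"embedding queue: {embedding_bytes / 2**20:.0f} MiB")  # 512 MiB
print(f"image queue:     {image_bytes / 2**30:.1f} GiB")      # ~9.2 GiB
```

And beyond storage, re-encoding every stored view at each update would cost on the order of queue_size / batch_size extra forward passes per iteration, whereas the embedding queue only holds features that were computed once.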
