Questions #2

Open
zer0n opened this issue Jan 24, 2016 · 1 comment

Comments

@zer0n

zer0n commented Jan 24, 2016

Here (https://github.com/jazzsaxmafia/show_and_tell.tensorflow/blob/master/model.py#L55): if I understand your code correctly, you use the FC7 layer output of a pretrained VGG net as input to your model. However, your model has another trainable layer that computes the embedding from FC7. Is that correct? Can't you just use FC7 as the embedding layer?

@jazzsaxmafia
Owner

I am not sure what the authors were thinking, but from my point of view there
could be two reasons.

  1. In Show and Tell, the image is treated like the first word of a sentence.
    So you want to embed the image feature vector once more, so that it lands
    in the same semantic (text) space as the word embeddings.
  2. The dimension of FC7 (4,096) is too large. Since the image vector and the
    word vectors need to have the same dimension, using FC7 directly would
    force the word embeddings to be 4,096-D as well, which leads to too many
    weight parameters to train.
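
To put rough numbers on the second point, here is a back-of-the-envelope sketch, not the repository's actual configuration. The vocabulary size of 10,000, the 256-D embedding, and the choice of an LSTM hidden size equal to the embedding size are all assumptions made for illustration; only the 4,096-D FC7 size comes from the discussion above.

```python
# Rough parameter counts for a Show-and-Tell-style captioner.
# VOCAB_SIZE and the 256-D embedding are assumed values for illustration,
# not taken from the repository.

FC7_DIM = 4096       # size of the VGG FC7 feature vector
VOCAB_SIZE = 10_000  # hypothetical vocabulary size


def lstm_params(input_dim, hidden_dim):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (input_dim + hidden_dim + 1) * hidden_dim


def captioner_params(embed_dim, project_image):
    hidden_dim = embed_dim  # assumption: hidden state matches embedding size
    # Trainable layer mapping FC7 into the embedding space (weights + bias).
    image_proj = FC7_DIM * embed_dim + embed_dim if project_image else 0
    word_embed = VOCAB_SIZE * embed_dim
    # Output layer mapping the hidden state to vocabulary logits.
    output = hidden_dim * VOCAB_SIZE + VOCAB_SIZE
    return image_proj + word_embed + lstm_params(embed_dim, hidden_dim) + output


# FC7 fed in directly, so every word vector must also be 4,096-D.
without_projection = captioner_params(FC7_DIM, project_image=False)
# FC7 projected down to 256-D by one extra trainable layer.
with_projection = captioner_params(256, project_image=True)

print(f"FC7 used directly:   {without_projection:,} parameters")
print(f"FC7 projected to 256: {with_projection:,} parameters")
```

Under these assumptions, the extra projection layer costs about one million parameters but shrinks every other part of the model, since the word embedding, LSTM, and output layer all scale with the embedding size.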

-Taeksoo

2016-01-24 20:17 GMT+09:00 Kenneth Tran:

> Here (https://github.com/jazzsaxmafia/show_and_tell.tensorflow/blob/master/model.py#L55):
> if I understand your code correctly, you use the FC7 layer output of a
> pretrained VGG net as input to your model. However, your model has another
> trainable layer to compute the embedding from FC7. Is that correct? Can't
> you just use FC7 as the embedding layer?

