Cannot get better results like README #54

Open · ghost opened this issue Jul 26, 2020 · 25 comments

ghost commented Jul 26, 2020

Hi,
I really appreciate your implementation.

I tried to run your program, but my results are very poor.
Here are my results:

[Screenshot 2020-07-27 1 02 39]

[Screenshot 2020-07-27 1 50 24]

I ran train.py for 28 epochs, then embedder_inference.py, and finally finetuning_training.py for 150 epochs.
Did I skip some important training phase?

If you have any suggestions about this result, please give me some advice.

(I'm not a native English speaker, so this issue may contain grammatical mistakes. I apologize for my poor English.)

Thanks,
Nekomo


Jarvisss commented Jul 27, 2020

Did you train on the full dataset?

I suggest posting more results here and giving more information about your training; this may be useful for others when helping you.

I have also had trouble getting good results.

My model was trained on the full dev set for ~4 epochs on 3 GTX 2080 Ti GPUs with K=8; the result is blurry, and I haven't run inference yet.

[image]


ghost commented Jul 27, 2020

@Jarvisss
Thanks for your comment.

Did you train on the full dataset?

Yes, I downloaded the full VoxCeleb2 dataset, which contains 18588 items,
and I trained my model on the full dataset with 1 GTX 2080 Ti, with K=8.

It seems that your batch size was much larger than mine (my batch size was only 2...).
I didn't pay attention to that, and it may be the cause.
I'm going to raise the batch size and train again.

Thank you.

Nekomo


Jarvisss commented Jul 27, 2020

Yes, I downloaded the full VoxCeleb2 dataset, which contains 18588 items,

For me, I actually got 5994 speakers and 145,569 videos in the VoxCeleb2 dev set.

It seems that your batch size was much larger than mine (my batch size was only 2...).

I use a batch size of 6, since I have 3 GPUs with 2 samples each.

The results I showed may be misleading: 'batch' there should read 'step'.
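(For illustration only, a minimal torch.nn.DataParallel sketch of how a global batch of 6 splits across 3 GPUs; this is an assumed setup, not necessarily how this repo parallelizes training, and it needs 3 CUDA devices to run:)

import torch
import torch.nn as nn

# toy module standing in for the real model; DataParallel scatters the batch across GPUs
model = nn.DataParallel(nn.Linear(10, 10).cuda(), device_ids=[0, 1, 2])
batch = torch.randn(6, 10).cuda()  # global batch of 6 -> 2 samples per GPU
out = model(batch)                 # each replica processes its 2-sample slice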


ghost commented Jul 27, 2020

@Jarvisss
Thank you for your replies.
Your suggestions are really helpful for me.

For me, I actually got 5994 speakers and 145,569 videos in the VoxCeleb2 dev set.

Sorry, I misunderstood the dataset size.
The 18588 is the length of the DataLoader built on the already preprocessed data:

print(len(dataset))     # 37176
print(len(dataLoader))  # 18588
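(For reference, len(dataLoader) is just the number of batches per epoch, so with my batch size of 2 it is 37176 / 2 = 18588. A minimal standalone sketch, not this repo's code:)

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.zeros(37176, 1))  # stand-in for the preprocessed dataset
dataLoader = DataLoader(dataset, batch_size=2)
print(len(dataset))     # 37176 samples
print(len(dataLoader))  # 18588 batches per epoch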

Actually, my dataset contains

  • 283101 *.mp4 files
  • 37181 video folders
  • 1563 speakers

and dev_mp4.zip's md5 checksum does not match the official dev md5.

Maybe my VoxCeleb2 dataset is broken...
I will try to rebuild the VoxCeleb2 dataset from scratch.

The results I showed may be misleading: 'batch' there should read 'step'.

Does 'step' mean an iteration within an epoch?
(I'm sorry to be so inquisitive.)

And, if you do not mind, could you give me your pretrained model?
Of course, I will not use it in my study without your permission.

Thanks,
Nekomo


Jarvisss commented Jul 27, 2020

@nekomo

Does 'step' mean an iteration within an epoch?

Yes, it does.

And, if you do not mind, could you give me your pretrained model?

Of course. How can I give you my model?


ghost commented Jul 28, 2020

@Jarvisss

Of course. How can I give you my model?

Thanks a lot.
Please upload your model to cloud storage and share its URL, like this implementation's pretrained model. That method works best for me.

Nekomo


Jarvisss commented Aug 8, 2020

@nekomo Hi, sorry for the late reply. I trained my model for another 10 epochs and got results comparable to @vincent-thevenin's.

The result can be seen here.

But when I ran embedder_inference.py and finetuning_training.py, I also got ugly results.

And if I do not finetune the model and just feed forward directly, I get results like this:
[image]

There must be something wrong; I'm still debugging now.

And I wonder what your result looks like without finetuning; could you please share it?

Jarvisss


Jarvisss commented Aug 9, 2020

I figured it out: the network is trained on 224 x 224, but the code in embedder_inference.py and video_inference.py crops the input to 256 x 256, which in my case causes the ugly result.

And if I do not finetune the model and just feed forward directly, I get results like this:
[image]

The issue above was caused by my own mistake: I forgot to set finetuning=False for feed-forward prediction.

Here's my feed-forward result:

I will try to finetune the model and comment later.

Update:
Finetuned the model for 40 epochs.

@mingkaihu

Hi Jarvisss,
Thanks for sharing. Your result looks stunning. I was wondering if you could share the steps for reference?
Regards,
Mingkai


Jarvisss commented Aug 26, 2020

@mingkaihu
Hi Mingkai,
The steps for me are as follows:

  1. run preprocess.py to get a lighter dataset
  2. run train.py
  3. run embedder_inference.py to get the embedding vector e_hat
  4. run finetuning_training.py to finetune the model, using the e_hat obtained in step 3
  5. run video_inference.py to get the result

You may first skip step 4 to see if the result is reasonable. If it is, then do the finetuning and run step 5 again to see if the result improves (a minimal sketch of the whole sequence follows below).
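A minimal end-to-end sketch of that sequence (illustration only; whether each script needs extra arguments or config edits depends on your local setup):

import subprocess

# run the repo's scripts in order; step 4 (finetuning) can be skipped on a first pass
for script in [
    "preprocess.py",           # 1. build the lighter preprocessed dataset
    "train.py",                # 2. train the model
    "embedder_inference.py",   # 3. compute the embedding vector e_hat
    "finetuning_training.py",  # 4. finetune using the e_hat from step 3 (optional at first)
    "video_inference.py",      # 5. generate the final result
]:
    subprocess.run(["python", script], check=True)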

Good luck,
Jarvisss


ghost commented Aug 28, 2020

@Jarvisss
I'm sorry for the late reply, and thank you for your suggestions.

And I wonder what your result looks like without finetuning; could you please share it?

I only ran embedder_inference.py and finetuning_training.py, so I never tried running the model without finetuning. I will try that as well.

I figured it out: the network is trained on 224 x 224, but the code in embedder_inference.py and video_inference.py crops the input to 256 x 256, which in my case causes the ugly result.

Got it. I'll modify my local code too.

Thank you,
Nekomo

@mingkaihu


Thanks a lot for your feedback, Jarvisss.
Regards,
Mingkai


tengshaofeng commented Sep 9, 2020

@Jarvisss Hi, I really appreciate your great work. Where is your newest code? It is https://github.com/Jarvisss/Realistic-Neural-Talking-Head-Models, right? I know you changed embedder_inference.py from 256 to 224:

I figured it out: the network is trained on 224 x 224, but the code in embedder_inference.py and video_inference.py crops the input to 256 x 256, which in my case causes the ugly result.

But when I read the code at https://github.com/Jarvisss/Realistic-Neural-Talking-Head-Models, it is still 256.
So can you share your code with me? Thanks so much.


Jarvisss commented Sep 10, 2020

@tengshaofeng
Yes, you are right. https://github.com/Jarvisss/Realistic-Neural-Talking-Head-Models is my implementation, with a few changes to the original code.

the network is trained on 224 x 224

The network is actually trained on 256 x 256 inputs, where the real content is 224 x 224 and the rest is zero padding up to 256 x 256.

So the network still takes 256 x 256 as input, but I changed the test-time code to crop the image to 224 x 224 and pad it to 256 x 256 just like in training, instead of cropping to 256 x 256 without padding.
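A minimal sketch of that test-time preprocessing (center crop and center padding are assumptions for illustration; the repo's crop_and_reshape_img uses the detected landmarks, so the exact placement may differ):

import numpy as np

def crop_and_pad(img, crop_size=224, out_size=256):
    # img: HxWx3 array, assumed to be at least 224x224 and roughly centered on the face
    h, w = img.shape[:2]
    top, left = (h - crop_size) // 2, (w - crop_size) // 2
    cropped = img[top:top + crop_size, left:left + crop_size]
    # zero-pad the 224x224 crop back up to the 256x256 input size the network expects
    padded = np.zeros((out_size, out_size, 3), dtype=img.dtype)
    offset = (out_size - crop_size) // 2
    padded[offset:offset + crop_size, offset:offset + crop_size] = cropped
    return padded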


tengshaofeng commented Sep 16, 2020

@Jarvisss, sorry, I am confused now. Should I change it from 256 to 224 in embedder_inference.py, finetuning_training.py, and webcam_inference.py? What is the difference between the code in the master branch and yours?


ghost commented Sep 23, 2020

@Jarvisss
Hello,
Thanks for your advice and your forked branch; I reproduced results like yours.
[Screenshot 2020-09-23 16 00 58]
Honestly, I still haven't understood why I got the ugly results, so I will diff your repo against this one and try to understand why.
It seems that other developers are still discussing, so I will keep this issue open.

I really appreciate your support.

Thanks,
Nenoko (Nekomo)


Jarvisss commented Sep 23, 2020

@Jarvisss, sorry, I am confused now. Should I change it from 256 to 224 in embedder_inference.py, finetuning_training.py, and webcam_inference.py? What is the difference between the code in the master branch and yours?

@tengshaofeng Sorry for the late reply. The code in my forked version (Jarvisss@da30930) was created for the purpose of a PR, and the crop code was not added to that commit.

By the way, what you should do is crop the images to (224, 224) in webcam_demo/webcam_extraction_conversion.py, in the function generate_landmarks, like this:

if input.shape[0] == input.shape[1] and input.shape[0] == 224:
    pass  # already 224x224, nothing to do
else:
    # crop both the image and its landmark predictions down to 224x224
    input = crop_and_reshape_img(input, preds, pad=pad, out_shape=224)
    preds = crop_and_reshape_preds(preds, pad=pad, out_shape=224)

to make it consistent with the training data.

yours,
jarvisss

@Jarvisss

@Nenoko
You may have a look at another issue; some ugly results may come from very different landmark shapes between the driving and source faces.
#12 (comment)

@lastapple

@Jarvisss Hi, can you upload your trained checkpoints to Google Drive to share with us? Thanks!


tengshaofeng commented Dec 11, 2020

@Jarvisss thanks for your reply.

  1. I put a new image into examples/fine_tuning/test_images and ran embedder_inference.py to get e_hat_images.tar.
  2. Then I ran finetuning_training.py with the new image and e_hat_images.tar to get finetuned_model.tar; the total number of epochs was 40.
  3. Finally, I extracted the landmark images from examples/fine_tuning/test_video.mp4 and generated fake faces given e_hat_images.tar and finetuned_model.tar.

Is this right?

I got results like the following:
[result image]
and the given new image is the following:
[input image]

I do not think the result is good. Can you give me some advice?

@Jarvisss

@tengshaofeng

What's the problem with your result?


tengshaofeng commented Dec 11, 2020

@Jarvisss Can you see my shared images? Do you think there are mistakes in my steps?

@Jarvisss

@Jarvisss Can you see my shared images? Do you think there are mistakes in my steps?

I see your result, but I don't understand what the problem is from the images you provided.

The steps are: first embed the image to a code, then fine-tune, and then run inference, as the author suggests in the README. Can you share the landmarks used for inference?


tengshaofeng commented Dec 14, 2020

The steps are: first embed the image to a code, then fine-tune, and then run inference, as the author suggests in the README. Can you share the landmarks used for inference?

The landmarks are in the middle of the image: [image]


ganqi91 commented Jan 18, 2021

@Jarvisss Hi, your results are so cool, but I want to know: if you do not finetune, what do the results look like?
Could you share some of your results?

And could you share your pretrained model weights?
