Cannot get better results like README #54

Open · ghost opened this issue Jul 26, 2020 · 25 comments

ghost commented Jul 26, 2020

Hi,
I really appreciate your implementation.

I tried to run your program, but my results are very poor.
Here are my results:

[Screenshot 2020-07-27 1 02 39]

[Screenshot 2020-07-27 1 50 24]

I ran train.py for 28 epochs, then embedder_inference.py, and finally finetuning_training.py for 150 epochs.
Did I skip some important training phase?

If you have any suggestions about this result, please give me some advice.

(I'm not a native English speaker, so this issue may contain grammatical mistakes. I apologize for my poor English.)

Thanks,
Nekomo


Jarvisss commented Jul 27, 2020

Did you train on the full dataset?

I suggest posting more results here and giving more information about your training; this may be useful for others when helping you.

I have also had trouble getting good results.

My model was trained on the full dev set for ~4 epochs on 3 GTX 2080 Ti GPUs with K=8; the result is blurry, and I haven't run inference yet.

[image]


ghost commented Jul 27, 2020

@Jarvisss
Thanks for your comment.

Did you train on the full dataset?

Yes, I downloaded the full VoxCeleb2 dataset, which contains 18588 items,
and I trained my model on the full dataset with 1 GTX 2080 Ti, with K=8.

It seems that your batch size was much larger than mine (my batch size was only 2...).
I didn't pay attention to that, and it may be the cause.
I'm going to raise the batch size and train again.

Thank you.

Nekomo


Jarvisss commented Jul 27, 2020

Yes, I downloaded the full VoxCeleb2 dataset, which contains 18588 items,

For me, I actually got 5994 speakers and 145,569 videos in the VoxCeleb2 dev set.

It seems that your batch size was much larger than mine (my batch size was only 2...).

I use a batch size of 6, since I have 3 GPUs with 2 samples each.

The results I showed may be misleading: 'batch' there should read 'step'.
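(For illustration only, a minimal torch.nn.DataParallel sketch of how a global batch of 6 splits across 3 GPUs; this is an assumed setup, not necessarily how this repo parallelizes training, and it needs 3 CUDA devices to run:)

import torch
import torch.nn as nn

# toy module standing in for the real model; DataParallel scatters the batch across GPUs
model = nn.DataParallel(nn.Linear(10, 10).cuda(), device_ids=[0, 1, 2])
batch = torch.randn(6, 10).cuda()  # global batch of 6 -> 2 samples per GPU
out = model(batch)                 # each replica processes its 2-sample slice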


ghost commented Jul 27, 2020

@Jarvisss
Thank you for your replies.
Your suggestions are really helpful for me.

For me, I actually got 5994 speakers and 145,569 videos in the VoxCeleb2 dev set.

Sorry, I misunderstood the dataset size.
The 18588 is the length of the DataLoader built on the already preprocessed data:

print(len(dataset))     # 37176
print(len(dataLoader))  # 18588
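(For reference, len(dataLoader) is just the number of batches per epoch, so with my batch size of 2 it is 37176 / 2 = 18588. A minimal standalone sketch, not this repo's code:)

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.zeros(37176, 1))  # stand-in for the preprocessed dataset
dataLoader = DataLoader(dataset, batch_size=2)
print(len(dataset))     # 37176 samples
print(len(dataLoader))  # 18588 batches per epoch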

Actually, my dataset contains

  • 283101 *.mp4 files
  • 37181 video folders
  • 1563 speakers

and dev_mp4.zip's md5 checksum does not match the official dev md5.

Maybe my VoxCeleb2 dataset is broken...
I will try to rebuild the VoxCeleb2 dataset from scratch.

The results I showed may be misleading: 'batch' there should read 'step'.

Does 'step' mean an iteration within an epoch?
(I'm sorry to be so inquisitive.)

And, if you do not mind, could you give me your pretrained model?
Of course, I will not use it in my study without your permission.

Thanks,
Nekomo


Jarvisss commented Jul 27, 2020

@nekomo

Does 'step' mean an iteration within an epoch?

Yes, it does.

And, if you do not mind, could you give me your pretrained model?

Of course. How can I give you my model?


ghost commented Jul 28, 2020

@Jarvisss

Of course. How can I give you my model?

Thanks a lot.
Please upload your model to cloud storage and share its URL, like this implementation's pretrained model. That method works best for me.

Nekomo


Jarvisss commented Aug 8, 2020

@nekomo Hi, sorry for the late reply. I trained my model for another 10 epochs and got results comparable to @vincent-thevenin's.

The result can be seen here.

But when I ran embedder_inference.py and finetuning_training.py, I also got ugly results.

And if I do not finetune the model and just feed forward directly, I get results like this:
[image]

There must be something wrong; I'm still debugging now.

And I wonder what your result looks like without finetuning; could you please share it?

Jarvisss


Jarvisss commented Aug 9, 2020

I figured it out: the network is trained on 224 x 224, but the code in embedder_inference.py and video_inference.py crops the input to 256 x 256, which in my case causes the ugly result.

And if I do not finetune the model and just feed forward directly, I get results like this:
[image]

The issue above was caused by my own mistake: I forgot to set finetuning=False for feed-forward prediction.

Here's my feed-forward result:

I will try to finetune the model and comment later.

Update:
Finetuned the model for 40 epochs.

@mingkaihu

Hi Jarvisss,
Thanks for sharing. Your result looks stunning. I was wondering if you could share the steps for reference?
Regards,
Mingkai


Jarvisss commented Aug 26, 2020

@mingkaihu
Hi Mingkai,
The steps for me are as follows:

  1. run preprocess.py to get a lighter dataset
  2. run train.py
  3. run embedder_inference.py to get the embedding vector e_hat
  4. run finetuning_training.py to finetune the model, using the e_hat obtained in step 3
  5. run video_inference.py to get the result

You may first skip step 4 to see if the result is reasonable. If it is, then do the finetuning and run step 5 again to see if the result improves (a minimal sketch of the whole sequence follows below).
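A minimal end-to-end sketch of that sequence (illustration only; whether each script needs extra arguments or config edits depends on your local setup):

import subprocess

# run the repo's scripts in order; step 4 (finetuning) can be skipped on a first pass
for script in [
    "preprocess.py",           # 1. build the lighter preprocessed dataset
    "train.py",                # 2. train the model
    "embedder_inference.py",   # 3. compute the embedding vector e_hat
    "finetuning_training.py",  # 4. finetune using the e_hat from step 3 (optional at first)
    "video_inference.py",      # 5. generate the final result
]:
    subprocess.run(["python", script], check=True)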

Good luck,
Jarvisss


ghost commented Aug 28, 2020

@Jarvisss
I'm sorry for the late reply, and thank you for your suggestions.

And I wonder what your result looks like without finetuning; could you please share it?

I only ran embedder_inference.py and finetuning_training.py, so I never tried running the model without finetuning. I will try that as well.

I figured it out: the network is trained on 224 x 224, but the code in embedder_inference.py and video_inference.py crops the input to 256 x 256, which in my case causes the ugly result.

Got it. I'll modify my local code too.

Thank you,
Nekomo

@mingkaihu


Thanks a lot for your feedback, Jarvisss.
Regards,
Mingkai


tengshaofeng commented Sep 9, 2020

@Jarvisss Hi, I really appreciate your great work. Where is your newest code? It is https://github.com/Jarvisss/Realistic-Neural-Talking-Head-Models, right? I know you changed embedder_inference.py from 256 to 224:

I figured it out: the network is trained on 224 x 224, but the code in embedder_inference.py and video_inference.py crops the input to 256 x 256, which in my case causes the ugly result.

But when I read the code at https://github.com/Jarvisss/Realistic-Neural-Talking-Head-Models, it is still 256.
So can you share your code with me? Thanks so much.


Jarvisss commented Sep 10, 2020

@tengshaofeng
Yes, you are right. https://github.com/Jarvisss/Realistic-Neural-Talking-Head-Models is my implementation, with a few changes to the original code.

the network is trained on 224 x 224

The network is actually trained on 256 x 256 inputs, where the real content is 224 x 224 and the rest is zero padding up to 256 x 256.

So the network still takes 256 x 256 as input, but I changed the test-time code to crop the image to 224 x 224 and pad it to 256 x 256 just like in training, instead of cropping to 256 x 256 without padding.
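A minimal sketch of that test-time preprocessing (center crop and center padding are assumptions for illustration; the repo's crop_and_reshape_img uses the detected landmarks, so the exact placement may differ):

import numpy as np

def crop_and_pad(img, crop_size=224, out_size=256):
    # img: HxWx3 array, assumed to be at least 224x224 and roughly centered on the face
    h, w = img.shape[:2]
    top, left = (h - crop_size) // 2, (w - crop_size) // 2
    cropped = img[top:top + crop_size, left:left + crop_size]
    # zero-pad the 224x224 crop back up to the 256x256 input size the network expects
    padded = np.zeros((out_size, out_size, 3), dtype=img.dtype)
    offset = (out_size - crop_size) // 2
    padded[offset:offset + crop_size, offset:offset + crop_size] = cropped
    return padded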


tengshaofeng commented Sep 16, 2020

@Jarvisss, sorry, I am confused now. Should I change it from 256 to 224 in embedder_inference.py, finetuning_training.py, and webcam_inference.py? What is the difference between the code in the master branch and yours?


ghost commented Sep 23, 2020

@Jarvisss
Hello,
Thanks for your advice and your forked branch; I reproduced results like yours.
[Screenshot 2020-09-23 16 00 58]
Honestly, I still haven't understood why I got the ugly results, so I will diff your repo against this one and try to understand why.
It seems that other developers are still discussing, so I will keep this issue open.

I really appreciate your support.

Thanks,
Nenoko (Nekomo)


Jarvisss commented Sep 23, 2020

@Jarvisss, sorry, I am confused now. Should I change it from 256 to 224 in embedder_inference.py, finetuning_training.py, and webcam_inference.py? What is the difference between the code in the master branch and yours?

@tengshaofeng Sorry for the late reply. The code in my forked version (Jarvisss@da30930) was created for the purpose of a PR, and the crop code was not added to that commit.

By the way, what you should do is crop the images to (224, 224) in webcam_demo/webcam_extraction_conversion.py, in the function generate_landmarks, like this:

if input.shape[0] == input.shape[1] and input.shape[0] == 224:
    pass  # already 224x224, nothing to do
else:
    # crop both the image and its landmark predictions down to 224x224
    input = crop_and_reshape_img(input, preds, pad=pad, out_shape=224)
    preds = crop_and_reshape_preds(preds, pad=pad, out_shape=224)

to make it consistent with the training data.

yours,
jarvisss

@Jarvisss

@Nenoko
You may have a look at another issue; some ugly results may come from very different landmark shapes between the driving and source faces.
#12 (comment)

@lastapple

@Jarvisss Hi, can you upload your trained checkpoints to Google Drive to share with us? Thanks!


tengshaofeng commented Dec 11, 2020

@Jarvisss thanks for your reply.

  1. I put a new image into examples/fine_tuning/test_images and ran embedder_inference.py to get e_hat_images.tar.
  2. Then I ran finetuning_training.py with the new image and e_hat_images.tar to get finetuned_model.tar; the total number of epochs was 40.
  3. Finally, I extracted the landmark images from examples/fine_tuning/test_video.mp4 and generated fake faces given e_hat_images.tar and finetuned_model.tar.

Is this right?

I got results like the following:
[result image]
and the given new image is the following:
[input image]

I do not think the result is good. Can you give me some advice?

@Jarvisss

@tengshaofeng

What's the problem with your result?


tengshaofeng commented Dec 11, 2020

@Jarvisss Can you see my shared images? Do you think there are mistakes in my steps?

@Jarvisss

@Jarvisss Can you see my shared images? Do you think there are mistakes in my steps?

I see your result, but I don't understand what the problem is from the images you provided.

The steps are: first embed the image to a code, then fine-tune, and then run inference, as the author suggests in the README. Can you share the landmarks used for inference?


tengshaofeng commented Dec 14, 2020

The steps are: first embed the image to a code, then fine-tune, and then run inference, as the author suggests in the README. Can you share the landmarks used for inference?

The landmarks are in the middle of the image: [image]


ganqi91 commented Jan 18, 2021

@Jarvisss Hi, your results are so cool, but I want to know: if you do not finetune, what do the results look like?
Could you share some of your results?

And could you share your pretrained model weights?
