Why use blend_models.py for model conversion? #1

Open
watertianyi opened this issue Mar 28, 2023 · 5 comments

Comments

@watertianyi

watertianyi commented Mar 28, 2023

Why do you need to use blend_models.py for model conversion? Didn't the CCN training already include the layer conversion? I followed your steps to train the CCN and then ran blend_models.py; my image size is 256.
LeslieZhoa/DCT-NET.Pytorch#11
[Screenshot 2023-03-28 16-24-18]

[Screenshot 2023-03-28 16-24-54]
Figure 1 is the image generated by CCN training.
Figure 2 is the image generated by CCN + blend_models.py.

@SShowbiz
Owner

SShowbiz commented Mar 30, 2023

> Why do you need to use blend_models.py for model conversion? Didn't the CCN training already include the layer conversion? I followed your steps to train the CCN and then ran blend_models.py; my image size is 256.
> LeslieZhoa/DCT-NET.Pytorch#11

Thank you for your interest in my project. I used the unofficial PyTorch implementation of DCT-Net that you mentioned.

  1. I used blend_models.py to guarantee the variety of the generated dataset (generated by $G_t$). You can still apply the style without using blend_models.py (in fact, the style is more emphasized when you remove the process), but the characters may look similar regardless of the source image. To wrap up, I use blend_models.py to apply the style naturally while preserving the source identity.

  2. Where can I find the layer conversion in get_tcc_input.py? The model blending described in the original DCT-Net paper is only applied at the inference stage; the layers of $G_t$ and $G_s$ have to be mixed, as you mentioned. I actually couldn't find the corresponding part in the unofficial PyTorch implementation of DCT-Net, so I manually took code from another implementation to implement the part described in the paper (see the sketch after this list).

  3. If your image size is 256, please check whether the pretrained StyleGAN model used in the CCN expects $1024 \times 1024$, and check your config in DCTNet/model/stylegANModule/config.py.
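As a rough illustration of the layer blending mentioned in point 2: the sketch below is not the actual blend_models.py, it only assumes rosinality-style StyleGAN2 checkpoints (convs.N / to_rgbs.N naming), and the choice of which resolutions come from which generator is just one possible trade-off between identity preservation and stylization.

```python
import re
import torch

def blend_generators(sd_source, sd_finetuned, swap_resolution=32):
    """Rough sketch of resolution-dependent weight blending.

    Layers operating below `swap_resolution` are taken from the source
    generator G_s (they mostly control pose and coarse structure), while
    the remaining layers come from the fine-tuned generator G_t (they
    mostly carry the style texture). The naming scheme handled below is
    an assumption for rosinality-style StyleGAN2 checkpoints.
    """
    blended = {}
    for name, w_src in sd_source.items():
        w_ft = sd_finetuned[name]
        m = re.search(r"(convs|to_rgbs)\.(\d+)", name)
        if m is None:
            # mapping network, constant input, 4x4 conv1/to_rgb1, etc.
            blended[name] = w_ft
            continue
        idx = int(m.group(2))
        # convs come in pairs per resolution starting at 8x8; to_rgbs has one entry per resolution
        res = 8 * 2 ** (idx // 2) if m.group(1) == "convs" else 8 * 2 ** idx
        blended[name] = w_src if res < swap_resolution else w_ft
    return blended

# Usage sketch (checkpoint paths are illustrative, not from the repo):
# sd_s = torch.load("stylegan2_ffhq.pt")["g_ema"]
# sd_t = torch.load("stylegan2_style_finetuned.pt")["g_ema"]
# torch.save({"g_ema": blend_generators(sd_s, sd_t)}, "blended_generator.pt")
```

Generating with the blended checkpoint then keeps the coarse structure of $G_s$ while the higher-resolution layers of $G_t$ supply the style texture.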

I hope my answer is helpful to you!

@watertianyi
Author

@SShowbiz Thank you for your prompt reply!

  1. The official DCT-Net code makes the process of the CCN stage relatively clear. In my understanding, this stage should be:
    1.1 Use StyleGAN to train on the aligned style face data to obtain a style model;
    1.2 Mix the layers of the style model obtained above with the FFHQ model to obtain a hybrid model;
    1.3 Use the hybrid model to generate paired data: real face images and the corresponding style face images (see the sketch after this list).
    Based on the above, I don't quite understand the difference between the CCN code and plain StyleGAN training, and the generated image pairs have very low similarity.

  2. I did run get_tcc_input.py directly with the CCN model trained with your code, but an error was reported when I ran it. I thought your CCN code already included the layer-blending conversion; if it didn't, that means it was missed.

  3. The last thing I want to know: is the CCN stage the StyleGAN style-model training stage + layer-mixing stage + paired-data generation stage, and is the TTN stage pix2pix training on the paired data?
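For step 1.3, the paired-data generation can be pictured as rendering the same latent with both generators. A minimal sketch, assuming a rosinality-style StyleGAN2 Generator interface; the function name and truncation value are illustrative, not taken from the repo:

```python
import torch

@torch.no_grad()
def generate_pair(g_source, g_hybrid, latent_dim=512, truncation=0.7, device="cuda"):
    """Sketch: sample one latent and render it with both generators, so the
    outputs form an aligned (real face, style face) training pair."""
    z = torch.randn(1, latent_dim, device=device)
    w = g_source.style(z)                # map z -> w with the source mapping network
    w_mean = g_source.mean_latent(4096)  # truncate toward the average latent
    w = w_mean + truncation * (w - w_mean)
    real_img, _ = g_source([w], input_is_latent=True)   # G_s: realistic face
    style_img, _ = g_hybrid([w], input_is_latent=True)  # hybrid/blended G_t: stylized face
    return real_img, style_img
```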

@watertianyi
Author

watertianyi commented Mar 30, 2023

@SShowbiz

I have trained for 434 epochs. Can I stop the training now? I see you set 10,000 epochs.
[Screenshot 2023-03-30 17-45-07]

[Screenshot 2023-03-30 17-44-46]

@SShowbiz
Owner

> @SShowbiz Thank you for your prompt reply!
>
> 1. The official DCT-Net code makes the process of the CCN stage relatively clear. In my understanding, this stage should be: use StyleGAN to train on the aligned style face data to obtain a style model; mix the layers of that style model with the FFHQ model to obtain a hybrid model; use the hybrid model to generate paired data of real face images and corresponding style face images. Based on the above, I don't quite understand the difference between the CCN code and plain StyleGAN training, and the generated image pairs have very low similarity.
> 2. I did run get_tcc_input.py directly with the CCN model trained with your code, but an error was reported when I ran it. I thought your CCN code already included the layer-blending conversion; if it didn't, that means it was missed.
> 3. The last thing I want to know: is the CCN stage the StyleGAN style-model training stage + layer-mixing stage + paired-data generation stage, and is the TTN stage pix2pix training on the paired data?

Sorry, but I cannot attach my own image results; the work is subject to copyright issues.

  1. Your understanding is right. The "FFHQ model" you mentioned is actually the pre-trained StyleGAN generator (written as $G_s$ in the paper), and the "hybrid model" you mentioned is actually the fine-tuned StyleGAN generator $G_t$, which is trained from a copy of $G_s$. As you mentioned, the core part of this process (CCN) is almost the same as basic StyleGAN2 fine-tuning; the main difference comes from applying the ID loss during the fine-tuning process.

  2. Fine-tuning the StyleGAN2 generator (I'll call this CCN training) doesn't include layer blending. get_tcc_input.py generates the stylized images using the fine-tuned StyleGAN2 generator ($G_t$). The latent mixing implemented in get_tcc_input.py is different from layer blending (see the sketch after this list).

  3. Yes, completely.

  4. (For your next question) It is completely your choice (it depends on the generated images), but I recommend 200+ epochs. I think 434 is enough.
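To make the difference in point 2 concrete: latent (style) mixing feeds different latent codes to different layer ranges of a single generator at synthesis time, while layer blending merges the weights of two generators before anything is synthesized. A minimal sketch of the former, again assuming a rosinality-style StyleGAN2 Generator (the inject_index value is arbitrary):

```python
import torch

@torch.no_grad()
def latent_mixing(generator, inject_index=4, device="cuda"):
    """Sketch of latent (style) mixing: one generator, two latent codes.

    The first code drives the coarse layers (below inject_index) and the
    second drives the finer layers. No generator weights are modified,
    which is what distinguishes this from layer/model blending.
    """
    z1 = torch.randn(1, 512, device=device)
    z2 = torch.randn(1, 512, device=device)
    img, _ = generator([z1, z2], inject_index=inject_index)
    return img
```

Layer blending, by contrast, would merge the state dicts of $G_s$ and $G_t$ (as in the earlier sketch) and then synthesize with the single blended model.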

@watertianyi
Author

@SShowbiz

1. In fact, what I want to ask is why the data pairs generated with layer blending are not stylized, just like Figure 2 in the first question. Is there any way to make the generated stylized images keep the facial similarity of the real person?
2. Is the training loss above normal? The total_loss has been rising and stays at around 1.2 no matter how long I train. Which one of the losses is the most effective?
3. I found that after training with a 256×256 training dataset, the inference input is 1024×1024 and the output is still 1024×1024. Shouldn't the inference output be at the resolution used for training? Is this training on small-resolution data but outputting high resolution?
4. The results after training are almost all white faces, with no stylized red faces, and the faces turn black after stylization. What is going on?
[Images: msk_output, bill_output, 9991_output, 9988_output]
