How to do style mixing quickly? #6

Open
kenmbkr opened this issue Feb 2, 2021 · 3 comments
kenmbkr commented Feb 2, 2021

I am trying to do style mixing between two images using file_based_simple_style_transfer.py with the following command, but it takes a couple of minutes to optimize and the artifacts are noticeable in the results. Reducing the latent-step or the noise-step makes the results worse.

python file_based_simple_style_transfer.py --content img1.png --style img2.png --mixing-index -1 --destination . --config trained_model/ffhq_stylegan_2_w_plus/2020-09-05T16:09:10.557656/config/config.json --latent-step 5000 --noise-step 3000

The official StyleGAN implementation can do style mixing by swapping some layers of the w variables, and I tried to do the same by swapping some layers of the noise variables of two images. All I got was two images overlaid on top of each other.
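
For reference, this is the kind of layer swap I have in mind (a rough sketch with placeholder tensors and names of my own, not code from this repository):

# Classic StyleGAN-style mixing in W+ space: keep the coarse layers of one code
# and take the finer layers from the other. Shapes and names are placeholders.
import torch

w_content = torch.randn(1, 18, 512)   # stand-in for the content image's W+ code
w_style = torch.randn(1, 18, 512)     # stand-in for the style image's W+ code

mixed = w_content.clone()
mixed[:, 8:, :] = w_style[:, 8:, :]   # coarse structure from content, finer styles from style
# `mixed` would then be fed to the generator with input_is_latent=True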

How should I do style mixing with a good balance between performance and fidelity?

Bartzi (Owner) commented Feb 2, 2021

Hi,

Using our code, you can also do style mixing; however, you unfortunately used the wrong script for it.
file_based_simple_style_transfer.py performs real neural style transfer as described in the paper Image2StyleGAN; it can not be used for style mixing. For style mixing, you should use the reconstruct_image.py script instead.
However, you will need to make some adjustments: adapt the encode function of the autoencoder so that it returns a Latents object in which the styles you want to mix are already mixed.
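
A rough, untested sketch of what I mean (the names autoencoder, content_image, style_image and layers_to_mix are placeholders, and the attribute names are assumptions on my part, not code from the repository):

# Untested sketch: encode both images, swap some latent layers,
# then decode with the mixed codes and the content image's noise.
content_codes = autoencoder.encode(content_image)   # assumed to return a Latents object
style_codes = autoencoder.encode(style_image)

for layer_id in layers_to_mix:                       # e.g. [4, 5, 6, 7]
    content_codes.latent[:, layer_id, :] = style_codes.latent[:, layer_id, :]

mixed_image, _ = autoencoder.decoder(
    [content_codes.latent], input_is_latent=True,
    noise=content_codes.noise, return_latents=True
)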

Hope this helps!

kenmbkr (Author) commented Feb 3, 2021

Thank you for your suggestions. This is what I tried, but my results look almost the same as the input image with only slight color changes if I swap the latent layers, and like one image overlaid on top of another if I swap the noise layers. Please kindly let me know what I may have missed.

In networks/encoder/autoencoder.py, I changed the code to the following:

# return the reconstruction, the latents actually used by the decoder, and the noise
reconstructed_x, latent = self.decoder([latent_codes.latent], input_is_latent=self.is_wplus(latent_codes), noise=latent_codes.noise, return_latents=True)
return reconstructed_x, latent, latent_codes.noise

My changes in reconstruct_image.py:

def reconstruct(path, config, device, autoencoder):
    input_image = Path(path)
    data_loader = build_data_loader(input_image, config, config['absolute'], shuffle_off=True, dataset_class=DemoDataset)

    image = next(iter(data_loader))
    image = {k: v.to(device) for k, v in image.items()}

    return autoencoder(image['input_image'])

content_image, content_latent, content_noise = reconstruct(args.content_image, config, args.device, autoencoder)
style_image, style_latent, style_noise = reconstruct(args.style_image, config, args.device, autoencoder)

# copy the selected layers of the style latent into the content latent
for layer_id in args.styles:
    content_latent[:, layer_id, :] = style_latent[:, layer_id, :]

image, _ = autoencoder.decoder(
    [content_latent], input_is_latent=True, noise=content_noise, return_latents=True
)

image = Image.fromarray(make_image(image.squeeze(0)))
image.save('out.png')

Bartzi (Owner) commented Feb 3, 2021

Your changes look good so far.

Right now, you are running into a fundamental problem with our approach. If you train an autoencoder without any special care, the latent code degrades to a plain color encoding, and the heavy lifting for the reconstruction is done by the noise inputs (as we have shown in the paper).
I don't know whether you have already had a look at the appendix of our paper. There we show images where we interpolate in the latent space of a trained autoencoder model. You can see that if you do not train the model using the two network strategy, the latent code no longer encodes any semantic meaning; it rather encodes some color information, which is exactly what you observed. So, style mixing might not work right out of the box.

However, there are two possible solutions that you can try (I'm not 100% sure they will work):

  1. Train a model using the two network method (denoted as two-stem) in the paper and try again. Using the two network method, the model will actually try to use the latents as much as possible and not rely on the noise that much.
  2. I don't know what kind of data you are experimenting with. If there is no StyleGAN model available that can unconditionally produce such images, you could train a StyleGAN model for unconditional image generation and then train an autoencoder with the two network strategy. This should improve the semantic meaning encoded in the latent code and hence enable better style mixing results.
