How to do style mixing quickly? #6

Open
kenmbkr opened this issue Feb 2, 2021 · 3 comments
kenmbkr commented Feb 2, 2021

I am trying to do style mixing between two images using file_based_simple_style_transfer.py with the following command, but it takes a couple of minutes to optimize and the artifacts are noticeable in the results. Reducing the latent-step or the noise-step makes the results worse.

python file_based_simple_style_transfer.py --content img1.png --style img2.png --mixing-index -1 --destination . --config trained_model/ffhq_stylegan_2_w_plus/2020-09-05T16:09:10.557656/config/config.json --latent-step 5000 --noise-step 3000

The official StyleGAN implementation can do style mixing by swapping some layers of the w variables, and I tried to do the same by swapping some layers of the noise variables of two images. All I got was two images overlaid on top of each other.
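
For reference, this is the kind of layer swap I have in mind (a rough sketch with placeholder tensors and names of my own, not code from this repository):

# Classic StyleGAN-style mixing in W+ space: keep the coarse layers of one code
# and take the finer layers from the other. Shapes and names are placeholders.
import torch

w_content = torch.randn(1, 18, 512)   # stand-in for the content image's W+ code
w_style = torch.randn(1, 18, 512)     # stand-in for the style image's W+ code

mixed = w_content.clone()
mixed[:, 8:, :] = w_style[:, 8:, :]   # coarse structure from content, finer styles from style
# `mixed` would then be fed to the generator with input_is_latent=True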

How should I do style mixing with a good balance between performance and fidelity?

Bartzi (Owner) commented Feb 2, 2021

Hi,

Using our code, you can also do style mixing; however, you unfortunately used the wrong script for it.
file_based_simple_style_transfer.py performs real neural style transfer as described in the paper Image2StyleGAN; it can not be used for style mixing. For style mixing, you should use the reconstruct_image.py script instead.
However, you will need to make some adjustments: adapt the encode function of the autoencoder so that it returns a Latents object in which the styles you want to mix are already mixed.
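
A rough, untested sketch of what I mean (the names autoencoder, content_image, style_image and layers_to_mix are placeholders, and the attribute names are assumptions on my part, not code from the repository):

# Untested sketch: encode both images, swap some latent layers,
# then decode with the mixed codes and the content image's noise.
content_codes = autoencoder.encode(content_image)   # assumed to return a Latents object
style_codes = autoencoder.encode(style_image)

for layer_id in layers_to_mix:                       # e.g. [4, 5, 6, 7]
    content_codes.latent[:, layer_id, :] = style_codes.latent[:, layer_id, :]

mixed_image, _ = autoencoder.decoder(
    [content_codes.latent], input_is_latent=True,
    noise=content_codes.noise, return_latents=True
)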

Hope this helps!

kenmbkr (Author) commented Feb 3, 2021

Thank you for your suggestions. This is what I tried, but my results look almost the same as the input image with only slight color changes if I swap the latent layers, and like one image overlaid on top of another if I swap the noise layers. Please kindly let me know what I may have missed.

In networks/encoder/autoencoder.py, I changed the code to the following:

# return the reconstruction, the latents actually used by the decoder, and the noise
reconstructed_x, latent = self.decoder([latent_codes.latent], input_is_latent=self.is_wplus(latent_codes), noise=latent_codes.noise, return_latents=True)
return reconstructed_x, latent, latent_codes.noise

My changes in reconstruct_image.py:

def reconstruct(path, config, device, autoencoder):
    input_image = Path(path)
    data_loader = build_data_loader(input_image, config, config['absolute'], shuffle_off=True, dataset_class=DemoDataset)

    image = next(iter(data_loader))
    image = {k: v.to(device) for k, v in image.items()}

    return autoencoder(image['input_image'])

content_image, content_latent, content_noise = reconstruct(args.content_image, config, args.device, autoencoder)
style_image, style_latent, style_noise = reconstruct(args.style_image, config, args.device, autoencoder)

# copy the selected layers of the style latent into the content latent
for layer_id in args.styles:
    content_latent[:, layer_id, :] = style_latent[:, layer_id, :]

image, _ = autoencoder.decoder(
    [content_latent], input_is_latent=True, noise=content_noise, return_latents=True
)

image = Image.fromarray(make_image(image.squeeze(0)))
image.save('out.png')

Bartzi (Owner) commented Feb 3, 2021

Your changes look good so far.

Right now, you are running into a fundamental problem with our approach. If you train an autoencoder without any special care, the latent code degrades to a plain color encoding, and the heavy lifting for the reconstruction is done by the noise inputs (as we have shown in the paper).
I don't know whether you have already had a look at the appendix of our paper. There we show images where we interpolate in the latent space of a trained autoencoder model. You can see that if you do not train the model using the two network strategy, the latent code no longer encodes any semantic meaning; it rather encodes some color information, which is exactly what you observed. So, style mixing might not work right out of the box.

However, there are two possible solutions that you can try (I'm not 100% sure they will work):

  1. Train a model using the two network method (denoted as two-stem) in the paper and try again. Using the two network method, the model will actually try to use the latents as much as possible and not rely on the noise that much.
  2. I don't know what kind of data you are experimenting with. If there is no StyleGAN model available that can unconditionally produce such images, you could train a StyleGAN model for unconditional image generation and then train an autoencoder with the two network strategy. This should improve the semantic meaning encoded in the latent code and hence enable better style mixing results.
