GANs in computer vision - Conditional image and object generation

GANs in computer vision: Conditional image and object generation (part 2)

The previous post was more or less an introduction to GANs, generative learning, and computer vision. We reached the point of generating distinguishable image features in 128x128 images. In this part, we continue our GAN journey in computer vision, diving into more complex designs and better visual results. We will look at mode collapse, 3D object generation, single RGB image to 3D object generation, and higher-quality image-to-image mappings.


1. AC-GAN (Conditional Image Synthesis with Auxiliary Classifier GANs)

2. 3D-GAN (Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling)

3. PacGAN (The power of two samples in generative adversarial networks)

4. Pix2Pix GAN (Image-to-Image Translation with Conditional Adversarial Networks)

5. Cycle-GAN (Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks)

Let’s commence with our second part of the GAN series!

AC-GAN (Conditional Image Synthesis with Auxiliary Classifier GANs, 2016)

This amazing paper comprehensively presents the first attempt to produce detailed, high-resolution image samples (128x128 at the time) with high variability within each class (intra-class variability). As we have already seen, the class is the conditional label. In this work, a single GAN is trained to generate 10 different classes simultaneously!

Forcing a model to perform additional tasks (multi-task learning) is known to often improve its performance on the original task. But how can you do it? By using a reconstruction loss!

Auxiliary or reconstruction loss

Combining the ideas of InfoGAN (information regularization) and conditional GAN (using image labels), AC-GAN is an extension of GAN that uses side information (the image class). Instead of feeding the conditional information to both G and D, the authors give the class label only to G and let D learn to reconstruct this side information (the so-called reconstruction loss).

Specifically, they modified D to contain an auxiliary (extra) decoder network that can utilize pre-trained weights from a standard classification setup. The auxiliary decoder network outputs the class label for the training data. That way, the quality of the synthesized images is greatly improved. The AC-GAN model learns a representation for the noise (z) that is independent of the class label, so at inference time the class of a generated image is set by the label alone, while z controls the remaining variation.
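To make the auxiliary head concrete, here is a minimal pure-Python sketch of the two log-likelihood terms from the AC-GAN paper: L_S (is the image real or generated?) and L_C (which class is it?). D is trained to maximize L_S + L_C and G to maximize L_C - L_S. The function names and the logit-based interface are assumptions made for illustration, not the authors' code. A real implementation would compute the same quantities on tensors in a deep learning framework.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_softmax(logits):
    """Numerically stable log-softmax over a list of class logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def sample_log_likelihoods(source_logit, class_logits, is_real, label):
    """Return (l_s, l_c) for one image, given D's two outputs.

    source_logit: D's real-vs-generated logit (hypothetical interface).
    class_logits: outputs of D's auxiliary class head.
    """
    # log P(S = correct source | X); log(1 - sigmoid(x)) == log_sigmoid(-x)
    l_s = log_sigmoid(source_logit) if is_real else log_sigmoid(-source_logit)
    # log P(C = label | X), used for both real and generated images
    l_c = log_softmax(class_logits)[label]
    return l_s, l_c


def ac_gan_objectives(samples):
    """Batch-mean objectives: D maximizes L_S + L_C, G maximizes L_C - L_S."""
    l_s = l_c = 0.0
    for source_logit, class_logits, is_real, label in samples:
        s, c = sample_log_likelihoods(source_logit, class_logits, is_real, label)
        l_s += s
        l_c += c
    n = len(samples)
    l_s, l_c = l_s / n, l_c / n
    return {"D": l_s + l_c, "G": l_c - l_s}


# Toy batch: one real image of class 0, one generated image of class 1.
batch = [
    (2.0, [3.0, 0.1, -1.0], True, 0),
    (-1.5, [0.2, 2.5, 0.0], False, 1),
]
print(ac_gan_objectives(batch))
```

Note that the class term L_C appears with the same sign in both objectives: G and D cooperate on making images classifiable and compete only on the real-vs-generated term.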

Furthermore, this particular reconstruction objective, even though it is quite simple, appears to stabilize training. By training an ensemble (multiple models) of 100 AC-GANs, where each model is trained on 10 different classes, the ensemble can generate realistic images for all 1000 classes of the ImageNet 1K dataset.
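The arithmetic behind the ensemble (100 models x 10 classes = 1000 classes) can be sketched as a simple routing step. This is only an illustration: the split into contiguous blocks of 10 class indices and the name route_class are assumptions, since how the classes were grouped is not described here.

```python
NUM_CLASSES = 1000       # ImageNet 1K
CLASSES_PER_MODEL = 10   # each AC-GAN handles 10 classes
NUM_MODELS = NUM_CLASSES // CLASSES_PER_MODEL  # 100 models in the ensemble


def route_class(global_class):
    """Map a global class index to (model index, local label).

    Assumes contiguous blocks: model 0 owns classes 0-9, model 1 owns 10-19, ...
    """
    if not 0 <= global_class < NUM_CLASSES:
        raise ValueError(f"class index out of range: {global_class}")
    return divmod(global_class, CLASSES_PER_MODEL)


# To sample an image of global class 437, pick the owning model and its
# local label, then feed (z, local_label) to that model's generator.
model_idx, local_label = route_class(437)
print(model_idx, local_label)
```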