GANs in computer vision: self-supervised adversarial training and high-resolution image synthesis with style incorporation (part 5)

An important lesson from my journey in GANs is that you cannot start learning deep learning from GANs. There is a tremendous amount of background knowledge behind each design choice. Each paper's creativity derives from a general understanding of how deep learning really works. Proposing general solutions in generative learning is extremely tough, but once you focus on specific tasks, creativity has no ceiling in the game of designing a GAN. This is one of the reasons we chose to focus on computer vision. After reviewing a bunch of papers, the top papers we include start to make sense. It is indeed like a puzzle.

So, let’s try to solve it!

In the previous post, we discussed 2K image-to-image translation, video-to-video synthesis, and large-scale class-conditional image generation, namely pix2pixHD, vid-to-vid, and BigGAN. In this part, we will start with unconditional image generation on ImageNet, exploiting recent advancements in self-supervised learning. Finally, we will focus on style incorporation via adaptive instance normalization. To do so, we will revisit concepts of in-layer normalization that will prove quite useful in our understanding of GANs.


Self-Supervised GANs via Auxiliary Rotation Loss

StyleGAN (A Style-Based Generator Architecture for Generative Adversarial Networks)

Self-Supervised GANs via Auxiliary Rotation Loss (2018)

We have already discussed a lot about class-conditional GANs (BigGAN) as well as image-conditioned ones (pix2pixHD) in the previous post. These methods have achieved high quality at large resolutions, specifically 512x512 and 2048x1024, respectively. Nevertheless, we did not discuss what problems one may face when trying to scale GANs in an unconditional setup.

This is one of the first works that bring ideas from the field of self-supervised learning into generative adversarial learning at large scale. Moreover, it is revolutionary in introducing the notion of forgetting in GANs, as well as a method to counter it. Unlabeled data are abundant compared to human-annotated datasets, which are fairly limited. Therefore, this is a direction worth exploring.

Before we start, let’s clarify one thing that may seem vague at the beginning: self-supervision targets the discriminator, so that it learns meaningful feature representations. Through adversarial training, the generator is also affected by this injection of self-supervised guidance. Now, let us start by first understanding the notion of self-supervision.
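To make the idea concrete, here is a minimal sketch (function names and shapes are assumptions, not from the paper's code) of the auxiliary rotation pretext task: each image is rotated by 0, 90, 180, or 270 degrees, and the discriminator's extra head is trained to predict which rotation was applied.

```python
import numpy as np

def make_rotation_batch(images):
    """images: array of shape (N, H, W, C).
    Returns all four rotated copies of each image and the
    rotation labels (0..3, i.e. number of 90-degree turns)."""
    rotated, labels = [], []
    for img in images:
        for k in range(4):  # k quarter-turns: 0, 90, 180, 270 degrees
            rotated.append(np.rot90(img, k=k, axes=(0, 1)))
            labels.append(k)
    return np.stack(rotated), np.array(labels)

batch = np.random.rand(2, 32, 32, 3)   # toy batch of 2 images
x_rot, y_rot = make_rotation_batch(batch)
print(x_rot.shape, y_rot.shape)  # (8, 32, 32, 3) (8,)
```

The rotation labels come for free from the transformation itself, which is exactly what makes this task self-supervised: the discriminator gets a classification signal without any human annotation.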

1. What is self-supervised learning?

In general, GANs are considered a form of unsupervised learning. But why?...