Generative Adversarial Networks (GANs)

  • Generative Adversarial Networks (GANs), introduced by Goodfellow et al. in 2014, are a type of deep learning model used to generate new data that resembles real data, such as images, audio, or text.
  • A GAN is a machine learning system where two neural networks compete against each other:
    • Generative Model (Generator, G):
      • Starts with random noise (like static on a TV).
      • Tries to create fake data that looks like the real thing (e.g. images, sounds).
    • Discriminative Model (Discriminator, D):
      • Looks at both real data (from the training set) and fake data (from the generator).
      • Tries to distinguish between real and fake.
      • Outputs a score between 0 and 1 (closer to 1 = real, closer to 0 = fake).
  • Adversarial Training:
    • These two networks are like adversaries (hence the name “adversarial”). They compete in a game:
      • The Generator wants to fool the Discriminator.
      • The Discriminator wants to catch the Generator’s fakes.
    • This competition improves both models until fakes are indistinguishable from real data.
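
To make the discriminator's 0-to-1 score concrete before the formal treatment, here is a toy, self-contained sketch. Everything in it is illustrative: the "data" is one-dimensional, and the hand-made scorer stands in for a trained discriminator.

import numpy as np

# Toy illustration of the discriminator's score (not a trained network):
# "real" data clusters near 5.0; an untrained generator outputs wide noise.
rng = np.random.default_rng(0)
real_batch = rng.normal(loc=5.0, scale=0.5, size=8)
fake_batch = rng.normal(loc=0.0, scale=2.0, size=8)

def toy_discriminator(x):
    # Hand-made scorer: inputs near 5.0 (the "real" region) score near 1.
    return 1.0 / (1.0 + np.abs(x - 5.0))

print(toy_discriminator(real_batch).mean())  # high score: looks real
print(toy_discriminator(fake_batch).mean())  # low score: looks fake

A real GAN replaces the hand-made scorer with a trained network and, crucially, trains the generator to push the scores of its fakes upward.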

Adversarial Nets & How GANs Work

  • Both models, the Generative Model and the Discriminative Model, are multilayer perceptrons.
  • To create synthetic (fake) data, we first define a prior pz(z) on input noise variables z; samples of this noise are fed into the Generator, which learns a distribution pg over data x.
  • We then define a mapping from this noise space to the data space using a differentiable function G(z; θg), implemented as a multilayer perceptron with parameters θg.
  • The Discriminator compares real data from the dataset with fake data from the Generator.
  • To perform this comparison, we define a second multilayer perceptron, D(x; θd), which produces a single scalar output.
  • The function D(x) estimates the probability that a given input x originates from the real data distribution rather than from pg.
  • Both networks are trained using backpropagation, which adjusts their parameters to get better over time.
  • We train the discriminator D to maximize the probability of correctly classifying both real training data and samples generated by G.
  • At the same time, we train the generator G to minimize log⁡(1−D(G(z))), thereby encouraging it to produce outputs that the discriminator is more likely to classify as real.
  • The Discriminator gives feedback to the Generator.
  • The Generator learns from this and produces better fakes.
  • Over time, the fake data becomes more and more realistic.
  • Mathematically, the training is a two-player minimax game between G and D with the value function:

          min_G max_D V(D, G) = E_x∼pdata(x)[log D(x)] + E_z∼pz(z)[log(1 − D(G(z)))]

    • D(x): the probability that input x is real

    • G(z): Generator’s output from random noise z

  • In practice, the equation above may not provide strong enough gradients to train G effectively.
  • Early in training, G generates low-quality samples that D can easily distinguish from real data, so log(1 − D(G(z))) saturates and G receives only weak learning signals.
  • For this reason, instead of training G to minimize log(1 − D(G(z))), G can be trained to maximize log D(G(z)), which produces much stronger gradients early on (see the sketch below).
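
The gradient difference is easy to see numerically. Below is a minimal TensorFlow sketch; the logit values are made up to mimic an early-training discriminator that confidently rejects fakes, and the snippet is a standalone illustration rather than part of a training loop.

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

# Made-up logits D assigns to generated samples early in training,
# when D confidently labels them fake (sigmoid(logit) ≈ 0).
fake_logits = tf.Variable([[-5.0], [-6.0], [-4.0]])

with tf.GradientTape() as tape:
    # Saturating objective: minimize log(1 − D(G(z))),
    # written here as the negative BCE against the "fake" label (0).
    sat_loss = -bce(tf.zeros_like(fake_logits), fake_logits)
grad_sat = tape.gradient(sat_loss, fake_logits)

with tf.GradientTape() as tape:
    # Non-saturating objective: maximize log D(G(z)),
    # i.e. minimize BCE against the "real" label (1).
    nonsat_loss = bce(tf.ones_like(fake_logits), fake_logits)
grad_nonsat = tape.gradient(nonsat_loss, fake_logits)

print(grad_sat.numpy())     # near zero: almost no learning signal for G
print(grad_nonsat.numpy())  # around -0.33 each: a strong signal for G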

Types of GANs

Vanilla GANs

  • Basic form of GANs with a generator and a discriminator in an adversarial setup.
  • Generator creates fake data samples.
  • Discriminator tries to distinguish between real and fake samples.
  • Uses simple multilayer perceptrons (MLPs) for both components.
  • Easy to implement due to straightforward architecture.
  • MLPs help process and classify data based on known patterns.
  • Training is often unstable and requires careful hyperparameter tuning for good performance.
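
As a minimal sketch of these two MLP components in Keras (the layer widths, the 100-dimensional noise vector, and the 784-value flattened 28 × 28 output are illustrative choices, not requirements of the vanilla GAN):

from tensorflow.keras import layers, models

# Generator: noise vector -> flattened 28x28 "image" with values in [-1, 1]
def build_mlp_generator(latent_dim=100):
    return models.Sequential([
        layers.Dense(256, activation='relu', input_shape=(latent_dim,)),
        layers.Dense(512, activation='relu'),
        layers.Dense(784, activation='tanh'),
    ])

# Discriminator: flattened image -> probability that the input is real
def build_mlp_discriminator():
    return models.Sequential([
        layers.Dense(512, activation='relu', input_shape=(784,)),
        layers.Dense(256, activation='relu'),
        layers.Dense(1, activation='sigmoid'),
    ])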

Conditional GANs (cGAN)

  • A cGAN is a type of GAN that incorporates additional input (labels or conditions) into both the generator and the discriminator.
  • Allows the model to generate data with specific, controlled characteristics.
  • The generator takes both random noise and a condition label as input, so it can produce output with the requested characteristics.
  • cGANs are now used to generate images, text, and other synthetic data conditioned on specific objects, topics, or styles.
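
A minimal sketch of this conditioning in Keras; the 100-dimensional noise, 10 classes, 50-dimensional label embedding, and 784-value output are all assumptions made for illustration:

from tensorflow.keras import layers, models

LATENT_DIM, NUM_CLASSES = 100, 10

noise = layers.Input(shape=(LATENT_DIM,))
label = layers.Input(shape=(1,), dtype='int32')

# Embed the class label and concatenate it with the noise vector,
# so the generated sample depends on the requested condition.
label_vec = layers.Flatten()(layers.Embedding(NUM_CLASSES, 50)(label))
merged = layers.Concatenate()([noise, label_vec])

hidden = layers.Dense(256, activation='relu')(merged)
output = layers.Dense(784, activation='tanh')(hidden)
cond_generator = models.Model([noise, label], output)

The discriminator is conditioned the same way: the label embedding is concatenated with its (flattened) image input before classification.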

Deep convolutional GAN (DCGAN)

  • Uses Convolutional Neural Networks (CNNs) for both generator and discriminator.
  • Generator takes random noise as input & uses transposed convolutions (deconvolutions) to upscale and generate structured outputs (e.g., images).
  • Generator “zooms in” to build detailed images from noise.
  • Discriminator uses standard convolutions to analyze and classify the input data.
  • Discriminator “zooms out” to assess the global structure of data.
  • In this way, DCGANs are capable of generating high-quality, realistic images and other structured data.
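
The full Python implementation at the end of this article follows exactly this DCGAN recipe: the generator upsamples a 100-dimensional noise vector into a 28 × 28 image with Conv2DTranspose layers, while the discriminator downsamples with strided Conv2D layers.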

StyleGAN

  • Generates high-resolution images (up to 1024 × 1024 pixels).
  • Uses datasets with images of the same object type (e.g., faces) to train.
  • Generator built with multiple layers where each layer adds progressive detail, from basic shapes to fine textures.
  • Discriminator is also multi-layered, which evaluates image detail and overall realism.
  • StyleGAN can produce highly realistic and detailed images.

CycleGAN

  • Designed for image-to-image translation using unpaired datasets.
  • Involves two generators and two discriminators, which are trained in a cyclic manner.
  • One generator translates an image to a new style (e.g., photo → painting).
  • A reverse generator translates it back to the original style.
  • Uses cycle consistency to ensure the translated image can be accurately reconstructed.
  • Useful for style transfer, image enhancement, and domain adaptation without paired data.
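
A minimal sketch of the cycle-consistency term, assuming two hypothetical generator models g_ab (domain A → B) and g_ba (domain B → A); the L1 distance and the weight of 10 follow common CycleGAN practice but are assumptions here:

import tensorflow as tf

def cycle_consistency_loss(real_a, real_b, g_ab, g_ba, weight=10.0):
    # Round-trip translations: A -> B -> A and B -> A -> B.
    reconstructed_a = g_ba(g_ab(real_a))
    reconstructed_b = g_ab(g_ba(real_b))
    # Penalize the L1 distance between each image and its reconstruction.
    loss = (tf.reduce_mean(tf.abs(real_a - reconstructed_a)) +
            tf.reduce_mean(tf.abs(real_b - reconstructed_b)))
    return weight * loss

This term is added to the usual adversarial losses of both generators, so translated images must survive the round trip instead of drifting to arbitrary outputs in the target domain.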

Laplacian pyramid GAN (LAPGAN)

  • Generates high-quality, high-resolution images using a multi-scale approach.
  • Starts with a low-resolution image.
  • Progressively adds finer details at higher resolutions using a series of GANs.
  • Based on the Laplacian pyramid, a technique for hierarchical image refinement.
  • Effectively manages the complexity of generating detailed, high-resolution images.
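
A minimal sketch of the Laplacian pyramid decomposition that LAPGAN builds on, using plain TensorFlow image resizing (the per-level GANs themselves are omitted):

import tensorflow as tf

def laplacian_pyramid(image, levels=3):
    # image: [batch, height, width, channels] float tensor
    pyramid = []
    current = image
    for _ in range(levels):
        down = tf.image.resize(current, (current.shape[1] // 2, current.shape[2] // 2))
        up = tf.image.resize(down, (current.shape[1], current.shape[2]))
        pyramid.append(current - up)  # fine detail lost by downsampling
        current = down
    pyramid.append(current)           # coarsest low-resolution image
    return pyramid

Generation runs the pyramid in reverse: the coarsest image is sampled first, and a conditional GAN at each level fills in the missing detail band.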

DiscoGAN

  • Learns cross-domain relationships without needing paired data.
  • Uses two generators and two discriminators.
  • Translates images between two domains and back.
  • Ensures cycle consistency to preserve original image features.
  • Effective for image-to-image translation, style transfer, and image enhancement with unpaired datasets.

Applications of GANs

1. Image Generation

  • GANs create realistic images from scratch or textual input, useful in art, design, and data augmentation.
  • BigGAN and other models produce class-specific, high-quality visuals for fields like fMRI decoding and medical imaging.

2. Image Super-Resolution

  • GANs upscale low-resolution images into high-resolution versions with enhanced detail.
  • StyleGAN2 allows fine control over image attributes, useful for visual editing and content creation.

3. Image-to-Image Translation

  • GANs translate images between domains, such as sketches to paintings or day to night scenes.
  • CycleGAN ensures consistent transformations using unpaired datasets and cyclic reconstruction.

4. Video Retargeting

  • GANs adapt videos for different formats while preserving key content and motion.
  • Recycle-GAN modifies aspect ratios (e.g., widescreen to square) with temporal consistency.

5. Facial Attribute Manipulation

  • GANs alter facial features like age, expression, or hair color with precision.
  • StyleGAN allows intuitive, style-based editing of facial attributes in high-resolution images.

6. Object Detection

  • GANs enhance training datasets by generating diverse, high-quality synthetic images.
  • Frameworks like GAN-DO improve model robustness against noise, blur, and distortions.

Limitations of GANs

  • Training instability: the generator and discriminator may fail to converge, which can lead to poor output.
  • Mode collapse: the generator may produce a limited variety of samples and fail to capture the full diversity of the training data.
  • High data and computational demands.
  • Output quality is difficult to evaluate with standard metrics.
  • Ethical concerns: risk of misuse in deepfakes and misleading content.

Python Implementation for Generative Adversarial Networks (GANs)

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import fashion_mnist

# Step 1: Load and preprocess dataset
(x_train, _), (_, _) = fashion_mnist.load_data()
x_train = x_train.astype('float32') / 127.5 - 1.0  # Normalize to [-1, 1] (float32 matches the model's dtype)
x_train = np.expand_dims(x_train, axis=-1)  # Add channel dimension

BUFFER_SIZE = 60000
BATCH_SIZE = 128
LATENT_DIM = 100  # Dimension of noise vector

dataset = tf.data.Dataset.from_tensor_slices(x_train).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)

# Step 2: Build Generator
def build_generator():
    model = models.Sequential([
        layers.Dense(7*7*256, use_bias=False, input_shape=(LATENT_DIM,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((7, 7, 256)),
        layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh')
    ])
    return model

# Step 3: Build Discriminator
def build_discriminator():
    model = models.Sequential([
        layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same', input_shape=[28, 28, 1]),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(1)
    ])
    return model

generator = build_generator()
discriminator = build_discriminator()

# Step 4: Define loss functions and optimizers
# from_logits=True because the discriminator's final Dense(1) outputs raw logits (no sigmoid)
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(fake_output):
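    # Non-saturating loss: label fakes as real (1) so G maximizes log D(G(z))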
    return cross_entropy(tf.ones_like(fake_output), fake_output)

def discriminator_loss(real_output, fake_output):
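    # D should score real samples as 1 and generated samples as 0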
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

# Step 5: Training step
@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, LATENT_DIM])
    
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_gen = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_disc = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_gen, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_disc, discriminator.trainable_variables))

# Step 6: Training loop
def train(dataset, epochs):
    for epoch in range(epochs):
        for image_batch in dataset:
            train_step(image_batch)
        print(f'Epoch {epoch + 1} completed.')
        generate_and_plot_images(generator, tf.random.normal([16, LATENT_DIM]))

# Step 7: Generate and plot images
def generate_and_plot_images(model, test_input):
    predictions = model(test_input, training=False)
    fig = plt.figure(figsize=(4, 4))
    
    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i+1)
        plt.imshow(predictions[i, :, :, 0] * 127.5 + 127.5, cmap='gray')
        plt.axis('off')
    plt.show()

# Step 8: Run training
EPOCHS = 30
train(dataset, EPOCHS)

N.B.:

  • Early images look like fuzzy blobs or abstract shapes because, in the early stages of training, the generator has no idea what a real image looks like.
  • It starts from pure noise; as training progresses, it gradually learns to produce more realistic outputs by trying to “fool” the discriminator.
  • I have provided images after 6 epochs; training had already taken 1 hour and 44 minutes, so I did not let the code run the full 30 epochs due to time and computational cost.
  • Even so, you can see the images gradually becoming clearer from a total blur.
  • With the full 30 epochs, the model would be expected to generate recognizable new fashion product images.

Reference

  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems 27 (NIPS 2014).
