StyleGAN Image Generation

Complete Documentation & Project Details for Style-Based Generator & High-Resolution Image Generation

Project Description

This project implements StyleGAN for generating high-resolution, photorealistic images using a style-based generator architecture with adaptive instance normalization (AdaIN). The architecture uses a mapping network to transform random noise into an intermediate latent space (W), a synthesis network that grows progressively from low to high resolution, and style mixing for fine-grained control over image attributes. It is well suited to learning StyleGAN fundamentals and high-quality image generation. The system includes adversarial training, style mixing, the truncation trick, TensorBoard integration, and comprehensive training tools.

StyleGAN uses style-based generation: a mapping network transforms random noise z into an intermediate latent space W, and a synthesis network generates images progressively, injecting style at each layer via AdaIN. The generator uses progressive growing to build images from low to high resolution (up to 1024x1024), while the discriminator provides adversarial feedback. The implementation provides full PyTorch support, a comprehensive training pipeline, style mixing utilities, evaluation metrics, and deployment tools for high-resolution image generation applications.

Project Screenshots

Screenshot gallery (4 images): StyleGAN Image Generation

Core Features

StyleGAN Architecture

  • Style-based generator
  • Mapping network
  • Synthesis network
  • Progressive growing
  • High-resolution generation

Style-Based Generation

  • Intermediate latent space (W)
  • Adaptive instance normalization
  • Style injection at layers
  • Controllable attributes
  • Fine-grained control

Mapping & Synthesis Networks

  • Mapping network (Z to W)
  • Synthesis network (progressive)
  • AdaIN layers
  • Multi-resolution generation
  • Deep network architecture

Adversarial Training

  • Generator loss
  • Discriminator loss
  • Gradient penalty
  • Truncation trick
  • Training stability

TensorBoard Integration

  • Real-time loss visualization
  • Multi-resolution image tracking
  • Training progress monitoring
  • Interactive dashboard
  • Comprehensive logging

Web Interface

  • Flask-based web app
  • Interactive image generation
  • Real-time generation
  • Download generated images
  • User-friendly interface

Advanced Features

Style Mixing & Interpolation

  • Style mixing at layers
  • Linear interpolation in W space
  • Controllable style attributes
  • Smooth image transitions
  • Visual exploration

Data Augmentation

  • Adaptive augmentation
  • Mixup augmentation
  • Cutout augmentation
  • Multiple augmentation levels

Multiple Dataset Support

  • Custom dataset support
  • CelebA dataset
  • CIFAR-10 dataset
  • MNIST dataset

Resume Training

  • Checkpoint resuming
  • Automatic checkpoint detection
  • Training continuation
  • Progress preservation
  • Early stopping support

Web Interface Features

Feature | Description | Usage
Image Generation | Generate images from random noise | Select the number of images and click Generate
Real-time Generation | Generate images in real time | Images appear as they are generated
Download Images | Download generated images | Click the download button for each image
Model Selection | Choose different trained models | Select from available checkpoints

Technologies Used

This StyleGAN Image Generation project is built with modern deep learning and computer vision technologies. The core implementation uses Python as the primary programming language and PyTorch for deep learning operations. The model follows the StyleGAN architecture, with a style-based generator and a discriminator network for high-resolution photorealistic image generation. The project also provides Jupyter Notebook support for interactive development and demonstrations, comprehensive adversarial training with style mixing capabilities, and metrics for assessing image quality.

The system supports progressive growing for building images from low to high resolution, style mixing for fine-grained control over image attributes, and the truncation trick for trading quality against diversity, making it suitable for a wide range of high-resolution image generation applications.

Python 3.8+ PyTorch 2.0+ StyleGAN AdaIN Image Generation TensorBoard Computer Vision Jupyter Notebook Progressive Growing Style Mixing

Installation & Usage

Installation

Install all required dependencies for the StyleGAN Image Generation project:

# Install all requirements
pip install -r requirements.txt

# The StyleGAN model will be trained on your data
# Prepare your dataset in the data/train/ directory
# Images should be preprocessed and resized

PyTorch Installation

Install PyTorch (CPU or GPU version):

# For CPU only
pip install torch torchvision torchaudio

# For CUDA (GPU support) - CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Verify installation
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"

Verify Installation

Test the model and verify all components work:

# Test model architecture
python test_model.py

# This will verify:
# - Model can be instantiated
# - Forward pass works
# - All components function correctly
# - Device compatibility (CPU/CUDA)

Training the Model

Train the StyleGAN model on your image dataset:

# Prepare your dataset
# Place images in the data/train/ directory
# Preprocess images to a consistent size

# Basic training with default parameters
python train.py --dataset ./data/train --output_dir ./outputs --epochs 100

# Configure in config.py:
# - BATCH_SIZE = 4
# - EPOCHS = 100
# - LEARNING_RATE = 0.002
# - LATENT_DIM = 512
# - MAX_RESOLUTION = 256
# - TRUNCATION = 1.0

# Or use the Jupyter notebook
jupyter notebook notebooks/02_training_stylegan.ipynb

# Training will:
# - Load and preprocess images
# - Initialize the generator and discriminator
# - Train with adversarial loss + gradient penalty
# - Grow progressively from low to high resolution
# - Save checkpoints and generated samples
# - Log to TensorBoard

Training Parameters (config.py):

  • BATCH_SIZE: Training batch size (default: 4)
  • EPOCHS: Number of training epochs (default: 100)
  • LATENT_DIM: Dimension of latent space Z (default: 512)
  • STYLE_DIM: Dimension of style space W (default: 512)
  • LEARNING_RATE: Learning rate (default: 0.002)
  • MAX_RESOLUTION: Maximum image resolution (default: 256)
  • TRUNCATION: Truncation trick value (default: 1.0)
  • LAMBDA_GP: Gradient penalty weight (default: 10.0)
  • DEVICE: Device to use - 'cuda' or 'cpu' (default: 'cuda')

Image Generation

Generate images using the trained StyleGAN model:

# Generate images from a trained model
python generate.py --checkpoint outputs/checkpoints/generator.pth --num_images 16 --output_dir ./generated

# Generate with style interpolation
python generate.py --checkpoint outputs/checkpoints/generator.pth --interpolate --num_steps 10 --output_dir ./generated

# Or use the Jupyter notebook
jupyter notebook notebooks/03_generate_images.ipynb

# Using the Python API
from stylegan import StyleGANGenerator
import torch

generator = StyleGANGenerator(latent_dim=512, style_dim=512, max_resolution=256)
generator.load_state_dict(torch.load("outputs/checkpoints/generator.pth"))
generator.eval()

# Generate images
z = torch.randn(16, 512)
images = generator(z, truncation=0.7)
print(f"Generated {len(images)} images")

Model Evaluation

Evaluate the trained StyleGAN model performance:

# Evaluate the trained model
python evaluate.py --checkpoint outputs/checkpoints/generator.pth --dataset ./data/train

# The evaluation includes:
# - SSIM (Structural Similarity Index)
# - Diversity metrics
# - Generated image quality assessment
# - Model performance metrics

Style Mixing Visualization

Visualize style mixing and generate interpolations:

# Style mixing visualization
python visualize.py --checkpoint outputs/checkpoints/generator.pth --mode all --output_dir ./visualizations

# The visualization includes:
# - Style mixing at different layers
# - Interpolation in W space
# - Generated image samples
# - Progressive growing visualization

Jupyter Notebooks

Open the interactive Jupyter notebooks for demonstrations:

# StyleGAN demonstration notebooks
jupyter notebook notebooks/01_stylegan_introduction.ipynb
jupyter notebook notebooks/02_training_stylegan.ipynb
jupyter notebook notebooks/03_generate_images.ipynb
jupyter notebook notebooks/04_style_mixing.ipynb

# The notebooks include:
# - Model architecture visualization
# - Mapping and synthesis network explanation
# - Training setup examples
# - Image generation examples
# - Style mixing demonstrations

# Or use JupyterLab
jupyter lab notebooks/

Project Structure

stylegan-generation/
├── README.md # Main documentation
├── requirements.txt # Python dependencies
├── LICENSE # License file
├── CHANGELOG.md # Changelog
├── CONTRIBUTING.md # Contribution guidelines
├── PROJECT_SUMMARY.md # Project summary
│
├── Core Modules
│ ├── train.py # Training script
│ ├── generate.py # Image generation
│ ├── evaluate.py # Model evaluation
│ ├── visualize.py # Visualization tools
│ ├── preprocess.py # Data preprocessing
│ ├── example.py # Simple example
│ └── config.py # Configuration settings
│
├── StyleGAN Package (stylegan/)
│ ├── model.py # Generator & Discriminator
│ ├── layers.py # AdaIN, Noise, Style layers
│ ├── losses.py # Loss functions
│ ├── utils.py # Utility functions
│ ├── style_mix.py # Style mixing utilities
│ └── metrics.py # Evaluation metrics
│
├── Data
│ └── (training datasets: Custom images)
│
├── Outputs
│ ├── checkpoints/ # Model checkpoints
│ ├── samples/ # Generated samples
│ └── logs/ # TensorBoard logs
│
├── Notebooks
│ ├── 01_stylegan_introduction.ipynb
│ ├── 02_training_stylegan.ipynb
│ ├── 03_generate_images.ipynb
│ └── 04_style_mixing.ipynb
│
├── Scripts
│ ├── download_dataset.py # Dataset download utility
│ └── convert_checkpoint.py # Checkpoint converter

Configuration Options

Model Configuration

Customize model and training parameters in config.py and train.py:

# Model Architecture (config.py)
LATENT_DIM = 512          # Dimension of latent space Z
STYLE_DIM = 512           # Dimension of style space W
MAX_RESOLUTION = 256      # Maximum image resolution
BASE_CHANNELS = 512       # Base channels for generator
NUM_MAPPING_LAYERS = 8    # Number of mapping network layers

# Training Parameters (config.py)
BATCH_SIZE = 4            # Training batch size
LEARNING_RATE = 0.002     # Learning rate
EPOCHS = 100              # Number of training epochs
BETA1 = 0.0               # Adam optimizer beta1
BETA2 = 0.99              # Adam optimizer beta2
TRUNCATION = 1.0          # Truncation trick value
LAMBDA_GP = 10.0          # Gradient penalty weight
N_CRITIC = 1              # Discriminator updates per generator update
DEVICE = 'cuda'           # Device: 'cuda' or 'cpu'

# Generation Configuration
NUM_IMAGES = 16           # Number of images to generate
TRUNCATION = 0.7          # Truncation for generation

Configuration Tips:

  • LATENT_DIM: Standard is 512. Higher = more expressive but slower
  • MAX_RESOLUTION: Maximum image size. Common: 256, 512, 1024. Higher = more memory
  • BATCH_SIZE: Start with 4-8. Larger = faster but needs more GPU memory
  • LEARNING_RATE: Start with 0.002. StyleGAN uses lower learning rates
  • TRUNCATION: 1.0 = full diversity, 0.7 = better quality. Lower = higher quality, less diversity
  • LAMBDA_GP: Gradient penalty weight. 10.0 is standard for stable training
  • NUM_MAPPING_LAYERS: 8 layers is standard. More = better style control

Training Progress Logging

The training script automatically logs progress to TensorBoard and saves checkpoints:

# Training logs are saved to:
# outputs/logs/        - TensorBoard logs
# outputs/checkpoints/ - Model checkpoints
# outputs/samples/     - Generated sample images

# View TensorBoard logs
tensorboard --logdir outputs/logs

# TensorBoard shows:
# - Generator loss per epoch
# - Discriminator loss per epoch
# - Gradient penalty per epoch
# - Generated image samples at progressive resolutions
# - Style mixing examples
# - Model parameters

# Checkpoints are saved as:
# outputs/checkpoints/generator_epoch_XX.pth
# outputs/checkpoints/discriminator_epoch_XX.pth
# outputs/checkpoints/best_generator.pth

Advanced Training Options

Use the advanced training script with additional features:

# Advanced training with all features
python train_advanced.py \
  --dataset mnist \
  --epochs 50 \
  --batch-size 128 \
  --latent-dim 128 \
  --beta 1.0 \
  --tensorboard \
  --early-stopping 10 \
  --lr-scheduler \
  --augment \
  --grad-clip 1.0 \
  --amp

# Features enabled:
# --tensorboard: Enable TensorBoard logging
# --early-stopping N: Stop if no improvement for N epochs
# --lr-scheduler: Use learning rate scheduling
# --augment: Enable data augmentation
# --grad-clip: Gradient clipping value
# --amp: Mixed precision training (faster, less memory)

# Resume training from a checkpoint
python train_advanced.py \
  --resume outputs/checkpoints/generator_epoch_25.pth \
  --epochs 50

Training on Different Datasets

Examples for training on different datasets:

# Training on MNIST (grayscale, 28x28)
python train.py --dataset mnist --epochs 50 --latent-dim 64

# Training on CIFAR-10 (RGB, 32x32)
python train.py --dataset cifar10 --epochs 100 --latent-dim 128 --beta 0.5

# Training on a custom dataset (RGB, any size)
python train.py \
  --dataset custom \
  --data-dir data/custom \
  --epochs 50 \
  --latent-dim 256 \
  --image-size 64

# For high-resolution images (128x128 or larger)
python train.py \
  --dataset custom \
  --data-dir data/custom \
  --latent-dim 512 \
  --hidden-dims "[64, 128, 256, 512, 1024]" \
  --image-size 128 \
  --batch-size 32

Detailed Architecture

StyleGAN Components

1. Mapping Network:

  • 8-layer fully connected network
  • Transforms random noise Z to intermediate latent space W
  • Learns meaningful style representations
  • Enables disentangled style control
  • Outputs style vectors for each synthesis layer
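
The mapping network is essentially a small MLP. Below is a minimal, illustrative PyTorch sketch of such a network (not the exact code in stylegan/model.py; layer details are assumptions):

import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Minimal sketch of a StyleGAN-style mapping network f: Z -> W."""
    def __init__(self, latent_dim=512, style_dim=512, num_layers=8):
        super().__init__()
        layers, in_dim = [], latent_dim
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, style_dim), nn.LeakyReLU(0.2)]
            in_dim = style_dim
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        # Normalizing z before mapping is common in StyleGAN implementations
        z = z / (z.pow(2).mean(dim=1, keepdim=True) + 1e-8).sqrt()
        return self.net(z)  # w in the intermediate latent space W

# Example: map a batch of noise vectors to style codes
w = MappingNetwork()(torch.randn(4, 512))
print(w.shape)  # torch.Size([4, 512])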

2. Synthesis Network:

  • Progressive growing architecture
  • Generates images from low to high resolution
  • Uses adaptive instance normalization (AdaIN) for style injection
  • Each layer receives style information from W space
  • Supports resolution up to 1024x1024

3. Discriminator:

  • Progressive discriminator matching generator resolution
  • Provides adversarial feedback during training
  • Uses gradient penalty for training stability
  • Learns to distinguish real from generated images
  • Enables high-quality image generation

Loss Function

StyleGAN uses adversarial training with gradient penalty:

Generator Loss:
L_G = -E[D(G(z))]

Discriminator Loss:
L_D = E[D(G(z))] - E[D(x_real)] + λ × GP

Gradient Penalty:
GP = E[(||∇D(x̂)||₂ - 1)²]

Where:
- G: Generator network
- D: Discriminator (critic) network
- z: Random noise vector
- x_real: Real images
- x̂: Random interpolations between real and generated samples
- λ: Gradient penalty weight (default: 10.0)

The gradient penalty keeps training stable and helps prevent mode collapse.
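
For reference, the gradient penalty term above can be computed in PyTorch roughly as follows. This is a minimal sketch; the project's own implementation lives in stylegan/losses.py and may differ in detail:

import torch

def gradient_penalty(D, real, fake):
    """WGAN-GP penalty: E[(||grad D(x_hat)||_2 - 1)^2] on interpolated samples."""
    batch_size = real.size(0)
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = D(x_hat)
    grads = torch.autograd.grad(
        outputs=scores.sum(), inputs=x_hat, create_graph=True
    )[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()

# Discriminator loss (to minimize): E[D(fake)] - E[D(real)] + lambda_gp * GP
# d_loss = D(fake).mean() - D(real).mean() + 10.0 * gradient_penalty(D, real, fake)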

Adaptive Instance Normalization (AdaIN)

Enables style injection at each layer:

AdaIN(x, y) = σ(y) × normalize(x) + μ(y)

Where:
- x: Feature map from the previous layer
- y: Style vector from the mapping network
- μ(y), σ(y): Style statistics (mean, std)
- normalize(x): Normalized feature map

This allows:
- Style injection at each resolution
- Fine-grained control over image attributes
- Disentangled style representation
- Progressive style application
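
In code, the AdaIN operation can be sketched as below. This is illustrative only; the project's own layer in stylegan/layers.py may differ in detail:

import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: scale/shift normalized features by a style."""
    def __init__(self, num_channels, style_dim=512):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels)
        self.style = nn.Linear(style_dim, num_channels * 2)  # predicts (scale, bias)

    def forward(self, x, w):
        scale, bias = self.style(w).chunk(2, dim=1)   # [batch, C] each
        scale = scale.unsqueeze(-1).unsqueeze(-1)     # [batch, C, 1, 1]
        bias = bias.unsqueeze(-1).unsqueeze(-1)
        return (1 + scale) * self.norm(x) + bias

# Example: style a 64-channel feature map with a 512-dim style vector
out = AdaIN(64)(torch.randn(2, 64, 16, 16), torch.randn(2, 512))
print(out.shape)  # torch.Size([2, 64, 16, 16])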

Layer-by-Layer Architecture Details

Discriminator Architecture:

  • Input Layer: Receives images of size [batch, channels, height, width]
  • Convolutional Blocks: Stacked convolution + LeakyReLU blocks extract features
  • Downsampling: Spatial resolution is reduced progressively to match the current training resolution
  • Hidden Dimensions: Channel count increases as spatial resolution decreases
  • Output Layer: A final layer produces a single realness score per image for the adversarial loss

Synthesis Network Architecture:

  • Input: Starts from a learned constant feature map rather than the latent vector directly
  • Style Injection: AdaIN layers apply the style vector w at every resolution
  • Noise Injection: Per-pixel noise adds stochastic fine detail
  • Upsampling: Progressive blocks double the spatial resolution up to MAX_RESOLUTION
  • Hidden Dimensions: Channel count decreases as resolution increases
  • Output Layer: A final convolution produces the RGB image
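
To make the synthesis side concrete, here is a hedged sketch of one styled block (upsample → convolution → noise → AdaIN style injection). It is illustrative only and not the exact implementation in stylegan/model.py:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SynthesisBlock(nn.Module):
    """One progressive-growing block: upsample, convolve, add noise, inject style."""
    def __init__(self, in_channels, out_channels, style_dim=512):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.noise_scale = nn.Parameter(torch.zeros(1, out_channels, 1, 1))
        self.norm = nn.InstanceNorm2d(out_channels)
        self.style = nn.Linear(style_dim, out_channels * 2)  # AdaIN scale and bias
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x, w):
        x = F.interpolate(x, scale_factor=2, mode="nearest")  # double the resolution
        x = self.conv(x)
        x = x + self.noise_scale * torch.randn_like(x)        # per-pixel noise
        scale, bias = self.style(w).chunk(2, dim=1)
        x = (1 + scale[:, :, None, None]) * self.norm(x) + bias[:, :, None, None]
        return self.act(x)

# 8x8 features -> 16x16 features, styled by w
y = SynthesisBlock(512, 256)(torch.randn(2, 512, 8, 8), torch.randn(2, 512))
print(y.shape)  # torch.Size([2, 256, 16, 16])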

Mathematical Formulation

Complete mathematical description of StyleGAN:

# Mapping Network: f: Z → W
w = f(z)            where z ~ N(0, I), w ∈ W space

# Synthesis Network: g: W → X
x = g(w)            where x is the generated image

# Style Injection (AdaIN)
y = AdaIN(x, w)     where y is the styled feature map

# Generator
G(z) = g(f(z))

# Discriminator (critic)
D(x) ∈ ℝ            # Realness score for image x

# Loss Function: Wasserstein GAN with Gradient Penalty
L_G = -E[D(G(z))]
L_D = E[D(G(z))] - E[D(x_real)] + λ × GP

# Where:
# - G: Generator network
# - D: Discriminator network
# - z: Random noise
# - w: Intermediate latent code
# - λ: Gradient penalty weight

Truncation Trick

Understanding the truncation parameter in StyleGAN:

  • Truncation = 1.0: Full diversity, uses entire W space distribution
  • Truncation = 0.7: Balanced quality and diversity (recommended)
  • Truncation < 0.7: Higher quality but less diversity, truncates W space
  • Truncation > 1.0: More diversity but may reduce quality
  • Formula: w' = w_avg + ψ × (w - w_avg), where ψ is truncation
  • Recommendation: Use 0.7 for generation, 1.0 for training
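
In code, the truncation formula is a one-liner. The sketch below is illustrative, with w_avg standing in for the running average of w that is normally tracked during training:

import torch

def truncate(w, w_avg, psi=0.7):
    """Pull style vectors toward the average style: w' = w_avg + psi * (w - w_avg)."""
    return w_avg + psi * (w - w_avg)

# Example with placeholder tensors
w = torch.randn(16, 512)      # styles from the mapping network
w_avg = torch.zeros(512)      # running mean of w (tracked during training)
w_truncated = truncate(w, w_avg, psi=0.7)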

Advanced Features Usage

Image Generation Usage

Generate images using the trained StyleGAN model:

# Generate images from random latent vectors
from stylegan import StyleGANGenerator
import torch

generator = StyleGANGenerator(latent_dim=512, style_dim=512, max_resolution=256)
generator.load_state_dict(torch.load("outputs/checkpoints/generator.pth"))
generator.to("cuda")
generator.eval()

# Generate images
z = torch.randn(16, 512).to("cuda")
images = generator(z, truncation=0.7)

# Save generated images
from stylegan.utils import save_image_grid
save_image_grid(images, "generated_samples.png", nrow=4)

# Generate from a specific latent vector
z = torch.randn(1, 512).to("cuda")
generated = generator(z, truncation=0.7)

Style Mixing & Interpolation

Mix styles and generate smooth transitions between images:

import torch
from stylegan.style_mix import style_mix
from stylegan.utils import interpolate_styles, save_image_grid

# Style mixing at specific layers
z1 = torch.randn(1, 512).to("cuda")
z2 = torch.randn(1, 512).to("cuda")
mixed = style_mix(generator, z1, z2, mix_layers=[4, 5, 6])

# Linear interpolation in W space
w1 = generator.mapping(z1)
w2 = generator.mapping(z2)
interpolated = interpolate_styles(generator, w1, w2, steps=10)

# Save results
save_image_grid(interpolated, "interpolation.png", nrow=10)

Model Evaluation

Evaluate model performance with SSIM and diversity metrics:

from evaluate import evaluate_model

# Evaluate on the dataset
results = evaluate_model(
    checkpoint_path="outputs/checkpoints/generator.pth",
    dataset_path="./data/train",
    device="cuda"
)

# Results include:
# - SSIM (Structural Similarity Index)
# - Diversity metrics
# - Generated image quality
# - Model performance statistics
print(f"SSIM: {results['ssim']:.4f}")
print(f"Diversity: {results['diversity']:.4f}")
print(f"Quality Score: {results['quality']:.4f}")

Dataset Preparation

Prepare your custom dataset for training:

# Prepare a custom dataset
# Place images in the data/custom/ directory
# Supported formats: .jpg, .png, .jpeg

# Directory structure:
# data/custom/
# ├── image1.jpg
# ├── image2.png
# └── ...

# The training script will automatically:
# - Load images from the directory
# - Apply data augmentation
# - Resize to the specified dimensions
# - Normalize pixel values
# - Create data loaders for training

# Use the custom dataset for training
python train.py --dataset custom --data-dir data/custom --epochs 50

Latent Space Exploration

Explore the learned latent space:

# Style mixing visualization is available in the Jupyter notebooks
jupyter notebook notebooks/04_style_mixing.ipynb

# The notebooks include:
# - Style mixing at different layers
# - Interpolation examples in W space
# - Progressive growing visualization
# - Generated image samples

# Visualize style mixing
from visualize import visualize_style_mixing
visualize_style_mixing(generator, num_samples=16, mix_layers=[4, 5, 6])

Advanced Visualization Techniques

Use the visualization script for comprehensive analysis:

# Style mixing visualization
python visualize.py --checkpoint outputs/checkpoints/generator.pth --mode style_mix --output style_mix.png

# Progressive growing visualization
python visualize.py --checkpoint outputs/checkpoints/generator.pth --mode progressive --output progressive.png

# Interpolation visualization
python visualize.py --checkpoint outputs/checkpoints/generator.pth --mode interpolate --num-steps 10 --output interpolation.png

# Generated samples grid
python visualize.py --checkpoint outputs/checkpoints/generator.pth --mode samples --num-samples 16 --output samples.png

# All visualizations at once
python visualize.py --checkpoint outputs/checkpoints/generator.pth --mode all --output_dir visualizations/

Model Export and Deployment

Export trained models for deployment:

# Export to ONNX format (for production deployment)
python scripts/convert_checkpoint.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --format onnx \
  --output stylegan_generator.onnx \
  --latent-dim 512

# Export to TorchScript (PyTorch mobile/edge)
python scripts/convert_checkpoint.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --format torchscript \
  --output stylegan_generator.pt

# Export the generator and discriminator separately
python scripts/convert_checkpoint.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --format onnx \
  --components generator \
  --output-dir exported_models/

# Verify the exported model
python scripts/convert_checkpoint.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --format onnx \
  --output stylegan_generator.onnx \
  --verify

Model Comparison and Analysis

Compare different trained models:

# Compare multiple models
python evaluate.py \
  --checkpoints outputs/checkpoints/generator_epoch_50.pth \
                outputs/checkpoints/generator_epoch_100.pth \
  --dataset ./data/train \
  --output comparison_report.html

# Compare models with different truncation values
python generate.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --num_images 16 \
  --truncation 0.5 \
  --output_dir comparisons/trunc_0.5

python generate.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --num_images 16 \
  --truncation 0.7 \
  --output_dir comparisons/trunc_0.7

# Generate side-by-side comparisons
python visualize.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --mode compare \
  --num-samples 16 \
  --output-dir model_comparisons/

Complete Training Workflow

Step-by-Step Training Process

Step 1: Prepare Data

# Prepare a custom dataset in the data/train/ directory
# Place images in data/train/
# Supported formats: .jpg, .png, .jpeg

# Preprocess images first:
python preprocess.py --input_dir ./raw_images --output_dir ./data/train --size 256

# The training script will automatically:
# - Load images from the directory
# - Resize and normalize images
# - Create data loaders for progressive training

Step 2: Train Model

# Start training
python train.py --dataset ./data/train --output_dir ./outputs --epochs 100

# Training will:
# 1. Load and preprocess images
# 2. Initialize the generator and discriminator
# 3. Train with adversarial loss + gradient penalty
# 4. Grow progressively from low to high resolution
# 5. Save checkpoints and the best model
# 6. Log training history to TensorBoard
# 7. Generate sample images during training

Step 3: Monitor Training

  • Watch console output for epoch progress
  • Check TensorBoard: tensorboard --logdir outputs/logs
  • View generated samples in outputs/samples/
  • Best model saved as outputs/checkpoints/best_generator.pth

Step 4: Evaluate Model

# Evaluate on the dataset
python evaluate.py --checkpoint outputs/checkpoints/generator.pth --dataset ./data/train

# Calculates SSIM and diversity metrics

Step 5: Generate Images

# Generate images
python generate.py --checkpoint outputs/checkpoints/generator.pth --num_images 16 --output_dir ./generated

# Generate a style interpolation
python generate.py --checkpoint outputs/checkpoints/generator.pth --interpolate --num_steps 10 --output_dir ./generated

API Usage Examples

Image Generation Endpoint (cURL)

Generate images using the REST API:

curl -X POST http://localhost:5000/generate \
  -H "Content-Type: application/json" \
  -d '{ "num_images": 10 }'

# Response:
# {
#   "num_images": 10,
#   "images": ["base64_encoded_image1", ...],
#   "status": "success"
# }

Latent Interpolation Endpoint (cURL)

Generate interpolation between latent points:

curl -X POST http://localhost:5000/interpolate \
  -H "Content-Type: application/json" \
  -d '{ "num_steps": 10 }'

# Response:
# {
#   "num_steps": 10,
#   "images": ["base64_encoded_image1", ...],
#   "status": "success"
# }

Health Check (cURL)

Check API server health and model status:

curl -X GET http://localhost:5000/health

# Response:
# {
#   "status": "healthy",
#   "model_loaded": true,
#   "device": "cuda"
# }

Python Requests Example

Use the API with Python requests library:

import requests
import base64
from PIL import Image
from io import BytesIO

# Image generation endpoint
response = requests.post(
    'http://localhost:5000/generate',
    json={'num_images': 10}
)
data = response.json()

# Decode and save images
for i, img_base64 in enumerate(data['images']):
    img_data = base64.b64decode(img_base64)
    img = Image.open(BytesIO(img_data))
    img.save(f'generated_{i}.png')

# Interpolation endpoint
interp_response = requests.post(
    'http://localhost:5000/interpolate',
    json={'num_steps': 10}
)
print(interp_response.json())

# Health check
health = requests.get('http://localhost:5000/health')
print(health.json())

JavaScript/Fetch Example

Use the API with JavaScript fetch API:

// Image generation
fetch('http://localhost:5000/generate', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({ num_images: 10 })
})
  .then(res => res.json())
  .then(data => {
    console.log('Generated', data.num_images, 'images');
    // Display images from base64 data
    data.images.forEach((imgBase64, i) => {
      const img = document.createElement('img');
      img.src = 'data:image/png;base64,' + imgBase64;
      document.body.appendChild(img);
    });
  });

// Interpolation
fetch('http://localhost:5000/interpolate', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({ num_steps: 10 })
})
  .then(res => res.json())
  .then(data => {
    console.log('Interpolation steps:', data.num_steps);
  });

// Health check
fetch('http://localhost:5000/health')
  .then(res => res.json())
  .then(data => console.log('Status:', data));

StyleGAN Model Variants

Model | Max Resolution | Latent Dim | Use Case | Quality
Small StyleGAN | 128x128 | 512 | Fast inference, basic tasks | Good
Medium StyleGAN | 256x256 | 512 | Balanced quality/speed | Better
Large StyleGAN | 512x512 | 512 | Higher quality generation | Best
XL StyleGAN | 1024x1024 | 512 | Research, high quality | Excellent

Dataset Information

Dataset Formats

The project supports multiple dataset formats for image training:

  • Built-in datasets: MNIST, CIFAR-10 (automatically downloaded)
  • Custom dataset: Directory of images (JPG, PNG, JPEG)
  • Automatic image loading and preprocessing
  • Data augmentation support
  • Train/validation split support
  • Multiple image format support
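
For illustration, loading a flat directory of images as described above can be sketched with a small PyTorch Dataset. This is not the project's own loader in preprocess.py/train.py, just a hedged example:

import glob
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class FlatImageDataset(Dataset):
    """Loads all images directly from a directory (e.g. data/custom/*.jpg)."""
    def __init__(self, root, size=256):
        exts = ("*.jpg", "*.jpeg", "*.png")
        self.paths = sorted(p for e in exts for p in glob.glob(f"{root}/{e}"))
        self.transform = transforms.Compose([
            transforms.Resize(size),
            transforms.CenterCrop(size),
            transforms.ToTensor(),
            transforms.Normalize([0.5] * 3, [0.5] * 3),  # scale to [-1, 1]
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        return self.transform(Image.open(self.paths[idx]).convert("RGB"))

loader = DataLoader(FlatImageDataset("data/custom"), batch_size=4,
                    shuffle=True, num_workers=4, pin_memory=True)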

Custom Dataset Format

Training data is stored as image files in a directory:

# Custom dataset directory structure
data/custom/
├── image1.jpg
├── image2.png
├── image3.jpeg
└── ...

# The training script automatically:
# - Loads images from the directory
# - Applies data augmentation
# - Resizes to the specified dimensions
# - Normalizes pixel values
# - Creates data loaders for training

Adding Custom Training Data

Add your own image dataset for training:

# Place images in the data/custom/ directory
# Supported formats: .jpg, .png, .jpeg

# Example:
mkdir -p data/custom
cp your_images/*.jpg data/custom/

# Use in training
python train.py --dataset custom --data-dir data/custom --epochs 50

# The script will automatically:
# - Load all images from the directory
# - Apply augmentation if enabled
# - Resize and normalize images
# - Create train/validation splits

Troubleshooting & Best Practices

Common Issues

  • CUDA Out of Memory: Reduce BATCH_SIZE in config.py, lower MAX_RESOLUTION, or fall back to CPU mode
  • Model Not Found: Ensure the model is trained first by running train.py. Check that the checkpoint path is correct
  • Slow Generation: Lower MAX_RESOLUTION, generate in smaller batches, or use a GPU instead of CPU
  • API Connection Error: Check that the web app is running on port 5000 and that the checkpoint path is correct
  • Import Errors: Verify all dependencies are installed (pip install -r requirements.txt) and Python is 3.8+
  • Image Size Mismatch: Ensure all images are the same size or preprocess them with preprocess.py. Check MAX_RESOLUTION in config
  • Poor Generation Quality: Train for more epochs, use more and better training data, or lower the truncation value at generation time
  • Training Loss Not Decreasing: Check the learning rate (too high or too low), verify the data format, and inspect the data for issues
  • Blurry Generated Images: Train longer, increase model capacity, or check that images are preprocessed to the target resolution
  • Mode Collapse: Increase the gradient penalty weight (LAMBDA_GP), lower the learning rate, or use more diverse training data
  • Training Instability: Reduce the learning rate, increase LAMBDA_GP, or adjust the number of discriminator updates (N_CRITIC)
  • Memory Issues: Reduce the batch size, use a smaller image resolution, or enable mixed precision training

Performance Optimization Tips

  • GPU Memory: Use gradient accumulation for an effectively larger batch size: accumulate gradients over N batches before updating
  • Mixed Precision: Enable AMP (Automatic Mixed Precision) for up to 2x speedup and roughly 50% memory reduction (a sketch combining both tips follows this list)
  • Data Loading: Use multiple workers (num_workers=4-8) and pin_memory=True for faster data loading
  • Model Pruning: Remove unnecessary layers or reduce hidden dimensions for faster inference
  • Quantization: Use INT8 quantization for 4x speedup in production (with slight quality loss)
  • Batch Inference: Generate multiple images in batches rather than one at a time
  • Model Caching: Load model once and reuse for multiple generations
  • ONNX Runtime: Use exported ONNX models with ONNX Runtime for faster inference
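
To illustrate the gradient-accumulation and mixed-precision tips above, here is a generic PyTorch training-step sketch. The model, optimizer, and data below are placeholders for illustration only, not names from this codebase, and a CUDA GPU is assumed:

import torch
import torch.nn as nn

# Placeholder model/data to keep the sketch self-contained (requires CUDA)
model = nn.Linear(512, 1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
loader = [torch.randn(4, 512) for _ in range(8)]  # stand-in for a DataLoader
scaler = torch.cuda.amp.GradScaler()
accum_steps = 4                                   # effective batch = batch_size * accum_steps

optimizer.zero_grad()
for step, batch in enumerate(loader):
    with torch.cuda.amp.autocast():               # mixed-precision forward pass
        loss = model(batch.cuda()).mean() / accum_steps
    scaler.scale(loss).backward()                 # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                    # optimizer step every accum_steps batches
        scaler.update()
        optimizer.zero_grad()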

Best Practices

  • Training Data: Use a diverse, high-quality image dataset. More data gives better results; aim for at least 1K images
  • Data Format: Use supported formats (JPG, PNG, JPEG) with consistent image sizes
  • Data Preprocessing: Normalize pixel values, resize images to a consistent resolution, and apply augmentation if needed
  • Batch Size: Use small batches (4-8) for high resolutions or limited GPU memory; larger batches speed up training if memory allows
  • Learning Rate: Start with 0.002 (the default here) and adjust based on the loss curves; use scheduling for better convergence
  • Gradient Clipping: Default is 1.0. Increase it if training is unstable, decrease it if gradients are too small
  • Gradient Penalty: LAMBDA_GP = 10.0 is a solid default; increase it if training becomes unstable
  • Resolution: Start with MAX_RESOLUTION = 256 for testing; move to 512 or 1024 only with sufficient GPU memory and data
  • Truncation: Train with TRUNCATION = 1.0 and generate with 0.7 for a good quality/diversity balance
  • Evaluation: Regularly check SSIM and diversity metrics and inspect generated samples for mode collapse
  • Checkpointing: Checkpoints are saved automatically every few epochs; resume from a checkpoint if training is interrupted
  • Early Stopping: Monitor sample quality and losses; stop if there is no improvement for 10-15 epochs
  • Data Augmentation: Use augmentation for small datasets to improve generalization
  • API Rate Limiting: Implement rate limiting for production deployments (e.g., via nginx)
  • Logging: Monitor TensorBoard logs in outputs/logs/ for debugging and optimization
  • Device Selection: Use CUDA when available for faster training. CPU works but is much slower

Use Cases and Applications

  • Image Generation: Generate new photorealistic images from random latent vectors
  • Style Mixing: Combine coarse and fine styles from different latent codes
  • Image Interpolation: Create smooth transitions between images in W space
  • Data Augmentation: Generate synthetic training data for downstream models
  • Image Editing: Manipulate attributes by editing latent vectors in W space
  • Style Transfer: Transfer visual styles by swapping style vectors at selected layers
  • Feature Learning: The learned latent space captures meaningful image attributes
  • Creative Applications: Faces, artwork, and other high-resolution content generation

Performance Optimization

  • GPU Usage: Set CUDA_VISIBLE_DEVICES for multi-GPU systems. Use GPU for training and inference when available
  • Model Selection: Use a smaller model (128x128) for fastest inference. Larger models (512x512+) give better quality but are slower
  • Batch Processing: Generate multiple images in batches for efficient processing. Reduces overhead
  • Caching: API server caches model in memory. Model loads once on first request, then reused
  • Image Size: Smaller resolutions (128x128) generate faster. Larger resolutions (256x256+) give better quality but need more memory
  • Latent Sampling: Sample from standard normal distribution N(0, I) for generation. Use interpolation for smooth transitions
  • Model Quantization: Consider model quantization for production to reduce memory and speed up inference
  • Async Processing: For high-throughput, consider async API or queue system for batch image generation
  • Memory Management: Clear GPU cache between batches if running out of memory: torch.cuda.empty_cache()

Expected Training Times

Approximate training times for different configurations:

Dataset | Image Size | Latent Dim | Batch Size | Epochs | GPU | Time
MNIST | 28×28 | 64 | 128 | 50 | GTX 1080 | ~15 min
MNIST | 28×28 | 128 | 128 | 50 | GTX 1080 | ~20 min
CIFAR-10 | 32×32 | 128 | 128 | 100 | GTX 1080 | ~2 hours
Custom | 64×64 | 256 | 64 | 50 | RTX 3090 | ~3 hours
Custom | 128×128 | 512 | 32 | 50 | RTX 3090 | ~8 hours

Note: Times are approximate and depend on hardware, dataset size, and other factors. CPU training is typically 10-20x slower.

Model Size and Memory Requirements

Approximate model sizes and memory usage:

Latent Dim | Hidden Dims | Model Size | GPU Memory (Training) | GPU Memory (Inference)
64 | [32, 64, 128] | ~5 MB | ~500 MB | ~200 MB
128 | [32, 64, 128, 256] | ~15 MB | ~1.5 GB | ~500 MB
256 | [64, 128, 256, 512] | ~50 MB | ~4 GB | ~1.5 GB
512 | [128, 256, 512, 1024] | ~200 MB | ~12 GB | ~4 GB

Note: Memory usage depends on batch size and image size. Larger batches and images require more memory.

Real-World Examples & Use Cases

Example 1: Training on Custom Face Dataset

Complete workflow for training on a custom face dataset:

# 1. Prepare the dataset
mkdir -p data/faces
# Copy face images to data/faces/

# 2. Train the StyleGAN model
python train.py \
  --dataset data/faces \
  --output_dir ./outputs \
  --epochs 100 \
  --batch-size 4 \
  --max-resolution 256 \
  --tensorboard

# 3. Generate new faces
python generate.py \
  --checkpoint outputs/checkpoints/best_generator.pth \
  --num_images 16 \
  --output_dir ./generated_faces

# 4. Create face style mixing
python generate.py \
  --checkpoint outputs/checkpoints/best_generator.pth \
  --interpolate \
  --num_steps 20 \
  --output_dir ./face_interpolation

Example 2: Style Transfer

Use StyleGAN for style transfer and manipulation:

# Train on a style dataset
python train.py --dataset style_images --epochs 100

# Mix styles from different images (Python)
import torch
from stylegan.style_mix import style_mix
from stylegan.utils import save_image_grid

z1 = torch.randn(1, 512)  # Style 1
z2 = torch.randn(1, 512)  # Style 2

# Mix styles at specific layers
mixed = style_mix(generator, z1, z2, mix_layers=[4, 5, 6])

# Save the result
save_image_grid(mixed, "style_mixed.png", nrow=1)

Example 3: High-Resolution Generation

Generate high-resolution images with StyleGAN:

# Train on a high-resolution dataset
python train.py --dataset high_res_images --max-resolution 1024 --epochs 200

# Generate high-resolution images
python generate.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --num_images 16 \
  --output_dir ./high_res_generated

Example 4: Data Augmentation

Generate synthetic training data:

# Train StyleGAN on a small dataset
python train.py --dataset small_dataset --epochs 100

# Generate synthetic images to augment the dataset
python generate.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --num_images 1000 \
  --output_dir data/augmented/

# Use the generated images as additional training data
# Combine with the original dataset for better model performance

Example 5: Style Space Manipulation

Manipulate images by editing style vectors in W space:

# Generate an image from a latent code
from stylegan import StyleGANGenerator
from stylegan.utils import save_image_grid
import torch

generator = StyleGANGenerator(latent_dim=512, style_dim=512, max_resolution=256)
generator.load_state_dict(torch.load("outputs/checkpoints/generator.pth"))
generator.eval()

# Generate a base image
z = torch.randn(1, 512)
w = generator.mapping(z)
image = generator.synthesis(w)

# Manipulate the style vector (e.g., change specific attributes)
w_modified = w.clone()
w_modified[:, 0:10] += 0.5  # Modify the first 10 dimensions

# Generate the modified image
modified_image = generator.synthesis(w_modified)

# Save the result
save_image_grid(modified_image, "modified_image.png", nrow=1)

Integration Examples

Integration with Flask Web Application

Integrate the StyleGAN generator into a Flask web application:

from flask import Flask, request, jsonify
from generate import load_model, generate_images
import io
import base64

app = Flask(__name__)
model = load_model("outputs/checkpoints/generator.pth", device="cuda")

@app.route('/generate', methods=['POST'])
def generate():
    num_images = request.json.get('num_images', 10)
    images = generate_images(model, num_images=num_images, device="cuda")

    # Convert to base64 for the JSON response
    image_data = []
    for img in images:
        buffered = io.BytesIO()
        img.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
        image_data.append(img_str)

    return jsonify({'images': image_data})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Integration with FastAPI

Create a FastAPI service for StyleGAN image generation:

from fastapi import FastAPI
from generate import load_model, generate_images

app = FastAPI()
model = load_model("outputs/checkpoints/generator.pth", device="cuda")

@app.post("/generate")
async def generate_images_endpoint(num_images: int = 10):
    """Generate images from random latent vectors."""
    images = generate_images(model, num_images=num_images, device="cuda")
    # Save and return the images
    return {"status": "success", "num_images": num_images}

@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "healthy", "model_loaded": True}

Integration with Streamlit

Create an interactive Streamlit application:

import streamlit as st
from generate import load_model, generate_images

st.title("StyleGAN Image Generation")

# Load the model once and cache it
@st.cache_resource
def load_stylegan_model():
    return load_model("outputs/checkpoints/generator.pth", device="cuda")

model = load_stylegan_model()

# Sidebar controls
num_images = st.sidebar.slider("Number of Images", 1, 64, 16)
truncation = st.sidebar.slider("Truncation", 0.1, 1.0, 0.7)

# Generate button
if st.button("Generate Images"):
    with st.spinner("Generating images..."):
        images = generate_images(model, num_images=num_images, device="cuda")
    st.image(images, width=200)

# Interpolation
st.header("Latent Space Interpolation")
steps = st.slider("Interpolation Steps", 5, 30, 10)
if st.button("Create Interpolation"):
    # Create the interpolation sequence here, then display it
    # st.image(interpolated_images, width=200)
    pass

Contact Information

Get in Touch

Developer: Molla Samser
Designer & Tester: Rima Khatun

rskworld.in
help@rskworld.in support@rskworld.in
+91 93305 39277

License

This project is for educational purposes only. See LICENSE file for more details.
