StyleGAN Image Generation

Complete Documentation & Project Details for Style-Based Generator & High-Resolution Image Generation

Project Description

This project implements StyleGAN for generating high-resolution, photorealistic images using a style-based generator architecture with adaptive instance normalization (AdaIN). The architecture uses a mapping network to transform random noise into an intermediate latent space (W), a synthesis network that grows progressively from low to high resolution, and style mixing for fine-grained control over image attributes. It is well suited to learning StyleGAN fundamentals and high-quality image generation. The system includes adversarial training, style mixing, the truncation trick, TensorBoard integration, and comprehensive training tools.

StyleGAN uses style-based generation: a mapping network transforms random noise z into an intermediate latent space W, and a synthesis network generates images progressively, injecting style at each layer via AdaIN. The generator uses progressive growing to build images from low to high resolution (up to 1024x1024), while the discriminator provides adversarial feedback. The implementation provides full PyTorch support, a comprehensive training pipeline, style mixing utilities, evaluation metrics, and deployment tools for high-resolution image generation applications.

Project Screenshots

Screenshot gallery (4 images): StyleGAN Image Generation

Core Features

StyleGAN Architecture

  • Style-based generator
  • Mapping network
  • Synthesis network
  • Progressive growing
  • High-resolution generation

Style-Based Generation

  • Intermediate latent space (W)
  • Adaptive instance normalization
  • Style injection at layers
  • Controllable attributes
  • Fine-grained control

Mapping & Synthesis Networks

  • Mapping network (Z to W)
  • Synthesis network (progressive)
  • AdaIN layers
  • Multi-resolution generation
  • Deep network architecture

Adversarial Training

  • Generator loss
  • Discriminator loss
  • Gradient penalty
  • Truncation trick
  • Training stability

TensorBoard Integration

  • Real-time loss visualization
  • Multi-resolution image tracking
  • Training progress monitoring
  • Interactive dashboard
  • Comprehensive logging

Web Interface

  • Flask-based web app
  • Interactive image generation
  • Real-time generation
  • Download generated images
  • User-friendly interface

Advanced Features

Style Mixing & Interpolation

  • Style mixing at layers
  • Linear interpolation in W space
  • Controllable style attributes
  • Smooth image transitions
  • Visual exploration

Data Augmentation

  • Adaptive augmentation
  • Mixup augmentation
  • Cutout augmentation
  • Multiple augmentation levels

Multiple Dataset Support

  • Custom dataset support
  • CelebA dataset
  • CIFAR-10 dataset
  • MNIST dataset

Resume Training

  • Checkpoint resuming
  • Automatic checkpoint detection
  • Training continuation
  • Progress preservation
  • Early stopping support

Web Interface Features

Feature | Description | Usage
Image Generation | Generate images from random noise | Select the number of images and click Generate
Real-time Generation | Generate images in real time | Images appear as they are generated
Download Images | Download generated images | Click the download button for each image
Model Selection | Choose different trained models | Select from available checkpoints

Technologies Used

This StyleGAN Image Generation project is built with modern deep learning and computer vision technologies. The core implementation uses Python as the primary programming language and PyTorch for deep learning operations. The model follows the StyleGAN architecture, with a style-based generator and a discriminator network for high-resolution photorealistic image generation. The project also provides Jupyter Notebook support for interactive development and demonstrations, comprehensive adversarial training with style mixing capabilities, and metrics for assessing image quality.

The system supports progressive growing for building images from low to high resolution, style mixing for fine-grained control over image attributes, and the truncation trick for trading quality against diversity, making it suitable for a wide range of high-resolution image generation applications.

Python 3.8+ PyTorch 2.0+ StyleGAN AdaIN Image Generation TensorBoard Computer Vision Jupyter Notebook Progressive Growing Style Mixing

Installation & Usage

Installation

Install all required dependencies for the StyleGAN Image Generation project:

# Install all requirements
pip install -r requirements.txt

# The StyleGAN model will be trained on your data
# Prepare your dataset in the data/train/ directory
# Images should be preprocessed and resized

PyTorch Installation

Install PyTorch (CPU or GPU version):

# For CPU only
pip install torch torchvision torchaudio

# For CUDA (GPU support) - CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Verify installation
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"

Verify Installation

Test the model and verify all components work:

# Test model architecture
python test_model.py

# This will verify:
# - Model can be instantiated
# - Forward pass works
# - All components function correctly
# - Device compatibility (CPU/CUDA)

Training the Model

Train the StyleGAN model on your image dataset:

# Prepare your dataset
# Place images in the data/train/ directory
# Preprocess images to a consistent size

# Basic training with default parameters
python train.py --dataset ./data/train --output_dir ./outputs --epochs 100

# Configure in config.py:
# - BATCH_SIZE = 4
# - EPOCHS = 100
# - LEARNING_RATE = 0.002
# - LATENT_DIM = 512
# - MAX_RESOLUTION = 256
# - TRUNCATION = 1.0

# Or use the Jupyter notebook
jupyter notebook notebooks/02_training_stylegan.ipynb

# Training will:
# - Load and preprocess images
# - Initialize the generator and discriminator
# - Train with adversarial loss + gradient penalty
# - Grow progressively from low to high resolution
# - Save checkpoints and generated samples
# - Log to TensorBoard

Training Parameters (config.py):

  • BATCH_SIZE: Training batch size (default: 4)
  • EPOCHS: Number of training epochs (default: 100)
  • LATENT_DIM: Dimension of latent space Z (default: 512)
  • STYLE_DIM: Dimension of style space W (default: 512)
  • LEARNING_RATE: Learning rate (default: 0.002)
  • MAX_RESOLUTION: Maximum image resolution (default: 256)
  • TRUNCATION: Truncation trick value (default: 1.0)
  • LAMBDA_GP: Gradient penalty weight (default: 10.0)
  • DEVICE: Device to use - 'cuda' or 'cpu' (default: 'cuda')

Image Generation

Generate images using the trained StyleGAN model:

# Generate images from a trained model
python generate.py --checkpoint outputs/checkpoints/generator.pth --num_images 16 --output_dir ./generated

# Generate with style interpolation
python generate.py --checkpoint outputs/checkpoints/generator.pth --interpolate --num_steps 10 --output_dir ./generated

# Or use the Jupyter notebook
jupyter notebook notebooks/03_generate_images.ipynb

# Using the Python API
from stylegan import StyleGANGenerator
import torch

generator = StyleGANGenerator(latent_dim=512, style_dim=512, max_resolution=256)
generator.load_state_dict(torch.load("outputs/checkpoints/generator.pth"))
generator.eval()

# Generate images
z = torch.randn(16, 512)
images = generator(z, truncation=0.7)
print(f"Generated {len(images)} images")

Model Evaluation

Evaluate the trained StyleGAN model performance:

# Evaluate the trained model
python evaluate.py --checkpoint outputs/checkpoints/generator.pth --dataset ./data/train

# The evaluation includes:
# - SSIM (Structural Similarity Index)
# - Diversity metrics
# - Generated image quality assessment
# - Model performance metrics

Style Mixing Visualization

Visualize style mixing and generate interpolations:

# Style mixing visualization
python visualize.py --checkpoint outputs/checkpoints/generator.pth --mode all --output_dir ./visualizations

# The visualization includes:
# - Style mixing at different layers
# - Interpolation in W space
# - Generated image samples
# - Progressive growing visualization

Jupyter Notebooks

Open the interactive Jupyter notebooks for demonstrations:

# StyleGAN demonstration notebooks
jupyter notebook notebooks/01_stylegan_introduction.ipynb
jupyter notebook notebooks/02_training_stylegan.ipynb
jupyter notebook notebooks/03_generate_images.ipynb
jupyter notebook notebooks/04_style_mixing.ipynb

# The notebooks include:
# - Model architecture visualization
# - Mapping and synthesis network explanation
# - Training setup examples
# - Image generation examples
# - Style mixing demonstrations

# Or use JupyterLab
jupyter lab notebooks/

Project Structure

stylegan-generation/
├── README.md # Main documentation
├── requirements.txt # Python dependencies
├── LICENSE # License file
├── CHANGELOG.md # Changelog
├── CONTRIBUTING.md # Contribution guidelines
├── PROJECT_SUMMARY.md # Project summary
│
├── Core Modules
│ ├── train.py # Training script
│ ├── generate.py # Image generation
│ ├── evaluate.py # Model evaluation
│ ├── visualize.py # Visualization tools
│ ├── preprocess.py # Data preprocessing
│ ├── example.py # Simple example
│ └── config.py # Configuration settings
│
├── StyleGAN Package (stylegan/)
│ ├── model.py # Generator & Discriminator
│ ├── layers.py # AdaIN, Noise, Style layers
│ ├── losses.py # Loss functions
│ ├── utils.py # Utility functions
│ ├── style_mix.py # Style mixing utilities
│ └── metrics.py # Evaluation metrics
│
├── Data
│ └── (training datasets: Custom images)
│
├── Outputs
│ ├── checkpoints/ # Model checkpoints
│ ├── samples/ # Generated samples
│ └── logs/ # TensorBoard logs
│
├── Notebooks
│ ├── 01_stylegan_introduction.ipynb
│ ├── 02_training_stylegan.ipynb
│ ├── 03_generate_images.ipynb
│ └── 04_style_mixing.ipynb
│
├── Scripts
│ ├── download_dataset.py # Dataset download utility
│ └── convert_checkpoint.py # Checkpoint converter

Configuration Options

Model Configuration

Customize model and training parameters in config.py and train.py:

# Model Architecture (config.py)
LATENT_DIM = 512          # Dimension of latent space Z
STYLE_DIM = 512           # Dimension of style space W
MAX_RESOLUTION = 256      # Maximum image resolution
BASE_CHANNELS = 512       # Base channels for generator
NUM_MAPPING_LAYERS = 8    # Number of mapping network layers

# Training Parameters (config.py)
BATCH_SIZE = 4            # Training batch size
LEARNING_RATE = 0.002     # Learning rate
EPOCHS = 100              # Number of training epochs
BETA1 = 0.0               # Adam optimizer beta1
BETA2 = 0.99              # Adam optimizer beta2
TRUNCATION = 1.0          # Truncation trick value
LAMBDA_GP = 10.0          # Gradient penalty weight
N_CRITIC = 1              # Discriminator updates per generator update
DEVICE = 'cuda'           # Device: 'cuda' or 'cpu'

# Generation Configuration
NUM_IMAGES = 16           # Number of images to generate
TRUNCATION = 0.7          # Truncation for generation

Configuration Tips:

  • LATENT_DIM: Standard is 512. Higher = more expressive but slower
  • MAX_RESOLUTION: Maximum image size. Common: 256, 512, 1024. Higher = more memory
  • BATCH_SIZE: Start with 4-8. Larger = faster but needs more GPU memory
  • LEARNING_RATE: Start with 0.002. StyleGAN uses lower learning rates
  • TRUNCATION: 1.0 = full diversity, 0.7 = better quality. Lower = higher quality, less diversity
  • LAMBDA_GP: Gradient penalty weight. 10.0 is standard for stable training
  • NUM_MAPPING_LAYERS: 8 layers is standard. More = better style control

Training Progress Logging

The training script automatically logs progress to TensorBoard and saves checkpoints:

# Training logs are saved to:
# outputs/logs/        - TensorBoard logs
# outputs/checkpoints/ - Model checkpoints
# outputs/samples/     - Generated sample images

# View TensorBoard logs
tensorboard --logdir outputs/logs

# TensorBoard shows:
# - Generator loss per epoch
# - Discriminator loss per epoch
# - Gradient penalty per epoch
# - Generated image samples at progressive resolutions
# - Style mixing examples
# - Model parameters

# Checkpoints are saved as:
# outputs/checkpoints/generator_epoch_XX.pth
# outputs/checkpoints/discriminator_epoch_XX.pth
# outputs/checkpoints/best_generator.pth

Advanced Training Options

Use the advanced training script with additional features:

# Advanced training with all features
python train_advanced.py \
  --dataset mnist \
  --epochs 50 \
  --batch-size 128 \
  --latent-dim 128 \
  --beta 1.0 \
  --tensorboard \
  --early-stopping 10 \
  --lr-scheduler \
  --augment \
  --grad-clip 1.0 \
  --amp

# Features enabled:
# --tensorboard: Enable TensorBoard logging
# --early-stopping N: Stop if no improvement for N epochs
# --lr-scheduler: Use learning rate scheduling
# --augment: Enable data augmentation
# --grad-clip: Gradient clipping value
# --amp: Mixed precision training (faster, less memory)

# Resume training from a checkpoint
python train_advanced.py \
  --resume outputs/checkpoints/generator_epoch_25.pth \
  --epochs 50

Training on Different Datasets

Examples for training on different datasets:

# Training on MNIST (grayscale, 28x28)
python train.py --dataset mnist --epochs 50 --latent-dim 64

# Training on CIFAR-10 (RGB, 32x32)
python train.py --dataset cifar10 --epochs 100 --latent-dim 128 --beta 0.5

# Training on a custom dataset (RGB, any size)
python train.py \
  --dataset custom \
  --data-dir data/custom \
  --epochs 50 \
  --latent-dim 256 \
  --image-size 64

# For high-resolution images (128x128 or larger)
python train.py \
  --dataset custom \
  --data-dir data/custom \
  --latent-dim 512 \
  --hidden-dims "[64, 128, 256, 512, 1024]" \
  --image-size 128 \
  --batch-size 32

Detailed Architecture

StyleGAN Components

1. Mapping Network:

  • 8-layer fully connected network
  • Transforms random noise Z to intermediate latent space W
  • Learns meaningful style representations
  • Enables disentangled style control
  • Outputs style vectors for each synthesis layer
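
The mapping network is essentially a small MLP. Below is a minimal, illustrative PyTorch sketch of such a network (not the exact code in stylegan/model.py; layer details are assumptions):

import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Minimal sketch of a StyleGAN-style mapping network f: Z -> W."""
    def __init__(self, latent_dim=512, style_dim=512, num_layers=8):
        super().__init__()
        layers, in_dim = [], latent_dim
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, style_dim), nn.LeakyReLU(0.2)]
            in_dim = style_dim
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        # Normalizing z before mapping is common in StyleGAN implementations
        z = z / (z.pow(2).mean(dim=1, keepdim=True) + 1e-8).sqrt()
        return self.net(z)  # w in the intermediate latent space W

# Example: map a batch of noise vectors to style codes
w = MappingNetwork()(torch.randn(4, 512))
print(w.shape)  # torch.Size([4, 512])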

2. Synthesis Network:

  • Progressive growing architecture
  • Generates images from low to high resolution
  • Uses adaptive instance normalization (AdaIN) for style injection
  • Each layer receives style information from W space
  • Supports resolution up to 1024x1024

3. Discriminator:

  • Progressive discriminator matching generator resolution
  • Provides adversarial feedback during training
  • Uses gradient penalty for training stability
  • Learns to distinguish real from generated images
  • Enables high-quality image generation

Loss Function

StyleGAN uses adversarial training with gradient penalty:

Generator Loss:
L_G = -E[D(G(z))]

Discriminator Loss:
L_D = E[D(G(z))] - E[D(x_real)] + λ × GP

Gradient Penalty:
GP = E[(||∇D(x̂)||₂ - 1)²]

Where:
- G: Generator network
- D: Discriminator (critic) network
- z: Random noise vector
- x_real: Real images
- x̂: Random interpolations between real and generated samples
- λ: Gradient penalty weight (default: 10.0)

The gradient penalty keeps training stable and helps prevent mode collapse.
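
For reference, the gradient penalty term above can be computed in PyTorch roughly as follows. This is a minimal sketch; the project's own implementation lives in stylegan/losses.py and may differ in detail:

import torch

def gradient_penalty(D, real, fake):
    """WGAN-GP penalty: E[(||grad D(x_hat)||_2 - 1)^2] on interpolated samples."""
    batch_size = real.size(0)
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = D(x_hat)
    grads = torch.autograd.grad(
        outputs=scores.sum(), inputs=x_hat, create_graph=True
    )[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()

# Discriminator loss (to minimize): E[D(fake)] - E[D(real)] + lambda_gp * GP
# d_loss = D(fake).mean() - D(real).mean() + 10.0 * gradient_penalty(D, real, fake)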

Adaptive Instance Normalization (AdaIN)

Enables style injection at each layer:

AdaIN(x, y) = σ(y) × normalize(x) + μ(y)

Where:
- x: Feature map from the previous layer
- y: Style vector from the mapping network
- μ(y), σ(y): Style statistics (mean, std)
- normalize(x): Normalized feature map

This allows:
- Style injection at each resolution
- Fine-grained control over image attributes
- Disentangled style representation
- Progressive style application
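
In code, the AdaIN operation can be sketched as below. This is illustrative only; the project's own layer in stylegan/layers.py may differ in detail:

import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: scale/shift normalized features by a style."""
    def __init__(self, num_channels, style_dim=512):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels)
        self.style = nn.Linear(style_dim, num_channels * 2)  # predicts (scale, bias)

    def forward(self, x, w):
        scale, bias = self.style(w).chunk(2, dim=1)   # [batch, C] each
        scale = scale.unsqueeze(-1).unsqueeze(-1)     # [batch, C, 1, 1]
        bias = bias.unsqueeze(-1).unsqueeze(-1)
        return (1 + scale) * self.norm(x) + bias

# Example: style a 64-channel feature map with a 512-dim style vector
out = AdaIN(64)(torch.randn(2, 64, 16, 16), torch.randn(2, 512))
print(out.shape)  # torch.Size([2, 64, 16, 16])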

Layer-by-Layer Architecture Details

Discriminator Architecture:

  • Input Layer: Receives images of size [batch, channels, height, width]
  • Convolutional Blocks: Stacked convolution + LeakyReLU blocks extract features
  • Downsampling: Spatial resolution is reduced progressively to match the current training resolution
  • Hidden Dimensions: Channel count increases as spatial resolution decreases
  • Output Layer: A final layer produces a single realness score per image for the adversarial loss

Synthesis Network Architecture:

  • Input: Starts from a learned constant feature map rather than the latent vector directly
  • Style Injection: AdaIN layers apply the style vector w at every resolution
  • Noise Injection: Per-pixel noise adds stochastic fine detail
  • Upsampling: Progressive blocks double the spatial resolution up to MAX_RESOLUTION
  • Hidden Dimensions: Channel count decreases as resolution increases
  • Output Layer: A final convolution produces the RGB image
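
To make the synthesis side concrete, here is a hedged sketch of one styled block (upsample → convolution → noise → AdaIN style injection). It is illustrative only and not the exact implementation in stylegan/model.py:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SynthesisBlock(nn.Module):
    """One progressive-growing block: upsample, convolve, add noise, inject style."""
    def __init__(self, in_channels, out_channels, style_dim=512):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.noise_scale = nn.Parameter(torch.zeros(1, out_channels, 1, 1))
        self.norm = nn.InstanceNorm2d(out_channels)
        self.style = nn.Linear(style_dim, out_channels * 2)  # AdaIN scale and bias
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x, w):
        x = F.interpolate(x, scale_factor=2, mode="nearest")  # double the resolution
        x = self.conv(x)
        x = x + self.noise_scale * torch.randn_like(x)        # per-pixel noise
        scale, bias = self.style(w).chunk(2, dim=1)
        x = (1 + scale[:, :, None, None]) * self.norm(x) + bias[:, :, None, None]
        return self.act(x)

# 8x8 features -> 16x16 features, styled by w
y = SynthesisBlock(512, 256)(torch.randn(2, 512, 8, 8), torch.randn(2, 512))
print(y.shape)  # torch.Size([2, 256, 16, 16])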

Mathematical Formulation

Complete mathematical description of StyleGAN:

# Mapping Network: f: Z → W
w = f(z)            where z ~ N(0, I), w ∈ W space

# Synthesis Network: g: W → X
x = g(w)            where x is the generated image

# Style Injection (AdaIN)
y = AdaIN(x, w)     where y is the styled feature map

# Generator
G(z) = g(f(z))

# Discriminator (critic)
D(x) ∈ ℝ            # Realness score for image x

# Loss Function: Wasserstein GAN with Gradient Penalty
L_G = -E[D(G(z))]
L_D = E[D(G(z))] - E[D(x_real)] + λ × GP

# Where:
# - G: Generator network
# - D: Discriminator network
# - z: Random noise
# - w: Intermediate latent code
# - λ: Gradient penalty weight

Truncation Trick

Understanding the truncation parameter in StyleGAN:

  • Truncation = 1.0: Full diversity, uses entire W space distribution
  • Truncation = 0.7: Balanced quality and diversity (recommended)
  • Truncation < 0.7: Higher quality but less diversity, truncates W space
  • Truncation > 1.0: More diversity but may reduce quality
  • Formula: w' = w_avg + ψ × (w - w_avg), where ψ is truncation
  • Recommendation: Use 0.7 for generation, 1.0 for training
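
In code, the truncation formula is a one-liner. The sketch below is illustrative, with w_avg standing in for the running average of w that is normally tracked during training:

import torch

def truncate(w, w_avg, psi=0.7):
    """Pull style vectors toward the average style: w' = w_avg + psi * (w - w_avg)."""
    return w_avg + psi * (w - w_avg)

# Example with placeholder tensors
w = torch.randn(16, 512)      # styles from the mapping network
w_avg = torch.zeros(512)      # running mean of w (tracked during training)
w_truncated = truncate(w, w_avg, psi=0.7)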

Advanced Features Usage

Image Generation Usage

Generate images using the trained StyleGAN model:

# Generate images from random latent vectors
from stylegan import StyleGANGenerator
import torch

generator = StyleGANGenerator(latent_dim=512, style_dim=512, max_resolution=256)
generator.load_state_dict(torch.load("outputs/checkpoints/generator.pth"))
generator.to("cuda")
generator.eval()

# Generate images
z = torch.randn(16, 512).to("cuda")
images = generator(z, truncation=0.7)

# Save generated images
from stylegan.utils import save_image_grid
save_image_grid(images, "generated_samples.png", nrow=4)

# Generate from a specific latent vector
z = torch.randn(1, 512).to("cuda")
generated = generator(z, truncation=0.7)

Style Mixing & Interpolation

Mix styles and generate smooth transitions between images:

import torch
from stylegan.style_mix import style_mix
from stylegan.utils import interpolate_styles, save_image_grid

# Style mixing at specific layers
z1 = torch.randn(1, 512).to("cuda")
z2 = torch.randn(1, 512).to("cuda")
mixed = style_mix(generator, z1, z2, mix_layers=[4, 5, 6])

# Linear interpolation in W space
w1 = generator.mapping(z1)
w2 = generator.mapping(z2)
interpolated = interpolate_styles(generator, w1, w2, steps=10)

# Save results
save_image_grid(interpolated, "interpolation.png", nrow=10)

Model Evaluation

Evaluate model performance with SSIM and diversity metrics:

from evaluate import evaluate_model

# Evaluate on the dataset
results = evaluate_model(
    checkpoint_path="outputs/checkpoints/generator.pth",
    dataset_path="./data/train",
    device="cuda"
)

# Results include:
# - SSIM (Structural Similarity Index)
# - Diversity metrics
# - Generated image quality
# - Model performance statistics
print(f"SSIM: {results['ssim']:.4f}")
print(f"Diversity: {results['diversity']:.4f}")
print(f"Quality Score: {results['quality']:.4f}")

Dataset Preparation

Prepare your custom dataset for training:

# Prepare a custom dataset
# Place images in the data/custom/ directory
# Supported formats: .jpg, .png, .jpeg

# Directory structure:
# data/custom/
# ├── image1.jpg
# ├── image2.png
# └── ...

# The training script will automatically:
# - Load images from the directory
# - Apply data augmentation
# - Resize to the specified dimensions
# - Normalize pixel values
# - Create data loaders for training

# Use the custom dataset for training
python train.py --dataset custom --data-dir data/custom --epochs 50

Latent Space Exploration

Explore the learned latent space:

# Style mixing visualization is available in the Jupyter notebooks
jupyter notebook notebooks/04_style_mixing.ipynb

# The notebooks include:
# - Style mixing at different layers
# - Interpolation examples in W space
# - Progressive growing visualization
# - Generated image samples

# Visualize style mixing
from visualize import visualize_style_mixing
visualize_style_mixing(generator, num_samples=16, mix_layers=[4, 5, 6])

Advanced Visualization Techniques

Use the visualization script for comprehensive analysis:

# Style mixing visualization
python visualize.py --checkpoint outputs/checkpoints/generator.pth --mode style_mix --output style_mix.png

# Progressive growing visualization
python visualize.py --checkpoint outputs/checkpoints/generator.pth --mode progressive --output progressive.png

# Interpolation visualization
python visualize.py --checkpoint outputs/checkpoints/generator.pth --mode interpolate --num-steps 10 --output interpolation.png

# Generated samples grid
python visualize.py --checkpoint outputs/checkpoints/generator.pth --mode samples --num-samples 16 --output samples.png

# All visualizations at once
python visualize.py --checkpoint outputs/checkpoints/generator.pth --mode all --output_dir visualizations/

Model Export and Deployment

Export trained models for deployment:

# Export to ONNX format (for production deployment)
python scripts/convert_checkpoint.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --format onnx \
  --output stylegan_generator.onnx \
  --latent-dim 512

# Export to TorchScript (PyTorch mobile/edge)
python scripts/convert_checkpoint.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --format torchscript \
  --output stylegan_generator.pt

# Export the generator and discriminator separately
python scripts/convert_checkpoint.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --format onnx \
  --components generator \
  --output-dir exported_models/

# Verify the exported model
python scripts/convert_checkpoint.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --format onnx \
  --output stylegan_generator.onnx \
  --verify

Model Comparison and Analysis

Compare different trained models:

# Compare multiple models
python evaluate.py \
  --checkpoints outputs/checkpoints/generator_epoch_50.pth \
                outputs/checkpoints/generator_epoch_100.pth \
  --dataset ./data/train \
  --output comparison_report.html

# Compare models with different truncation values
python generate.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --num_images 16 \
  --truncation 0.5 \
  --output_dir comparisons/trunc_0.5

python generate.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --num_images 16 \
  --truncation 0.7 \
  --output_dir comparisons/trunc_0.7

# Generate side-by-side comparisons
python visualize.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --mode compare \
  --num-samples 16 \
  --output-dir model_comparisons/

Complete Training Workflow

Step-by-Step Training Process

Step 1: Prepare Data

# Prepare a custom dataset in the data/train/ directory
# Place images in data/train/
# Supported formats: .jpg, .png, .jpeg

# Preprocess images first:
python preprocess.py --input_dir ./raw_images --output_dir ./data/train --size 256

# The training script will automatically:
# - Load images from the directory
# - Resize and normalize images
# - Create data loaders for progressive training

Step 2: Train Model

# Start training
python train.py --dataset ./data/train --output_dir ./outputs --epochs 100

# Training will:
# 1. Load and preprocess images
# 2. Initialize the generator and discriminator
# 3. Train with adversarial loss + gradient penalty
# 4. Grow progressively from low to high resolution
# 5. Save checkpoints and the best model
# 6. Log training history to TensorBoard
# 7. Generate sample images during training

Step 3: Monitor Training

  • Watch console output for epoch progress
  • Check TensorBoard: tensorboard --logdir outputs/logs
  • View generated samples in outputs/samples/
  • Best model saved as outputs/checkpoints/best_generator.pth

Step 4: Evaluate Model

# Evaluate on the dataset
python evaluate.py --checkpoint outputs/checkpoints/generator.pth --dataset ./data/train

# Calculates SSIM and diversity metrics

Step 5: Generate Images

# Generate images
python generate.py --checkpoint outputs/checkpoints/generator.pth --num_images 16 --output_dir ./generated

# Generate a style interpolation
python generate.py --checkpoint outputs/checkpoints/generator.pth --interpolate --num_steps 10 --output_dir ./generated

API Usage Examples

Image Generation Endpoint (cURL)

Generate images using the REST API:

curl -X POST http://localhost:5000/generate \
  -H "Content-Type: application/json" \
  -d '{ "num_images": 10 }'

# Response:
# {
#   "num_images": 10,
#   "images": ["base64_encoded_image1", ...],
#   "status": "success"
# }

Latent Interpolation Endpoint (cURL)

Generate interpolation between latent points:

curl -X POST http://localhost:5000/interpolate \
  -H "Content-Type: application/json" \
  -d '{ "num_steps": 10 }'

# Response:
# {
#   "num_steps": 10,
#   "images": ["base64_encoded_image1", ...],
#   "status": "success"
# }

Health Check (cURL)

Check API server health and model status:

curl -X GET http://localhost:5000/health

# Response:
# {
#   "status": "healthy",
#   "model_loaded": true,
#   "device": "cuda"
# }

Python Requests Example

Use the API with Python requests library:

import requests
import base64
from PIL import Image
from io import BytesIO

# Image generation endpoint
response = requests.post(
    'http://localhost:5000/generate',
    json={'num_images': 10}
)
data = response.json()

# Decode and save images
for i, img_base64 in enumerate(data['images']):
    img_data = base64.b64decode(img_base64)
    img = Image.open(BytesIO(img_data))
    img.save(f'generated_{i}.png')

# Interpolation endpoint
interp_response = requests.post(
    'http://localhost:5000/interpolate',
    json={'num_steps': 10}
)
print(interp_response.json())

# Health check
health = requests.get('http://localhost:5000/health')
print(health.json())

JavaScript/Fetch Example

Use the API with JavaScript fetch API:

// Image generation
fetch('http://localhost:5000/generate', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({ num_images: 10 })
})
  .then(res => res.json())
  .then(data => {
    console.log('Generated', data.num_images, 'images');
    // Display images from base64 data
    data.images.forEach((imgBase64, i) => {
      const img = document.createElement('img');
      img.src = 'data:image/png;base64,' + imgBase64;
      document.body.appendChild(img);
    });
  });

// Interpolation
fetch('http://localhost:5000/interpolate', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({ num_steps: 10 })
})
  .then(res => res.json())
  .then(data => {
    console.log('Interpolation steps:', data.num_steps);
  });

// Health check
fetch('http://localhost:5000/health')
  .then(res => res.json())
  .then(data => console.log('Status:', data));

StyleGAN Model Variants

Model | Max Resolution | Latent Dim | Use Case | Quality
Small StyleGAN | 128x128 | 512 | Fast inference, basic tasks | Good
Medium StyleGAN | 256x256 | 512 | Balanced quality/speed | Better
Large StyleGAN | 512x512 | 512 | Higher quality generation | Best
XL StyleGAN | 1024x1024 | 512 | Research, high quality | Excellent

Dataset Information

Dataset Formats

The project supports multiple dataset formats for image training:

  • Built-in datasets: MNIST, CIFAR-10 (automatically downloaded)
  • Custom dataset: Directory of images (JPG, PNG, JPEG)
  • Automatic image loading and preprocessing
  • Data augmentation support
  • Train/validation split support
  • Multiple image format support
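
For illustration, loading a flat directory of images as described above can be sketched with a small PyTorch Dataset. This is not the project's own loader in preprocess.py/train.py, just a hedged example:

import glob
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class FlatImageDataset(Dataset):
    """Loads all images directly from a directory (e.g. data/custom/*.jpg)."""
    def __init__(self, root, size=256):
        exts = ("*.jpg", "*.jpeg", "*.png")
        self.paths = sorted(p for e in exts for p in glob.glob(f"{root}/{e}"))
        self.transform = transforms.Compose([
            transforms.Resize(size),
            transforms.CenterCrop(size),
            transforms.ToTensor(),
            transforms.Normalize([0.5] * 3, [0.5] * 3),  # scale to [-1, 1]
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        return self.transform(Image.open(self.paths[idx]).convert("RGB"))

loader = DataLoader(FlatImageDataset("data/custom"), batch_size=4,
                    shuffle=True, num_workers=4, pin_memory=True)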

Custom Dataset Format

Training data is stored as image files in a directory:

# Custom dataset directory structure
data/custom/
├── image1.jpg
├── image2.png
├── image3.jpeg
└── ...

# The training script automatically:
# - Loads images from the directory
# - Applies data augmentation
# - Resizes to the specified dimensions
# - Normalizes pixel values
# - Creates data loaders for training

Adding Custom Training Data

Add your own image dataset for training:

# Place images in the data/custom/ directory
# Supported formats: .jpg, .png, .jpeg

# Example:
mkdir -p data/custom
cp your_images/*.jpg data/custom/

# Use in training
python train.py --dataset custom --data-dir data/custom --epochs 50

# The script will automatically:
# - Load all images from the directory
# - Apply augmentation if enabled
# - Resize and normalize images
# - Create train/validation splits

Troubleshooting & Best Practices

Common Issues

  • CUDA Out of Memory: Reduce BATCH_SIZE in config.py, lower MAX_RESOLUTION, or fall back to CPU mode
  • Model Not Found: Ensure the model is trained first by running train.py. Check that the checkpoint path is correct
  • Slow Generation: Lower MAX_RESOLUTION, generate in smaller batches, or use a GPU instead of CPU
  • API Connection Error: Check that the web app is running on port 5000 and that the checkpoint path is correct
  • Import Errors: Verify all dependencies are installed (pip install -r requirements.txt) and Python is 3.8+
  • Image Size Mismatch: Ensure all images are the same size or preprocess them with preprocess.py. Check MAX_RESOLUTION in config
  • Poor Generation Quality: Train for more epochs, use more and better training data, or lower the truncation value at generation time
  • Training Loss Not Decreasing: Check the learning rate (too high or too low), verify the data format, and inspect the data for issues
  • Blurry Generated Images: Train longer, increase model capacity, or check that images are preprocessed to the target resolution
  • Mode Collapse: Increase the gradient penalty weight (LAMBDA_GP), lower the learning rate, or use more diverse training data
  • Training Instability: Reduce the learning rate, increase LAMBDA_GP, or adjust the number of discriminator updates (N_CRITIC)
  • Memory Issues: Reduce the batch size, use a smaller image resolution, or enable mixed precision training

Performance Optimization Tips

  • GPU Memory: Use gradient accumulation for an effectively larger batch size: accumulate gradients over N batches before updating
  • Mixed Precision: Enable AMP (Automatic Mixed Precision) for up to 2x speedup and roughly 50% memory reduction (a sketch combining both tips follows this list)
  • Data Loading: Use multiple workers (num_workers=4-8) and pin_memory=True for faster data loading
  • Model Pruning: Remove unnecessary layers or reduce hidden dimensions for faster inference
  • Quantization: Use INT8 quantization for 4x speedup in production (with slight quality loss)
  • Batch Inference: Generate multiple images in batches rather than one at a time
  • Model Caching: Load model once and reuse for multiple generations
  • ONNX Runtime: Use exported ONNX models with ONNX Runtime for faster inference
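
To illustrate the gradient-accumulation and mixed-precision tips above, here is a generic PyTorch training-step sketch. The model, optimizer, and data below are placeholders for illustration only, not names from this codebase, and a CUDA GPU is assumed:

import torch
import torch.nn as nn

# Placeholder model/data to keep the sketch self-contained (requires CUDA)
model = nn.Linear(512, 1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
loader = [torch.randn(4, 512) for _ in range(8)]  # stand-in for a DataLoader
scaler = torch.cuda.amp.GradScaler()
accum_steps = 4                                   # effective batch = batch_size * accum_steps

optimizer.zero_grad()
for step, batch in enumerate(loader):
    with torch.cuda.amp.autocast():               # mixed-precision forward pass
        loss = model(batch.cuda()).mean() / accum_steps
    scaler.scale(loss).backward()                 # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                    # optimizer step every accum_steps batches
        scaler.update()
        optimizer.zero_grad()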

Best Practices

  • Training Data: Use a diverse, high-quality image dataset. More data gives better results; aim for at least 1K images
  • Data Format: Use supported formats (JPG, PNG, JPEG) with consistent image sizes
  • Data Preprocessing: Normalize pixel values, resize images to a consistent resolution, and apply augmentation if needed
  • Batch Size: Use small batches (4-8) for high resolutions or limited GPU memory; larger batches speed up training if memory allows
  • Learning Rate: Start with 0.002 (the default here) and adjust based on the loss curves; use scheduling for better convergence
  • Gradient Clipping: Default is 1.0. Increase it if training is unstable, decrease it if gradients are too small
  • Gradient Penalty: LAMBDA_GP = 10.0 is a solid default; increase it if training becomes unstable
  • Resolution: Start with MAX_RESOLUTION = 256 for testing; move to 512 or 1024 only with sufficient GPU memory and data
  • Truncation: Train with TRUNCATION = 1.0 and generate with 0.7 for a good quality/diversity balance
  • Evaluation: Regularly check SSIM and diversity metrics and inspect generated samples for mode collapse
  • Checkpointing: Checkpoints are saved automatically every few epochs; resume from a checkpoint if training is interrupted
  • Early Stopping: Monitor sample quality and losses; stop if there is no improvement for 10-15 epochs
  • Data Augmentation: Use augmentation for small datasets to improve generalization
  • API Rate Limiting: Implement rate limiting for production deployments (e.g., via nginx)
  • Logging: Monitor TensorBoard logs in outputs/logs/ for debugging and optimization
  • Device Selection: Use CUDA when available for faster training. CPU works but is much slower

Use Cases and Applications

  • Image Generation: Generate new photorealistic images from random latent vectors
  • Style Mixing: Combine coarse and fine styles from different latent codes
  • Image Interpolation: Create smooth transitions between images in W space
  • Data Augmentation: Generate synthetic training data for downstream models
  • Image Editing: Manipulate attributes by editing latent vectors in W space
  • Style Transfer: Transfer visual styles by swapping style vectors at selected layers
  • Feature Learning: The learned latent space captures meaningful image attributes
  • Creative Applications: Faces, artwork, and other high-resolution content generation

Performance Optimization

  • GPU Usage: Set CUDA_VISIBLE_DEVICES for multi-GPU systems. Use GPU for training and inference when available
  • Model Selection: Use a smaller model (128x128) for fastest inference. Larger models (512x512+) give better quality but are slower
  • Batch Processing: Generate multiple images in batches for efficient processing. Reduces overhead
  • Caching: API server caches model in memory. Model loads once on first request, then reused
  • Image Size: Smaller resolutions (128x128) generate faster. Larger resolutions (256x256+) give better quality but need more memory
  • Latent Sampling: Sample from standard normal distribution N(0, I) for generation. Use interpolation for smooth transitions
  • Model Quantization: Consider model quantization for production to reduce memory and speed up inference
  • Async Processing: For high-throughput, consider async API or queue system for batch image generation
  • Memory Management: Clear GPU cache between batches if running out of memory: torch.cuda.empty_cache()

Expected Training Times

Approximate training times for different configurations:

Dataset | Image Size | Latent Dim | Batch Size | Epochs | GPU | Time
MNIST | 28×28 | 64 | 128 | 50 | GTX 1080 | ~15 min
MNIST | 28×28 | 128 | 128 | 50 | GTX 1080 | ~20 min
CIFAR-10 | 32×32 | 128 | 128 | 100 | GTX 1080 | ~2 hours
Custom | 64×64 | 256 | 64 | 50 | RTX 3090 | ~3 hours
Custom | 128×128 | 512 | 32 | 50 | RTX 3090 | ~8 hours

Note: Times are approximate and depend on hardware, dataset size, and other factors. CPU training is typically 10-20x slower.

Model Size and Memory Requirements

Approximate model sizes and memory usage:

Latent Dim | Hidden Dims | Model Size | GPU Memory (Training) | GPU Memory (Inference)
64 | [32, 64, 128] | ~5 MB | ~500 MB | ~200 MB
128 | [32, 64, 128, 256] | ~15 MB | ~1.5 GB | ~500 MB
256 | [64, 128, 256, 512] | ~50 MB | ~4 GB | ~1.5 GB
512 | [128, 256, 512, 1024] | ~200 MB | ~12 GB | ~4 GB

Note: Memory usage depends on batch size and image size. Larger batches and images require more memory.

Real-World Examples & Use Cases

Example 1: Training on Custom Face Dataset

Complete workflow for training on a custom face dataset:

# 1. Prepare the dataset
mkdir -p data/faces
# Copy face images to data/faces/

# 2. Train the StyleGAN model
python train.py \
  --dataset data/faces \
  --output_dir ./outputs \
  --epochs 100 \
  --batch-size 4 \
  --max-resolution 256 \
  --tensorboard

# 3. Generate new faces
python generate.py \
  --checkpoint outputs/checkpoints/best_generator.pth \
  --num_images 16 \
  --output_dir ./generated_faces

# 4. Create face style mixing
python generate.py \
  --checkpoint outputs/checkpoints/best_generator.pth \
  --interpolate \
  --num_steps 20 \
  --output_dir ./face_interpolation

Example 2: Style Transfer

Use StyleGAN for style transfer and manipulation:

# Train on a style dataset
python train.py --dataset style_images --epochs 100

# Mix styles from different images (Python)
import torch
from stylegan.style_mix import style_mix
from stylegan.utils import save_image_grid

z1 = torch.randn(1, 512)  # Style 1
z2 = torch.randn(1, 512)  # Style 2

# Mix styles at specific layers
mixed = style_mix(generator, z1, z2, mix_layers=[4, 5, 6])

# Save the result
save_image_grid(mixed, "style_mixed.png", nrow=1)

Example 3: High-Resolution Generation

Generate high-resolution images with StyleGAN:

# Train on a high-resolution dataset
python train.py --dataset high_res_images --max-resolution 1024 --epochs 200

# Generate high-resolution images
python generate.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --num_images 16 \
  --output_dir ./high_res_generated

Example 4: Data Augmentation

Generate synthetic training data:

# Train StyleGAN on a small dataset
python train.py --dataset small_dataset --epochs 100

# Generate synthetic images to augment the dataset
python generate.py \
  --checkpoint outputs/checkpoints/generator.pth \
  --num_images 1000 \
  --output_dir data/augmented/

# Use the generated images as additional training data
# Combine with the original dataset for better model performance

Example 5: Style Space Manipulation

Manipulate images by editing style vectors in W space:

# Generate an image from a latent code
from stylegan import StyleGANGenerator
from stylegan.utils import save_image_grid
import torch

generator = StyleGANGenerator(latent_dim=512, style_dim=512, max_resolution=256)
generator.load_state_dict(torch.load("outputs/checkpoints/generator.pth"))
generator.eval()

# Generate a base image
z = torch.randn(1, 512)
w = generator.mapping(z)
image = generator.synthesis(w)

# Manipulate the style vector (e.g., change specific attributes)
w_modified = w.clone()
w_modified[:, 0:10] += 0.5  # Modify the first 10 dimensions

# Generate the modified image
modified_image = generator.synthesis(w_modified)

# Save the result
save_image_grid(modified_image, "modified_image.png", nrow=1)

Integration Examples

Integration with Flask Web Application

Integrate the StyleGAN generator into a Flask web application:

from flask import Flask, request, jsonify
from generate import load_model, generate_images
import io
import base64

app = Flask(__name__)
model = load_model("outputs/checkpoints/generator.pth", device="cuda")

@app.route('/generate', methods=['POST'])
def generate():
    num_images = request.json.get('num_images', 10)
    images = generate_images(model, num_images=num_images, device="cuda")

    # Convert to base64 for the JSON response
    image_data = []
    for img in images:
        buffered = io.BytesIO()
        img.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
        image_data.append(img_str)

    return jsonify({'images': image_data})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Integration with FastAPI

Create a FastAPI service for StyleGAN image generation:

from fastapi import FastAPI
from generate import load_model, generate_images

app = FastAPI()
model = load_model("outputs/checkpoints/generator.pth", device="cuda")

@app.post("/generate")
async def generate_images_endpoint(num_images: int = 10):
    """Generate images from random latent vectors."""
    images = generate_images(model, num_images=num_images, device="cuda")
    # Save and return the images
    return {"status": "success", "num_images": num_images}

@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "healthy", "model_loaded": True}

Integration with Streamlit

Create an interactive Streamlit application:

import streamlit as st
from generate import load_model, generate_images

st.title("StyleGAN Image Generation")

# Load the model once and cache it
@st.cache_resource
def load_stylegan_model():
    return load_model("outputs/checkpoints/generator.pth", device="cuda")

model = load_stylegan_model()

# Sidebar controls
num_images = st.sidebar.slider("Number of Images", 1, 64, 16)
truncation = st.sidebar.slider("Truncation", 0.1, 1.0, 0.7)

# Generate button
if st.button("Generate Images"):
    with st.spinner("Generating images..."):
        images = generate_images(model, num_images=num_images, device="cuda")
    st.image(images, width=200)

# Interpolation
st.header("Latent Space Interpolation")
steps = st.slider("Interpolation Steps", 5, 30, 10)
if st.button("Create Interpolation"):
    # Create the interpolation sequence here, then display it
    # st.image(interpolated_images, width=200)
    pass

Contact Information

Get in Touch

Developer: Molla Samser
Designer & Tester: Rima Khatun

rskworld.in
help@rskworld.in support@rskworld.in
+91 93305 39277

License

This project is for educational purposes only. See LICENSE file for more details.
