VAE Image Generation

Complete Documentation & Project Details for Variational Autoencoder & Image Generation

Project Description

This project implements a Variational Autoencoder (VAE) for generating realistic images using a probabilistic latent space and an encoder-decoder architecture. Convolutional encoder and decoder networks are trained with the reparameterization trick, KL divergence regularization, and a reconstruction loss to learn meaningful latent representations, making the project well suited for learning VAE fundamentals and image generation. The system includes KL divergence and reconstruction loss evaluation, TensorBoard integration, latent space interpolation, data augmentation, and comprehensive training tools.

The VAE uses a probabilistic latent space: an encoder network maps images to latent distributions (μ, σ), and a decoder network reconstructs images from sampled latent vectors. The encoder uses standard strided convolutions to downsample images to the latent parameters, while the decoder uses transposed convolutions to upsample latent vectors back to images. The implementation provides full PyTorch support, a comprehensive training pipeline, a web interface, evaluation metrics, and deployment tools for image generation applications.
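
The sketch below illustrates this encode → reparameterize → decode flow as a minimal, fully connected PyTorch module. It is for orientation only; the class name and layer sizes are illustrative, and the project's actual convolutional implementation lives in vae_model.py.

# Minimal VAE sketch (assumes a flattened input; the real model in
# vae_model.py uses the convolutional layers described in this document)
import torch
import torch.nn as nn

class MiniVAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(input_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)       # mean of q(z|x)
        self.fc_logvar = nn.Linear(256, latent_dim)   # log variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps                         # z = mu + sigma * eps

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar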

Project Screenshots

[Screenshot gallery: 4 images — VAE Image Generation]

Core Features

VAE Architecture

  • Convolutional encoder
  • Convolutional decoder
  • Reparameterization trick
  • Probabilistic latent space
  • Image reconstruction

Probabilistic Latent Space

  • Latent distributions (μ, σ)
  • Reparameterization trick
  • KL divergence regularization
  • Meaningful representations
  • Standard normal prior

Encoder-Decoder Layers

  • Convolutional encoder
  • Transposed convolutions (decoder)
  • Latent space mapping
  • Image reconstruction
  • Deep network architecture

KL Divergence & Reconstruction Loss

  • Reconstruction loss (MSE)
  • KL divergence regularization
  • Beta weighting parameter
  • Model performance evaluation
  • Comprehensive evaluation

TensorBoard Integration

  • Real-time loss visualization
  • Generated image tracking
  • Training progress monitoring
  • Interactive dashboard
  • Comprehensive logging

Web Interface

  • Flask-based web app
  • Interactive image generation
  • Real-time generation
  • Download generated images
  • User-friendly interface

Advanced Features

Latent Space Interpolation

  • Linear interpolation
  • Spherical interpolation (SLERP)
  • Latent walk generation
  • Smooth image transitions
  • Visual exploration

Data Augmentation

  • Adaptive augmentation
  • Mixup augmentation
  • Cutout augmentation
  • Multiple augmentation levels

Multiple Dataset Support

  • Custom dataset support
  • CelebA dataset
  • CIFAR-10 dataset
  • MNIST dataset

Resume Training

  • Checkpoint resuming
  • Automatic checkpoint detection
  • Training continuation
  • Progress preservation
  • Early stopping support
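
As a rough illustration of how the checkpoint resuming listed above typically works in PyTorch, the snippet below saves and restores model and optimizer state. The checkpoint key names ('epoch', 'model_state', 'optimizer_state') are assumptions for this sketch, not the exact format used by train_advanced.py.

import torch
import torch.nn as nn

model = nn.Linear(10, 10)                       # stand-in for the project's VAE
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Save a checkpoint during training
torch.save({"epoch": 25,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict()},
           "checkpoint.pth")

# Later: resume training from that checkpoint
ckpt = torch.load("checkpoint.pth", map_location="cpu")
model.load_state_dict(ckpt["model_state"])
optimizer.load_state_dict(ckpt["optimizer_state"])
start_epoch = ckpt["epoch"] + 1                 # continue from the next epoch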

Web Interface Features

Feature              | Description                        | Usage
Image Generation     | Generate images from random noise  | Select number of images and click Generate
Real-time Generation | Generate images in real time       | Images appear as they are generated
Download Images      | Download generated images          | Click the download button for each image
Model Selection      | Choose different trained models    | Select from available checkpoints

Technologies Used

This VAE Image Generation project is built with modern deep learning and computer vision technologies. The core implementation uses Python as the primary programming language and PyTorch for deep learning operations. The model is a Variational Autoencoder with convolutional encoder and decoder networks for realistic image generation. The project also includes a Flask-based web interface for interactive image generation, Jupyter Notebook support for interactive development and demonstrations, and comprehensive KL divergence and reconstruction loss evaluation for assessing image quality.

The VAE model uses a probabilistic latent space in which an encoder maps images to latent distributions (μ, σ) and a decoder reconstructs images from sampled latent vectors. The system supports the reparameterization trick to enable backpropagation through random sampling, KL divergence regularization to keep the latent space close to a standard normal distribution, and latent space interpolation for exploring the learned representation space, making it suitable for a range of image generation applications.

Python 3.8+ PyTorch 2.0+ VAE Autoencoder Image Generation TensorBoard Computer Vision Jupyter Notebook Flask 2.3+ KL Divergence

Installation & Usage

Installation

Install all required dependencies for the VAE Image Generation project:

# Install all requirements
pip install -r requirements.txt

# The VAE model will be trained on your data
# Prepare your dataset in the data/custom/ directory
# Or use built-in datasets (MNIST, CIFAR-10, Custom)

PyTorch Installation

Install PyTorch (CPU or GPU version):

# For CPU only
pip install torch torchvision torchaudio

# For CUDA (GPU support) - CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Verify installation
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"

Verify Installation

Test the model and verify all components work:

# Test model architecture
python test_model.py

# This will verify:
# - Model can be instantiated
# - Forward pass works
# - All components function correctly
# - Device compatibility (CPU/CUDA)

Training the Model

Train the VAE model on your image dataset:

# Prepare your dataset
# Place images in the data/custom/ directory
# Or use built-in datasets: 'mnist', 'cifar10', 'custom'

# Basic training with default parameters
python train.py --dataset mnist --epochs 50 --batch-size 128 --latent-dim 128

# Configure in config.py:
# - DATASET = 'mnist'       # or 'cifar10', 'custom'
# - BATCH_SIZE = 128
# - EPOCHS = 50
# - LATENT_DIM = 128
# - LEARNING_RATE = 0.001
# - BETA = 1.0              # KL divergence weight

# Or use the Jupyter notebook
jupyter notebook VAE_Image_Generation.ipynb

# Training will:
# - Load and preprocess images
# - Initialize encoder and decoder
# - Train with reconstruction + KL divergence loss
# - Save checkpoints and generated samples
# - Log to TensorBoard

Training Parameters (config.py):

  • DATASET: Dataset to use - 'mnist', 'cifar10', 'custom'
  • BATCH_SIZE: Training batch size (default: 128)
  • EPOCHS: Number of training epochs (default: 50)
  • LATENT_DIM: Dimension of latent space (default: 128)
  • LEARNING_RATE: Learning rate (default: 0.001)
  • BETA: Weight for KL divergence term (default: 1.0)
  • INPUT_CHANNELS: Number of input channels - 3 for RGB, 1 for grayscale (default: 3)
  • HIDDEN_DIMS: Encoder/decoder hidden dimensions (default: [32, 64, 128, 256])
  • DEVICE: Device to use - 'cuda' or 'cpu' (default: 'cuda')

Image Generation

Generate images using the trained VAE model:

# Generate images from the trained model
python generate.py --model outputs/vae_model.pth --num-images 64 --output generated.png

# Generate with interpolation
python generate.py --model outputs/vae_model.pth --interpolate --interpolation-steps 10 --output interpolation.png

# Or use the Jupyter notebook
jupyter notebook VAE_Image_Generation.ipynb

# Using the Python API
from generate import load_model, generate_images

images = generate_images(
    model_path="outputs/vae_model.pth",
    num_images=64,
    device="cuda"
)
print(f"Generated {len(images)} images")

REST API Server

Start the Flask API server for web integration:

# Start API server (default port 5000)
python api.py --model outputs/vae_model.pth --port 5000

# Start on a custom port
python api.py --model outputs/vae_model.pth --port 8080

# Start on a custom host and port
python api.py --model outputs/vae_model.pth --host 0.0.0.0 --port 5000

# API will be available at http://localhost:5000
# Example API calls:
# POST /generate    - {"num_images": 10}
# POST /interpolate - {"num_steps": 10}
# GET  /health      - Check API health

API Server Parameters:

  • --model: Path to trained model checkpoint (required)
  • --port: Port to run server on (default: 5000)
  • --host: Host to bind to (default: 0.0.0.0)

Model Evaluation

Evaluate the trained VAE model performance:

# Evaluate the trained model
python evaluate.py --model outputs/vae_model.pth --dataset mnist

# The evaluation includes:
# - Reconstruction error calculation
# - KL divergence statistics
# - Latent space statistics
# - Model performance metrics

Latent Space Visualization

Visualize the learned latent space and generate interpolations:

# Latent space visualization
python visualize.py --model outputs/vae_model.pth --mode manifold --output latent_manifold.png

# The visualization includes:
# - Latent space manifold visualization
# - Interpolation between latent points
# - Reconstruction examples
# - Generated image samples

Jupyter Notebook

Open the interactive Jupyter notebook for demonstrations:

# VAE Image Generation demonstration notebook
jupyter notebook VAE_Image_Generation.ipynb

# The notebook includes:
# - Model architecture visualization
# - Encoder-decoder explanation
# - Training setup examples
# - Image generation examples
# - Latent space visualization

# Or use JupyterLab
jupyter lab VAE_Image_Generation.ipynb

Project Structure

vae-image-generation/
├── README.md # Main documentation
├── requirements.txt # Python dependencies
├── LICENSE # License file
├── CHANGELOG.md # Changelog
├── CONTRIBUTING.md # Contribution guidelines
├── PROJECT_SUMMARY.md # Project summary
│
├── Core Modules
│ ├── vae_model.py # VAE architecture
│ ├── data_loader.py # Data loading and augmentation
│ ├── train.py # Basic training script
│ ├── train_advanced.py # Advanced training script
│ ├── generate.py # Image generation
│ ├── evaluate.py # Model evaluation
│ ├── visualize.py # Visualization tools
│ ├── utils.py # Utility functions
│ └── config.py # Configuration settings
│
├── API & Services
│ ├── api.py # Flask REST API
│ └── index.html # Web interface
│
├── Data
│ └── (training datasets: MNIST, CIFAR-10, Custom)
│
├── Models
│ └── (trained model checkpoints)
│
├── Notebooks
│ └── VAE_Image_Generation.ipynb # Jupyter notebook demo
│
├── Scripts
│ ├── export_model.py # Model export utilities
│ └── compare_models.py # Model comparison

Configuration Options

Model Configuration

Customize model and training parameters in config.py and train.py:

# Model Architecture (config.py)
INPUT_CHANNELS = 3                 # Input channels (1 for grayscale, 3 for RGB)
LATENT_DIM = 128                   # Dimension of latent space
HIDDEN_DIMS = [32, 64, 128, 256]   # Encoder/decoder hidden dimensions

# Training Parameters (config.py)
BATCH_SIZE = 128                   # Training batch size
LEARNING_RATE = 0.001              # Learning rate
EPOCHS = 50                        # Number of training epochs
BETA = 1.0                         # KL divergence weight
DATASET = 'mnist'                  # Dataset: 'mnist', 'cifar10', 'custom'
GRAD_CLIP = 1.0                    # Gradient clipping value
DEVICE = 'cuda'                    # Device: 'cuda' or 'cpu'

# Generation Configuration
NUM_IMAGES = 64                    # Number of images to generate
INTERPOLATION_STEPS = 10           # Steps for latent interpolation

Configuration Tips:

  • LATENT_DIM: Higher = more expressive but slower. Common values: 64, 128, 256
  • HIDDEN_DIMS: Encoder/decoder channel dimensions. More channels = better quality but slower
  • BETA: KL divergence weight. Higher = more regularization. Start with 1.0
  • BATCH_SIZE: Larger = faster but needs more memory. Adjust based on GPU
  • LEARNING_RATE: Start with 0.001. Use learning rate scheduling for better convergence
  • INPUT_CHANNELS: 1 for grayscale (MNIST), 3 for RGB (CIFAR-10, custom)
  • DATASET: Choose 'mnist', 'cifar10', or 'custom' based on your data

Training Progress Logging

The training script automatically logs progress to TensorBoard and saves checkpoints:

# Training logs are saved to:
# outputs/logs/    - TensorBoard logs
# outputs/models/  - Model checkpoints
# outputs/samples/ - Generated sample images

# View TensorBoard logs
tensorboard --logdir outputs/logs

# TensorBoard shows:
# - Reconstruction loss per epoch
# - KL divergence loss per epoch
# - Total loss per epoch
# - Generated image samples
# - Learning rate schedule
# - Model parameters

# Checkpoints are saved as:
# outputs/models/vae_model_epoch_XX.pth
# outputs/models/best_vae_model.pth

Advanced Training Options

Use the advanced training script with additional features:

# Advanced training with all features
python train_advanced.py \
    --dataset mnist \
    --epochs 50 \
    --batch-size 128 \
    --latent-dim 128 \
    --beta 1.0 \
    --tensorboard \
    --early-stopping 10 \
    --lr-scheduler \
    --augment \
    --grad-clip 1.0 \
    --amp

# Features enabled:
# --tensorboard:      Enable TensorBoard logging
# --early-stopping N: Stop if no improvement for N epochs
# --lr-scheduler:     Use learning rate scheduling
# --augment:          Enable data augmentation
# --grad-clip:        Gradient clipping value
# --amp:              Mixed precision training (faster, less memory)

# Resume training from a checkpoint
python train_advanced.py \
    --resume outputs/models/vae_model_epoch_25.pth \
    --epochs 50

Training on Different Datasets

Examples for training on different datasets:

# Training on MNIST (grayscale, 28x28)
python train.py --dataset mnist --epochs 50 --latent-dim 64

# Training on CIFAR-10 (RGB, 32x32)
python train.py --dataset cifar10 --epochs 100 --latent-dim 128 --beta 0.5

# Training on a custom dataset (RGB, any size)
python train.py \
    --dataset custom \
    --data-dir data/custom \
    --epochs 50 \
    --latent-dim 256 \
    --image-size 64

# For high-resolution images (128x128 or larger)
python train.py \
    --dataset custom \
    --data-dir data/custom \
    --latent-dim 512 \
    --hidden-dims "[64, 128, 256, 512, 1024]" \
    --image-size 128 \
    --batch-size 32

Detailed Architecture

VAE Components

1. Encoder:

  • Convolutional layers that downsample input images
  • Maps images to latent space parameters (μ, σ)
  • Outputs mean (μ) and log variance (log σ²) for each latent dimension
  • Uses standard convolutions with stride for downsampling
  • Creates rich feature representations before latent space

2. Reparameterization Trick:

  • Samples latent vectors z from N(μ, σ²) distribution
  • Enables backpropagation through random sampling
  • Formula: z = μ + σ ⊙ ε, where ε ~ N(0, I)
  • Allows gradient flow from decoder to encoder
  • Key innovation that makes VAE trainable

3. Decoder:

  • Transposed convolutional layers that upsample latent vectors
  • Reconstructs images from sampled latent vectors z
  • Uses transposed convolutions with stride for upsampling
  • Outputs reconstructed images matching input dimensions
  • Learns to generate realistic images from latent space

Loss Function

The VAE loss combines reconstruction and regularization:

Total Loss = Reconstruction Loss + β × KL Divergence

Reconstruction Loss (MSE):
    L_recon = ||x - x_recon||²

KL Divergence:
    L_KL = -0.5 × Σ(1 + log(σ²) - μ² - σ²)

Where:
- x: Original input image
- x_recon: Reconstructed image
- μ: Latent mean vector
- σ: Latent standard deviation vector
- β: Weight for the KL divergence term (default: 1.0)

The KL divergence ensures the latent space follows a standard normal distribution N(0, I).
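
Expressed in PyTorch, this combined loss might look like the following sketch; the function name and reduction choice are illustrative, and train.py contains the project's actual implementation.

import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar, beta=1.0):
    # Reconstruction term: mean squared error summed over all pixels
    recon_loss = F.mse_loss(x_recon, x, reduction="sum")
    # KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I)
    kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl_loss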

Reparameterization Trick

Enables backpropagation through random sampling:

z = μ + σ ⊙ ε

Where:
- μ: Mean vector from the encoder
- σ: Standard deviation vector from the encoder
- ε: Random noise sampled from N(0, I)
- ⊙: Element-wise multiplication

This allows:
- Sampling from the latent distribution
- Gradient flow through random sampling
- Training encoder and decoder jointly
- Learning meaningful latent representations

Layer-by-Layer Architecture Details

Encoder Architecture:

  • Input Layer: Receives images of size [batch, channels, height, width]
  • Convolutional Blocks: Each block contains Conv2d → BatchNorm → LeakyReLU
  • Downsampling: Uses stride=2 convolutions to reduce spatial dimensions
  • Hidden Dimensions: Progressively increases channels [32→64→128→256]
  • Flattening: Converts feature maps to 1D vectors
  • Latent Projection: Two linear layers output μ (mean) and log σ² (log variance)

Decoder Architecture:

  • Latent Input: Receives sampled latent vector z of size [batch, latent_dim]
  • Initial Projection: Linear layer expands z to feature map size
  • Reshaping: Converts 1D vector to 2D feature maps
  • Transposed Convolutional Blocks: Each block contains ConvTranspose2d → BatchNorm → ReLU
  • Upsampling: Uses stride=2 transposed convolutions to increase spatial dimensions
  • Hidden Dimensions: Progressively decreases channels [256→128→64→32]
  • Output Layer: Final transposed convolution outputs reconstructed image
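
A condensed sketch of how such stride-2 convolutional stacks are commonly built is shown below. The channel widths mirror the default HIDDEN_DIMS above, but the construction is an assumption for illustration; vae_model.py is the authoritative implementation (and also adds the latent projection layers omitted here).

import torch.nn as nn

hidden_dims = [32, 64, 128, 256]
in_ch = 3

# Encoder: Conv2d -> BatchNorm -> LeakyReLU; stride 2 halves H and W in each block
encoder_blocks = []
for h in hidden_dims:
    encoder_blocks += [nn.Conv2d(in_ch, h, kernel_size=3, stride=2, padding=1),
                       nn.BatchNorm2d(h),
                       nn.LeakyReLU()]
    in_ch = h
encoder = nn.Sequential(*encoder_blocks)

# Decoder: ConvTranspose2d -> BatchNorm -> ReLU; stride 2 doubles H and W in each block
decoder_blocks = []
rev = list(reversed(hidden_dims))            # [256, 128, 64, 32]
for c_in, c_out in zip(rev[:-1], rev[1:]):
    decoder_blocks += [nn.ConvTranspose2d(c_in, c_out, kernel_size=3, stride=2,
                                          padding=1, output_padding=1),
                       nn.BatchNorm2d(c_out),
                       nn.ReLU()]
decoder_blocks += [nn.ConvTranspose2d(rev[-1], 3, kernel_size=3, stride=2,
                                      padding=1, output_padding=1),
                   nn.Sigmoid()]             # final layer outputs the reconstructed image
decoder = nn.Sequential(*decoder_blocks)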

Mathematical Formulation

Complete mathematical description of the VAE:

# Encoder: q_φ(z|x)
μ, log σ² = Encoder(x)
q_φ(z|x) = N(z; μ, σ²I)

# Reparameterization: z ~ q_φ(z|x)
ε ~ N(0, I)
z = μ + σ ⊙ ε

# Decoder: p_θ(x|z)
x_recon = Decoder(z)
p_θ(x|z) = N(x; x_recon, I)   # For MSE loss

# Prior: p(z)
p(z) = N(z; 0, I)             # Standard normal

# Loss Function: ELBO (Evidence Lower BOund)
L = E_q[log p_θ(x|z)] - β × KL(q_φ(z|x) || p(z))

# Where:
# - First term: Reconstruction loss (MSE)
# - Second term: KL divergence (regularization)
# - β: Weight parameter (β-VAE)

Beta-VAE and KL Weighting

Understanding the beta parameter in VAE training:

  • β = 0: No KL regularization, pure reconstruction (may collapse to standard autoencoder)
  • β = 1: Standard VAE, balanced reconstruction and regularization
  • β > 1: Stronger regularization, better latent space structure, may sacrifice reconstruction quality
  • β < 1: Weaker regularization, better reconstruction, may have less structured latent space
  • β-VAE: Using β > 1 encourages disentangled representations
  • Recommendation: Start with β = 1.0, adjust based on reconstruction quality vs. latent structure

Advanced Features Usage

Image Generation Usage

Generate images using the trained VAE model:

# Generate images from random latent vectors
from generate import load_model, generate_images

model = load_model("outputs/vae_model.pth", device="cuda")
images = generate_images(model, num_images=64, device="cuda")

# Save generated images
from utils import save_images
save_images(images, "generated_samples.png")

# Generate with a specific latent vector
import torch
z = torch.randn(1, 128).to("cuda")  # 128 is latent_dim
generated = model.decode(z)

Latent Space Interpolation

Generate smooth transitions between images:

from generate import interpolate_latent_space

# Linear interpolation between two latent vectors
z1 = torch.randn(1, 128).to("cuda")
z2 = torch.randn(1, 128).to("cuda")
interpolated = interpolate_latent_space(model, z1, z2, steps=10)

# Spherical interpolation (SLERP)
interpolated = interpolate_latent_space(
    model, z1, z2, steps=10, method="slerp"
)

# Save interpolation results
save_images(interpolated, "interpolation.png")
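
For reference, spherical interpolation is typically computed as in the sketch below, which reuses model, z1, and z2 from the snippet above. The slerp helper is an illustrative assumption; the project's interpolate_latent_space with method="slerp" is presumed to do something equivalent.

import torch

def slerp(z1, z2, t):
    # Interpolate along the arc between z1 and z2 (t in [0, 1])
    z1_n = z1 / z1.norm(dim=-1, keepdim=True)
    z2_n = z2 / z2.norm(dim=-1, keepdim=True)
    omega = torch.acos((z1_n * z2_n).sum(-1, keepdim=True).clamp(-1 + 1e-7, 1 - 1e-7))
    so = torch.sin(omega)
    return (torch.sin((1.0 - t) * omega) / so) * z1 + (torch.sin(t * omega) / so) * z2

# Decode each interpolated latent vector into an image
steps = 10
frames = [model.decode(slerp(z1, z2, s / (steps - 1))) for s in range(steps)]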

Model Evaluation

Evaluate model performance with reconstruction and KL divergence:

from evaluate import evaluate_model

# Evaluate on the test dataset
results = evaluate_model(
    model_path="outputs/vae_model.pth",
    dataset="mnist",
    device="cuda"
)

# Results include:
# - Average reconstruction loss
# - Average KL divergence
# - Total loss
# - Latent space statistics
print(f"Reconstruction Loss: {results['recon_loss']:.4f}")
print(f"KL Divergence: {results['kl_loss']:.4f}")
print(f"Total Loss: {results['total_loss']:.4f}")

Dataset Preparation

Prepare your custom dataset for training:

# Prepare a custom dataset
# Place images in the data/custom/ directory
# Supported formats: .jpg, .png, .jpeg

# Directory structure:
# data/custom/
# ├── image1.jpg
# ├── image2.png
# └── ...

# The training script will automatically:
# - Load images from the directory
# - Apply data augmentation
# - Resize to specified dimensions
# - Normalize pixel values
# - Create data loaders for training

# Use the custom dataset for training
python train.py --dataset custom --data-dir data/custom --epochs 50

Latent Space Exploration

Explore the learned latent space:

# Latent space visualization is available in the Jupyter notebook
jupyter notebook VAE_Image_Generation.ipynb

# The notebook includes:
# - Latent space manifold visualization
# - Interpolation examples
# - Reconstruction examples
# - Generated image samples

# Visualize the latent space with t-SNE or PCA
from visualize import visualize_latent_space
visualize_latent_space(model, test_loader, method="tsne")

Advanced Visualization Techniques

Use the visualization script for comprehensive analysis:

# Latent manifold visualization (2D grid of generated images)
python visualize.py --model outputs/vae_model.pth --mode manifold --output latent_manifold.png

# Reconstruction comparison (original vs reconstructed)
python visualize.py --model outputs/vae_model.pth --mode reconstruct --num-samples 16 --output reconstruction.png

# Latent space traversal (vary one dimension at a time)
python visualize.py --model outputs/vae_model.pth --mode traverse --dim 0 --steps 10 --output traversal.png

# t-SNE visualization of the latent space
python visualize.py --model outputs/vae_model.pth --mode tsne --output latent_tsne.png

# PCA visualization of the latent space
python visualize.py --model outputs/vae_model.pth --mode pca --output latent_pca.png

# All visualizations at once
python visualize.py --model outputs/vae_model.pth --mode all --output-dir visualizations/

Model Export and Deployment

Export trained models for deployment:

# Export to ONNX format (for production deployment)
python scripts/export_model.py \
    --model outputs/vae_model.pth \
    --format onnx \
    --output vae_model.onnx \
    --input-size 1 3 64 64

# Export to TorchScript (PyTorch mobile/edge)
python scripts/export_model.py \
    --model outputs/vae_model.pth \
    --format torchscript \
    --output vae_model.pt

# Export encoder and decoder separately
python scripts/export_model.py \
    --model outputs/vae_model.pth \
    --format onnx \
    --components encoder decoder \
    --output-dir exported_models/

# Verify the exported model
python scripts/export_model.py \
    --model outputs/vae_model.pth \
    --format onnx \
    --output vae_model.onnx \
    --verify

Model Comparison and Analysis

Compare different trained models:

# Compare multiple models
python scripts/compare_models.py \
    --models outputs/models/vae_model_beta1.pth \
             outputs/models/vae_model_beta2.pth \
             outputs/models/vae_model_beta5.pth \
    --dataset mnist \
    --output comparison_report.html

# Compare models with different latent dimensions
python scripts/compare_models.py \
    --models outputs/models/vae_latent64.pth \
             outputs/models/vae_latent128.pth \
             outputs/models/vae_latent256.pth \
    --metrics reconstruction kl_divergence total_loss \
    --output comparison.png

# Generate side-by-side comparisons
python scripts/compare_models.py \
    --models outputs/models/*.pth \
    --mode visual \
    --num-samples 16 \
    --output-dir model_comparisons/

Complete Training Workflow

Step-by-Step Training Process

Step 1: Prepare Data

# Use built-in datasets (MNIST, CIFAR-10)
# Or prepare a custom dataset in the data/custom/ directory

# For a custom dataset:
# Place images in data/custom/
# Supported formats: .jpg, .png, .jpeg

# The training script will automatically:
# - Load images from the directory
# - Apply data augmentation
# - Resize and normalize images
# - Create data loaders

Step 2: Train Model

# Start training
python train.py --dataset mnist --epochs 50 --batch-size 128 --latent-dim 128

# Training will:
# 1. Load and preprocess images
# 2. Initialize encoder and decoder
# 3. Train with reconstruction + KL divergence loss
# 4. Save checkpoints and the best model
# 5. Log training history to TensorBoard
# 6. Generate sample images during training

Step 3: Monitor Training

  • Watch console output for epoch progress
  • Check TensorBoard: tensorboard --logdir outputs/logs
  • View generated samples in outputs/samples/
  • Best model saved as outputs/models/vae_model.pth

Step 4: Evaluate Model

# Evaluate on the test set
python evaluate.py --model outputs/vae_model.pth --dataset mnist

# Calculates reconstruction loss and KL divergence

Step 5: Generate Images

# Generate images
python generate.py --model outputs/vae_model.pth --num-images 64 --output generated.png

# Generate an interpolation
python generate.py --model outputs/vae_model.pth --interpolate --interpolation-steps 10

API Usage Examples

Image Generation Endpoint (cURL)

Generate images using the REST API:

curl -X POST http://localhost:5000/generate \
  -H "Content-Type: application/json" \
  -d '{"num_images": 10}'

# Response:
# {
#   "num_images": 10,
#   "images": ["base64_encoded_image1", ...],
#   "status": "success"
# }

Latent Interpolation Endpoint (cURL)

Generate interpolation between latent points:

curl -X POST http://localhost:5000/interpolate \
  -H "Content-Type: application/json" \
  -d '{"num_steps": 10}'

# Response:
# {
#   "num_steps": 10,
#   "images": ["base64_encoded_image1", ...],
#   "status": "success"
# }

Health Check (cURL)

Check API server health and model status:

curl -X GET http://localhost:5000/health

# Response:
# {
#   "status": "healthy",
#   "model_loaded": true,
#   "device": "cuda"
# }

Python Requests Example

Use the API with Python requests library:

import requests
import base64
from PIL import Image
from io import BytesIO

# Image generation endpoint
response = requests.post(
    'http://localhost:5000/generate',
    json={'num_images': 10}
)
data = response.json()

# Decode and save images
for i, img_base64 in enumerate(data['images']):
    img_data = base64.b64decode(img_base64)
    img = Image.open(BytesIO(img_data))
    img.save(f'generated_{i}.png')

# Interpolation endpoint
interp_response = requests.post(
    'http://localhost:5000/interpolate',
    json={'num_steps': 10}
)
print(interp_response.json())

# Health check
health = requests.get('http://localhost:5000/health')
print(health.json())

JavaScript/Fetch Example

Use the API with JavaScript fetch API:

// Image generation
fetch('http://localhost:5000/generate', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({ num_images: 10 })
})
  .then(res => res.json())
  .then(data => {
    console.log('Generated', data.num_images, 'images');
    // Display images from base64 data
    data.images.forEach((imgBase64, i) => {
      const img = document.createElement('img');
      img.src = 'data:image/png;base64,' + imgBase64;
      document.body.appendChild(img);
    });
  });

// Interpolation
fetch('http://localhost:5000/interpolate', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({ num_steps: 10 })
})
  .then(res => res.json())
  .then(data => {
    console.log('Interpolation steps:', data.num_steps);
  });

// Health check
fetch('http://localhost:5000/health')
  .then(res => res.json())
  .then(data => console.log('Status:', data));

VAE Model Variants

Model      | Latent Dim | Hidden Dims            | Use Case                    | Quality
Small VAE  | 64         | [16, 32, 64]           | Fast inference, basic tasks | Good
Medium VAE | 128        | [32, 64, 128, 256]     | Balanced quality/speed      | Better
Large VAE  | 256        | [64, 128, 256, 512]    | Higher-quality generation   | Best
XL VAE     | 512        | [128, 256, 512, 1024]  | Research, high quality      | Excellent

Dataset Information

Dataset Formats

The project supports multiple dataset formats for image training:

  • Built-in datasets: MNIST, CIFAR-10 (automatically downloaded)
  • Custom dataset: Directory of images (JPG, PNG, JPEG)
  • Automatic image loading and preprocessing
  • Data augmentation support
  • Train/validation split support
  • Multiple image format support

Custom Dataset Format

Training data is stored as image files in a directory:

# Custom dataset directory structure
data/custom/
├── image1.jpg
├── image2.png
├── image3.jpeg
└── ...

# The training script automatically:
# - Loads images from the directory
# - Applies data augmentation
# - Resizes to specified dimensions
# - Normalizes pixel values
# - Creates data loaders for training
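
Under the hood, loading such a flat image directory usually amounts to something like the sketch below. The project's data_loader.py handles this automatically, so the class name and transform values here are assumptions for illustration only.

import torch
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
from PIL import Image
from pathlib import Path

class FlatImageFolder(Dataset):
    """Loads every .jpg/.png/.jpeg directly inside a directory (no class subfolders)."""
    def __init__(self, root, image_size=64):
        self.paths = [p for p in Path(root).iterdir()
                      if p.suffix.lower() in {".jpg", ".jpeg", ".png"}]
        self.transform = transforms.Compose([
            transforms.Resize((image_size, image_size)),
            transforms.ToTensor(),                      # scales pixel values to [0, 1]
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB")
        return self.transform(img)

loader = DataLoader(FlatImageFolder("data/custom"), batch_size=128, shuffle=True)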

Adding Custom Training Data

Add your own image dataset for training:

# Place images in the data/custom/ directory
# Supported formats: .jpg, .png, .jpeg

# Example:
mkdir -p data/custom
cp your_images/*.jpg data/custom/

# Use in training
python train.py --dataset custom --data-dir data/custom --epochs 50

# The script will automatically:
# - Load all images from the directory
# - Apply augmentation if enabled
# - Resize and normalize images
# - Create train/validation splits

Troubleshooting & Best Practices

Common Issues

  • CUDA Out of Memory: Reduce batch_size in train.py, use a smaller latent_dim or narrower hidden_dims, lower the image size, or use CPU mode
  • Model Not Found: Ensure the model is trained first by running train.py, or load an existing checkpoint from the models/ directory. Check that the model path is correct
  • Slow Generation: Use a smaller latent_dim (64 instead of 128), reduce hidden_dims, or run on a GPU if available
  • API Connection Error: Check if api.py is running on port 5000. Verify model path is correct
  • Import Errors: Verify all dependencies installed: pip install -r requirements.txt. Check Python version (3.8+)
  • Image Size Mismatch: Ensure all images are same size or use data augmentation to resize. Check IMAGE_SIZE in config
  • Poor Generation Quality: Train for more epochs, use larger latent_dim, increase training data, or adjust beta (KL weight)
  • Training Loss Not Decreasing: Check learning rate (may be too high/low), verify data format, check for data issues
  • Validation Loss Increasing: Model may be overfitting. Increase beta (KL weight), use more data, or reduce model size
  • Blurry Generated Images: Increase latent_dim, train longer, or adjust beta to balance reconstruction and KL divergence
  • Mode Collapse: Increase beta value, use more diverse training data, or try different architectures
  • KL Divergence Too High: Reduce beta value or increase model capacity
  • KL Divergence Too Low: Increase beta value to encourage better latent structure
  • Training Instability: Reduce learning rate, use gradient clipping, or try different optimizer
  • Memory Issues: Reduce batch size, use smaller image size, or enable mixed precision training

Performance Optimization Tips

  • GPU Memory: Use gradient accumulation for effective larger batch sizes: accumulate gradients over N batches before updating (see the sketch after this list)
  • Mixed Precision: Enable AMP (Automatic Mixed Precision) for 2x speedup and 50% memory reduction
  • Data Loading: Use multiple workers (num_workers=4-8) and pin_memory=True for faster data loading
  • Model Pruning: Remove unnecessary layers or reduce hidden dimensions for faster inference
  • Quantization: Use INT8 quantization for 4x speedup in production (with slight quality loss)
  • Batch Inference: Generate multiple images in batches rather than one at a time
  • Model Caching: Load model once and reuse for multiple generations
  • ONNX Runtime: Use exported ONNX models with ONNX Runtime for faster inference
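
A minimal sketch of the gradient-accumulation and mixed-precision tips above, using stand-ins for the project's model, data loader, and loss:

import torch
import torch.nn as nn

# Stand-ins: replace with the project's VAE, optimizer, and data loader
model = nn.Linear(784, 784).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
train_loader = [torch.rand(32, 784) for _ in range(8)]   # fake batches

accum_steps = 4                                   # effective batch = 32 * 4
scaler = torch.cuda.amp.GradScaler()              # mixed-precision loss scaling

optimizer.zero_grad()
for step, x in enumerate(train_loader):
    x = x.cuda()
    with torch.cuda.amp.autocast():               # forward pass in mixed precision
        loss = nn.functional.mse_loss(model(x), x) / accum_steps
    scaler.scale(loss).backward()                 # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                    # optimizer step every accum_steps batches
        scaler.update()
        optimizer.zero_grad()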

Best Practices

  • Training Data: Use diverse, high-quality image datasets. More data = better results. Aim for at least 1K images
  • Data Format: Ensure images are in supported formats (JPG, PNG, JPEG). Consistent image sizes are recommended
  • Data Preprocessing: Normalize pixel values, resize images to consistent dimensions, and apply augmentation if needed
  • Data Augmentation: Use augmentation for small datasets to improve generalization
  • Batch Size: Use smaller batches (32-64) for limited GPU memory; larger batches (128+) train faster if memory allows
  • Learning Rate: Start with 0.001 and adjust based on training loss. Use learning rate scheduling (e.g. ReduceLROnPlateau) for better convergence; see the sketch after this list
  • Gradient Clipping: Default is 1.0. Increase if training is unstable, decrease if gradients are too small
  • Beta (KL Weight): Start with β = 1.0. Increase for more regularization and better latent structure, decrease for better reconstruction. Adjust based on results
  • Latent Dimension: Higher latent_dim = more expressive but slower. Common values: 64, 128, 256. Start with 64 for speed and testing, 128+ for production quality
  • Evaluation: Regularly evaluate reconstruction loss and KL divergence on a validation set. Monitor for overfitting
  • Early Stopping: Monitor validation loss and stop if there is no improvement for 10-15 epochs
  • Checkpointing: Checkpoints are saved automatically (every 5-10 epochs is a good cadence), and training can resume from a checkpoint if needed
  • Logging: Monitor TensorBoard logs in outputs/logs/ for debugging and optimization
  • Device Selection: Use CUDA if available for faster training; CPU works but is much slower
  • API Rate Limiting: Implement rate limiting for production deployments, for example with nginx
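
A minimal sketch of the ReduceLROnPlateau scheduling mentioned above, using a stand-in model; the real training loop in train_advanced.py would pass its actual validation loss to scheduler.step():

import torch
import torch.nn as nn

model = nn.Linear(10, 10)                           # stand-in for the VAE
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)  # halve LR after 5 stagnant epochs

for epoch in range(50):
    val_loss = 1.0 / (epoch + 1)                    # placeholder for the real validation loss
    scheduler.step(val_loss)                        # scheduler reacts to the validation loss
    print(epoch, optimizer.param_groups[0]["lr"])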

Use Cases and Applications

  • Image Generation: Generate new images from random latent vectors
  • Image Reconstruction: Reconstruct and denoise images
  • Image Interpolation: Create smooth transitions between images
  • Anomaly Detection: Detect outliers by high reconstruction error
  • Data Augmentation: Generate synthetic training data
  • Image Editing: Manipulate images in latent space
  • Feature Learning: Learn meaningful image representations
  • Dimensionality Reduction: Compress images to low-dimensional latent space
  • Style Transfer: Transfer styles by manipulating latent vectors
  • Image Completion: Complete missing parts of images

Performance Optimization

  • GPU Usage: Set CUDA_VISIBLE_DEVICES for multi-GPU systems. Use GPU for training and inference when available
  • Model Selection: Use latent_dim=64 for fastest inference. Larger models (128+, 256+) for better quality
  • Batch Processing: Generate multiple images in batches for efficient processing. Reduces overhead
  • Caching: API server caches model in memory. Model loads once on first request, then reused
  • Image Size: Use smaller image sizes (64x64) for faster generation. Larger images (128x128+) for better quality
  • Latent Sampling: Sample from standard normal distribution N(0, I) for generation. Use interpolation for smooth transitions
  • Model Quantization: Consider model quantization for production to reduce memory and speed up inference
  • Async Processing: For high-throughput, consider async API or queue system for batch image generation
  • Memory Management: Clear GPU cache between batches if running out of memory: torch.cuda.empty_cache()
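
A hedged sketch combining the batch-generation and memory-management tips above; it assumes the model exposes a decode method (as used elsewhere in this document) and a latent_dim of 128:

import torch

@torch.no_grad()                                         # inference only: no gradient buffers
def generate_in_batches(model, total=256, batch=64, latent_dim=128, device="cuda"):
    images = []
    for start in range(0, total, batch):
        n = min(batch, total - start)
        z = torch.randn(n, latent_dim, device=device)    # sample from the N(0, I) prior
        images.append(model.decode(z).cpu())             # move results off the GPU
        torch.cuda.empty_cache()                         # release cached GPU memory
    return torch.cat(images)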

Expected Training Times

Approximate training times for different configurations:

Dataset  | Image Size | Latent Dim | Batch Size | Epochs | GPU      | Time
MNIST    | 28×28      | 64         | 128        | 50     | GTX 1080 | ~15 min
MNIST    | 28×28      | 128        | 128        | 50     | GTX 1080 | ~20 min
CIFAR-10 | 32×32      | 128        | 128        | 100    | GTX 1080 | ~2 hours
Custom   | 64×64      | 256        | 64         | 50     | RTX 3090 | ~3 hours
Custom   | 128×128    | 512        | 32         | 50     | RTX 3090 | ~8 hours

Note: Times are approximate and depend on hardware, dataset size, and other factors. CPU training is typically 10-20x slower.

Model Size and Memory Requirements

Approximate model sizes and memory usage:

Latent Dim | Hidden Dims            | Model Size | GPU Memory (Training) | GPU Memory (Inference)
64         | [32, 64, 128]          | ~5 MB      | ~500 MB               | ~200 MB
128        | [32, 64, 128, 256]     | ~15 MB     | ~1.5 GB               | ~500 MB
256        | [64, 128, 256, 512]    | ~50 MB     | ~4 GB                 | ~1.5 GB
512        | [128, 256, 512, 1024]  | ~200 MB    | ~12 GB                | ~4 GB

Note: Memory usage depends on batch size and image size. Larger batches and images require more memory.

Real-World Examples & Use Cases

Example 1: Training on Custom Face Dataset

Complete workflow for training on a custom face dataset:

# 1. Prepare the dataset
mkdir -p data/faces
# Copy face images to data/faces/

# 2. Train the VAE model
python train_advanced.py \
    --dataset custom \
    --data-dir data/faces \
    --epochs 100 \
    --batch-size 64 \
    --latent-dim 256 \
    --beta 1.0 \
    --image-size 64 \
    --tensorboard \
    --early-stopping 15 \
    --lr-scheduler \
    --augment

# 3. Generate new faces
python generate.py \
    --model outputs/models/best_vae_model.pth \
    --num-images 16 \
    --output generated_faces.png

# 4. Create a face interpolation
python generate.py \
    --model outputs/models/best_vae_model.pth \
    --interpolate \
    --interpolation-steps 20 \
    --output face_interpolation.png

Example 2: Anomaly Detection

Use VAE for detecting anomalies in images:

# Train on normal images only
python train.py --dataset normal_images --epochs 50

# Detect anomalies by reconstruction error
from evaluate import detect_anomalies

threshold = 0.1  # Reconstruction error threshold
anomalies = detect_anomalies(
    model_path="outputs/vae_model.pth",
    test_images="data/test/",
    threshold=threshold
)

# Images with reconstruction error > threshold are anomalies
for img_path, error in anomalies:
    if error > threshold:
        print(f"Anomaly detected: {img_path} (error: {error:.4f})")

Example 3: Image Denoising

Use VAE to denoise corrupted images:

# Train on clean images
python train.py --dataset clean_images --epochs 50

# Denoise corrupted images
from generate import denoise_image

# Load a noisy image
noisy_image = load_image("noisy_image.png")

# Encode to latent space and decode (denoising)
denoised = denoise_image(
    model_path="outputs/vae_model.pth",
    noisy_image=noisy_image,
    device="cuda"
)

# Save the denoised image
save_image(denoised, "denoised_image.png")

Example 4: Data Augmentation

Generate synthetic training data:

# Train the VAE on a small dataset
python train.py --dataset small_dataset --epochs 50

# Generate synthetic images to augment the dataset
python generate.py \
    --model outputs/vae_model.pth \
    --num-images 1000 \
    --output-dir data/augmented/

# Use the generated images as additional training data
# Combine with the original dataset for better model performance

Example 5: Latent Space Manipulation

Manipulate images by editing latent vectors:

# Encode an image to latent space
from vae_model import VAE
import torch

model = VAE(latent_dim=128)
model.load_state_dict(torch.load("outputs/vae_model.pth"))
model.eval()

# Encode the image
image = load_image("input_image.png")
mu, logvar = model.encode(image)
z = model.reparameterize(mu, logvar)

# Manipulate the latent vector (e.g., change style)
z_modified = z.clone()
z_modified[:, 0:10] += 0.5  # Modify the first 10 dimensions

# Decode the modified latent vector
modified_image = model.decode(z_modified)

# Save the result
save_image(modified_image, "modified_image.png")

Integration Examples

Integration with Flask Web Application

Integrate VAE into a Flask web application:

from flask import Flask, request, jsonify, send_file
from generate import load_model, generate_images
import base64
import io
from PIL import Image

app = Flask(__name__)
model = load_model("outputs/vae_model.pth", device="cuda")

@app.route('/generate', methods=['POST'])
def generate():
    num_images = request.json.get('num_images', 10)
    images = generate_images(model, num_images=num_images, device="cuda")

    # Convert to base64 for the JSON response
    image_data = []
    for img in images:
        buffered = io.BytesIO()
        img.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
        image_data.append(img_str)

    return jsonify({'images': image_data})

@app.route('/reconstruct', methods=['POST'])
def reconstruct():
    # Receive an image file
    file = request.files['image']
    image = Image.open(file.stream)

    # Reconstruct using the VAE
    reconstructed = model.reconstruct(image)

    # Return the reconstructed image
    img_io = io.BytesIO()
    reconstructed.save(img_io, 'PNG')
    img_io.seek(0)
    return send_file(img_io, mimetype='image/png')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Integration with FastAPI

Create a FastAPI service for VAE image generation:

from fastapi import FastAPI, File, UploadFile
from fastapi.responses import FileResponse
from generate import load_model, generate_images

app = FastAPI()
model = load_model("outputs/vae_model.pth", device="cuda")

@app.post("/generate")
async def generate_images_endpoint(num_images: int = 10):
    """Generate images from random latent vectors."""
    images = generate_images(model, num_images=num_images, device="cuda")
    # Save and return images
    return {"status": "success", "num_images": num_images}

@app.post("/reconstruct")
async def reconstruct_image(file: UploadFile = File(...)):
    """Reconstruct an uploaded image."""
    image_data = await file.read()
    # Process and reconstruct the image
    reconstructed = model.reconstruct(image_data)
    return FileResponse(reconstructed)

@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "healthy", "model_loaded": True}

Integration with Streamlit

Create an interactive Streamlit application:

import streamlit as st
from generate import load_model, generate_images, interpolate_latent_space
import torch

st.title("VAE Image Generation")

# Load the model once and cache it
@st.cache_resource
def load_vae_model():
    return load_model("outputs/vae_model.pth", device="cuda")

model = load_vae_model()

# Sidebar controls
num_images = st.sidebar.slider("Number of Images", 1, 64, 16)
latent_dim = st.sidebar.slider("Latent Dimension", 64, 512, 128)  # must match the trained model

# Generate button
if st.button("Generate Images"):
    with st.spinner("Generating images..."):
        images = generate_images(model, num_images=num_images, device="cuda")
    st.image(images, width=200)

# Interpolation
st.header("Latent Space Interpolation")
steps = st.slider("Interpolation Steps", 5, 30, 10)
if st.button("Create Interpolation"):
    # Interpolate between two random latent vectors
    z1 = torch.randn(1, latent_dim, device="cuda")
    z2 = torch.randn(1, latent_dim, device="cuda")
    interpolated_images = interpolate_latent_space(model, z1, z2, steps=steps)
    st.image(interpolated_images, width=200)

Contact Information

Get in Touch

Developer: Molla Samser
Designer & Tester: Rima Khatun

rskworld.in
help@rskworld.in support@rskworld.in
+91 93305 39277

License

This project is for educational purposes only. See LICENSE file for more details.
