VAE Image Generation

Complete Documentation & Project Details for Variational Autoencoder & Image Generation

Project Description

This project implements a Variational Autoencoder (VAE) for generating realistic images using a probabilistic latent space and an encoder-decoder architecture. Convolutional encoder and decoder networks are trained with the reparameterization trick, KL divergence regularization, and a reconstruction loss to learn meaningful latent representations, making the project well suited for learning VAE fundamentals and image generation. The system includes KL divergence and reconstruction loss evaluation, TensorBoard integration, latent space interpolation, data augmentation, and comprehensive training tools.

The VAE uses a probabilistic latent space: an encoder network maps images to latent distributions (μ, σ), and a decoder network reconstructs images from sampled latent vectors. The encoder uses standard strided convolutions to downsample images to the latent parameters, while the decoder uses transposed convolutions to upsample latent vectors back to images. The implementation provides full PyTorch support, a comprehensive training pipeline, a web interface, evaluation metrics, and deployment tools for image generation applications.
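
The sketch below illustrates this encode → reparameterize → decode flow as a minimal, fully connected PyTorch module. It is for orientation only; the class name and layer sizes are illustrative, and the project's actual convolutional implementation lives in vae_model.py.

# Minimal VAE sketch (assumes a flattened input; the real model in
# vae_model.py uses the convolutional layers described in this document)
import torch
import torch.nn as nn

class MiniVAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(input_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)       # mean of q(z|x)
        self.fc_logvar = nn.Linear(256, latent_dim)   # log variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps                         # z = mu + sigma * eps

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar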

Project Screenshots

[Screenshot gallery: 4 images — VAE Image Generation]

Core Features

VAE Architecture

  • Convolutional encoder
  • Convolutional decoder
  • Reparameterization trick
  • Probabilistic latent space
  • Image reconstruction

Probabilistic Latent Space

  • Latent distributions (μ, σ)
  • Reparameterization trick
  • KL divergence regularization
  • Meaningful representations
  • Standard normal prior

Encoder-Decoder Layers

  • Convolutional encoder
  • Transposed convolutions (decoder)
  • Latent space mapping
  • Image reconstruction
  • Deep network architecture

KL Divergence & Reconstruction Loss

  • Reconstruction loss (MSE)
  • KL divergence regularization
  • Beta weighting parameter
  • Model performance evaluation
  • Comprehensive evaluation

TensorBoard Integration

  • Real-time loss visualization
  • Generated image tracking
  • Training progress monitoring
  • Interactive dashboard
  • Comprehensive logging

Web Interface

  • Flask-based web app
  • Interactive image generation
  • Real-time generation
  • Download generated images
  • User-friendly interface

Advanced Features

Latent Space Interpolation

  • Linear interpolation
  • Spherical interpolation (SLERP)
  • Latent walk generation
  • Smooth image transitions
  • Visual exploration

Data Augmentation

  • Adaptive augmentation
  • Mixup augmentation
  • Cutout augmentation
  • Multiple augmentation levels

Multiple Dataset Support

  • Custom dataset support
  • CelebA dataset
  • CIFAR-10 dataset
  • MNIST dataset

Resume Training

  • Checkpoint resuming
  • Automatic checkpoint detection
  • Training continuation
  • Progress preservation
  • Early stopping support
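
As a rough illustration of how the checkpoint resuming listed above typically works in PyTorch, the snippet below saves and restores model and optimizer state. The checkpoint key names ('epoch', 'model_state', 'optimizer_state') are assumptions for this sketch, not the exact format used by train_advanced.py.

import torch
import torch.nn as nn

model = nn.Linear(10, 10)                       # stand-in for the project's VAE
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Save a checkpoint during training
torch.save({"epoch": 25,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict()},
           "checkpoint.pth")

# Later: resume training from that checkpoint
ckpt = torch.load("checkpoint.pth", map_location="cpu")
model.load_state_dict(ckpt["model_state"])
optimizer.load_state_dict(ckpt["optimizer_state"])
start_epoch = ckpt["epoch"] + 1                 # continue from the next epoch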

Web Interface Features

Feature              | Description                        | Usage
Image Generation     | Generate images from random noise  | Select number of images and click Generate
Real-time Generation | Generate images in real time       | Images appear as they are generated
Download Images      | Download generated images          | Click the download button for each image
Model Selection      | Choose different trained models    | Select from available checkpoints

Technologies Used

This VAE Image Generation project is built with modern deep learning and computer vision technologies. The core implementation uses Python as the primary programming language and PyTorch for deep learning operations. The model is a Variational Autoencoder with convolutional encoder and decoder networks for realistic image generation. The project also includes a Flask-based web interface for interactive image generation, Jupyter Notebook support for interactive development and demonstrations, and comprehensive KL divergence and reconstruction loss evaluation for assessing image quality.

The VAE model uses a probabilistic latent space in which an encoder maps images to latent distributions (μ, σ) and a decoder reconstructs images from sampled latent vectors. The system supports the reparameterization trick to enable backpropagation through random sampling, KL divergence regularization to keep the latent space close to a standard normal distribution, and latent space interpolation for exploring the learned representation space, making it suitable for a range of image generation applications.

Python 3.8+ PyTorch 2.0+ VAE Autoencoder Image Generation TensorBoard Computer Vision Jupyter Notebook Flask 2.3+ KL Divergence

Installation & Usage

Installation

Install all required dependencies for the VAE Image Generation project:

# Install all requirements
pip install -r requirements.txt

# The VAE model will be trained on your data
# Prepare your dataset in the data/custom/ directory
# Or use built-in datasets (MNIST, CIFAR-10, Custom)

PyTorch Installation

Install PyTorch (CPU or GPU version):

# For CPU only
pip install torch torchvision torchaudio

# For CUDA (GPU support) - CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Verify installation
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"

Verify Installation

Test the model and verify all components work:

# Test model architecture
python test_model.py

# This will verify:
# - Model can be instantiated
# - Forward pass works
# - All components function correctly
# - Device compatibility (CPU/CUDA)

Training the Model

Train the VAE model on your image dataset:

# Prepare your dataset
# Place images in the data/custom/ directory
# Or use built-in datasets: 'mnist', 'cifar10', 'custom'

# Basic training with default parameters
python train.py --dataset mnist --epochs 50 --batch-size 128 --latent-dim 128

# Configure in config.py:
# - DATASET = 'mnist'       # or 'cifar10', 'custom'
# - BATCH_SIZE = 128
# - EPOCHS = 50
# - LATENT_DIM = 128
# - LEARNING_RATE = 0.001
# - BETA = 1.0              # KL divergence weight

# Or use the Jupyter notebook
jupyter notebook VAE_Image_Generation.ipynb

# Training will:
# - Load and preprocess images
# - Initialize encoder and decoder
# - Train with reconstruction + KL divergence loss
# - Save checkpoints and generated samples
# - Log to TensorBoard

Training Parameters (config.py):

  • DATASET: Dataset to use - 'mnist', 'cifar10', 'custom'
  • BATCH_SIZE: Training batch size (default: 128)
  • EPOCHS: Number of training epochs (default: 50)
  • LATENT_DIM: Dimension of latent space (default: 128)
  • LEARNING_RATE: Learning rate (default: 0.001)
  • BETA: Weight for KL divergence term (default: 1.0)
  • INPUT_CHANNELS: Number of input channels - 3 for RGB, 1 for grayscale (default: 3)
  • HIDDEN_DIMS: Encoder/decoder hidden dimensions (default: [32, 64, 128, 256])
  • DEVICE: Device to use - 'cuda' or 'cpu' (default: 'cuda')

Image Generation

Generate images using the trained VAE model:

# Generate images from the trained model
python generate.py --model outputs/vae_model.pth --num-images 64 --output generated.png

# Generate with interpolation
python generate.py --model outputs/vae_model.pth --interpolate --interpolation-steps 10 --output interpolation.png

# Or use the Jupyter notebook
jupyter notebook VAE_Image_Generation.ipynb

# Using the Python API
from generate import load_model, generate_images

images = generate_images(
    model_path="outputs/vae_model.pth",
    num_images=64,
    device="cuda"
)
print(f"Generated {len(images)} images")

REST API Server

Start the Flask API server for web integration:

# Start API server (default port 5000)
python api.py --model outputs/vae_model.pth --port 5000

# Start on a custom port
python api.py --model outputs/vae_model.pth --port 8080

# Start on a custom host and port
python api.py --model outputs/vae_model.pth --host 0.0.0.0 --port 5000

# API will be available at http://localhost:5000
# Example API calls:
# POST /generate    - {"num_images": 10}
# POST /interpolate - {"num_steps": 10}
# GET  /health      - Check API health

API Server Parameters:

  • --model: Path to trained model checkpoint (required)
  • --port: Port to run server on (default: 5000)
  • --host: Host to bind to (default: 0.0.0.0)

Model Evaluation

Evaluate the trained VAE model performance:

# Evaluate the trained model
python evaluate.py --model outputs/vae_model.pth --dataset mnist

# The evaluation includes:
# - Reconstruction error calculation
# - KL divergence statistics
# - Latent space statistics
# - Model performance metrics

Latent Space Visualization

Visualize the learned latent space and generate interpolations:

# Latent space visualization
python visualize.py --model outputs/vae_model.pth --mode manifold --output latent_manifold.png

# The visualization includes:
# - Latent space manifold visualization
# - Interpolation between latent points
# - Reconstruction examples
# - Generated image samples

Jupyter Notebook

Open the interactive Jupyter notebook for demonstrations:

# VAE Image Generation demonstration notebook
jupyter notebook VAE_Image_Generation.ipynb

# The notebook includes:
# - Model architecture visualization
# - Encoder-decoder explanation
# - Training setup examples
# - Image generation examples
# - Latent space visualization

# Or use JupyterLab
jupyter lab VAE_Image_Generation.ipynb

Project Structure

vae-image-generation/
├── README.md # Main documentation
├── requirements.txt # Python dependencies
├── LICENSE # License file
├── CHANGELOG.md # Changelog
├── CONTRIBUTING.md # Contribution guidelines
├── PROJECT_SUMMARY.md # Project summary
│
├── Core Modules
│ ├── vae_model.py # VAE architecture
│ ├── data_loader.py # Data loading and augmentation
│ ├── train.py # Basic training script
│ ├── train_advanced.py # Advanced training script
│ ├── generate.py # Image generation
│ ├── evaluate.py # Model evaluation
│ ├── visualize.py # Visualization tools
│ ├── utils.py # Utility functions
│ └── config.py # Configuration settings
│
├── API & Services
│ ├── api.py # Flask REST API
│ └── index.html # Web interface
│
├── Data
│ └── (training datasets: MNIST, CIFAR-10, Custom)
│
├── Models
│ └── (trained model checkpoints)
│
├── Notebooks
│ └── VAE_Image_Generation.ipynb # Jupyter notebook demo
│
├── Scripts
│ ├── export_model.py # Model export utilities
│ └── compare_models.py # Model comparison

Configuration Options

Model Configuration

Customize model and training parameters in config.py and train.py:

# Model Architecture (config.py)
INPUT_CHANNELS = 3                 # Input channels (1 for grayscale, 3 for RGB)
LATENT_DIM = 128                   # Dimension of latent space
HIDDEN_DIMS = [32, 64, 128, 256]   # Encoder/decoder hidden dimensions

# Training Parameters (config.py)
BATCH_SIZE = 128                   # Training batch size
LEARNING_RATE = 0.001              # Learning rate
EPOCHS = 50                        # Number of training epochs
BETA = 1.0                         # KL divergence weight
DATASET = 'mnist'                  # Dataset: 'mnist', 'cifar10', 'custom'
GRAD_CLIP = 1.0                    # Gradient clipping value
DEVICE = 'cuda'                    # Device: 'cuda' or 'cpu'

# Generation Configuration
NUM_IMAGES = 64                    # Number of images to generate
INTERPOLATION_STEPS = 10           # Steps for latent interpolation

Configuration Tips:

  • LATENT_DIM: Higher = more expressive but slower. Common values: 64, 128, 256
  • HIDDEN_DIMS: Encoder/decoder channel dimensions. More channels = better quality but slower
  • BETA: KL divergence weight. Higher = more regularization. Start with 1.0
  • BATCH_SIZE: Larger = faster but needs more memory. Adjust based on GPU
  • LEARNING_RATE: Start with 0.001. Use learning rate scheduling for better convergence
  • INPUT_CHANNELS: 1 for grayscale (MNIST), 3 for RGB (CIFAR-10, custom)
  • DATASET: Choose 'mnist', 'cifar10', or 'custom' based on your data

Training Progress Logging

The training script automatically logs progress to TensorBoard and saves checkpoints:

# Training logs are saved to:
# outputs/logs/    - TensorBoard logs
# outputs/models/  - Model checkpoints
# outputs/samples/ - Generated sample images

# View TensorBoard logs
tensorboard --logdir outputs/logs

# TensorBoard shows:
# - Reconstruction loss per epoch
# - KL divergence loss per epoch
# - Total loss per epoch
# - Generated image samples
# - Learning rate schedule
# - Model parameters

# Checkpoints are saved as:
# outputs/models/vae_model_epoch_XX.pth
# outputs/models/best_vae_model.pth

Advanced Training Options

Use the advanced training script with additional features:

# Advanced training with all features
python train_advanced.py \
    --dataset mnist \
    --epochs 50 \
    --batch-size 128 \
    --latent-dim 128 \
    --beta 1.0 \
    --tensorboard \
    --early-stopping 10 \
    --lr-scheduler \
    --augment \
    --grad-clip 1.0 \
    --amp

# Features enabled:
# --tensorboard:      Enable TensorBoard logging
# --early-stopping N: Stop if no improvement for N epochs
# --lr-scheduler:     Use learning rate scheduling
# --augment:          Enable data augmentation
# --grad-clip:        Gradient clipping value
# --amp:              Mixed precision training (faster, less memory)

# Resume training from a checkpoint
python train_advanced.py \
    --resume outputs/models/vae_model_epoch_25.pth \
    --epochs 50

Training on Different Datasets

Examples for training on different datasets:

# Training on MNIST (grayscale, 28x28)
python train.py --dataset mnist --epochs 50 --latent-dim 64

# Training on CIFAR-10 (RGB, 32x32)
python train.py --dataset cifar10 --epochs 100 --latent-dim 128 --beta 0.5

# Training on a custom dataset (RGB, any size)
python train.py \
    --dataset custom \
    --data-dir data/custom \
    --epochs 50 \
    --latent-dim 256 \
    --image-size 64

# For high-resolution images (128x128 or larger)
python train.py \
    --dataset custom \
    --data-dir data/custom \
    --latent-dim 512 \
    --hidden-dims "[64, 128, 256, 512, 1024]" \
    --image-size 128 \
    --batch-size 32

Detailed Architecture

VAE Components

1. Encoder:

  • Convolutional layers that downsample input images
  • Maps images to latent space parameters (μ, σ)
  • Outputs mean (μ) and log variance (log σ²) for each latent dimension
  • Uses standard convolutions with stride for downsampling
  • Creates rich feature representations before latent space

2. Reparameterization Trick:

  • Samples latent vectors z from N(μ, σ²) distribution
  • Enables backpropagation through random sampling
  • Formula: z = μ + σ ⊙ ε, where ε ~ N(0, I)
  • Allows gradient flow from decoder to encoder
  • Key innovation that makes VAE trainable

3. Decoder:

  • Transposed convolutional layers that upsample latent vectors
  • Reconstructs images from sampled latent vectors z
  • Uses transposed convolutions with stride for upsampling
  • Outputs reconstructed images matching input dimensions
  • Learns to generate realistic images from latent space

Loss Function

The VAE loss combines reconstruction and regularization:

Total Loss = Reconstruction Loss + β × KL Divergence

Reconstruction Loss (MSE):
    L_recon = ||x - x_recon||²

KL Divergence:
    L_KL = -0.5 × Σ(1 + log(σ²) - μ² - σ²)

Where:
- x: Original input image
- x_recon: Reconstructed image
- μ: Latent mean vector
- σ: Latent standard deviation vector
- β: Weight for the KL divergence term (default: 1.0)

The KL divergence ensures the latent space follows a standard normal distribution N(0, I).
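
Expressed in PyTorch, this combined loss might look like the following sketch; the function name and reduction choice are illustrative, and train.py contains the project's actual implementation.

import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar, beta=1.0):
    # Reconstruction term: mean squared error summed over all pixels
    recon_loss = F.mse_loss(x_recon, x, reduction="sum")
    # KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I)
    kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl_loss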

Reparameterization Trick

Enables backpropagation through random sampling:

z = μ + σ ⊙ ε

Where:
- μ: Mean vector from the encoder
- σ: Standard deviation vector from the encoder
- ε: Random noise sampled from N(0, I)
- ⊙: Element-wise multiplication

This allows:
- Sampling from the latent distribution
- Gradient flow through random sampling
- Training encoder and decoder jointly
- Learning meaningful latent representations

Layer-by-Layer Architecture Details

Encoder Architecture:

  • Input Layer: Receives images of size [batch, channels, height, width]
  • Convolutional Blocks: Each block contains Conv2d → BatchNorm → LeakyReLU
  • Downsampling: Uses stride=2 convolutions to reduce spatial dimensions
  • Hidden Dimensions: Progressively increases channels [32→64→128→256]
  • Flattening: Converts feature maps to 1D vectors
  • Latent Projection: Two linear layers output μ (mean) and log σ² (log variance)

Decoder Architecture:

  • Latent Input: Receives sampled latent vector z of size [batch, latent_dim]
  • Initial Projection: Linear layer expands z to feature map size
  • Reshaping: Converts 1D vector to 2D feature maps
  • Transposed Convolutional Blocks: Each block contains ConvTranspose2d → BatchNorm → ReLU
  • Upsampling: Uses stride=2 transposed convolutions to increase spatial dimensions
  • Hidden Dimensions: Progressively decreases channels [256→128→64→32]
  • Output Layer: Final transposed convolution outputs reconstructed image
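
A condensed sketch of how such stride-2 convolutional stacks are commonly built is shown below. The channel widths mirror the default HIDDEN_DIMS above, but the construction is an assumption for illustration; vae_model.py is the authoritative implementation (and also adds the latent projection layers omitted here).

import torch.nn as nn

hidden_dims = [32, 64, 128, 256]
in_ch = 3

# Encoder: Conv2d -> BatchNorm -> LeakyReLU; stride 2 halves H and W in each block
encoder_blocks = []
for h in hidden_dims:
    encoder_blocks += [nn.Conv2d(in_ch, h, kernel_size=3, stride=2, padding=1),
                       nn.BatchNorm2d(h),
                       nn.LeakyReLU()]
    in_ch = h
encoder = nn.Sequential(*encoder_blocks)

# Decoder: ConvTranspose2d -> BatchNorm -> ReLU; stride 2 doubles H and W in each block
decoder_blocks = []
rev = list(reversed(hidden_dims))            # [256, 128, 64, 32]
for c_in, c_out in zip(rev[:-1], rev[1:]):
    decoder_blocks += [nn.ConvTranspose2d(c_in, c_out, kernel_size=3, stride=2,
                                          padding=1, output_padding=1),
                       nn.BatchNorm2d(c_out),
                       nn.ReLU()]
decoder_blocks += [nn.ConvTranspose2d(rev[-1], 3, kernel_size=3, stride=2,
                                      padding=1, output_padding=1),
                   nn.Sigmoid()]             # final layer outputs the reconstructed image
decoder = nn.Sequential(*decoder_blocks)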

Mathematical Formulation

Complete mathematical description of the VAE:

# Encoder: q_φ(z|x)
μ, log σ² = Encoder(x)
q_φ(z|x) = N(z; μ, σ²I)

# Reparameterization: z ~ q_φ(z|x)
ε ~ N(0, I)
z = μ + σ ⊙ ε

# Decoder: p_θ(x|z)
x_recon = Decoder(z)
p_θ(x|z) = N(x; x_recon, I)   # For MSE loss

# Prior: p(z)
p(z) = N(z; 0, I)             # Standard normal

# Loss Function: ELBO (Evidence Lower BOund)
L = E_q[log p_θ(x|z)] - β × KL(q_φ(z|x) || p(z))

# Where:
# - First term: Reconstruction loss (MSE)
# - Second term: KL divergence (regularization)
# - β: Weight parameter (β-VAE)

Beta-VAE and KL Weighting

Understanding the beta parameter in VAE training:

  • β = 0: No KL regularization, pure reconstruction (may collapse to standard autoencoder)
  • β = 1: Standard VAE, balanced reconstruction and regularization
  • β > 1: Stronger regularization, better latent space structure, may sacrifice reconstruction quality
  • β < 1: Weaker regularization, better reconstruction, may have less structured latent space
  • β-VAE: Using β > 1 encourages disentangled representations
  • Recommendation: Start with β = 1.0, adjust based on reconstruction quality vs. latent structure

Advanced Features Usage

Image Generation Usage

Generate images using the trained VAE model:

# Generate images from random latent vectors
from generate import load_model, generate_images

model = load_model("outputs/vae_model.pth", device="cuda")
images = generate_images(model, num_images=64, device="cuda")

# Save generated images
from utils import save_images
save_images(images, "generated_samples.png")

# Generate with a specific latent vector
import torch
z = torch.randn(1, 128).to("cuda")  # 128 is latent_dim
generated = model.decode(z)

Latent Space Interpolation

Generate smooth transitions between images:

from generate import interpolate_latent_space

# Linear interpolation between two latent vectors
z1 = torch.randn(1, 128).to("cuda")
z2 = torch.randn(1, 128).to("cuda")
interpolated = interpolate_latent_space(model, z1, z2, steps=10)

# Spherical interpolation (SLERP)
interpolated = interpolate_latent_space(
    model, z1, z2, steps=10, method="slerp"
)

# Save interpolation results
save_images(interpolated, "interpolation.png")
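
For reference, spherical interpolation is typically computed as in the sketch below, which reuses model, z1, and z2 from the snippet above. The slerp helper is an illustrative assumption; the project's interpolate_latent_space with method="slerp" is presumed to do something equivalent.

import torch

def slerp(z1, z2, t):
    # Interpolate along the arc between z1 and z2 (t in [0, 1])
    z1_n = z1 / z1.norm(dim=-1, keepdim=True)
    z2_n = z2 / z2.norm(dim=-1, keepdim=True)
    omega = torch.acos((z1_n * z2_n).sum(-1, keepdim=True).clamp(-1 + 1e-7, 1 - 1e-7))
    so = torch.sin(omega)
    return (torch.sin((1.0 - t) * omega) / so) * z1 + (torch.sin(t * omega) / so) * z2

# Decode each interpolated latent vector into an image
steps = 10
frames = [model.decode(slerp(z1, z2, s / (steps - 1))) for s in range(steps)]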

Model Evaluation

Evaluate model performance with reconstruction and KL divergence:

from evaluate import evaluate_model

# Evaluate on the test dataset
results = evaluate_model(
    model_path="outputs/vae_model.pth",
    dataset="mnist",
    device="cuda"
)

# Results include:
# - Average reconstruction loss
# - Average KL divergence
# - Total loss
# - Latent space statistics
print(f"Reconstruction Loss: {results['recon_loss']:.4f}")
print(f"KL Divergence: {results['kl_loss']:.4f}")
print(f"Total Loss: {results['total_loss']:.4f}")

Dataset Preparation

Prepare your custom dataset for training:

# Prepare a custom dataset
# Place images in the data/custom/ directory
# Supported formats: .jpg, .png, .jpeg

# Directory structure:
# data/custom/
# ├── image1.jpg
# ├── image2.png
# └── ...

# The training script will automatically:
# - Load images from the directory
# - Apply data augmentation
# - Resize to specified dimensions
# - Normalize pixel values
# - Create data loaders for training

# Use the custom dataset for training
python train.py --dataset custom --data-dir data/custom --epochs 50

Latent Space Exploration

Explore the learned latent space:

# Latent space visualization is available in the Jupyter notebook
jupyter notebook VAE_Image_Generation.ipynb

# The notebook includes:
# - Latent space manifold visualization
# - Interpolation examples
# - Reconstruction examples
# - Generated image samples

# Visualize the latent space with t-SNE or PCA
from visualize import visualize_latent_space
visualize_latent_space(model, test_loader, method="tsne")

Advanced Visualization Techniques

Use the visualization script for comprehensive analysis:

# Latent manifold visualization (2D grid of generated images)
python visualize.py --model outputs/vae_model.pth --mode manifold --output latent_manifold.png

# Reconstruction comparison (original vs reconstructed)
python visualize.py --model outputs/vae_model.pth --mode reconstruct --num-samples 16 --output reconstruction.png

# Latent space traversal (vary one dimension at a time)
python visualize.py --model outputs/vae_model.pth --mode traverse --dim 0 --steps 10 --output traversal.png

# t-SNE visualization of the latent space
python visualize.py --model outputs/vae_model.pth --mode tsne --output latent_tsne.png

# PCA visualization of the latent space
python visualize.py --model outputs/vae_model.pth --mode pca --output latent_pca.png

# All visualizations at once
python visualize.py --model outputs/vae_model.pth --mode all --output-dir visualizations/

Model Export and Deployment

Export trained models for deployment:

# Export to ONNX format (for production deployment)
python scripts/export_model.py \
    --model outputs/vae_model.pth \
    --format onnx \
    --output vae_model.onnx \
    --input-size 1 3 64 64

# Export to TorchScript (PyTorch mobile/edge)
python scripts/export_model.py \
    --model outputs/vae_model.pth \
    --format torchscript \
    --output vae_model.pt

# Export encoder and decoder separately
python scripts/export_model.py \
    --model outputs/vae_model.pth \
    --format onnx \
    --components encoder decoder \
    --output-dir exported_models/

# Verify the exported model
python scripts/export_model.py \
    --model outputs/vae_model.pth \
    --format onnx \
    --output vae_model.onnx \
    --verify

Model Comparison and Analysis

Compare different trained models:

# Compare multiple models
python scripts/compare_models.py \
    --models outputs/models/vae_model_beta1.pth \
             outputs/models/vae_model_beta2.pth \
             outputs/models/vae_model_beta5.pth \
    --dataset mnist \
    --output comparison_report.html

# Compare models with different latent dimensions
python scripts/compare_models.py \
    --models outputs/models/vae_latent64.pth \
             outputs/models/vae_latent128.pth \
             outputs/models/vae_latent256.pth \
    --metrics reconstruction kl_divergence total_loss \
    --output comparison.png

# Generate side-by-side comparisons
python scripts/compare_models.py \
    --models outputs/models/*.pth \
    --mode visual \
    --num-samples 16 \
    --output-dir model_comparisons/

Complete Training Workflow

Step-by-Step Training Process

Step 1: Prepare Data

# Use built-in datasets (MNIST, CIFAR-10)
# Or prepare a custom dataset in the data/custom/ directory

# For a custom dataset:
# Place images in data/custom/
# Supported formats: .jpg, .png, .jpeg

# The training script will automatically:
# - Load images from the directory
# - Apply data augmentation
# - Resize and normalize images
# - Create data loaders

Step 2: Train Model

# Start training
python train.py --dataset mnist --epochs 50 --batch-size 128 --latent-dim 128

# Training will:
# 1. Load and preprocess images
# 2. Initialize encoder and decoder
# 3. Train with reconstruction + KL divergence loss
# 4. Save checkpoints and the best model
# 5. Log training history to TensorBoard
# 6. Generate sample images during training

Step 3: Monitor Training

  • Watch console output for epoch progress
  • Check TensorBoard: tensorboard --logdir outputs/logs
  • View generated samples in outputs/samples/
  • Best model saved as outputs/models/vae_model.pth

Step 4: Evaluate Model

# Evaluate on the test set
python evaluate.py --model outputs/vae_model.pth --dataset mnist

# Calculates reconstruction loss and KL divergence

Step 5: Generate Images

# Generate images
python generate.py --model outputs/vae_model.pth --num-images 64 --output generated.png

# Generate an interpolation
python generate.py --model outputs/vae_model.pth --interpolate --interpolation-steps 10

API Usage Examples

Image Generation Endpoint (cURL)

Generate images using the REST API:

curl -X POST http://localhost:5000/generate \
  -H "Content-Type: application/json" \
  -d '{"num_images": 10}'

# Response:
# {
#   "num_images": 10,
#   "images": ["base64_encoded_image1", ...],
#   "status": "success"
# }

Latent Interpolation Endpoint (cURL)

Generate interpolation between latent points:

curl -X POST http://localhost:5000/interpolate \
  -H "Content-Type: application/json" \
  -d '{"num_steps": 10}'

# Response:
# {
#   "num_steps": 10,
#   "images": ["base64_encoded_image1", ...],
#   "status": "success"
# }

Health Check (cURL)

Check API server health and model status:

curl -X GET http://localhost:5000/health

# Response:
# {
#   "status": "healthy",
#   "model_loaded": true,
#   "device": "cuda"
# }

Python Requests Example

Use the API with Python requests library:

import requests
import base64
from PIL import Image
from io import BytesIO

# Image generation endpoint
response = requests.post(
    'http://localhost:5000/generate',
    json={'num_images': 10}
)
data = response.json()

# Decode and save images
for i, img_base64 in enumerate(data['images']):
    img_data = base64.b64decode(img_base64)
    img = Image.open(BytesIO(img_data))
    img.save(f'generated_{i}.png')

# Interpolation endpoint
interp_response = requests.post(
    'http://localhost:5000/interpolate',
    json={'num_steps': 10}
)
print(interp_response.json())

# Health check
health = requests.get('http://localhost:5000/health')
print(health.json())

JavaScript/Fetch Example

Use the API with JavaScript fetch API:

// Image generation
fetch('http://localhost:5000/generate', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({ num_images: 10 })
})
  .then(res => res.json())
  .then(data => {
    console.log('Generated', data.num_images, 'images');
    // Display images from base64 data
    data.images.forEach((imgBase64, i) => {
      const img = document.createElement('img');
      img.src = 'data:image/png;base64,' + imgBase64;
      document.body.appendChild(img);
    });
  });

// Interpolation
fetch('http://localhost:5000/interpolate', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({ num_steps: 10 })
})
  .then(res => res.json())
  .then(data => {
    console.log('Interpolation steps:', data.num_steps);
  });

// Health check
fetch('http://localhost:5000/health')
  .then(res => res.json())
  .then(data => console.log('Status:', data));

VAE Model Variants

Model      | Latent Dim | Hidden Dims            | Use Case                    | Quality
Small VAE  | 64         | [16, 32, 64]           | Fast inference, basic tasks | Good
Medium VAE | 128        | [32, 64, 128, 256]     | Balanced quality/speed      | Better
Large VAE  | 256        | [64, 128, 256, 512]    | Higher-quality generation   | Best
XL VAE     | 512        | [128, 256, 512, 1024]  | Research, high quality      | Excellent

Dataset Information

Dataset Formats

The project supports multiple dataset formats for image training:

  • Built-in datasets: MNIST, CIFAR-10 (automatically downloaded)
  • Custom dataset: Directory of images (JPG, PNG, JPEG)
  • Automatic image loading and preprocessing
  • Data augmentation support
  • Train/validation split support
  • Multiple image format support

Custom Dataset Format

Training data is stored as image files in a directory:

# Custom dataset directory structure
data/custom/
├── image1.jpg
├── image2.png
├── image3.jpeg
└── ...

# The training script automatically:
# - Loads images from the directory
# - Applies data augmentation
# - Resizes to specified dimensions
# - Normalizes pixel values
# - Creates data loaders for training
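
Under the hood, loading such a flat image directory usually amounts to something like the sketch below. The project's data_loader.py handles this automatically, so the class name and transform values here are assumptions for illustration only.

import torch
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
from PIL import Image
from pathlib import Path

class FlatImageFolder(Dataset):
    """Loads every .jpg/.png/.jpeg directly inside a directory (no class subfolders)."""
    def __init__(self, root, image_size=64):
        self.paths = [p for p in Path(root).iterdir()
                      if p.suffix.lower() in {".jpg", ".jpeg", ".png"}]
        self.transform = transforms.Compose([
            transforms.Resize((image_size, image_size)),
            transforms.ToTensor(),                      # scales pixel values to [0, 1]
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB")
        return self.transform(img)

loader = DataLoader(FlatImageFolder("data/custom"), batch_size=128, shuffle=True)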

Adding Custom Training Data

Add your own image dataset for training:

# Place images in the data/custom/ directory
# Supported formats: .jpg, .png, .jpeg

# Example:
mkdir -p data/custom
cp your_images/*.jpg data/custom/

# Use in training
python train.py --dataset custom --data-dir data/custom --epochs 50

# The script will automatically:
# - Load all images from the directory
# - Apply augmentation if enabled
# - Resize and normalize images
# - Create train/validation splits

Troubleshooting & Best Practices

Common Issues

  • CUDA Out of Memory: Reduce batch_size in train.py, use a smaller latent_dim or narrower hidden_dims, lower the image size, or use CPU mode
  • Model Not Found: Ensure the model is trained first by running train.py, or load an existing checkpoint from the models/ directory. Check that the model path is correct
  • Slow Generation: Use a smaller latent_dim (64 instead of 128), reduce hidden_dims, or run on a GPU if available
  • API Connection Error: Check if api.py is running on port 5000. Verify model path is correct
  • Import Errors: Verify all dependencies installed: pip install -r requirements.txt. Check Python version (3.8+)
  • Image Size Mismatch: Ensure all images are same size or use data augmentation to resize. Check IMAGE_SIZE in config
  • Poor Generation Quality: Train for more epochs, use larger latent_dim, increase training data, or adjust beta (KL weight)
  • Training Loss Not Decreasing: Check learning rate (may be too high/low), verify data format, check for data issues
  • Validation Loss Increasing: Model may be overfitting. Increase beta (KL weight), use more data, or reduce model size
  • Blurry Generated Images: Increase latent_dim, train longer, or adjust beta to balance reconstruction and KL divergence
  • Mode Collapse: Increase beta value, use more diverse training data, or try different architectures
  • KL Divergence Too High: Reduce beta value or increase model capacity
  • KL Divergence Too Low: Increase beta value to encourage better latent structure
  • Training Instability: Reduce learning rate, use gradient clipping, or try different optimizer
  • Memory Issues: Reduce batch size, use smaller image size, or enable mixed precision training

Performance Optimization Tips

  • GPU Memory: Use gradient accumulation for effective larger batch sizes: accumulate gradients over N batches before updating (see the sketch after this list)
  • Mixed Precision: Enable AMP (Automatic Mixed Precision) for 2x speedup and 50% memory reduction
  • Data Loading: Use multiple workers (num_workers=4-8) and pin_memory=True for faster data loading
  • Model Pruning: Remove unnecessary layers or reduce hidden dimensions for faster inference
  • Quantization: Use INT8 quantization for 4x speedup in production (with slight quality loss)
  • Batch Inference: Generate multiple images in batches rather than one at a time
  • Model Caching: Load model once and reuse for multiple generations
  • ONNX Runtime: Use exported ONNX models with ONNX Runtime for faster inference
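
A minimal sketch of the gradient-accumulation and mixed-precision tips above, using stand-ins for the project's model, data loader, and loss:

import torch
import torch.nn as nn

# Stand-ins: replace with the project's VAE, optimizer, and data loader
model = nn.Linear(784, 784).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
train_loader = [torch.rand(32, 784) for _ in range(8)]   # fake batches

accum_steps = 4                                   # effective batch = 32 * 4
scaler = torch.cuda.amp.GradScaler()              # mixed-precision loss scaling

optimizer.zero_grad()
for step, x in enumerate(train_loader):
    x = x.cuda()
    with torch.cuda.amp.autocast():               # forward pass in mixed precision
        loss = nn.functional.mse_loss(model(x), x) / accum_steps
    scaler.scale(loss).backward()                 # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                    # optimizer step every accum_steps batches
        scaler.update()
        optimizer.zero_grad()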

Best Practices

  • Training Data: Use diverse, high-quality image datasets. More data = better results. Aim for at least 1K images
  • Data Format: Ensure images are in supported formats (JPG, PNG, JPEG). Consistent image sizes are recommended
  • Data Preprocessing: Normalize pixel values, resize images to consistent dimensions, and apply augmentation if needed
  • Data Augmentation: Use augmentation for small datasets to improve generalization
  • Batch Size: Use smaller batches (32-64) for limited GPU memory; larger batches (128+) train faster if memory allows
  • Learning Rate: Start with 0.001 and adjust based on training loss. Use learning rate scheduling (e.g. ReduceLROnPlateau) for better convergence; see the sketch after this list
  • Gradient Clipping: Default is 1.0. Increase if training is unstable, decrease if gradients are too small
  • Beta (KL Weight): Start with β = 1.0. Increase for more regularization and better latent structure, decrease for better reconstruction. Adjust based on results
  • Latent Dimension: Higher latent_dim = more expressive but slower. Common values: 64, 128, 256. Start with 64 for speed and testing, 128+ for production quality
  • Evaluation: Regularly evaluate reconstruction loss and KL divergence on a validation set. Monitor for overfitting
  • Early Stopping: Monitor validation loss and stop if there is no improvement for 10-15 epochs
  • Checkpointing: Checkpoints are saved automatically (every 5-10 epochs is a good cadence), and training can resume from a checkpoint if needed
  • Logging: Monitor TensorBoard logs in outputs/logs/ for debugging and optimization
  • Device Selection: Use CUDA if available for faster training; CPU works but is much slower
  • API Rate Limiting: Implement rate limiting for production deployments, for example with nginx
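
A minimal sketch of the ReduceLROnPlateau scheduling mentioned above, using a stand-in model; the real training loop in train_advanced.py would pass its actual validation loss to scheduler.step():

import torch
import torch.nn as nn

model = nn.Linear(10, 10)                           # stand-in for the VAE
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)  # halve LR after 5 stagnant epochs

for epoch in range(50):
    val_loss = 1.0 / (epoch + 1)                    # placeholder for the real validation loss
    scheduler.step(val_loss)                        # scheduler reacts to the validation loss
    print(epoch, optimizer.param_groups[0]["lr"])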

Use Cases and Applications

  • Image Generation: Generate new images from random latent vectors
  • Image Reconstruction: Reconstruct and denoise images
  • Image Interpolation: Create smooth transitions between images
  • Anomaly Detection: Detect outliers by high reconstruction error
  • Data Augmentation: Generate synthetic training data
  • Image Editing: Manipulate images in latent space
  • Feature Learning: Learn meaningful image representations
  • Dimensionality Reduction: Compress images to low-dimensional latent space
  • Style Transfer: Transfer styles by manipulating latent vectors
  • Image Completion: Complete missing parts of images

Performance Optimization

  • GPU Usage: Set CUDA_VISIBLE_DEVICES for multi-GPU systems. Use GPU for training and inference when available
  • Model Selection: Use latent_dim=64 for fastest inference. Larger models (128+, 256+) for better quality
  • Batch Processing: Generate multiple images in batches for efficient processing. Reduces overhead
  • Caching: API server caches model in memory. Model loads once on first request, then reused
  • Image Size: Use smaller image sizes (64x64) for faster generation. Larger images (128x128+) for better quality
  • Latent Sampling: Sample from standard normal distribution N(0, I) for generation. Use interpolation for smooth transitions
  • Model Quantization: Consider model quantization for production to reduce memory and speed up inference
  • Async Processing: For high-throughput, consider async API or queue system for batch image generation
  • Memory Management: Clear GPU cache between batches if running out of memory: torch.cuda.empty_cache()
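
A hedged sketch combining the batch-generation and memory-management tips above; it assumes the model exposes a decode method (as used elsewhere in this document) and a latent_dim of 128:

import torch

@torch.no_grad()                                         # inference only: no gradient buffers
def generate_in_batches(model, total=256, batch=64, latent_dim=128, device="cuda"):
    images = []
    for start in range(0, total, batch):
        n = min(batch, total - start)
        z = torch.randn(n, latent_dim, device=device)    # sample from the N(0, I) prior
        images.append(model.decode(z).cpu())             # move results off the GPU
        torch.cuda.empty_cache()                         # release cached GPU memory
    return torch.cat(images)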

Expected Training Times

Approximate training times for different configurations:

Dataset  | Image Size | Latent Dim | Batch Size | Epochs | GPU      | Time
MNIST    | 28×28      | 64         | 128        | 50     | GTX 1080 | ~15 min
MNIST    | 28×28      | 128        | 128        | 50     | GTX 1080 | ~20 min
CIFAR-10 | 32×32      | 128        | 128        | 100    | GTX 1080 | ~2 hours
Custom   | 64×64      | 256        | 64         | 50     | RTX 3090 | ~3 hours
Custom   | 128×128    | 512        | 32         | 50     | RTX 3090 | ~8 hours

Note: Times are approximate and depend on hardware, dataset size, and other factors. CPU training is typically 10-20x slower.

Model Size and Memory Requirements

Approximate model sizes and memory usage:

Latent Dim | Hidden Dims            | Model Size | GPU Memory (Training) | GPU Memory (Inference)
64         | [32, 64, 128]          | ~5 MB      | ~500 MB               | ~200 MB
128        | [32, 64, 128, 256]     | ~15 MB     | ~1.5 GB               | ~500 MB
256        | [64, 128, 256, 512]    | ~50 MB     | ~4 GB                 | ~1.5 GB
512        | [128, 256, 512, 1024]  | ~200 MB    | ~12 GB                | ~4 GB

Note: Memory usage depends on batch size and image size. Larger batches and images require more memory.

Real-World Examples & Use Cases

Example 1: Training on Custom Face Dataset

Complete workflow for training on a custom face dataset:

# 1. Prepare the dataset
mkdir -p data/faces
# Copy face images to data/faces/

# 2. Train the VAE model
python train_advanced.py \
    --dataset custom \
    --data-dir data/faces \
    --epochs 100 \
    --batch-size 64 \
    --latent-dim 256 \
    --beta 1.0 \
    --image-size 64 \
    --tensorboard \
    --early-stopping 15 \
    --lr-scheduler \
    --augment

# 3. Generate new faces
python generate.py \
    --model outputs/models/best_vae_model.pth \
    --num-images 16 \
    --output generated_faces.png

# 4. Create a face interpolation
python generate.py \
    --model outputs/models/best_vae_model.pth \
    --interpolate \
    --interpolation-steps 20 \
    --output face_interpolation.png

Example 2: Anomaly Detection

Use VAE for detecting anomalies in images:

# Train on normal images only
python train.py --dataset normal_images --epochs 50

# Detect anomalies by reconstruction error
from evaluate import detect_anomalies

threshold = 0.1  # Reconstruction error threshold
anomalies = detect_anomalies(
    model_path="outputs/vae_model.pth",
    test_images="data/test/",
    threshold=threshold
)

# Images with reconstruction error > threshold are anomalies
for img_path, error in anomalies:
    if error > threshold:
        print(f"Anomaly detected: {img_path} (error: {error:.4f})")

Example 3: Image Denoising

Use VAE to denoise corrupted images:

# Train on clean images
python train.py --dataset clean_images --epochs 50

# Denoise corrupted images
from generate import denoise_image

# Load a noisy image
noisy_image = load_image("noisy_image.png")

# Encode to latent space and decode (denoising)
denoised = denoise_image(
    model_path="outputs/vae_model.pth",
    noisy_image=noisy_image,
    device="cuda"
)

# Save the denoised image
save_image(denoised, "denoised_image.png")

Example 4: Data Augmentation

Generate synthetic training data:

# Train the VAE on a small dataset
python train.py --dataset small_dataset --epochs 50

# Generate synthetic images to augment the dataset
python generate.py \
    --model outputs/vae_model.pth \
    --num-images 1000 \
    --output-dir data/augmented/

# Use the generated images as additional training data
# Combine with the original dataset for better model performance

Example 5: Latent Space Manipulation

Manipulate images by editing latent vectors:

# Encode an image to latent space
from vae_model import VAE
import torch

model = VAE(latent_dim=128)
model.load_state_dict(torch.load("outputs/vae_model.pth"))
model.eval()

# Encode the image
image = load_image("input_image.png")
mu, logvar = model.encode(image)
z = model.reparameterize(mu, logvar)

# Manipulate the latent vector (e.g., change style)
z_modified = z.clone()
z_modified[:, 0:10] += 0.5  # Modify the first 10 dimensions

# Decode the modified latent vector
modified_image = model.decode(z_modified)

# Save the result
save_image(modified_image, "modified_image.png")

Integration Examples

Integration with Flask Web Application

Integrate VAE into a Flask web application:

from flask import Flask, request, jsonify, send_file
from generate import load_model, generate_images
import base64
import io
from PIL import Image

app = Flask(__name__)
model = load_model("outputs/vae_model.pth", device="cuda")

@app.route('/generate', methods=['POST'])
def generate():
    num_images = request.json.get('num_images', 10)
    images = generate_images(model, num_images=num_images, device="cuda")

    # Convert to base64 for the JSON response
    image_data = []
    for img in images:
        buffered = io.BytesIO()
        img.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
        image_data.append(img_str)

    return jsonify({'images': image_data})

@app.route('/reconstruct', methods=['POST'])
def reconstruct():
    # Receive an image file
    file = request.files['image']
    image = Image.open(file.stream)

    # Reconstruct using the VAE
    reconstructed = model.reconstruct(image)

    # Return the reconstructed image
    img_io = io.BytesIO()
    reconstructed.save(img_io, 'PNG')
    img_io.seek(0)
    return send_file(img_io, mimetype='image/png')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Integration with FastAPI

Create a FastAPI service for VAE image generation:

from fastapi import FastAPI, File, UploadFile
from fastapi.responses import FileResponse
from generate import load_model, generate_images

app = FastAPI()
model = load_model("outputs/vae_model.pth", device="cuda")

@app.post("/generate")
async def generate_images_endpoint(num_images: int = 10):
    """Generate images from random latent vectors."""
    images = generate_images(model, num_images=num_images, device="cuda")
    # Save and return images
    return {"status": "success", "num_images": num_images}

@app.post("/reconstruct")
async def reconstruct_image(file: UploadFile = File(...)):
    """Reconstruct an uploaded image."""
    image_data = await file.read()
    # Process and reconstruct the image
    reconstructed = model.reconstruct(image_data)
    return FileResponse(reconstructed)

@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "healthy", "model_loaded": True}

Integration with Streamlit

Create an interactive Streamlit application:

import streamlit as st
from generate import load_model, generate_images, interpolate_latent_space
import torch

st.title("VAE Image Generation")

# Load the model once and cache it
@st.cache_resource
def load_vae_model():
    return load_model("outputs/vae_model.pth", device="cuda")

model = load_vae_model()

# Sidebar controls
num_images = st.sidebar.slider("Number of Images", 1, 64, 16)
latent_dim = st.sidebar.slider("Latent Dimension", 64, 512, 128)  # must match the trained model

# Generate button
if st.button("Generate Images"):
    with st.spinner("Generating images..."):
        images = generate_images(model, num_images=num_images, device="cuda")
    st.image(images, width=200)

# Interpolation
st.header("Latent Space Interpolation")
steps = st.slider("Interpolation Steps", 5, 30, 10)
if st.button("Create Interpolation"):
    # Interpolate between two random latent vectors
    z1 = torch.randn(1, latent_dim, device="cuda")
    z2 = torch.randn(1, latent_dim, device="cuda")
    interpolated_images = interpolate_latent_space(model, z1, z2, steps=steps)
    st.image(interpolated_images, width=200)

Contact Information

Get in Touch

Developer: Molla Samser
Designer & Tester: Rima Khatun

rskworld.in
help@rskworld.in support@rskworld.in
+91 93305 39277

License

This project is for educational purposes only. See LICENSE file for more details.
