LSTM Chatbot - Complete Documentation | RSK World

Project Description

This project implements an LSTM-based sequence-to-sequence chatbot for conversational AI. An encoder-decoder architecture with an attention mechanism processes input sequences and generates contextually relevant responses, which suits multi-turn conversation. The system also includes beam search decoding, conversation history management, temperature sampling, attention visualization, data augmentation, and evaluation metrics.

The chatbot uses an encoder-decoder architecture with a Bahdanau attention mechanism. The encoder processes the input sequence with LSTM layers, and the decoder generates the response while attending to the relevant parts of the input. The implementation is built on TensorFlow and Keras and provides a training pipeline, a REST API server, evaluation metrics, and deployment tools for conversational AI applications.
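
To make the attention step concrete, here is a minimal sketch of Bahdanau (additive) attention for a single decoder step, written in plain Python rather than TensorFlow. The function name `additive_attention`, the weight matrices `W_q`, `W_k`, the vector `v`, and the toy values are all assumptions for illustration; they are not taken from the project's `src/attention.py`.

```python
import math


def additive_attention(query, keys, W_q, W_k, v):
    """Return (weights, context) for one decoder step.

    Score for encoder position i: v . tanh(W_q @ query + W_k @ keys[i]).
    All names here are illustrative, not the project's API.
    """
    def matvec(matrix, vector):
        return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

    q_proj = matvec(W_q, query)
    scores = []
    for key in keys:
        k_proj = matvec(W_k, key)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Softmax over encoder positions (shifted by the max for stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the encoder states.
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context


# Toy example: three encoder states of size 2 and identity projections.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5, -0.5]
W_q = [[1.0, 0.0], [0.0, 1.0]]
W_k = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]

weights, context = additive_attention(decoder_state, encoder_states, W_q, W_k, v)
print([round(w, 3) for w in weights])
```

The weights sum to 1 across encoder positions, and they are the quantities an attention heatmap would plot for each generated token.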

Core Features

LSTM Encoder-Decoder

LSTM encoder-decoder architecture
Sequence-to-sequence modeling
Long-term dependency handling
Context-aware response generation
Natural language understanding

Attention Mechanism

Bahdanau attention mechanism
Focus on relevant input parts
Improved context understanding
Attention weight visualization
Input-output alignment

Beam Search Decoding

Beam search algorithm
Multiple candidate exploration
Configurable beam width
Improved response quality
Better sequence selection

Conversation History

Multi-turn conversation support
Automatic context management
Save/load conversation history
Configurable history length
Context-aware responses

Evaluation Metrics

BLEU score calculation
Word Error Rate (WER)
Perplexity calculation
Sample-based evaluation
Comprehensive metrics

REST API Server

Flask-based API
Chat and history endpoints
CORS enabled
Export functionality
Production-ready

Advanced Features

Attention Visualization

Attention weight visualization
Model interpretability
Input-output alignment
Visual attention maps
Heatmap generation

Data Augmentation

Random insertion
Random deletion
Word swapping
Synonym replacement

Temperature Sampling

Temperature-controlled sampling
Configurable randomness
Balanced vs creative responses
Fine-tuned output control

Training Visualization

Loss curve visualization
Accuracy tracking
Learning rate monitoring
Overfitting detection
Training history plots

REST API Endpoints

Endpoint	Method	Description	Request Body	Response
/chat	POST	Main chat interface	{"message": "text"}	Generated response
/history	GET	Get conversation history	?num=5 (optional)	Conversation history
/history	POST	Save conversation history	{"filepath": "path"}	Success message
/clear	POST	Clear conversation history	N/A	Success message
/export	POST	Export chat log	{"filepath": "path"}	Success message
/health	GET	Health check	N/A	Server status

Technologies Used

This LSTM chatbot is built with Python as the primary language, TensorFlow and Keras for deep learning, and PyTorch for additional model support. An LSTM encoder-decoder with a Bahdanau attention mechanism handles sequence-to-sequence modeling. The project also includes a Flask-based REST API for web integration, Jupyter Notebook support for interactive development and demonstrations, and evaluation metrics for assessing model performance.

The LSTM model uses sequence-to-sequence architecture with encoder-decoder design, enabling the model to process input sequences and generate contextually relevant responses. The system supports beam search decoding for better response quality, conversation history for multi-turn dialogue, and temperature sampling for controlling response randomness, making it suitable for various conversational AI applications.

Python 3.8+, TensorFlow 2.13+, Keras, LSTM, Seq2Seq, Attention, Conversational AI, Jupyter Notebook, Flask 2.3+, Evaluation Metrics

Installation & Usage

Installation

Install all required dependencies for the LSTM chatbot project:

# Install all requirements
pip install -r requirements.txt

# The LSTM model will be trained on your data
# Prepare conversation data in data/conversations.txt

Quick Setup

Set up the project structure and verify installation:

# Run quick setup script
python quick_start.py setup

# Test imports
python quick_start.py test-imports

# Run quick test
python quick_start.py quick-test

Training the Model

Train the LSTM model on your conversation dataset:

# Prepare data in data/conversations.txt
# Format: "input|output" (one conversation per line)

# Basic training
python src/train.py

# Or use Jupyter notebook
jupyter notebook notebooks/02_model_training.ipynb

# Training parameters can be configured in:
# - src/model.py (model architecture)
# - src/train.py (training loop)
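
As an illustration of the "input|output" format described above, the following sketch parses such lines into (input, output) pairs. The function name `parse_conversations` and the choice to skip malformed lines are assumptions; the project's own loader in `src/data_processing.py` may behave differently.

```python
def parse_conversations(lines):
    """Parse 'input|output' lines into (input, output) pairs.

    Lines that are blank or have no '|' separator are skipped; that
    policy is an assumption of this sketch, not documented behavior.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or "|" not in line:
            continue
        # Split on the first '|' only, so the output may contain '|'.
        source, target = line.split("|", 1)
        source, target = source.strip(), target.strip()
        if source and target:
            pairs.append((source, target))
    return pairs


sample = [
    "hello|hi there",
    "",
    "how are you | I am fine, thanks",
    "line without separator",
]
pairs = parse_conversations(sample)
print(pairs)
```

To read a real file, pass an open file handle, for example `parse_conversations(open('data/conversations.txt', encoding='utf-8'))`.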

Interactive Chatbot

Run the chatbot in interactive mode for conversations:

# Basic chatbot
python demo.py

# Enhanced chatbot with beam search and history
python demo_enhanced.py

# Or use Jupyter notebook
jupyter notebook notebooks/03_chatbot_inference.ipynb

# Enhanced chatbot with Python
from src.chatbot_enhanced import load_enhanced_chatbot

chatbot = load_enhanced_chatbot(
    use_beam_search=True,
    beam_width=5,
    use_history=True,
    max_history=10
)

response = chatbot.chat("Hello!")

REST API Server

Start the Flask API server for web integration:

# Start API server (default port 5000)
python api_server.py

# API will be available at http://localhost:5000

# Example API calls:
# POST /chat - {"message": "Hello"}
# GET /history - Get conversation history
# GET /history?num=5 - Get last 5 exchanges
# POST /history - {"filepath": "chat_history.json"} - Save history
# POST /clear - Clear conversation history
# POST /export - {"filepath": "chat_log.txt"} - Export chat log
# GET /health - Check API health
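
The calls listed above can also be issued from Python using only the standard library. This is a sketch under assumptions: the helper names `build_chat_request`, `build_history_request`, and `send` are hypothetical, and it assumes the server replies with JSON. The builders only construct the requests; nothing is sent until `send` is called.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default address from the section above


def build_chat_request(message, base_url=BASE_URL):
    """Build a POST /chat request carrying {"message": ...} as JSON."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_history_request(num=None, base_url=BASE_URL):
    """Build a GET /history request, optionally limited to the last `num` exchanges."""
    url = f"{base_url}/history"
    if num is not None:
        url += f"?num={int(num)}"
    return urllib.request.Request(url, method="GET")


def send(request, timeout=10):
    """Send a request and decode the reply, assuming the server returns JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


chat_request = build_chat_request("Hello")
history_request = build_history_request(num=5)
print(chat_request.get_method(), chat_request.full_url)
print(history_request.get_method(), history_request.full_url)
```

With the server running, `send(build_chat_request("Hello"))` would return the decoded reply.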

Model Evaluation

Evaluate the trained model performance:

# Evaluate trained model
python src/evaluate_model.py

# Or use evaluation module
from src.evaluation import evaluate_model

results = evaluate_model(model, test_data, processor)
print(f"BLEU Score: {results['bleu_score']}")
print(f"Word Error Rate: {results['wer']}")
print(f"Perplexity: {results['perplexity']}")

Evaluation Metrics

Evaluate chatbot responses using comprehensive metrics:

from src.evaluation import (
    calculate_bleu_score,
    calculate_wer,
    calculate_perplexity
)

# BLEU Score
reference = "hello how are you".split()
candidate = "hello how are you doing".split()
bleu = calculate_bleu_score(reference, candidate)

# Word Error Rate (WER)
wer = calculate_wer(reference, candidate)

# Perplexity
perplexity = calculate_perplexity(model, test_data)

# Print results
print(f"BLEU Score: {bleu:.4f}")
print(f"Word Error Rate: {wer:.4f}")
print(f"Perplexity: {perplexity:.4f}")

Metric Descriptions:

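BLEU Score: Measures n-gram precision of the candidate text against the reference (higher is better)
Word Error Rate (WER): Word-level edit distance between reference and candidate, divided by the reference length (lower is better)
Perplexity: Exponential of the average negative log-likelihood the model assigns to each next word (lower is better)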
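
For reference, WER and perplexity can be computed from their definitions in a few lines. This is a minimal sketch, not the code in `src/evaluation.py`; the function names `word_error_rate` and `perplexity` are hypothetical, and `perplexity` here takes the per-token probabilities directly instead of a model and dataset.

```python
import math


def word_error_rate(reference, candidate):
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j candidate words.
    prev = list(range(len(candidate) + 1))
    for i, ref_word in enumerate(reference, 1):
        curr = [i]
        for j, cand_word in enumerate(candidate, 1):
            cost = 0 if ref_word == cand_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


def perplexity(token_probs):
    """exp of the average negative log-probability of the true tokens."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


reference = "hello how are you".split()
candidate = "hello how are you doing".split()
print(word_error_rate(reference, candidate))  # one insertion over four reference words
print(perplexity([0.25, 0.25, 0.25, 0.25]))   # uniform choice among four tokens
```

A model that assigns probability 0.25 to every correct token has a perplexity of about 4, which matches the intuition of "choosing among four equally likely words".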

Attention Visualization

Visualize attention weights to understand model behavior:

from src.visualization import plot_attention_weights

# After generating a response, visualize attention
plot_attention_weights(
    attention_weights,
    input_tokens,
    output_tokens,
    save_path='attention.png'
)

# This creates a heatmap showing which input words
# the model focuses on when generating each output word

Export Chat Logs

Export conversation history to various formats:

# Using enhanced chatbot
from src.chatbot_enhanced import load_enhanced_chatbot

chatbot = load_enhanced_chatbot(use_history=True)

# Save history
chatbot.save_history('chat_history.json')

# Export chat log
chatbot.export_chat_log('chat_log.txt')

# Or use API endpoint
# POST /export - {"filepath": "chat_log.txt"}
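
The idea behind a bounded, saveable history can be sketched with a `collections.deque`. The class below is a hypothetical stand-in for illustration only; it is not the project's `src/conversation_history.py`, and the JSON layout it uses is an assumption.

```python
import json
from collections import deque


class ConversationHistory:
    """Keep the most recent `max_history` exchanges (illustrative sketch)."""

    def __init__(self, max_history=10):
        # A deque with maxlen silently drops the oldest turn when full.
        self.turns = deque(maxlen=max_history)

    def add(self, user_message, bot_response):
        self.turns.append({"user": user_message, "bot": bot_response})

    def context(self, num=None):
        """Return all stored turns, or only the last `num` of them."""
        turns = list(self.turns)
        if num is None:
            return turns
        return turns[max(len(turns) - num, 0):]

    def to_json(self):
        return json.dumps(list(self.turns))

    @classmethod
    def from_json(cls, text, max_history=10):
        history = cls(max_history)
        history.turns.extend(json.loads(text))
        return history


history = ConversationHistory(max_history=2)
history.add("a", "reply a")
history.add("b", "reply b")
history.add("c", "reply c")  # evicts the oldest exchange ("a")
restored = ConversationHistory.from_json(history.to_json(), max_history=2)
print(restored.context())
```

Writing `history.to_json()` to a file and reading it back with `from_json` gives a simple save/load round trip.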

Jupyter Notebook

Open the interactive Jupyter notebooks for demonstrations:

# Data preprocessing notebook
jupyter notebook notebooks/01_data_preprocessing.ipynb

# Model training notebook
jupyter notebook notebooks/02_model_training.ipynb

# Chatbot inference notebook
jupyter notebook notebooks/03_chatbot_inference.ipynb

# Or use JupyterLab
jupyter lab notebooks/

Project Structure

lstm-chatbot/
├── README.md                          # Main documentation
├── requirements.txt                   # Python dependencies
├── setup.py                           # Package setup
├── LICENSE                            # License file
├── PROJECT_INFO.md                    # Project overview
├── FEATURES.md                        # Features documentation
│
├── Core Modules (src/)
│   ├── data_processing.py             # Data preprocessing
│   ├── model.py                       # LSTM model architecture
│   ├── attention.py                   # Attention mechanism
│   ├── train.py                       # Training script
│   ├── chatbot.py                     # Basic chatbot
│   ├── chatbot_enhanced.py            # Enhanced chatbot
│   ├── beam_search.py                 # Beam search decoding
│   ├── conversation_history.py        # History management
│   ├── evaluation.py                  # Evaluation metrics
│   ├── evaluate_model.py              # Model evaluation
│   ├── visualization.py               # Visualization utilities
│   └── data_augmentation.py           # Data augmentation
│
├── API & Services
│   ├── api_server.py                  # Flask web API
│   └── index.html                     # Web interface
│
├── Data
│   └── conversations.txt              # Training data (input|output)
│
├── Models
│   └── (trained model checkpoints)
│
├── Notebooks
│   ├── 01_data_preprocessing.ipynb    # Data preprocessing
│   ├── 02_model_training.ipynb        # Model training
│   └── 03_chatbot_inference.ipynb     # Chatbot inference
│
└── Demo Scripts
    ├── demo.py                        # Basic demo
    └── demo_enhanced.py               # Enhanced demo

Configuration Options

Model Configuration

Customize model and training parameters in src/model.py and src/train.py:

# Model Architecture (src/model.py)
EMBEDDING_DIM = 256               # Embedding dimension
LSTM_UNITS = 512                  # LSTM hidden units
ATTENTION_UNITS = 256             # Attention mechanism units
VOCAB_SIZE = 10000                # Vocabulary size
MAX_SEQ_LENGTH = 50               # Maximum sequence length

# Training Parameters (src/train.py)
BATCH_SIZE = 64                   # Training batch size
LEARNING_RATE = 0.001             # Learning rate
NUM_EPOCHS = 50                   # Number of training epochs
VALIDATION_SPLIT = 0.2            # Validation split ratio

# Chatbot Configuration
BEAM_WIDTH = 5                    # Beam search width
TEMPERATURE = 1.0                 # Temperature sampling
MAX_HISTORY = 10                  # Maximum conversation history length

Advanced Features Usage

Chatbot Usage

Use the chatbot with customizable parameters:

# Basic chatbot
from src.chatbot import Chatbot

chatbot = Chatbot(model_path='models/lstm_chatbot')
response = chatbot.chat("Hello, how are you?")
print(response)

# Enhanced chatbot with beam search and history
from src.chatbot_enhanced import load_enhanced_chatbot

chatbot = load_enhanced_chatbot(
    use_beam_search=True,
    beam_width=5,
    use_history=True,
    max_history=10,
    temperature=1.0
)

response = chatbot.chat("Hello!")
print(response)

Beam Search Decoding

Use beam search for better response quality:

from src.chatbot_enhanced import load_enhanced_chatbot
from src.beam_search import generate_with_beam_search

# Load chatbot with beam search
chatbot = load_enhanced_chatbot(
    use_beam_search=True,
    beam_width=5  # Number of candidates to explore
)

# Generate response with beam search
response = chatbot.chat("Hello, how are you?")

# Higher beam width = better quality but slower
# Recommended: 3-10 for balance between quality and speed
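
To show what the beam width controls, here is a small self-contained beam search over a toy next-token table. It is a sketch under assumptions: `beam_search`, `toy_model`, and the probability table are hypothetical and unrelated to `src/beam_search.py`, and no length normalization is applied.

```python
import math


def beam_search(next_log_probs, start_token, end_token, beam_width=3, max_len=10):
    """Return (sequence, log_probability) of the best sequence found.

    next_log_probs(sequence) must return a dict {token: log_probability}.
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-probability)
    finished = []

    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, logp in next_log_probs(seq).items():
                candidates.append((seq + [token], score + logp))
        if not candidates:
            break
        # Keep only the beam_width best partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break

    finished.extend(beams)  # fall back to unfinished beams at max_len
    return max(finished, key=lambda c: c[1])


# Toy next-token probabilities, keyed by the previous token.
TABLE = {
    "<s>": {"hi": 0.6, "hello": 0.4},
    "hi": {"there": 0.55, "</s>": 0.45},
    "hello": {"world": 0.9, "</s>": 0.1},
    "there": {"</s>": 1.0},
    "world": {"</s>": 1.0},
}


def toy_model(seq):
    return {tok: math.log(p) for tok, p in TABLE.get(seq[-1], {}).items()}


greedy_seq, greedy_score = beam_search(toy_model, "<s>", "</s>", beam_width=1)
beam_seq, beam_score = beam_search(toy_model, "<s>", "</s>", beam_width=2)
print(greedy_seq, round(math.exp(greedy_score), 2))
print(beam_seq, round(math.exp(beam_score), 2))
```

With a width of 1 the search is greedy and commits to "hi" because it is the likeliest first token; with a width of 2 it keeps "hello" alive long enough to find the sequence with the higher overall probability.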

Model Evaluation

Evaluate model performance with comprehensive metrics:

from src.evaluate_model import evaluate_model
from src.evaluation import calculate_bleu_score, calculate_wer

# Evaluate model on test dataset
results = evaluate_model(model, test_data, processor)

# Returns dictionary with:
# - bleu_score: BLEU score
# - wer: Word Error Rate
# - perplexity: Language model perplexity

# Calculate individual metrics
reference = "hello how are you".split()
candidate = "hello how are you doing".split()

bleu = calculate_bleu_score(reference, candidate)
wer = calculate_wer(reference, candidate)

print(f"BLEU Score: {bleu:.4f}")
print(f"Word Error Rate: {wer:.4f}")

Data Augmentation

Enhance training data with augmentation techniques:

from src.data_augmentation import (
    random_insertion,
    random_deletion,
    swap_words,
    synonym_replacement
)

# Augment text data
text = "hello how are you"

# Random insertion
augmented = random_insertion(text, num_insertions=1)

# Random deletion
augmented = random_deletion(text, deletion_prob=0.1)

# Word swapping
augmented = swap_words(text, num_swaps=1)

# Synonym replacement
augmented = synonym_replacement(text, num_replacements=1)

# Use augmented data for training to improve model robustness
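
Two of these techniques are simple enough to sketch directly. The functions below are illustrative re-implementations with hypothetical names (`delete_words_randomly`, `swap_random_words`); they are not the code in `src/data_augmentation.py`, and the optional `rng` parameter is an assumption added so results can be made reproducible.

```python
import random


def delete_words_randomly(text, deletion_prob=0.1, rng=random):
    """Drop each word independently with probability deletion_prob."""
    words = text.split()
    if len(words) <= 1:
        return text
    kept = [w for w in words if rng.random() >= deletion_prob]
    if not kept:
        # Never return an empty sentence: keep one word chosen at random.
        kept = [rng.choice(words)]
    return " ".join(kept)


def swap_random_words(text, num_swaps=1, rng=random):
    """Swap two randomly chosen word positions, num_swaps times."""
    words = text.split()
    if len(words) < 2:
        return text
    for _ in range(num_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


rng = random.Random(0)  # seeded generator for reproducible augmentation
text = "hello how are you"
print(delete_words_randomly(text, deletion_prob=0.3, rng=rng))
print(swap_random_words(text, num_swaps=1, rng=rng))
```

Swapping preserves the set of words and only changes their order, while deletion can shorten the sentence but never empties it.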

Temperature Sampling

Control response randomness and creativity with temperature sampling:

from src.chatbot_enhanced import load_enhanced_chatbot

# Temperature values:
# < 1.0: More deterministic, focused responses
# = 1.0: Balanced (default)
# > 1.0: More creative, diverse responses

# Focused responses
chatbot = load_enhanced_chatbot(temperature=0.5)
response = chatbot.chat("Hello!")

# Balanced responses (default)
chatbot = load_enhanced_chatbot(temperature=1.0)
response = chatbot.chat("Hello!")

# Creative responses
chatbot = load_enhanced_chatbot(temperature=1.5)
response = chatbot.chat("Hello!")
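
The effect of the temperature value can be seen by applying it to a small set of logits. This is a sketch of the general technique (divide the logits by the temperature, then apply softmax and sample); the function names `apply_temperature` and `sample_token` are hypothetical and the project's sampling code may differ.

```python
import math
import random


def apply_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Sample a token index from the temperature-scaled distribution."""
    probs = apply_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 1.5):
    print(t, [round(p, 3) for p in apply_temperature(logits, t)])

token = sample_token(logits, temperature=0.5, rng=random.Random(0))
```

At 0.5 most of the probability mass sits on the top-scoring token, and at 1.5 the distribution is flatter, so lower-ranked tokens are sampled more often.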

API Usage Examples

Chat Endpoint (cURL)

Send a chat message and get the generated response:

curl -X POST http://localhost:5000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello, what is LSTM?",
    "session_id": "user123"
  }'

# Response:
# {
#   "response": "LSTM is a recurrent neural network...",
#   "session_id": "user123",
#   "timestamp": "2025-01-15T10:30:00"
# }

Text Generation (cURL)

Generate text from a prompt:

curl -X POST http://localhost:5000/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Once upon a time",
    "max_length": 100,
    "temperature": 0.7
  }'

# Response:
# {
#   "generated_text": "Once upon a time, in a land far away...",
#   "length": 100
# }

Batch Generation (cURL)

Generate text from multiple prompts:

curl -X POST http://localhost:5000/api/batch \
  -H "Content-Type: application/json" \
  -d '{
    "prompts": ["Prompt 1", "Prompt 2", "Prompt 3"],
    "max_length": 100
  }'

# Response:
# {
#   "results": [
#     {"prompt": "Prompt 1", "generated": "..."},
#     {"prompt": "Prompt 2", "generated": "..."}
#   ]
# }

Model Evaluation (cURL)

Evaluate model performance:

curl -X POST http://localhost:5000/api/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "test_file": "data/val.txt",
    "model": "lstm_chatbot"
  }'

# Response:
# {
#   "perplexity": 25.3,
#   "avg_generation_time": 0.05,
#   "tokens_per_second": 20.0
# }

Python Requests Example

Use the API with Python requests library:

import requests

BASE_URL = 'http://localhost:5000'

# Chat endpoint
response = requests.post(
    f'{BASE_URL}/chat',
    json={'message': 'Hello, how are you?'},
    timeout=30
)
response.raise_for_status()
data = response.json()
print(f"Bot: {data['response']}")

# Get the last 5 exchanges
history = requests.get(f'{BASE_URL}/history', params={'num': 5}, timeout=30)
print(history.json())

# Clear conversation history
requests.post(f'{BASE_URL}/clear', timeout=30)

LSTM Model Variants

Model	LSTM Units	Size	Use Case	Speed
Small LSTM	128 units	~10-20 MB	Fast inference, basic tasks	Fastest
Medium LSTM	256 units	~30-50 MB	Balanced quality/speed	Fast
Large LSTM	512 units	~100-200 MB	Higher quality generation	Moderate
XL LSTM	1024 units	~300-500 MB	Best quality, research	Slower

Dataset Information

Training Datasets

The project includes multiple domain-specific datasets for fine-tuning:

General training data (train.txt, val.txt)
Technical domain data
Creative writing samples
Conversation samples (Q&A pairs)
Science & technology topics
Business & economics content
Health & wellness topics
Education & learning content
Philosophy & ethics topics
History & culture content
Sample prompts (text and JSON formats)
Advanced training examples

Data Format

Training data is stored in plain text format (one example per line):

# train.txt format (one example per line)
Machine learning is a subset of artificial intelligence.
LSTM (Long Short-Term Memory) is a recurrent neural network architecture.
Natural language processing enables computers to understand text.
Deep learning uses neural networks with multiple layers.
Text generation creates coherent sequences of words.

# For batch generation, JSON format:
[
  {"prompt": "Once upon a time", "max_length": 100},
  {"prompt": "In a galaxy far away", "max_length": 100}
]

Adding Custom Training Data

Add your own text data for fine-tuning:

# Simply append to training file
with open('data/train.txt', 'a', encoding='utf-8') as f:
    f.write("Your custom training text here.\n")
    f.write("Each line is a separate training example.\n")
    f.write("The model will learn from these examples.\n")

# Or create a new domain-specific file
with open('data/domain_custom.txt', 'w', encoding='utf-8') as f:
    f.write("Domain-specific training data...\n")

# Use in training (run this from the shell, not inside Python):
# python train.py --train_file data/domain_custom.txt --output_dir models/custom

Troubleshooting & Best Practices

Common Issues

CUDA Out of Memory: Reduce batch_size in src/train.py, use smaller LSTM units (128 instead of 512), or use CPU mode
Model Not Found: Ensure model is trained first by running src/train.py or loading from models/ directory
Slow Generation: Use smaller LSTM units (128 or 256) or reduce beam_width for faster inference
API Connection Error: Check that api_server.py is running and listening on port 5000
Import Errors: Verify all dependencies installed: pip install -r requirements.txt
Context Too Long: Reduce MAX_HISTORY or MAX_SEQ_LENGTH (see Configuration Options)
Repetitive Output: Adjust temperature (higher) or repetition_penalty (higher) in config.py

Best Practices

Training Data: Use diverse, high-quality text data for better results
Context Management: Keep CONTEXT_LENGTH between 3-7 for optimal performance
Batch Size: Use smaller batches (4-8) for limited GPU memory, especially with larger models
Learning Rate: Start with the default LEARNING_RATE of 0.001 from src/train.py and adjust based on training loss
Temperature: Use 0.7-0.9 for creative text, 0.3-0.5 for more focused output
Model Selection: Start with 128-256 LSTM units for speed, use 512 or larger for quality
Evaluation: Regularly evaluate perplexity on validation set during training
Session Management: Use unique session_ids for different users
API Rate Limiting: Implement rate limiting for production deployments
Logging: Monitor training logs for debugging and optimization

Performance Optimization

GPU Usage: Set CUDA_VISIBLE_DEVICES for multi-GPU systems
Model Selection: Use 128-256 LSTM units for fastest inference, larger units for better quality
Batch Processing: Use batch_generate.py for processing multiple prompts efficiently
Caching: Cache model and tokenizer to avoid reloading
Context Pruning: Remove old context beyond CONTEXT_LENGTH
Generation Parameters: Adjust temperature, top_p, and top_k for desired output style

Contact Information

Get in Touch

Developer: Molla Samser
Designer & Tester: Rima Khatun

rskworld.in

help@rskworld.in support@rskworld.in

+91 93305 39277

License

This project is for educational purposes only. See LICENSE file for more details.
