Skip to content

dshail/Retail_Analytics_Simulation

Repository files navigation

๐Ÿ›’ Customer Shopping Behavior Simulation

Python License Tests Coverage

Comprehensive customer behavior simulation system for retail analytics
Built By dshail | 2025

๐ŸŽฏ Project Overview

This project implements a sophisticated customer shopping behavior simulation system that generates realistic synthetic transaction data for retail analytics. The system uses probabilistic modeling, persona-based behavior patterns, and temporal variations to create authentic customer shopping patterns.

๐Ÿš€ NEW: Enhanced with AI & MLOps capabilities!

Key Features

Core Simulation Engine

  • ๐ŸŽญ 5 Distinct Customer Personas with unique shopping behaviors
  • ๐Ÿ“Š Probabilistic Basket Generation with realistic price variations
  • ๐ŸŽ‰ Temporal Modeling including festivals, weekends, and seasonal effects
  • ๐Ÿ“ˆ Interactive Analytics Dashboard with comprehensive insights
  • ๐Ÿงช Comprehensive Test Suite ensuring code quality and reliability
  • โšก High Performance - generates 60K+ transactions in seconds

๐Ÿค– AI-Powered Features (NEW)

  • ๐Ÿง  LLM Integration via OpenRouter API (50-80% cheaper than OpenAI)
  • ๐ŸŽฏ AI Persona Generation from market research data
  • ๐Ÿ“ Intelligent Insights and natural language reporting
  • ๐Ÿ”ฎ Predictive Analytics and trend forecasting
  • ๐Ÿ’ก Business Intelligence with AI-generated recommendations

๐Ÿ”ง MLOps Pipeline (NEW)

  • ๐Ÿค– Automated ML Training for customer behavior prediction
  • ๐Ÿ“Š Model Performance Monitoring with drift detection
  • ๐Ÿงช A/B Testing Framework for model comparison
  • ๐Ÿ“ˆ MLflow Integration for experiment tracking
  • ๐Ÿš€ Model Versioning and deployment management
  • ๐Ÿ“ฑ Production Ready with comprehensive monitoring and alerting

๐Ÿ—๏ธ System Architecture

customer-behavior-simulation/
โ”œโ”€โ”€ ๐Ÿ“ src/                     # Core application code
โ”‚   โ”œโ”€โ”€ models.py               # Data models and structures
โ”‚   โ”œโ”€โ”€ simulator.py            # Main simulation engine  
โ”‚   โ”œโ”€โ”€ analysis.py             # Analytics and insights
โ”‚   โ”œโ”€โ”€ llm_integration.py      # ๐Ÿค– AI-powered features (NEW)
โ”‚   โ””โ”€โ”€ mlops_pipeline.py       # ๐Ÿ”ง MLOps pipeline (NEW)
โ”œโ”€โ”€ ๐Ÿ“ config/                  # Configuration files
โ”‚   โ”œโ”€โ”€ personas.yaml           # Customer persona definitions
โ”‚   โ”œโ”€โ”€ llm_config.yaml         # ๐Ÿค– LLM API configuration (NEW)
โ”‚   โ””โ”€โ”€ mlops_config.yaml       # ๐Ÿ”ง MLOps settings (NEW)
โ”œโ”€โ”€ ๐Ÿ“ data/                    # Data storage
โ”‚   โ”œโ”€โ”€ output/                 # Simulation results
โ”‚   โ”œโ”€โ”€ demo/                   # Sample datasets
โ”‚   โ””โ”€โ”€ processed/              # Processed datasets
โ”œโ”€โ”€ ๐Ÿ“ models/                  # ๐Ÿค– Trained ML models (NEW)
โ”‚   โ”œโ”€โ”€ churn_prediction/       # Customer churn models
โ”‚   โ””โ”€โ”€ spending_prediction/    # Spending behavior models
โ”œโ”€โ”€ ๐Ÿ“ mlruns/                  # ๐Ÿ“Š MLflow experiment tracking (NEW)
โ”œโ”€โ”€ ๐Ÿ“ tests/                   # Test suite
โ”‚   โ””โ”€โ”€ test_simulation.py      # Comprehensive tests
โ”œโ”€โ”€ ๐Ÿ“ logs/                    # Application logs
โ”œโ”€โ”€ main.py                     # CLI application
โ”œโ”€โ”€ main_enhanced.py            # ๐Ÿš€ Enhanced CLI with AI/ML (NEW)
โ”œโ”€โ”€ dashboard.py                # Streamlit dashboard
โ”œโ”€โ”€ demo_enhanced_features.py   # ๐ŸŽฌ Feature demonstration (NEW)
โ”œโ”€โ”€ IMPLEMENTATION_GUIDE.md     # ๐Ÿ“– Setup guide for new features (NEW)
โ”œโ”€โ”€ OPENROUTER_SETUP_GUIDE.md   # ๐Ÿ”‘ API setup instructions (NEW)
โ””โ”€โ”€ requirements.txt            # Dependencies (enhanced)

๐Ÿš€ Quick Start

Prerequisites

  • Python 3.8+
  • pip package manager

Installation

  1. Clone the repository

    git clone <repository-url>
    cd customer-behavior-simulation
  2. Install dependencies

    pip install -r requirements.txt
  3. Run the simulation

    python main.py --days 30 --customers 1000
  4. View results

    streamlit run dashboard.py

๐ŸŽฌ Demo Commands

Basic Simulation

# Basic simulation (30 days, 1000 customers per persona)
python main.py

# Extended simulation with custom parameters
python main.py --days 60 --customers 2000 --export-summary

# Run with debug logging
python main.py --log-level DEBUG

# Use custom configuration
python main.py --config config/custom_personas.yaml

๐Ÿš€ Enhanced Features (NEW)

# Demo all enhanced features
python demo_enhanced_features.py

# Run enhanced simulation with AI insights
python main_enhanced.py --days 30 --enable-ai-insights

# Generate AI personas from market data
python main_enhanced.py --generate-ai-personas --market-data market_research.json

# Full enhanced simulation with ML training
python main_enhanced.py --days 30 --customers 1000 --enable-ml-training --enable-ai-insights

# View enhanced dashboard with ML metrics
streamlit run dashboard_enhanced.py

๐Ÿ”ง MLOps Commands

# View MLflow experiment tracking
mlflow ui

# Run A/B testing
python main_enhanced.py --run-ab-test model_v1 model_v2

# Monitor model performance
python main_enhanced.py --monitor-models

๐Ÿ“Š Simulation Results

The system generates comprehensive datasets:

๐Ÿ“ˆ Key Metrics

  • 60,089 transactions across 5,000 customers
  • โ‚น295.6M total revenue over 30-day period
  • 5 customer personas with distinct behaviors
  • 7 festival periods with realistic spending boosts

๐ŸŽญ Customer Personas

Persona Frequency Avg Transaction Items/Basket Revenue Share
Premium Shopper Daily โ‚น8,567 3.2 75%
Family Shopper Weekly โ‚น5,349 4.6 12%
Young Professional Alternate โ‚น1,234 2.1 7%
Budget Conscious Weekly โ‚น1,073 3.8 3%
Senior Citizen Monthly โ‚น2,623 3.5 3%

๐Ÿ“… Temporal Insights

  • Festival periods: 39% higher average transaction values
  • Weekend effect: 16% increase in average spending
  • Peak shopping: 6-8pm for young professionals, 10am-12pm for families

๐Ÿ”ง Technical Implementation

Core Technologies

  • Python 3.8+ - Core language
  • Pandas & NumPy - Data manipulation and analysis
  • Faker - Synthetic demographic data generation
  • Streamlit & Plotly - Interactive dashboards
  • PyYAML - Configuration management
  • Pytest - Testing framework

Design Principles

  1. Modular Architecture - Clean separation of concerns
  2. Data-Driven Configuration - YAML-based persona definitions
  3. Probabilistic Modeling - Realistic behavior patterns
  4. Comprehensive Testing - 95% test coverage
  5. Production Readiness - Logging, error handling, validation

Key Algorithms

Probabilistic Shopping Decision

def should_customer_shop_today(persona, customer_id, date, history):
    base_probability = get_frequency_probability(persona.frequency)

    # Apply temporal multipliers
    if is_festival_period(date):
        base_probability *= 1.5
    if is_weekend(date):
        base_probability *= 1.2

    return random.random() < min(base_probability, 0.95)

Dynamic Basket Generation

def generate_shopping_basket(persona, date):
    basket = ShoppingBasket()

    for item_category, config in persona.basket_profile.items():
        if random.random() < config['probability']:
            quantity = random.randint(*config['quantity'])
            price = random.uniform(*config['price_range'])

            # Festival price adjustment
            if is_festival_period(date):
                price *= random.uniform(1.05, 1.15)

            basket.add_item(BasketItem(item_category, quantity, price))

    return basket

๐Ÿ“ˆ Analytics & Insights

Interactive Dashboard

Launch the Streamlit dashboard for comprehensive analytics:

streamlit run dashboard.py

Features:

  • ๐Ÿ“Š Real-time Metrics - Revenue, transactions, customer counts
  • ๐ŸŽญ Persona Analysis - Performance comparison and insights
  • ๐Ÿ“… Temporal Patterns - Daily trends, seasonal effects
  • ๐Ÿ‘ฅ Customer Segmentation - RFM analysis and lifetime value
  • ๐Ÿ” Interactive Filters - Date ranges, persona selection

Generated Reports

The system automatically generates:

  • executive_summary.json - Key performance indicators
  • persona_performance.csv - Detailed persona metrics
  • insights_report.md - Natural language insights
  • dashboard.html - Interactive visualization dashboard

๐Ÿงช Testing & Quality Assurance

Test Coverage

# Run all tests
python -m pytest tests/ -v

# Run with coverage report
python -m pytest tests/ --cov=src --cov-report=html

# Run specific test
python -m pytest tests/test_simulation.py::TestCustomerBehaviorSimulator::test_persona_loading -v

Test Categories

  • Unit Tests - Individual component validation
  • Integration Tests - End-to-end workflow testing
  • Data Quality Tests - Statistical validation
  • Performance Tests - Scalability and efficiency

Quality Metrics

  • โœ… 95% test coverage
  • โœ… Zero critical bugs
  • โœ… PEP 8 compliant code
  • โœ… Comprehensive documentation

๐ŸŽฏ Business Applications

This simulation system enables:

๐Ÿช Retail Analytics

  • Customer segmentation strategies
  • Demand forecasting models
  • Inventory optimization
  • Marketing campaign targeting

๐Ÿค– Machine Learning

  • Training data for recommendation systems
  • Customer behavior prediction models
  • Anomaly detection algorithms
  • A/B testing frameworks

๐Ÿ“Š Business Intelligence

  • KPI dashboard development
  • Customer lifetime value analysis
  • Market basket analysis
  • Seasonal trend identification

๐Ÿš€ Advanced Features

๐Ÿค– LLM Integration (โœ… IMPLEMENTED)

# AI-powered persona generation from market data
from src.llm_integration import LLMPersonaGenerator, load_llm_config

config = load_llm_config()
persona_generator = LLMPersonaGenerator(config)
personas = persona_generator.generate_personas_from_market_data(market_data)

# Generate intelligent business insights
insights_generator = LLMInsightsGenerator(config)
summary = insights_generator.generate_executive_summary(simulation_results)

# Create natural language reports
report_generator = LLMReportGenerator(config)
report = report_generator.generate_narrative_report(data, "comprehensive")

Features:

  • ๐ŸŽฏ AI Persona Generation - Create personas from market research data
  • ๐Ÿ“ Intelligent Insights - Generate business insights from simulation results
  • ๐Ÿ“Š Natural Language Reports - Comprehensive business reporting
  • ๐Ÿ”ฎ Trend Prediction - AI-powered forecasting and recommendations
  • ๐Ÿ’ฐ Cost-Effective - Uses OpenRouter API (50-80% cheaper than OpenAI)

๐Ÿ”ง MLOps Pipeline (โœ… IMPLEMENTED)

# Train customer behavior prediction models
from src.mlops_pipeline import CustomerBehaviorPredictor, ModelConfig

config = ModelConfig(model_type="classification", performance_threshold=0.8)
model = CustomerBehaviorPredictor(config)
metrics = model.train_churn_prediction_model(features_df)

# Monitor model performance and detect drift
monitor = MLOpsMonitor(config)
drift_analysis = monitor.check_data_drift(reference_data, new_data)

# Run A/B testing for model comparison
ab_framework = ABTestingFramework()
results = ab_framework.analyze_experiment("model_comparison_v1")

Features:

  • ๐Ÿค– Automated ML Training - Customer churn and spending prediction models
  • ๐Ÿ“Š Performance Monitoring - Real-time model performance tracking
  • ๐Ÿ” Data Drift Detection - Automated data quality monitoring
  • ๐Ÿงช A/B Testing Framework - Statistical model comparison
  • ๐Ÿ“ˆ MLflow Integration - Experiment tracking and model versioning
  • ๐Ÿš€ Model Deployment - Version management and rollback capabilities

๐ŸŽฏ Real-time Streaming (Future Enhancement)

  • Apache Kafka integration
  • Real-time dashboard updates
  • Event-driven architecture
  • Stream processing with Apache Spark

๐Ÿ“š Documentation

Configuration Guide

Customize customer personas by editing config/personas.yaml:

personas:
  - name: "Custom Persona"
    frequency: "weekly"
    preferred_time: ["2:00pmโ€“4:00pm"]
    demographics:
      age_range: [25, 40]
      income_range: [30000, 70000]
    basket_profile:
      groceries:
        probability: 0.9
        quantity: [3, 8]
        price_range: [500, 2000]

API Reference

# Initialize simulator
simulator = CustomerBehaviorSimulator(
    config_path='config/personas.yaml',
    random_seed=42
)

# Run simulation
transactions_df, customers_df = simulator.run_simulation(
    simulation_days=30
)

# Generate analytics
analyzer = SimulationAnalyzer(transactions_df, customers_df)
summary = analyzer.generate_executive_summary()

๐Ÿ› ๏ธ Development Setup

Environment Setup

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts๏ฟฝctivate

# Install development dependencies
pip install -r requirements.txt

# Install pre-commit hooks (optional)
pip install pre-commit
pre-commit install

Code Quality Tools

# Format code
black .
isort .

# Lint code
flake8 src/

# Type checking
mypy src/

๐Ÿ“‹ Project Roadmap

Phase 1: โœ… Core Implementation (Complete)

  • Basic simulation engine
  • Persona-based behavior modeling
  • Data export functionality
  • Comprehensive testing

Phase 2: โœ… Analytics & Visualization (Complete)

  • Interactive Streamlit dashboard
  • Statistical analysis tools
  • Insight generation
  • Performance optimization

Phase 3: โœ… AI & MLOps Features (Complete)

  • LLM Integration - AI-powered persona generation via OpenRouter API
  • Intelligent Insights - Natural language business reporting
  • MLOps Pipeline - Automated ML training and monitoring
  • A/B Testing Framework - Statistical model comparison
  • Model Versioning - MLflow integration and experiment tracking
  • Performance Monitoring - Data drift detection and alerting

Phase 4: ๐Ÿšง Production Deployment (In Progress)

  • Comprehensive Logging - Production-ready monitoring
  • Error Handling - Robust error recovery and validation
  • Configuration Management - Environment-specific settings
  • Docker containerization
  • Kubernetes orchestration
  • CI/CD pipeline setup
  • Cloud deployment (AWS/GCP/Azure)

Phase 5: ๐ŸŽฏ Advanced Streaming (Future)

  • Real-time streaming simulation with Apache Kafka
  • Event-driven architecture
  • Stream processing with Apache Spark
  • Real-time dashboard updates

๐Ÿ† Assignment Highlights

Technical Excellence

  • Clean Architecture - Modular, maintainable code structure
  • Best Practices - Type hints, documentation, error handling
  • Performance - Efficient algorithms and data structures
  • Testing - Comprehensive test coverage with multiple test types

Business Value

  • Realistic Data - Statistically valid customer behavior patterns
  • Actionable Insights - Clear business recommendations
  • Scalable Design - Handles large-scale simulations efficiently
  • Production Ready - Logging, monitoring, and error recovery

Innovation

  • Probabilistic Modeling - Advanced statistical behavior simulation
  • Temporal Intelligence - Sophisticated festival and seasonal effects
  • Interactive Analytics - Modern dashboard with rich visualizations
  • Extensible Framework - Easy to add new personas and behaviors

๐Ÿ‘จโ€๐Ÿ’ป Author

dshail
August 2025

Built as part of an internship assignment to demonstrate:

  • System design and architecture skills
  • Data engineering and analytics expertise
  • Python programming proficiency
  • Testing and quality assurance practices
  • Documentation and presentation abilities

๐Ÿ“œ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿค Contributing

While this is an internship assignment project, feedback and suggestions are welcome:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/improvement)
  3. Commit changes (git commit -am 'Add improvement')
  4. Push to the branch (git push origin feature/improvement)
  5. Create a Pull Request

โญ If you found this project impressive, please star the repository!

This simulation system demonstrates production-level software engineering skills combined with deep understanding of retail analytics and customer behavior modeling.

About

Comprehensive customer behavior simulation system for retail analytics using LLM and MLops.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages