Comprehensive customer behavior simulation system for retail analytics
Built By dshail | 2025
This project implements a sophisticated customer shopping behavior simulation system that generates realistic synthetic transaction data for retail analytics. The system uses probabilistic modeling, persona-based behavior patterns, and temporal variations to create authentic customer shopping patterns.
๐ NEW: Enhanced with AI & MLOps capabilities!
- ๐ญ 5 Distinct Customer Personas with unique shopping behaviors
- ๐ Probabilistic Basket Generation with realistic price variations
- ๐ Temporal Modeling including festivals, weekends, and seasonal effects
- ๐ Interactive Analytics Dashboard with comprehensive insights
- ๐งช Comprehensive Test Suite ensuring code quality and reliability
- โก High Performance - generates 60K+ transactions in seconds
- ๐ง LLM Integration via OpenRouter API (50-80% cheaper than OpenAI)
- ๐ฏ AI Persona Generation from market research data
- ๐ Intelligent Insights and natural language reporting
- ๐ฎ Predictive Analytics and trend forecasting
- ๐ก Business Intelligence with AI-generated recommendations
- ๐ค Automated ML Training for customer behavior prediction
- ๐ Model Performance Monitoring with drift detection
- ๐งช A/B Testing Framework for model comparison
- ๐ MLflow Integration for experiment tracking
- ๐ Model Versioning and deployment management
- ๐ฑ Production Ready with comprehensive monitoring and alerting
customer-behavior-simulation/
โโโ ๐ src/ # Core application code
โ โโโ models.py # Data models and structures
โ โโโ simulator.py # Main simulation engine
โ โโโ analysis.py # Analytics and insights
โ โโโ llm_integration.py # ๐ค AI-powered features (NEW)
โ โโโ mlops_pipeline.py # ๐ง MLOps pipeline (NEW)
โโโ ๐ config/ # Configuration files
โ โโโ personas.yaml # Customer persona definitions
โ โโโ llm_config.yaml # ๐ค LLM API configuration (NEW)
โ โโโ mlops_config.yaml # ๐ง MLOps settings (NEW)
โโโ ๐ data/ # Data storage
โ โโโ output/ # Simulation results
โ โโโ demo/ # Sample datasets
โ โโโ processed/ # Processed datasets
โโโ ๐ models/ # ๐ค Trained ML models (NEW)
โ โโโ churn_prediction/ # Customer churn models
โ โโโ spending_prediction/ # Spending behavior models
โโโ ๐ mlruns/ # ๐ MLflow experiment tracking (NEW)
โโโ ๐ tests/ # Test suite
โ โโโ test_simulation.py # Comprehensive tests
โโโ ๐ logs/ # Application logs
โโโ main.py # CLI application
โโโ main_enhanced.py # ๐ Enhanced CLI with AI/ML (NEW)
โโโ dashboard.py # Streamlit dashboard
โโโ demo_enhanced_features.py # ๐ฌ Feature demonstration (NEW)
โโโ IMPLEMENTATION_GUIDE.md # ๐ Setup guide for new features (NEW)
โโโ OPENROUTER_SETUP_GUIDE.md # ๐ API setup instructions (NEW)
โโโ requirements.txt # Dependencies (enhanced)
- Python 3.8+
- pip package manager
-
Clone the repository
git clone <repository-url> cd customer-behavior-simulation
-
Install dependencies
pip install -r requirements.txt
-
Run the simulation
python main.py --days 30 --customers 1000
-
View results
streamlit run dashboard.py
# Basic simulation (30 days, 1000 customers per persona)
python main.py
# Extended simulation with custom parameters
python main.py --days 60 --customers 2000 --export-summary
# Run with debug logging
python main.py --log-level DEBUG
# Use custom configuration
python main.py --config config/custom_personas.yaml# Demo all enhanced features
python demo_enhanced_features.py
# Run enhanced simulation with AI insights
python main_enhanced.py --days 30 --enable-ai-insights
# Generate AI personas from market data
python main_enhanced.py --generate-ai-personas --market-data market_research.json
# Full enhanced simulation with ML training
python main_enhanced.py --days 30 --customers 1000 --enable-ml-training --enable-ai-insights
# View enhanced dashboard with ML metrics
streamlit run dashboard_enhanced.py# View MLflow experiment tracking
mlflow ui
# Run A/B testing
python main_enhanced.py --run-ab-test model_v1 model_v2
# Monitor model performance
python main_enhanced.py --monitor-modelsThe system generates comprehensive datasets:
- 60,089 transactions across 5,000 customers
- โน295.6M total revenue over 30-day period
- 5 customer personas with distinct behaviors
- 7 festival periods with realistic spending boosts
| Persona | Frequency | Avg Transaction | Items/Basket | Revenue Share |
|---|---|---|---|---|
| Premium Shopper | Daily | โน8,567 | 3.2 | 75% |
| Family Shopper | Weekly | โน5,349 | 4.6 | 12% |
| Young Professional | Alternate | โน1,234 | 2.1 | 7% |
| Budget Conscious | Weekly | โน1,073 | 3.8 | 3% |
| Senior Citizen | Monthly | โน2,623 | 3.5 | 3% |
- Festival periods: 39% higher average transaction values
- Weekend effect: 16% increase in average spending
- Peak shopping: 6-8pm for young professionals, 10am-12pm for families
- Python 3.8+ - Core language
- Pandas & NumPy - Data manipulation and analysis
- Faker - Synthetic demographic data generation
- Streamlit & Plotly - Interactive dashboards
- PyYAML - Configuration management
- Pytest - Testing framework
- Modular Architecture - Clean separation of concerns
- Data-Driven Configuration - YAML-based persona definitions
- Probabilistic Modeling - Realistic behavior patterns
- Comprehensive Testing - 95% test coverage
- Production Readiness - Logging, error handling, validation
def should_customer_shop_today(persona, customer_id, date, history):
base_probability = get_frequency_probability(persona.frequency)
# Apply temporal multipliers
if is_festival_period(date):
base_probability *= 1.5
if is_weekend(date):
base_probability *= 1.2
return random.random() < min(base_probability, 0.95)def generate_shopping_basket(persona, date):
basket = ShoppingBasket()
for item_category, config in persona.basket_profile.items():
if random.random() < config['probability']:
quantity = random.randint(*config['quantity'])
price = random.uniform(*config['price_range'])
# Festival price adjustment
if is_festival_period(date):
price *= random.uniform(1.05, 1.15)
basket.add_item(BasketItem(item_category, quantity, price))
return basketLaunch the Streamlit dashboard for comprehensive analytics:
streamlit run dashboard.pyFeatures:
- ๐ Real-time Metrics - Revenue, transactions, customer counts
- ๐ญ Persona Analysis - Performance comparison and insights
- ๐ Temporal Patterns - Daily trends, seasonal effects
- ๐ฅ Customer Segmentation - RFM analysis and lifetime value
- ๐ Interactive Filters - Date ranges, persona selection
The system automatically generates:
executive_summary.json- Key performance indicatorspersona_performance.csv- Detailed persona metricsinsights_report.md- Natural language insightsdashboard.html- Interactive visualization dashboard
# Run all tests
python -m pytest tests/ -v
# Run with coverage report
python -m pytest tests/ --cov=src --cov-report=html
# Run specific test
python -m pytest tests/test_simulation.py::TestCustomerBehaviorSimulator::test_persona_loading -v- Unit Tests - Individual component validation
- Integration Tests - End-to-end workflow testing
- Data Quality Tests - Statistical validation
- Performance Tests - Scalability and efficiency
- โ 95% test coverage
- โ Zero critical bugs
- โ PEP 8 compliant code
- โ Comprehensive documentation
This simulation system enables:
- Customer segmentation strategies
- Demand forecasting models
- Inventory optimization
- Marketing campaign targeting
- Training data for recommendation systems
- Customer behavior prediction models
- Anomaly detection algorithms
- A/B testing frameworks
- KPI dashboard development
- Customer lifetime value analysis
- Market basket analysis
- Seasonal trend identification
# AI-powered persona generation from market data
from src.llm_integration import LLMPersonaGenerator, load_llm_config
config = load_llm_config()
persona_generator = LLMPersonaGenerator(config)
personas = persona_generator.generate_personas_from_market_data(market_data)
# Generate intelligent business insights
insights_generator = LLMInsightsGenerator(config)
summary = insights_generator.generate_executive_summary(simulation_results)
# Create natural language reports
report_generator = LLMReportGenerator(config)
report = report_generator.generate_narrative_report(data, "comprehensive")Features:
- ๐ฏ AI Persona Generation - Create personas from market research data
- ๐ Intelligent Insights - Generate business insights from simulation results
- ๐ Natural Language Reports - Comprehensive business reporting
- ๐ฎ Trend Prediction - AI-powered forecasting and recommendations
- ๐ฐ Cost-Effective - Uses OpenRouter API (50-80% cheaper than OpenAI)
# Train customer behavior prediction models
from src.mlops_pipeline import CustomerBehaviorPredictor, ModelConfig
config = ModelConfig(model_type="classification", performance_threshold=0.8)
model = CustomerBehaviorPredictor(config)
metrics = model.train_churn_prediction_model(features_df)
# Monitor model performance and detect drift
monitor = MLOpsMonitor(config)
drift_analysis = monitor.check_data_drift(reference_data, new_data)
# Run A/B testing for model comparison
ab_framework = ABTestingFramework()
results = ab_framework.analyze_experiment("model_comparison_v1")Features:
- ๐ค Automated ML Training - Customer churn and spending prediction models
- ๐ Performance Monitoring - Real-time model performance tracking
- ๐ Data Drift Detection - Automated data quality monitoring
- ๐งช A/B Testing Framework - Statistical model comparison
- ๐ MLflow Integration - Experiment tracking and model versioning
- ๐ Model Deployment - Version management and rollback capabilities
- Apache Kafka integration
- Real-time dashboard updates
- Event-driven architecture
- Stream processing with Apache Spark
Customize customer personas by editing config/personas.yaml:
personas:
- name: "Custom Persona"
frequency: "weekly"
preferred_time: ["2:00pmโ4:00pm"]
demographics:
age_range: [25, 40]
income_range: [30000, 70000]
basket_profile:
groceries:
probability: 0.9
quantity: [3, 8]
price_range: [500, 2000]# Initialize simulator
simulator = CustomerBehaviorSimulator(
config_path='config/personas.yaml',
random_seed=42
)
# Run simulation
transactions_df, customers_df = simulator.run_simulation(
simulation_days=30
)
# Generate analytics
analyzer = SimulationAnalyzer(transactions_df, customers_df)
summary = analyzer.generate_executive_summary()# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts๏ฟฝctivate
# Install development dependencies
pip install -r requirements.txt
# Install pre-commit hooks (optional)
pip install pre-commit
pre-commit install# Format code
black .
isort .
# Lint code
flake8 src/
# Type checking
mypy src/- Basic simulation engine
- Persona-based behavior modeling
- Data export functionality
- Comprehensive testing
- Interactive Streamlit dashboard
- Statistical analysis tools
- Insight generation
- Performance optimization
- LLM Integration - AI-powered persona generation via OpenRouter API
- Intelligent Insights - Natural language business reporting
- MLOps Pipeline - Automated ML training and monitoring
- A/B Testing Framework - Statistical model comparison
- Model Versioning - MLflow integration and experiment tracking
- Performance Monitoring - Data drift detection and alerting
- Comprehensive Logging - Production-ready monitoring
- Error Handling - Robust error recovery and validation
- Configuration Management - Environment-specific settings
- Docker containerization
- Kubernetes orchestration
- CI/CD pipeline setup
- Cloud deployment (AWS/GCP/Azure)
- Real-time streaming simulation with Apache Kafka
- Event-driven architecture
- Stream processing with Apache Spark
- Real-time dashboard updates
- Clean Architecture - Modular, maintainable code structure
- Best Practices - Type hints, documentation, error handling
- Performance - Efficient algorithms and data structures
- Testing - Comprehensive test coverage with multiple test types
- Realistic Data - Statistically valid customer behavior patterns
- Actionable Insights - Clear business recommendations
- Scalable Design - Handles large-scale simulations efficiently
- Production Ready - Logging, monitoring, and error recovery
- Probabilistic Modeling - Advanced statistical behavior simulation
- Temporal Intelligence - Sophisticated festival and seasonal effects
- Interactive Analytics - Modern dashboard with rich visualizations
- Extensible Framework - Easy to add new personas and behaviors
dshail
August 2025
Built as part of an internship assignment to demonstrate:
- System design and architecture skills
- Data engineering and analytics expertise
- Python programming proficiency
- Testing and quality assurance practices
- Documentation and presentation abilities
This project is licensed under the MIT License - see the LICENSE file for details.
While this is an internship assignment project, feedback and suggestions are welcome:
- Fork the repository
- Create a feature branch (
git checkout -b feature/improvement) - Commit changes (
git commit -am 'Add improvement') - Push to the branch (
git push origin feature/improvement) - Create a Pull Request
โญ If you found this project impressive, please star the repository!
This simulation system demonstrates production-level software engineering skills combined with deep understanding of retail analytics and customer behavior modeling.