This project applies Ergodicity Economics to corporate risk management. Unlike traditional actuarial models that optimize for "average" outcomes across a group (ensemble average), this codebase simulates the trajectory of a single entity over time (time average) to prevent ruin and maximize long-term growth.
The major use cases are:
- Optimal Insurance Limit Selection
- Determining the specific insurance limits and retentions (deductibles) that maximize a company's long-term geometric growth rate, rather than just minimizing immediate premium costs.
- Ruin Probability Analysis
- Simulating thousands of potential future timelines to estimate the probability that a specific capital structure will hit zero (bankruptcy) under various shock scenarios.
- Capital Structure & Cash Flow Simulation
- Modeling the complex interaction between operating income (EBITDA), tax obligations (including Net Operating Loss carryforwards), capital expenditures, and catastrophic losses to project future balance sheets.
- Ergodic vs. Ensemble Comparison
- Providing mathematical proof and visualizations demonstrating where standard "Expected Value" decision-making fails compared to "Time-Average" decision-making, specifically regarding volatility drag.
- Dynamic Pricing & Market Cycle Analysis
- Modeling how insurance market hardening (price increases) and softening impact a buyer's optimal strategy over multi-year periods.
The following terms form the backbone of this project. They are ordered by importance for a developer understanding the simulation engine.
- Time Average Growth Rate (TAGR)
- Meaning: The compounded annual growth rate of a single entity's wealth over a long period. This is the primary metric optimized in this codebase.
- Implementation: Computed per path as g = log(Final_Wealth / Initial_Wealth) / T, then averaged across Monte Carlo paths. The equivalent compound annual rate is (Final_Wealth / Initial_Wealth)^(1/T) - 1 = exp(g) - 1.
- Ensemble Average
- Meaning: The average wealth of a group of entities at a specific point in time. Traditional models use this; this project proves it is often misleading for individual survival.
- Implementation: np.mean(wealth_array_at_time_t). Used primarily for comparison plots to show divergence from TAGR.
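The gap between the two averages can be illustrated with a toy multiplicative gamble. This is a minimal sketch, not the project's model: the function names, the 1.5/0.6 factors, and the path counts are illustrative assumptions.

```python
import numpy as np

def simulate_multiplicative_paths(n_paths, n_steps, up=1.5, down=0.6, seed=0):
    """Return a (n_paths, n_steps + 1) wealth array that starts at 1.0."""
    rng = np.random.default_rng(seed)
    # Each period, wealth is multiplied by `up` or `down` with equal probability.
    factors = rng.choice([up, down], size=(n_paths, n_steps))
    wealth = np.ones((n_paths, n_steps + 1))
    wealth[:, 1:] = np.cumprod(factors, axis=1)
    return wealth

def time_average_growth(wealth, n_steps):
    """Per-path log growth rate: log(W_T / W_0) / T."""
    return np.log(wealth[:, -1] / wealth[:, 0]) / n_steps

n_steps = 50
wealth = simulate_multiplicative_paths(n_paths=10_000, n_steps=n_steps)

# Ensemble view: the expected growth factor per step is above 1.
ensemble_growth_factor = 0.5 * 1.5 + 0.5 * 0.6
# Time view: the expected log growth per step is negative.
theoretical_time_avg = 0.5 * np.log(1.5) + 0.5 * np.log(0.6)

g = time_average_growth(wealth, n_steps)
print("ensemble growth factor per step:", ensemble_growth_factor)
print("mean time-average growth rate:  ", g.mean())
print("median final wealth:            ", np.median(wealth[:, -1]))
```

The expected value grows every step, yet the typical (median) path shrinks, which is the divergence the comparison plots are meant to show.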
- Monte Carlo Simulation
- Meaning: The engine that generates thousands of hypothetical "paths" (timelines) for the business.
- Implementation: Managed by monte_carlo.py and simulation.py. Uses NumPy to generate random loss events and evolves the balance sheet time-step by time-step.
- Geometric Brownian Motion (GBM)
- Meaning: A stochastic process used to model the "normal" baseline growth of the company's revenue or asset value before shocks are applied.
- Implementation: S_t = S_0 * exp((mu - 0.5 * sigma^2)t + sigma * W_t). Found in stochastic_processes.py.
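The closed-form GBM expression above can be sampled directly with NumPy. The sketch below assumes a hypothetical simulate_gbm helper; it is not the signature used in stochastic_processes.py.

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, n_steps, n_paths, dt=1.0, seed=None):
    """Sample GBM paths from S_t = S_0 * exp((mu - 0.5*sigma^2) t + sigma W_t)."""
    rng = np.random.default_rng(seed)
    # Brownian increments are independent N(0, dt); W_t is their running sum.
    dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    w = np.cumsum(dw, axis=1)
    t = dt * np.arange(1, n_steps + 1)
    paths = np.empty((n_paths, n_steps + 1))
    paths[:, 0] = s0
    paths[:, 1:] = s0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * w)
    return paths

paths = simulate_gbm(s0=100.0, mu=0.08, sigma=0.2, n_steps=10, n_paths=20_000, seed=1)
# The average log growth rate should be close to mu - 0.5 * sigma^2 = 0.06.
growth = np.log(paths[:, -1] / paths[:, 0]) / 10
print(growth.mean())
```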
- Ergodicity
- Meaning: The property where the time average equals the ensemble average. Wealth that compounds multiplicatively is non-ergodic, so the two averages diverge; this project accounts for that by focusing on time-average metrics.
- Implementation: A conceptual constraint that drives the logic in ergodic_analyzer.py and risk_metrics.py.
- Ruin (Bankruptcy)
- Meaning: The absorbing state where a company's working capital falls below zero (or a defined insolvency threshold).
- Implementation: Checked at every time step in simulation.py. If capital < 0, the simulation for that path stops or is marked as dead.
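One way to treat ruin as an absorbing state without looping over paths is a running "still alive" mask. This is a sketch under the assumption that capital is stored as a (paths, time) array; alive_mask and ruin_probability here are illustrative helpers, not the functions in simulation.py.

```python
import numpy as np

def alive_mask(capital_paths, threshold=0.0):
    """True while a path has never dropped below the threshold."""
    solvent = capital_paths >= threshold
    # Once a path is insolvent at any step, it stays marked dead afterwards.
    return np.logical_and.accumulate(solvent, axis=1)

def ruin_probability(capital_paths, threshold=0.0):
    """Fraction of paths that breached the threshold at any time step."""
    dead_at_end = ~alive_mask(capital_paths, threshold)[:, -1]
    return float(dead_at_end.mean())

capital = np.array([
    [10.0, 5.0, -1.0, 3.0],    # breaches zero at step 2, later recovery is ignored
    [10.0, 12.0, 14.0, 16.0],  # never ruined
])
alive = alive_mask(capital)
p_ruin = ruin_probability(capital)
print(alive)
print(p_ruin)
```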
- Volatility Drag
- Meaning: The reduction in compound growth caused by variance in returns. Under GBM, for example, the time-average growth rate is mu - 0.5 * sigma^2 rather than mu.
- Implementation: The code explicitly measures the cost of variance (losses) against the cost of insurance (premium) to minimize this drag.
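A small numeric illustration of volatility drag: two return streams with the same arithmetic mean compound at different rates. The helper name and the return values are illustrative assumptions.

```python
import numpy as np

def compound_growth_rate(returns):
    """Geometric mean return per period for a sequence of simple returns."""
    returns = np.asarray(returns, dtype=float)
    return float(np.exp(np.mean(np.log1p(returns))) - 1.0)

steady = [0.10, 0.10, 0.10, 0.10]
volatile = [0.40, -0.20, 0.40, -0.20]  # same 10% arithmetic mean, higher variance

steady_growth = compound_growth_rate(steady)
volatile_growth = compound_growth_rate(volatile)
drag = steady_growth - volatile_growth
print(steady_growth, volatile_growth, drag)
```

The volatile stream compounds at sqrt(1.4 * 0.8) - 1, roughly 5.8% per period, even though its average return is 10%. Insurance trades a certain premium for a reduction in this variance penalty.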
- Retention
- Meaning: The dollar amount of loss the company pays out of pocket before insurance kicks in (deductible).
- Implementation: A parameter in InsuranceStructure config. The simulation subtracts min(loss, retention) from cash flow.
- Limit
- Meaning: The maximum amount the insurer will pay for a claim or in aggregate.
- Implementation: Logic in insurance.py caps recoveries at Limit. Losses above Retention + Limit revert to the manufacturer.
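The retention and limit mechanics described above can be sketched for a single per-occurrence layer. The function name and return shape are assumptions for illustration; the actual interface in insurance.py may differ.

```python
import numpy as np

def split_loss_per_occurrence(loss, retention, limit):
    """Split ground-up losses into (net loss kept by the company, insurer recovery)."""
    loss = np.asarray(loss, dtype=float)
    # The insurer pays the part of the loss above the retention, capped at the limit.
    recovery = np.clip(loss - retention, 0.0, limit)
    # The company keeps the retention plus anything above retention + limit.
    net_loss = loss - recovery
    return net_loss, recovery

losses = [50_000, 400_000, 6_000_000]
net, recovered = split_loss_per_occurrence(losses, retention=100_000, limit=5_000_000)
print(net)        # losses kept: below retention, at retention, retention plus excess
print(recovered)  # insurer payments
```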
- Premium
- Meaning: The fixed cost paid to transfer risk.
- Implementation: Calculated in insurance_pricing.py. It reduces cash flow at the start of every period (t=0, t=1, etc.).
- Loss Ratio
- Meaning: The ratio of claims paid by the insurer to premiums collected. Used to calibrate fair pricing.
- Implementation: Used in pricing_models to reverse-engineer premiums based on expected losses.
- Claim Development
- Meaning: The delay between a loss occurring and the full payment being settled.
- Implementation: claim_development.py. Uses "Chain Ladder" or similar patterns to pay out losses over multiple simulation steps rather than instantly.
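Paying losses out over several steps can be sketched with a fixed payout pattern. Note the assumption: this uses a given pattern of payment fractions rather than estimating development factors with a chain-ladder fit, and schedule_payments is a hypothetical helper, not the API of claim_development.py.

```python
import numpy as np

def schedule_payments(incurred_by_year, payout_pattern):
    """Spread each year's incurred loss over later years using payout fractions."""
    incurred = np.asarray(incurred_by_year, dtype=float)
    pattern = np.asarray(payout_pattern, dtype=float)
    if not np.isclose(pattern.sum(), 1.0):
        raise ValueError("payout_pattern must sum to 1")
    # Convolution adds up, for each calendar year, the payments due from all
    # earlier accident years. Payments beyond the horizon are truncated.
    return np.convolve(incurred, pattern)[: incurred.size]

incurred = np.array([100.0, 0.0, 50.0, 0.0])
paid = schedule_payments(incurred, payout_pattern=[0.5, 0.3, 0.2])
outstanding = incurred.sum() - paid.sum()  # still unpaid at the end of the horizon
print(paid, outstanding)
```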
- Ground Up Loss
- Meaning: The total financial impact of an event before any insurance is applied.
- Implementation: Generated via random sampling (e.g., Pareto or Lognormal distributions) in loss_distributions.py.
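A frequency-severity sampler for ground-up losses might look like the sketch below, with Poisson claim counts and lognormal severities. The function name and parameters are illustrative assumptions and do not reflect the actual loss_distributions.py interface.

```python
import numpy as np

def generate_annual_losses(n_paths, n_years, frequency, sev_mu, sev_sigma, seed=None):
    """Return (claim counts, total ground-up loss) per path and year."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(frequency, size=(n_paths, n_years))
    # Draw every claim severity in one call, then sum them into their cell.
    severities = rng.lognormal(mean=sev_mu, sigma=sev_sigma, size=int(counts.sum()))
    cell_index = np.repeat(np.arange(counts.size), counts.ravel())
    totals = np.bincount(cell_index, weights=severities, minlength=counts.size)
    return counts, totals.reshape(n_paths, n_years)

counts, annual = generate_annual_losses(
    n_paths=5_000, n_years=10, frequency=2.0, sev_mu=10.0, sev_sigma=1.0, seed=7
)
# Expected annual loss = frequency * E[severity] = frequency * exp(mu + sigma^2 / 2).
expected = 2.0 * np.exp(10.0 + 0.5)
print(annual.mean(), expected)
```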
- Aggregate Cover
- Meaning: Insurance that caps the total losses in a year, not just per claim.
- Implementation: InsuranceProgram tracks cumulative losses within a year loop to trigger aggregate protection.
- Manufacturer
- Meaning: The class representing the entity being simulated (the client).
- Implementation: Defined in manufacturer.py. Holds state: cash, assets, liabilities, and parameters for growth/margin.
- Working Capital
- Meaning: The liquid assets available to pay for losses and operations.
- Implementation: Current Assets - Current Liabilities. This is the primary "health bar" in the simulation.
- EBITDA
- Meaning: Earnings Before Interest, Taxes, Depreciation, and Amortization. The proxy for operating cash flow.
- Implementation: Modeled as a margin on Revenue, subject to stochastic shocks.
- NOL (Net Operating Loss)
- Meaning: A tax loss generated when the company loses money, carried forward to offset future taxable income and so reduce future tax bills.
- Implementation: tax_handler.py tracks an accumulator nol_balance. Future taxes are max(0, (Income - nol_balance) * tax_rate).
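The carryforward logic can be sketched as a function that returns both the tax due and the updated balance. This is a simplified illustration: tax_with_nol is a hypothetical name, and the sketch ignores any jurisdiction-specific utilization caps or expiry rules that tax_handler.py may implement.

```python
def tax_with_nol(taxable_income, nol_balance, tax_rate):
    """Return (tax due, updated NOL balance) for one period."""
    if taxable_income < 0:
        # A loss year pays no tax and adds the loss to the carryforward.
        return 0.0, nol_balance - taxable_income
    # A profit year uses as much of the carryforward as the income allows.
    used = min(nol_balance, taxable_income)
    tax = (taxable_income - used) * tax_rate
    return tax, nol_balance - used

nol = 0.0
taxes = []
for income in [-100.0, 60.0, 80.0]:
    tax, nol = tax_with_nol(income, nol, tax_rate=0.25)
    taxes.append(tax)
print(taxes, nol)
```

In this example the 100 loss shelters all of the next year's 60 of income and 40 of the following year's 80, so tax is only paid on the remaining 40.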
- CapEx (Capital Expenditure)
- Meaning: Money spent to maintain or grow the asset base.
- Implementation: Deducted from cash flow annually; usually a % of revenue or a fixed depreciation schedule.
- Free Cash Flow (FCF)
- Meaning: The actual cash added to the bank account after operating costs, taxes, CapEx, retained losses, and insurance premiums.
- Implementation: EBITDA - Taxes - CapEx - RetainedLosses - InsurancePremium.
- Config V2 (YAML)
- Meaning: The data-driven definitions for simulation parameters.
- Implementation: Parsed by config_loader.py. Defines everything from simulation steps to tax rates.
- Parallel Executor
- Meaning: Utility to run Monte Carlo sims across multiple CPU cores.
- Implementation: parallel_executor.py uses multiprocessing to split n_simulations into chunks.
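Splitting a batch into chunks with statistically independent random streams could be sketched as follows. The helper names are assumptions, and the chunks run serially here to keep the example self-contained; run_chunk is only a placeholder for a real simulation.

```python
import numpy as np

def split_into_chunks(n_simulations, n_workers):
    """Divide n_simulations into n_workers chunk sizes that differ by at most 1."""
    base, extra = divmod(n_simulations, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]

def run_chunk(job):
    """Placeholder worker: draws one standard normal value per simulated path."""
    chunk_size, seed_seq = job
    rng = np.random.default_rng(seed_seq)
    return rng.standard_normal(chunk_size)

chunks = split_into_chunks(10_000, 4)
# Spawn one child seed per chunk so the workers do not share a random stream.
child_seeds = np.random.SeedSequence(42).spawn(len(chunks))
jobs = list(zip(chunks, child_seeds))
results = np.concatenate([run_chunk(job) for job in jobs])
print(chunks, results.shape)
```

The same jobs list can be handed to multiprocessing.Pool.map(run_chunk, jobs) inside an `if __name__ == "__main__":` guard, since run_chunk is a top-level function and its arguments are picklable.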
- Trajectory Storage
- Meaning: Efficient storage for the massive arrays of simulation data (Paths x Time Steps).
- Implementation: trajectory_storage.py. Often uses memory mapping or optimized NumPy structures to avoid RAM overflow.
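As an illustration of the memory-mapping approach mentioned above, the sketch below writes a (paths, time steps) array to disk in batches with np.memmap and reads back a single column. File names, sizes, and the batch loop are assumptions; trajectory_storage.py may organize its storage differently.

```python
import os
import shutil
import tempfile

import numpy as np

n_paths, n_steps, batch = 1_000, 30, 250
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "trajectories.dat")

# Create a disk-backed array and fill it batch by batch, so only one batch
# of simulated paths has to be held in RAM at a time.
store = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_paths, n_steps))
rng = np.random.default_rng(0)
for start in range(0, n_paths, batch):
    store[start:start + batch] = rng.standard_normal((batch, n_steps))
store.flush()
del store  # release the write handle

file_size_bytes = os.path.getsize(path)

# Reopen read-only and copy out just the final time step of every path.
readonly = np.memmap(path, dtype=np.float32, mode="r", shape=(n_paths, n_steps))
final_values = np.array(readonly[:, -1])
del readonly
shutil.rmtree(tmp_dir)
print(final_values.shape, file_size_bytes)
```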
- HJB Solver
- Meaning: Hamilton-Jacobi-Bellman equation solver. Used for theoretical optimal control validation.
- Implementation: hjb_solver.py. Solves partial differential equations numerically (finite difference method) to find the theoretical optimum.
- Pareto Frontier
- Meaning: The set of optimal trade-offs between Risk (Ruin Probability) and Reward (Growth).
- Implementation: pareto_frontier.py calculates and plots points where you cannot improve growth without increasing risk.
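Selecting the non-dominated (risk, growth) points can be sketched with a single sorted pass. The function below is an illustrative assumption, not the algorithm used in pareto_frontier.py.

```python
import numpy as np

def pareto_frontier(ruin_prob, growth):
    """Indices of strategies where growth cannot improve without more ruin risk."""
    ruin_prob = np.asarray(ruin_prob, dtype=float)
    growth = np.asarray(growth, dtype=float)
    # Sort by ruin probability ascending; break ties by growth descending.
    order = np.lexsort((-growth, ruin_prob))
    keep, best_growth = [], -np.inf
    for i in order:
        # A point survives only if it beats every lower-risk point on growth.
        if growth[i] > best_growth:
            keep.append(int(i))
            best_growth = growth[i]
    return keep

ruin = [0.01, 0.02, 0.02, 0.05, 0.10]
grow = [0.03, 0.05, 0.04, 0.045, 0.08]
frontier = pareto_frontier(ruin, grow)
print(frontier)
```

Here strategies 2 and 3 are dropped because strategy 1 offers more growth at equal or lower ruin probability.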
- Scenario Comparator
- Meaning: Tool to compare different config setups (e.g., "High Deductible" vs "Low Deductible").
- Implementation: Runs two distinct Monte Carlo batches and aggregates the difference in reporting/scenario_comparator.py.
- Visualization Factory
- Meaning: A centralized system for generating consistent charts.
- Implementation: visualization/figure_factory.py. Decouples plot logic from data generation.
- Seed (Random State)
- Meaning: The integer used to initialize the random number generator.
- Implementation: Crucial for reproducibility. Every run accepts a seed so that the random losses are identical across comparison runs.
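Reusing one seed lets two strategies face exactly the same losses, so any difference in results comes from the strategy rather than from sampling noise. A minimal sketch, with simulate_losses as a hypothetical stand-in for the project's loss generator:

```python
import numpy as np

def simulate_losses(seed, n_paths=5, n_years=3):
    """Draw a (n_paths, n_years) array of lognormal losses from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.lognormal(mean=10.0, sigma=1.0, size=(n_paths, n_years))

run_a = simulate_losses(seed=123)
run_b = simulate_losses(seed=123)  # same seed: identical losses
run_c = simulate_losses(seed=124)  # different seed: different losses

# Compare two retentions against the same loss draws.
retained_low = np.minimum(run_a, 20_000).sum()
retained_high = np.minimum(run_b, 50_000).sum()
print(np.array_equal(run_a, run_b), np.array_equal(run_a, run_c))
```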
- Convergence Check
- Meaning: Validating that enough simulations were run to get a stable answer.
- Implementation: convergence.py. Plots the running average of the result to see if it flattens out.
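A running average with its standard error is one simple convergence diagnostic. The sketch below assumes the per-path results are available as a 1-D array; it is not the implementation in convergence.py.

```python
import numpy as np

def running_mean_and_se(samples):
    """Running mean and standard error after 1, 2, ..., n samples."""
    x = np.asarray(samples, dtype=float)
    n = np.arange(1, x.size + 1)
    mean = np.cumsum(x) / n
    # Running sample variance from cumulative sums; set to 0 for n = 1,
    # where the sample variance is not defined.
    var = np.zeros_like(x)
    var[1:] = (np.cumsum(x**2)[1:] - n[1:] * mean[1:] ** 2) / (n[1:] - 1)
    se = np.sqrt(np.maximum(var, 0.0) / n)
    return mean, se

rng = np.random.default_rng(0)
samples = rng.normal(0.05, 0.10, size=5_000)  # stand-in for per-path growth rates
mean, se = running_mean_and_se(samples)
print(mean[-1], se[-1])
```

Plotting mean with a band of plus or minus two standard errors shows whether the estimate has flattened out; the band narrows roughly with the square root of the number of paths.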
- Reporting Builder
- Meaning: Generates human-readable summaries (PDF/Markdown/HTML).
- Implementation: reporting/report_builder.py aggregates stats and plots into final documents.
These expressions appear frequently in the code and represent the core logic.
- time_average_growth: The specific calculation of geometric growth log(final/initial) / time, the project's "North Star" metric.
- ruin_probability: The percentage of Monte Carlo paths where wealth dropped below zero.
- apply_insurance_recovery: Function that takes a raw loss and returns the net loss after applying retention and limits.
- update_balance_sheet: The step-function that rolls the simulation forward one unit of time, applying income and expenses.
- generate_claims: Function using Poisson (frequency) and Pareto/Lognormal (severity) distributions to create loss events.
- run_simulation_batch: The high-level command to execute a full set of Monte Carlo paths for a given configuration.
- optimize_retention: An iterative process that tests various retention levels to find the peak of the growth curve.
- tax_shield: The value gained by deducting losses from taxable income, effectively subsidizing risk.
- chain_ladder: The actuarial method used to simulate the delay in claim reporting and payment.
- hjb_solve: The command to run the differential equation solver for theoretical benchmarking.
- Project Architecture: ergodic_insurance
- This is a Python package structure. The core logic resides in ergodic_insurance/.
- notebooks/ contains experiments and analysis runs. Implement reusable logic in the package and use Jupyter notebooks to run experiments against it.
- Configuration Migration
- The project recently migrated to a "V2" configuration system (config_v2.py). Be careful when looking at older notebooks or config_legacy files; ensure you are using the modern YAML-based structure found in ergodic_insurance/data/config.
- Performance Matters
- Because the model simulates long timelines (10-30 years) with thousands of paths (10k+), efficiency is key.
- The code heavily utilizes vectorization (NumPy operations on whole arrays) rather than Python for loops.
- Avoid iterating through individual paths whenever possible; perform operations on the entire (n_sims, n_time) matrix at once.
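To illustrate the vectorization advice, the sketch below evolves capital for all paths at once and checks it against an explicit per-path loop. The additive cash-flow model and the function names are simplifying assumptions, not the project's balance-sheet logic.

```python
import numpy as np

def evolve_capital_vectorized(initial_capital, income, losses):
    """Capital for every path and step via one cumulative sum over the time axis."""
    n_sims, n_time = losses.shape
    capital = np.empty((n_sims, n_time + 1))
    capital[:, 0] = initial_capital
    capital[:, 1:] = initial_capital + np.cumsum(income - losses, axis=1)
    return capital

def evolve_capital_loop(initial_capital, income, losses):
    """Same result with explicit Python loops over paths and steps (slow)."""
    n_sims, n_time = losses.shape
    capital = np.empty((n_sims, n_time + 1))
    for i in range(n_sims):
        capital[i, 0] = initial_capital
        for t in range(n_time):
            capital[i, t + 1] = capital[i, t] + income[i, t] - losses[i, t]
    return capital

rng = np.random.default_rng(0)
income = rng.normal(1.0, 0.2, size=(1_000, 20))
losses = rng.exponential(0.5, size=(1_000, 20))
fast = evolve_capital_vectorized(10.0, income, losses)
slow = evolve_capital_loop(10.0, income, losses)
print(np.allclose(fast, slow))
```

When a step depends on the previous state in a non-additive way (for example, taxes or ruin checks), a common compromise is to loop over time steps only and keep each step vectorized across paths.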
- Data Persistence
- Simulations are expensive to run. The project uses a caching mechanism (cache_manager.py and safe_pickle.py) to save simulation results. Always check if a result is cached before re-running a massive batch.
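A check-before-run cache keyed on the configuration can be sketched with the standard library. This is an illustrative assumption, not the interface of cache_manager.py or safe_pickle.py; it uses plain pickle, which should only be used to load files you created yourself.

```python
import hashlib
import json
import os
import pickle
import tempfile

def config_key(config):
    """Stable hash of a JSON-serializable config dict."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def cached_run(config, run_fn, cache_dir):
    """Return (result, cache_hit); run_fn is only called on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, config_key(config) + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh), True
    result = run_fn(config)
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result, False

calls = []

def fake_run(config):
    """Stand-in for an expensive Monte Carlo batch."""
    calls.append(config["seed"])
    return {"growth": 0.05}

cache_dir = tempfile.mkdtemp()
cfg = {"n_simulations": 1000, "retention": 100_000, "seed": 42}
first, hit_first = cached_run(cfg, fake_run, cache_dir)
second, hit_second = cached_run(cfg, fake_run, cache_dir)
print(hit_first, hit_second, len(calls))
```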
To fully grasp the "Why" behind this code, consult these resources:
- "The Time Resolution of the St. Petersburg Paradox" by Ole Peters
- Why: The foundational paper that explains why time averages diverge from ensemble averages.
- "Optimal Leverage" (The Kelly Criterion)
- Why: This project is essentially applying a complex version of the Kelly Criterion to insurance purchasing. Understanding Kelly betting is understanding this codebase.
- Basic Actuarial Mathematics (CAS Exam 5/8 materials)
- Why: Concepts like "Chain Ladder," "Bornhuetter-Ferguson," and "Aggregate Deductibles" are standard actuarial science.
- "Skin in the Game" by Nassim Nicholas Taleb
- Why: Taleb frequently discusses the difference between ensemble probabilities and time probabilities (ruin).
- Hamilton-Jacobi-Bellman Equation (Wikipedia/Textbooks)
- Why: Used in the hjb_solver.py module to provide a theoretical upper bound to the simulation results.