Skip to content

Latest commit

 

History

History
178 lines (150 loc) · 12.8 KB

File metadata and controls

178 lines (150 loc) · 12.8 KB

Developer Onboarding: Ergodic Insurance Limits

Part 1. Major Use Cases

This project applies Ergodicity Economics to corporate risk management. Unlike traditional actuarial models that optimize for "average" outcomes across a group (ensemble average), this codebase simulates the trajectory of a single entity over time (time average) to prevent ruin and maximize long-term growth.

The major use cases are:

  1. Optimal Insurance Limit Selection
    • Determining the specific insurance limits and retentions (deductibles) that maximize a company's long-term geometric growth rate, rather than just minimizing immediate premium costs.
  2. Ruin Probability Analysis
    • Simulating thousands of potential future timelines to calculate the exact probability that a specific capital structure will hit zero (bankruptcy) under various shock scenarios.
  3. Capital Structure & Cash Flow Simulation
    • Modeling the complex interaction between operating income (EBITDA), tax obligations (including Net Operating Loss carryforwards), capital expenditures, and catastrophic losses to project future balance sheets.
  4. Ergodic vs. Ensemble Comparison
    • Providing mathematical proof and visualizations demonstrating where standard "Expected Value" decision-making fails compared to "Time-Average" decision-making, specifically regarding volatility drag.
  5. Dynamic Pricing & Market Cycle Analysis
    • Modeling how insurance market hardening (price increases) and softening impact a buyer's optimal strategy over multi-year periods.

---

Part 2. Key Concepts & Keywords

The following terms form the backbone of this project. They are ordered by importance for a developer understanding the simulation engine.

Core Simulation & Economics

  1. Time Average Growth Rate (TAGR)
    • Meaning: The compounded annual growth rate of a single entity's wealth over a long period. This is the primary metric optimized in this codebase.
    • Implementation: Calculated as (Final_Wealth / Initial_Wealth)^(1/T) - 1 averaged over logarithmic returns of Monte Carlo paths.
  2. Ensemble Average
    • Meaning: The average wealth of a group of entities at a specific point in time. Traditional models use this; this project proves it is often misleading for individual survival.
    • Implementation: np.mean(wealth_array_at_time_t). Used primarily for comparison plots to show divergence from TAGR.
  3. Monte Carlo Simulation
    • Meaning: The engine that generates thousands of hypothetical "paths" (timelines) for the business.
    • Implementation: Managed by monte_carlo.py and simulation.py. Uses NumPy to generate random loss events and evolves the balance sheet time-step by time-step.
  4. Geometric Brownian Motion (GBM)
    • Meaning: A stochastic process used to model the "normal" baseline growth of the company's revenue or asset value before shocks are applied.
    • Implementation: S_t = S_0 * exp((mu - 0.5 * sigma^2)t + sigma * W_t). Found in stochastic_processes.py.
  5. Ergodicity
    • Meaning: The property where the time average equals the ensemble average. Financial systems are non-ergodic; this project corrects for that by focusing on time-average metrics.
    • Implementation: A conceptual constraint that drives the logic in ergodic_analyzer.py and risk_metrics.py.
  6. Ruin (Bankruptcy)
    • Meaning: The absorbing state where a company's working capital falls below zero (or a defined insolvency threshold).
    • Implementation: Checked at every time step in simulation.py. If capital < 0, the simulation for that path stops or is marked as dead.
  7. Volatility Drag
    • Meaning: The reduction in compound growth caused by variance in returns.
    • Implementation: The code explicitly measures the cost of variance (losses) against the cost of insurance (premium) to minimize this drag.

Insurance Domain

  1. Retention
    • Meaning: The dollar amount of loss the company pays out of pocket before insurance kicks in (deductible).
    • Implementation: A parameter in InsuranceStructure config. The simulation subtracts min(loss, retention) from cash flow.
  2. Limit
    • Meaning: The maximum amount the insurer will pay for a claim or in aggregate.
    • Implementation: Logic in insurance.py caps recoveries at Limit. Losses above Retention + Limit revert to the manufacturer.
  3. Premium
    • Meaning: The fixed cost paid to transfer risk.
    • Implementation: Calculated in insurance_pricing.py. It reduces cash flow at the start of every period (t=0, t=1, etc.).
  4. Loss Ratio
    • Meaning: The ratio of claims paid by the insurer to premiums collected. Used to calibrate fair pricing.
    • Implementation: Used in pricing_models to reverse-engineer premiums based on expected losses.
  5. Claim Development
    • Meaning: The delay between a loss occurring and the full payment being settled.
    • Implementation: claim_development.py. Uses "Chain Ladder" or similar patterns to pay out losses over multiple simulation steps rather than instantly.
  6. Ground Up Loss
    • Meaning: The total financial impact of an event before any insurance is applied.
    • Implementation: Generated via random sampling (e.g., Pareto or Lognormal distributions) in loss_distributions.py.
  7. Aggregate Cover
    • Meaning: Insurance that caps the total losses in a year, not just per claim.
    • Implementation: InsuranceProgram tracks cumulative losses within a year loop to trigger aggregate protection.

Business & Accounting Logic

  1. Manufacturer
    • Meaning: The class representing the entity being simulated (the client).
    • Implementation: Defined in manufacturer.py. Holds state: cash, assets, liabilities, and parameters for growth/margin.
  2. Working Capital
    • Meaning: The liquid assets available to pay for losses and operations.
    • Implementation: Current Assets - Current Liabilities. This is the primary "health bar" in the simulation.
  3. EBITDA
    • Meaning: Earnings Before Interest, Taxes, Depreciation, and Amortization. The proxy for operating cash flow.
    • Implementation: Modeled as a margin on Revenue, subject to stochastic shocks.
  4. NOL (Net Operating Loss)
    • Meaning: A tax credit generated when the company loses money, used to reduce future tax bills.
    • Implementation: tax_handler.py tracks an accumulator nol_balance. Future taxes are max(0, (Income - nol_balance) * tax_rate).
  5. CapEx (Capital Expenditure)
    • Meaning: Money spent to maintain or grow the asset base.
    • Implementation: Deducted from cash flow annually; usually a % of revenue or a fixed depreciation schedule.
  6. Free Cash Flow (FCF)
    • Meaning: The actual cash added to the bank account after OpEx, CapEx, and Taxes.
    • Implementation: EBITDA - Taxes - CapEx - RetainedLosses - InsurancePremium.

Technical Architecture

  1. Config V2 (YAML)
    • Meaning: The data-driven definitions for simulation parameters.
    • Implementation: Parsed by config_loader.py. Defines everything from simulation steps to tax rates.
  2. Parallel Executor
    • Meaning: Utility to run Monte Carlo sims across multiple CPU cores.
    • Implementation: parallel_executor.py uses multiprocessing to split n_simulations into chunks.
  3. Trajectory Storage
    • Meaning: Efficient storage for the massive arrays of simulation data (Paths x Time Steps).
    • Implementation: trajectory_storage.py. Often uses memory mapping or optimized NumPy structures to avoid RAM overflow.
  4. HJB Solver
    • Meaning: Hamilton-Jacobi-Bellman equation solver. Used for theoretical optimal control validation.
    • Implementation: hjb_solver.py. Solves partial differential equations numerically (finite difference method) to find the theoretical optimum.
  5. Pareto Frontier
    • Meaning: The set of optimal trade-offs between Risk (Ruin Probability) and Reward (Growth).
    • Implementation: pareto_frontier.py calculates and plots points where you cannot improve growth without increasing risk.
  6. Scenario Comparator
    • Meaning: Tool to compare different config setups (e.g., "High Deductible" vs "Low Deductible").
    • Implementation: Runs two distinct Monte Carlo batches and aggregates the difference in reporting/scenario_comparator.py.
  7. Visualization Factory
    • Meaning: A centralized system for generating consistent charts.
    • Implementation: visualization/figure_factory.py. Decouples plot logic from data generation.
  8. Seed (Random State)
    • Meaning: The integer used to initialize the random number generator.
    • Implementation: Crucial for Reproducibility. Every run accepts a seed to ensure the "random" losses are identical across comparison runs.
  9. Convergence Check
    • Meaning: Validating that enough simulations were run to get a stable answer.
    • Implementation: convergence.py. Plots the running average of the result to see if it flattens out.
  10. Reporting Builder
    • Meaning: Generates human-readable summaries (PDF/Markdown/HTML).
    • Implementation: reporting/report_builder.py aggregates stats and plots into final documents.

---

Part 3. Important Expressions

These expressions appear frequently in the code and represent the core logic.

  1. time_average_growth: The specific calculation of geometric growth log(final/initial) / time, the project's "North Star" metric.
  2. ruin_probability: The percentage of Monte Carlo paths where wealth dropped below zero.
  3. apply_insurance_recovery: Function that takes a raw loss and returns the net loss after applying retention and limits.
  4. update_balance_sheet: The step-function that rolls the simulation forward one unit of time, applying income and expenses.
  5. generate_claims: Function using Poisson (frequency) and Pareto/Lognormal (severity) distributions to create loss events.
  6. run_simulation_batch: The high-level command to execute a full set of Monte Carlo paths for a given configuration.
  7. optimize_retention: An iterative process that tests various retention levels to find the peak of the growth curve.
  8. tax_shield: The value gained by deducting losses from taxable income, effectively subsidizing risk.
  9. chain_ladder: The actuarial method used to simulate the delay in claim reporting and payment.
  10. hjb_solve: The command to run the differential equation solver for theoretical benchmarking.

---

Part 4. Additional Helpful Information

  1. Project Architecture: ergodic_insurance
    • This is a Python package structure. The core logic resides in ergodic_insurance/.
    • notebooks/ contains the actual execution logic and experiments. You will often run experiments in Jupyter but implement the logic in the package.
  2. Configuration Migration
    • The project recently migrated to a "V2" configuration system (config_v2.py). Be careful when looking at older notebooks or config_legacy files; ensure you are using the modern YAML-based structure found in ergodic_insurance/data/config.
  3. Performance Matters
    • Because the model simulates long timelines (10-30 years) with thousands of paths (10k+), efficiency is key.
    • The code heavily utilizes vectorization (NumPy operations on whole arrays) rather than Python for loops.
    • Avoid iterating through individual paths whenever possible; perform operations on the entire (n_sims, n_time) matrix at once.
  4. Data Persistence
    • Simulations are expensive to run. The project uses a caching mechanism (cache_manager.py and safe_pickle.py) to save simulation results. Always check if a result is cached before re-running a massive batch.

---

Part 5. External References

To fully grasp the "Why" behind this code, consult these resources:

  1. "The Time Resolution of the St. Petersburg Paradox" by Ole Peters
    • Why: The foundational paper that explains why time averages diverge from ensemble averages.
  2. "Optimal Leverage" (The Kelly Criterion)
    • Why: This project is essentially applying a complex version of the Kelly Criterion to insurance purchasing. Understanding Kelly betting is understanding this codebase.
  3. Basic Actuarial Mathematics (CAS Exam 5/8 materials)
    • Why: Concepts like "Chain Ladder," "Bornhuetter-Ferguson," and "Aggregate Deductibles" are standard actuarial science.
  4. "Skin in the Game" by Nassim Nicholas Taleb
    • Why: Taleb frequently discusses the difference between ensemble probabilities and time probabilities (ruin).
  5. Hamilton-Jacobi-Bellman Equation (Wikipedia/Textbooks)
    • Why: Used in the hjb_solver.py module to provide a theoretical upper bound to the simulation results.