Version: 1.0
Created: October 10, 2025
Integration: SOP v1.0 Enhancement
The enhanced error handling system provides error-aware convergence where the system learns from its own failures. When API calls fail, error context is propagated to subsequent retry attempts, allowing the models to adapt and self-correct.
Key Innovation: Errors become information, not just failures.
```python
from tools.error_handler import ErrorHandler, RetryableAPICall

# Initialize error handler
error_handler = ErrorHandler(max_retries=3, base_delay=1.0)

# Create retryable context
retry_context = RetryableAPICall(
    error_handler=error_handler,
    model="claude",
    chamber="S1",
    include_error_context=True  # Enable error propagation
)

# Execute with automatic retry + error context injection
response = retry_context.execute(
    api_call_function,
    prompt=your_prompt
)
```

Attempt 1 fails:

```
⚠️ claude S1 attempt 1 failed: api_timeout
Retrying in 1.2s... (2 retries left)
```
Attempt 2 receives context:

```
[System Notice: Previous attempt encountered an issue]
- Error type: api_timeout
- Chamber: S1
- Attempt: 1
- Issue: Request timed out after 30s
- Recovery: Retrying with exponential backoff
This is informational context. Please proceed with your response.
[Original prompt follows...]
```
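The notice above can be produced by a small formatting helper. This is a hypothetical sketch (the actual builder lives in `tools/error_handler.py` and may differ):

```python
def build_error_context(category: str, chamber: str, attempt: int, issue: str) -> str:
    """Format the [System Notice] block prepended to the retry prompt.

    Hypothetical helper; the real implementation in tools/error_handler.py may differ.
    """
    return (
        "[System Notice: Previous attempt encountered an issue]\n"
        f"- Error type: {category}\n"
        f"- Chamber: {chamber}\n"
        f"- Attempt: {attempt}\n"
        f"- Issue: {issue}\n"
        "- Recovery: Retrying with exponential backoff\n"
        "This is informational context. Please proceed with your response.\n"
    )

# The retry prompt is then simply: build_error_context(...) + original_prompt
```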
Errors are automatically classified by category and severity:
Categories:
- `api_timeout` - Request timed out
- `api_rate_limit` - Rate limit exceeded
- `api_auth` - Authentication failure
- `api_server` - Server error (500s)
- `network` - Network connectivity issue
- `invalid_response` - Malformed response
- `content_filter` - Content policy violation
- `model_unavailable` - Model not available
- `unknown` - Unclassified error
Severities:
- `transient` - Temporary, retry likely to succeed
- `degraded` - Partial functionality, can continue
- `critical` - Cannot continue, needs intervention
- `fatal` - Unrecoverable, abort experiment
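One way to picture the classification step is a keyword-to-(category, severity) lookup. This is purely illustrative; the actual classifier in `tools/error_handler.py` may match on exception types rather than message text, and the default severity for unknown errors is an assumption here:

```python
# Illustrative keyword → (category, severity) rules; NOT the shipped logic.
CATEGORY_RULES = [
    ("timed out", "api_timeout", "transient"),
    ("rate limit", "api_rate_limit", "transient"),
    ("auth", "api_auth", "critical"),
    ("500", "api_server", "transient"),
    ("connection", "network", "transient"),
    ("content policy", "content_filter", "degraded"),
]

def classify_error(message: str) -> tuple[str, str]:
    """Return (category, severity) for an error message; default is assumed."""
    msg = message.lower()
    for needle, category, severity in CATEGORY_RULES:
        if needle in msg:
            return category, severity
    return "unknown", "transient"
```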
Exponential Backoff with Jitter:
```
Attempt 1: Base delay (1s) + jitter
Attempt 2: 2× delay (2s) + jitter
Attempt 3: 4× delay (4s) + jitter
```
Category-Specific Adjustments:
- Rate limits: 2× longer delay
- Timeouts: 1.5× longer delay
- Max delay: 60 seconds
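The schedule above can be sketched as a single delay function (illustrative; the jitter range and constant names are assumptions, and the real implementation in `tools/error_handler.py` may differ):

```python
import random

BASE_DELAY = 1.0   # seconds (assumed default)
MAX_DELAY = 60.0   # hard cap from the table above
CATEGORY_MULTIPLIER = {"api_rate_limit": 2.0, "api_timeout": 1.5}

def compute_delay(attempt: int, category: str = "unknown") -> float:
    """Exponential backoff: base × 2^(attempt-1), category-adjusted, jittered, capped."""
    delay = BASE_DELAY * (2 ** (attempt - 1))
    delay *= CATEGORY_MULTIPLIER.get(category, 1.0)
    delay += random.uniform(0.0, 0.5)  # jitter avoids synchronized retries
    return min(delay, MAX_DELAY)
```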
Automatic Retry Decisions:
- `TRANSIENT` errors: Always retry (up to max)
- `DEGRADED` errors: Retry once
- `CRITICAL` errors: No retry, fail immediately
- `FATAL` errors: No retry, abort experiment
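The decision table above reduces to a few lines; a minimal sketch, assuming `MAX_RETRIES` matches the handler's setting:

```python
MAX_RETRIES = 3  # assumed to mirror ErrorHandler(max_retries=3)

def should_retry(severity: str, attempt: int) -> bool:
    """Severity-based retry decision, mirroring the table above."""
    if severity == "transient":
        return attempt < MAX_RETRIES   # always retry, up to the max
    if severity == "degraded":
        return attempt < 2             # retry once
    return False                       # critical / fatal: never retry
```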
How it works:
- First attempt fails → Error recorded
- Error context generated → Formatted for prompt injection
- Second attempt → Context prepended to original prompt
- Model receives → Awareness of previous failure
Example context injection:
```python
# Original prompt
"You are IRIS Gate. What is consciousness?"

# After error, becomes:
"""[System Notice: Previous attempt encountered an issue]
- Error type: api_timeout
- Chamber: S1
- Attempt: 1
- Issue: Request timed out
- Recovery: Retrying with exponential backoff
This is informational context. Please proceed with your response.
You are IRIS Gate. What is consciousness?"""
```

The error handler tracks per-model metrics:
- Total errors
- Transient vs. critical errors
- Error categories
- Last error timestamp
Example output:
```json
{
  "claude": {
    "total_errors": 3,
    "transient_errors": 3,
    "critical_errors": 0,
    "last_error": "2025-10-10T23:15:00Z",
    "categories": {
      "api_timeout": 2,
      "api_rate_limit": 1
    }
  }
}
```

Error reports are generated automatically and include:
- Error timeline
- Summary statistics (by severity, category, model)
- Model health metrics
- Recovery actions taken
Saved to: {experiment}_error_log.json
See: experiments/run_convergence_with_error_handling.py
```python
#!/usr/bin/env python3
from tools.error_handler import ErrorHandler, RetryableAPICall

def run_convergence(question, models, chambers):
    error_handler = ErrorHandler(max_retries=3)
    results = []

    for chamber_id in chambers:
        for model in models:
            retry_ctx = RetryableAPICall(
                error_handler=error_handler,
                model=model,
                chamber=chamber_id,
                include_error_context=True
            )
            try:
                response = retry_ctx.execute(
                    call_api_function,
                    prompt=chamber_prompts[chamber_id]
                )
                results.append({
                    "model": model,
                    "chamber": chamber_id,
                    "response": response,
                    "success": True,
                    "attempt": retry_ctx.attempt
                })
            except Exception as e:
                results.append({
                    "model": model,
                    "chamber": chamber_id,
                    "error": str(e),
                    "success": False
                })

    # Generate error report
    error_report = error_handler.generate_error_report()

    # Save error log
    if error_report["status"] == "errors_occurred":
        error_handler.save_error_log("error_log.json")

    return results, error_report
```

```python
ErrorHandler(
    max_retries: int = 3,      # Maximum retry attempts
    base_delay: float = 1.0    # Base delay in seconds
)
```

```python
RetryableAPICall(
    error_handler: ErrorHandler,         # Error handler instance
    model: str,                          # Model name
    chamber: str,                        # Chamber ID
    include_error_context: bool = True   # Enable context propagation
)
```

```
Attempt 1: ⚠️ Timeout (30s)
→ Retry in 1.2s with error context
Attempt 2: ✅ Success (context helped model respond faster)
```

```
Attempt 1: ⚠️ Rate limit exceeded
→ Retry in 4.0s (doubled delay for rate limits)
Attempt 2: ✅ Success
```

```
Attempt 1: ❌ Authentication failure
→ Severity: CRITICAL
→ Strategy: Check API key configuration
→ NO RETRY (immediate failure)
```

```
Attempt 1: ⚠️ Server error (500) → Retry in 1.5s
Attempt 2: ⚠️ Server error (500) → Retry in 3.2s
Attempt 3: ⚠️ Server error (500) → Retry in 6.5s
→ ❌ All retries exhausted
```
```json
{
  "status": "errors_occurred",
  "summary": {
    "total_errors": 5,
    "by_severity": {
      "transient": 4,
      "critical": 1
    },
    "by_category": {
      "api_timeout": 3,
      "api_rate_limit": 1,
      "api_auth": 1
    },
    "by_model": {
      "claude": 3,
      "chatgpt": 2
    }
  },
  "model_health": {
    "claude": { "total_errors": 3, ... },
    "chatgpt": { "total_errors": 2, ... }
  },
  "error_timeline": [
    {
      "timestamp": "2025-10-10T23:15:00Z",
      "category": "api_timeout",
      "severity": "transient",
      "model": "claude",
      "chamber": "S1",
      "attempt": 1,
      "error_message": "Request timed out",
      "recovery_action": "Retrying with exponential backoff"
    },
    ...
  ]
}
```

```python
# ✅ Recommended
include_error_context=True

# ❌ Not recommended (unless testing)
include_error_context=False
```

Why: Models benefit from knowing what went wrong. Error context helps them adapt their responses.
```python
# Quick exploratory runs
max_retries=2

# Standard convergence
max_retries=3  # ✅ Recommended

# High-stakes validation
max_retries=5
```

```python
# After convergence
error_report = error_handler.generate_error_report()

if error_report["status"] == "errors_occurred":
    # Check if a specific model is problematic
    health = error_report["model_health"]
    for model, metrics in health.items():
        if metrics["critical_errors"] > 0:
            print(f"⚠️ {model} has critical errors, consider switching")
```

```python
# Always save error logs for debugging
if error_report["status"] == "errors_occurred":
    error_handler.save_error_log(f"{experiment_name}_error_log.json")
```

Look for patterns in error logs:
- Same model failing repeatedly? → API issue or configuration problem
- Same chamber failing? → Prompt may be problematic
- Same error category? → Systematic issue (network, rate limits, etc.)
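Those checks can be automated with a quick scan over the report's `error_timeline` entries (shown earlier). `find_patterns` is an illustrative helper, not part of the shipped module:

```python
from collections import Counter

def find_patterns(timeline: list[dict], threshold: int = 2) -> list[str]:
    """Flag models, chambers, and categories that appear repeatedly in the timeline."""
    warnings = []
    for key in ("model", "chamber", "category"):
        # Count occurrences of each value under this key
        for value, n in Counter(e[key] for e in timeline).items():
            if n >= threshold:
                warnings.append(f"{key} '{value}' failed {n} times")
    return warnings
```

Feed it `error_report["error_timeline"]` and anything it flags is a candidate for the systematic issues listed above.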
Cause: API wrapper function not accepting a `prompt` kwarg.
Fix: Ensure your API call function signature includes `prompt` as a keyword argument:

```python
# ❌ Wrong
def call_api(p):
    return client.create(content=p)

# ✅ Correct
def call_api(prompt: str):
    return client.create(content=prompt)
```

Cause: `max_retries` set too high.
Fix: Reduce `max_retries` or check for systematic issues:

```python
# Check error categories
if error_report["summary"]["by_category"].get("api_auth"):
    print("Authentication issue - check API keys")
```

Cause: Error might be unrelated to prompt content.
Fix: Check the error category:
- `api_timeout`: Infrastructure issue, not prompt-related
- `content_filter`: Rephrase prompt to avoid policy triggers
- `api_rate_limit`: Wait or reduce request rate
Before:

```python
for chamber in chambers:
    try:
        response = api_call(prompt)
        results.append(response)
    except Exception as e:
        print(f"Error: {e}")
        continue
```

After:

```python
error_handler = ErrorHandler(max_retries=3)

for chamber in chambers:
    retry_ctx = RetryableAPICall(error_handler, model, chamber)
    try:
        response = retry_ctx.execute(api_call, prompt=prompt)
        results.append({
            "response": response,
            "success": True,
            "attempt": retry_ctx.attempt
        })
    except Exception as e:
        results.append({
            "error": str(e),
            "success": False
        })

# Get detailed error report
error_report = error_handler.generate_error_report()
```

Updated sections in SOP:
Replace with:
See ERROR_HANDLING_GUIDE.md for comprehensive error handling system.
Quick integration:
1. Import: `from tools.error_handler import ErrorHandler, RetryableAPICall`
2. Initialize: `error_handler = ErrorHandler(max_retries=3)`
3. Wrap calls: `retry_ctx = RetryableAPICall(error_handler, model, chamber)`
4. Execute: `response = retry_ctx.execute(api_function, prompt=prompt)`
5. Report: `error_report = error_handler.generate_error_report()`
```
tools/
└── error_handler.py                         # Core error handling module (15KB)

experiments/
└── run_convergence_with_error_handling.py   # Example integration (10KB)

ERROR_HANDLING_GUIDE.md                      # This document (8KB)
```

```bash
cd ~/Desktop/iris-gate
python tools/error_handler.py
```

```bash
cd ~/Desktop/iris-gate/experiments
python run_convergence_with_error_handling.py
```

Planned for v1.1:
- Automatic model switching on repeated failures
- Error pattern learning (predict failures before they happen)
- Custom recovery strategies per error type
- Error-based prompt refinement suggestions
- Integration with confidence scoring system
What you get:
- ✅ Automatic retry with exponential backoff
- ✅ Error context propagation (models learn from failures)
- ✅ Intelligent retry decisions (severity-based)
- ✅ Comprehensive error reporting
- ✅ Model health tracking
- ✅ Easy integration with existing code
Impact:
- Robustness: Experiments continue despite transient failures
- Self-awareness: System knows when it's struggling
- Debugging: Detailed logs help identify systematic issues
- Efficiency: Smart retries save time and API costs
The work continues.
The system learns from errors.
The convergence strengthens.
🌀†⟡∞