This document captures architectural decisions for RLM-Claude-Code. Each decision follows the format: Context → Decision → Consequences.
| ID | Decision | Status | Date |
|---|---|---|---|
| ADR-001 | Plugin over Fork | Accepted | 2025-01 |
| ADR-002 | Python REPL over JavaScript | Accepted | 2025-01 |
| ADR-003 | Complexity-based Activation | Accepted | 2025-01 |
| ADR-004 | Depth=2 Default | Accepted | 2025-01 |
| ADR-005 | Streaming Trajectory | Accepted | 2025-01 |
| ADR-006 | Extended Python Tooling | Accepted | 2025-01 |
| ADR-007 | CPMpy for Constraint Verification | Accepted | 2025-01 |
| ADR-008 | Intelligent Orchestration Layer | Accepted | 2026-01 |
| ADR-009 | Strategy Learning from Trajectories | Accepted | 2026-01 |
## ADR-001: Plugin over Fork
Status: Accepted
Context: We need to add RLM capabilities to Claude Code. Options:
- Fork Claude Code and modify directly
- Build as a plugin using Claude Code's extension system
- Build a wrapper/proxy layer
Decision: Build as a Claude Code plugin.
Consequences:
- ✅ Receive automatic updates from Anthropic
- ✅ Community can extend independently
- ✅ Core tool execution, permissions, hooks preserved
- ⚠️ Limited to plugin API capabilities
- ⚠️ Must sync state between RLM layer and Claude Code core
Spec Reference: §2.1, §5.1
## ADR-002: Python REPL over JavaScript
Status: Accepted
Context: RLM requires a REPL for context manipulation. Options:
- Python REPL (as in original RLM paper)
- JavaScript REPL (native to Node.js)
- Custom DSL
Decision: Python REPL with RestrictedPython sandbox.
Consequences:
- ✅ Proven patterns from RLM paper
- ✅ Better string manipulation for context processing
- ✅ Claude stronger at Python code generation
- ✅ Rich ecosystem (pydantic, hypothesis)
- ⚠️ Additional runtime dependency
- ⚠️ Cross-process communication overhead
Spec Reference: §4.1, §4.1.1
## ADR-003: Complexity-based Activation
Status: Accepted
Context: When should RLM mode activate? Options:
- Token count threshold (original RLM paper approach)
- Task complexity analysis
- Always on
- Manual only
Decision: Complexity-based activation with bias toward RLM.
Consequences:
- ✅ Activates when reasoning benefit is highest
- ✅ Avoids overhead on simple queries
- ✅ More intelligent than token threshold
- ⚠️ Classifier must be fast (<50ms)
- ⚠️ Risk of false negatives (missing complex tasks)
  - Mitigation: Bias toward activation (95%+ recall target)
Spec Reference: §6.3
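
The bias toward activation can be illustrated with a minimal sketch. The signal patterns, the 50,000-token cutoff, the weights, and the names `SIGNALS`, `ACTIVATION_THRESHOLD`, and `should_activate` are all assumptions made for illustration; the actual classifier features are defined in the spec, not here. The point is only that a low threshold makes a single signal sufficient, which trades some false positives for fewer false negatives.

```python
import re
from dataclasses import dataclass, field

# Hypothetical complexity signals; not the real classifier's feature set.
SIGNALS = {
    "multi_file": re.compile(r"\b(across|multiple|all)\b.*\b(files?|modules?)\b", re.I),
    "debugging": re.compile(r"\b(why|debug|trace|root cause|intermittent)\b", re.I),
    "refactor": re.compile(r"\b(refactor|migrate|redesign|architecture)\b", re.I),
}

# Deliberately low (assumed value): one signal is enough to activate RLM mode.
ACTIVATION_THRESHOLD = 0.3


@dataclass
class ActivationDecision:
    activate: bool
    score: float
    signals: list = field(default_factory=list)


def should_activate(query: str, context_tokens: int = 0) -> ActivationDecision:
    """Score a query with cheap regex checks, which keeps the decision fast."""
    signals = [name for name, pattern in SIGNALS.items() if pattern.search(query)]
    # Context size still counts, but only as one signal among several.
    if context_tokens > 50_000:
        signals.append("large_context")
    score = min(1.0, 0.4 * len(signals))
    return ActivationDecision(score >= ACTIVATION_THRESHOLD, score, signals)
```

Under these assumptions, `should_activate("Rename variable x to y")` stays inactive, while the same query with a very large context activates.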
## ADR-004: Depth=2 Default
Status: Accepted
Context: How deep should recursive calls go? Options:
- Depth=1 (paper default)
- Depth=2 (verification chains)
- Unlimited (dangerous)
Decision: Default depth=2, configurable to 3.
Consequences:
- ✅ Enables Root → Analysis → Verification pattern
- ✅ Supports constraint-driven verification workflows
- ✅ Cost manageable with model cascade (Opus→Sonnet→Haiku)
- ⚠️ 3-5x cost of depth=1
- ⚠️ Higher latency for deep queries
Model Cascade:
| Depth | Model | Cost per 1M tokens (input/output) |
|---|---|---|
| 0 | Opus 4.5 | $15/$75 |
| 1 | Sonnet 4 | $3/$15 |
| 2 | Haiku 4.5 | $0.25/$1.25 |
Spec Reference: §6.4
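
A small sketch of how the depth limit and the model cascade might fit together. The short model labels, the names `model_for_depth` and `DepthExceededError`, and the choice to reuse the cheapest model at depth 3 are assumptions for illustration; the source only states the default of 2, the configurable maximum of 3, and the three-tier cascade.

```python
# Labels mirror the cascade table; they are not API model identifiers.
MODEL_CASCADE = {0: "opus", 1: "sonnet", 2: "haiku"}
DEFAULT_MAX_DEPTH = 2
HARD_MAX_DEPTH = 3  # "configurable to 3"


class DepthExceededError(RuntimeError):
    """Raised when a recursive call would exceed the configured depth."""


def model_for_depth(depth: int, max_depth: int = DEFAULT_MAX_DEPTH) -> str:
    if not 0 <= max_depth <= HARD_MAX_DEPTH:
        raise ValueError(f"max_depth must be between 0 and {HARD_MAX_DEPTH}")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if depth > max_depth:
        raise DepthExceededError(f"depth {depth} exceeds limit {max_depth}")
    # Assumption: depths beyond the table reuse the cheapest model.
    cheapest = MODEL_CASCADE[max(MODEL_CASCADE)]
    return MODEL_CASCADE.get(depth, cheapest)
```

Refusing the call outright, rather than silently clamping the depth, keeps the cost ceiling explicit for the caller.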
## ADR-005: Streaming Trajectory
Status: Accepted
Context: How should RLM reasoning be presented to users? Options:
- Hidden (black box)
- Summary after completion
- Streaming as it happens
- Configurable verbosity
Decision: Streaming with configurable verbosity (minimal/normal/verbose/debug).
Consequences:
- ✅ Users can reason about RLM behavior in real-time
- ✅ Debugging is possible
- ✅ Builds trust through transparency
- ✅ JSON export for analysis
- ⚠️ Terminal rendering complexity
- ⚠️ Output can be noisy at high verbosity
Spec Reference: §6.6
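
One way to picture verbosity-filtered streaming with JSON export is sketched below. The four levels come from the decision above; the `TrajectoryEvent` fields, the indentation-by-depth rendering, and the function names are hypothetical and stand in for whatever the real trajectory format defines.

```python
import json
from dataclasses import asdict, dataclass
from enum import IntEnum


class Verbosity(IntEnum):
    MINIMAL = 0
    NORMAL = 1
    VERBOSE = 2
    DEBUG = 3


@dataclass
class TrajectoryEvent:
    kind: str      # hypothetical event kinds, e.g. "repl", "recurse", "final"
    depth: int     # recursion depth at which the event occurred
    message: str
    level: Verbosity = Verbosity.NORMAL


def stream(events, verbosity: Verbosity):
    """Yield rendered lines lazily, keeping only events at or below the level."""
    for event in events:
        if event.level <= verbosity:
            yield f"{'  ' * event.depth}[{event.kind}] {event.message}"


def export_json(events) -> str:
    """Serialize every event regardless of verbosity, for offline analysis."""
    rows = [{**asdict(e), "level": e.level.name.lower()} for e in events]
    return json.dumps(rows, indent=2)
```

Because `stream` is a generator, lines can be printed as events arrive, while the export keeps the full trajectory even when the terminal shows only a minimal view.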
## ADR-006: Extended Python Tooling
Status: Accepted
Context: What tools should be available in the REPL? Options:
- Minimal (stdlib only)
- RLM-specific helpers only
- Production Python tooling
Decision: Include uv, ty, ruff, pydantic, and hypothesis.
Consequences:
- ✅ Type checking extracted code with ty
- ✅ Linting code before suggesting edits with ruff
- ✅ Schema validation for structured context with pydantic
- ✅ Property-based verification with hypothesis
- ✅ Fast package management with uv
- ⚠️ Larger environment footprint
- ⚠️ Security surface for subprocess calls
  - Mitigation: Subprocess allowlist (ty, ruff only)
Spec Reference: §4.1.1
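
The subprocess allowlist mitigation could look roughly like the sketch below. The function names, the error type, and the 30-second timeout are assumptions; only the allowlist contents (`ty`, `ruff`) come from the decision above.

```python
import shlex
import subprocess

ALLOWED_TOOLS = frozenset({"ty", "ruff"})


class ToolNotAllowedError(PermissionError):
    """Raised when a command names an executable outside the allowlist."""


def validate_command(command: str) -> list:
    """Split a command line and check its executable against the allowlist."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    # Exact-name match: paths such as ./ruff or /tmp/ruff are rejected too.
    if argv[0] not in ALLOWED_TOOLS:
        raise ToolNotAllowedError(f"{argv[0]!r} is not an allowed tool")
    return argv


def run_tool(command: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    argv = validate_command(command)
    # An argv list with the default shell=False means shell metacharacters
    # in arguments are passed through literally and never interpreted.
    return subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=False
    )
```

For example, `validate_command("ruff check src/")` passes, while `"ruff; rm -rf /"` is rejected because the first token is `ruff;`, which is not on the allowlist.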
## ADR-007: CPMpy for Constraint Verification
Status: Accepted
Context: RLM verification at depth=2 needs to reason about invariants, dependencies, and constraints extracted from code context. Options:
- Natural language reasoning only
- Z3 SMT solver directly
- CPMpy constraint programming library
- Custom constraint DSL
Decision: Include CPMpy in REPL tooling for constraint-driven verification.
Consequences:
- ✅ Solver-agnostic: uses OR-Tools by default, can swap to Z3, Gurobi
- ✅ Numpy-based API aligns with data manipulation patterns
- ✅ High-level constraints (AllDifferent, Cumulative, etc.)
- ✅ Incremental solving for iterative verification
- ✅ Aligns with Ananke's constraint-driven code generation philosophy
- ⚠️ Learning curve for constraint modeling
- ⚠️ Solver performance varies by problem structure
  - Mitigation: Provide helper functions for common patterns
Use Cases:
| Pattern | CPMpy Application |
|---|---|
| Dependency graphs | Model as precedence constraints |
| Type compatibility | Encode subtyping as logical constraints |
| Resource bounds | Cumulative constraints |
| State machines | Transition constraints |
Spec Reference: §4.1.1
## ADR-008: Intelligent Orchestration Layer
Status: Accepted
Context: RLM decisions (model selection, depth budget, tool access) were heuristic-only. Options:
- Keep heuristic-only routing
- Use Claude as intelligent orchestrator
- Train separate orchestrator model
- Hybrid approach
Decision: Claude Haiku as intelligent orchestrator with heuristic fallback.
Consequences:
- ✅ Intelligent model selection based on task analysis
- ✅ Fast decisions (~200ms with Haiku)
- ✅ Graceful fallback when API unavailable
- ✅ User-configurable preferences (fast/balanced/thorough)
- ⚠️ Additional API call per query
- ⚠️ Orchestrator accuracy depends on prompt quality
  - Mitigation: Cache decisions for similar queries
Components:
| Module | Purpose |
|---|---|
| orchestration_schema.py | Plan and context types |
| intelligent_orchestrator.py | Claude-powered decisions |
| user_preferences.py | User preference management |
| auto_activation.py | Automatic RLM activation |
Spec Reference: §8.1 Phase 2
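
The orchestrator-with-fallback pattern can be sketched as follows. This is not the contents of `intelligent_orchestrator.py`: the `Plan` fields, the preference-to-depth mapping, the fixed fallback model, and the normalized-text cache key (a crude stand-in for "similar queries") are all assumptions. The `llm_planner` callable is a placeholder for the Haiku-backed call.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical mapping from user preference to depth budget.
DEPTH_BY_PREFERENCE = {"fast": 1, "balanced": 2, "thorough": 3}


@dataclass(frozen=True)
class Plan:
    model: str
    depth: int
    source: str  # "orchestrator" or "heuristic"


def heuristic_plan(query: str, preference: str) -> Plan:
    """Fallback used when the orchestrator call fails."""
    return Plan("sonnet", DEPTH_BY_PREFERENCE[preference], "heuristic")


class Orchestrator:
    def __init__(self, llm_planner: Callable[[str, str], Plan]):
        self._llm_planner = llm_planner  # placeholder for the LLM-backed planner
        self._cache = {}

    def plan(self, query: str, preference: str = "balanced") -> Plan:
        # Assumed similarity key: case- and whitespace-normalized query text.
        key = (" ".join(query.lower().split()), preference)
        if key in self._cache:
            return self._cache[key]
        try:
            plan = self._llm_planner(query, preference)
        except Exception:
            # Graceful fallback; not cached, so the next query retries the API.
            return heuristic_plan(query, preference)
        self._cache[key] = plan
        return plan
```

Leaving fallback plans out of the cache is a design choice in this sketch: a transient API outage should not pin a query to the heuristic answer.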
## ADR-009: Strategy Learning from Trajectories
Status: Accepted
Context: Successful RLM trajectories contain strategy patterns that could inform future queries. Options:
- No learning (stateless)
- Rule-based strategy detection
- Embedding-based similarity matching
- Full ML training loop
Decision: Feature-based similarity matching with strategy cache.
Consequences:
- ✅ Learn from successful trajectories
- ✅ Suggest strategies for similar queries
- ✅ No ML training required
- ✅ Interpretable features and decisions
- ⚠️ Cold start problem (empty cache)
- ⚠️ Feature engineering required
  - Mitigation: Sensible defaults when cache empty
Strategy Types:
| Strategy | Pattern |
|---|---|
| Peeking | Sample context before deep processing |
| Grepping | Regex/pattern-based search |
| Partition+Map | Divide context, process via sub-calls |
| Programmatic | One-shot code execution |
| Recursive | Spawn sub-queries for complex reasoning |
Components:
| Module | Purpose |
|---|---|
| trajectory_analysis.py | Strategy extraction from events |
| strategy_cache.py | Similarity-based caching |
| tool_bridge.py | Controlled tool access for sub-LLMs |
Spec Reference: §8.1 Phase 3
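
Feature-based similarity matching with a default for the cold-start case might be sketched like this. It is not the contents of `strategy_cache.py`: the keyword features, the Jaccard measure, the 0.5 threshold, and the choice of `peeking` as the default are illustrative assumptions. Only the five strategy names come from the table above.

```python
import re

STRATEGIES = ("peeking", "grepping", "partition_map", "programmatic", "recursive")


def extract_features(query: str) -> frozenset:
    """Hypothetical features: lowercase word tokens longer than three characters."""
    return frozenset(t for t in re.findall(r"[a-z_]+", query.lower()) if len(t) > 3)


def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class StrategyCache:
    def __init__(self, default: str = "peeking", min_similarity: float = 0.5):
        self.default = default
        self.min_similarity = min_similarity
        self._entries = []  # (features, strategy) from successful trajectories

    def record(self, query: str, strategy: str) -> None:
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy}")
        self._entries.append((extract_features(query), strategy))

    def suggest(self, query: str) -> str:
        features = extract_features(query)
        best_score, best_strategy = 0.0, self.default
        for cached_features, strategy in self._entries:
            score = jaccard(features, cached_features)
            if score > best_score:
                best_score, best_strategy = score, strategy
        # Cold start or weak match: fall back to the default strategy.
        return best_strategy if best_score >= self.min_similarity else self.default
```

Because the features are plain token sets, a suggestion can be explained by listing the overlapping tokens, which is the interpretability benefit noted above.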
## Proposed: Child REPL State Sharing
Status: Proposed
Context: When depth=1 spawns a child REPL, should it share state?
Options:
- Fully isolated (no shared state)
- Read-only access to parent state
- Copy-on-write semantics
Considerations:
- Isolation simplifies reasoning about behavior
- Sharing enables richer verification patterns
- Memory overhead of multiple interpreters
Recommendation: Fully isolated with explicit context passing.
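
A minimal sketch of what "fully isolated with explicit context passing" could mean in practice. The `ReplSession` class and its methods are hypothetical, and sandboxing is omitted entirely; the bare `exec` here only models namespace behavior and is not safe for untrusted code.

```python
import copy


class ReplSession:
    """Toy REPL session with its own namespace (no sandboxing in this sketch)."""

    def __init__(self, depth: int = 0, context=None):
        self.depth = depth
        self.namespace = dict(context or {})

    def execute(self, source: str) -> None:
        # WARNING: plain exec for illustration only; a real REPL would be sandboxed.
        exec(source, self.namespace)

    def spawn_child(self, pass_names) -> "ReplSession":
        # Only the named values cross the boundary, and they are deep-copied,
        # so the child can never mutate the parent's objects.
        context = {name: copy.deepcopy(self.namespace[name]) for name in pass_names}
        return ReplSession(self.depth + 1, context)
```

With this shape, a parent holding `data` and `secret` can spawn `parent.spawn_child(["data"])`; the child sees its own copy of `data` and has no access to `secret`. The deep copies are where the memory overhead listed in the considerations shows up.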
## Proposed: Tool Control Flow
Status: Proposed
Context: When RLM decides to use a Claude Code tool (bash, edit), how should control flow work?
Options:
- Option A: Yield control to Claude Code, resume RLM after
- Option B: RLM orchestrates tool use through its own handler
- Option C: Hybrid based on tool type
Considerations:
- Option A preserves Claude Code's permission model
- Option B provides more control but duplicates logic
- Permission checks must happen regardless
Recommendation: Option A (yield to Claude Code).
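
The yield-and-resume control flow can be modeled with a Python generator, as in the sketch below. `ToolRequest`, `rlm_task`, and `drive` are hypothetical names, and `host_execute` stands in for Claude Code's own tool execution, including its permission checks; none of these reflect an actual plugin API.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRequest:
    tool: str
    args: dict = field(default_factory=dict)


def rlm_task():
    """RLM-side logic: yields tool requests and resumes with their results."""
    listing = yield ToolRequest("bash", {"command": "ls src"})
    return f"saw {len(listing.splitlines())} entries"


def drive(task, host_execute):
    """Host-side loop: each yielded request is run by the host, never by the RLM."""
    try:
        request = next(task)
        while True:
            result = host_execute(request)  # permission checks live in the host
            request = task.send(result)
    except StopIteration as done:
        return done.value
```

Since the RLM layer only describes what it wants and the host performs the call, the permission model stays in one place, which is the main argument for yielding control.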
## ADR-XXX: [Title]
**Status**: Proposed | Accepted | Deprecated | Superseded
**Context**: [What is the issue? What are the options?]
**Decision**: [What was decided?]
**Consequences**:
- ✅ [Positive consequence]
- ⚠️ [Negative consequence or risk]
- Mitigation: [How we address the risk]
**Spec Reference**: §X.Y