---
description: Architectural Gaps Analysis - Phase 2 Discovery: **Date:** January 16, 2026 **Sprint:** Production Readiness - Phase 2 **Analysis Method:** Architectural test-d
---
Date: January 16, 2026
Sprint: Production Readiness - Phase 2
Analysis Method: Architectural test-driven discovery
During Phase 2 architectural testing, we wrote tests for the ideal API we expected to exist based on v4.0 documentation and architectural goals. Running these tests revealed valuable gaps between expected architecture and actual implementation.
Key Finding: The framework has solid implementation fundamentals but lacks some public APIs and architectural abstractions that would improve testability and maintainability.
Impact: Not production blockers, but addressing these gaps would:
- ✅ Improve testability (easier to mock/inject dependencies)
- ✅ Strengthen architectural boundaries
- ✅ Enable better evolution of individual components
The gaps fall into three types:
- Private API: methods exist but are private (`_method`) when public access would improve testing
- Missing class: expected classes/interfaces don't exist; the functionality is implemented as module-level functions instead
- Missing method: expected methods don't exist at all; the functionality is missing or implemented differently
Meta-Orchestrator. Coverage: 22.53% → Target: 90%
Expected:

    def analyze_task(self, task: str, context: dict) -> TaskRequirements:
        """Public method for analyzing tasks (testable)."""

Actual:

    def _analyze_task(self, task: str, context: dict) -> TaskRequirements:
        """Private method, only accessible via analyze_and_compose()."""

Impact:
- Cannot unit test task analysis independently
- Must test through the full `analyze_and_compose()` flow
- Harder to mock for testing downstream components

Recommendation:
- Make `_analyze_task()` public, OR
- Add a public `analyze_task()` wrapper for testing
Priority: P2 (Medium) - Workaround exists (test via analyze_and_compose)
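
The wrapper option could look roughly like the sketch below. `TaskRequirements` and `MetaOrchestrator` here are simplified stand-ins, not the framework's real classes, and the keyword check inside `_analyze_task()` is invented purely so the example runs.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRequirements:
    """Stand-in for the real TaskRequirements (fields are hypothetical)."""
    domain: str
    quality_gates: dict = field(default_factory=dict)


class MetaOrchestrator:
    """Simplified stand-in; only the wrapper pattern matters here."""

    def analyze_task(self, task: str, context: dict) -> TaskRequirements:
        """Public, testable entry point that delegates to the private method."""
        return self._analyze_task(task, context)

    def _analyze_task(self, task: str, context: dict) -> TaskRequirements:
        # Placeholder logic: the real analysis is not shown in this document.
        domain = "testing" if "test" in task.lower() else "general"
        gates = dict(context.get("quality_gates", {}))
        return TaskRequirements(domain=domain, quality_gates=gates)


# A unit test can now exercise task analysis without running the full flow.
requirements = MetaOrchestrator().analyze_task("Add tests for the parser", {})
print(requirements.domain)
```

The private method stays untouched, so `analyze_and_compose()` keeps working while tests gain a stable public seam.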
Expected:

    def _select_pattern(self, requirements, agents) -> CompositionPattern:
        """Select composition pattern."""

Actual:

    def _choose_composition_pattern(self, requirements, agents) -> CompositionPattern:
        """Choose composition pattern."""

Impact:
- Test assumptions mismatched
- Naming inconsistency (`select` vs `choose`)

Recommendation:
- Standardize on the `_select_*` pattern for consistency
- Document naming conventions
Priority: P3 (Low) - Naming only, functionality exists
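
If the rename goes ahead, one low-risk path is to keep the old name as a deprecated alias for a release. This is a sketch under assumptions: the class body is a stand-in, and the returned `"sequential"` value is a placeholder for a real `CompositionPattern`.

```python
import warnings


class MetaOrchestrator:
    """Stand-in class showing a rename with a backwards-compatible alias."""

    def _select_pattern(self, requirements, agents):
        # Placeholder result; the real method returns a CompositionPattern.
        return "sequential"

    def _choose_composition_pattern(self, requirements, agents):
        """Deprecated alias kept so existing callers continue to work."""
        warnings.warn(
            "_choose_composition_pattern() is deprecated; use _select_pattern()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._select_pattern(requirements, agents)


print(MetaOrchestrator()._select_pattern(requirements=None, agents=[]))
```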
Expected:

    def create_execution_plan(self, requirements: TaskRequirements) -> ExecutionPlan:
        """Create plan from analyzed requirements (testable independently)."""

Actual:

    # Plan creation embedded in analyze_and_compose() - lines 286-292
    plan = ExecutionPlan(
        agents=agents,
        strategy=strategy,
        quality_gates=requirements.quality_gates,
        estimated_cost=self._estimate_cost(agents),
        estimated_duration=self._estimate_duration(agents, strategy),
    )

Impact:
- Cannot test plan creation logic independently
- Cannot inject custom requirements for testing
- Tight coupling between analysis and plan creation

Recommendation:
- Extract a `create_execution_plan(requirements)` method
- Makes testing easier and improves separation of concerns
Priority: P1 (High) - Would significantly improve testability
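
A minimal sketch of the extraction, assuming agent selection and pattern choice can both be derived from the requirements alone. The dataclasses and all helper bodies are hypothetical placeholders; only the `ExecutionPlan(...)` call mirrors the embedded code quoted above, and `strategy` is a plain string standing in for `CompositionPattern`.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRequirements:
    """Stand-in; only quality_gates is known from the analysis."""
    quality_gates: dict = field(default_factory=dict)


@dataclass
class ExecutionPlan:
    agents: list
    strategy: str
    quality_gates: dict
    estimated_cost: float
    estimated_duration: float


class MetaOrchestrator:
    def analyze_and_compose(self, task: str, context: dict) -> ExecutionPlan:
        """Full flow: analysis first, then the extracted planning step."""
        requirements = self._analyze_task(task, context)
        return self.create_execution_plan(requirements)

    def create_execution_plan(self, requirements: TaskRequirements) -> ExecutionPlan:
        """Extracted step: build a plan from already-analyzed requirements."""
        agents = self._select_agents(requirements)
        strategy = self._choose_composition_pattern(requirements, agents)
        return ExecutionPlan(
            agents=agents,
            strategy=strategy,
            quality_gates=requirements.quality_gates,
            estimated_cost=self._estimate_cost(agents),
            estimated_duration=self._estimate_duration(agents, strategy),
        )

    # Placeholder helpers so the sketch runs; real logic is not shown here.
    def _analyze_task(self, task, context):
        return TaskRequirements(quality_gates=dict(context.get("quality_gates", {})))

    def _select_agents(self, requirements):
        return ["reviewer", "tester"]

    def _choose_composition_pattern(self, requirements, agents):
        return "sequential"

    def _estimate_cost(self, agents):
        return 0.01 * len(agents)

    def _estimate_duration(self, agents, strategy):
        return 30.0 * len(agents)


# Tests can inject hand-built requirements and skip task analysis entirely.
plan = MetaOrchestrator().create_execution_plan(
    TaskRequirements(quality_gates={"min_coverage": 90})
)
print(plan.strategy, plan.quality_gates)
```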
Memory. Coverage: 18-27% → Target: 90%
Expected:

    from empathy_os.memory.long_term import LongTermMemory

    memory = LongTermMemory(storage_path="/path")
    memory.store("key", data)

Actual:

    # No LongTermMemory class exists
    # File contains: SecurePattern, PatternMetadata, Classification, SecurityError

Impact:
- CRITICAL: Tests assume an API that doesn't exist
- Long-term storage may be implemented differently
- Unclear how persistent memory actually works

Recommendation:
- Investigate the actual long-term storage implementation
- Either create a `LongTermMemory` class, OR
- Update architecture docs to reflect the actual design
Priority: P0 (Critical) - Major architectural misunderstanding
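
If the "create the class" option is chosen, the expected API is small enough to sketch. Everything below is an assumption made for illustration: the one-JSON-file-per-key layout, the `retrieve()` method, and the lack of any classification or security handling (which the existing `SecurePattern` and `SecurityError` names suggest the real design needs).

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


class LongTermMemory:
    """Hypothetical sketch: stores each key as a JSON file under storage_path."""

    def __init__(self, storage_path: str) -> None:
        self._root = Path(storage_path)
        self._root.mkdir(parents=True, exist_ok=True)

    def _file_for(self, key: str) -> Path:
        # Keys are used directly as file names; a real class must sanitize them.
        return self._root / f"{key}.json"

    def store(self, key: str, data: Any) -> None:
        self._file_for(key).write_text(json.dumps(data), encoding="utf-8")

    def retrieve(self, key: str) -> Optional[Any]:
        path = self._file_for(key)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    memory = LongTermMemory(storage_path=tmp)
    memory.store("key", {"pattern": "retry"})
    # A second instance on the same path sees the data, i.e. it persisted.
    restored = LongTermMemory(storage_path=tmp).retrieve("key")
    missing = memory.retrieve("absent")

print(restored, missing)
```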
Expected:

    memory = UnifiedMemory(use_mock_redis=True)
    memory.store("key", data)
    memory.retrieve("key")
    memory.promote_to_long_term("key")
    memory.sync_tiers("key")

Actual:

    # UnifiedMemory exists but API not fully documented
    # Unclear which methods are actually implemented

Impact:
- Tests may be testing non-existent features
- Unclear what the actual memory interface supports

Recommendation:
- Document the complete `UnifiedMemory` API
- Add type hints for all public methods
- Create API documentation
Priority: P1 (High) - Affects all memory testing
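
A quick way to find out which methods actually exist is to introspect the class and diff the result against what the tests expect. The `UnifiedMemory` below is a stand-in with a made-up method set, so the reported gap is illustrative only; pointing `public_api()` at the real class would give the real answer.

```python
import inspect


class UnifiedMemory:
    """Stand-in class; the methods defined here are hypothetical."""

    def __init__(self, use_mock_redis: bool = False) -> None:
        self._data: dict = {}

    def store(self, key: str, value: object) -> None:
        self._data[key] = value

    def retrieve(self, key: str) -> object:
        return self._data.get(key)

    def _internal_helper(self) -> None:
        pass


def public_api(cls: type) -> dict:
    """Map each public method name of cls to its signature string."""
    return {
        name: str(inspect.signature(member))
        for name, member in inspect.getmembers(cls, predicate=inspect.isfunction)
        if not name.startswith("_")
    }


# Methods the Phase 2 tests assumed, taken from the expected API above.
expected = {"store", "retrieve", "promote_to_long_term", "sync_tiers"}
actual = public_api(UnifiedMemory)
missing = expected - set(actual)

for name, signature in actual.items():
    print(f"{name}{signature}")
print("missing:", sorted(missing))
```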
Models. Coverage: 21-73% → Target: 95%
Expected:

    from empathy_os.models.registry import ModelRegistry

    registry = ModelRegistry()
    model = registry.get_model_by_tier("CHEAP")

Actual:

    # No ModelRegistry class
    # Instead: MODULE-level MODEL_REGISTRY dict + get_model() function
    MODEL_REGISTRY: dict[str, dict[str, ModelInfo]] = {...}

    def get_model(provider: str, tier: str) -> ModelInfo | None:
        """Functional interface."""

Impact:
- Functional design, not OOP
- Cannot mock registry for testing easily
- Cannot inject custom registries

Recommendation:
- Option A: Keep functional (simpler, works fine)
- Option B: Wrap in a `ModelRegistry` class for testability
- Document the current design pattern
Priority: P2 (Medium) - Functional interface works, but class would be more testable
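
Option B can be a thin wrapper that defaults to the module-level dict but accepts an injected one. In this sketch `ModelInfo`, the registry contents, the provider names, and the `default_provider` parameter are all hypothetical; only the `MODEL_REGISTRY` shape and the `get_model(provider, tier)` signature come from the code quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    """Stand-in for the real ModelInfo (single made-up field)."""
    model_id: str


# Stand-in for the module-level registry; the entries are invented.
MODEL_REGISTRY: dict[str, dict[str, ModelInfo]] = {
    "example_provider": {"CHEAP": ModelInfo("example-small")},
}


class ModelRegistry:
    """Class wrapper so tests can inject a custom registry."""

    def __init__(
        self,
        registry: Optional[dict[str, dict[str, ModelInfo]]] = None,
        default_provider: str = "example_provider",
    ) -> None:
        self._registry = MODEL_REGISTRY if registry is None else registry
        self._default_provider = default_provider

    def get_model(self, provider: str, tier: str) -> Optional[ModelInfo]:
        """Same lookup as the functional interface."""
        return self._registry.get(provider, {}).get(tier)

    def get_model_by_tier(self, tier: str) -> Optional[ModelInfo]:
        """Tier-only lookup against the default provider."""
        return self.get_model(self._default_provider, tier)


# In a test, pass a fake registry instead of patching module globals.
fake_registry = ModelRegistry(
    registry={"fake": {"CHEAP": ModelInfo("fake-cheap")}},
    default_provider="fake",
)
print(fake_registry.get_model_by_tier("CHEAP"))
```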
Expected:

    policy = FallbackPolicy()
    next_tier = policy.get_next_tier(current_tier)
    fallback_request = policy.prepare_fallback_request(request)

Actual:

    # FallbackPolicy class exists but methods unclear
    # Need to check actual implementation

Impact:
- Tests assume methods that may not exist
- Fallback logic unclear

Recommendation:
- Review the `FallbackPolicy` implementation
- Document the complete API
- Add missing methods if needed
Priority: P1 (High) - Fallback is critical for production
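
For reference while reviewing, the two expected methods could behave as sketched below. The tier names other than `CHEAP`, their order, the `Request` type, and the choice to return `None` when the chain is exhausted are all assumptions; the real `FallbackPolicy` may work differently.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass(frozen=True)
class Request:
    """Hypothetical request type carrying the tier it targets."""
    prompt: str
    tier: str


class FallbackPolicy:
    """Sketch of a linear fallback chain over model tiers."""

    def __init__(self, chain: Sequence[str] = ("CHEAP", "CAPABLE", "PREMIUM")) -> None:
        # Assumed order: each tier falls back to the one listed after it.
        self._chain = tuple(chain)

    def get_next_tier(self, current_tier: str) -> Optional[str]:
        """Return the tier after current_tier, or None at the end of the chain."""
        index = self._chain.index(current_tier)  # raises ValueError if unknown
        if index + 1 < len(self._chain):
            return self._chain[index + 1]
        return None

    def prepare_fallback_request(self, request: Request) -> Optional[Request]:
        """Copy the request onto the next tier, or None if no tier is left."""
        next_tier = self.get_next_tier(request.tier)
        if next_tier is None:
            return None
        return replace(request, tier=next_tier)


policy = FallbackPolicy()
fallback_request = policy.prepare_fallback_request(Request(prompt="hello", tier="CHEAP"))
print(fallback_request)
```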
Expected:

    executor = LLMExecutor()
    result = executor.execute(model="gpt-4o", messages=[...])
    # Returns standardized response with usage tracking

Actual:

    # LLMExecutor exists with basic execute() method
    # API is relatively simple, working as expected

Impact:
- Minimal - basic API exists and works
- May need better error handling and telemetry
Recommendation:
- Enhance with better error types
- Add retry logic documentation
- Improve telemetry integration
Priority: P3 (Low) - Works sufficiently for now
| Module | Gap Type | Method/Class | Priority | Impact | Recommendation |
|---|---|---|---|---|---|
| Meta-Orchestrator | Private API | `_analyze_task()` | P2 | Medium | Make public or add wrapper |
| Meta-Orchestrator | Naming | `_choose_composition_pattern()` | P3 | Low | Standardize naming |
| Meta-Orchestrator | Missing | `create_execution_plan()` | P1 | High | Extract method |
| Memory | Missing Class | `LongTermMemory` | P0 | Critical | Create class or fix docs |
| Memory | Unclear API | `UnifiedMemory` methods | P1 | High | Document complete API |
| Models | Design | `ModelRegistry` class | P2 | Medium | Consider OOP wrapper |
| Models | Incomplete | `FallbackPolicy` methods | P1 | High | Complete implementation |
| Models | Enhancement | `LLMExecutor` error handling | P3 | Low | Improve gradually |
What We Can Test:
- ✅ Full `analyze_and_compose()` flow
- ✅ `MODEL_REGISTRY` lookups (functional interface)
- ✅ Basic `LLMExecutor.execute()` calls
- ✅ `RedisShortTermMemory` (has mock mode)
What We Cannot Test Easily:
- ❌ Task analysis in isolation (private `_analyze_task`)
- ❌ Plan creation in isolation (embedded in `analyze_and_compose`)
- ❌ Long-term memory operations (no `LongTermMemory` class)
- ❌ Unified memory tier promotion (unclear API)
- ❌ Fallback policy logic (incomplete interface)
Quick Wins (1-2 days):

1. Make `_analyze_task()` public or add a test wrapper
2. Extract the `create_execution_plan()` method
3. Document the actual `UnifiedMemory` API

Medium Effort (3-5 days):

4. Create a `LongTermMemory` class or fix the architecture docs
5. Complete the `FallbackPolicy` API implementation
6. Add a `ModelRegistry` class wrapper for testability

Long Term (Sprint 2):

7. Standardize naming conventions
8. Improve error handling across all modules
9. Add comprehensive API documentation

✅ Functional Design (Models):
- `MODEL_REGISTRY` dict + `get_model()` function
- Simple, works well, easy to understand
- Trade-off: Less testable than OOP
✅ Private-First Design (Orchestrator):
- Public method (`analyze_and_compose`) provides the full flow
- Private methods (`_analyze_task`, `_select_agents`) are implementation details
- Trade-off: Harder to unit test individual steps
✅ Mock Mode (Memory):
- `RedisShortTermMemory(use_mock=True)`
- Excellent for testing without external dependencies
- Should be standard pattern across all external-dependency classes
🟠 Lack of Abstractions:
- No `MemoryBackend` interface
- No `ModelRegistry` class
- Makes dependency injection harder
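
To make the dependency-injection point concrete, a `MemoryBackend` interface could be as small as a two-method `typing.Protocol`. This is a sketch, not the framework's design: the method names are borrowed from the expected memory API above, and `InMemoryBackend` and `PatternCache` are hypothetical names introduced for the example.

```python
from typing import Any, Optional, Protocol


class MemoryBackend(Protocol):
    """Structural interface: any object with these two methods qualifies."""

    def store(self, key: str, value: Any) -> None: ...

    def retrieve(self, key: str) -> Optional[Any]: ...


class InMemoryBackend:
    """Dict-backed implementation suitable for unit tests."""

    def __init__(self) -> None:
        self._items: dict[str, Any] = {}

    def store(self, key: str, value: Any) -> None:
        self._items[key] = value

    def retrieve(self, key: str) -> Optional[Any]:
        return self._items.get(key)


class PatternCache:
    """Hypothetical consumer that receives its backend by injection."""

    def __init__(self, backend: MemoryBackend) -> None:
        self._backend = backend

    def remember(self, name: str, pattern: dict) -> None:
        self._backend.store(f"pattern:{name}", pattern)

    def recall(self, name: str) -> Optional[dict]:
        return self._backend.retrieve(f"pattern:{name}")


cache = PatternCache(InMemoryBackend())
cache.remember("retry", {"max_attempts": 3})
print(cache.recall("retry"))
```

A Redis-backed or file-backed class with the same two methods could be swapped in without changing `PatternCache`.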
🟠 Unclear APIs:
- `UnifiedMemory` methods not fully documented
- `FallbackPolicy` interface incomplete
- Leads to test assumptions being wrong
🟠 Private Method Testing:
- Many important methods are private
- Makes unit testing harder
- Consider test-friendly design

Recommendations, highest priority first:

1. Clarify LongTermMemory:
   - Either create a `LongTermMemory` class OR
   - Update docs to explain the actual persistent storage design
   - Blocker: Cannot test memory system without understanding this
2. Document UnifiedMemory API:
   - List all public methods
   - Add type hints
   - Add usage examples
   - Blocker: Tests currently guessing at API
3. Extract `create_execution_plan()`:
   - Makes orchestrator more testable
   - Separates concerns
   - Enables easier mocking
4. Complete FallbackPolicy:
   - Ensure all expected methods exist
   - Document fallback chain logic
   - Critical for production resilience
5. Public Test Wrappers:
   - Add `analyze_task()` public wrapper
   - Or document testing strategy for private methods
6. ModelRegistry Class:
   - Wrap functional interface in class
   - Improves testability
   - Optional - functional works fine
7. Standardize Naming:
   - `_select_*` vs `_choose_*`
   - Document naming conventions
   - Apply consistently
8. Enhanced Error Handling:
   - Better exception types
   - Retry documentation
   - Telemetry improvements

For Current Tests:
- Update imports to match actual API
- Test public methods that exist
- Document assumptions for private methods
- Run tests to get baseline coverage
For Architectural Gaps:
- Create a separate test file: `test_architectural_assumptions.py`
- Use `@pytest.mark.skip(reason="API not implemented")` for missing features
- Keep tests as architectural specifications for future work
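
A skipped specification test in `test_architectural_assumptions.py` might look like the sketch below. The test name and the stored payload are illustrative; the import path and constructor call are the expected (not yet existing) API from the memory gap above, which is why the import sits inside the test body.

```python
import pytest


@pytest.mark.skip(reason="API not implemented")
def test_long_term_memory_store():
    """Expected API: LongTermMemory(storage_path=...).store(key, data)."""
    # Imported inside the test so collecting this file does not fail
    # while the class is still missing.
    from empathy_os.memory.long_term import LongTermMemory

    memory = LongTermMemory(storage_path="/path")
    memory.store("key", {"value": 1})
```

Once the class exists, removing the decorator turns the specification into a live test.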
Design Pattern:
- ✅ Test public APIs as they exist today
- ✅ Document expected APIs in test docstrings
- ✅ Skip tests for missing features
- ✅ Use tests as executable specifications
Benefits:
- Tests serve as both validation AND documentation
- Easy to enable tests when features are added
- Clear backlog of architectural work needed
Good News:
- Core functionality exists and works
- Implementation is solid
- Gaps are mostly about testability and architectural cleanliness, not missing features
Action Items:
- Fix test imports (this sprint - immediate)
- Clarify memory architecture (P0 - critical)
- Document actual APIs (P1 - high priority)
- Extract testable methods (P1 - high priority)
- Create architectural improvement backlog (P2-P3 - future)
Coverage Impact:
- With fixed tests: Expect 10-15 percentage points improvement
- With architectural fixes: Expect 25-35 percentage points improvement
- Full implementation of ideal architecture: 80%+ coverage achievable
Status: ✅ Analysis Complete - Ready for Implementation
Next Steps: Fix test imports and re-run for baseline coverage
Document Version: 1.0
Last Updated: January 16, 2026