---
description: OOP Refactoring Plan - Architectural Consistency: **Date:** January 16, 2026 **Sprint:** Production Readiness - Phase 2B **Goal:** Refactor functional interface
---
Date: January 16, 2026 Sprint: Production Readiness - Phase 2B Goal: Refactor functional interfaces to OOP for consistency, testability, and maintainability
Problem: Framework uses OOP throughout except for 3 critical modules that use functional design:
models/registry.py- Dict + functions instead of classmemory/long_term.py- No class interfaceorchestration/meta_orchestrator.py- Private methods reduce testability
Solution: Refactor to OOP while maintaining 100% backward compatibility
Timeline: 2 days (16 hours) Impact: Enables 200+ architectural tests, improves testability, establishes consistency
- Memory System - Create proper class interfaces
- Blocking 70+ tests
- Critical for production
- Model Registry - Wrap functional interface
- Enables 50+ tests
- Improves testability significantly
- Meta-Orchestrator - Extract testable methods
- Enables remaining orchestration tests
- Better separation of concerns
Current State:
# memory/long_term.py
# No LongTermMemory class exists!
# File contains: SecurePattern, PatternMetadata, Classification
# memory/unified.py
class UnifiedMemory:
def __init__(self, ...):
self.short_term = RedisShortTermMemory(...)
self.long_term = ??? # What is this?Target State:
# memory/long_term.py
class LongTermMemory:
"""Persistent memory storage with classification."""
def __init__(self, storage_path: str = "./memory"):
self._storage_path = Path(storage_path)
self._cache: dict[str, Any] = {}
def store(self, key: str, data: dict, classification: str = "INTERNAL") -> None:
"""Store data with classification."""
def retrieve(self, key: str) -> dict | None:
"""Retrieve data by key."""
def delete(self, key: str) -> bool:
"""Delete data by key."""
def list_keys(self, classification: str | None = None) -> list[str]:
"""List all keys, optionally filtered by classification."""
# memory/unified.py
class UnifiedMemory:
"""Two-tier memory with short-term (Redis) and long-term (persistent)."""
def __init__(self, use_mock_redis: bool = False, storage_path: str = "./memory"):
self.short_term = RedisShortTermMemory(use_mock=use_mock_redis)
self.long_term = LongTermMemory(storage_path=storage_path)
def store(self, key: str, data: dict, ttl: int | None = None) -> None:
"""Store in appropriate tier based on TTL."""
if ttl:
self.short_term.stash(key, data, ttl=ttl)
else:
self.long_term.store(key, data)
def retrieve(self, key: str) -> dict | None:
"""Retrieve from short-term first, then long-term."""
result = self.short_term.retrieve(key)
if result is None:
result = self.long_term.retrieve(key)
return result
def promote_to_long_term(self, key: str) -> bool:
"""Promote short-term memory to long-term."""
data = self.short_term.retrieve(key)
if data:
self.long_term.store(key, data)
return True
return False
def delete(self, key: str) -> bool:
"""Delete from both tiers."""
st_deleted = self.short_term.delete(key)
lt_deleted = self.long_term.delete(key)
return st_deleted or lt_deletedImplementation Steps:
-
Create
LongTermMemoryclass (2 hours)# File: src/empathy_os/memory/long_term.py - Move existing SecurePattern, PatternMetadata to separate file - Create LongTermMemory class with CRUD operations - Use JSON file storage for MVP - Add classification support - Add comprehensive docstrings
-
Update
UnifiedMemory(1 hour)# File: src/empathy_os/memory/unified.py - Initialize LongTermMemory in __init__ - Implement store/retrieve/delete with tier logic - Add promote_to_long_term method - Add sync_tiers method - Document all public methods
-
Add backward compatibility shims (30 min)
# If old code expects different interface # Add compatibility functions
-
Enable memory architecture tests (30 min)
# tests/unit/memory/test_memory_architecture.py - Remove placeholder: LongTermMemory = None - Uncomment import - Run tests, fix failures
Success Criteria:
- ✅
LongTermMemoryclass exists with CRUD operations - ✅
UnifiedMemoryuses both tiers correctly - ✅ 70+ memory tests pass
- ✅ No breaking changes to existing code
Current State:
# models/registry.py (functional design)
MODEL_REGISTRY: dict[str, dict[str, ModelInfo]] = {
"anthropic": {
"cheap": ModelInfo(...),
"capable": ModelInfo(...),
"premium": ModelInfo(...),
},
"openai": {...},
}
def get_model(provider: str, tier: str) -> ModelInfo | None:
"""Get model by provider and tier."""
return MODEL_REGISTRY.get(provider, {}).get(tier)
def get_model_by_id(model_id: str) -> ModelInfo | None:
"""Get model by ID."""
for provider_models in MODEL_REGISTRY.values():
for model in provider_models.values():
if model.id == model_id:
return model
return NoneTarget State:
# models/registry.py (OOP design with backward compatibility)
class ModelRegistry:
"""Registry for LLM models with tier-based routing.
Provides OOP interface for model management while maintaining
backward compatibility with functional interface.
"""
def __init__(self, registry: dict | None = None):
"""Initialize registry with model definitions.
Args:
registry: Optional custom registry (defaults to MODEL_REGISTRY)
"""
self._registry = registry or MODEL_REGISTRY
self._by_id_cache: dict[str, ModelInfo] = {}
self._build_id_cache()
def _build_id_cache(self) -> None:
"""Build cache for fast ID lookup."""
for provider_models in self._registry.values():
for model in provider_models.values():
self._by_id_cache[model.id] = model
def get_model(self, provider: str, tier: str) -> ModelInfo | None:
"""Get model by provider and tier.
Args:
provider: Provider name (anthropic, openai, etc.)
tier: Model tier (cheap, capable, premium)
Returns:
ModelInfo if found, None otherwise
"""
return self._registry.get(provider, {}).get(tier)
def get_model_by_id(self, model_id: str) -> ModelInfo | None:
"""Get model by ID (fast O(1) lookup).
Args:
model_id: Model identifier (e.g., "claude-sonnet-3-5")
Returns:
ModelInfo if found, None otherwise
"""
return self._by_id_cache.get(model_id)
def get_all_models(self) -> dict[str, dict[str, ModelInfo]]:
"""Get all models grouped by provider and tier."""
return self._registry
def get_models_by_tier(self, tier: str) -> list[ModelInfo]:
"""Get all models for a specific tier across all providers.
Args:
tier: Model tier (cheap, capable, premium)
Returns:
List of ModelInfo for the tier
"""
models = []
for provider_models in self._registry.values():
if tier in provider_models:
models.append(provider_models[tier])
return models
def get_providers(self) -> list[str]:
"""Get list of available providers."""
return list(self._registry.keys())
def get_tiers(self) -> list[str]:
"""Get list of available tiers."""
tiers = set()
for provider_models in self._registry.values():
tiers.update(provider_models.keys())
return sorted(tiers)
# Backward compatibility - default instance
_default_registry = ModelRegistry()
def get_model(provider: str, tier: str) -> ModelInfo | None:
"""Get model by provider and tier (backward compatible).
Args:
provider: Provider name
tier: Model tier
Returns:
ModelInfo if found, None otherwise
"""
return _default_registry.get_model(provider, tier)
def get_model_by_id(model_id: str) -> ModelInfo | None:
"""Get model by ID (backward compatible).
Args:
model_id: Model identifier
Returns:
ModelInfo if found, None otherwise
"""
return _default_registry.get_model_by_id(model_id)Implementation Steps:
-
Create
ModelRegistryclass (2 hours)# File: src/empathy_os/models/registry.py - Define ModelRegistry class - Move logic from functions to methods - Add ID cache for performance - Add utility methods (get_providers, get_tiers, etc.) - Comprehensive docstrings
-
Add backward compatibility (30 min)
- Create _default_registry instance - Keep functional interface as wrappers - Ensure zero breaking changes
-
Enable model tests (30 min)
# tests/unit/models/test_execution_and_fallback_architecture.py - Remove placeholder: ModelRegistry = None - Import ModelRegistry - Update tests to use class - Run tests, fix failures
Success Criteria:
- ✅
ModelRegistryclass with full API - ✅ Functional interface still works (backward compatible)
- ✅ 50+ model tests pass
- ✅ Performance maintained (ID cache)
Current State:
# orchestration/meta_orchestrator.py
class MetaOrchestrator:
def analyze_and_compose(self, task: str, context: dict) -> ExecutionPlan:
"""Public entry point - does everything."""
requirements = self._analyze_task(task, context) # Private
agents = self._select_agents(requirements) # Private
strategy = self._choose_composition_pattern(requirements, agents) # Private
# Plan creation embedded here
plan = ExecutionPlan(
agents=agents,
strategy=strategy,
quality_gates=requirements.quality_gates,
estimated_cost=self._estimate_cost(agents),
estimated_duration=self._estimate_duration(agents, strategy),
)
return planTarget State:
# orchestration/meta_orchestrator.py
class MetaOrchestrator:
def analyze_and_compose(self, task: str, context: dict) -> ExecutionPlan:
"""Public entry point - orchestrates the full flow."""
requirements = self.analyze_task(task, context) # Now public
plan = self.create_execution_plan(requirements) # Extracted
return plan
def analyze_task(self, task: str, context: dict) -> TaskRequirements:
"""Analyze task to extract requirements (public for testing).
Args:
task: Task description
context: Optional context dictionary
Returns:
TaskRequirements with complexity, domain, capabilities
"""
if not task or not isinstance(task, str):
raise ValueError("task must be a non-empty string")
complexity = self._classify_complexity(task)
domain = self._classify_domain(task)
capabilities = self._extract_capabilities(domain, context)
return TaskRequirements(
complexity=complexity,
domain=domain,
capabilities_needed=capabilities,
parallelizable=self._is_parallelizable(task),
quality_gates=self._extract_quality_gates(task, context),
context=context,
)
def create_execution_plan(self, requirements: TaskRequirements) -> ExecutionPlan:
"""Create execution plan from analyzed requirements (testable).
Args:
requirements: Task requirements from analyze_task()
Returns:
ExecutionPlan with agents, strategy, costs
"""
agents = self._select_agents(requirements)
strategy = self._choose_composition_pattern(requirements, agents)
return ExecutionPlan(
agents=agents,
strategy=strategy,
quality_gates=requirements.quality_gates,
estimated_cost=self._estimate_cost(agents),
estimated_duration=self._estimate_duration(agents, strategy),
)
# Keep _classify_complexity, _classify_domain, etc. as private helpersImplementation Steps:
-
Extract
analyze_task()public method (1 hour)# File: src/empathy_os/orchestration/meta_orchestrator.py - Rename _analyze_task to analyze_task (public) - Keep internal helpers private (_classify_complexity, etc.) - Add comprehensive docstrings - Update analyze_and_compose to use public method -
Extract
create_execution_plan()method (1 hour)- Create public create_execution_plan(requirements) - Move plan creation logic from analyze_and_compose - Update analyze_and_compose to call it - Add docstrings
-
Enable orchestration tests (1 hour)
# tests/unit/orchestration/test_meta_orchestration_architecture.py - Update tests to use analyze_task() instead of _analyze_task() - Update tests to use create_execution_plan() - Remove @pytest.mark.skip decorators - Run tests, fix failures
Success Criteria:
- ✅
analyze_task()is public and testable - ✅
create_execution_plan()is extracted - ✅ 80+ orchestration tests pass
- ✅ No breaking changes to analyze_and_compose()
9:00 AM - 11:00 AM: Create LongTermMemory
- Create
src/empathy_os/memory/long_term.py(new implementation) - Define
LongTermMemoryclass with CRUD operations - Implement JSON file storage
- Add classification support
- Write comprehensive docstrings
11:00 AM - 12:00 PM: Update UnifiedMemory
- Update
src/empathy_os/memory/unified.py - Initialize LongTermMemory in constructor
- Implement tier logic (store, retrieve, delete)
- Add promote_to_long_term, sync_tiers methods
12:00 PM - 1:00 PM: Enable Memory Tests
- Lunch + fix test imports
- Enable memory architecture tests
- Run tests, fix failures
- Verify 70+ tests pass
1:00 PM - 3:00 PM: Create ModelRegistry
- Update
src/empathy_os/models/registry.py - Define
ModelRegistryclass - Implement all methods (get_model, get_model_by_id, etc.)
- Add ID cache for performance
- Write comprehensive docstrings
3:00 PM - 3:30 PM: Backward Compatibility
- Create _default_registry instance
- Keep functional interface as wrappers
- Test backward compatibility
3:30 PM - 5:00 PM: Enable Model Tests
- Enable model architecture tests
- Fix test failures
- Verify 50+ tests pass
- Run coverage report
9:00 AM - 10:00 AM: Extract analyze_task()
- Make
_analyze_task()public - Update analyze_and_compose to call it
- Add docstrings
10:00 AM - 11:00 AM: Extract create_execution_plan()
- Create
create_execution_plan()method - Move logic from analyze_and_compose
- Update analyze_and_compose to call it
11:00 AM - 1:00 PM: Enable Orchestration Tests
- Update tests to use public methods
- Remove @pytest.mark.skip decorators
- Fix test failures
- Verify 80+ tests pass
1:00 PM - 3:00 PM: Full Test Suite
- Run ALL 200+ architectural tests
- Fix remaining failures
- Verify coverage improvement
3:00 PM - 4:00 PM: Coverage Measurement
- Run pytest with coverage on all 3 modules
- Generate HTML report
- Document coverage improvements
4:00 PM - 5:00 PM: Documentation
- Update ARCHITECTURAL_GAPS_ANALYSIS.md (mark as resolved)
- Create REFACTORING_RESULTS.md
- Update CRITICAL_TEST_GAPS.md with new coverage
# Test that old code still works
# Memory
python -c "from empathy_os.memory.unified import UnifiedMemory; m = UnifiedMemory()"
# Models (functional interface)
python -c "from empathy_os.models.registry import get_model; print(get_model('anthropic', 'cheap'))"
# Orchestrator
python -c "from empathy_os.orchestration.meta_orchestrator import MetaOrchestrator; o = MetaOrchestrator(); p = o.analyze_and_compose('test task')"# Test new OOP interfaces
# Memory
python -c "from empathy_os.memory.long_term import LongTermMemory; m = LongTermMemory(); m.store('key', {'data': 'value'})"
# Models (class interface)
python -c "from empathy_os.models.registry import ModelRegistry; r = ModelRegistry(); print(r.get_model('anthropic', 'cheap'))"
# Orchestrator (public methods)
python -c "from empathy_os.orchestration.meta_orchestrator import MetaOrchestrator; o = MetaOrchestrator(); r = o.analyze_task('test task', {})"# Run all architectural tests
pytest tests/unit/orchestration/test_meta_orchestration_architecture.py -v
pytest tests/unit/memory/test_memory_architecture.py -v
pytest tests/unit/models/test_execution_and_fallback_architecture.py -v
# Expected results:
# - 200+ tests executable
# - 180+ tests passing (90%+)
# - 20 tests skipped (fallback policies, etc.)Mitigation:
- Keep all functional interfaces as wrappers
- Create _default_registry instances
- Comprehensive backward compatibility tests
Rollback Plan:
- All changes in feature branch
- Can revert easily if issues arise
- Old functional interfaces never removed
Mitigation:
- Add caching where needed (ModelRegistry ID cache)
- Profile before/after
- Benchmark critical paths
Acceptance:
- < 5% performance degradation acceptable
- If > 5%, optimize or revert
Mitigation:
- This is GOOD - find bugs now, not in production
- Fix bugs as discovered
- Document learnings
Acceptance:
- Finding bugs is success, not failure
- Better now than in production
Before Refactoring:
- Meta-orchestrator: 22.53%
- Memory (unified): 27.39%
- Memory (short-term): 18.80%
- Memory (long-term): N/A (no class)
- Models (registry): 60.87%
- Models (fallback): 21.07%
- Overall Critical Paths: ~25%
After Refactoring (Target):
- Meta-orchestrator: 75%+ (enable 80 tests)
- Memory (unified): 85%+ (enable 70 tests)
- Memory (long-term): 80%+ (new class)
- Models (registry): 85%+ (enable 50 tests)
- Overall Critical Paths: 80%+
Expected Overall Framework Coverage:
- Before: 54.67%
- After: 70-75% (+15-20 points)
- 200+ architectural tests written
- 180+ tests passing
- 20 tests skipped (documented gaps)
- 0 placeholder classes (ModelRegistry = None, etc.)
- ✅ OOP consistency across all modules
- ✅ Testable public APIs
- ✅ Backward compatibility maintained
- ✅ Clear separation of concerns
- ✅ Comprehensive documentation
-
src/empathy_os/memory/long_term.py(NEW)- LongTermMemory class with CRUD operations
-
src/empathy_os/memory/unified.py(MODIFIED)- Initialize LongTermMemory
- Complete tier logic
-
src/empathy_os/models/registry.py(MODIFIED)- ModelRegistry class
- Backward compatible functional wrappers
-
src/empathy_os/orchestration/meta_orchestrator.py(MODIFIED)- Public analyze_task()
- Extracted create_execution_plan()
-
tests/unit/memory/test_memory_architecture.py(ENABLED)- Remove placeholders
- Enable 70+ tests
-
tests/unit/models/test_execution_and_fallback_architecture.py(ENABLED)- Remove placeholders
- Enable 50+ tests
-
tests/unit/orchestration/test_meta_orchestration_architecture.py(ENABLED)- Remove @pytest.mark.skip
- Enable 80+ tests
-
docs/REFACTORING_RESULTS.md(NEW)- Before/after metrics
- Coverage improvements
- Lessons learned
-
docs/ARCHITECTURAL_GAPS_ANALYSIS.md(UPDATED)- Mark gaps as resolved
- Document solutions
-
docs/CRITICAL_TEST_GAPS.md(UPDATED)- Update coverage numbers
- Mark P0 items complete
- Create LongTermMemory
- Update UnifiedMemory
- Enable memory tests
- Checkpoint: 70+ memory tests passing
- Create ModelRegistry class
- Add backward compatibility
- Enable model tests
- Checkpoint: 50+ model tests passing
- Extract public methods
- Enable orchestration tests
- Checkpoint: 80+ orchestration tests passing
- Run full test suite (200+ tests)
- Measure coverage improvement
- Create documentation
- Checkpoint: Coverage at 70-75%, production ready
- Continue with Phase 3 of original sprint plan
- Test CLI integration
- Test workflow base class
- Test real tools integration
- Add FallbackPolicy class (P1 gap)
- Implement learning loop (P2)
- Add advanced caching (P2)
- Performance optimization (P3)
Status: ✅ Plan Complete - Ready for Execution Estimated Effort: 16 hours over 2 days Expected ROI: +15-20 percentage points coverage, 200+ tests enabled Risk Level: Low (backward compatible, can rollback)
Approval Required: YES Ready to Start: YES Next Action: Execute Day 1 Morning - Create LongTermMemory class