Created: January 30, 2026
Status: Planning Phase
Goal: Refactor the remaining 6 files from the top-10 list to enable comprehensive test coverage
This document outlines refactoring strategies for the 6 remaining files in the top-10 largest files list. Combined, these files contain 10,438 lines of code (the sum of the per-file counts listed below) that need to be broken down into testable modules (<500 lines each).
Target Files:
- telemetry/cli.py (1,936 lines) - CLI command implementations
- workflows/test_gen.py (1,917 lines) - Test generation workflow
- meta_workflows/cli_meta_workflows.py (1,809 lines) - Meta-workflow CLI
- models/telemetry.py (1,660 lines) - Telemetry data models
- workflows/document_gen.py (1,605 lines) - Documentation generation
- core.py (1,511 lines) - Core framework functionality
Combined Impact:
- Current: 10,438 total lines across 6 files
- Target: ~40 focused modules as enumerated in this plan (<500 lines each)
- Expected reduction: 79-97% in the original files (per-file estimates below)
- Test coverage gain: 500+ new behavioral tests
Function Analysis (telemetry/cli.py):
cmd_file_test_dashboard 426 lines (22%)
cmd_telemetry_dashboard 264 lines (14%)
cmd_file_test_status 184 lines (9%)
cmd_telemetry_cache_stats 127 lines (7%)
cmd_telemetry_compare 120 lines (6%)
cmd_sonnet_opus_analysis 116 lines (6%)
cmd_telemetry_show 108 lines (6%)
cmd_telemetry_export 102 lines (5%)
[+ 7 more smaller commands]
Key Issues:
- Massive HTML templates embedded in dashboard functions (690 lines total)
- 15 command functions with no logical grouping
- Duplicate _validate_file_path utility (exists in config.py)
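One way to address the embedded HTML is to lift each template into a module-level constant and render it with the standard library's `string.Template`, so the command function shrinks to data gathering plus a single render call. The sketch below is illustrative only: `DASHBOARD_TEMPLATE`, `render_dashboard`, and the placeholder names are hypothetical and not taken from the existing code.

```python
from html import escape
from string import Template

# Hypothetical template; the real dashboards embed far larger HTML documents.
DASHBOARD_TEMPLATE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>Total calls: $total_calls</p></body></html>"
)


def render_dashboard(title: str, total_calls: int) -> str:
    """Render the dashboard HTML from values the command has already computed."""
    # Escape free-form text so it cannot inject markup into the page.
    return DASHBOARD_TEMPLATE.substitute(
        title=escape(title),
        total_calls=total_calls,
    )
```

Keeping the template separate from the command logic also means the rendering step can be tested without opening a browser or touching telemetry storage.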
New Modules:
- Remove duplicate validation
  - Delete _validate_file_path (lines 30-69)
  - Import from empathy_os.config instead
- telemetry/commands/core_commands.py (~250 lines)
  - cmd_telemetry_show
  - cmd_telemetry_savings
  - cmd_telemetry_reset
  - Dependencies: UsageTracker, rich (optional)
- telemetry/commands/export_commands.py (~150 lines)
  - cmd_telemetry_export
  - Export to CSV/JSON functionality
  - Dependencies: csv, json, Path validation
- telemetry/commands/cache_commands.py (~150 lines)
  - cmd_telemetry_cache_stats
  - Cache hit/miss analysis
  - Dependencies: UsageTracker, rich
- telemetry/commands/compare_commands.py (~130 lines)
  - cmd_telemetry_compare
  - Comparison functionality between time periods
  - Dependencies: UsageTracker, rich, datetime
- telemetry/commands/analysis_commands.py (~300 lines)
  - cmd_sonnet_opus_analysis
  - cmd_agent_performance
  - Dependencies: TelemetryAnalytics, rich
- telemetry/commands/status_commands.py (~400 lines)
  - cmd_tier1_status
  - cmd_task_routing_report
  - cmd_test_status
  - Dependencies: TelemetryAnalytics, rich
- telemetry/commands/dashboard_commands.py (~900 lines)
  - cmd_telemetry_dashboard (HTML template)
  - cmd_file_test_dashboard (HTML template)
  - cmd_file_test_status
  - Dependencies: tempfile, webbrowser, Counter
- telemetry/cli.py (updated, ~50 lines)
  - Import all commands from submodules
  - Command registry/routing
  - Backward compatibility via re-exports
Before:
cli.py (1,936 lines) - monolithic
After:
cli.py (50 lines) - router
commands/
├── core_commands.py (250 lines)
├── export_commands.py (150 lines)
├── cache_commands.py (150 lines)
├── compare_commands.py (130 lines)
├── analysis_commands.py (300 lines)
├── status_commands.py (400 lines)
└── dashboard_commands.py (900 lines)
Impact:
- 97% line reduction in main file
- Each module <500 lines (testable), except dashboard_commands.py (~900 lines, dominated by the embedded HTML templates)
- Clear separation by functionality
- Reusable command components
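The slimmed-down telemetry/cli.py is described as a command registry plus routing. A minimal sketch of that idea follows, assuming each handler accepts the parsed `argparse` namespace and returns an exit code. The handler bodies, the `COMMANDS` dictionary, and the `--limit` flag are stand-ins invented for illustration; in the proposed layout the handlers would be imported from telemetry/commands/ rather than defined inline.

```python
import argparse
from typing import Callable, Dict, List, Optional


# Stand-ins for handlers that would live in telemetry/commands/*.py.
def cmd_telemetry_show(args: argparse.Namespace) -> int:
    print(f"showing last {args.limit} entries")
    return 0


def cmd_telemetry_reset(args: argparse.Namespace) -> int:
    print("telemetry reset")
    return 0


# Hypothetical registry mapping a command name to its handler.
COMMANDS: Dict[str, Callable[[argparse.Namespace], int]] = {
    "show": cmd_telemetry_show,
    "reset": cmd_telemetry_reset,
}


def main(argv: Optional[List[str]] = None) -> int:
    """Parse the command line and dispatch to the registered handler."""
    parser = argparse.ArgumentParser(prog="telemetry")
    parser.add_argument("command", choices=sorted(COMMANDS))
    parser.add_argument("--limit", type=int, default=20)
    args = parser.parse_args(argv)
    return COMMANDS[args.command](args)
```

Because the handlers remain importable names in cli.py, existing code that does `from ...cli import cmd_telemetry_show` would keep working, which is the re-export behaviour the plan calls for.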
Complexity (workflows/test_gen.py):
- Single monolithic test generation workflow
- AST parsing and analysis
- Template rendering
- Multiple test patterns
- Automated test generation failed on this file (too complex)
New Modules:
- workflows/test_gen/ast_analyzer.py (~300 lines)
  - AST parsing and function extraction
  - Complexity analysis
  - Dependency detection
- workflows/test_gen/test_templates.py (~400 lines)
  - Template definitions for different test types
  - Parametrized test generation
  - Fixture templates
- workflows/test_gen/test_patterns.py (~300 lines)
  - Pattern matching for test types
  - Edge case detection
  - Test case generation logic
- workflows/test_gen/code_generator.py (~250 lines)
  - Code generation from templates
  - Import management
  - Formatting and validation
- workflows/test_gen/validation.py (~200 lines)
  - Syntax validation
  - pytest collection validation
  - AST verification
- workflows/test_gen/workflow.py (~400 lines)
  - Main TestGenerationWorkflow class
  - Orchestrates all components
  - Tier routing logic
- workflows/test_gen/__init__.py (~50 lines)
  - Backward compatible imports
  - Public API exports
Before:
test_gen.py (1,917 lines) - monolithic workflow
After:
test_gen/
├── __init__.py (50 lines)
├── workflow.py (400 lines) - orchestration
├── ast_analyzer.py (300 lines)
├── test_templates.py (400 lines)
├── test_patterns.py (300 lines)
├── code_generator.py (250 lines)
└── validation.py (200 lines)
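To make the ast_analyzer.py responsibility concrete, here is a rough sketch of function extraction with a crude complexity estimate built on the standard `ast` module. `FunctionInfo`, `extract_functions`, and the branch-counting heuristic are assumptions made for this example; the existing workflow may analyze code quite differently.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class FunctionInfo:
    name: str
    lineno: int
    arg_names: List[str]
    branch_count: int  # crude proxy, not true cyclomatic complexity


# Node types counted as branches for the rough complexity estimate.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)


def extract_functions(source: str) -> List[FunctionInfo]:
    """Parse source and describe every function definition found in it."""
    tree = ast.parse(source)
    results = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Branches in nested functions are counted toward the outer one too.
            branches = sum(isinstance(n, _BRANCH_NODES) for n in ast.walk(node))
            results.append(
                FunctionInfo(
                    name=node.name,
                    lineno=node.lineno,
                    arg_names=[a.arg for a in node.args.args],
                    branch_count=branches,
                )
            )
    return sorted(results, key=lambda info: info.lineno)
```

A pure function from source text to plain data like this is easy to cover with behavioral tests, since it needs no files, models, or workflow state.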
Impact:
- 79% reduction in main workflow file
- Testable components in isolation
- Easier to extend with new patterns
- Better error handling
Complexity (meta_workflows/cli_meta_workflows.py):
- CLI interface for meta-workflows
- Multiple command handlers
- Workflow orchestration
- Similar structure to telemetry/cli.py
New Modules:
- meta_workflows/commands/workflow_commands.py (~300 lines)
  - Core workflow execution commands
  - Workflow listing and status
- meta_workflows/commands/orchestration_commands.py (~350 lines)
  - Multi-agent orchestration commands
  - Coordination pattern commands
- meta_workflows/commands/analysis_commands.py (~250 lines)
  - Workflow analysis and reporting
  - Performance metrics
- meta_workflows/commands/config_commands.py (~200 lines)
  - Configuration management
  - Template management
- meta_workflows/commands/interactive_commands.py (~400 lines)
  - Interactive workflow creation
  - Socratic questioning interface
- meta_workflows/cli_meta_workflows.py (updated, ~200 lines)
  - Command routing
  - Imports from submodules
  - Backward compatibility
Before:
cli_meta_workflows.py (1,809 lines)
After:
cli_meta_workflows.py (200 lines)
commands/
├── workflow_commands.py (300 lines)
├── orchestration_commands.py (350 lines)
├── analysis_commands.py (250 lines)
├── config_commands.py (200 lines)
└── interactive_commands.py (400 lines)
Impact:
- 89% reduction in main CLI file
- Modular command structure
- Easier to add new commands
- Better testability
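One pattern that supports "easier to add new commands" is to have every command module expose a registration function that attaches its own sub-commands to a shared `argparse` parser. The sketch below assumes the CLI is argparse-based, which this plan does not state; the `register_*` functions, sub-command names, and `run` helper are all hypothetical and inlined here only so the example runs on its own.

```python
import argparse
from typing import List, Optional


# In the proposed layout each register_* function would live in its own
# module under meta_workflows/commands/.
def register_workflow_commands(subparsers) -> None:
    parser = subparsers.add_parser("list", help="List available workflows")
    parser.set_defaults(handler=lambda args: "listing workflows")


def register_config_commands(subparsers) -> None:
    parser = subparsers.add_parser("config", help="Show configuration")
    parser.add_argument("--key", default="all")
    parser.set_defaults(handler=lambda args: f"config for {args.key}")


def build_parser() -> argparse.ArgumentParser:
    """Assemble the top-level parser from each module's registration hook."""
    parser = argparse.ArgumentParser(prog="meta-workflows")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for register in (register_workflow_commands, register_config_commands):
        register(subparsers)
    return parser


def run(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

With this shape, adding a command group means writing one new module and appending its registration function to the tuple in `build_parser`.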
Complexity (models/telemetry.py):
- Data models for telemetry
- Analytics classes
- Storage interfaces
- Statistics calculations
New Modules:
- models/telemetry/data_models.py (~300 lines)
  - Core dataclasses (TelemetryEntry, etc.)
  - Validation logic
  - Serialization methods
- models/telemetry/analytics.py (~400 lines)
  - TelemetryAnalytics class
  - Statistical calculations
  - Aggregation logic
- models/telemetry/storage.py (~250 lines)
  - Storage interface
  - Persistence logic
  - Query methods
- models/telemetry/tier1_analytics.py (~300 lines)
  - Tier 1 specific analytics
  - Task routing analysis
  - Test execution metrics
- models/telemetry/reporting.py (~250 lines)
  - Report generation
  - Data formatting
  - Export utilities
- models/telemetry/__init__.py (~150 lines)
  - Backward compatible imports
  - Public API
  - Factory functions
Before:
telemetry.py (1,660 lines)
After:
telemetry/
├── __init__.py (150 lines)
├── data_models.py (300 lines)
├── analytics.py (400 lines)
├── storage.py (250 lines)
├── tier1_analytics.py (300 lines)
└── reporting.py (250 lines)
Impact:
- 91% reduction in main file
- Clear separation of concerns
- Independent model testing
- Easier to extend
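As an illustration of what data_models.py could hold, the sketch below combines a dataclass, validation in `__post_init__`, and dictionary round-tripping. Only the class name `TelemetryEntry` comes from this plan; every field name and validation rule is an assumption made for the example.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class TelemetryEntry:
    # Field names are assumptions for this sketch, not the real schema.
    workflow: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float

    def __post_init__(self) -> None:
        # Validation logic: reject values that can never be correct.
        if self.input_tokens < 0 or self.output_tokens < 0:
            raise ValueError("token counts must be non-negative")
        if self.cost_usd < 0:
            raise ValueError("cost_usd must be non-negative")

    def to_dict(self) -> Dict[str, Any]:
        """Serialize to a plain dict suitable for JSON storage."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TelemetryEntry":
        """Rebuild an entry; validation runs again via __post_init__."""
        return cls(**data)
```

Keeping models free of analytics and storage code is what makes the "independent model testing" goal achievable: a round-trip test needs nothing beyond the class itself.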
Complexity (workflows/document_gen.py):
- Documentation generation workflow
- Multiple documentation types
- Template rendering
- File I/O operations
New Modules:
- workflows/document_gen/code_analyzer.py (~300 lines)
  - Code analysis for documentation
  - Docstring extraction
  - API discovery
- workflows/document_gen/doc_templates.py (~350 lines)
  - Documentation templates
  - Markdown generation
  - Format utilities
- workflows/document_gen/api_docs.py (~250 lines)
  - API documentation generation
  - Function/class documentation
  - Parameter documentation
- workflows/document_gen/tutorial_gen.py (~250 lines)
  - Tutorial generation
  - Example extraction
  - Step-by-step guides
- workflows/document_gen/mkdocs_integration.py (~200 lines)
  - MkDocs configuration
  - Navigation generation
  - Site structure
- workflows/document_gen/workflow.py (~200 lines)
  - Main DocumentGenerationWorkflow
  - Orchestrates components
  - Tier routing
- workflows/document_gen/__init__.py (~50 lines)
  - Backward compatible imports
Before:
document_gen.py (1,605 lines)
After:
document_gen/
├── __init__.py (50 lines)
├── workflow.py (200 lines)
├── code_analyzer.py (300 lines)
├── doc_templates.py (350 lines)
├── api_docs.py (250 lines)
├── tutorial_gen.py (250 lines)
└── mkdocs_integration.py (200 lines)
Impact:
- 88% reduction in main workflow
- Reusable documentation components
- Easier to add new doc types
- Better separation of concerns
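The split between code_analyzer.py (docstring extraction) and doc_templates.py (Markdown generation) can be pictured as two small pure functions. This is a minimal sketch under that assumption; `extract_docstrings` and `to_markdown` are hypothetical names, and the Markdown layout is invented for the example.

```python
import ast
from typing import Dict


def extract_docstrings(source: str) -> Dict[str, str]:
    """Map the module and each documented top-level def/class to its docstring."""
    tree = ast.parse(source)
    docs: Dict[str, str] = {}
    module_doc = ast.get_docstring(tree)
    if module_doc:
        docs["<module>"] = module_doc
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node)
            if doc:
                docs[node.name] = doc
    return docs


def to_markdown(docs: Dict[str, str]) -> str:
    """Render extracted docstrings as simple Markdown sections."""
    lines = []
    for name, doc in docs.items():
        lines.append(f"### {name}")
        lines.append(doc)
        lines.append("")
    return "\n".join(lines)
```

Because the analyzer returns plain data, a new documentation type would only need a new renderer rather than changes to the analysis step.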
Complexity (core.py):
- Core framework functionality
- Already has 41 behavioral tests
- Mixed concerns
- Needs additional cleanup
Note: This file already has test coverage, so refactoring is lower priority but still valuable.
New Modules:
- core/framework_init.py (~250 lines)
  - Framework initialization
  - Configuration loading
  - Environment setup
- core/workflow_base.py (~300 lines)
  - Base workflow classes
  - Common workflow patterns
  - Abstract interfaces
- core/tier_routing.py (~200 lines)
  - Tier routing logic
  - Cost optimization
  - Model selection
- core/agent_coordination.py (~250 lines)
  - Agent coordination patterns
  - Communication protocols
  - State management
- core/utilities.py (~300 lines)
  - Utility functions
  - Helper methods
  - Common operations
- core.py (updated, ~200 lines)
  - Main entry point
  - Imports from submodules
  - Backward compatibility
Before:
core.py (1,511 lines)
After:
core.py (200 lines)
core/
├── framework_init.py (250 lines)
├── workflow_base.py (300 lines)
├── tier_routing.py (200 lines)
├── agent_coordination.py (250 lines)
└── utilities.py (300 lines)
Impact:
- 87% reduction in main file
- Clearer framework structure
- Better testability
- Easier to understand
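One detail worth checking before creating a core/ directory next to core.py: with CPython's default path-based import system, a regular package (a directory containing `__init__.py`) is resolved ahead of a same-named module file in the same directory. The throwaway experiment below demonstrates this; `demo_core` and `which_wins` are hypothetical names used only for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def which_wins(name: str = "demo_core") -> str:
    """Create name.py and name/__init__.py side by side, then import name."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / f"{name}.py").write_text("KIND = 'module'\n")
        (root / name).mkdir()
        (root / name / "__init__.py").write_text("KIND = 'package'\n")

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(name)
            return module.KIND
        finally:
            # Leave the interpreter state as it was found.
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
```

If the same holds for this project's layout, the ~200-line entry point with its backward-compatible imports may need to live in core/__init__.py instead of a sibling core.py, or the new directory may need a different name.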
Priority 1 - Largest Files with Embedded Content:
1. ✅ telemetry/cli.py - Extract dashboard functions first (690 lines, 36% reduction)
2. models/telemetry.py - Separate models from analytics (clean separation)
Estimated Time: 2-3 hours
Expected Tests: 100+ new behavioral tests
Priority 2 - Complex Workflows:
3. workflows/test_gen.py - Extract AST analysis and templates
4. workflows/document_gen.py - Extract documentation components
Estimated Time: 4-5 hours
Expected Tests: 150+ new behavioral tests
Priority 3 - CLI Interfaces:
5. meta_workflows/cli_meta_workflows.py - Extract command handlers
6. core.py - Final cleanup and organization
Estimated Time: 3-4 hours
Expected Tests: 50+ new behavioral tests
- Import Validation:
python -c "from empathy_os.[module] import *; print('✅ Imports work')"
- Run Existing Tests:
pytest tests/unit/[module]/ -v
pytest tests/behavioral/generated/ -k [module] -v
- Generate New Tests:
python -c "from empathy_os.workflows.autonomous_test_gen import AutonomousTestGenerator; \
gen = AutonomousTestGenerator('phase', 1, [{'file': 'path/to/new/module.py'}]); \
gen.generate_all()"
- Line Count Verification:
wc -l src/empathy_os/[module]/**/*.py | sort -n
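The `**` pattern in the shell command above only recurses when the shell's recursive globbing is enabled (for example `shopt -s globstar` in bash), so a small Python check can be a more portable way to enforce the 500-line threshold. The following is a sketch; `find_oversized_modules` and `MAX_LINES` are names chosen for this example.

```python
from pathlib import Path
from typing import List, Tuple

MAX_LINES = 500  # modules are expected to stay below this count


def find_oversized_modules(
    root: Path, max_lines: int = MAX_LINES
) -> List[Tuple[str, int]]:
    """Return (relative_path, line_count) for .py files at or over max_lines."""
    oversized = []
    for path in sorted(root.rglob("*.py")):
        with path.open(encoding="utf-8") as handle:
            count = sum(1 for _ in handle)
        if count >= max_lines:
            oversized.append((path.relative_to(root).as_posix(), count))
    return oversized
```

Calling `find_oversized_modules(Path("src/empathy_os"))` after each extraction and asserting that the result is empty would turn the line-count target into an automated check.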
| Metric | Before | After | Change |
|---|---|---|---|
| Files Refactored | 6 monolithic | ~40 focused | +567% modules |
| Total Lines (main) | 10,438 | ~900 | -91% |
| Largest File | 1,936 lines | <500 lines | -75% |
| Avg File Size | 1,740 lines | <250 lines | -86% |
| New Tests | 41 (core only) | 500+ | +1,100% |
Modularity:
- Every module <500 lines (testable by automated generator)
- Clear separation of concerns
- Focused responsibilities
Testability:
- 500+ new behavioral tests
- Independent component testing
- Better coverage of edge cases
Maintainability:
- Clear module boundaries
- Easier to navigate codebase
- Simpler to onboard new contributors
Performance:
- No performance regressions (same behavior)
- Potential for better caching (smaller modules)
- Easier profiling and optimization
# Restore original file
git restore src/empathy_os/[module]/[file].py
# Remove extracted modules
rm -rf src/empathy_os/[module]/[extracted_dir]/
# Re-run tests to verify
pytest tests/unit/[module]/ -v
- One file at a time - Complete and validate each before moving to the next
- Frequent commits - Commit after each successful extraction
- Test after every change - Never commit without passing tests
- Back up important files - Keep .backup copies during refactoring
- All existing tests pass without modification
- All imports work from original locations
- No behavior changes in any functionality
- All CLI commands still work identically
- All files <500 lines (testable)
- Automated test generator succeeds on all new modules
- Test coverage maintained or increased
- No performance degradation
- All modules have docstrings
- Clear module organization
- No linting errors
- Consistent with established patterns
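The criterion that all imports keep working from their original locations can be checked mechanically by listing the names each original module used to expose and confirming they are still attributes after the split. Below is a small sketch; `missing_exports` is a hypothetical helper, and it is exercised against the standard library's `json` module only so that it runs anywhere.

```python
import importlib
from typing import Iterable, List


def missing_exports(module_name: str, expected_names: Iterable[str]) -> List[str]:
    """Return the expected names that module_name no longer exposes."""
    module = importlib.import_module(module_name)
    return [name for name in expected_names if not hasattr(module, name)]


# Demonstration against a module that is always available.
print(missing_exports("json", ["dumps", "loads", "not_a_real_name"]))
```

For the refactor itself, the same call could target the original module path with the command names listed in this plan (for instance `cmd_telemetry_show`), assuming the file is importable as a submodule of `empathy_os`.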
- Extract telemetry dashboard functions (quick win)
  - Create telemetry/commands/dashboard_commands.py
  - Update telemetry/cli.py imports
  - Run tests and commit
- Refactor models/telemetry.py (clean separation)
  - Extract data models
  - Extract analytics
  - Run tests and commit
- Continue with remaining files following the plan
- Complete all 6 files within 2-3 sessions
- Generate 500+ new behavioral tests
- Achieve 90%+ test coverage across all modules
- Document patterns for future refactoring
Document Version: 1.0
Created: January 30, 2026
Author: Autonomous Refactoring Agent
Status: 📋 READY FOR IMPLEMENTATION