Commit 7af8835

feat(framework): Implement Phase 4 core - agent-agnostic testing framework
ARCHITECTURE:
- Agent-agnostic design via adapter pattern
- Quality-based evaluation (BAD/OKish/Good grades)
- Separation: framework vs skill-specific tests
- Configurable quality thresholds
- Multi-agent support

CORE TYPES (200 lines):
- QualityGrade, QualityThresholds, QualityEvaluation
- SkillVerification with confidence levels
- AgentAdapter interface (IAgentAdapter)
- ExecutionRequest, ExecutionResult
- TestCase, TestSuite, TestRunResults
- RunConfig, ReportFormat

AGENT ADAPTER (125 lines):
- Abstract IAgentAdapter interface
- Base AgentAdapter class with helpers
- Retry logic support (isRetryableError, getRetryDelay)
- Rate limiting detection
- Token estimation utility

CLAUDE CODE ADAPTER (285 lines):
- Implements IAgentAdapter
- Wraps Claude CLI execution
- Skill loading and verification
- Heuristic detection (38 UI5 patterns)
- Automatic retry for timeouts/rate limits
- Zero cost (free Claude CLI)

QUALITY EVALUATOR (145 lines):
- Three dimensions: performance, triggering, correctness
- Configurable thresholds per dimension
- Overall grade = worst dimension (conservative)
- Detailed evaluation notes for BAD grades
- Supports negative tests (should NOT trigger)

TEST RUNNER (215 lines):
- Core orchestration class
- Multi-agent registry
- Test suite execution
- Category/tag filtering
- Quality evaluation integration
- Summary generation
- Cleanup management

PUBLIC API (65 lines):
- Clean exports of all public APIs
- TestRunner, AgentAdapter, ClaudeCodeAdapter
- QualityEvaluator, all types
- Ready for external consumption

CONFIGURATION:
- package.json for npm package
- tsconfig.json with strict mode
- ESM modules, Node 18+

BENEFITS:
- Agent-agnostic: easy to add Anthropic API, Cursor, etc.
- Quality grades: more nuanced than pass/fail
- Reusable: framework separate from tests
- Extensible: pluggable adapters and evaluators
- Type-safe: full TypeScript implementation

USAGE EXAMPLE:
```typescript
const runner = new TestRunner(evaluator);
runner.registerAgent(new ClaudeCodeAdapter());
await runner.loadSkill('/path/to/skill');
const results = await runner.run(suite, { agents: ['claude-code'] });
```

CODE METRICS:
- Framework: 1,035 lines (6 files)
- Configuration: 2 files
- Total: 8 new files

REMAINING WORK (Future):
- Anthropic API adapter (~6h)
- Cursor adapter (~6h)
- Advanced evaluators (~4h)
- Enhanced reporters (~4h)
- Integration and migration (~2h)
Total: ~22 hours

STATUS:
✅ Core architecture complete
⏳ Build verification pending
⏳ Example usage pending
⏳ Reporter implementation pending

See docs/PHASE_4_CORE_COMPLETE.md for full details.
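The quality evaluator rules in the commit message (three dimensions, per-dimension thresholds, overall grade equal to the worst dimension, and negative tests that should NOT trigger) can be sketched roughly as follows. This is only an illustration: the evaluator source is not shown on this page, so `DimensionGrades`, `PerformanceThresholds`, `gradePerformance`, `gradeTriggering`, `overallGrade`, and the millisecond thresholds are all assumed names and values, not the commit's actual API.

```typescript
// Illustrative sketch only: type and function names are assumptions,
// not the definitions added by this commit.
type QualityGrade = "BAD" | "OKish" | "Good";

interface DimensionGrades {
  performance: QualityGrade;
  triggering: QualityGrade;
  correctness: QualityGrade;
}

// Assumed threshold shape: durations at or below goodMs grade "Good",
// at or below okishMs grade "OKish", anything slower grades "BAD".
interface PerformanceThresholds {
  goodMs: number;
  okishMs: number;
}

// Lower rank means a worse grade.
const GRADE_RANK: Record<QualityGrade, number> = { BAD: 0, OKish: 1, Good: 2 };

function gradePerformance(durationMs: number, t: PerformanceThresholds): QualityGrade {
  if (durationMs <= t.goodMs) return "Good";
  if (durationMs <= t.okishMs) return "OKish";
  return "BAD";
}

// Covers negative tests too: when shouldTrigger is false, a skill that stays
// silent grades "Good" and one that fires grades "BAD".
function gradeTriggering(shouldTrigger: boolean, didTrigger: boolean): QualityGrade {
  return shouldTrigger === didTrigger ? "Good" : "BAD";
}

// Conservative aggregation: the overall grade is the worst dimension.
function overallGrade(grades: DimensionGrades): QualityGrade {
  const all: QualityGrade[] = [grades.performance, grades.triggering, grades.correctness];
  return all.reduce<QualityGrade>(
    (worst, g) => (GRADE_RANK[g] < GRADE_RANK[worst] ? g : worst),
    "Good",
  );
}

const example = overallGrade({
  performance: gradePerformance(4200, { goodMs: 3000, okishMs: 10000 }),
  triggering: gradeTriggering(false, false), // negative test, skill stayed silent
  correctness: "Good",
});
console.log(example); // prints "OKish"
```

Taking the minimum rather than an average means a single BAD dimension cannot be masked by two Good ones, which matches the "conservative" wording in the message.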
1 parent b57aec1 commit 7af8835

9 files changed

Lines changed: 1648 additions & 0 deletions

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+{
+  "compilerOptions": {
+    "target": "ES2022",
+    "module": "ESNext",
+    "lib": ["ES2022"],
+    "moduleResolution": "node",
+    "outDir": "./dist",
+    "rootDir": "./src",
+    "declaration": true,
+    "declarationMap": true,
+    "sourceMap": true,
+    "strict": true,
+    "esModuleInterop": true,
+    "skipLibCheck": true,
+    "forceConsistentCasingInFileNames": true,
+    "resolveJsonModule": true,
+    "allowSyntheticDefaultImports": true
+  }
+  ,
+  "include": ["src/**/*"],
+  "exclude": ["node_modules", "dist"]
+}
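With `strict` enabled as in the configuration above, recent TypeScript versions type `catch` variables as `unknown`, so retry helpers have to narrow the error before inspecting it. The following is a minimal sketch of how the adapter pattern and the retry helpers named in the commit message (`isRetryableError`, `getRetryDelay`) might fit together under that setting. The adapter source is not part of this diff, so the interface shapes, the `executeOnce` hook, the `EchoAdapter` class, the error-matching regex, and the backoff numbers are all assumptions for illustration.

```typescript
// Illustrative sketch only: the shapes below are assumptions, not the
// interfaces added by this commit.
interface ExecutionRequest {
  prompt: string;
}

interface ExecutionResult {
  output: string;
  attempts: number;
}

interface IAgentAdapter {
  readonly name: string;
  execute(request: ExecutionRequest): Promise<ExecutionResult>;
}

abstract class AgentAdapter implements IAgentAdapter {
  abstract readonly name: string;

  // Subclasses perform exactly one attempt against their agent.
  protected abstract executeOnce(request: ExecutionRequest): Promise<string>;

  // Assumed heuristic: timeouts and rate limits are worth retrying.
  isRetryableError(error: unknown): boolean {
    const message = error instanceof Error ? error.message : String(error);
    return /timeout|timed out|rate limit|429/i.test(message);
  }

  // Assumed policy: exponential backoff starting at 1 s, capped at 30 s.
  getRetryDelay(attempt: number): number {
    return Math.min(1000 * 2 ** attempt, 30000);
  }

  // Retries retryable failures until maxAttempts is reached, then rethrows.
  async execute(request: ExecutionRequest, maxAttempts = 3): Promise<ExecutionResult> {
    for (let attempt = 0; ; attempt++) {
      try {
        const output = await this.executeOnce(request);
        return { output, attempts: attempt + 1 };
      } catch (error) {
        const isLast = attempt + 1 >= maxAttempts;
        if (isLast || !this.isRetryableError(error)) throw error;
        await new Promise((resolve) => setTimeout(resolve, this.getRetryDelay(attempt)));
      }
    }
  }
}

// Stand-in adapter for the example; a real one would wrap a CLI or an HTTP API.
class EchoAdapter extends AgentAdapter {
  readonly name = "echo";

  protected async executeOnce(request: ExecutionRequest): Promise<string> {
    return `echo: ${request.prompt}`;
  }
}

const adapter = new EchoAdapter();
adapter.execute({ prompt: "hello" }).then((r) => console.log(r.output, r.attempts));
```

Keeping the retry loop in the base class means each concrete adapter only implements a single attempt, which is one way to get the "easy to add" property the commit message claims for new agents.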
