Commit 7af8835
feat(framework): Implement Phase 4 core - agent-agnostic testing framework
ARCHITECTURE:
- Agent-agnostic design via adapter pattern
- Quality-based evaluation (BAD/OKish/Good grades)
- Separation: framework vs skill-specific tests
- Configurable quality thresholds
- Multi-agent support
CORE TYPES (200 lines):
- QualityGrade, QualityThresholds, QualityEvaluation
- SkillVerification with confidence levels
- Agent adapter contract (IAgentAdapter interface)
- ExecutionRequest, ExecutionResult
- TestCase, TestSuite, TestRunResults
- RunConfig, ReportFormat
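The commit names the core types but not their shapes. Below is a minimal sketch of how QualityGrade, QualityThresholds and QualityEvaluation might fit together. The field names, the 0..1 score scale, `DimensionThresholds`, `gradeScore` and `exampleThresholds` are all hypothetical, not the framework's actual definitions.

```typescript
// Hypothetical shapes for the core quality types; anything not named in the commit is assumed.
export type QualityGrade = 'BAD' | 'OKish' | 'Good';

// Per-dimension cut-offs on a 0..1 score (higher is better):
// score >= good earns Good, score >= okish earns OKish, anything lower is BAD.
export interface DimensionThresholds {
  good: number;
  okish: number;
}

export interface QualityThresholds {
  performance: DimensionThresholds;
  triggering: DimensionThresholds;
  correctness: DimensionThresholds;
}

export type Dimension = keyof QualityThresholds;

export interface QualityEvaluation {
  grades: Record<Dimension, QualityGrade>;
  overall: QualityGrade;
  notes: string[]; // explanations, filled in for BAD dimensions
}

// Example configuration (illustrative values, not the framework's defaults).
export const exampleThresholds: QualityThresholds = {
  performance: { good: 0.8, okish: 0.5 },
  triggering: { good: 1.0, okish: 0.5 },
  correctness: { good: 0.9, okish: 0.6 },
};

// Map one dimension's score to a grade using that dimension's thresholds.
export function gradeScore(score: number, t: DimensionThresholds): QualityGrade {
  if (score >= t.good) return 'Good';
  if (score >= t.okish) return 'OKish';
  return 'BAD';
}
```

Keeping thresholds per dimension, rather than one global cut-off, is what lets each dimension be configured independently.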
AGENT ADAPTER (125 lines):
- Abstract IAgentAdapter interface
- Base AgentAdapter class with helpers
- Retry logic support (isRetryableError, getRetryDelay)
- Rate limiting detection
- Token estimation utility
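A sketch of what the adapter helpers listed above could look like, written as standalone functions so a base class can delegate to them. `isRetryableError` and `getRetryDelay` are the names from the commit; their bodies, plus `isRateLimited`, `estimateTokens` and `withRetry`, are assumptions. The backoff is plain exponential with a cap, and the token estimate uses the common rule of thumb of roughly 4 characters per token.

```typescript
// Hypothetical retry helpers; matching rules, delays and extra helper names are assumptions.

// Rate-limit detection by error text (assumed markers).
export function isRateLimited(message: string): boolean {
  return /rate.?limit|\b429\b|too many requests/i.test(message);
}

// Timeouts and rate limits are treated as transient; everything else fails fast.
export function isRetryableError(err: unknown): boolean {
  const message = (err instanceof Error ? err.message : String(err)).toLowerCase();
  return isRateLimited(message) || message.includes('timeout') || message.includes('timed out');
}

// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
export function getRetryDelay(attempt: number, baseMs = 1_000, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Rough token estimate (~4 characters per token); not a real tokenizer.
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Run fn, retrying transient failures up to maxAttempts attempts in total.
export async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3, baseMs = 1_000): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts || !isRetryableError(err)) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, getRetryDelay(attempt, baseMs)));
    }
  }
}
```

Non-retryable errors are rethrown immediately, so a genuine failure (for example a missing skill) is not masked by repeated attempts.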
CLAUDE CODE ADAPTER (285 lines):
- Implements IAgentAdapter
- Wraps Claude CLI execution
- Skill loading and verification
- Heuristic detection (38 UI5 patterns)
- Automatic retry for timeouts/rate limits
- No separate API charges beyond existing Claude CLI access
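The heuristic skill detection can be pictured as counting distinct pattern hits in the agent's output and mapping that count to a confidence level. This is a minimal sketch: the six regexes are a small illustrative subset of common UI5 idioms, not the adapter's actual 38 patterns, and the confidence cut-offs and the `verifySkillUsage` name are hypothetical.

```typescript
// Hypothetical skill-usage heuristic; patterns and cut-offs are illustrative assumptions.
export type Confidence = 'high' | 'medium' | 'low' | 'none';

export interface SkillVerification {
  skillUsed: boolean;
  confidence: Confidence;
  matchedPatterns: string[];
}

const UI5_PATTERNS: RegExp[] = [
  /sap\.ui\.define\s*\(/,         // AMD-style module definition
  /sap\.ui\.require\s*\(/,        // module loading
  /sap\/m\/[A-Z]\w*/,             // module paths such as sap/m/Button
  /<mvc:View\b/,                  // XML view root element
  /\.extend\(\s*["'][\w.]+["']/,  // Controller.extend("my.app.Main", ...)
  /i18n>/,                        // resource-model bindings like {i18n>title}
];

// More distinct hits means more confidence that the skill shaped the output.
export function verifySkillUsage(output: string, patterns: RegExp[] = UI5_PATTERNS): SkillVerification {
  const matchedPatterns = patterns.filter((p) => p.test(output)).map((p) => p.source);
  const n = matchedPatterns.length;
  const confidence: Confidence = n >= 3 ? 'high' : n === 2 ? 'medium' : n === 1 ? 'low' : 'none';
  return { skillUsed: n > 0, confidence, matchedPatterns };
}
```

Counting distinct patterns rather than total matches keeps one repeated idiom from inflating the confidence.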
QUALITY EVALUATOR (145 lines):
- Three dimensions: performance, triggering, correctness
- Configurable thresholds per dimension
- Overall grade = worst dimension (conservative)
- Detailed evaluation notes for BAD grades
- Supports negative tests (should NOT trigger)
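The evaluation rules above (worst dimension wins, notes only for BAD, inverted triggering for negative tests) can be sketched as follows. The duration cut-offs, the input shape and every function name here are assumptions; correctness is taken as an already-computed grade because the commit does not say how it is scored.

```typescript
// Hypothetical evaluator core; cut-offs and field names are assumptions.
export type QualityGrade = 'BAD' | 'OKish' | 'Good';

const RANK: Record<QualityGrade, number> = { BAD: 0, OKish: 1, Good: 2 };

// Conservative aggregation: the overall grade is the worst per-dimension grade.
export function overallGrade(grades: QualityGrade[]): QualityGrade {
  return grades.reduce<QualityGrade>((worst, g) => (RANK[g] < RANK[worst] ? g : worst), 'Good');
}

// Triggering: on a negative test (shouldTrigger = false), firing the skill is the failure.
export function gradeTriggering(triggered: boolean, shouldTrigger: boolean): QualityGrade {
  return triggered === shouldTrigger ? 'Good' : 'BAD';
}

// Performance from wall-clock duration (lower is better); default cut-offs are illustrative.
export function gradePerformance(durationMs: number, goodMs = 30_000, okishMs = 90_000): QualityGrade {
  if (durationMs <= goodMs) return 'Good';
  if (durationMs <= okishMs) return 'OKish';
  return 'BAD';
}

export interface Evaluation {
  performance: QualityGrade;
  triggering: QualityGrade;
  correctness: QualityGrade;
  overall: QualityGrade;
  notes: string[];
}

// Combine the three dimensions; notes are only written for BAD grades.
export function evaluate(input: {
  durationMs: number;
  triggered: boolean;
  shouldTrigger: boolean;
  correctness: QualityGrade; // assumed to come from a separate correctness check
}): Evaluation {
  const performance = gradePerformance(input.durationMs);
  const triggering = gradeTriggering(input.triggered, input.shouldTrigger);
  const correctness = input.correctness;
  const notes: string[] = [];
  if (performance === 'BAD') notes.push(`performance: took ${input.durationMs} ms`);
  if (triggering === 'BAD') {
    notes.push(input.shouldTrigger ? 'triggering: skill did not fire' : 'triggering: skill fired on a negative test');
  }
  if (correctness === 'BAD') notes.push('correctness: output failed the checks');
  return { performance, triggering, correctness, overall: overallGrade([performance, triggering, correctness]), notes };
}
```

Taking the minimum means a single BAD dimension can never be averaged away by two Good ones, which is the conservative behaviour the commit describes.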
TEST RUNNER (215 lines):
- Core orchestration class
- Multi-agent registry
- Test suite execution
- Category/tag filtering
- Quality evaluation integration
- Summary generation
- Cleanup management
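Two of the runner's duties, category/tag filtering and summary generation, are small enough to sketch on their own. The `TestCase` fields, the filter semantics (categories must match; at least one tag must match) and the names `filterTests` and `summarize` are hypothetical.

```typescript
// Hypothetical runner helpers; field names and filter semantics are assumptions.
export type QualityGrade = 'BAD' | 'OKish' | 'Good';

export interface TestCase {
  id: string;
  category: string;
  tags: string[];
}

export interface RunFilter {
  categories?: string[]; // keep only these categories (empty = all)
  tags?: string[];       // keep tests carrying at least one of these tags (empty = all)
}

export function filterTests(tests: TestCase[], { categories = [], tags = [] }: RunFilter = {}): TestCase[] {
  return tests.filter(
    (t) =>
      (categories.length === 0 || categories.includes(t.category)) &&
      (tags.length === 0 || t.tags.some((tag) => tags.includes(tag))),
  );
}

export interface TestResult {
  testId: string;
  agent: string;
  grade: QualityGrade;
}

export type Summary = Record<QualityGrade, number> & { total: number };

// Count results per grade across all agents.
export function summarize(results: TestResult[]): Summary {
  const summary: Summary = { total: results.length, BAD: 0, OKish: 0, Good: 0 };
  for (const r of results) summary[r.grade]++;
  return summary;
}
```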
PUBLIC API (65 lines):
- Clean exports of all public APIs
- TestRunner, AgentAdapter, ClaudeCodeAdapter
- QualityEvaluator, all types
- Ready for external consumption
CONFIGURATION:
- package.json for npm package
- tsconfig.json with strict mode
- ESM modules, Node 18+
BENEFITS:
- Agent-agnostic: other agents (Anthropic API, Cursor, etc.) plug in as additional adapters
- Quality grades: more nuanced than pass/fail
- Reusable: framework separate from tests
- Extensible: pluggable adapters and evaluators
- Type-safe: full TypeScript implementation
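To illustrate the extensibility claim, here is a sketch of adding a new backend by implementing an adapter interface and registering it by name. The interface shape, `EchoAdapter` and `AgentRegistry` are hypothetical stand-ins; the real IAgentAdapter and runner registry may use different members.

```typescript
// Hypothetical adapter contract; the real IAgentAdapter may differ in names and fields.
export interface ExecutionRequest {
  prompt: string;
  timeoutMs?: number;
}

export interface ExecutionResult {
  output: string;
  durationMs: number;
  success: boolean;
}

export interface IAgentAdapter {
  readonly name: string;
  execute(request: ExecutionRequest): Promise<ExecutionResult>;
}

// Stand-in adapter returning canned output; an API- or IDE-based adapter would implement the same interface.
export class EchoAdapter implements IAgentAdapter {
  readonly name = 'echo';
  constructor(private readonly reply: string) {}

  async execute(request: ExecutionRequest): Promise<ExecutionResult> {
    const start = Date.now();
    const output = `${this.reply} (${request.prompt.length} chars in)`;
    return { output, durationMs: Date.now() - start, success: true };
  }
}

// Minimal registry keyed by adapter name, as a runner might keep it.
export class AgentRegistry {
  private readonly agents = new Map<string, IAgentAdapter>();

  register(agent: IAgentAdapter): void {
    if (this.agents.has(agent.name)) throw new Error(`agent already registered: ${agent.name}`);
    this.agents.set(agent.name, agent);
  }

  select(names: string[]): IAgentAdapter[] {
    return names.map((n) => {
      const agent = this.agents.get(n);
      if (!agent) throw new Error(`unknown agent: ${n}`);
      return agent;
    });
  }
}
```

Because the runner only depends on the interface, a new adapter needs no changes to the evaluator or test suites.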
USAGE EXAMPLE:
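```typescript
// Import path assumed; point it at the package's public API entry point
import { TestRunner, ClaudeCodeAdapter } from './src/index.js';

// `evaluator` (a QualityEvaluator) and `suite` (a TestSuite) are assumed to be created elsewhere
const runner = new TestRunner(evaluator);
runner.registerAgent(new ClaudeCodeAdapter());
await runner.loadSkill('/path/to/skill');
const results = await runner.run(suite, { agents: ['claude-code'] });
```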
CODE METRICS:
- Framework: 1,035 lines (6 files)
- Configuration: 2 files
- Total: 8 new files
REMAINING WORK (Future):
- Anthropic API adapter (~6h)
- Cursor adapter (~6h)
- Advanced evaluators (~4h)
- Enhanced reporters (~4h)
- Integration and migration (~2h)
Total: ~22 hours
STATUS:
✅ Core architecture complete
⏳ Build verification pending
⏳ Example usage pending
⏳ Reporter implementation pending
See docs/PHASE_4_CORE_COMPLETE.md for full details.
1 parent b57aec1 commit 7af8835
9 files changed
Lines changed: 1648 additions & 0 deletions
File tree
- plugins
- u5-guidelines/skill-test-framework
- ui5
- skill-test-framework
- src
- agents
- core
- evaluators
- types