Commit adb6c7e
test: 40 real tool execution simulations with mocked Dispatcher

Runs the actual tool execute() functions (not mocked) against registered
Dispatcher handlers to simulate real user tool invocations:
Warehouse Add (18 scenarios):
- 8 warehouse types with post-connect suggestions
- Schema indexed/not-indexed variations
- Multi-warehouse data_diff suggestion
- Failure modes: add fails, throws, missing type
- Resilience: schema.cache_status fails, warehouse.list fails
- Timeout: slow dispatcher (3s) races against 1.5s timeout
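
The timeout scenario above can be pictured as a race between a slow handler and a timer. This is a minimal sketch, assuming a hypothetical `withTimeout` helper and a stand-in `slowHandler`. Neither name comes from the commit, and the real tool may enforce its timeout differently. Durations are scaled down from the 3s and 1.5s in the scenario so the example runs quickly.

```typescript
// Hypothetical helper, not taken from the commit: resolves with `fallback`
// if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Whichever promise settles first wins; the timer is cleared either way.
  // A rejection from `work` that arrives before the timer still propagates.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Stand-in for a slow dispatcher handler (illustrative only).
function slowHandler(delayMs: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve("indexed"), delayMs));
}
```

Under these assumptions, `withTimeout(slowHandler(30), 15, "timed-out")` settles with the fallback value, while a handler that finishes inside the budget returns its own result.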
SQL Execute (6 scenarios):
- First call gets suggestion, subsequent calls deduped
- 10 consecutive calls: only the first carries the hint
- Failure and empty result handling
- Blocked query (DROP DATABASE) throws
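
The dedup behaviour in these scenarios (a hint on the first call, none afterwards) could be modelled with a per-session set of already-shown keys. This is a rough sketch with a hypothetical `HintTracker` class and placeholder hint text. The commit does not show how the tools actually track this.

```typescript
// Hypothetical session-scoped tracker; the name and shape are illustrative only.
class HintTracker {
  private readonly shown = new Set<string>();

  // Returns the hint the first time a key is seen, undefined on later calls.
  once(key: string, hint: string): string | undefined {
    if (this.shown.has(key)) return undefined;
    this.shown.add(key);
    return hint;
  }
}

const tracker = new HintTracker();

// Ten consecutive calls under the same key, mirroring the scenario above.
const results = Array.from({ length: 10 }, () =>
  tracker.once("sql_execute", "placeholder hint"),
);

// Count how many of the ten calls actually carried a hint.
const hinted = results.filter((r) => r !== undefined).length;
```

Keying the set by tool name keeps dedup independent per tool, so a first call to a different tool would still get its own hint.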
SQL Analyze (4 scenarios):
- First call suggests schema_inspect, second deduped
- Parse error and analyzer failure handling
Schema Inspect (3 scenarios):
- First call suggests lineage_check, second deduped
- Failure handling
Schema Index (3 scenarios):
- Lists all capabilities on first call
- Dedup on second, failure handling
Full User Journeys (4 scenarios):
- Complete 5-tool chain: warehouse_add → schema_index → sql_execute → sql_analyze → schema_inspect
- 20 repeated queries with dedup verification
- Interleaved tool calls with independent dedup
- All dispatchers failing: warehouse add still succeeds
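
One way to picture the setup behind these journeys is a dispatcher with handlers registered per test, where follow-up lookups are treated as optional. The sketch below assumes a hypothetical `FakeDispatcher` class and `addWarehouse` function. The commit's actual Dispatcher API is not shown and may differ. The method name `warehouse.list` and the `data_diff` suggestion come from the message above; `warehouse.add` is an assumed name.

```typescript
type Handler = (params: Record<string, unknown>) => Promise<unknown>;

// Hypothetical in-memory stand-in for the Dispatcher.
class FakeDispatcher {
  private readonly handlers = new Map<string, Handler>();

  register(method: string, handler: Handler): void {
    this.handlers.set(method, handler);
  }

  async call(method: string, params: Record<string, unknown> = {}): Promise<unknown> {
    const handler = this.handlers.get(method);
    if (!handler) throw new Error(`no handler registered for ${method}`);
    return handler(params);
  }
}

// Hypothetical tool body: the add call is required, the follow-up lookup is optional.
async function addWarehouse(
  d: FakeDispatcher,
  name: string,
): Promise<{ ok: boolean; suggestions: string[] }> {
  await d.call("warehouse.add", { name }); // a failure here propagates to the caller
  const suggestions: string[] = [];
  try {
    const list = (await d.call("warehouse.list")) as string[];
    if (list.length > 1) suggestions.push("data_diff");
  } catch {
    // Optional lookup failed; the add itself still counts as a success.
  }
  return { ok: true, suggestions };
}
```

With this shape, a test can register a throwing `warehouse.list` handler and still expect `ok: true` with no suggestions, which is one reading of the resilience scenarios listed above.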
Performance (2 scenarios):
- Warehouse add < 500ms with fast dispatchers
- 50 consecutive sql_execute < 2s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 3b78d42
1 file changed
Lines changed: 573 additions & 0 deletions