
Commit adb6c7e

anandgupta42 and claude committed
test: 40 real tool execution simulations with mocked Dispatcher
Spawns actual tool execute() functions (not mocked) with registered
Dispatcher handlers to simulate real user tool invocations:

Warehouse Add (18 scenarios):
- 8 warehouse types with post-connect suggestions
- Schema indexed/not-indexed variations
- Multi-warehouse data_diff suggestion
- Failure modes: add fails, throws, missing type
- Resilience: schema.cache_status fails, warehouse.list fails
- Timeout: slow dispatcher (3s) races against 1.5s timeout

SQL Execute (6 scenarios):
- First call gets suggestion, subsequent calls deduped
- 10 consecutive calls: only the first has a hint
- Failure and empty result handling
- Blocked query (DROP DATABASE) throws

SQL Analyze (4 scenarios):
- First call suggests schema_inspect, second deduped
- Parse error and analyzer failure handling

Schema Inspect (3 scenarios):
- First call suggests lineage_check, second deduped
- Failure handling

Schema Index (3 scenarios):
- Lists all capabilities on first call
- Dedup on second, failure handling

Full User Journeys (4 scenarios):
- Complete 5-tool chain: warehouse_add → schema_index → sql_execute → sql_analyze → schema_inspect
- 20 repeated queries with dedup verification
- Interleaved tool calls with independent dedup
- All dispatchers failing: warehouse add still succeeds

Performance (2 scenarios):
- Warehouse add < 500ms with fast dispatchers
- 50 consecutive sql_execute < 2s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
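
Two behaviours recur in the scenarios above: a hint that appears only on the first call to a tool, and a slow dispatcher racing against a shorter timeout. The sketch below illustrates both in isolation. It is not the code from this commit: `HintTracker`, `withTimeout`, and `slowDispatcher` are hypothetical names invented for illustration, and the real Dispatcher API may look quite different.

```typescript
// Hypothetical tracker: returns the hint the first time a tool is seen,
// and undefined on every later call for that same tool (per-tool dedup).
class HintTracker {
  private seen = new Set<string>();

  hintFor(tool: string, hint: string): string | undefined {
    if (this.seen.has(tool)) return undefined;
    this.seen.add(tool);
    return hint;
  }
}

// Hypothetical helper: resolves with `fallback` if `work` has not settled
// within `ms` milliseconds; otherwise resolves with the result of `work`.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Clear the timer once the race settles so it does not linger.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Stand-in for a slow dispatcher handler (assumption: handlers return promises).
function slowDispatcher(delayMs: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve("indexed"), delayMs));
}
```

With these helpers, the "3s dispatcher against a 1.5s timeout" case would read `withTimeout(slowDispatcher(3000), 1500, "timeout")`, and the dedup cases reduce to calling `hintFor` repeatedly and checking that only the first result is defined. Keeping dedup state per tool name is what would let interleaved calls dedupe independently.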
1 parent 3b78d42 commit adb6c7e

1 file changed

Lines changed: 573 additions & 0 deletions
