Commit 90d9ac6

refactor(12/12): add snapshot test fixtures and benchmarks (#330)
## Summary

This is **PR 12 of 12**, the final PR in the stacked series that decouples the rendering pipeline from MCP transport. Depends on PR 11.

Adds the snapshot test suite and performance benchmarks that validate the entire rendering pipeline end-to-end. These are large in line count but are almost entirely test fixtures (expected output files) and benchmark scripts.

### Snapshot test suite (159 files, ~5700 lines)

The snapshot test infrastructure captures the rendered output of tool invocations and compares against expected fixtures. This provides regression protection for the rendering pipeline -- any change to event formatting, diagnostic grouping, or output ordering will be caught by a fixture mismatch.

**Test harness** (`src/snapshot-tests/`):

- `harness.ts`: Core test runner that invokes tools with mock executors and captures rendered output
- `fixture-io.ts`: Reads/writes fixture files, handles normalization (timestamps, paths, UUIDs)
- `flowdeck-fixture-io.ts`: Flowdeck-specific fixture handling
- `normalize.ts`: Output normalization for stable comparisons across environments
- `resource-harness.ts`: Resource-specific snapshot testing

**Fixtures**: Expected output files for each tool covering success, error, and edge case scenarios. These serve as living documentation of what each tool's output looks like.

### Benchmarks (~3300 lines)

Performance benchmarks for the rendering pipeline and xcodebuild parsing:

- Parser throughput: lines/second for xcodebuild output parsing
- Render session performance: events/second for text and JSON strategies
- End-to-end tool invocation timing

These benchmarks establish baselines and can be run in CI to catch performance regressions.

### Note for reviewers

This PR is large by line count but low in conceptual complexity. The fixture files are auto-generated expected outputs. The benchmark scripts are straightforward timing loops. The meaningful code is the ~500 lines of test harness infrastructure.
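The normalization that the summary attributes to `fixture-io.ts` and `normalize.ts` (timestamps, paths, UUIDs) could look roughly like the sketch below. It is not taken from the PR: the function names, the placeholder tokens, and the exact regular expressions are all assumptions chosen for illustration.

```typescript
// Hypothetical sketch of output normalization for snapshot comparisons.
// None of these names or patterns are confirmed by the PR's normalize.ts.

// ISO-8601 UTC timestamps such as 2026-03-17T14:20:36.285Z.
const ISO_TIMESTAMP_PATTERN = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z/g;

// Canonical 8-4-4-4-12 hexadecimal UUIDs.
const UUID_PATTERN =
  /[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}/g;

// macOS home directories such as /Users/<name>.
const HOME_PATH_PATTERN = /\/Users\/[^/\s]+/g;

// Replace environment-specific values with stable placeholder tokens.
function normalizeOutput(raw: string): string {
  return raw
    .replace(ISO_TIMESTAMP_PATTERN, "<TIMESTAMP>")
    .replace(UUID_PATTERN, "<UUID>")
    .replace(HOME_PATH_PATTERN, "<HOME>");
}

// Compare rendered output with a fixture after normalizing both sides.
function matchesFixture(actual: string, expected: string): boolean {
  return normalizeOutput(actual) === normalizeOutput(expected);
}
```

Normalizing both sides before comparing means a fixture recorded on one machine can still match output rendered on another, which is the property the summary describes as "stable comparisons across environments".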
## Stack navigation

- PR 1-11/12: All code and configuration changes
- **PR 12/12** (this PR): Snapshot tests and benchmarks

## Test plan

- [ ] `npx vitest run` passes -- snapshot tests match expected fixtures
- [ ] `npx vitest run --config vitest.snapshot.config.ts` runs snapshot suite specifically
- [ ] Benchmarks execute without errors (performance numbers are informational)
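The description calls the benchmark scripts "straightforward timing loops" that report lines/second. A minimal sketch of such a loop follows. The parser here is a stand-in, not the PR's xcodebuild parser: it only recognizes progress lines of the shape `Running tests (8, 1 failure)` seen in the captured benchmark output in this commit, and every name in the block is hypothetical.

```typescript
// Hypothetical sketch of a parser-throughput timing loop.
// parseProgressLine is a stand-in parser, not the PR's xcodebuild parser.

interface TestProgress {
  completed: number;
  failures: number;
}

interface ThroughputResult {
  parsed: number;
  linesPerSecond: number;
}

// Matches "Running tests (8, 1 failure)" and "Running tests (0, 0 failures)".
const PROGRESS_PATTERN = /Running tests \((\d+), (\d+) failures?\)/;

function parseProgressLine(line: string): TestProgress | null {
  const match = PROGRESS_PATTERN.exec(line);
  if (match === null) {
    return null;
  }
  return { completed: Number(match[1]), failures: Number(match[2]) };
}

// Parse every line `repeats` times and report lines per second.
function measureThroughput(lines: readonly string[], repeats: number): ThroughputResult {
  const start = Date.now();
  let parsed = 0;
  for (let i = 0; i < repeats; i++) {
    for (const line of lines) {
      if (parseProgressLine(line) !== null) {
        parsed++;
      }
    }
  }
  // Clamp to 1 ms so a very fast run does not divide by zero.
  const elapsedMs = Math.max(Date.now() - start, 1);
  return { parsed, linesPerSecond: (lines.length * repeats * 1000) / elapsedMs };
}
```

`Date.now()` has only millisecond resolution, so a real benchmark would likely use a higher-resolution clock and many more iterations; the sketch favors portability over precision.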
1 parent e18c96c commit 90d9ac6

244 files changed

Lines changed: 9661 additions & 476 deletions

benchmarks/simulator-test/2026-03-17T14-18-19-390Z/flowdeck-run-1.stderr.txt

Whitespace-only changes.
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+^DUsing scheme test configuration: Debug
+🧪 Test: CalculatorApp
+Workspace: /Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/example_projects/iOS_Calculator/CalculatorApp.xcworkspace
+Configuration: Debug
+Target: iPhone 17 Pro
+🧪 Finding available tests...
+Resolved to 21 test(s):
+- CalculatorAppTests/CalculatorAppTests/testAppLaunch
+- CalculatorAppTests/CalculatorAppTests/testCalculationPerformance
+- CalculatorAppTests/CalculatorAppTests/testCalculatorOperationsEnum
+- CalculatorAppTests/CalculatorAppTests/testCalculatorServiceBasicOperation
+- CalculatorAppTests/CalculatorAppTests/testCalculatorServiceChainedOperations
+... and 16 more
+🧪 Running tests (0, 0 failures)
+🧪 Running tests (1, 0 failures)
+🧪 Running tests (2, 0 failures)
+🧪 Running tests (3, 0 failures)
+🧪 Running tests (4, 0 failures)
+🧪 Running tests (5, 0 failures)
+🧪 Running tests (6, 0 failures)
+🧪 Running tests (7, 0 failures)/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/example_projects/iOS_Calculator/CalculatorAppTests/CalculatorAppTests.swift:52: error: -[CalculatorAppTests.CalculatorAppTests testCalculatorServiceFailure] : XCTAssertEqual failed: ("0") is not equal to ("999") - This test should fail - display should be 0, not 999
+🧪 Running tests (8, 1 failure)
+🧪 Running tests (9, 1 failure)
+🧪 Running tests (10, 1 failure)
+🧪 Running tests (11, 1 failure)
+🧪 Running tests (12, 1 failure)
+🧪 Running tests (13, 1 failure)
+🧪 Running tests (14, 1 failure)
+🧪 Running tests (15, 1 failure)
+🧪 Running tests (16, 1 failure)
+🧪 Running tests (17, 1 failure)
+🧪 Running tests (18, 1 failure)
+🧪 Running tests (19, 1 failure)
+🧪 Running tests (20, 1 failure)
+🧪 Running tests (21, 1 failure)
+Failed Tests
+CalculatorAppTests
+✗ testCalculatorServiceFailure (0.003s)
+└─ XCTAssertEqual failed: ("0") is not
+equal to ("999") - This test should fail
+- display should be 0, not 999
+(CalculatorAppTests.swift:52)
+Test Summary
+╭───────────────────────────────────────╮
+│ Total: 21 │
+│ Passed: 20 │
+│ Failed: 1 │
+│ Skipped: 0 │
+│ Duration: 27.96s │
+╰───────────────────────────────────────╯
+✗ 1 test(s) failed.

benchmarks/simulator-test/2026-03-17T14-18-19-390Z/flowdeck-run-2.stderr.txt

Whitespace-only changes.
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+^DUsing scheme test configuration: Debug
+🧪 Test: CalculatorApp
+Workspace: /Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/example_projects/iOS_Calculator/CalculatorApp.xcworkspace
+Configuration: Debug
+Target: iPhone 17 Pro
+🧪 Finding available tests...
+Resolved to 21 test(s):
+- CalculatorAppTests/CalculatorAppTests/testAppLaunch
+- CalculatorAppTests/CalculatorAppTests/testCalculationPerformance
+- CalculatorAppTests/CalculatorAppTests/testCalculatorOperationsEnum
+- CalculatorAppTests/CalculatorAppTests/testCalculatorServiceBasicOperation
+- CalculatorAppTests/CalculatorAppTests/testCalculatorServiceChainedOperations
+... and 16 more
+🧪 Running tests (0, 0 failures)
+🧪 Running tests (1, 0 failures)
+🧪 Running tests (2, 0 failures)
+🧪 Running tests (3, 0 failures)
+🧪 Running tests (4, 0 failures)
+🧪 Running tests (5, 0 failures)
+🧪 Running tests (6, 0 failures)
+🧪 Running tests (7, 0 failures)/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/example_projects/iOS_Calculator/CalculatorAppTests/CalculatorAppTests.swift:52: error: -[CalculatorAppTests.CalculatorAppTests testCalculatorServiceFailure] : XCTAssertEqual failed: ("0") is not equal to ("999") - This test should fail - display should be 0, not 999
+🧪 Running tests (8, 1 failure)
+🧪 Running tests (9, 1 failure)
+🧪 Running tests (10, 1 failure)
+🧪 Running tests (11, 1 failure)
+🧪 Running tests (12, 1 failure)
+🧪 Running tests (13, 1 failure)
+🧪 Running tests (14, 1 failure)
+🧪 Running tests (15, 1 failure)
+🧪 Running tests (16, 1 failure)
+🧪 Running tests (17, 1 failure)
+🧪 Running tests (18, 1 failure)
+🧪 Running tests (19, 1 failure)
+🧪 Running tests (20, 1 failure)
+🧪 Running tests (21, 1 failure)
+Failed Tests
+CalculatorAppTests
+✗ testCalculatorServiceFailure (0.009s)
+└─ XCTAssertEqual failed: ("0") is not
+equal to ("999") - This test should fail
+- display should be 0, not 999
+(CalculatorAppTests.swift:52)
+Test Summary
+╭───────────────────────────────────────╮
+│ Total: 21 │
+│ Passed: 20 │
+│ Failed: 1 │
+│ Skipped: 0 │
+│ Duration: 20.31s │
+╰───────────────────────────────────────╯
+✗ 1 test(s) failed.

benchmarks/simulator-test/2026-03-17T14-18-19-390Z/flowdeck-run-3.stderr.txt

Whitespace-only changes.
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+^DUsing scheme test configuration: Debug
+🧪 Test: CalculatorApp
+Workspace: /Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/example_projects/iOS_Calculator/CalculatorApp.xcworkspace
+Configuration: Debug
+Target: iPhone 17 Pro
+🧪 Finding available tests...
+Resolved to 21 test(s):
+- CalculatorAppTests/CalculatorAppTests/testAppLaunch
+- CalculatorAppTests/CalculatorAppTests/testCalculationPerformance
+- CalculatorAppTests/CalculatorAppTests/testCalculatorOperationsEnum
+- CalculatorAppTests/CalculatorAppTests/testCalculatorServiceBasicOperation
+- CalculatorAppTests/CalculatorAppTests/testCalculatorServiceChainedOperations
+... and 16 more
+🧪 Running tests (0, 0 failures)
+🧪 Running tests (1, 0 failures)
+🧪 Running tests (2, 0 failures)
+🧪 Running tests (3, 0 failures)
+🧪 Running tests (4, 0 failures)
+🧪 Running tests (5, 0 failures)
+🧪 Running tests (6, 0 failures)
+🧪 Running tests (7, 0 failures)/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/example_projects/iOS_Calculator/CalculatorAppTests/CalculatorAppTests.swift:52: error: -[CalculatorAppTests.CalculatorAppTests testCalculatorServiceFailure] : XCTAssertEqual failed: ("0") is not equal to ("999") - This test should fail - display should be 0, not 999
+🧪 Running tests (8, 1 failure)
+🧪 Running tests (9, 1 failure)
+🧪 Running tests (10, 1 failure)
+🧪 Running tests (11, 1 failure)
+🧪 Running tests (12, 1 failure)
+🧪 Running tests (13, 1 failure)
+🧪 Running tests (14, 1 failure)
+🧪 Running tests (15, 1 failure)
+🧪 Running tests (16, 1 failure)
+🧪 Running tests (17, 1 failure)
+🧪 Running tests (18, 1 failure)
+🧪 Running tests (19, 1 failure)
+🧪 Running tests (20, 1 failure)
+🧪 Running tests (21, 1 failure)
+Failed Tests
+CalculatorAppTests
+✗ testCalculatorServiceFailure (0.009s)
+└─ XCTAssertEqual failed: ("0") is not
+equal to ("999") - This test should fail
+- display should be 0, not 999
+(CalculatorAppTests.swift:52)
+Test Summary
+╭───────────────────────────────────────╮
+│ Total: 21 │
+│ Passed: 20 │
+│ Failed: 1 │
+│ Skipped: 0 │
+│ Duration: 16.44s │
+╰───────────────────────────────────────╯
+✗ 1 test(s) failed.
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+{
+  "generatedAt": "2026-03-17T14:20:36.285Z",
+  "mode": "warm",
+  "iterations": 3,
+  "workspacePath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/example_projects/iOS_Calculator/CalculatorApp.xcworkspace",
+  "results": [
+    {
+      "tool": "xcodebuildmcp",
+      "iteration": 1,
+      "exitCode": 1,
+      "wallClockMs": 29067.379917000002,
+      "firstStdoutMs": 2.148291999999998,
+      "firstMilestoneMs": 2152.612708,
+      "startupToFirstStreamedTestProgressMs": 13020.933500000001,
+      "stdoutPath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/benchmarks/simulator-test/2026-03-17T14-18-19-390Z/xcodebuildmcp-run-1.stdout.txt",
+      "stderrPath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/benchmarks/simulator-test/2026-03-17T14-18-19-390Z/xcodebuildmcp-run-1.stderr.txt"
+    },
+    {
+      "tool": "flowdeck",
+      "iteration": 1,
+      "exitCode": 1,
+      "wallClockMs": 28296.29575,
+      "firstStdoutMs": 3.5727919999990263,
+      "firstMilestoneMs": 12480.404625,
+      "startupToFirstStreamedTestProgressMs": 12480.409000000003,
+      "stdoutPath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/benchmarks/simulator-test/2026-03-17T14-18-19-390Z/flowdeck-run-1.stdout.txt",
+      "stderrPath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/benchmarks/simulator-test/2026-03-17T14-18-19-390Z/flowdeck-run-1.stderr.txt"
+    },
+    {
+      "tool": "xcodebuildmcp",
+      "iteration": 2,
+      "exitCode": 1,
+      "wallClockMs": 20358.631999999998,
+      "firstStdoutMs": 3.855666999996174,
+      "firstMilestoneMs": 1894.4525829999984,
+      "startupToFirstStreamedTestProgressMs": 6474.262499999997,
+      "stdoutPath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/benchmarks/simulator-test/2026-03-17T14-18-19-390Z/xcodebuildmcp-run-2.stdout.txt",
+      "stderrPath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/benchmarks/simulator-test/2026-03-17T14-18-19-390Z/xcodebuildmcp-run-2.stderr.txt"
+    },
+    {
+      "tool": "flowdeck",
+      "iteration": 2,
+      "exitCode": 1,
+      "wallClockMs": 20567.050875,
+      "firstStdoutMs": 3.934166000006371,
+      "firstMilestoneMs": 5885.525833000007,
+      "startupToFirstStreamedTestProgressMs": 5885.5267500000045,
+      "stdoutPath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/benchmarks/simulator-test/2026-03-17T14-18-19-390Z/flowdeck-run-2.stdout.txt",
+      "stderrPath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/benchmarks/simulator-test/2026-03-17T14-18-19-390Z/flowdeck-run-2.stderr.txt"
+    },
+    {
+      "tool": "xcodebuildmcp",
+      "iteration": 3,
+      "exitCode": 1,
+      "wallClockMs": 21910.729708,
+      "firstStdoutMs": 3.3832499999989523,
+      "firstMilestoneMs": 2140.4143329999933,
+      "startupToFirstStreamedTestProgressMs": 6239.000874999998,
+      "stdoutPath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/benchmarks/simulator-test/2026-03-17T14-18-19-390Z/xcodebuildmcp-run-3.stdout.txt",
+      "stderrPath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/benchmarks/simulator-test/2026-03-17T14-18-19-390Z/xcodebuildmcp-run-3.stderr.txt"
+    },
+    {
+      "tool": "flowdeck",
+      "iteration": 3,
+      "exitCode": 1,
+      "wallClockMs": 16693.48666699999,
+      "firstStdoutMs": 3.411791999998968,
+      "firstMilestoneMs": 5152.938666999995,
+      "startupToFirstStreamedTestProgressMs": 5152.9394579999935,
+      "stdoutPath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/benchmarks/simulator-test/2026-03-17T14-18-19-390Z/flowdeck-run-3.stdout.txt",
+      "stderrPath": "/Users/cameroncooke/.codex/worktrees/43f4/XcodeBuildMCP/benchmarks/simulator-test/2026-03-17T14-18-19-390Z/flowdeck-run-3.stderr.txt"
+    }
+  ]
+}
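
One way to read the summary above is to average `wallClockMs` per tool across iterations. The sketch below assumes only the field names visible in the JSON; the `BenchmarkResult` type and the `meanWallClockByTool` helper are hypothetical and are not part of the benchmark scripts in this commit.

```typescript
// Hypothetical helper for summarizing the benchmark results file.
// Field names mirror the summary JSON; the helper itself is illustrative.

interface BenchmarkResult {
  tool: string;
  iteration: number;
  exitCode: number;
  wallClockMs: number;
}

// Average wall-clock time per tool, in milliseconds.
function meanWallClockByTool(results: readonly BenchmarkResult[]): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const result of results) {
    let entry = totals[result.tool];
    if (entry === undefined) {
      entry = { sum: 0, count: 0 };
      totals[result.tool] = entry;
    }
    entry.sum += result.wallClockMs;
    entry.count += 1;
  }

  const means: Record<string, number> = {};
  for (const tool of Object.keys(totals)) {
    const entry = totals[tool];
    if (entry !== undefined) {
      means[tool] = entry.sum / entry.count;
    }
  }
  return means;
}
```

Applied to the six results above, this gives roughly 23.8 s for `xcodebuildmcp` and 21.9 s for `flowdeck` over the three warm iterations. With only three runs per tool and a first iteration that is noticeably slower for both, these means are informational rather than a firm comparison.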

benchmarks/simulator-test/2026-03-17T14-18-19-390Z/xcodebuildmcp-run-1.stderr.txt

Whitespace-only changes.
