refactor(tests): modular mock-free test suite with cross-platform CI #13
Merged
- Remove the Windows exclusion guard from E2E test step in CI workflow
- The process-isolated child-process harness uses only cross-platform Node.js APIs (spawn, readline, stdio pipes) and is verified to work on Windows
- Add .gitattributes to enforce LF line endings on snapshot golden files
- Add normalizeEOL() helper in snapshot tests for Windows CRLF handling
- Add E2E_TIMEOUT_MS env var support for configurable test timeouts
- Add engines.node >=22 to package.json
- Add tsconfig.json for type checking
- Update CI documentation in CONTRIBUTING.md

Closes #12
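The commit names a `normalizeEOL()` helper and an `E2E_TIMEOUT_MS` env var but does not show either. A minimal sketch of what they could look like follows. The function bodies, the `resolveE2ETimeout` name, and the 30-second default are assumptions for illustration, not the code from this PR.

```typescript
// Hypothetical sketch: only the names normalizeEOL and E2E_TIMEOUT_MS come
// from the commit message; everything else here is assumed.

/** Convert CRLF and lone CR line endings to LF so snapshots compare equal on every OS. */
function normalizeEOL(text: string): string {
  return text.replace(/\r\n?/g, "\n");
}

/** Assumed default; the actual fallback used by the suite is not stated in the PR. */
const DEFAULT_TIMEOUT_MS = 30_000;

/**
 * Read E2E_TIMEOUT_MS from an env-like map (for example process.env),
 * falling back when the variable is unset, empty, non-numeric, or not positive.
 */
function resolveE2ETimeout(
  env: Record<string, string | undefined>,
  fallback: number = DEFAULT_TIMEOUT_MS,
): number {
  const raw = env["E2E_TIMEOUT_MS"];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}
```

In a snapshot test, both the rendered output and the golden file contents would pass through `normalizeEOL()` before comparison, so a Windows checkout with CRLF endings does not produce spurious diffs.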
grzegorznowak
approved these changes
Jun 13, 2026
grzegorznowak
left a comment
Collaborator
great work, no major findings @ofriw 💪🏾
Note: This PR was generated by an AI agent. If you'd like to talk with other humans, drop by our Discord!
What
This PR replaces the single monolithic 3,840-line test file with per-module unit suites, adds a process-isolated E2E layer, extracts shared test infrastructure (singleton container, harness factory, module loader), and wires cross-platform CI that exercises the full suite on Linux, macOS, and Windows.
Why
The old `agenticoding.test.ts` was a single file that grew organically. It used ad-hoc mocks and stubs, imported production module-level singletons directly (leaking state between tests), and had no cross-platform coverage. The new architecture enforces no-mocks testing: every suite verifies external invariants, constraints, or user-facing contracts through only real code paths. The E2E layer runs real tool invocations against a minimal pi host in a child process, testing the actual tool contracts without mocking the SDK.
What changed
Test infrastructure:
- `RuntimeSingletons`: a single container the test harness swaps atomically via `__setSingletons()`, eliminating per-test patches
- `createTestHarness()` in `tests/test-utils.ts`: one call per test isolates singletons and captures console output; teardown restores originals
- `scripts/run-node-test.mjs`: runner that loads the custom module bootstrap via `--import` (replacing the deprecated `--experimental-loader` flag)
- `test-loader.mjs` resolves SDK packages by walking `node_modules` instead of hardcoding an absolute path, so it works from any install location

Unit test reorganization:
- `tests/unit/handoff.test.ts`: handoff tool, command, and compaction contracts
- `tests/unit/notebook.test.ts`: write/read/index lifecycle, overwrites, rehydration, stale detection, concurrent write isolation
- `tests/unit/render-snapshots.test.ts`: golden-file snapshot tests for TUI render output (indicator levels, spawn call/result frames, nested session views)
- `tests/unit/register-loader.test.ts`: validates the module loader bootstrap resolves correctly
- `tests/unit/runtime-singletons.test.ts`: singleton container atomic swap, pending-write protection
- `tests/unit/spawn-render.test.ts`: TUI rendering of spawn lifecycle (expanded/collapsed, success/error/abort, truncation markers)
- `tests/unit/spawn.test.ts` (migrated from the original): spawn execution logic, tool inheritance, session lifecycle, cleanup guarantees, cancellation
- `tests/unit/state-invariants.test.ts`: property-based tests using `fast-check` to verify runtime singletons never silently drop pending writes
- `tests/unit/system-prompt.test.ts`: system prompt injection at `before_agent_start`
- `tests/unit/topic.test.ts`: notebook topic lifecycle (set, override, human-authoritative)
- `tests/unit/tui-indicators.test.ts`: TUI status/widget indicators for context usage
- `tests/unit/watchdog.test.ts`: primacy-zone advisory injection at thresholds

E2E layer:
- `tests/e2e/test-host.ts`: minimal pi host REPL that loads the real extension, no TUI
- `tests/e2e/pty-harness.ts`: process-isolated child-process harness for stdin/stdout communication (no PTY dependency, works on Windows)
- `tests/e2e/basic.test.ts`: tool registration, notebook round-trips, topic lifecycle, handoff state, command registration, error handling

CI:
- `npm ci` → type check → security audit → unit tests → E2E tests → upload results
- `.gitattributes` enforcing LF line endings
- `UPDATE_SNAPSHOTS=1` env var

SDK upgrade: Bumped `peerDependencies` to `^0.78.1` and adapted to new type surfaces (`CustomEntry`, `TextContent`, TypeBox `Static`, widened message content union). Added `typebox`, `fast-check` as dev dependencies.

Breaking changes: None to the extension's public API or behavior. The package.json `description` and `keywords` were updated to reflect the current primitive set (notebook replaces ledger). Node engine minimum set to `>=22`.

Attached is an agent-optimized description of the changes in this PR: AGENT_REVIEW.md
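For reviewers unfamiliar with the singleton-swap pattern listed under Test infrastructure, here is a rough sketch of how a container swap and a per-test harness can fit together. Only the names `RuntimeSingletons`, `__setSingletons()`, and `createTestHarness()` come from this PR. The container fields, the `getSingletons()` accessor, the return shapes, and the console-capture approach are assumptions for illustration and may differ from the real code in `tests/test-utils.ts`.

```typescript
// Hypothetical shapes: the fields and signatures below are assumed, not taken from the PR.
interface RuntimeSingletons {
  notebook: Map<string, string>;
  topic: string | null;
}

let current: RuntimeSingletons = { notebook: new Map(), topic: null };

/** Assumed accessor that production code would use instead of module-level globals. */
function getSingletons(): RuntimeSingletons {
  return current;
}

/** Swap the whole container in a single assignment and return the previous one. */
function __setSingletons(next: RuntimeSingletons): RuntimeSingletons {
  const previous = current;
  current = next;
  return previous;
}

interface TestHarness {
  logs: string[];
  teardown(): void;
}

/** Give one test a fresh container and captured console.log; teardown restores both. */
function createTestHarness(): TestHarness {
  const logs: string[] = [];
  const original = __setSingletons({ notebook: new Map(), topic: null });
  const originalLog = console.log;
  console.log = (...args: unknown[]) => {
    logs.push(args.map(String).join(" "));
  };
  return {
    logs,
    teardown() {
      console.log = originalLog;
      __setSingletons(original);
    },
  };
}
```

Because the swap replaces one reference rather than patching individual fields, a test that writes to the notebook cannot leak that state into the next test once `teardown()` runs.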