
refactor(tests): modular mock-free test suite with cross-platform CI #13

Merged
grzegorznowak merged 40 commits into main from refactor/tests on Jun 13, 2026

ofriw (Contributor) commented Jun 8, 2026

Note: This PR was generated by an AI agent. If you'd like to talk with other humans, drop by our Discord!


What

This PR replaces the single monolithic 3,840-line test file with per-module unit suites, adds a process-isolated E2E layer, extracts shared test infrastructure (singleton container, harness factory, module loader), and wires cross-platform CI that exercises the full suite on Linux, macOS, and Windows.

Why

The old agenticoding.test.ts was a single file that grew organically. It used ad-hoc mocks and stubs, imported production module-level singletons directly (leaking state between tests), and had no cross-platform coverage. The new architecture enforces no-mocks testing: every suite verifies external invariants, constraints, or user-facing contracts by exercising only real code paths. The E2E layer runs real tool invocations against a minimal pi host in a child process, testing the actual tool contracts without mocking the SDK.

What changed

Test infrastructure:

  • Extracted three module-level globals (write lock, frame scheduler, async context) into RuntimeSingletons, a single container that the test harness swaps atomically via __setSingletons(), so tests no longer patch each global individually
  • Created createTestHarness() in tests/test-utils.ts — one call per test isolates singletons and captures console output; teardown restores originals
  • Added scripts/run-node-test.mjs, a runner that loads the custom module bootstrap via --import (replacing the --experimental-loader flag, which Node.js discourages in favor of --import with register())
  • Migrated test-loader.mjs to resolve SDK packages by walking node_modules instead of hardcoding an absolute path — works from any install location
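A minimal sketch of the singleton-container pattern described above, assuming a plain module-level binding. Only the names RuntimeSingletons, __setSingletons() and createTestHarness() come from this PR. The field shapes, getSingletons() and createDefaultSingletons() are hypothetical placeholders, and the real harness may capture more than console.log.

```typescript
// Hypothetical placeholder shapes: the PR names the three globals but not their types.
interface RuntimeSingletons {
  writeLock: { pendingWrites: number };
  frameScheduler: { queuedFrames: Array<() => void> };
  asyncContext: { sessionId: string | null };
}

function createDefaultSingletons(): RuntimeSingletons {
  return {
    writeLock: { pendingWrites: 0 },
    frameScheduler: { queuedFrames: [] },
    asyncContext: { sessionId: null },
  };
}

// Production code reads every global through this single binding, so replacing
// it swaps all three at once.
let current: RuntimeSingletons = createDefaultSingletons();

function getSingletons(): RuntimeSingletons {
  return current;
}

// Test-only hook: install a new container and hand back the previous one.
function __setSingletons(next: RuntimeSingletons): RuntimeSingletons {
  const previous = current;
  current = next;
  return previous;
}

interface TestHarness {
  singletons: RuntimeSingletons;
  logs: string[];
  teardown(): void;
}

// One call per test: fresh singletons plus captured console.log output.
// teardown() restores both the original container and console.log.
function createTestHarness(): TestHarness {
  const fresh = createDefaultSingletons();
  const previous = __setSingletons(fresh);
  const logs: string[] = [];
  const originalLog = console.log;
  console.log = (...args: unknown[]) => {
    logs.push(args.map(String).join(" "));
  };
  return {
    singletons: fresh,
    logs,
    teardown() {
      console.log = originalLog;
      __setSingletons(previous);
    },
  };
}
```

Swapping one reference keeps the three globals consistent with each other. Code under test never observes a new write lock paired with an old frame scheduler.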

Unit test reorganization:

  • tests/unit/handoff.test.ts — handoff tool, command, and compaction contracts
  • tests/unit/notebook.test.ts — write/read/index lifecycle, overwrites, rehydration, stale detection, concurrent write isolation
  • tests/unit/render-snapshots.test.ts — golden-file snapshot tests for TUI render output (indicator levels, spawn call/result frames, nested session views)
  • tests/unit/register-loader.test.ts — validates the module loader bootstrap resolves correctly
  • tests/unit/runtime-singletons.test.ts — singleton container atomic swap, pending-write protection
  • tests/unit/spawn-render.test.ts — TUI rendering of spawn lifecycle (expanded/collapsed, success/error/abort, truncation markers)
  • tests/unit/spawn.test.ts (migrated from the original) — spawn execution logic, tool inheritance, session lifecycle, cleanup guarantees, cancellation
  • tests/unit/state-invariants.test.ts — property-based tests using fast-check to verify runtime singletons never silently drop pending writes
  • tests/unit/system-prompt.test.ts — system prompt injection at before_agent_start
  • tests/unit/topic.test.ts — notebook topic lifecycle (set, override, human-authoritative)
  • tests/unit/tui-indicators.test.ts — TUI status/widget indicators for context usage
  • tests/unit/watchdog.test.ts — primacy-zone advisory injection at thresholds
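The property in state-invariants.test.ts is written with fast-check. To keep this example dependency-free, the sketch below hand-rolls the same idea instead. It generates random sequences of begin-write, complete-write and swap operations from a seeded PRNG (mulberry32), then checks after every step that no begun write has vanished. The Runtime model, its pending-write guard, and the step and seed counts are assumptions for illustration, not the suite's code.

```typescript
// Hypothetical model: a container tracks in-flight writes by id.
class Container {
  readonly pending = new Set<number>();
}

class Runtime {
  active = new Container();
  private readonly guarded: boolean;

  constructor(guarded: boolean) {
    this.guarded = guarded;
  }

  beginWrite(id: number): void {
    this.active.pending.add(id);
  }

  // Returns false if the active container no longer tracks this write.
  completeWrite(id: number): boolean {
    return this.active.pending.delete(id);
  }

  // Pending-write protection: with the guard, refuse to swap while writes are in flight.
  trySwap(): boolean {
    if (this.guarded && this.active.pending.size > 0) return false;
    this.active = new Container();
    return true;
  }
}

// Small seeded PRNG so any failure is reproducible from its seed.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// One property run: random operations, with the invariant checked after every step.
function checkNoDroppedWrites(seed: number, steps: number, guarded: boolean): boolean {
  const rand = mulberry32(seed);
  const runtime = new Runtime(guarded);
  const outstanding = new Set<number>(); // oracle: begun but not yet completed
  let nextId = 0;
  for (let i = 0; i < steps; i++) {
    const r = rand();
    if (r < 0.4) {
      const id = nextId++;
      runtime.beginWrite(id);
      outstanding.add(id);
    } else if (r < 0.8 && outstanding.size > 0) {
      const ids = Array.from(outstanding);
      const id = ids[Math.floor(rand() * ids.length)];
      if (!runtime.completeWrite(id)) return false; // a begun write went missing
      outstanding.delete(id);
    } else {
      runtime.trySwap();
    }
    const pending = runtime.active.pending;
    const consistent =
      pending.size === outstanding.size &&
      Array.from(outstanding).every((id) => pending.has(id));
    if (!consistent) return false;
  }
  return true;
}

// Returns the first failing seed, or null if every run passed.
function findFailingSeed(runs: number, guarded: boolean): number | null {
  for (let seed = 1; seed <= runs; seed++) {
    if (!checkNoDroppedWrites(seed, 200, guarded)) return seed;
  }
  return null;
}
```

Running the check with guarded = false shows that the property does catch an unprotected swap. fast-check adds what this loop lacks, notably shrinking a failing sequence down to a minimal counterexample.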

E2E layer:

  • tests/e2e/test-host.ts — minimal pi host REPL that loads the real extension, no TUI
  • tests/e2e/pty-harness.ts — process-isolated child-process harness for stdin/stdout communication (no PTY dependency, works on Windows)
  • tests/e2e/basic.test.ts — tool registration, notebook round-trips, topic lifecycle, handoff state, command registration, error handling
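One way to build a PTY-free harness like pty-harness.ts is to spawn the host with plain stdio pipes and read its stdout line by line. This is a sketch, not the PR's implementation. The newline-delimited JSON protocol, the ChildHarness class and the 5-second default timeout are all assumptions.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcessWithoutNullStreams } from "node:child_process";
import { createInterface } from "node:readline";

// Hypothetical wire format: one JSON object per line, correlated by id.
interface HostRequest {
  id: number;
  command: string;
  args?: unknown;
}

interface HostResponse {
  id: number;
  ok: boolean;
  result?: unknown;
  error?: string;
}

class ChildHarness {
  private nextId = 1;
  private readonly waiters = new Map<number, (res: HostResponse) => void>();
  private readonly child: ChildProcessWithoutNullStreams;
  stderr = "";

  constructor(command: string, args: string[]) {
    // Default stdio is three pipes: no PTY involved, so Windows behaves the same.
    this.child = spawn(command, args);
    this.child.stderr.setEncoding("utf8");
    this.child.stderr.on("data", (chunk: string) => {
      this.stderr += chunk; // kept for failure diagnostics
    });
    const lines = createInterface({ input: this.child.stdout });
    lines.on("line", (line) => {
      let res: HostResponse;
      try {
        res = JSON.parse(line) as HostResponse;
      } catch {
        return; // ignore non-protocol output such as log lines
      }
      const waiter = this.waiters.get(res.id);
      if (waiter) {
        this.waiters.delete(res.id);
        waiter(res);
      }
    });
  }

  // Send one request and resolve with the response that carries the same id.
  send(command: string, args?: unknown, timeoutMs = 5000): Promise<HostResponse> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.waiters.delete(id);
        reject(new Error(`no response to request ${id} within ${timeoutMs} ms`));
      }, timeoutMs);
      this.waiters.set(id, (res) => {
        clearTimeout(timer);
        resolve(res);
      });
      const request: HostRequest = { id, command, args };
      this.child.stdin.write(JSON.stringify(request) + "\n");
    });
  }

  // Closing stdin lets a well-behaved host exit on its own.
  close(): void {
    this.child.stdin.end();
  }
}
```

The waiter is registered before the request is written, so even an instant reply from the host cannot be missed.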

CI:

  • Cross-platform matrix: Ubuntu (Node 22 + 24), macOS (Node 24), Windows (Node 24)
  • Steps: npm ci → type check → security audit → unit tests → E2E tests → upload results
  • Snapshot golden files stored in Git LFS, with .gitattributes enforcing LF line endings
  • Snapshot updates controlled via the UPDATE_SNAPSHOTS=1 env var
  • Test artifacts retained for 30 days
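The snapshot flow could look roughly like this: golden files normalized to LF, and refreshed only when UPDATE_SNAPSHOTS=1 is set. normalizeEOL() is named in the commit notes, but its body here, matchSnapshot(), and the file layout are hypothetical.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Collapse CRLF (and lone CR) to LF so renders compare equal on every OS.
function normalizeEOL(text: string): string {
  return text.replace(/\r\n?/g, "\n");
}

type SnapshotResult = "matched" | "mismatch" | "missing" | "written";

// Compare rendered output with a golden file. UPDATE_SNAPSHOTS=1 rewrites the
// golden file instead; a missing golden file is reported, never created silently.
function matchSnapshot(
  goldenPath: string,
  actual: string,
  env: Record<string, string | undefined> = process.env,
): SnapshotResult {
  const rendered = normalizeEOL(actual);
  if (env.UPDATE_SNAPSHOTS === "1") {
    mkdirSync(dirname(goldenPath), { recursive: true });
    writeFileSync(goldenPath, rendered, "utf8");
    return "written";
  }
  if (!existsSync(goldenPath)) return "missing";
  // Normalize the stored side too, in case a checkout converted it to CRLF.
  const expected = normalizeEOL(readFileSync(goldenPath, "utf8"));
  return expected === rendered ? "matched" : "mismatch";
}

// Usage: a temporary directory stands in for the real snapshot folder.
const snapshotDir = mkdtempSync(join(tmpdir(), "snapshots-"));
const goldenFile = join(snapshotDir, "render", "indicator-low.txt");
```

Reporting "missing" instead of writing the file keeps a CI run from quietly accepting a render that was never reviewed.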

SDK upgrade: Bumped peerDependencies to ^0.78.1 and adapted to new type surfaces (CustomEntry, TextContent, TypeBox Static, widened message content union). Added typebox and fast-check as dev dependencies.

Breaking changes: None to the extension's public API or behavior. The package.json description and keywords were updated to reflect the current primitive set (notebook replaces ledger). Node engine minimum set to >=22.


Attached is an agent-optimized description of the changes in this PR: AGENT_REVIEW.md

ofriw added 16 commits June 7, 2026 09:30
- Remove the Windows exclusion guard from the E2E test step in the CI workflow
- The process-isolated child-process harness uses only cross-platform Node.js
  APIs (spawn, readline, stdio pipes) and is verified to work on Windows
- Add .gitattributes to enforce LF line endings on snapshot golden files
- Add normalizeEOL() helper in snapshot tests for Windows CRLF handling
- Add E2E_TIMEOUT_MS env var support for configurable test timeouts
- Add engines.node >=22 to package.json
- Add tsconfig.json for type checking
- Update CI documentation in CONTRIBUTING.md

Closes #12
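Support for E2E_TIMEOUT_MS might be read and applied along these lines. The 30-second default, the validation rule, and the resolveE2ETimeout()/withTimeout() helpers are assumptions. The commit only states that the variable makes timeouts configurable.

```typescript
// Hypothetical default; the commit does not state the real one.
const DEFAULT_E2E_TIMEOUT_MS = 30000;

// Read E2E_TIMEOUT_MS, falling back to the default when unset or blank.
function resolveE2ETimeout(env: Record<string, string | undefined> = process.env): number {
  const raw = env.E2E_TIMEOUT_MS;
  if (raw === undefined || raw.trim() === "") return DEFAULT_E2E_TIMEOUT_MS;
  const value = Number(raw);
  // Reject NaN, zero, negatives and fractions instead of silently using them.
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`E2E_TIMEOUT_MS must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Race a unit of work against a timer; the timer is always cleared afterwards.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Failing fast on a malformed value matters here. A typo like E2E_TIMEOUT_MS=30s would otherwise become NaN, and setTimeout treats NaN as an almost immediate timeout.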
ofriw marked this pull request as draft on June 8, 2026 17:22
ofriw changed the title from "Enable E2E tests on Windows (cross-platform CI)" to "refactor(tests): modular mock-free test suite with cross-platform CI" on Jun 9, 2026
ofriw marked this pull request as ready for review on June 12, 2026 18:36
ofriw requested a review from grzegorznowak on June 12, 2026 18:36

grzegorznowak (Collaborator) left a comment

great work, no major findings @ofriw 💪🏾

grzegorznowak merged commit b051c2b into main on Jun 13, 2026
4 checks passed