Skip to content

Expand E2E test coverage across all 4 SDKs#1186

Merged
stephentoub merged 7 commits intomainfrom
stephentoub/e2etests
May 2, 2026
Merged

Expand E2E test coverage across all 4 SDKs#1186
stephentoub merged 7 commits intomainfrom
stephentoub/e2etests

Conversation

@stephentoub
Copy link
Copy Markdown
Collaborator

@stephentoub stephentoub commented May 1, 2026

Adds comprehensive end-to-end coverage of the SDK + Copilot CLI surface area across C#, Python, TypeScript, and Go, with the goal that every public API and every JSON-RPC method has at least one E2E test. Reorganizes the C# and Python test layouts to clearly separate unit and E2E tests, and adds E2E suffixes to test class/file names for clarity.

What's added

E2E coverage (parity across all 4 SDKs):

  • Session lifecycle: connect/disconnect, dispose semantics, multi-client scenarios, resume, force-stop, idle-then-suspend.
  • Session config: model selection (vision-enabled/disabled transitions), agent selection, allowed/denied tool sets, working directory, environment variables, system prompts, MCP servers.
  • RPC surface: agent (get/getCurrent/list/reload), session state (capabilities/getCurrent/getMetadata/list/reset), trust, model (get/getCurrent/list/setCurrent), permission (request/respond/list), hooks, plan, telemetry, command execute/elicit/respond.
  • Streaming: assistant.message_delta + reasoning_delta ordering and matching message IDs across delta and final events.
  • Suspend RPC: suspend during pending permission, suspend during pending external tool, suspend idle session, resume + continue conversation after suspend.
  • Hooks: pre/post tool, pre/post session, deny verification (asserts target file is unchanged after a deny).
  • Permissions: per-session auth tokens with auto-token opt-out for tests that explicitly verify the unauthenticated path.
  • GitHub references, attachments, custom request headers, trace context propagation.
  • Tool routing, command handlers, elicitation flow.

Unit coverage:

  • Forward-compatibility for unknown discriminators and unknown event envelope types so unrecognized future events round-trip safely.

Snapshot harness:

  • 100+ new YAML snapshots under test/snapshots/ covering the new scenarios; existing snapshots normalized for portability.

Test layout:

  • C# tests split into dotnet/test/E2E/ and dotnet/test/Unit/ folders; xUnit collection serializes E2E execution to avoid CLI process contention (with explanatory comment at the attribute site).
  • Python E2E tests live under python/e2e/ with a shared harness proxy/context. E2E suffix added to classes/methods.
  • Go E2E tests use _e2e_test.go suffix and an E2E test-function suffix for clarity.
  • TypeScript E2E tests use .e2e.test.ts suffix; createSdkTestContext registers Vitest hooks that auto-stop CopilotClient and CapiProxy.

What's fixed

Snapshot drift root cause:

  • test/harness/replayingCapiProxy.ts: writeCapturesToDisk was called from both updateConfig and stop regardless of CI mode. When a test exercised only a subset of a multi-conversation snapshot, the file was silently rewritten with that subset, breaking later runs. Both writes are now guarded by process.env.GITHUB_ACTIONS !== "true".

gh CLI environment-dependent help text:

  • Added a normalizeGhAuthMessages tool result normalizer so the two forms of "auth required" help text emitted by gh (GitHub Actions vs. local dev) both map to a stable \ placeholder before snapshot match.

PerSessionAuth auto-token leak:

  • dotnet/test/Harness/E2ETestContext.cs: added an autoInjectGitHubToken parameter so CreateAuthTestClient can opt out of the CI auto-token injection that was silently authenticating tests verifying the unauthenticated path.

Go SDK lifecycle and codegen:

  • Polling-based waitForCapability helper in go/session_test.go to replace race-prone fixed sleeps.
  • Telemetry marker constant aligned with snapshot.
  • Codegen / lifecycle bugs surfaced while porting C# E2E coverage.

TypeScript cleanup hangs:

  • Replaced fixed setTimeout waits in lifecycle/session tests with bounded polling helpers (waitFor / getLastSessionId / getSessionMetadata polls).
  • Bumped the should stop cleanly test timeout to 60s to absorb slow-machine variability.

Python:

  • _wait_for polling helper in test_commands_and_elicitation.py replaces flaky fixed sleeps.

Adds comprehensive end-to-end coverage of the SDK + Copilot CLI surface area
across C#, Python, TypeScript, and Go, with the goal that every public API
and every JSON-RPC method has at least one E2E test. Reorganizes the C#
and Python test layouts to clearly separate unit and E2E tests, and adds
`E2E` suffixes to test class/file names for clarity.

What's added
------------

E2E coverage (parity across all 4 SDKs):
- Session lifecycle: connect/disconnect, dispose semantics, multi-client
  scenarios, resume, force-stop, idle-then-suspend.
- Session config: model selection (vision-enabled/disabled transitions),
  agent selection, allowed/denied tool sets, working directory,
  environment variables, system prompts, MCP servers.
- RPC surface: agent (get/getCurrent/list/reload), session state
  (capabilities/getCurrent/getMetadata/list/reset), trust, model
  (get/getCurrent/list/setCurrent), permission (request/respond/list),
  hooks, plan, telemetry, command execute/elicit/respond.
- Streaming: assistant.message_delta + reasoning_delta ordering and
  matching message IDs across delta and final events.
- Suspend RPC: suspend during pending permission, suspend during pending
  external tool, suspend idle session, resume + continue conversation
  after suspend.
- Hooks: pre/post tool, pre/post session, deny verification (asserts
  target file is unchanged after a deny).
- Permissions: per-session auth tokens with auto-token opt-out for tests
  that explicitly verify the unauthenticated path.
- GitHub references, attachments, custom request headers, trace context
  propagation.
- Tool routing, command handlers, elicitation flow.

Unit coverage:
- Forward-compatibility for unknown discriminators and unknown event
  envelope types so unrecognized future events round-trip safely.

Snapshot harness:
- 100+ new YAML snapshots under `test/snapshots/` covering the new
  scenarios; existing snapshots normalized for portability.

Test layout:
- C# tests split into `dotnet/test/E2E/` and `dotnet/test/Unit/`
  folders; xUnit collection serializes E2E execution to avoid CLI
  process contention (with explanatory comment at the attribute site).
- Python E2E tests live under `python/e2e/` with a shared harness
  proxy/context. `E2E` suffix added to classes/methods.
- Go E2E tests use `_e2e_test.go` suffix and an `E2E` test-function
  suffix for clarity.
- TypeScript E2E tests use `.e2e.test.ts` suffix; `createSdkTestContext`
  registers Vitest hooks that auto-stop `CopilotClient` and `CapiProxy`.

What's fixed
------------

Snapshot drift root cause:
- `test/harness/replayingCapiProxy.ts`: `writeCapturesToDisk` was
  called from both `updateConfig` and `stop` regardless of CI mode.
  When a test exercised only a subset of a multi-conversation snapshot,
  the file was silently rewritten with that subset, breaking later runs.
  Both writes are now guarded by `process.env.GITHUB_ACTIONS !== "true"`.

`gh` CLI environment-dependent help text:
- Added a `normalizeGhAuthMessages` tool result normalizer so the two
  forms of "auth required" help text emitted by `gh` (GitHub Actions
  vs. local dev) both map to a stable `\` placeholder
  before snapshot match.

PerSessionAuth auto-token leak:
- `dotnet/test/Harness/E2ETestContext.cs`: added an
  `autoInjectGitHubToken` parameter so `CreateAuthTestClient` can
  opt out of the CI auto-token injection that was silently authenticating
  tests verifying the unauthenticated path.

Go SDK lifecycle and codegen:
- Polling-based `waitForCapability` helper in `go/session_test.go`
  to replace race-prone fixed sleeps.
- Telemetry marker constant aligned with snapshot.
- Codegen / lifecycle bugs surfaced while porting C# E2E coverage.

TypeScript cleanup hangs:
- Replaced fixed `setTimeout` waits in lifecycle/session tests with
  bounded polling helpers (`waitFor` / `getLastSessionId` /
  `getSessionMetadata` polls).
- Bumped the `should stop cleanly` test timeout to 60s to absorb
  slow-machine variability.

Python:
- `_wait_for` polling helper in `test_commands_and_elicitation.py`
  replaces flaky fixed sleeps.

Verification
------------

Five consecutive full-suite runs across all 4 SDKs, all clean (the final
run was executed with all 4 suites in parallel to confirm the fixes hold
under contention):

- C#:     319 passed / 0 failed / 4 skipped
- Python: 356 passed / 0 failed / 8 skipped
- TS:     328 passed / 0 failed / 9 skipped
- Go:     all packages ok

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@stephentoub stephentoub requested a review from a team as a code owner May 1, 2026 21:36
Copilot AI review requested due to automatic review settings May 1, 2026 21:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

Comment thread python/e2e/test_pending_work_resume_e2e.py Fixed
Comment thread python/e2e/test_rpc_mcp_and_skills_e2e.py Fixed
Comment thread python/e2e/test_rpc_session_state_e2e.py Fixed
Comment thread python/e2e/test_rpc_tasks_and_handlers_e2e.py Fixed
Comment thread python/e2e/test_rpc_server_e2e.py Fixed
Comment thread python/e2e/test_rpc_server_e2e.py Fixed
Comment thread dotnet/test/E2E/HookLifecycleAndOutputE2ETests.cs Fixed
Comment thread dotnet/test/E2E/ClientOptionsE2ETests.cs Fixed
Comment thread dotnet/test/E2E/RpcShellAndFleetE2ETests.cs Fixed
Comment thread dotnet/test/E2E/SessionConfigE2ETests.cs Outdated
Comment thread dotnet/test/E2E/SessionE2ETests.cs Outdated
Comment thread dotnet/test/E2E/SessionE2ETests.cs Outdated
Comment thread dotnet/test/E2E/SessionE2ETests.cs Outdated
Comment thread dotnet/test/E2E/HooksE2ETests.cs Fixed
@github-actions

This comment has been minimized.

- Format Node E2E test files with prettier (4 files: client_lifecycle,
  commands, session_fs, session_lifecycle).
- Format/lint Python files with ruff: switched Callable import from
  typing to collections.abc in test_commands_and_elicitation.py; split a
  long comment in conftest.py to satisfy E501; removed unnecessary "r"
  open mode in test_hooks_e2e.py.
- Resolve macOS /var -> /private/var symlinks in test harnesses so the
  paths match what spawned subprocesses see when they resolve their cwd:
    * Go: filepath.EvalSymlinks on home_dir / work_dir.
    * .NET: P/Invoke libc realpath (with CA2101-compliant marshaling)
      and a Windows fallback to Path.GetFullPath.
    * Python: os.path.realpath wrapping tempfile.mkdtemp in the shared
      E2ETestContext and in the per-test multi-client harnesses
      (test_commands_e2e, test_multi_client_e2e,
      test_ui_elicitation_multi_client_e2e).
- Loosen per-session "no token" assertions in .NET and Python so they
  match Node/Go: in CI the process-level fake GITHUB_TOKEN can make
  IsAuthenticated true even without a per-session token, and the Login
  may surface as null/None on Linux/macOS but as an empty string on
  Windows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread dotnet/test/Harness/E2ETestContext.cs Fixed
Comment thread dotnet/test/Harness/E2ETestContext.cs Fixed
Comment thread dotnet/test/Harness/E2ETestContext.cs Fixed
@github-actions

This comment has been minimized.

- StreamingFidelityE2ETests: lock around the events list in the three
  remaining streaming tests. Without locking, the event handler runs on
  the dispatcher thread and continues to append after SendAndWaitAsync
  returns, racing with the test thread's enumeration and producing
  "Collection was modified; enumeration operation may not execute".
  This mirrors the lock-and-snapshot pattern already used in the
  Should_Emit_AssistantMessageStart_Before_Deltas_With_Matching_MessageId
  test in the same file.
- python/pyproject.toml: add opentelemetry-sdk to dev dependencies so
  test_telemetry_e2e.py::test_restores_w3c_trace_context can construct
  a real TracerProvider. The opentelemetry-api package alone provides
  the namespace package "opentelemetry" but not "opentelemetry.sdk".

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

- Bind awaited results in async helpers to silence `no-effect''
  warnings (Python `await x` -> `_ = await x`).
- Add explanatory comments to intentional `except ExceptionGroup: pass`
  blocks in test_rpc_server_e2e.py.
- HookLifecycleAndOutputE2ETests: remove unused errorInputs container.
- ClientOptionsE2ETests: dispose TcpListener via `using`.
- RpcShellAndFleetE2ETests: simplify `Contains(...) == true` to a
  null-coalescing form.
- SessionConfigE2ETests/SessionE2ETests: use `catch (Exception)` in
  `DisposeAsync` cleanup blocks instead of bare `catch`.
- E2ETestContext.ResolveSymlinks: replace libc realpath/free P/Invoke
  with managed `DirectoryInfo.ResolveLinkTarget`-based component walk;
  preserves macOS `/var -> /private/var` resolution behavior. Also
  switches the fallback `catch` to typed exceptions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread python/e2e/test_pending_work_resume_e2e.py Fixed
Comment thread python/e2e/test_rpc_mcp_and_skills_e2e.py Fixed
Comment thread python/e2e/test_rpc_session_state_e2e.py Fixed
Comment thread python/e2e/test_rpc_tasks_and_handlers_e2e.py Fixed
Comment thread dotnet/test/E2E/HooksE2ETests.cs Fixed
Comment thread dotnet/test/Harness/E2ETestContext.cs Fixed
stephentoub and others added 2 commits May 1, 2026 19:15
CodeQL flagged the named `_result`/`_captured_permission_request`
discards as unused locals after the prior commit. Switch to the
single-underscore convention, which CodeQL's py/unused-local-variable
rule and ruff F841 both treat as an intentional discard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CodeQL flags Path.Combine because if a later argument is rooted, it
silently discards the earlier ones. Path.Join concatenates segments
literally and is the safer choice for these cases.

- E2ETestContext.ResolveSymlinks: use Path.Join when walking path
  components (the loop appends user-controlled directory names that
  realistically never start with a path separator on these temp dirs,
  but the safer API removes the risk regardless).
- HooksE2ETests: use Path.Join for the protected-file lookup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@stephentoub
Copy link
Copy Markdown
Collaborator Author

The PR description says:
Suspend RPC: suspend during pending permission, suspend during pending external tool, suspend idle session, resume + continue conversation after suspend. (parity across all 4 SDKs)

Accepted. I updated the PR description to clarify that the dedicated session.suspend RPC coverage is .NET-specific, while Node.js, Go, and Python cover the separate ForceStop pending-work-resume path.

Generated by Copilot.

Add missing E2E coverage so suspend RPC, event fidelity, built-in tool smoke tests, error resilience, and multi-turn scenarios are represented across all SDKs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 2, 2026

Cross-SDK Consistency Review ✅

This PR adds comprehensive E2E test coverage across all four SDK implementations (TypeScript, Python, Go, .NET) and the changes are well-aligned across languages. Here's a brief breakdown:

SDK Source Changes (non-test)

File Change Nature
dotnet/src/Client.cs Add PermissionRequestResultKind to JsonSerializable context .NET-specific AOT/source-gen fix
dotnet/src/Types.cs Add PermissionRequestResultKind to JsonSerializable context .NET-specific AOT/source-gen fix
dotnet/src/Generated/Rpc.cs Add primitive types (bool, double, int, long, string) to JsonSerializable .NET-specific codegen/AOT fix
python/copilot/session_fs_provider.py Fix datetime.UTC import (use UTC from datetime module directly) Python 3.11+ language-specific bug fix
scripts/codegen/csharp.ts Emit primitive type registrations in codegen output Keeps Rpc.cs in sync
scripts/codegen/go.ts Add addMissingOmitemptyToQuicktypeStructs post-processor Go codegen improvement

All source changes are language-internal (serialization infrastructure, type system quirks) with no impact on the public API surface. No cross-SDK consistency concerns here.

E2E Test Coverage

The PR adds ~36–39 parallel E2E test files per language covering the same scenarios across all four SDKs. Key coverage areas are fully represented in all SDKs:

  • Session lifecycle (create, disconnect, resume, list, delete)
  • Session config (model selection, working directory, MCP servers, custom agents, system prompts, env vars)
  • RPC surface (agents, state, models, permissions, hooks, plan, telemetry, commands, elicitation)
  • Streaming fidelity (delta/final event ordering and message ID matching)
  • Suspend RPC (pending permissions, pending external tools, idle-then-suspend, resume after suspend)
  • Hooks, skills, compaction, multi-client, pending work resume, per-session auth
  • Error resilience, event fidelity, built-in tools, tool results

Minor structural differences (e.g., Go combining commands + elicitation into a single file; .NET splitting into more granular test classes) reflect language idioms rather than coverage gaps — the actual test scenarios are equivalent across SDKs.

No cross-SDK consistency issues found. The PR maintains excellent feature parity across all four implementations.

Generated by SDK Consistency Review Agent for issue #1186 · ● 1.6M ·

@stephentoub stephentoub enabled auto-merge May 2, 2026 01:59
@stephentoub stephentoub disabled auto-merge May 2, 2026 01:59
@stephentoub stephentoub merged commit f7f9450 into main May 2, 2026
36 checks passed
@stephentoub stephentoub deleted the stephentoub/e2etests branch May 2, 2026 01:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants