Expand E2E test coverage across all 4 SDKs#1186
Conversation
Adds comprehensive end-to-end coverage of the SDK + Copilot CLI surface area across C#, Python, TypeScript, and Go, with the goal that every public API and every JSON-RPC method has at least one E2E test. Reorganizes the C# and Python test layouts to clearly separate unit and E2E tests, and adds `E2E` suffixes to test class/file names for clarity. What's added ------------ E2E coverage (parity across all 4 SDKs): - Session lifecycle: connect/disconnect, dispose semantics, multi-client scenarios, resume, force-stop, idle-then-suspend. - Session config: model selection (vision-enabled/disabled transitions), agent selection, allowed/denied tool sets, working directory, environment variables, system prompts, MCP servers. - RPC surface: agent (get/getCurrent/list/reload), session state (capabilities/getCurrent/getMetadata/list/reset), trust, model (get/getCurrent/list/setCurrent), permission (request/respond/list), hooks, plan, telemetry, command execute/elicit/respond. - Streaming: assistant.message_delta + reasoning_delta ordering and matching message IDs across delta and final events. - Suspend RPC: suspend during pending permission, suspend during pending external tool, suspend idle session, resume + continue conversation after suspend. - Hooks: pre/post tool, pre/post session, deny verification (asserts target file is unchanged after a deny). - Permissions: per-session auth tokens with auto-token opt-out for tests that explicitly verify the unauthenticated path. - GitHub references, attachments, custom request headers, trace context propagation. - Tool routing, command handlers, elicitation flow. Unit coverage: - Forward-compatibility for unknown discriminators and unknown event envelope types so unrecognized future events round-trip safely. Snapshot harness: - 100+ new YAML snapshots under `test/snapshots/` covering the new scenarios; existing snapshots normalized for portability. Test layout: - C# tests split into `dotnet/test/E2E/` and `dotnet/test/Unit/` folders; xUnit collection serializes E2E execution to avoid CLI process contention (with explanatory comment at the attribute site). - Python E2E tests live under `python/e2e/` with a shared harness proxy/context. `E2E` suffix added to classes/methods. - Go E2E tests use `_e2e_test.go` suffix and an `E2E` test-function suffix for clarity. - TypeScript E2E tests use `.e2e.test.ts` suffix; `createSdkTestContext` registers Vitest hooks that auto-stop `CopilotClient` and `CapiProxy`. What's fixed ------------ Snapshot drift root cause: - `test/harness/replayingCapiProxy.ts`: `writeCapturesToDisk` was called from both `updateConfig` and `stop` regardless of CI mode. When a test exercised only a subset of a multi-conversation snapshot, the file was silently rewritten with that subset, breaking later runs. Both writes are now guarded by `process.env.GITHUB_ACTIONS !== "true"`. `gh` CLI environment-dependent help text: - Added a `normalizeGhAuthMessages` tool result normalizer so the two forms of "auth required" help text emitted by `gh` (GitHub Actions vs. local dev) both map to a stable `\` placeholder before snapshot match. PerSessionAuth auto-token leak: - `dotnet/test/Harness/E2ETestContext.cs`: added an `autoInjectGitHubToken` parameter so `CreateAuthTestClient` can opt out of the CI auto-token injection that was silently authenticating tests verifying the unauthenticated path. Go SDK lifecycle and codegen: - Polling-based `waitForCapability` helper in `go/session_test.go` to replace race-prone fixed sleeps. - Telemetry marker constant aligned with snapshot. - Codegen / lifecycle bugs surfaced while porting C# E2E coverage. TypeScript cleanup hangs: - Replaced fixed `setTimeout` waits in lifecycle/session tests with bounded polling helpers (`waitFor` / `getLastSessionId` / `getSessionMetadata` polls). - Bumped the `should stop cleanly` test timeout to 60s to absorb slow-machine variability. Python: - `_wait_for` polling helper in `test_commands_and_elicitation.py` replaces flaky fixed sleeps. Verification ------------ Five consecutive full-suite runs across all 4 SDKs, all clean (the final run was executed with all 4 suites in parallel to confirm the fixes hold under contention): - C#: 319 passed / 0 failed / 4 skipped - Python: 356 passed / 0 failed / 8 skipped - TS: 328 passed / 0 failed / 9 skipped - Go: all packages ok Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
- Format Node E2E test files with prettier (4 files: client_lifecycle,
commands, session_fs, session_lifecycle).
- Format/lint Python files with ruff: switched Callable import from
typing to collections.abc in test_commands_and_elicitation.py; split a
long comment in conftest.py to satisfy E501; removed unnecessary "r"
open mode in test_hooks_e2e.py.
- Resolve macOS /var -> /private/var symlinks in test harnesses so the
paths match what spawned subprocesses see when they resolve their cwd:
* Go: filepath.EvalSymlinks on home_dir / work_dir.
* .NET: P/Invoke libc realpath (with CA2101-compliant marshaling)
and a Windows fallback to Path.GetFullPath.
* Python: os.path.realpath wrapping tempfile.mkdtemp in the shared
E2ETestContext and in the per-test multi-client harnesses
(test_commands_e2e, test_multi_client_e2e,
test_ui_elicitation_multi_client_e2e).
- Loosen per-session "no token" assertions in .NET and Python so they
match Node/Go: in CI the process-level fake GITHUB_TOKEN can make
IsAuthenticated true even without a per-session token, and the Login
may surface as null/None on Linux/macOS but as an empty string on
Windows.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
- StreamingFidelityE2ETests: lock around the events list in the three remaining streaming tests. Without locking, the event handler runs on the dispatcher thread and continues to append after SendAndWaitAsync returns, racing with the test thread's enumeration and producing "Collection was modified; enumeration operation may not execute". This mirrors the lock-and-snapshot pattern already used in the Should_Emit_AssistantMessageStart_Before_Deltas_With_Matching_MessageId test in the same file. - python/pyproject.toml: add opentelemetry-sdk to dev dependencies so test_telemetry_e2e.py::test_restores_w3c_trace_context can construct a real TracerProvider. The opentelemetry-api package alone provides the namespace package "opentelemetry" but not "opentelemetry.sdk". Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
- Bind awaited results in async helpers to silence `no-effect'' warnings (Python `await x` -> `_ = await x`). - Add explanatory comments to intentional `except ExceptionGroup: pass` blocks in test_rpc_server_e2e.py. - HookLifecycleAndOutputE2ETests: remove unused errorInputs container. - ClientOptionsE2ETests: dispose TcpListener via `using`. - RpcShellAndFleetE2ETests: simplify `Contains(...) == true` to a null-coalescing form. - SessionConfigE2ETests/SessionE2ETests: use `catch (Exception)` in `DisposeAsync` cleanup blocks instead of bare `catch`. - E2ETestContext.ResolveSymlinks: replace libc realpath/free P/Invoke with managed `DirectoryInfo.ResolveLinkTarget`-based component walk; preserves macOS `/var -> /private/var` resolution behavior. Also switches the fallback `catch` to typed exceptions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CodeQL flagged the named `_result`/`_captured_permission_request` discards as unused locals after the prior commit. Switch to the single-underscore convention, which CodeQL's py/unused-local-variable rule and ruff F841 both treat as an intentional discard. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CodeQL flags Path.Combine because if a later argument is rooted, it silently discards the earlier ones. Path.Join concatenates segments literally and is the safer choice for these cases. - E2ETestContext.ResolveSymlinks: use Path.Join when walking path components (the loop appends user-controlled directory names that realistically never start with a path separator on these temp dirs, but the safer API removes the risk regardless). - HooksE2ETests: use Path.Join for the protected-file lookup. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Accepted. I updated the PR description to clarify that the dedicated Generated by Copilot. |
Add missing E2E coverage so suspend RPC, event fidelity, built-in tool smoke tests, error resilience, and multi-turn scenarios are represented across all SDKs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cross-SDK Consistency Review ✅This PR adds comprehensive E2E test coverage across all four SDK implementations (TypeScript, Python, Go, .NET) and the changes are well-aligned across languages. Here's a brief breakdown: SDK Source Changes (non-test)
All source changes are language-internal (serialization infrastructure, type system quirks) with no impact on the public API surface. No cross-SDK consistency concerns here. E2E Test CoverageThe PR adds ~36–39 parallel E2E test files per language covering the same scenarios across all four SDKs. Key coverage areas are fully represented in all SDKs:
Minor structural differences (e.g., Go combining commands + elicitation into a single file; .NET splitting into more granular test classes) reflect language idioms rather than coverage gaps — the actual test scenarios are equivalent across SDKs. No cross-SDK consistency issues found. The PR maintains excellent feature parity across all four implementations.
|
Adds comprehensive end-to-end coverage of the SDK + Copilot CLI surface area across C#, Python, TypeScript, and Go, with the goal that every public API and every JSON-RPC method has at least one E2E test. Reorganizes the C# and Python test layouts to clearly separate unit and E2E tests, and adds
E2Esuffixes to test class/file names for clarity.What's added
E2E coverage (parity across all 4 SDKs):
Unit coverage:
Snapshot harness:
test/snapshots/covering the new scenarios; existing snapshots normalized for portability.Test layout:
dotnet/test/E2E/anddotnet/test/Unit/folders; xUnit collection serializes E2E execution to avoid CLI process contention (with explanatory comment at the attribute site).python/e2e/with a shared harness proxy/context.E2Esuffix added to classes/methods._e2e_test.gosuffix and anE2Etest-function suffix for clarity..e2e.test.tssuffix;createSdkTestContextregisters Vitest hooks that auto-stopCopilotClientandCapiProxy.What's fixed
Snapshot drift root cause:
test/harness/replayingCapiProxy.ts:writeCapturesToDiskwas called from bothupdateConfigandstopregardless of CI mode. When a test exercised only a subset of a multi-conversation snapshot, the file was silently rewritten with that subset, breaking later runs. Both writes are now guarded byprocess.env.GITHUB_ACTIONS !== "true".ghCLI environment-dependent help text:normalizeGhAuthMessagestool result normalizer so the two forms of "auth required" help text emitted bygh(GitHub Actions vs. local dev) both map to a stable\placeholder before snapshot match.PerSessionAuth auto-token leak:
dotnet/test/Harness/E2ETestContext.cs: added anautoInjectGitHubTokenparameter soCreateAuthTestClientcan opt out of the CI auto-token injection that was silently authenticating tests verifying the unauthenticated path.Go SDK lifecycle and codegen:
waitForCapabilityhelper ingo/session_test.goto replace race-prone fixed sleeps.TypeScript cleanup hangs:
setTimeoutwaits in lifecycle/session tests with bounded polling helpers (waitFor/getLastSessionId/getSessionMetadatapolls).should stop cleanlytest timeout to 60s to absorb slow-machine variability.Python:
_wait_forpolling helper intest_commands_and_elicitation.pyreplaces flaky fixed sleeps.