feat(acp): Setup acp with gemini agent to replace API model#56
Closed
CSRessel wants to merge 40 commits intofork/upstream-mainfrom
Closed
feat(acp): Setup acp with gemini agent to replace API model#56CSRessel wants to merge 40 commits intofork/upstream-mainfrom
CSRessel wants to merge 40 commits intofork/upstream-mainfrom
Conversation
Add two blackbox snapshot tests to codex-tui that verify UI rendering: - blackbox_typing_snapshot: Tests that text appears in composer when set - blackbox_model_picker_snapshot: Tests that /model popup renders correctly These tests treat the ChatWidget as a black box, testing the rendering pipeline by setting state through public interfaces and verifying the visual output via insta snapshots. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
acp fmt
Add AcpModelClient that implements end-to-end streaming from ACP agents: - Spawn agent subprocess and initialize with capabilities - Create session and send prompts - Stream AgentMessageChunk notifications as TextDelta events - Map prompt response to Completed event This thin slice proves the architecture works and provides foundation for subsequent phases (bidirectional transport, client handlers, session management, complete event mapping, error handling). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Adds the gemini-acp provider to the built-in model providers list to fix the "Model provider `gemini-acp` not found" error on TUI startup. This provider entry is needed for the ACP thin slice implementation to work with the Gemini CLI agent.
Expose Gemini ACP model in the TUI model selection list and implement logic to switch to the correct provider when a Gemini model is selected.
Add comprehensive TUI integration testing infrastructure: - New tui-integration-tests crate with PTY session management - Test coverage for startup, prompt flow, input handling, and cancellation - Enhanced mock-acp-agent with configurable responses and delays - Support for simulating realistic streaming behavior via env vars The testing framework uses portable-pty and vt100 parser to drive the TUI programmatically and verify screen contents, enabling reliable end-to-end testing of terminal interactions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Crossterm queries cursor position during initialization via ESC[6n control sequence, but the PTY emulator doesn't automatically respond. This caused tests to timeout waiting for the response. Added intercept_control_sequences() to detect cursor position queries in the PTY output stream and write back ESC[1;1R responses, enabling the TUI to initialize properly in the test environment. Updated test expectations to match the initial welcome screen that now appears, and added snapshot for regression detection. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…imeout hangs Previously, tests using wait_for_text() would hang indefinitely when the expected text never appeared, because the PTY reader was in blocking mode. This caused read() to block forever waiting for data, preventing the timeout mechanism from executing. Changes: - Set PTY master to non-blocking mode using fcntl(O_NONBLOCK) on Unix - Add helper function set_nonblocking() using libc for low-level fcntl ops - read() now returns WouldBlock immediately when no data is available - Timeout checks in wait_for() can now execute, failing tests cleanly after 5s - Add test_poll_does_not_block_when_no_data to verify non-blocking behavior - Convert debug logging to conditional (DEBUG_TUI_PTY env var) - Uncomment and fix previously disabled tests in startup.rs - Update documentation with PTY non-blocking details and debugging info Dependencies added (Unix only): - nix = "0.27" (features: fs) - libc = "0.2" The fix ensures test timeouts work correctly, with tests failing cleanly rather than hanging indefinitely when conditions are not met. 🤖 Generated with [Nori](https://nori.ai) Co-Authored-By: Nori <noreply@tilework.tech>
…ession testing Adds snapshot assertions to 4 startup tests to capture final screen states: - startup_shows_welcome: Welcome screen with logo and /tmp/ path - startup_welcome_dimensions_40x120: Welcome screen at 40x120 dimensions - runs_in_temp_directory: Verification of /tmp/ directory usage - trust_screen_skipped: Main prompt without trust screen Excludes test_poll_does_not_block_when_no_data as it tests timing behavior, not UI. 🤖 Generated with [Nori](https://nori.ai) Co-Authored-By: Nori <noreply@tilework.tech>
…overage Move normalize_for_snapshot() and TIMEOUT constant from test modules to library exports for consistent reuse across all tests. Consolidate streaming/cancellation tests into a unified streaming.rs module. Add Drop trait implementation to TuiSession that prints screen state on panic, improving debugging of PTY timing issues. Expand snapshot testing coverage to input handling scenarios (ctrl-c, typing, arrow navigation). Add 100ms sleep delays after PTY input operations in tests to prevent race conditions between sending input and TUI processing. Update documentation to reflect new exported utilities, debugging aids, and PTY timing patterns.
…communication Adds full support for Agent Client Protocol (ACP) as a wire API option alongside existing Chat and Responses APIs. ACP enables subprocess-based agent communication over JSON-RPC 2.0 via stdin/stdout instead of HTTP requests. Core changes: - Add ACP agent registry (acp/src/registry.rs) mapping provider names to binary commands - Implement ModelClient::stream() for WireApi::Acp with event mapping to ResponseEvent - Add "mock-acp" and "gemini-acp" model families for proper model resolution - Configure mock-acp and gemini-acp providers with WireApi::Acp - Add comprehensive unit tests (8/8 passing) covering providers, registry, and model families Implementation follows existing Chat API patterns with subprocess spawning instead of HTTP. Event mapping: AcpEvent::TextDelta → ResponseEvent::OutputTextDelta, etc. Test coverage: - Unit tests: Provider configuration, model families, registry integration - Integration test: ModelClient streaming with mock-acp-agent binary - TUI test: Updated to use "mock-acp" model name (model resolution pending) Note: TUI integration test still needs model-name-to-provider-ID resolution work, tracked separately as it's a config loading concern rather than core ACP implementation. 🤖 Generated with [Nori](https://nori.ai) Co-Authored-By: Nori <noreply@tilework.tech>
Add init_file_tracing() function to enable structured file logging for the ACP package. - Accepts custom log file path parameter - Filters at DEBUG level and above (TRACE excluded) - Uses non-blocking file appender for async-safe writes - Returns error on re-initialization (global subscriber constraint) - Includes comprehensive integration test - Updates package documentation 🤖 Generated with [Nori](https://nori.ai) Co-Authored-By: Nori <noreply@tilework.tech>
Add automatic initialization of ACP file tracing when the CLI starts. Logs are written to .codex-acp.log in the current working directory. - Add codex-acp dependency to cli/Cargo.toml - Call init_file_tracing() in cli_main() after CLI argument parsing - Log warning (non-fatal) if tracing initialization fails - Update acp/docs.md to document automatic CLI initialization This ensures ACP tracing is always available when using the codex CLI, without requiring manual initialization by library users. 🤖 Generated with [Nori](https://nori.ai) Co-Authored-By: Nori <noreply@tilework.tech>
…rary Replaces custom protocol.rs and transport.rs implementations with the official agent-client-protocol v0.7 library, eliminating ~250 lines of duplicative JSON-RPC code. Key changes: - Implement Client trait in new client_handler.rs for permission/session callbacks - Refactor AgentProcess to use ClientSideConnection instead of StdioTransport - Add tokio-util compat layer for AsyncRead/Write trait compatibility - Use LocalSet pattern throughout for handling !Send futures from the library - Replace custom JsonRpcRequest/Response with typed library requests - Update all tests to wrap in LocalSet contexts - Fix clippy warnings (unused variables, unwrap usage, error formatting) Benefits: - Type-safe protocol communication with compile-time validation - Spec-compliant implementation maintained by library authors - Reduced maintenance burden and code complexity - Better error handling with library-defined error types Breaking changes: - AgentProcess API updated with typed new_session() and prompt() methods - ClientEvent enum introduced for permission requests and session updates - Public API now exports library types instead of custom types 🤖 Generated with [Nori](https://nori.ai) Co-Authored-By: Nori <noreply@tilework.tech>
Add comprehensive JSON request/response logging for ACP protocol debugging: - Log initialize and new_session requests/responses with pretty-printed JSON - Change protocol version from V1 to V0 for Gemini compatibility - Fix client capabilities to use camelCase format with meta fields - Add retry configuration (2 retries) for ACP model providers - Increase log tail from 50 to 150 lines for better error diagnostics Switch integration tests to use mock-acp instead of gemini-acp for faster iteration during development. Note: Includes temporary debugging code (hardcoded mock-acp provider in client.rs) that should be reverted before merging.
## Summary - Adds PTY-based integration test framework for full binary TUI testing - Implements visual regression testing using insta snapshots - Provides test coverage for startup, input handling, prompt flow, and cancellation flows - Includes comprehensive documentation and Claude skills for PTY testing patterns ## Key Features - **PTY Test Harness**: Custom test framework that spawns the actual binary in a pseudo-terminal for realistic testing - **Visual Regression Tests**: Insta snapshot tests capture terminal output for detecting UI regressions - **Test Coverage**: Startup flows, input handling, prompt interactions, and cancellation scenarios - **Developer Tools**: PTY testing skill documentation to guide future test development ## Technical Details - Intercepts cursor position queries to prevent test hangs - Non-blocking PTY reader implementation for reliable test execution - Normalizes screen snapshots to handle variable content - Supports multiple terminal dimensions for responsive UI testing ## Test Plan - [x] All existing tests pass - [x] New PTY integration tests pass with snapshots - [x] Tests run reliably without timeouts or hangs - [x] Documentation accurately describes the test framework
🤖 Generated with [Nori](https://nori.ai)\n\nCo-Authored-By: Nori <noreply@tilework.tech> Change the ACP registry to use model names for lookups instead of provider names. This separates the model identifier (what the user specifies) from the provider identifier (which agent subprocess to use). The provider field in AcpAgentConfig now uniquely identifies the agent subprocess, enabling future optimization to determine when an existing subprocess can be reused vs when a new one must be spawned when switching models. Changes: - Add provider field to AcpAgentConfig struct - Change get_agent_config() to accept model_name instead of provider_name - Update registry to map exact model names (e.g., "mock-model", "gemini-flash-2.0") to agent configurations - Fix client.rs WireApi::Acp to pass self.config.model to registry - Update all tests to use model names and verify provider field - Update documentation to reflect model-centric architecture The registry uses exact matching with case-insensitive normalization. Each model must be explicitly registered.
Collaborator
Author
Collaborator
Author
|
Branched to new default branch, |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

0th in stack