diff --git a/CHANGELOG.md b/CHANGELOG.md index 2e025e7..69b1468 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -16,13 +16,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - **RFC-3986 URI codec** — `bridge::resources` module with percent-encoding via `url::Url::from_file_path`; empty-authority injection is rejected to prevent UNC-path attacks on Windows - **Subscription cap** — `ResourceSubscriptions` enforces a `MAX_SUBSCRIPTIONS = 1_000` limit per session to guard against memory exhaustion - **MCP tools** — `get_signature_help` (`textDocument/signatureHelp`), `go_to_implementation` (`textDocument/implementation`), `go_to_type_definition` (`textDocument/typeDefinition`), and `get_inlay_hints` (`textDocument/inlayHint`) tools exposing LSP 3.6/3.15/3.17 capabilities (#116) +- **Non-blocking startup for slow LSP servers** — `serve_with` spawns LSP initialization in a background task and starts the MCP server immediately, so the MCP `initialize` handshake no longer waits for the language server to finish loading. Large solutions that take a long time to load (e.g. OmniSharp on a ~130-project Unity solution, ~86 s) no longer trip the MCP client's initialize timeout. (#172) +- **`ServerInitializing` error** — a request for a configured language whose server is still loading returns a dedicated "still initializing, wait and retry" error instead of the misleading "no LSP server configured for language". (#172) ### Changed - **LSP API** — Breaking change: `InboundMessage` is now non-exhaustive and includes a server-request variant for LSP server-to-client JSON-RPC requests. Downstream exhaustive matches must include a wildcard arm. +- **Error API** — Breaking change: `Error` is now `#[non_exhaustive]` and gains a `ServerInitializing(String)` variant. Downstream exhaustive matches must include a wildcard arm. (#172) ### Fixed +- **Initialize handshake timeout** — the LSP `initialize` request now honors the per-server `timeout_seconds` configuration instead of a hardcoded 30 s, so servers that need longer to load on large projects are no longer killed mid-initialization. (#172) +- **Startup `null` messages** — the LSP receive loop skips bare `null`/non-object JSON-RPC messages (emitted by OmniSharp during startup) and keeps reading, instead of treating them as a fatal protocol error that drops the connection. (#172) +- **Partial-success expected languages** — the "expected languages" set is cleared once background initialization completes, so a language whose server failed to spawn (when others succeeded) falls back to `NoServerForLanguage` instead of reporting `ServerInitializing` forever. (#172) + - **ServerCancelled retry** — `LspClient::request()` now retries up to 3 times with exponential backoff (500 ms → 1 s → 2 s) when an LSP server returns error code -32802 with `data.retriggerRequest: true`, instead of propagating the error immediately to the MCP caller (#128) - **Integration test readiness gate** — Replaced `publishDiagnostics`-based readiness signal with hover-probe polling (3 consecutive successful hover responses required), matching the ra_e2e approach; fixes 3 of 5 integration tests that failed consistently in isolation after PR #123 (#127) - **LSP server requests** — Handle server-to-client requests such as `client/registerCapability`, fixing tsgo timeouts. diff --git a/crates/mcpls-core/src/bridge/translator.rs b/crates/mcpls-core/src/bridge/translator.rs index e0a6de7..30164a1 100644 --- a/crates/mcpls-core/src/bridge/translator.rs +++ b/crates/mcpls-core/src/bridge/translator.rs @@ -1,6 +1,6 @@ //! MCP to LSP translation layer. -use std::collections::HashMap; +use std::collections::{HashMap, HashSet}; use std::path::{Path, PathBuf}; use lsp_types::{ @@ -39,6 +39,10 @@ pub struct Translator { workspace_roots: Vec, /// Custom file extension to language ID mappings. extension_map: HashMap, + /// Languages that are configured + applicable but whose LSP server may not + /// have finished initializing yet (background init). Used to return a clear + /// "still initializing" error instead of "no server configured". + expected_languages: HashSet, } impl Translator { @@ -52,6 +56,7 @@ impl Translator { notification_cache: NotificationCache::new(), workspace_roots: vec![], extension_map: HashMap::new(), + expected_languages: HashSet::new(), } } @@ -60,6 +65,17 @@ impl Translator { self.workspace_roots = roots; } + /// Mark the set of languages whose LSP servers are expected (configured + + /// applicable) but may still be initializing in the background. + pub fn set_expected_languages(&mut self, languages: HashSet) { + self.expected_languages = languages; + } + + /// Clear the expected-languages set (e.g. after background init failed). + pub fn clear_expected_languages(&mut self) { + self.expected_languages.clear(); + } + /// Configure custom file extension mappings. /// /// This method sets the extension map and updates the document tracker @@ -555,10 +571,17 @@ impl Translator { /// Get a cloned LSP client for a file path based on language detection. fn get_client_for_file(&self, path: &Path) -> Result { let language_id = detect_language(path, &self.extension_map); - self.lsp_clients - .get(&language_id) - .cloned() - .ok_or(Error::NoServerForLanguage(language_id)) + self.lsp_clients.get(&language_id).cloned().ok_or_else(|| { + // A configured+applicable language whose server has not registered + // yet is still initializing (e.g. a large Unity solution loading via + // OmniSharp); tell the caller to wait and retry rather than implying + // no server is configured at all. + if self.expected_languages.contains(&language_id) { + Error::ServerInitializing(language_id) + } else { + Error::NoServerForLanguage(language_id) + } + }) } /// Parse and validate a file URI, returning the validated path. @@ -1135,13 +1158,17 @@ impl Translator { ))); } - // Workspace search requires at least one LSP client - let client = self - .lsp_clients - .values() - .next() - .cloned() - .ok_or(Error::NoServerConfigured)?; + // Workspace search requires at least one LSP client. If none are + // registered yet but a configured server is still initializing, tell the + // caller to wait and retry rather than implying nothing is configured. + let client = self.lsp_clients.values().next().cloned().ok_or_else(|| { + self.expected_languages + .iter() + .next() + .map_or(Error::NoServerConfigured, |lang| { + Error::ServerInitializing(lang.clone()) + }) + })?; let params = LspWorkspaceSymbolParams { query, @@ -2092,6 +2119,53 @@ mod tests { // This test verifies the data structure is properly initialized. } + #[test] + fn test_get_client_for_file_server_initializing_when_expected() { + // A configured/applicable language whose LSP client has not registered + // yet (large solution still loading via OmniSharp) must surface + // ServerInitializing — "wait and retry" — not NoServerForLanguage. + let mut translator = Translator::new(); + let path = PathBuf::from("/ws/Assets/Scripts/Player.cs"); + let lang = detect_language(&path, &translator.extension_map); + + let mut expected = HashSet::new(); + expected.insert(lang.clone()); + translator.set_expected_languages(expected); + + let err = translator.get_client_for_file(&path).unwrap_err(); + assert!(matches!(err, Error::ServerInitializing(ref l) if *l == lang)); + } + + #[test] + fn test_get_client_for_file_no_server_when_not_expected() { + // When the language is not in the expected set (no server configured for + // it at all), the error stays NoServerForLanguage. + let translator = Translator::new(); + let path = PathBuf::from("/ws/Assets/Scripts/Player.cs"); + let lang = detect_language(&path, &translator.extension_map); + + let err = translator.get_client_for_file(&path).unwrap_err(); + assert!(matches!(err, Error::NoServerForLanguage(ref l) if *l == lang)); + } + + #[test] + fn test_clear_expected_languages_reverts_to_no_server() { + // After initialization fails the expected set is cleared; subsequent + // lookups must fall back to NoServerForLanguage rather than keep + // implying the server is still on its way. + let mut translator = Translator::new(); + let path = PathBuf::from("/ws/Assets/Scripts/Player.cs"); + let lang = detect_language(&path, &translator.extension_map); + + let mut expected = HashSet::new(); + expected.insert(lang); + translator.set_expected_languages(expected); + translator.clear_expected_languages(); + + let err = translator.get_client_for_file(&path).unwrap_err(); + assert!(matches!(err, Error::NoServerForLanguage(_))); + } + #[test] fn test_diagnostic_request_params_omit_optional_null_fields() { let uri = "file:///test.ts".parse().unwrap(); diff --git a/crates/mcpls-core/src/error.rs b/crates/mcpls-core/src/error.rs index 113e6c3..11f3162 100644 --- a/crates/mcpls-core/src/error.rs +++ b/crates/mcpls-core/src/error.rs @@ -27,7 +27,12 @@ impl std::fmt::Display for ServerSpawnFailure { } /// The main error type for mcpls-core operations. +/// +/// This enum is `#[non_exhaustive]`: downstream crates that match on it must +/// include a wildcard arm. New variants (such as [`Error::ServerInitializing`]) +/// can then be added without further breaking changes. #[derive(Debug, thiserror::Error)] +#[non_exhaustive] pub enum Error { /// LSP server failed to initialize. #[error("LSP server initialization failed: {message}")] @@ -59,6 +64,12 @@ pub enum Error { #[error("no LSP server configured for language: {0}")] NoServerForLanguage(String), + /// LSP server for the language is configured but still initializing. + #[error( + "LSP server for language '{0}' is still initializing (large project load in progress); wait and retry the request (this may take a few minutes on large projects)" + )] + ServerInitializing(String), + /// No LSP server is currently configured. #[error("no LSP server configured")] NoServerConfigured, diff --git a/crates/mcpls-core/src/lib.rs b/crates/mcpls-core/src/lib.rs index b225dbf..553fab2 100644 --- a/crates/mcpls-core/src/lib.rs +++ b/crates/mcpls-core/src/lib.rs @@ -311,40 +311,21 @@ pub async fn serve_with(config: ServerConfig, transport: Transport) -> Result<() applicable_configs.len() ); - // notification_receivers collects per-language mpsc receivers used by the pump tasks. - let mut notification_receivers = std::collections::HashMap::new(); - - if applicable_configs.is_empty() { - warn!("No applicable LSP servers configured — starting in protocol-only mode"); - } else { - // Spawn all servers with graceful degradation. - let result = LspServer::spawn_batch(&applicable_configs).await; - - // Handle the three possible outcomes. - if result.all_failed() { - return Err(Error::AllServersFailedToInit { - count: result.failure_count(), - failures: result.failures, - }); - } - - if result.partial_success() { - warn!( - "Partial server initialization: {} succeeded, {} failed", - result.server_count(), - result.failure_count() - ); - for failure in &result.failures { - error!("Server initialization failed: {}", failure); - } - } - - // Register servers and extract their notification receivers. - let server_count = result.server_count(); - notification_receivers = register_servers(result, &mut translator); - info!("Proceeding with {} LSP server(s)", server_count); - } - + // Mark applicable languages as "expected" so a tool call that arrives while + // its server is still initializing gets a clear "still initializing" error + // (instead of "no server configured"), telling the caller to wait and retry. + let expected_languages: std::collections::HashSet = applicable_configs + .iter() + .map(|c| c.server_config.language_id.clone()) + .collect(); + translator.set_expected_languages(expected_languages); + + // Shared state, built BEFORE LSP initialization so the MCP server can answer + // `initialize` immediately. LSP servers (which can take minutes to initialize + // on a large solution, e.g. a 130-project Unity .sln via OmniSharp) are spawned + // in a background task and registered into this shared translator once ready. + // Blocking the MCP handshake on LSP init makes slow servers exceed the client's + // initialize-request timeout (Claude Code: ~60s) -> "Request timed out". let translator = Arc::new(Mutex::new(translator)); let subscriptions = Arc::new(ResourceSubscriptions::new()); // Peer cell is populated after the MCP transport is established (Phase B). @@ -352,19 +333,21 @@ pub async fn serve_with(config: ServerConfig, transport: Transport) -> Result<() // Cancellation for pump tasks: send `true` to request shutdown. let (cancel_tx, cancel_rx) = tokio::sync::watch::channel(false); - let mut pumps: JoinSet<()> = JoinSet::new(); - - // Phase A: start pump tasks before MCP serve so no notifications are dropped - // while the transport is being established. - for (lang, rx) in notification_receivers { - pumps.spawn(diagnostics_pump( - lang, - rx, + + if applicable_configs.is_empty() { + warn!("No applicable LSP servers configured — starting in protocol-only mode"); + } else { + info!( + "Spawning {} LSP server(s) in the background...", + applicable_configs.len() + ); + spawn_lsp_servers_background( + applicable_configs, Arc::clone(&translator), Arc::clone(&subscriptions), Arc::clone(&peer_cell), cancel_rx.clone(), - )); + ); } info!("Starting MCP server with rmcp..."); @@ -380,14 +363,86 @@ pub async fn serve_with(config: ServerConfig, transport: Transport) -> Result<() Transport::Http(cfg) => run_http(mcp_server, cfg).await, }; - // Signal pump tasks to exit and wait for them. + // Signal background pump tasks to exit. let _ = cancel_tx.send(true); - while pumps.join_next().await.is_some() {} info!("MCPLS server shutting down"); result } +/// Spawn the applicable LSP servers in a background task and register them into +/// the shared `translator` once ready. +/// +/// This intentionally does NOT block the caller: `serve_with` starts the MCP +/// server immediately so its `initialize` handshake returns before slow language +/// servers (e.g. `OmniSharp` on a large Unity solution, which can take minutes to +/// load) finish initializing. Tool calls that arrive before a server has +/// registered return a `ServerInitializing` error telling the caller to wait and +/// retry. If every server fails, the "expected languages" set is cleared so those +/// calls fall back to a plain "no server configured" error instead. +fn spawn_lsp_servers_background( + applicable_configs: Vec, + translator: Arc>, + subscriptions: Arc, + peer_cell: Arc>>, + cancel_rx: tokio::sync::watch::Receiver, +) { + tokio::spawn(async move { + let result = LspServer::spawn_batch(&applicable_configs).await; + + if result.all_failed() { + error!( + "All {} configured LSP server(s) failed to initialize", + result.failure_count() + ); + for failure in &result.failures { + error!("Server initialization failed: {}", failure); + } + // No server will register; stop reporting "still initializing". + translator.lock().await.clear_expected_languages(); + return; + } + + if result.partial_success() { + warn!( + "Partial server initialization: {} succeeded, {} failed", + result.server_count(), + result.failure_count() + ); + for failure in &result.failures { + error!("Server initialization failed: {}", failure); + } + } + + let server_count = result.server_count(); + let notification_receivers = { + let mut t = translator.lock().await; + let receivers = register_servers(result, &mut t); + // Background initialization has completed; stop reporting "still + // initializing" (especially for languages whose server failed to + // spawn on partial success, which would otherwise return + // ServerInitializing forever instead of NoServerForLanguage). + t.clear_expected_languages(); + receivers + }; + info!("Proceeding with {} LSP server(s)", server_count); + + // Start diagnostics pump tasks now that servers are registered. + let mut pumps: JoinSet<()> = JoinSet::new(); + for (lang, rx) in notification_receivers { + pumps.spawn(diagnostics_pump( + lang, + rx, + Arc::clone(&translator), + Arc::clone(&subscriptions), + Arc::clone(&peer_cell), + cancel_rx.clone(), + )); + } + while pumps.join_next().await.is_some() {} + }); +} + #[cfg(test)] #[allow(clippy::unwrap_used)] mod tests { @@ -644,10 +699,18 @@ mod tests { } #[tokio::test] - async fn test_serve_fails_with_no_servers_available() { + async fn test_serve_degrades_when_all_servers_fail_to_spawn() { use crate::config::{LspServerConfig, WorkspaceConfig}; - // Create a config with an invalid server (guaranteed to fail) + // A configured server whose command cannot spawn used to make serve() + // fail synchronously with NoServersAvailable / AllServersFailedToInit. + // LSP initialization now runs in a background task so the MCP + // `initialize` handshake is never blocked, which means the spawn + // failure is handled in the background instead: serve() starts the MCP + // server in degraded mode (mirroring `test_serve_starts_with_empty_config`) + // rather than failing fast. Any error it surfaces must therefore be a + // transport/MCP error from the closed test connection, NOT a fail-fast + // server-availability error. let config = ServerConfig { workspace: WorkspaceConfig { roots: vec![PathBuf::from("/tmp/test-workspace")], @@ -667,18 +730,25 @@ mod tests { }], }; - let result = serve(config).await; - - assert!(result.is_err()); - let err = result.unwrap_err(); - - // The serve function should now return NoServersAvailable error - // because all servers failed, but has_servers() returned false - assert!( - matches!(err, Error::NoServersAvailable(_)) - || matches!(err, Error::AllServersFailedToInit { .. }), - "Expected NoServersAvailable or AllServersFailedToInit error, got: {err:?}" - ); + // serve() proceeds to run the MCP server and blocks on the stdio + // transport until EOF; bound it so the test can't hang if stdin stays + // open (e.g. under multi-threaded `cargo test`, where several serve() + // tests share the process stdin). + let outcome = + tokio::time::timeout(std::time::Duration::from_secs(2), serve(config)).await; + + match outcome { + // Still serving after the deadline => it did not fail fast. Good. + Err(_elapsed) => {} + // Transport closed cleanly. Also fine. + Ok(Ok(())) => {} + // It returned an error: it must not be a fail-fast availability error. + Ok(Err(err)) => assert!( + !matches!(err, Error::NoServersAvailable(_)) + && !matches!(err, Error::AllServersFailedToInit { .. }), + "serve() must not fail fast now that LSP init is backgrounded; got: {err:?}" + ), + } } #[tokio::test] diff --git a/crates/mcpls-core/src/lsp/lifecycle.rs b/crates/mcpls-core/src/lsp/lifecycle.rs index 862fc02..1bb4626 100644 --- a/crates/mcpls-core/src/lsp/lifecycle.rs +++ b/crates/mcpls-core/src/lsp/lifecycle.rs @@ -382,8 +382,15 @@ impl LspServer { ..Default::default() }; + // Use the server's configured timeout for the initialize handshake too, + // not a hardcoded 30s: large solutions (e.g. a 130-project Unity .sln via + // OmniSharp) take minutes to respond to `initialize`. let result: InitializeResult = client - .request("initialize", params, Duration::from_secs(30)) + .request( + "initialize", + params, + Duration::from_secs(config.server_config.timeout_seconds), + ) .await .map_err(|e| Error::LspInitFailed { message: format!("Initialize request failed: {e}"), diff --git a/crates/mcpls-core/src/lsp/transport.rs b/crates/mcpls-core/src/lsp/transport.rs index bbb3afa..a9273aa 100644 --- a/crates/mcpls-core/src/lsp/transport.rs +++ b/crates/mcpls-core/src/lsp/transport.rs @@ -13,7 +13,7 @@ use std::collections::HashMap; use serde_json::Value; use tokio::io::{AsyncBufReadExt, AsyncReadExt, AsyncWriteExt, BufReader}; use tokio::process::{ChildStdin, ChildStdout}; -use tracing::{trace, warn}; +use tracing::{debug, trace, warn}; use crate::error::{Error, Result}; use crate::lsp::types::{InboundMessage, JsonRpcNotification, JsonRpcRequest, JsonRpcResponse}; @@ -84,27 +84,42 @@ impl LspTransport { /// - JSON parsing fails /// - Message format is invalid pub async fn receive(&mut self) -> Result { - let headers = self.read_headers().await?; - - let content_length = headers - .get("content-length") - .ok_or_else(|| Error::LspProtocolError("Missing Content-Length header".to_string()))? - .parse::() - .map_err(|e| Error::LspProtocolError(format!("Invalid Content-Length: {e}")))?; - - if content_length > MAX_CONTENT_LENGTH { - return Err(Error::LspProtocolError(format!( - "Content-Length {content_length} exceeds maximum allowed size of {MAX_CONTENT_LENGTH} bytes" - ))); - } + loop { + let headers = self.read_headers().await?; + + let content_length = headers + .get("content-length") + .ok_or_else(|| { + Error::LspProtocolError("Missing Content-Length header".to_string()) + })? + .parse::() + .map_err(|e| Error::LspProtocolError(format!("Invalid Content-Length: {e}")))?; + + if content_length > MAX_CONTENT_LENGTH { + return Err(Error::LspProtocolError(format!( + "Content-Length {content_length} exceeds maximum allowed size of {MAX_CONTENT_LENGTH} bytes" + ))); + } - let content = self.read_content(content_length).await?; + let content = self.read_content(content_length).await?; - trace!("Received LSP message: {}", content); + trace!("Received LSP message: {}", content); - let value: Value = serde_json::from_str(&content)?; + let value: Value = serde_json::from_str(&content)?; - parse_inbound_message(value) + // Some servers (notably OmniSharp) occasionally emit a bare `null` + // (or other non-object) JSON-RPC message. Skip it and read the next + // framed message instead of killing the whole message loop. + if !value.is_object() { + // Some servers (notably OmniSharp) emit a burst of these during + // startup; log at debug to avoid flooding the logs for what is a + // recoverable, expected condition. + debug!("Skipping non-object LSP message: {}", value); + continue; + } + + return parse_inbound_message(value); + } } /// Read headers until blank line. diff --git a/docs/user-guide/configuration.md b/docs/user-guide/configuration.md index 26e5fe9..ce29d45 100644 --- a/docs/user-guide/configuration.md +++ b/docs/user-guide/configuration.md @@ -234,7 +234,10 @@ Glob pattern syntax: **Type**: Integer **Default**: `30` -Timeout in seconds for LSP server operations. +Timeout in seconds for LSP server operations, including the initial `initialize` +handshake. Servers that load a large project before answering `initialize` +(e.g. OmniSharp on a big Unity/C# solution) need this raised - the default 30 s +can otherwise cut the server off mid-initialization. ```toml [[lsp_servers]] diff --git a/docs/user-guide/troubleshooting.md b/docs/user-guide/troubleshooting.md index eabe6a0..2f6efa5 100644 --- a/docs/user-guide/troubleshooting.md +++ b/docs/user-guide/troubleshooting.md @@ -233,6 +233,8 @@ mcpls --log-level debug roots = ["/Users/username/current-project"] ``` +**Note**: `timeout_seconds` also bounds the initial `initialize` handshake, so raising it (Solution 1) is what helps a server that needs minutes to load a large solution. While a configured server is still initializing, tool calls for that language return a "server is still initializing - wait and retry" message (loading a large solution may take a few minutes) rather than a hard "no server configured" error. + ### "rust-analyzer indexing takes forever" **Problem**: Large codebase with many dependencies