feat: add model health monitoring and trace context logging by vlordier · Pull Request #58 · Liquid4All/cookbook

vlordier · 2026-03-06T20:01:01Z

Add ModelStatus struct, get_status method, and trace_id logging

1. Model Health Check + Auto-Recovery: - Add ModelStatus struct with key, display_name, base_url, healthy, model_name, error - Add InferenceClient::get_status() method for detailed health status - Health check hits /models endpoint to verify connectivity 2. Parallel MCP Server Startup: - Already implemented in lifecycle.rs using tokio::spawn - All servers start concurrently 3. Structured Logging with Trace Context: - Add trace_id (8-char UUID) to each send_message request - Log trace_id at start of message processing - Enables correlation across logs for debugging Testing: 365 tests pass, clippy clean

Copilot

Pull request overview

Adds health/status reporting for the currently selected inference model and introduces per-request trace ID logging to improve observability in the LocalCowork Tauri app.

Changes:

Adds a ModelStatus struct for serializable model health/status reporting.
Introduces InferenceClient::get_status() to return endpoint + health details.
Logs a generated trace_id at the start of send_message() for log correlation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
examples/localcowork/src-tauri/src/inference/types.rs	Adds `ModelStatus` type (serde-serializable) for health monitoring consumers.
examples/localcowork/src-tauri/src/inference/client.rs	Adds `get_status()` API and extends tests with an LM Studio model config case.
examples/localcowork/src-tauri/src/commands/chat.rs	Generates and logs a `trace_id` at the beginning of message processing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-06T20:05:38Z

+    pub async fn get_status(&self) -> super::types::ModelStatus {
+        let url = format!("{}/models", self.current_model.base_url);
+        match self.http.get(&url).timeout(CONNECT_TIMEOUT).send().await {
+            Ok(resp) if resp.status().is_success() => super::types::ModelStatus {


This uses CONNECT_TIMEOUT (documented as a TCP connect timeout) as a per-request total timeout via .timeout(CONNECT_TIMEOUT). That’s confusing and easy to misuse because reqwest::RequestBuilder::timeout applies to the entire request/response. Consider introducing a dedicated short HEALTH_CHECK_TIMEOUT (or similar) for health/status calls, or updating the constant name/doc to match its actual use here.

Copilot · 2026-03-06T20:05:39Z

+    let trace_id = uuid::Uuid::new_v4().to_string()[..8].to_string();
+
+    tracing::info!(trace_id = %trace_id, session_id = %session_id, content_len = content.len(), "starting message processing");
+


trace_id is generated but only attached to a single log record. If the goal is request-level correlation across subsequent logs, consider creating a tracing span that includes trace_id (and entering it for the rest of the function) so downstream tracing::info!/debug! calls inherit the same context. Also, truncating to 8 hex chars (32 bits) increases the chance of collisions in logs; using the full UUID or a longer prefix would reduce that risk.

Suggested change

let trace_id = uuid::Uuid::new_v4().to_string()[..8].to_string();

tracing::info!(trace_id = %trace_id, session_id = %session_id, content_len = content.len(), "starting message processing");

let trace_id = Uuid::new_v4().to_string();

// Create a span so all subsequent logs inherit the same trace_id/session_id context.

let span = tracing::info_span!("send_message", trace_id = %trace_id, session_id = %session_id);

let _span_guard = span.enter();

tracing::info!(content_len = content.len(), "starting message processing");

Copilot · 2026-03-06T20:05:39Z

+            Ok(resp) => super::types::ModelStatus {
+                key: self.current_model_key.clone(),
+                display_name: self.current_model.display_name.clone(),
+                base_url: self.current_model.base_url.clone(),
+                healthy: false,
+                model_name: None,
+                error: Some(format!("HTTP {}", resp.status())),


get_status sets model_name: None for non-2xx and transport errors, even though the client already knows the configured model name/key. This makes the status payload less useful for UIs/health dashboards because you lose which model is unhealthy. Consider always populating model_name (e.g., from current_model.model_name falling back to current_model_key) and reserve error/healthy for the health signal.

- Add 3 attempts with exponential backoff to health_check() - Improves reliability of PR Liquid4All#58 health monitoring by handling transient failures This helps avoid false negatives when model server is temporarily slow.

Paulescu · 2026-05-19T11:50:53Z

Closing this one. The three changes (ModelStatus, get_status, trace_id) each land as a primitive with no consumer, so nothing in the app actually exercises them.

get_status is dead code. No Tauri command, chat handler, or orchestrator path calls it. ModelStatus is referenced only as the return type of the method that nobody invokes. For a "health monitoring" feature, the integration is the point — a Tauri command like get_model_status that the frontend calls, ideally with a polling story or an event when health changes.

trace_id is logged once and never propagated. send_message generates a trace_id and emits a single tracing::info! with it, then the rest of the function logs without the field. There's nothing to correlate. The idiomatic shape is either #[tracing::instrument(fields(trace_id))] on send_message or let _span = tracing::info_span!("send_message", trace_id = %trace_id).entered(); at the top of the body — both attach the field to every event in the span automatically.

Hardcodes /models instead of using runtime.health_check. _models/config.yaml declares per-runtime health endpoints (Ollama: /api/tags, llama_cpp: /health, lmstudio: /v1/models). get_status unconditionally hits {base_url}/models, which 404s on Ollama. Read the configured health_check URL for the runtime, fall back to /v1/models only if absent.

Smuggled content from #53. The lmstudio-model test fixture and test_lmstudio_model_config test in inference/client.rs are the same lines added by #53. Not blocking on their own, but the partial overlap will conflict on rebase. Once #53 is resolved, drop those hunks from this branch.

Smaller nits:

uuid::Uuid::new_v4().to_string()[..8].to_string() is fragile and gives 32 bits of entropy. If short IDs are wanted, use a counter or nanoid. The [..8] is ASCII-safe only because of UUID's hex format.
The multi-line assert! reformatting in types.rs is unrelated rustfmt churn.
PR description is one sentence with no motivation or integration plan.

If you want to resubmit, the rule of thumb is: ship the primitive and its first consumer in the same PR. Add get_status, expose it as get_model_status Tauri command, and wire the frontend (or at least an integration point) that calls it. Add trace_id propagation via a span on the request handler. With those, this becomes a reviewable feature.

Copilot AI review requested due to automatic review settings March 6, 2026 20:01

Copilot started reviewing on behalf of vlordier March 6, 2026 20:01 View session

Copilot AI reviewed Mar 6, 2026

View reviewed changes

vlordier mentioned this pull request Mar 6, 2026

feat: add retry logic to health check #69

Closed

Paulescu closed this May 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add model health monitoring and trace context logging#58

feat: add model health monitoring and trace context logging#58
vlordier wants to merge 1 commit into
Liquid4All:mainfrom
vlordier:feat/health-monitoring-trace-context

vlordier commented Mar 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 6, 2026

Uh oh!

Copilot AI Mar 6, 2026

Uh oh!

Copilot AI Mar 6, 2026

Uh oh!

Paulescu commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		let trace_id = uuid::Uuid::new_v4().to_string()[..8].to_string();

		tracing::info!(trace_id = %trace_id, session_id = %session_id, content_len = content.len(), "starting message processing");

-    let trace_id = uuid::Uuid::new_v4().to_string()[..8].to_string();
-    tracing::info!(trace_id = %trace_id, session_id = %session_id, content_len = content.len(), "starting message processing");
+    let trace_id = Uuid::new_v4().to_string();
+    // Create a span so all subsequent logs inherit the same trace_id/session_id context.
+    let span = tracing::info_span!("send_message", trace_id = %trace_id, session_id = %session_id);
+    let _span_guard = span.enter();
+    tracing::info!(content_len = content.len(), "starting message processing");

Conversation

vlordier commented Mar 6, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Paulescu commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants