
Commit 76f9180

feat(trace): unified per-conversation forensic recorder for chat + search (#139)
* feat(trace): unified per-conversation forensic recorder for chat + search
* feat(trace): hot-swap recorder when [debug] trace_enabled toggles
* fix(trace): suppress empty conversation_end when no chat fired
* feat(trace): mirror /search turn into chat-domain trace file
* fix(trace): repoint stale rustdoc links to crate::trace
* fix(trace): qualify private intra-doc link in FileRecorder::record
* fix(trace): defer is_first_turn flip until backend confirms turn
* fix(trace): sanitize ConversationId to block path traversal
* refactor(trace): extract per-chunk and start-gate helpers from coverage-off commands
* fix(trace): evict search-domain registry entries on TurnEnd
* fix(trace): add TurnAccepted handshake so cancel-mid-first-turn cannot duplicate ConversationStart
* test(trace): cover search cancel-mid-first-turn TurnAccepted parity
* docs(trace): drop legacy search_trace_enabled alias and trim trace docs
* refactor(settings): move trace toggle to AI tab and tidy timeout labels

Signed-off-by: Logan Nguyen <lg.131.dev@gmail.com>
1 parent 5adc86d commit 76f9180

40 files changed

Lines changed: 3973 additions & 1148 deletions
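One of the squashed commits above sanitizes `ConversationId` to block path traversal, which matters because the id is frontend-supplied yet names a file under `traces/{chat,search}/`. A minimal sketch of that kind of guard, with an allow-list of filename-safe characters (the helper name and exact allow-list are illustrative assumptions, not Thuki's actual code):

```rust
/// Illustrative sanitizer: keep only characters that are safe in a file
/// name, so an id like "../../etc/passwd" cannot escape the traces dir.
/// (Hypothetical helper; Thuki's real sanitization lives in crate::trace.)
fn sanitize_conversation_id(raw: &str) -> String {
    raw.chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || c == '-' || c == '_' {
                c
            } else {
                '_' // neutralize path separators, dots, and anything exotic
            }
        })
        .collect()
}

fn main() {
    // A well-formed id passes through unchanged.
    assert_eq!(sanitize_conversation_id("conv-123"), "conv-123");
    // Traversal attempts are flattened into harmless underscores.
    assert_eq!(sanitize_conversation_id("../../etc/passwd"), "______etc_passwd");
}
```

Mapping (rather than rejecting) keeps the recorder infallible on the hot path; an alternative design would return an error on suspicious ids.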

CHANGELOG.md

Lines changed: 6 additions & 0 deletions

```diff
@@ -7,8 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## Unreleased
 
+### Added
+
+- **Unified trace recorder.** Records every chat conversation and `/search` session as JSON-Lines under `app_data_dir/traces/{chat,search}/<conversation_id>.jsonl`. Off by default; toggle from Settings or set `[debug] trace_enabled = true` in `config.toml`.
+
 ### Changed
 
+- **BREAKING**: Renamed `[debug] search_trace_enabled` to `trace_enabled` (now covers both chat and search). Rename the field in your `config.toml` after upgrading. Trace file layout also changed to `traces/{chat,search}/<conversation_id>.jsonl`.
 - The `ask_ollama`, `search_pipeline`, and `capture_full_screen_command` Tauri commands now require a `conversationId: String` argument (and `ask_ollama` additionally requires `isFirstTurn: bool` and `slashCommand: Option<String>`). The frontend's `useOllama` hook generates a stable trace id per session and threads it transparently. External callers that invoked these commands directly must update their `invoke()` calls. A new fire-and-forget `record_conversation_end` command lets the frontend signal end-of-conversation (used by `useOllama.reset()` and `useOllama.loadMessages()`) so the chat-domain trace file gets a clean closing line.
 - **BREAKING**: Renamed the `[model]` section in `config.toml` to `[inference]`. The section still contains a single field, `ollama_url`, but the name now reflects what it actually configures (the inference daemon endpoint, not a model). There is no backward-compatibility shim: if you had a custom `[model]` section, rename it to `[inference]` after upgrading.
 - Active model selection is now strictly Option-typed end to end. Ollama's `/api/tags` is the single source of truth: when nothing is installed and nothing is persisted, Thuki refuses to dispatch requests and surfaces a "Pick a model" prompt instead of falling back to a hardcoded slug. The previous `DEFAULT_MODEL_NAME` constant has been removed.
```
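The changelog entry above says traces are JSON-Lines: one self-contained JSON object per line, appended as events occur, which keeps the files greppable and `tail -f`-able mid-conversation. A minimal sketch of that append shape (the field names and escaping-free formatting here are illustrative only; the recorder's real event schema lives in `crate::trace` and would use a proper JSON serializer):

```rust
use std::io::Write;

/// Append one illustrative event as a JSON-Lines row. Escaping is omitted
/// for brevity; a real implementation would serialize with something like
/// serde_json. Field names are assumptions, not Thuki's schema.
fn append_event_line(out: &mut impl Write, event_type: &str, payload: &str) -> std::io::Result<()> {
    writeln!(out, r#"{{"type":"{}","payload":"{}"}}"#, event_type, payload)
}

fn main() -> std::io::Result<()> {
    let mut buf = Vec::new();
    append_event_line(&mut buf, "user_message", "hello")?;
    append_event_line(&mut buf, "assistant_complete", "done")?;
    let text = String::from_utf8(buf).unwrap();
    // Two events -> two lines, each a standalone JSON object.
    assert_eq!(text.lines().count(), 2);
    Ok(())
}
```

Because every line stands alone, a trace file truncated by a crash mid-write loses at most its final line.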

docs/configurations.md

Lines changed: 6 additions & 8 deletions

````diff
@@ -75,10 +75,8 @@ judge_timeout_s = 30
 router_timeout_s = 45
 
 [debug]
-# When true, writes a forensic JSON-Lines trace file for every /search turn to
-# ~/Library/Application Support/com.quietnode.thuki/traces/.
-# Also toggleable from the Settings panel (Web tab, Diagnostics section).
-search_trace_enabled = false
+# Records every chat conversation and /search session to disk for later inspection.
+trace_enabled = false
 ```
@@ -180,11 +178,11 @@ For security, both URLs default to your local machine (`127.0.0.1`) and should s
 
 ### `[debug]`
 
-Diagnostics toggles. `search_trace_enabled` is exposed in the Settings panel (Web tab, Diagnostics section) so you can flip it without editing `config.toml`.
+Records every chat conversation and `/search` session as JSON-Lines under `app_data_dir/traces/{chat,search}/<conversation_id>.jsonl`. Off by default; toggleable from Settings. Trace files stay on your disk and are never uploaded.
 
-| Field | Default | Tunable? | Why not tunable | Bounds | Description |
-| :--------------------- | :------ | :------- | :-------------- | :----- | :---------- |
-| `search_trace_enabled` | `false` | Yes ||| When on, Thuki writes a forensic JSON-Lines trace file for every `/search` turn to `~/Library/Application Support/com.quietnode.thuki/traces/`. Each file records every query sent to SearXNG, every page the reader fetched, and every AI decision in that turn. Useful for diagnosing why a search went wrong; leave off for normal use. |
+| Field | Default | Tunable? | Why not tunable | Bounds | Description |
+| :-------------- | :------ | :------- | :-------------- | :----- | :--------------------------------------------------------------------------- |
+| `trace_enabled` | `false` | Yes ||| Records every chat conversation and `/search` session to disk for debugging. |
 
 ### `[activation]` (not in TOML)
````
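Since the toggle is flippable from Settings at runtime (the commit message calls this hot-swapping the recorder), `record()` has to degrade to a cheap no-op while tracing is off. One common shape for that, sketched with an in-memory sink standing in for a file writer (a hypothetical type under stated assumptions, not Thuki's actual `LiveTraceRecorder`):

```rust
use std::sync::RwLock;

/// Sketch of a hot-swappable recorder: `None` while tracing is disabled,
/// so disabled `record()` calls are one lock acquisition and nothing else.
struct ToggleRecorder {
    sink: RwLock<Option<Vec<String>>>, // stands in for a file writer
}

impl ToggleRecorder {
    fn new() -> Self {
        Self { sink: RwLock::new(None) }
    }

    /// Flipping the Settings toggle swaps the sink in or out at runtime.
    fn set_enabled(&self, enabled: bool) {
        *self.sink.write().unwrap() = if enabled { Some(Vec::new()) } else { None };
    }

    fn record(&self, line: &str) {
        // No-op while disabled: the Option is None, nothing is allocated.
        if let Some(sink) = self.sink.write().unwrap().as_mut() {
            sink.push(line.to_string());
        }
    }

    fn len(&self) -> usize {
        self.sink.read().unwrap().as_ref().map_or(0, |s| s.len())
    }
}

fn main() {
    let rec = ToggleRecorder::new();
    rec.record("dropped while off");
    assert_eq!(rec.len(), 0);
    rec.set_enabled(true);
    rec.record("kept");
    assert_eq!(rec.len(), 1);
}
```

A production version would likely prefer a lock-free pointer swap on the read path; the `RwLock<Option<_>>` form is just the simplest way to show the enable/disable semantics.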

src-tauri/src/commands.rs

Lines changed: 261 additions & 0 deletions

```diff
@@ -209,6 +209,13 @@ pub enum StreamChunk {
     Cancelled,
     /// A structured, user-friendly error occurred during processing.
     Error(OllamaError),
+    /// Emitted exactly once per turn, after the backend has cleared every
+    /// pre-`ConversationStart` gate (no-model bail, model lookup, etc.) and
+    /// committed to opening the trace for this `conversation_id`. Carries
+    /// no payload; the frontend uses it as the unambiguous signal to
+    /// retire its `is_first_turn` flag without relying on token-arrival
+    /// ordering. Does not appear in the trace itself.
+    TurnAccepted,
 }
 
 /// A single message in the Ollama `/api/chat` conversation format.
```
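The `TurnAccepted` variant above exists so the frontend retires its `is_first_turn` flag only when the backend has actually committed to the trace, never on optimistic send or first-token arrival. The frontend-side gate it enables can be sketched like this (hypothetical names, modeled in Rust for consistency with the surrounding code; the real logic lives in the TypeScript `useOllama` hook):

```rust
/// Sketch of the per-conversation first-turn gate the frontend keeps.
struct FirstTurnGate {
    is_first_turn: bool,
}

impl FirstTurnGate {
    fn new() -> Self {
        Self { is_first_turn: true }
    }

    /// Retire the flag only on TurnAccepted — never on token arrival or
    /// cancellation — so a cancel-mid-first-turn retry still reports
    /// `is_first_turn = true` and `ConversationStart` is emitted exactly once.
    fn on_turn_accepted(&mut self) {
        self.is_first_turn = false;
    }
}

fn main() {
    let mut gate = FirstTurnGate::new();
    // First attempt cancelled before any TurnAccepted: the flag survives,
    // so the retry will still ask the backend to open the conversation.
    assert!(gate.is_first_turn);
    // Retry reaches the backend and TurnAccepted arrives: flag retires.
    gate.on_turn_accepted();
    assert!(!gate.is_first_turn);
}
```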
```diff
@@ -463,10 +470,65 @@ pub async fn stream_ollama_chat(
     accumulated
 }
 
+/// Mirrors a streaming chunk into the chat-domain trace recorder. Pulled out
+/// of [`ask_ollama`] so the per-token routing logic and the token-count
+/// increment are exercised by the unit-test suite rather than the
+/// coverage-off Tauri command body. `Done`, `Cancelled`, and `Error` chunks
+/// are intentionally noops here: those terminal events are summarized by
+/// `AssistantComplete` after the stream returns.
+pub(crate) fn record_chunk_to_trace(
+    chunk: &StreamChunk,
+    recorder: &std::sync::Arc<crate::trace::BoundRecorder>,
+    token_count: &AtomicU64,
+) {
+    match chunk {
+        StreamChunk::Token(text) => {
+            token_count.fetch_add(1, Ordering::Relaxed);
+            recorder.record(crate::trace::RecorderEvent::AssistantTokens {
+                chunk: text.clone(),
+            });
+        }
+        StreamChunk::ThinkingToken(text) => {
+            recorder.record(crate::trace::RecorderEvent::AssistantThinking {
+                chunk: text.clone(),
+            });
+        }
+        StreamChunk::Done
+        | StreamChunk::Cancelled
+        | StreamChunk::Error(_)
+        | StreamChunk::TurnAccepted => {}
+    }
+}
+
+/// Emits `ConversationStart` to the trace recorder iff this is the first
+/// turn of the conversation. Pulled out of [`ask_ollama`] and the search
+/// pipeline so the gate is covered by tests instead of the coverage-off
+/// Tauri command body.
+pub(crate) fn record_conversation_start_if_first_turn(
+    recorder: &std::sync::Arc<crate::trace::BoundRecorder>,
+    is_first_turn: bool,
+    model: String,
+    system_prompt: String,
+) {
+    if is_first_turn {
+        recorder.record(crate::trace::RecorderEvent::ConversationStart {
+            model,
+            system_prompt,
+        });
+    }
+}
+
 /// Streams a chat response from the local Ollama backend. Appends the user
 /// message and assistant response to conversation history after completion
 /// or cancellation (retaining context for follow-up requests). Uses an epoch
 /// counter to prevent stale writes after a reset.
+///
+/// `conversation_id` flows from the frontend (`useConversationHistory.ts`).
+/// `is_first_turn` lets the frontend tell the backend "emit
+/// `ConversationStart` before this turn's `UserMessage`" without the backend
+/// needing to track per-conversation state. Both feed the unified trace
+/// recorder when `[debug] trace_enabled = true`; off by default they collapse
+/// to noop calls.
 #[cfg_attr(coverage_nightly, coverage(off))]
 #[cfg_attr(not(coverage), tauri::command)]
 #[allow(clippy::too_many_arguments)]
@@ -475,13 +537,17 @@ pub async fn ask_ollama(
     quoted_text: Option<String>,
     image_paths: Option<Vec<String>>,
     think: bool,
+    conversation_id: String,
+    is_first_turn: bool,
+    slash_command: Option<String>,
     on_event: Channel<StreamChunk>,
     client: State<'_, reqwest::Client>,
     generation: State<'_, GenerationState>,
     history: State<'_, ConversationHistory>,
     config: State<'_, parking_lot::RwLock<AppConfig>>,
     active_model: State<'_, crate::models::ActiveModelState>,
     capabilities_cache: State<'_, ModelCapabilitiesCache>,
+    trace_recorder: State<'_, std::sync::Arc<crate::trace::LiveTraceRecorder>>,
 ) -> Result<(), String> {
     // Snapshot the config once so all downstream reads (endpoint, prompt, model)
     // see a consistent view even if the user edits Settings mid-stream.
@@ -507,6 +573,35 @@ pub async fn ask_ollama(
     let cancel_token = CancellationToken::new();
     generation.set_token(cancel_token.clone());
 
+    // Bind the trace recorder to this conversation. When tracing is on,
+    // every event for this turn flows to
+    // `traces/chat/<conversation_id>.jsonl` via the registry. When off,
+    // each `record()` is a constant-time noop. The bound recorder is
+    // cheap to clone and is captured by the streaming-pump closure so
+    // per-token emits skip the registry lookup on the hot path.
+    let live: std::sync::Arc<crate::trace::LiveTraceRecorder> =
+        std::sync::Arc::clone(trace_recorder.inner());
+    let live_inner: std::sync::Arc<dyn crate::trace::TraceRecorder> = live;
+    let bound_recorder = std::sync::Arc::new(crate::trace::BoundRecorder::new(
+        live_inner,
+        crate::trace::ConversationId::new(conversation_id),
+    ));
+
+    // Emit ConversationStart at the moment we know the model + resolved
+    // system prompt. The frontend's `is_first_turn` flag prevents this
+    // event from firing on subsequent turns of the same conversation.
+    record_conversation_start_if_first_turn(
+        &bound_recorder,
+        is_first_turn,
+        model_name.clone(),
+        config.prompt.resolved_system.clone(),
+    );
+    // Tell the frontend the trace was opened for this conversation_id.
+    // Sent unconditionally (regardless of `is_first_turn`) so the hook
+    // can retire its flag the moment ANY turn lands, even if a previous
+    // first-turn attempt was cancelled before any token arrived.
+    let _ = on_event.send(StreamChunk::TurnAccepted);
+
     // Build user message content. When quoted text is present, label it
     // explicitly so the model knows the highlighted text is the primary
     // subject and any attached images provide surrounding context.
@@ -517,6 +612,16 @@ pub async fn ask_ollama(
         _ => message,
     };
 
+    // Emit UserMessage before any image base64 work, so the trace
+    // captures the user's intent even if encoding fails. Image paths
+    // are recorded as strings (matching the IPC contract); image bytes
+    // never enter the JSONL.
+    bound_recorder.record(crate::trace::RecorderEvent::UserMessage {
+        content: content.clone(),
+        attached_images: image_paths.clone().unwrap_or_default(),
+        slash_command: slash_command.clone(),
+    });
+
     // Base64-encode attached images for the Ollama multimodal API.
     let images = match image_paths {
         Some(ref paths) if !paths.is_empty() => {
@@ -580,6 +685,14 @@ pub async fn ask_ollama(
         ))
     };
 
+    let stream_started_ms = std::time::SystemTime::now()
+        .duration_since(std::time::UNIX_EPOCH)
+        .map(|d| d.as_millis() as u64)
+        .unwrap_or(0);
+    let token_count_atomic = std::sync::Arc::new(AtomicU64::new(0));
+    let token_count_for_pump = std::sync::Arc::clone(&token_count_atomic);
+    let recorder_for_pump = std::sync::Arc::clone(&bound_recorder);
+
     let accumulated = stream_ollama_chat(
         OllamaChatParams {
             endpoint,
```
```diff
@@ -592,11 +705,25 @@ pub async fn ask_ollama(
         &client,
         cancel_token.clone(),
         |chunk| {
+            // Mirror the user-visible chunk into the trace before
+            // forwarding it to the frontend. Token / ThinkingToken
+            // chunks land as discrete trace events; terminal chunks are
+            // summarized below by `AssistantComplete`.
+            record_chunk_to_trace(&chunk, &recorder_for_pump, &token_count_for_pump);
             let _ = on_event.send(chunk);
         },
     )
     .await;
 
+    let stream_ended_ms = std::time::SystemTime::now()
+        .duration_since(std::time::UNIX_EPOCH)
+        .map(|d| d.as_millis() as u64)
+        .unwrap_or(0);
+    bound_recorder.record(crate::trace::RecorderEvent::AssistantComplete {
+        total_tokens: token_count_atomic.load(Ordering::Relaxed),
+        latency_ms: stream_ended_ms.saturating_sub(stream_started_ms),
+    });
+
     // Persist user + assistant messages to in-memory history when the epoch
     // has not changed (no reset during streaming) and we received content.
     // This includes cancelled generations so that subsequent requests retain
```
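The hunk above timestamps both ends of the stream with `SystemTime` and subtracts with `saturating_sub`, so a wall clock that steps backwards mid-stream yields a latency of 0 instead of a huge underflowed value. The same pattern in isolation (a sketch mirroring the diff's `unwrap_or(0)` handling, not additional Thuki code):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Millisecond wall-clock timestamp, defaulting to 0 if the clock somehow
/// reads before the Unix epoch — mirrors the diff's `unwrap_or(0)`.
fn now_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0)
}

fn main() {
    let started = now_ms();
    let ended = now_ms();
    // saturating_sub clamps at zero if the wall clock stepped backwards
    // between the two reads, so latency_ms can never underflow.
    let latency_ms = ended.saturating_sub(started);
    assert!(latency_ms < 10_000);
}
```

`Instant` would be the usual choice for pure durations; using `SystemTime` here also gives the trace an absolute timestamp, at the cost of needing the saturating subtraction.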
```diff
@@ -658,6 +785,34 @@ pub fn reset_conversation(history: State<'_, ConversationHistory>) {
     history.messages.lock().unwrap().clear();
 }
 
+/// Frontend-driven `ConversationEnd` emission.
+///
+/// The chat-domain trace lifecycle is owned by the frontend because
+/// Thuki's window-close intercept hides instead of quits, and the same
+/// conversation can resume on the next hotkey activation. Emitting
+/// `ConversationEnd` from the backend on window-hide would falsely mark
+/// every still-open conversation ended on every dismiss. The frontend
+/// invokes this command exactly when the user-perceived conversation
+/// terminates: clicking "New conversation", loading a different
+/// conversation from history, or quitting from the tray.
+///
+/// The command is a thin trace-only signal; it does NOT mutate
+/// `ConversationHistory` (that is `reset_conversation`'s job) and does
+/// NOT touch the SQLite-backed history UI.
+#[cfg_attr(coverage_nightly, coverage(off))]
+#[cfg_attr(not(coverage), tauri::command)]
+pub fn record_conversation_end(
+    conversation_id: String,
+    reason: String,
+    trace_recorder: State<'_, std::sync::Arc<crate::trace::LiveTraceRecorder>>,
+) {
+    use crate::trace::TraceRecorder;
+    trace_recorder.record(
+        &crate::trace::ConversationId::new(conversation_id),
+        crate::trace::RecorderEvent::ConversationEnd { reason },
+    );
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
@@ -2122,4 +2277,110 @@ mod tests {
         assert_eq!(err.kind, OllamaErrorKind::ModelNotFound);
         assert!(!err.message.contains("picker chip"));
     }
+
+    // ─── Trace orchestration helpers ────────────────────────────────────────
+
+    /// Builds a `BoundRecorder` over a `MockRecorder` so each helper test
+    /// can inspect what got recorded without going through the file system.
+    fn mock_bound_recorder(
+        conv_id: &str,
+    ) -> (
+        Arc<crate::trace::BoundRecorder>,
+        Arc<crate::trace::recorder::MockRecorder>,
+    ) {
+        let mock = Arc::new(crate::trace::recorder::MockRecorder::new());
+        let inner: Arc<dyn crate::trace::TraceRecorder> = mock.clone();
+        let bound = Arc::new(crate::trace::BoundRecorder::new(
+            inner,
+            crate::trace::ConversationId::new(conv_id),
+        ));
+        (bound, mock)
+    }
+
+    #[test]
+    fn record_chunk_to_trace_emits_assistant_tokens_and_increments_count() {
+        let (bound, mock) = mock_bound_recorder("conv-token");
+        let counter = AtomicU64::new(0);
+        record_chunk_to_trace(&StreamChunk::Token("hi".to_string()), &bound, &counter);
+        record_chunk_to_trace(&StreamChunk::Token(" there".to_string()), &bound, &counter);
+        assert_eq!(counter.load(Ordering::Relaxed), 2);
+        let snapshot = mock.snapshot();
+        assert_eq!(snapshot.len(), 2);
+        for (id, _) in &snapshot {
+            assert_eq!(id.as_str(), "conv-token");
+        }
+        assert!(matches!(
+            snapshot[0].1,
+            crate::trace::RecorderEvent::AssistantTokens { ref chunk } if chunk == "hi"
+        ));
+        assert!(matches!(
+            snapshot[1].1,
+            crate::trace::RecorderEvent::AssistantTokens { ref chunk } if chunk == " there"
+        ));
+    }
+
+    #[test]
+    fn record_chunk_to_trace_emits_assistant_thinking_without_increment() {
+        let (bound, mock) = mock_bound_recorder("conv-think");
+        let counter = AtomicU64::new(0);
+        record_chunk_to_trace(
+            &StreamChunk::ThinkingToken("planning".to_string()),
+            &bound,
+            &counter,
+        );
+        assert_eq!(counter.load(Ordering::Relaxed), 0);
+        let snapshot = mock.snapshot();
+        assert_eq!(snapshot.len(), 1);
+        assert!(matches!(
+            snapshot[0].1,
+            crate::trace::RecorderEvent::AssistantThinking { ref chunk } if chunk == "planning"
+        ));
+    }
+
+    #[test]
+    fn record_chunk_to_trace_skips_terminal_chunks() {
+        let (bound, mock) = mock_bound_recorder("conv-term");
+        let counter = AtomicU64::new(0);
+        record_chunk_to_trace(&StreamChunk::Done, &bound, &counter);
+        record_chunk_to_trace(&StreamChunk::Cancelled, &bound, &counter);
+        record_chunk_to_trace(
+            &StreamChunk::Error(no_model_selected_error()),
+            &bound,
+            &counter,
+        );
+        assert_eq!(counter.load(Ordering::Relaxed), 0);
+        assert_eq!(mock.snapshot().len(), 0);
+    }
+
+    #[test]
+    fn record_conversation_start_if_first_turn_emits_when_true() {
+        let (bound, mock) = mock_bound_recorder("conv-start");
+        record_conversation_start_if_first_turn(
+            &bound,
+            true,
+            "model-a".to_string(),
+            "you are helpful".to_string(),
+        );
+        let snapshot = mock.snapshot();
+        assert_eq!(snapshot.len(), 1);
+        assert!(matches!(
+            snapshot[0].1,
+            crate::trace::RecorderEvent::ConversationStart {
+                ref model,
+                ref system_prompt,
+            } if model == "model-a" && system_prompt == "you are helpful"
+        ));
+    }
+
+    #[test]
+    fn record_conversation_start_if_first_turn_skips_when_false() {
+        let (bound, mock) = mock_bound_recorder("conv-skip");
+        record_conversation_start_if_first_turn(
+            &bound,
+            false,
+            "model-a".to_string(),
+            "ignored".to_string(),
+        );
+        assert_eq!(mock.snapshot().len(), 0);
+    }
 }
```
