UN-3431 [FIX] Restore log_events_id on tool-run dispatch and persist without UI subscriber (#1960)

chandrasekharan-zipstack · claude · vishnuszipstack · web-flow · commit a65d01750e6d · 2026-05-13T14:39:15.000+05:30
* UN-3431 [FIX] Restore log_events_id on tool-run dispatch and persist without UI subscriber

structure_tool_task lost log_events_id from both the agentic_table and
structure_pipeline ExecutionContexts during the agentic_table refactor;
the executor shim therefore received an empty log_events_id and bailed
before publishing anything. Tool-run lines stopped reaching the workflow
logs UI for every dispatch through these paths.

Two changes:
- structure_tool_task: thread log_events_id into both ExecutionContexts.
- executor_tool_shim.stream_log: gate only the PROGRESS path on
  log_events_id; the LOG payload now falls back to execution_id as the
  routing channel so logs persist to execution_log even when no
  websocket subscriber exists (API deployments).
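
A minimal sketch of the routing rule described above, assuming a
hypothetical `RoutingShim` that records what would be published instead
of calling LogPublisher. The class name and the `published` list are
illustrative only; the real shim also handles log levels and publish
failures.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingShim:
    """Hypothetical stand-in: models only the channel-selection logic."""

    execution_id: str = ""
    organization_id: str = ""
    log_events_id: str = ""
    # Records (channel, payload type, message) instead of publishing.
    published: list = field(default_factory=list)

    def stream_log(self, message: str) -> None:
        # PROGRESS routes via the websocket room, so skip it when absent.
        if self.log_events_id:
            self.published.append((self.log_events_id, "PROGRESS", message))
        # LOG cannot be persisted without execution and organization ids.
        if not (self.execution_id and self.organization_id):
            return
        # Prefer the websocket room; otherwise route by execution id.
        channel = self.log_events_id or self.execution_id
        self.published.append((channel, "LOG", message))


# API deployment: no websocket subscriber, LOG still goes out.
api_run = RoutingShim(execution_id="exec-1", organization_id="org-1")
api_run.stream_log("Indexing document")

# UI run: both payloads share the websocket room.
ui_run = RoutingShim(
    execution_id="exec-2", organization_id="org-1", log_events_id="room-9"
)
ui_run.stream_log("Indexing document")

print(api_run.published)
print(ui_run.published)
```

In this sketch the API run emits a single LOG entry keyed by the
execution id, while the UI run emits PROGRESS and LOG on the same room.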

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3431 [MISC] Improve tool-run log narrative for workflow execution UI

Reshape the per-run shim.stream_log emissions so the workflow logs UI
reads as a per-phase narrative with one start, one end, and adapter
identity surfaced exactly once per unique adapter:

- Add a non-sensitive run-config preamble at the top of
  _handle_structure_pipeline: prompt count + single_pass / summarize /
  challenge flags. No prompt names or text are logged.
- Introduce ExecutorToolShim.log_adapter_once(kind, adapter_id, adapter)
  with a per-shim dedup set so "Using LLM/Embedding/Vector DB:
  `<model>`" appears at most once per unique adapter id. Used from
  _init_llm_and_retrieval and _handle_index.
- Drop intermediate / redundant lines that did not add information
  on their own: "Initializing text extractor", "Using text extractor"
  (rolled into the start line), "Extracting text from document",
  "Saving extraction metadata", "Initialized embedding and vector DB
  adapters", "Indexing file", "Adding nodes to vector db".
- Collapse the index-status trio ("Document already indexed,
  re-indexing" + "Indexing document for the first time" + "Indexing
  document into vector store") into a single "Indexing document" /
  "Re-indexing document" line driven by doc_id_found.
- Gate "Retrieving context for" and "Retrieved N chunks via RAG for"
  on chunk_size > 0 so single-pass / full-context paths do not emit
  a misleading retrieval line.
- Combine summarize start into one line that names the LLM model.
- Wrap dynamic identifiers (adapter labels, extractor class, prompt
  names) in backticks; drop trailing "..." across all stream_log
  emissions.
- Emit a final "Pipeline completed: N/M prompts answered", where N
  counts the non-empty values (not None, "", [] or {}) in
  structured_output[OUTPUT] and M is the number of prompts.
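
The once-per-adapter behaviour can be sketched as follows. `DedupShim`,
`FakeLLM`, and `FakeVectorDB` are hypothetical stand-ins for the shim
and SDK adapters; only the dedup and label-selection logic mirrors the
method added in this change.

```python
from typing import Any


class FakeLLM:
    """Hypothetical adapter that exposes get_model_name()."""

    def get_model_name(self) -> str:
        return "model-a"


class FakeVectorDB:
    """Hypothetical adapter that only carries a display name."""

    _adapter_name = "vector-store-x"


class DedupShim:
    """Hypothetical shim that collects log lines in memory."""

    def __init__(self) -> None:
        self.lines: list[str] = []
        self._adapters_logged: set[str] = set()

    def stream_log(self, log: str) -> None:
        self.lines.append(log)

    def log_adapter_once(self, kind: str, adapter_id: str, adapter: Any) -> None:
        # Empty ids and ids already seen produce no output.
        if not adapter_id or adapter_id in self._adapters_logged:
            return
        self._adapters_logged.add(adapter_id)
        # Prefer the model name, then the display name, then the raw id.
        get_model = getattr(adapter, "get_model_name", None)
        if callable(get_model):
            label = get_model() or adapter_id
        else:
            label = getattr(adapter, "_adapter_name", "") or adapter_id
        self.stream_log(f"Using {kind}: `{label}`")


shim = DedupShim()
llm = FakeLLM()
for _ in range(3):  # e.g. three prompts sharing one LLM adapter
    shim.log_adapter_once("LLM", "llm-id-1", llm)
shim.log_adapter_once("Vector DB", "vdb-id-1", FakeVectorDB())
shim.log_adapter_once("Embedding", "", object())  # empty id is ignored

print(shim.lines)
```

Because the dedup set lives on the shim instance, the guarantee in this
sketch holds per shim, not across separately built shims.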

Pairs with the cloud-side log cleanup PR.
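
For the final "Pipeline completed" line, a small sketch of how the
answered count behaves on edge values. `count_answered` and the sample
data are hypothetical; the membership test is the same expression the
pipeline uses.

```python
def count_answered(output_map: dict) -> int:
    """Count values that are not None, empty string, empty list or empty dict."""
    return sum(1 for v in output_map.values() if v not in (None, "", [], {}))


# Hypothetical structured output for five prompts.
output_map = {
    "invoice_no": "A-17",  # answered
    "total": 0,            # answered: 0 is not treated as empty
    "notes": "",           # empty string
    "items": [],           # empty list
    "vendor": None,        # missing
}
prompt_count = 5  # stands in for len(outputs) in the pipeline

answered = count_answered(output_map)
summary = f"Pipeline completed: {answered}/{prompt_count} prompts answered"
print(summary)
```

Falsy but meaningful answers such as 0 or False still count as
answered, since the check compares against the four empty values rather
than testing truthiness.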

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com>
diff --git a/workers/executor/executor_tool_shim.py b/workers/executor/executor_tool_shim.py
@@ -93,6 +93,9 @@ def __init__(
         # silently swallowing every subsequent log line at DEBUG.
         self._progress_publish_failed = False
         self._log_publish_failed = False
+        # Adapters whose name+model has already been surfaced to the UI;
+        # later mentions skip repeating the model line.
+        self._adapters_logged: set[str] = set()
         # Initialize StreamMixin.  EXECUTION_BY_TOOL is not set in
         # the worker environment, so _exec_by_tool will be False.
         super().__init__(log_level=LogLevel.INFO)
@@ -161,41 +164,36 @@ def stream_log(
         if _levels.index(level) < _levels.index(self.log_level):
             return
 
-        # Publish progress to frontend via the log consumer queue.
-        if not self.log_events_id:
-            return
-
         wf_level = _SDK_TO_WF_LEVEL.get(level, "INFO")
 
-        # PROGRESS payload — IDE prompt-card live updates only. Dropped at
-        # the DB persist layer because LogPublisher.publish only stores
-        # payloads whose `type == "LOG"`.
-        try:
-            progress_payload = LogPublisher.log_progress(
-                component=self.component,
-                level=wf_level,
-                state=stage,
-                message=log,
-            )
-            LogPublisher.publish(
-                channel_id=self.log_events_id,
-                payload=progress_payload,
-            )
-        except Exception:
-            first_failure = not self._progress_publish_failed
-            self._progress_publish_failed = True
-            logger.log(
-                logging.WARNING if first_failure else logging.DEBUG,
-                "Failed to publish progress log (non-fatal)",
-                exc_info=first_failure,
-            )
-
-        # LOG payload — feeds workflow logs UI and persists to execution_log.
-        # LogDataDTO validation requires `execution_id` and `organization_id`;
-        # `file_execution_id` is optional.
+        # PROGRESS payload routes via the websocket room; skip when absent.
+        if self.log_events_id:
+            try:
+                progress_payload = LogPublisher.log_progress(
+                    component=self.component,
+                    level=wf_level,
+                    state=stage,
+                    message=log,
+                )
+                LogPublisher.publish(
+                    channel_id=self.log_events_id,
+                    payload=progress_payload,
+                )
+            except Exception:
+                first_failure = not self._progress_publish_failed
+                self._progress_publish_failed = True
+                logger.log(
+                    logging.WARNING if first_failure else logging.DEBUG,
+                    "Failed to publish progress log (non-fatal)",
+                    exc_info=first_failure,
+                )
+
+        # LOG payload persists to execution_log; falls back to execution_id
+        # as the routing channel so it survives without a websocket subscriber.
         if not (self.execution_id and self.organization_id):
             return
 
+        log_channel = self.log_events_id or self.execution_id
         try:
             log_payload = LogPublisher.log_workflow(
                 stage=stage,
@@ -206,7 +204,7 @@ def stream_log(
                 organization_id=self.organization_id,
             )
             LogPublisher.publish(
-                channel_id=self.log_events_id,
+                channel_id=log_channel,
                 payload=log_payload,
             )
         except Exception:
@@ -234,3 +232,26 @@ def stream_error_and_exit(self, message: str, err: Exception | None = None) -> N
         """
         logger.error(message)
         raise SdkError(message, actual_err=err)
+
+    def log_adapter_once(
+        self,
+        kind: str,
+        adapter_id: str,
+        adapter: Any,
+    ) -> None:
+        """Surface adapter identity to the UI on first use only.
+
+        ``kind`` is the human label ("LLM", "Embedding", "Vector DB").
+        ``adapter`` is an SDK instance — read only for non-sensitive
+        identity (model name or adapter display name); ``adapter_id``
+        is the dedup key. Subsequent calls for the same id are no-ops.
+        """
+        if not adapter_id or adapter_id in self._adapters_logged:
+            return
+        self._adapters_logged.add(adapter_id)
+        get_model = getattr(adapter, "get_model_name", None)
+        if callable(get_model):
+            label = get_model() or adapter_id
+        else:
+            label = getattr(adapter, "_adapter_name", "") or adapter_id
+        self.stream_log(f"Using {kind}: `{label}`")
diff --git a/workers/executor/executors/index.py b/workers/executor/executors/index.py
@@ -155,7 +155,6 @@ def perform_indexing(
         ):
             return doc_id
 
-        self.tool.stream_log("Indexing file...")
         full_text = [
             {
                 "section": "full",
@@ -171,7 +170,6 @@ def perform_indexing(
     def _trigger_indexing(self, vector_db: Any, documents: list) -> None:
         import openai
 
-        self.tool.stream_log("Adding nodes to vector db...")
         try:
             vector_db.index_document(
                 documents,
diff --git a/workers/executor/executors/legacy_executor.py b/workers/executor/executors/legacy_executor.py
@@ -13,6 +13,7 @@
 from executor.executor_tool_shim import ExecutorToolShim
 from executor.executors.constants import ExecutionSource
 from executor.executors.constants import IndexingConstants as IKeys
+from executor.executors.constants import PromptServiceConstants as PSKeys
 from executor.executors.dto import (
     ChunkingConfig,
     FileInfo,
@@ -242,15 +243,15 @@ def _handle_extract(self, context: ExecutionContext) -> ExecutionResult:
             Path(file_path).name,
             context.run_id,
         )
-        shim.stream_log("Initializing text extractor...")
-        shim.stream_log(f"Using text extractor: {type(x2text.x2text_instance).__name__}")
-
+        extractor_name = type(x2text.x2text_instance).__name__
         try:
-            shim.stream_log("Extracting text from document...")
+            shim.stream_log(
+                f"Extracting text using `{extractor_name}`"
+                + (" (with highlight)" if enable_highlight else "")
+            )
             if enable_highlight and isinstance(
                 x2text.x2text_instance, (LLMWhisperer, LLMWhispererV2)
             ):
-                shim.stream_log("Extracting text with highlight support enabled...")
                 process_response: TextExtractionResult = x2text.process(
                     input_file_path=file_path,
                     output_file_path=output_file_path,
@@ -301,7 +302,6 @@ def _handle_extract(self, context: ExecutionContext) -> ExecutionResult:
                 process_response.extraction_metadata
                 and process_response.extraction_metadata.line_metadata
             ):
-                shim.stream_log("Saving extraction metadata...")
                 result_data["highlight_metadata"] = (
                     process_response.extraction_metadata.line_metadata
                 )
@@ -605,12 +605,23 @@ def _failure(child_result: ExecutionResult) -> ExecutionResult:
         shim = self._build_shim(
             platform_api_key=extract_params.get("platform_api_key", ""),
         )
+
+        # One-shot run-config line — non-sensitive flags only; adapter
+        # identities are emitted inline on first use with full model info.
+        tool_settings = answer_params.get(PSKeys.TOOL_SETTINGS, {})
+        outputs = answer_params.get(PSKeys.OUTPUTS, [])
+        shim.stream_log(
+            f"Run config: prompts={len(outputs)} | "
+            f"single_pass={'on' if is_single_pass else 'off'} | "
+            f"summarize={'on' if is_summarization else 'off'} | "
+            f"challenge="
+            f"{'on' if tool_settings.get(PSKeys.ENABLE_CHALLENGE) else 'off'}"
+        )
         step = 1
 
         try:
             # ---- Step 1: Extract ----
             if not skip_extraction:
-                shim.stream_log(f"Pipeline step {step}: Extracting text from document...")
                 step += 1
                 extract_ctx = ExecutionContext(
                     executor_name=context.executor_name,
@@ -632,7 +643,6 @@ def _failure(child_result: ExecutionResult) -> ExecutionResult:
 
             # ---- Step 2: Summarize (if enabled) ----
             if is_summarization:
-                shim.stream_log(f"Pipeline step {step}: Summarizing extracted text...")
                 step += 1
                 summarize_result = self._run_pipeline_summarize(
                     context=context,
@@ -648,9 +658,6 @@ def _failure(child_result: ExecutionResult) -> ExecutionResult:
                 answer_params["file_path"] = input_file_path
             elif not is_single_pass:
                 # ---- Step 3: Index per output with dedup ----
-                shim.stream_log(
-                    f"Pipeline step {step}: Indexing document into vector store..."
-                )
                 step += 1
                 index_metrics = self._run_pipeline_index(
                     context=context,
@@ -693,7 +700,9 @@ def _failure(child_result: ExecutionResult) -> ExecutionResult:
             index_metrics=index_metrics,
         )
 
-        shim.stream_log("Pipeline completed successfully")
+        output_map = structured_output.get(PSKeys.OUTPUT, {}) or {}
+        answered = sum(1 for v in output_map.values() if v not in (None, "", [], {}))
+        shim.stream_log(f"Pipeline completed: {answered}/{len(outputs)} prompts answered")
         out_metadata = {
             k: v
             for k, v in (answer_result.metadata or {}).items()
@@ -728,12 +737,9 @@ def _run_pipeline_answer_step(
                 output["chunk-size"] = 0
                 output["chunk-overlap"] = 0
             operation = Operation.SINGLE_PASS_EXTRACTION.value
-            mode_label = "single pass"
         else:
             operation = Operation.ANSWER_PROMPT.value
-            mode_label = "prompt"
 
-        shim.stream_log(f"Pipeline step {step}: Running {mode_label} execution...")
         answer_ctx = ExecutionContext(
             executor_name=context.executor_name,
             operation=operation,
@@ -1075,8 +1081,6 @@ def _handle_index(self, context: ExecutionContext) -> ExecutionResult:
             Path(file_path).name,
             context.run_id,
         )
-        shim.stream_log("Initializing indexing pipeline...")
-
         # Skip indexing when chunk_size is 0 — no vector operations needed.
         # ChunkingConfig raises ValueError for 0, so handle before DTO.
         if chunk_size == 0:
@@ -1117,7 +1121,6 @@ def _handle_index(self, context: ExecutionContext) -> ExecutionResult:
             )
             doc_id = index.generate_index_key(file_info=file_info, fs=fs_instance)
             logger.debug("Generated index key: doc_id=%s", doc_id)
-            shim.stream_log("Checking document index status...")
 
             embedding = embedding_compat(
                 adapter_instance_id=embedding_instance_id,
@@ -1129,7 +1132,8 @@ def _handle_index(self, context: ExecutionContext) -> ExecutionResult:
                 adapter_instance_id=vector_db_instance_id,
                 embedding=embedding,
             )
-            shim.stream_log("Initialized embedding and vector DB adapters")
+            shim.log_adapter_once("Embedding", embedding_instance_id, embedding)
+            shim.log_adapter_once("Vector DB", vector_db_instance_id, vector_db)
 
             doc_id_found = index.is_document_indexed(
                 doc_id=doc_id, embedding=embedding, vector_db=vector_db
@@ -1150,11 +1154,9 @@ def _handle_index(self, context: ExecutionContext) -> ExecutionResult:
                 )
                 return ExecutionResult(success=True, data={IKeys.DOC_ID: doc_id})
 
-            if doc_id_found and reindex:
-                shim.stream_log("Document already indexed, re-indexing...")
-            else:
-                shim.stream_log("Indexing document for the first time...")
-            shim.stream_log("Indexing document into vector store...")
+            shim.stream_log(
+                "Re-indexing document" if doc_id_found else "Indexing document"
+            )
             index.perform_indexing(
                 vector_db=vector_db,
                 doc_id=doc_id,
@@ -1689,7 +1691,8 @@ def _execute_single_prompt(
             retrieval_strategy = output.get(PSKeys.RETRIEVAL_STRATEGY)
             valid_strategies = {s.value for s in RetrievalStrategy}
             if retrieval_strategy in valid_strategies:
-                shim.stream_log(f"Retrieving context for: `{prompt_name}`")
+                if chunk_size > 0:
+                    shim.stream_log(f"Retrieving context for: `{prompt_name}`")
                 logger.info(
                     "Performing retrieval: prompt=%s strategy=%s chunk_size=%d",
                     prompt_name,
@@ -1713,9 +1716,11 @@ def _execute_single_prompt(
                         context_retrieval_metrics=context_retrieval_metrics,
                     )
                 metadata[PSKeys.CONTEXT][prompt_name] = context_list
-                shim.stream_log(
-                    f"Retrieved {len(context_list)} context chunks for: `{prompt_name}`"
-                )
+                if chunk_size > 0:
+                    shim.stream_log(
+                        f"Retrieved {len(context_list)} chunks via RAG "
+                        f"for `{prompt_name}`"
+                    )
                 logger.debug(
                     "Retrieved %d context chunks for prompt: %s",
                     len(context_list),
@@ -1861,6 +1866,11 @@ def _init_llm_and_retrieval(
                     adapter_instance_id=output[PSKeys.VECTOR_DB],
                     embedding=embedding,
                 )
+            shim.log_adapter_once("LLM", output[PSKeys.LLM], llm)
+            if embedding is not None:
+                shim.log_adapter_once("Embedding", output[PSKeys.EMBEDDING], embedding)
+            if vector_db is not None:
+                shim.log_adapter_once("Vector DB", output[PSKeys.VECTOR_DB], vector_db)
             shim.stream_log(
                 f"Initialized LLM and retrieval adapters for: `{prompt_name}`"
             )
@@ -2274,7 +2284,6 @@ def _handle_summarize(self, context: ExecutionContext) -> ExecutionResult:
 
         _, _, _, _, llm_cls, _, _ = self._get_prompt_deps()
 
-        shim.stream_log("Initializing LLM for summarization...")
         llm: Any = None
         try:
             llm = llm_cls(
@@ -2286,7 +2295,9 @@ def _handle_summarize(self, context: ExecutionContext) -> ExecutionResult:
                 AnswerPromptService as answer_prompt_svc,
             )
 
-            shim.stream_log("Running document summarization...")
+            shim.stream_log(
+                f"Summarizing extracted text using LLM: `{llm.get_model_name()}`"
+            )
             summary = answer_prompt_svc.run_completion(llm=llm, prompt=prompt)
             records = list(llm.flush_pending_usage())
             logger.info("Summarization completed: run_id=%s", context.run_id)
diff --git a/workers/file_processing/structure_tool_task.py b/workers/file_processing/structure_tool_task.py
@@ -459,6 +459,7 @@ def _execute_structure_tool_impl(params: dict) -> dict:
             execution_source="tool",
             organization_id=organization_id,
             request_id=file_execution_id,
+            log_events_id=log_events_id,
             execution_id=execution_id,
             file_execution_id=file_execution_id,
             executor_params=agentic_params,
@@ -490,6 +491,7 @@ def _execute_structure_tool_impl(params: dict) -> dict:
             execution_source="tool",
             organization_id=organization_id,
             request_id=file_execution_id,
+            log_events_id=log_events_id,
             execution_id=execution_id,
             file_execution_id=file_execution_id,
             executor_params={