Merge pull request #1111 from Serhan-Asad/fix/1106-metadata-finalization-alignment

gltanaka · web-flow · commit 2973a5e03633 · 2026-05-20T20:14:54.000-07:00
fix: align metadata finalization skip/fail semantics for update and CI heal
diff --git a/README.md b/README.md
@@ -2412,7 +2412,7 @@ Options:
 - `--git`: Use git history to find the original code file, eliminating the need for the `INPUT_CODE_FILE` argument.
 - `--extensions EXTENSIONS`: In repository-wide mode, filter the update to only include files with the specified comma-separated extensions (e.g., `py,js,ts`).
 - `--simple`: Use the legacy 2-stage LLM update process instead of the default agentic mode. Useful when agentic CLIs are not available or for faster updates.
-- `--sync-metadata`: After the prompt update, run the shared metadata-sync orchestrator so prompt PDD tags, `architecture.json` entries, run reports, and fingerprint state are reconciled in one step. Works in single-file, regeneration, and repo modes. **Fingerprint note:** default single-file/regeneration `pdd update <code>` already finalizes the per-target fingerprint (`.pdd/meta/<basename>_<language>.json`) on success, and logs a skip reason when finalization is intentionally bypassed; `--sync-metadata` does not gate that behavior. Default repo-mode `pdd update --repo` likewise finalizes per-pair fingerprints and, before writing each fingerprint, clears the affected module's stale `.pdd/meta/<basename>_<language>_run.json` runtime-verification report so metadata and runtime state stay in lock-step (clear failures are surfaced as non-fatal warnings, and if the stale `_run.json` still exists after the clear attempt the fingerprint write is skipped so a fresh fingerprint cannot coexist with stale runtime state — closing issue [#1057](https://github.com/promptdriven/pdd/issues/1057)). Without this flag, the broader prompt-tag/architecture/run-report orchestrator is not run and those layers must be reconciled with separate commands. **Scope note:** the `tags` stage currently *preserves* existing PDD tags and only *seeds* tags from the matching `architecture.json` entry when a prompt has none — LLM-first **refresh** of stale-but-present tags is tracked at issue [#870](https://github.com/promptdriven/pdd/issues/870) and is not invoked by this orchestrator. When a prompt has zero PDD tags AND no architecture entry, the `tags` stage reports `skipped` (never `ok`) so operators see honest status. On any stage `failed`, `pdd update --sync-metadata` exits non-zero so CI auto-heal does not treat a half-finalized update as healed.
+- `--sync-metadata`: After the prompt update, run the shared metadata-sync orchestrator so prompt PDD tags, `architecture.json` entries, run reports, and fingerprint state are reconciled in one step. Works in single-file, regeneration, and repo modes. **Fingerprint note:** default single-file/regeneration `pdd update <code>` already finalizes the per-target fingerprint (`.pdd/meta/<basename>_<language>.json`) on success, and logs a skip reason when finalization is intentionally bypassed; `--sync-metadata` does not gate that behavior. Before writing the single-file/regeneration fingerprint, the command clears the affected module's stale `.pdd/meta/<basename>_<language>_run.json` runtime-verification report and re-checks it is gone; if the stale report survives the clear attempt (for example, silent `os.remove` failure on permissions/locks), the fingerprint write is skipped with a yellow warning so a fresh fingerprint cannot coexist with stale runtime state — best-effort, the successful `(prompt, cost, model)` update tuple is preserved either way (closing issue [#1106](https://github.com/promptdriven/pdd/issues/1106)). The stale-report warning intentionally surfaces even under `--quiet`, because it describes a real metadata-consistency problem the operator should learn about regardless of other log suppression. Default repo-mode `pdd update --repo` likewise finalizes per-pair fingerprints and, before writing each fingerprint, clears the affected module's stale `.pdd/meta/<basename>_<language>_run.json` runtime-verification report so metadata and runtime state stay in lock-step (clear failures are surfaced as non-fatal warnings, and if the stale `_run.json` still exists after the clear attempt the fingerprint write is skipped so a fresh fingerprint cannot coexist with stale runtime state — closing issue [#1057](https://github.com/promptdriven/pdd/issues/1057)). Without this flag, the broader prompt-tag/architecture/run-report orchestrator is not run and those layers must be reconciled with separate commands. **Scope note:** the `tags` stage currently *preserves* existing PDD tags and only *seeds* tags from the matching `architecture.json` entry when a prompt has none — LLM-first **refresh** of stale-but-present tags is tracked at issue [#870](https://github.com/promptdriven/pdd/issues/870) and is not invoked by this orchestrator. When a prompt has zero PDD tags AND no architecture entry, the `tags` stage reports `skipped` (never `ok`) so operators see honest status. On any stage `failed`, `pdd update --sync-metadata` exits non-zero so CI auto-heal does not treat a half-finalized update as healed.
 
 Example (Metadata Sync):
 ```bash
diff --git a/pdd/ci_drift_heal.py b/pdd/ci_drift_heal.py
@@ -1251,36 +1251,57 @@ def _heal_update(drift: DriftInfo, env: Dict[str, str], skip_set: Set[str]) -> O
         drift.prompt_path = str(candidate)
         prompt_path = drift.prompt_path
 
+    # Issue #1106 Gap 2: the lazy block above only fail-closes when
+    # `drift.prompt_path` was initially None. If it was set on `drift`
+    # BEFORE this heal ran but the file is missing on disk after the
+    # `pdd update` subprocess (typo, deleted, renamed by update, language
+    # detection mismatch), the previous code's `if prompt_exists:` guard
+    # further down silently skipped `_run_metadata_sync_safe` AND still
+    # fell through to the follow-up `pdd example` — masking metadata
+    # failure as a successful heal. Pull that gate up here so the missing
+    # case mirrors the lazy-unresolvable case: explicit hard failure, no
+    # metadata sync, no follow-up example.
+    try:
+        prompt_exists_post_update = Path(str(prompt_path)).exists()
+    except Exception:
+        prompt_exists_post_update = False
+    if not prompt_exists_post_update:
+        console.print(
+            f"[red]heal failed for {drift.basename}: prompt_path "
+            f"{prompt_path} set but missing on disk post-update[/red]"
+        )
+        drift.metadata_finalization_failed = True
+        drift.metadata_finalization_error = (
+            "prompt_path set but missing on disk post-update"
+        )
+        return False
+
     # Gates.
     if not _enforce_prompt_churn_gate(drift):
         return False
     if not _enforce_structural_invariants(drift):
         return False
 
-    # Snapshot + metadata orchestrator (only when prompt file exists on disk).
-    snapshot: Optional[Dict[str, Optional[bytes]]] = None
-    try:
-        prompt_exists = Path(str(prompt_path)).exists()
-    except Exception:
-        prompt_exists = False
-    if prompt_exists:
-        snapshot = _snapshot_metadata_state_for(drift)
-        meta_ok = _run_metadata_sync_safe(str(prompt_path), str(code_path) if code_path else None)
-        if not meta_ok:
-            try:
-                _revert_prompt_file(drift)
-            except PromptRevertError:
-                raise
-            if snapshot is not None:
-                _restore_metadata_state_for(snapshot)
-            # Metadata finalization is a hard requirement (Issue #1006): a
-            # successful auto-heal commit must include the updated fingerprint,
-            # so this failure must surface distinctly from advisory subprocess
-            # failures and fail the run loudly in every mode.
-            drift.metadata_finalization_failed = True
-            drift.metadata_finalization_error = "metadata sync returned false"
-            return False
-        drift.metadata_finalized = True
+    # Snapshot + metadata orchestrator. Prompt-exists was verified above, so
+    # the previous `if prompt_exists:` guard is now unconditional — keep the
+    # snapshot/revert flow inline.
+    snapshot = _snapshot_metadata_state_for(drift)
+    meta_ok = _run_metadata_sync_safe(str(prompt_path), str(code_path) if code_path else None)
+    if not meta_ok:
+        try:
+            _revert_prompt_file(drift)
+        except PromptRevertError:
+            raise
+        if snapshot is not None:
+            _restore_metadata_state_for(snapshot)
+        # Metadata finalization is a hard requirement (Issue #1006): a
+        # successful auto-heal commit must include the updated fingerprint,
+        # so this failure must surface distinctly from advisory subprocess
+        # failures and fail the run loudly in every mode.
+        drift.metadata_finalization_failed = True
+        drift.metadata_finalization_error = "metadata sync returned false"
+        return False
+    drift.metadata_finalized = True
 
     # Optional follow-up: skip when module bypassed via env.
     if drift.basename in skip_set:
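
Review note on the `pdd/ci_drift_heal.py` hunk above: the new post-update existence gate can be read in isolation as the sketch below. This is a minimal illustration, not the code from `_heal_update`. `DriftSketch` and `post_update_prompt_gate` are hypothetical names; only the two `metadata_finalization_*` fields and the error string come from the diff. Unlike the diff, the sketch treats a `None` path as missing explicitly instead of relying on `Path(str(None))`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DriftSketch:
    """Hypothetical stand-in for DriftInfo; only the fields the gate touches."""
    basename: str
    prompt_path: Optional[str] = None
    metadata_finalization_failed: bool = False
    metadata_finalization_error: Optional[str] = None


def post_update_prompt_gate(drift: DriftSketch) -> bool:
    """Return True only when the prompt file is present on disk.

    On a missing file, record the failure on `drift` and return False so the
    caller can stop before any metadata sync or follow-up step runs.
    """
    try:
        exists = drift.prompt_path is not None and Path(drift.prompt_path).exists()
    except Exception:
        # Mirror the diff: any error while probing the path counts as missing.
        exists = False
    if not exists:
        drift.metadata_finalization_failed = True
        drift.metadata_finalization_error = (
            "prompt_path set but missing on disk post-update"
        )
        return False
    return True
```

Placing the gate before the churn and structural gates means a missing prompt always fails with the same explicit reason, whichever branch set `prompt_path`.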
diff --git a/pdd/prompts/ci_drift_heal_python.prompt b/pdd/prompts/ci_drift_heal_python.prompt
@@ -59,7 +59,7 @@ A standalone CI script (`pdd/ci_drift_heal.py`) that orchestrates drift detectio
 
 7. **Noninteractive strength override:** All heal subprocess commands pass `--force --strength 0.5`. `--force` suppresses overwrite prompts in CI; `--strength 0.5` overrides `.pddrc` context strengths that would otherwise push model selection into expensive tiers.
 
-8. **Lazy prompt_path resolution:** For the code-without-prompt flow, `prompt_path` starts as None. After `pdd update` creates the prompt, resolve `prompt_path` via `get_pdd_file_paths()` before running the follow-up example step and gates. Fail closed if prompt_path remains unresolvable post-update.
+8. **Lazy prompt_path resolution and post-update existence gate:** For the code-without-prompt flow, `prompt_path` starts as None. After `pdd update` creates the prompt, resolve `prompt_path` via `get_pdd_file_paths()` before running the follow-up example step and gates. **Fail closed in both shapes of missing prompt post-update** (issue #1106): (a) when `prompt_path` was initially None and remains unresolvable after the lazy block — set `drift.metadata_finalization_failed = True`, `drift.metadata_finalization_error = "prompt_path unresolvable post-update"`, log `[red]heal failed for <basename>: prompt_path unresolvable post-update[/red]`, and return False; and (b) when `prompt_path` was initially set on `drift` but `Path(prompt_path).exists()` is False after the `pdd update` subprocess (typo, deleted/renamed by update, language detection mismatch) — set `drift.metadata_finalization_failed = True`, set `drift.metadata_finalization_error` to an explicit reason such as `"prompt_path set but missing on disk post-update"`, log `[red]heal failed for <basename>: prompt_path <prompt_path> set but missing on disk post-update[/red]`, and return False. In both cases the heal MUST NOT call `_run_metadata_sync_safe` and MUST NOT run the follow-up `pdd example` subprocess — otherwise a heal can silently mask an inconsistent post-update state by completing the example step with no metadata sync. Place the preset-but-missing existence gate AFTER the lazy-resolution block and BEFORE the churn / structural-invariants gates so a stable check order applies to both branches.
 
 9. **Prompt churn gate:** After `pdd update`, compare prompt churn (lines added+deleted vs HEAD) against code churn (vs diff_base). If ratio exceeds `_HEAL_PROMPT_CHURN_MAX_RATIO` (default 5.0, env-overridable via `PDD_HEAL_PROMPT_CHURN_MAX_RATIO`), revert the prompt and fail. Permissive when inputs are missing or git fails — let structural invariants be the fallback guard.
 
diff --git a/pdd/prompts/update_main_python.prompt b/pdd/prompts/update_main_python.prompt
@@ -74,7 +74,7 @@ Supports three modes: true update, regeneration, and repository-wide updates. Ro
     - If the written prompt path differs from the canonical source prompt (i.e. `--output` redirected the write away from the input prompt in true-update mode), do not finalize metadata and print `[info][metadata] Skipping fingerprint finalization: output redirected[/info]` unless `quiet`. This guard applies in addition to (and independently of) the `sync_metadata=True` skip — when both are set, the orchestrator call itself is also skipped per Requirement 14 so neither layer records a fingerprint against the redirected path.
     - If `infer_module_identity(prompt_path)` returns `(None, None)`, do not finalize metadata and print `[info][metadata] Skipping fingerprint finalization: unable to infer module identity for <prompt_path>[/info]` unless `quiet`.
     - If `save_fingerprint` raises, keep the successful return tuple and print `[warning][metadata] Fingerprint save failed: <exc>[/warning]` unless `quiet`.
-    - Existing stale `_run.json` cleanup behavior must remain unchanged.
+    - Stale `_run.json` cleanup MUST reuse `pdd.operation_log._clear_run_report_before_fingerprint(basename, language)`, which clears the run report then re-checks existence and returns `False` (with a yellow console warning) when a silent `os.remove` left it on disk. When the helper returns `False`, skip `save_fingerprint` so a fresh fingerprint never coexists with stale runtime state (issue #1106; mirrors repo-mode and the `log_operation` decorator). The helper's warning surfaces unconditionally — it does NOT honour the caller's `quiet` flag, intentionally, because it describes a real metadata problem. Wrap the helper call in `try/except`: on an unexpected exception, emit `[warning][metadata] Run report clear failed: <exc>[/warning]` unless `quiet` and skip `save_fingerprint` (best-effort cleanup must never break the successful update tuple). The function-local `from .operation_log import (_clear_run_report_before_fingerprint, infer_module_identity, save_fingerprint)` MUST also be wrapped in `try/except ImportError`: on ImportError, emit `[warning][metadata] Could not import finalization helpers: <exc>[/warning]` unless `quiet` and return without calling `save_fingerprint`. This is required because `_clear_run_report_before_fingerprint` is a private underscore-prefixed symbol and therefore more fragile to internal `operation_log` renames than the public names alongside it — without the import guard, an ImportError would propagate to `update_main`'s outer `except Exception: return None` and silently convert a successful `(prompt, cost, model)` tuple to `None`, violating the "best-effort metadata cleanup must never break the successful update tuple" contract.
 
 % Modes
 1. **True update**: `input_prompt_file` + `modified_code_file` + (use_git OR `input_code_file`).
diff --git a/pdd/update_main.py b/pdd/update_main.py
@@ -1103,11 +1103,29 @@ def _finalize_single_file_fingerprint(
             )
         return
 
-    from .operation_log import (
-        clear_run_report,
-        infer_module_identity,
-        save_fingerprint,
-    )
+    # Wrap the import itself so the user's successful update tuple is never
+    # broken by an import-time failure (e.g. `_clear_run_report_before_fingerprint`
+    # gets renamed in a future operation_log refactor — it's a private
+    # underscore-prefixed name and therefore more fragile than the public
+    # `clear_run_report` / `infer_module_identity` / `save_fingerprint`
+    # alongside it). An ImportError raised here would propagate up to
+    # `update_main`'s outer `except Exception: return None`, converting a
+    # successful `(prompt, cost, model)` tuple to None — which violates the
+    # issue #1106 acceptance criterion: best-effort metadata cleanup must
+    # never fail the successful update tuple.
+    try:
+        from .operation_log import (
+            _clear_run_report_before_fingerprint,
+            infer_module_identity,
+            save_fingerprint,
+        )
+    except ImportError as exc:
+        if not quiet:
+            rprint(
+                f"[warning][metadata] Could not import finalization helpers: "
+                f"{exc}[/warning]"
+            )
+        return
     basename, language = infer_module_identity(prompt_path)
     if not (basename and language):
         if not quiet:
@@ -1117,13 +1135,33 @@ def _finalize_single_file_fingerprint(
             )
         return
 
+    # Reuse the shared helper so the single-file finalize path enforces the
+    # same invariant the `log_operation` decorator and repo-mode block already
+    # do: a fresh fingerprint must never coexist with a stale `_run.json`
+    # (issue #1106). The helper re-checks that the run report is actually
+    # gone after `clear_run_report()` and emits a console warning if a
+    # silent `os.remove` failure left it behind — see
+    # `pdd.operation_log._clear_run_report_before_fingerprint`. The warning
+    # surfaces unconditionally (the helper does not consult `quiet`): the
+    # contract the issue text quotes is "print a warning" without qualifying
+    # on quiet mode, and the user should learn that runtime verification
+    # state still describes the pre-mutation files even when --quiet
+    # suppresses other chatter (why: informational about a real metadata
+    # problem, not status fluff).
     try:
-        clear_run_report(basename, language)
+        fingerprint_allowed = _clear_run_report_before_fingerprint(basename, language)
     except Exception as exc:
+        # Defensive: surrounding pattern in this function treats metadata
+        # cleanup as best-effort; an unexpected raise must not break the
+        # successful update tuple. Warn and skip the save, matching the
+        # `save_fingerprint` except-arm below.
         if not quiet:
             rprint(
                 f"[warning][metadata] Run report clear failed: {exc}[/warning]"
             )
+        return
+    if not fingerprint_allowed:
+        return
 
     try:
         save_fingerprint(
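
Review note on the `pdd/update_main.py` hunk above: the clear-then-verify contract that `_clear_run_report_before_fingerprint` is described as enforcing can be sketched as follows. This is an illustrative approximation under stated assumptions, not the helper's actual implementation in `pdd.operation_log`. The names `clear_then_verify` and `finalize_fingerprint`, the path-based signatures, and the warning text are all hypothetical.

```python
import os
from pathlib import Path


def clear_then_verify(run_report: Path) -> bool:
    """Try to delete a stale run report, then confirm it is really gone.

    Returns False when the file survives the delete attempt, so the caller
    can skip writing a fresh fingerprint next to stale runtime state.
    """
    try:
        os.remove(run_report)
    except FileNotFoundError:
        pass  # Nothing to clear; that is the desired end state.
    except OSError as exc:
        print(f"warning: could not clear {run_report}: {exc}")
    if run_report.exists():
        # Assumed behavior: warn unconditionally, ignoring any quiet flag.
        print(f"warning: stale run report still present: {run_report}")
        return False
    return True


def finalize_fingerprint(run_report: Path, fingerprint: Path, payload: str) -> bool:
    """Best-effort finalization: never raise, only report whether it wrote."""
    try:
        allowed = clear_then_verify(run_report)
    except Exception as exc:
        print(f"warning: run report clear failed: {exc}")
        return False
    if not allowed:
        return False
    fingerprint.write_text(payload)
    return True
```

The boolean return keeps the decision with the caller: the cleanup helper reports whether the invariant holds, and the finalize step chooses to skip the write without disturbing the already-successful update result.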
diff --git a/tests/test_ci_drift_heal.py b/tests/test_ci_drift_heal.py
diff --git a/tests/test_update_main.py b/tests/test_update_main.py