fix: review inherits committed baseline depth for the PR diff

Svilen-Stefanov · claude · Svilen-Stefanov · commit 34e95dc986a4 · 2026-06-19T13:54:27.000+02:00
Review mode hardcoded the analysis depth to 2 when depth_level was unset,
ignoring the committed .codeboarding/analysis.json's own depth_level. When the
committed baseline was deeper (e.g. depth 4), validate-base rejected it as
"deeper than expected", so the action regenerated a shallower depth-2 base,
diffed the depth-2 head against that regenerated base, and wrote base_commit_found=false / base_commit_sha=null
into the PR artifact. The webview then fell back to diffing the depth-2 head
against the committed depth-4 base, reporting hundreds of phantom "deleted"
components for sub-trees that only differ by analysis depth.
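
The failure chain above can be reproduced with a toy model. This is a minimal sketch under stated assumptions, not the engine's data model. It assumes that an analysis at depth N lists every component nested at most N levels deep. `components_at_depth`, `deleted_components` and `baseline_depth_ok` are hypothetical helpers. `baseline_depth_ok` paraphrases the validate-base depth rule from the `validate_base_analysis` docstring: a baseline deeper than expected is rejected, while a shallower, equal, or missing depth passes.

```python
from typing import Dict, List, Optional, Set

# Hypothetical model (not the engine's schema): a nested dict of component
# names, where an analysis at depth N lists components nested <= N levels.
Tree = Dict[str, "Tree"]


def components_at_depth(tree: Tree, depth: int, prefix: str = "", level: int = 1) -> Set[str]:
    """Collect dotted component paths down to `depth` nesting levels."""
    found: Set[str] = set()
    if level > depth:
        return found
    for name, children in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        found.add(path)
        found |= components_at_depth(children, depth, path, level + 1)
    return found


def deleted_components(base: Set[str], head: Set[str]) -> List[str]:
    """Components in the base but absent from the head, i.e. reported as 'deleted'."""
    return sorted(base - head)


def baseline_depth_ok(baseline_depth: Optional[int], expected_depth: Optional[int]) -> bool:
    """Paraphrase of the validate-base depth rule: reject only a deeper baseline."""
    if expected_depth is None or baseline_depth is None:
        return True  # no expectation, or a legacy baseline without depth_level
    return baseline_depth <= expected_depth


codebase: Tree = {
    "api": {"routes": {"users": {"handlers": {}}}},
    "engine": {"parser": {}},
}

base_depth4 = components_at_depth(codebase, 4)  # committed baseline
head_depth2 = components_at_depth(codebase, 2)  # old review default

# Old behavior: the depth-4 baseline is rejected against expected depth 2 ...
print(baseline_depth_ok(4, 2))  # False
# ... and diffing the depth-4 base against the depth-2 head invents deletions.
print(deleted_components(base_depth4, head_depth2))
# ['api.routes.users', 'api.routes.users.handlers']
# New behavior: the head inherits depth 4, so nothing is reported as deleted.
print(deleted_components(base_depth4, components_at_depth(codebase, 4)))  # []
```

No code changed between base and head in this toy, so every reported deletion is a phantom caused purely by the depth mismatch.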

Review now inherits the committed baseline's depth_level (mirroring sync's
run_analyze) via a new stdlib-only `baseline-depth` subcommand invoked before the
engine is installed, so head and base are analyzed at the same depth, the
committed baseline is reused, and the artifact carries a real base_commit_sha the
webview diffs against. The accepted depth ceiling is raised 3 -> 4 (the engine
has no depth cap) so a committed depth-4 baseline is a first-class value.
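
The depth-resolution order described above can be approximated in Python. The action itself implements it in bash together with the `baseline-depth` subcommand. This sketch rests on stated assumptions. `depth_from_json_text`, `read_baseline_depth`, `format_output` and `resolve_depth` are hypothetical names. The real `_load_metadata` and `_metadata_depth` helpers are not shown in this diff, so the sketch accepts only JSON integers and may treat string or float values differently from them. It also models only the `resolve_depth` step; sync picks up the baseline depth later, in `run_analyze`.

```python
import json
from pathlib import Path
from typing import Optional

DEFAULT_COLD_START_DEPTH = 2
SUPPORTED_DEPTHS = range(1, 5)  # 1-4 after this change


def depth_from_json_text(text: str) -> Optional[int]:
    """Return metadata.depth_level if it is an int in 1-4, else None (stdlib only)."""
    try:
        data = json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None
    metadata = data.get("metadata") if isinstance(data, dict) else None
    if not isinstance(metadata, dict):
        return None
    depth = metadata.get("depth_level")
    # bool subclasses int; reject it so `true` is not read as depth 1.
    if isinstance(depth, bool) or not isinstance(depth, int):
        return None
    return depth if depth in SUPPORTED_DEPTHS else None


def read_baseline_depth(analysis_path: Path) -> Optional[int]:
    """Path wrapper: a missing or unreadable file means 'no usable depth'."""
    try:
        return depth_from_json_text(analysis_path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError):
        return None


def format_output(depth: Optional[int]) -> str:
    """key=value line; the action's sed filter yields '' when no depth is usable."""
    return f"depth_level={depth if depth is not None else ''}"


def resolve_depth(mode: str, explicit_input: str, baseline_depth: Optional[int]) -> int:
    """Explicit input wins; review inherits the baseline depth; else cold-start 2."""
    if explicit_input:
        return int(explicit_input)
    if mode == "review" and baseline_depth is not None:
        return baseline_depth
    return DEFAULT_COLD_START_DEPTH
```

For example, `resolve_depth("review", "", 4)` returns 4. An explicit `"2"` still wins, which is exactly the case where validate-base's deeper-than-expected rejection remains as a safety net.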

- engine_adapter.py: best-effort engine imports so metadata-only subcommands run
  without the engine installed; add baseline_depth() + `baseline-depth` command;
  widen _supported_depth and argparse choices to include 4.
- action.yml: resolve_depth inherits the committed baseline depth in review mode
  (sync unchanged); depth guard + input doc widened to 1-4.
- README: depth_level doc widened to 1-4.
- tests: baseline-depth coverage incl. engine-absent subprocess run; depth-4
  accepted across base/head/analyze/validate-base; depth-5 now rejected.
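
The best-effort engine import in the first bullet follows a common Python pattern. This is a minimal sketch that uses a deliberately nonexistent package name, `hypothetical_engine_pkg`, in place of the real engine modules, so the fallback branch always runs. The `require_engine` guard is a hypothetical addition. The diff itself leaves the names as `None`, so calling one raises `TypeError`. The sketch also catches `ImportError` rather than the diff's broader `Exception`.

```python
import json
from typing import Callable, Optional

# Best-effort import. `hypothetical_engine_pkg` is a placeholder that does not
# exist, so this always takes the fallback branch.
try:
    from hypothetical_engine_pkg.analysis import run_full  # type: ignore[import-not-found]
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    run_full = None  # engine absent: metadata-only commands must still work


def require_engine(fn: Optional[Callable[..., object]], command: str) -> Callable[..., object]:
    """Hypothetical guard: report a missing engine clearly instead of a TypeError."""
    if fn is None:
        raise RuntimeError(f"'{command}' needs the analysis engine installed")
    return fn


def metadata_only(analysis_json: str) -> str:
    """Stdlib-only work: parse committed JSON, never touch the engine."""
    data = json.loads(analysis_json)
    metadata = data.get("metadata") if isinstance(data, dict) else None
    if not isinstance(metadata, dict):
        metadata = {}
    return f"commit_hash={metadata.get('commit_hash', '')}"


def analyze(depth_level: int) -> object:
    """Engine-backed work: fails loudly when the import above did not succeed."""
    return require_engine(run_full, "analyze")(depth_level=depth_level)


print(metadata_only('{"metadata": {"commit_hash": "abc123"}}'))  # commit_hash=abc123
```

Narrowing the handler to `ImportError` is a design choice. An installed engine that fails during module initialization for some other reason then surfaces its real error instead of looking like "engine not installed".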

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diff --git a/README.md b/README.md
@@ -265,7 +265,7 @@ Review mode does not need `contents: write`: PR-specific generated files are sto
 | `github_token` | both | `${{ github.token }}` | Token for GitHub API calls; in review mode it posts or updates the PR comment. |
 | `push_token` | sync | `${{ github.token }}` | Token used for sync-mode pushes to `target_branch`. The workflow token can push when the workflow grants `permissions: contents: write`. Separate from `github_token` so commenting can use a GitHub App token while the push uses the workflow token. |
 | `codeboarding_version` | both | `0.12.3` | CodeBoarding PyPI package version used as the analysis engine. Pin for reproducibility. |
-| `depth_level` | both | empty (`2` for cold starts) | Analysis depth, 1 to 3, used for first analysis and `force_full` rebuilds. Once `.codeboarding/analysis.json` exists, its `metadata.depth_level` is the source of truth for incremental analysis and fallback-full recovery. |
+| `depth_level` | both | empty (`2` for cold starts) | Analysis depth, 1 to 4, used for first analysis and `force_full` rebuilds. Once `.codeboarding/analysis.json` exists, its `metadata.depth_level` is the source of truth: sync runs incremental at the baseline depth, and review analyzes the PR head at the committed baseline depth so the diff is apples-to-apples. |
 | `render_depth` | review | `1` | Display depth for the PR diagram. Keep `1` for a clean top-level view. |
 | `diagram_direction` | review | `LR` | Mermaid direction: `LR`, `TD`, `TB`, `RL`, or `BT`. |
 | `changed_only` | review | `false` | Render only changed components and incident edges. |
diff --git a/action.yml b/action.yml
@@ -36,7 +36,7 @@ inputs:
     required: false
     default: '0.12.3'
   depth_level:
-    description: 'Analysis depth (1-3) for cold-start or force_full rebuilds. Once .codeboarding/analysis.json exists, its metadata.depth_level is the source of truth for incremental analysis and fallback-full recovery. Empty (default): 2 for cold starts.'
+    description: 'Analysis depth (1-4) for cold-start or force_full rebuilds. Once .codeboarding/analysis.json exists, its metadata.depth_level is the source of truth: sync runs incremental at the baseline depth, and review analyzes the PR head at the committed baseline depth so the diff is apples-to-apples. Empty (default): 2 for cold starts.'
     required: false
     default: ''
   agent_model:
@@ -210,8 +210,8 @@ runs:
           *) echo "::error::mode must be 'review' or 'sync' (got '$MODE')."; exit 1 ;;
         esac
         case "$DEPTH" in
-          ''|1|2|3) ;;
-          *) echo "::error::depth_level must be 1, 2, or 3 (empty = default cold-start depth 2)."; exit 1 ;;
+          ''|1|2|3|4) ;;
+          *) echo "::error::depth_level must be 1, 2, 3, or 4 (empty = default cold-start depth 2)."; exit 1 ;;
         esac
         echo "mode=$MODE" >> "$GITHUB_OUTPUT"
 
@@ -472,8 +472,13 @@ runs:
       id: resolve_depth
       if: steps.guard.outputs.skip != 'true'
       shell: bash
+      working-directory: target-repo
       env:
         INPUT_DEPTH: ${{ inputs.depth_level }}
+        MODE: ${{ steps.guard.outputs.mode }}
+        BASE_SHA: ${{ steps.guard.outputs.base_sha }}
+        EMPTY_BASE: ${{ steps.guard.outputs.empty_base }}
+        ACTION_PATH: ${{ github.action_path }}
       run: |
         set -euo pipefail
         # Explicit input controls cold-start/force_full rebuilds. Existing
@@ -483,6 +488,31 @@ runs:
           echo "Using explicit depth_level=$INPUT_DEPTH."
           exit 0
         fi
+        # Review against a committed baseline: analyze the PR head at the SAME
+        # depth the committed .codeboarding/analysis.json was generated with, so
+        # head and base diff apples-to-apples. Defaulting to a shallower depth
+        # would make validate-base reject the deeper baseline, force the action to
+        # regenerate a shallower base, and (because the artifact then carries no
+        # committed base SHA) leave the webview diffing the deeper committed base
+        # against the shallower head — reporting phantom "deleted" components.
+        # The engine is not installed yet at this step, so baseline-depth runs on
+        # the runner's stdlib python3 (it only parses the committed JSON).
+        if [ "$MODE" = "review" ] && [ "$EMPTY_BASE" != "true" ] && [ -n "$BASE_SHA" ]; then
+          BASE_ANALYSIS="$(mktemp)"
+          if git show "${BASE_SHA}:.codeboarding/analysis.json" > "$BASE_ANALYSIS" 2>/dev/null; then
+            INHERITED="$(python3 "$ACTION_PATH/scripts/engine_adapter.py" baseline-depth --analysis "$BASE_ANALYSIS" | sed -n 's/^depth_level=//p')"
+            rm -f "$BASE_ANALYSIS"
+            if [ -n "$INHERITED" ]; then
+              echo "depth=$INHERITED" >> "$GITHUB_OUTPUT"
+              echo "Inheriting committed baseline depth_level=$INHERITED for the PR-head analysis."
+              exit 0
+            fi
+            echo "Committed baseline has no usable depth_level; using default cold-start depth."
+          else
+            rm -f "$BASE_ANALYSIS"
+            echo "No committed baseline at ${BASE_SHA}; using default cold-start depth."
+          fi
+        fi
         DEPTH=2
         echo "depth=$DEPTH" >> "$GITHUB_OUTPUT"
         echo "Using default cold-start depth_level=$DEPTH."
diff --git a/scripts/engine_adapter.py b/scripts/engine_adapter.py
@@ -1,11 +1,12 @@
 """CLI adapter between the action and the CodeBoarding analysis ENGINE.
 
 No analysis logic lives here. The engine is the published ``codeboarding`` PyPI
-package installed by the action and imported lazily inside each function
-(``codeboarding_workflows`` etc.); this module just turns the action's shell
-steps into typed, tested calls into it. The lazy imports mean this file imports
-fine without the package present — the tests stub those modules and assert we
-call the engine with the right args.
+package installed by the action (``codeboarding_workflows`` etc.); this module
+just turns the action's shell steps into typed, tested calls into it. The engine
+imports are best-effort at module load, so this file imports fine without the
+package present — the metadata-only subcommands (``baseline-info``,
+``baseline-depth``, ``validate-base``) run with the stdlib alone, and the tests
+stub the engine modules to assert we call the engine with the right args.
 
 Subcommands (all paths/refs come in as argv, never interpolated into source):
 
@@ -15,6 +16,7 @@
   health         --artifact-dir D --repo P --name N --issues-out FILE
   validate-base  --analysis F --expected-sha SHA [--expected-depth K]
   baseline-info  --analysis F
+  baseline-depth --analysis F
   analyze        --repo P --out D --name N --run-id ID --source-sha SHA --depth K [--force-full]
   render         --analysis F --out D --repo-name N --repo-ref R [--format .md]
   concat         --docs-dir D --out F
@@ -60,12 +62,23 @@
 import shutil
 from pathlib import Path
 
-from codeboarding_workflows.analysis import BaselineUnavailableError, run_full, run_incremental
-from codeboarding_workflows.rendering import render_docs
-from diagram_analysis.exceptions import IncrementalCacheMissingError
-from static_analyzer import get_static_analysis
-from static_analyzer.analysis_cache import StaticAnalysisCache
-from static_analyzer.cluster_helpers import build_all_cluster_results
+# The engine packages are imported best-effort so the metadata-only subcommands
+# (``baseline-info``, ``baseline-depth``, ``validate-base``) run with the stdlib
+# alone — they parse a committed analysis.json and never touch the engine. The
+# action invokes them BEFORE the engine package is pip-installed (e.g. while
+# resolving the review depth), so a hard import here would break that step. The
+# analysis subcommands that DO need the engine fail loudly when these are None.
+try:
+    from codeboarding_workflows.analysis import BaselineUnavailableError, run_full, run_incremental
+    from codeboarding_workflows.rendering import render_docs
+    from diagram_analysis.exceptions import IncrementalCacheMissingError
+    from static_analyzer import get_static_analysis
+    from static_analyzer.analysis_cache import StaticAnalysisCache
+    from static_analyzer.cluster_helpers import build_all_cluster_results
+except Exception:  # engine package not installed (metadata-only subcommands don't need it)
+    BaselineUnavailableError = IncrementalCacheMissingError = _MissingEngine = type("_MissingEngine", (Exception,), {})
+    run_full = run_incremental = render_docs = None
+    get_static_analysis = StaticAnalysisCache = build_all_cluster_results = None
 
 try:
     from health.models import Severity
@@ -151,7 +164,7 @@ def _metadata_depth(metadata: dict) -> int | None:
 
 def _supported_depth(metadata: dict) -> int | None:
     depth = _metadata_depth(metadata)
-    return depth if depth in range(1, 4) else None
+    return depth if depth in range(1, 5) else None
 
 
 def _analysis_depth_or_default(output_dir: Path, default_depth: int = _DEFAULT_DEPTH) -> int:
@@ -180,6 +193,21 @@ def baseline_info(analysis_path: Path) -> str:
     return commit if _SHA_RE.match(commit) else ""
 
 
+def baseline_depth(analysis_path: Path) -> int | None:
+    """Return the committed baseline's metadata.depth_level when present and a
+    supported value (1-4), else None. Review mode uses this (via the
+    ``baseline-depth`` subcommand) to analyze the PR head at the SAME depth the
+    committed baseline was generated with, so head and base diff apples-to-apples
+    instead of defaulting to a shallower depth and reporting phantom changes.
+    Parsing + the supported-range guard live here so the action shell never reads
+    the JSON inline (mirrors ``baseline_info``).
+    """
+    metadata = _load_metadata(analysis_path)
+    if not isinstance(metadata, dict):
+        return None
+    return _supported_depth(metadata)
+
+
 def validate_base_analysis(
     analysis_path: Path, expected_sha: str, expected_depth: int | None = None
 ) -> tuple[bool, str]:
@@ -198,6 +226,12 @@ def validate_base_analysis(
     expands persists depth_level 1 — rejecting that would force a full
     regeneration on every PR without ever converging. A missing or
     unparseable depth_level is accepted — legacy baselines predate the field.
+
+    Review now derives ``expected_depth`` from the committed baseline's own
+    depth_level (via the ``baseline-depth`` subcommand), so the deeper-than-expected
+    rejection no longer fires for the normal case — head and base are analyzed at
+    the same depth. The rejection remains a safety net for an explicit
+    ``depth_level`` input that is shallower than the committed baseline.
     """
     try:
         data = json.loads(analysis_path.read_text(encoding="utf-8"))
@@ -523,7 +557,7 @@ def main(argv=None) -> int:
     b = sub.add_parser("base")
     for a in ("--repo", "--out", "--name", "--run-id", "--source-sha"):
         b.add_argument(a, required=True)
-    b.add_argument("--depth", required=True, type=int, choices=range(1, 4))
+    b.add_argument("--depth", required=True, type=int, choices=range(1, 5))
 
     s = sub.add_parser("seed")
     for a in ("--repo", "--out", "--source-sha"):
@@ -532,7 +566,7 @@ def main(argv=None) -> int:
     h = sub.add_parser("head")
     for a in ("--repo", "--out", "--name", "--run-id", "--base-ref", "--target-ref", "--source-sha"):
         h.add_argument(a, required=True)
-    h.add_argument("--depth", required=True, type=int, choices=range(1, 4))
+    h.add_argument("--depth", required=True, type=int, choices=range(1, 5))
     h.add_argument("--force-full", action="store_true", help="Run a full PR-head analysis instead of incremental.")
 
     hc = sub.add_parser("health")
@@ -542,15 +576,18 @@ def main(argv=None) -> int:
     vb = sub.add_parser("validate-base")
     vb.add_argument("--analysis", required=True)
     vb.add_argument("--expected-sha", required=True)
-    vb.add_argument("--expected-depth", type=int, choices=range(1, 4))
+    vb.add_argument("--expected-depth", type=int, choices=range(1, 5))
 
     bi = sub.add_parser("baseline-info")
     bi.add_argument("--analysis", required=True)
 
+    bd = sub.add_parser("baseline-depth")
+    bd.add_argument("--analysis", required=True)
+
     an = sub.add_parser("analyze")
     for a in ("--repo", "--out", "--name", "--run-id", "--source-sha"):
         an.add_argument(a, required=True)
-    an.add_argument("--depth", required=True, type=int, choices=range(1, 4))
+    an.add_argument("--depth", required=True, type=int, choices=range(1, 5))
     an.add_argument("--force-full", action="store_true", help="Ignore any committed baseline and run a full analysis.")
 
     rn = sub.add_parser("render")
@@ -591,6 +628,9 @@ def main(argv=None) -> int:
             return 0 if ok else 1
         elif args.cmd == "baseline-info":
             print(f"commit_hash={baseline_info(Path(args.analysis))}")
+        elif args.cmd == "baseline-depth":
+            depth = baseline_depth(Path(args.analysis))
+            print(f"depth_level={depth if depth is not None else ''}")
         elif args.cmd == "analyze":
             run_analyze(args.repo, args.out, args.name, args.run_id, args.source_sha, args.depth, args.force_full)
         elif args.cmd == "render":
diff --git a/tests/test_engine_adapter.py b/tests/test_engine_adapter.py
@@ -189,7 +189,7 @@ def test_main_sets_github_action_source(self):
             self.assertEqual(os.environ["CODEBOARDING_SOURCE"], "github_action")
 
     def test_main_rejects_invalid_depth(self):
-        for depth in ("0", "4", "x"):
+        for depth in ("0", "5", "x"):
             with self.subTest(depth=depth):
                 with redirect_stderr(StringIO()):
                     with self.assertRaises(SystemExit):
@@ -211,6 +211,30 @@ def test_main_rejects_invalid_depth(self):
                             ]
                         )
 
+    def test_main_accepts_depth_four(self):
+        # The action's accepted depth ceiling is 4 so a committed depth-4 baseline
+        # is a first-class value review can inherit (the engine has no depth cap).
+        rf = _Rec()
+        self._install(run_full=rf)
+        engine_adapter.main(
+            [
+                "base",
+                "--repo",
+                "/repo",
+                "--out",
+                "/out",
+                "--name",
+                "myrepo",
+                "--run-id",
+                "rid-base",
+                "--depth",
+                "4",
+                "--source-sha",
+                "abc123",
+            ]
+        )
+        self.assertEqual(rf.calls[0]["depth_level"], 4)
+
     def test_head_uses_incremental(self):
         ri, rf = _Rec(), _Rec()
         self._install(run_full=rf, run_incremental=ri)
@@ -497,6 +521,25 @@ def test_validate_base_without_expected_depth_ignores_depth(self):
             self.assertTrue(ok)
             self.assertIn("matches", message)
 
+    def test_validate_base_accepts_depth_four_baseline(self):
+        # The core fix: review inherits the committed baseline's depth, so a
+        # depth-4 baseline validated at --expected-depth 4 is accepted (reused,
+        # not regenerated). Validated at a shallower expected depth it is still
+        # rejected (an explicit shallower depth_level input).
+        with tempfile.TemporaryDirectory() as tmp:
+            path = Path(tmp) / "analysis.json"
+            path.write_text(
+                json.dumps({"metadata": {"commit_hash": "abc123", "depth_level": 4}}),
+                encoding="utf-8",
+            )
+
+            ok_same, _ = engine_adapter.validate_base_analysis(path, "abc123", expected_depth=4)
+            ok_shallower, message = engine_adapter.validate_base_analysis(path, "abc123", expected_depth=2)
+
+            self.assertTrue(ok_same)
+            self.assertFalse(ok_shallower)
+            self.assertIn("deeper", message)
+
     def test_main_validate_base_expected_depth_exit_codes(self):
         # patch.dict: main() setdefaults CODEBOARDING_SOURCE; don't leak it.
         with patch.dict(os.environ), tempfile.TemporaryDirectory() as tmp:
@@ -518,10 +561,18 @@ def test_main_validate_base_expected_depth_exit_codes(self):
                 ),
                 1,
             )
+            # depth 4 is now an accepted value; a depth-2 baseline is not deeper
+            # than the expected depth 4, so it passes the depth check.
+            self.assertEqual(
+                engine_adapter.main(
+                    ["validate-base", "--analysis", str(path), "--expected-sha", "abc123", "--expected-depth", "4"]
+                ),
+                0,
+            )
             with redirect_stderr(StringIO()):
-                with self.assertRaises(SystemExit):  # depth outside 1-3 rejected by argparse
+                with self.assertRaises(SystemExit):  # depth outside 1-4 rejected by argparse
                     engine_adapter.main(
-                        ["validate-base", "--analysis", str(path), "--expected-sha", "abc123", "--expected-depth", "4"]
+                        ["validate-base", "--analysis", str(path), "--expected-sha", "abc123", "--expected-depth", "5"]
                     )
 
 
diff --git a/tests/test_sync_subcommands.py b/tests/test_sync_subcommands.py