flow/designs: add cross-PDK WNS estimate-accuracy view

oharboe · claude · oharboe · commit 0364faaae0b5 · 2026-06-24T10:55:37.000+02:00
Extend plot_wns.py to quantify how well the cts and globalroute
worst-slack estimates predict the final WNS. Each design's per-stage
error (stage - finish) is normalized by its clock period, parsed from
the .sdc, so PDKs with different timing units are comparable.
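
As an illustration of that normalization, here is a minimal sketch. It is
not the code in this patch, though it mirrors the two regex idioms the patch
adds. The SDC snippet and slack values are made up for the example, and
`parse_period` / `normalized_error_pct` are hypothetical helper names.

```python
import re

# The same two idioms the patch's _PERIOD_RE looks for.
PERIOD_PATTERNS = (
    re.compile(r"set\s+clk_period\s+([0-9.]+)"),
    re.compile(r"create_clock[^\n]*-period\s+([0-9.]+)"),
)


def parse_period(sdc_text):
    """Return the first clock period found in the SDC text, or None."""
    for rx in PERIOD_PATTERNS:
        m = rx.search(sdc_text)
        if m:
            return float(m.group(1))
    return None


def normalized_error_pct(stage_ws, finish_ws, period):
    """100 * (stage - finish) / period; positive means the stage was optimistic."""
    return 100.0 * (stage_ws - finish_ws) / period


# Hypothetical SDC snippet and slack values, for illustration only.
sdc = "set clk_period 5.0\ncreate_clock -name clk -period $clk_period [get_ports clk]\n"
period = parse_period(sdc)
err = normalized_error_pct(stage_ws=0.40, finish_ws=0.15, period=period)
print(f"period={period} error={err:+.1f}%")
```

Because both slack and period are in the PDK's native unit, the ratio is
unitless, which is what lets ps-based and ns-based PDKs share one axis.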

Adds flow/designs/wns_accuracy.png (per-PDK strip plot of normalized
estimate error, + optimistic / - pessimistic) and a new
flow/designs/README.md with a "## WNS estimate accuracy across PDKs"
section: a per-PDK MAE/bias table plus hand-written findings. Covers the
67 designs across 8 PDKs that expose cts/globalroute slack and a parsable
clock period; the rest are noted as omitted.
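
For reference, the MAE and bias columns can be reproduced along these lines.
This is a sketch under assumptions: the error list is invented, and
`mae_and_bias` is a hypothetical name rather than a function in plot_wns.py
(the patch's `_stats` helper computes the same two means).

```python
def mae_and_bias(errors_pct):
    """Mean absolute error and mean signed error of per-design errors (in %)."""
    n = len(errors_pct)
    mae = sum(abs(e) for e in errors_pct) / n
    bias = sum(errors_pct) / n
    return mae, bias


# Hypothetical per-design normalized errors for one PDK and one stage.
errors = [4.0, -2.0, 1.0, -3.0]
mae, bias = mae_and_bias(errors)
worst = max(errors, key=abs)  # largest-magnitude error, sign preserved
print(f"MAE={mae:.1f}% bias={bias:+.1f}% worst={worst:+.1f}%")
# prints: MAE=2.5% bias=+0.0% worst=+4.0%
```

Errors of opposite sign cancel in the bias but not in the MAE, which is why
the table reports both: a PDK whose bias equals its MAE (in magnitude) has
errors that all point the same way.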

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
diff --git a/flow/designs/README.md b/flow/designs/README.md
@@ -0,0 +1,57 @@
+# ORFS designs
+
+## Findings: how accurate are early-stage WNS estimates?
+
+Reading the committed `rules-base.json` baselines, normalized by each design's clock
+period (parsed from its `.sdc`), the picture across the 67 designs / 8 PDKs that expose
+`cts` and `globalroute` slack is:
+
+- **Global route usually tightens the estimate.** For most PDKs the mean absolute error
+  drops from `cts` to `globalroute` — dramatically for `sky130hs` (10.5% → 1.9%) and
+  `gt2n` (3.3% → 0.0%), and clearly for `gf12` (2.2% → 1.1%) and `ihp-sg13g2`
+  (0.6% → 0.0%). `ihp-sg13g2` and `gf180` are already accurate at `cts`.
+
+- **`cts` is mostly biased optimistic; `globalroute` often overshoots into pessimism.**
+  Every PDK except `gf12` (−2.2%) has a `cts` bias ≥ 0 (cts reports more slack than the
+  design finally closes with), whereas `globalroute` bias is negative for `sky130hd` (−3.5%),
+  `sky130hs` (−1.9%), `gf12` (−1.1%) and `nangate45` (−0.5%). Global route tends to *over-correct*.
+
+- **`sky130hd` is the clearest case where routing makes the estimate worse**, not better:
+  `globalroute` MAE (3.5%) exceeds `cts` MAE (2.9%), and it is consistently pessimistic.
+
+- **Outliers are design-specific, not PDK-wide.** `sky130hs/gcd` has `cts` +45.9%
+  (wildly optimistic, fully corrected by `globalroute`), and `asap7/swerv_wrapper` is
+  +14.9% optimistic at *both* stages — the cases most likely to mislead an early-stage
+  go/no-go decision.
+
+Practical reading: `cts` slack is a usable optimistic rank-ordering; `globalroute` is the
+first estimate within a few % of final for most PDKs, but on `sky130hd` (and for specific
+designs elsewhere) even `globalroute` can be off by 3–10% of the clock period. This is the
+design-level companion to the per-net GRT-vs-RCX divergence in
+[`flow/docs/rcx`](../docs/rcx/README.md) (PR #4302). Per-PDK design breakdowns:
+[asap7](asap7/README.md), [nangate45](nangate45/README.md), [sky130hd](sky130hd/README.md),
+[sky130hs](sky130hs/README.md), [gf12](gf12/README.md), [gf180](gf180/README.md),
+[gt2n](gt2n/README.md), [ihp-sg13g2](ihp-sg13g2/README.md).
+
+<!-- BEGIN WNS-ACCURACY (generated by flow/util/plot_wns.py) -->
+## WNS estimate accuracy across PDKs
+
+How closely the earlier-stage worst-slack estimates (`cts`, `globalroute`) match the final (`finish`) WNS, per design, normalized by that design's clock period so PDKs with different timing units are comparable. Error is `(stage − finish) / clock_period`; **positive = optimistic** (the stage reported more slack than the design actually closes with), negative = pessimistic. Clock period is parsed from each design's `.sdc`; designs whose period could not be parsed are omitted.
+
+![WNS estimate accuracy by stage, across PDKs](wns_accuracy.png)
+
+Mean absolute error (MAE) and mean signed error (bias), in % of clock period:
+
+| PDK | designs | cts MAE | cts bias | grt MAE | grt bias | worst (design) |
+| --- | ---: | ---: | ---: | ---: | ---: | --- |
+| asap7 | 16 | 2.8% | +1.5% | 2.9% | +1.1% | +14.9% (swerv_wrapper globalroute) |
+| gf12 | 9 | 2.2% | -2.2% | 1.1% | -1.1% | -14.2% (jpeg cts) |
+| gf180 | 5 | 1.0% | +1.0% | 0.4% | +0.3% | +4.3% (aes cts) |
+| gt2n | 3 | 3.3% | +3.3% | 0.0% | +0.0% | +10.0% (aes cts) |
+| ihp-sg13g2 | 7 | 0.6% | +0.6% | 0.0% | +0.0% | +4.5% (spi cts) |
+| nangate45 | 15 | 0.8% | +0.6% | 1.0% | -0.5% | -3.2% (black_parrot globalroute) |
+| sky130hd | 7 | 2.9% | +2.0% | 3.5% | -3.5% | -10.0% (gcd globalroute) |
+| sky130hs | 5 | 10.5% | +10.5% | 1.9% | -1.9% | +45.9% (gcd cts) |
+
+_Generated by `flow/util/plot_wns.py`; regenerate with `python3 flow/util/plot_wns.py`._
+<!-- END WNS-ACCURACY -->
diff --git a/flow/designs/wns_accuracy.png b/flow/designs/wns_accuracy.png
diff --git a/flow/util/plot_wns.py b/flow/util/plot_wns.py
@@ -14,6 +14,13 @@
     flow/designs/<pdk>/wns.png      -- horizontal bar chart of finish-stage WNS
     flow/designs/<pdk>/README.md    -- a "## WNS" section (between generated markers)
 
+It also produces a cross-PDK view of how well the cts/globalroute estimates predict the
+final WNS (each design's per-stage estimate error, normalized by its clock period so the
+PDKs are comparable):
+
+    flow/designs/wns_accuracy.png   -- per-PDK strip plot of normalized estimate error
+    flow/designs/README.md          -- a "## WNS estimate accuracy across PDKs" section
+
 No OpenROAD/ORFS flow run is required -- the data is already in the tree, so the plots
 are deterministic and reproducible. Run from anywhere in the repo:
 
@@ -25,8 +32,10 @@
 """
 
 import argparse
+import glob
 import json
 import os
+import re
 import sys
 
 import matplotlib
@@ -45,6 +54,37 @@
 BEGIN = "<!-- BEGIN WNS (generated by flow/util/plot_wns.py) -->"
 END = "<!-- END WNS -->"
 
+ACC_BEGIN = "<!-- BEGIN WNS-ACCURACY (generated by flow/util/plot_wns.py) -->"
+ACC_END = "<!-- END WNS-ACCURACY -->"
+
+# Estimate stages whose accuracy (vs finish) we report, in flow order.
+EST_STAGES = ["cts", "globalroute"]
+EST_MARKERS = {"cts": "v", "globalroute": "^"}
+
+_PERIOD_RE = (
+    re.compile(r"set\s+clk_period\s+([0-9.]+)"),
+    re.compile(r"create_clock[^\n]*-period\s+([0-9.]+)"),
+)
+
+
+def clock_period(design_dir):
+    """Clock period for a design, parsed from its .sdc, or None if not found.
+
+    Handles the two idioms used across PDKs: `set clk_period <N>` and
+    `create_clock ... -period <N>`. Returns the first match (designs here are
+    single-clock); units are the PDK's native timing unit, same as the WNS values.
+    """
+    for sdc in sorted(glob.glob(os.path.join(design_dir, "*.sdc"))):
+        try:
+            text = open(sdc, errors="ignore").read()
+        except OSError:
+            continue
+        for rx in _PERIOD_RE:
+            m = rx.search(text)
+            if m:
+                return float(m.group(1))
+    return None
+
 
 def designs_dir():
     """flow/designs, located relative to this script (flow/util/plot_wns.py)."""
@@ -153,23 +193,140 @@ def wns_section(pdk, rows):
     return "\n".join(lines) + "\n"
 
 
-def write_readme(readme_path, pdk, section):
-    """Create README or replace the WNS section between markers, preserving prose."""
-    if os.path.isfile(readme_path):
-        with open(readme_path) as f:
+def splice_readme(path, section, begin, end, title):
+    """Create README, or replace the marked section in place (preserving prose)."""
+    if os.path.isfile(path):
+        with open(path) as f:
             text = f.read()
-        if BEGIN in text and END in text:
-            pre = text[: text.index(BEGIN)]
-            post = text[text.index(END) + len(END):]
+        if begin in text and end in text:
+            pre = text[: text.index(begin)]
+            post = text[text.index(end) + len(end):]
             new = pre + section + post.lstrip("\n")
         else:
             new = text.rstrip("\n") + "\n\n" + section
     else:
-        new = f"# {pdk} designs\n\n" + section
-    with open(readme_path, "w") as f:
+        new = (f"# {title}\n\n" if title else "") + section
+    with open(path, "w") as f:
         f.write(new)
 
 
+# --- cross-PDK estimate accuracy --------------------------------------------
+
+def collect_accuracy(pdk_dir):
+    """Per-design normalized estimate error vs finish, in % of the clock period.
+
+    Returns [(design, {stage: err_pct}), ...] for designs that have a clock period and
+    finish + estimate-stage WNS. err_pct = 100 * (stage_ws - finish_ws) / period;
+    positive means the stage was *optimistic* (reported more slack than the final result).
+    Normalizing by clock period makes the error comparable across PDKs with different units.
+    """
+    out = []
+    for design in sorted(os.listdir(pdk_dir)):
+        ddir = os.path.join(pdk_dir, design)
+        rules = os.path.join(ddir, "rules-base.json")
+        if not os.path.isfile(rules):
+            continue
+        period = clock_period(ddir)
+        fin = load_value(rules, FINISH_KEY)
+        if not period or fin is None:
+            continue
+        errs = {}
+        for stage in EST_STAGES:
+            v = load_value(rules, f"{stage}__timing__setup__ws")
+            if v is not None:
+                errs[stage] = 100.0 * (v - fin) / period
+        if errs:
+            out.append((design, errs))
+    return out
+
+
+def _stats(rows, stage):
+    vals = [e[stage] for _, e in rows if stage in e]
+    if not vals:
+        return None
+    mae = sum(abs(v) for v in vals) / len(vals)
+    bias = sum(vals) / len(vals)
+    return len(vals), mae, bias, max(vals, key=abs)
+
+
+def plot_accuracy(acc, out_png):
+    """Strip plot: per-PDK distribution of cts/globalroute estimate error vs finish."""
+    pdks = sorted(acc)
+    colors = {"cts": "#2980b9", "globalroute": "#e67e22"}
+    off = {"cts": -0.18, "globalroute": 0.18}
+
+    fig, ax = plt.subplots(figsize=(max(8, 1.3 * len(pdks) + 2), 5.5))
+    for i, pdk in enumerate(pdks):
+        rows = acc[pdk]
+        for stage in EST_STAGES:
+            pts = [e[stage] for _, e in rows if stage in e]
+            k = len(pts)
+            xs = [
+                i + off[stage] + (0 if k < 2 else (j / (k - 1) - 0.5) * 0.26)
+                for j in range(k)
+            ]
+            ax.scatter(xs, pts, s=28, color=colors[stage], alpha=0.75,
+                       edgecolors="black", linewidths=0.3, zorder=3,
+                       label=stage if i == 0 else None)
+            if pts:  # mean tick
+                m = sum(pts) / k
+                ax.plot([i + off[stage] - 0.12, i + off[stage] + 0.12], [m, m],
+                        color=colors[stage], linewidth=2.5, zorder=4)
+
+    ax.axhline(0, color="black", linewidth=0.9, zorder=1)
+    ax.set_xticks(range(len(pdks)))
+    ax.set_xticklabels([f"{p}\n(n={len(acc[p])})" for p in pdks])
+    ax.set_ylabel("estimate − final WNS  (% of clock period)\n+ optimistic / − pessimistic")
+    ax.set_title("WNS estimate accuracy by stage, across PDKs")
+    ax.grid(axis="y", linestyle=":", alpha=0.5, zorder=0)
+    ax.legend(title="estimate stage", loc="upper left", fontsize=9)
+    fig.savefig(out_png, dpi=150, bbox_inches="tight")
+    plt.close(fig)
+
+
+def accuracy_section(acc):
+    lines = [
+        ACC_BEGIN,
+        "## WNS estimate accuracy across PDKs",
+        "",
+        "How closely the earlier-stage worst-slack estimates (`cts`, `globalroute`) match "
+        "the final (`finish`) WNS, per design, normalized by that design's clock period so "
+        "PDKs with different timing units are comparable. Error is "
+        "`(stage − finish) / clock_period`; **positive = optimistic** (the stage reported "
+        "more slack than the design actually closes with), negative = pessimistic. Clock "
+        "period is parsed from each design's `.sdc`; designs whose period could not be "
+        "parsed are omitted.",
+        "",
+        "![WNS estimate accuracy by stage, across PDKs](wns_accuracy.png)",
+        "",
+        "Mean absolute error (MAE) and mean signed error (bias), in % of clock period:",
+        "",
+        "| PDK | designs | cts MAE | cts bias | grt MAE | grt bias | worst (design) |",
+        "| --- | ---: | ---: | ---: | ---: | ---: | --- |",
+    ]
+    for pdk in sorted(acc):
+        rows = acc[pdk]
+        cs, gs = _stats(rows, "cts"), _stats(rows, "globalroute")
+        # worst single |error| over both stages, for context
+        worst = max(
+            ((abs(e[s]), e[s], d, s) for d, e in rows for s in e),
+            default=(0, 0, "-", ""),
+        )
+        c = f"{cs[1]:.1f}% | {cs[2]:+.1f}%" if cs else " | "
+        g = f"{gs[1]:.1f}% | {gs[2]:+.1f}%" if gs else " | "
+        lines.append(
+            f"| {pdk} | {len(rows)} | {c} | {g} | "
+            f"{worst[1]:+.1f}% ({worst[2]} {worst[3]}) |"
+        )
+    lines += [
+        "",
+        "_Generated by `flow/util/plot_wns.py`; regenerate with "
+        "`python3 flow/util/plot_wns.py`._",
+        ACC_END,
+    ]
+    return "\n".join(lines) + "\n"
+
+
 def main():
     ap = argparse.ArgumentParser(description=__doc__)
     ap.add_argument("--pdk", help="only process this PDK (default: all)")
@@ -195,13 +352,29 @@ def main():
         if not rows:
             continue
         plot_pdk(pdk, rows, os.path.join(pdk_dir, "wns.png"))
-        write_readme(os.path.join(pdk_dir, "README.md"), pdk, wns_section(pdk, rows))
+        splice_readme(os.path.join(pdk_dir, "README.md"), wns_section(pdk, rows),
+                      BEGIN, END, f"{pdk} designs")
         miss = sum(1 for _, v in rows if v["finish"] < 0)
         print(f"{pdk}: {len(rows)} designs ({miss} with negative finish WNS)")
         processed += 1
 
     if not processed:
         sys.exit("no PDKs with rules-base.json WNS data found")
+
+    # Cross-PDK estimate-accuracy view (only meaningful over all PDKs).
+    if not args.pdk:
+        acc = {}
+        for pdk in pdks:
+            rows = collect_accuracy(os.path.join(base, pdk))
+            if rows:
+                acc[pdk] = rows
+        if acc:
+            plot_accuracy(acc, os.path.join(base, "wns_accuracy.png"))
+            splice_readme(os.path.join(base, "README.md"), accuracy_section(acc),
+                          ACC_BEGIN, ACC_END, "ORFS designs")
+            total = sum(len(v) for v in acc.values())
+            print(f"accuracy: {total} designs across {len(acc)} PDKs (clock period found)")
+
     print(f"done: {processed} PDK(s)")