SPDD: close spec drift gaps across Effective Tokens, Forecast, Frontmatter Hash, Fuzzy Schedule, and MCP Scripts (#34719)

Copilot · gh-aw-bot · web-flow · commit cd692d18909a · 2026-05-25T09:57:50.000-07:00
* Initial plan

* docs: complete SPDD spec alignment updates

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
diff --git a/docs/src/content/docs/reference/effective-tokens-specification.md b/docs/src/content/docs/reference/effective-tokens-specification.md
@@ -361,8 +361,39 @@ When sub-agents are not fully observable, implementations MUST still report aggr
 
 ### 8.5 Safeguards
 
-Implementations must prevent unbounded ET accumulation from producing non-finite or
-non-interoperable outputs.
+Implementations MUST apply the following safeguards to prevent unbounded ET accumulation from
+producing non-finite or non-interoperable outputs.
+
+#### S-1: Overflow and Capping
+
+**Threat**: Unbounded multi-invocation ET aggregation can exceed numeric interoperability limits and
+produce values that cannot be represented safely across systems.
+
+**Mitigation**: Implementations MUST enforce the JavaScript-safe numeric ceiling and record
+deterministic overflow state when capping occurs, including the ceiling value in the emitted
+flag/error payload.
+
+Normative requirements: **R-SAFE-002**, **R-SAFE-003**, **R-SAFE-003A**, **R-SAFE-004**
+
+#### S-2: Non-Finite Numeric Rejection
+
+**Threat**: `NaN`, `+Inf`, and `-Inf` values in multipliers or token class weights can silently
+corrupt ET outputs and break downstream serialization/aggregation.
+
+**Mitigation**: Implementations MUST reject non-finite or invalid numeric registry values before ET
+computation begins.
+
+Normative requirements: **R-SAFE-007**, **R-SAFE-008**
+
+#### S-3: Registry Validation Failure Handling
+
+**Threat**: Continuing ET computation after registry validation failure can produce inconsistent,
+partially parsed, or non-reproducible outputs.
+
+**Mitigation**: Implementations MUST fail deterministically with field-level diagnostics and MUST NOT
+continue with partially parsed registry data.
+
+Normative requirements: **R-SAFE-009**, **R-SAFE-010**
 
 **R-SAFE-001**: ET aggregation logic **MUST** detect overflow and non-finite arithmetic states
 (`NaN`, `+Inf`, `-Inf`) before serializing output.
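Review note: the S-1 and S-2 safeguards above can be illustrated with a minimal Python sketch. It assumes the "JavaScript-safe numeric ceiling" means `Number.MAX_SAFE_INTEGER` (2^53 - 1). The function names and the shape of the overflow flag are hypothetical and are not taken from `pkg/cli/effective_tokens.go`.

```python
import math

# Assumption: the ceiling is JavaScript's Number.MAX_SAFE_INTEGER (2**53 - 1).
JS_SAFE_CEILING = 2**53 - 1


def validate_registry_value(name, value):
    """S-2 sketch: reject non-numeric or non-finite registry values up front."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"{name}: expected a number, got {type(value).__name__}")
    if not math.isfinite(value):
        raise ValueError(f"{name}: non-finite value {value!r} is not allowed")
    return float(value)


def add_capped(total, increment):
    """S-1 sketch: add to a running ET total, capping at the ceiling.

    Returns (new_total, flag). The flag is None when no capping occurred;
    otherwise it is a hypothetical payload that records the ceiling value.
    """
    result = total + increment
    if result > JS_SAFE_CEILING:
        # Covers +Inf as well, because +Inf compares greater than the ceiling.
        return JS_SAFE_CEILING, {"overflow": True, "ceiling": JS_SAFE_CEILING}
    if math.isnan(result):
        # NaN cannot be capped meaningfully, so fail instead of emitting it.
        raise ValueError("ET aggregation produced NaN")
    return result, None
```

In this sketch validation happens before any arithmetic, so a bad multiplier stops the computation instead of propagating into the total. Capping is recorded explicitly so downstream consumers can tell a capped total from a genuine one.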
@@ -638,6 +669,42 @@ ET_total = Σ [ m_i × (max(I_i - C_i, 0) + 0.1 C_i + 4 O_i + 4 R_i) ]
 
 ET values are derived from token usage metadata. Implementations SHOULD treat per-invocation token data as potentially sensitive since usage patterns may reveal information about system prompts, model configurations, or user behavior. Aggregate ET values suitable for observability dashboards SHOULD be separated from detailed per-invocation data in access-controlled reporting systems.
 
+### Appendix D: ET Test Vectors
+
+#### ET-TV-001 (Single Invocation Baseline)
+
+Input:
+
+- model multiplier `m = 1.0`
+- `input_tokens = 200`
+- `cached_input_tokens = 50`
+- `output_tokens = 10`
+- `reasoning_tokens = 0`
+
+Expected ET:
+
+```
+base_weighted_tokens = max(200-50,0) + 0.1×50 + 4×10 + 4×0 = 150 + 5 + 40 + 0 = 195
+effective_tokens = 1.0 × 195 = 195
+```
+
+#### ET-TV-002 (Three-Node Graph, Mixed Cached/Output Tokens)
+
+Input invocation set:
+
+1. Root: `m=2.0`, `I=500`, `C=200`, `O=120`, `R=0`
+2. Sub-agent A: `m=1.0`, `I=300`, `C=0`, `O=90`, `R=10`
+3. Sub-agent B: `m=2.0`, `I=150`, `C=50`, `O=80`, `R=0`
+
+Expected ET:
+
+```
+Root:        base = max(500-200,0) + 0.1×200 + 4×120 + 4×0  = 300 + 20 + 480 + 0 = 800;  ET = 2.0×800 = 1600
+Sub-agent A: base = max(300-0,0)   + 0.1×0   + 4×90  + 4×10 = 300 + 0  + 360 + 40 = 700;  ET = 1.0×700 = 700
+Sub-agent B: base = max(150-50,0)  + 0.1×50  + 4×80  + 4×0  = 100 + 5  + 320 + 0  = 425;  ET = 2.0×425 = 850
+ET_total = 1600 + 700 + 850 = 3150
+```
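Review note: both test vectors can be checked mechanically. The sketch below is a minimal Python rendering of the ET formula shown in the hunk header (`m × (max(I - C, 0) + 0.1 C + 4 O + 4 R)`). The `Invocation` type and function names are hypothetical and exist only for this illustration; the weights 0.1 and 4 are taken from the formula as written here, not from a configurable registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """One model invocation, using the symbols from ET-TV-002."""
    multiplier: float          # m
    input_tokens: int          # I (includes cached input)
    cached_input_tokens: int   # C
    output_tokens: int         # O
    reasoning_tokens: int = 0  # R


def base_weighted_tokens(inv):
    """Pre-multiplier term: max(I - C, 0) + 0.1*C + 4*O + 4*R."""
    uncached = max(inv.input_tokens - inv.cached_input_tokens, 0)
    return (uncached
            + 0.1 * inv.cached_input_tokens
            + 4 * inv.output_tokens
            + 4 * inv.reasoning_tokens)


def effective_tokens(inv):
    """ET for a single invocation: multiplier times the base weighted tokens."""
    return inv.multiplier * base_weighted_tokens(inv)


def et_total(invocations):
    """ET_total: sum of per-invocation ET across the invocation graph."""
    return sum(effective_tokens(inv) for inv in invocations)


# ET-TV-001 and ET-TV-002 inputs, copied from the vectors above.
ET_TV_001 = [Invocation(1.0, 200, 50, 10, 0)]
ET_TV_002 = [
    Invocation(2.0, 500, 200, 120, 0),  # Root
    Invocation(1.0, 300, 0, 90, 10),    # Sub-agent A
    Invocation(2.0, 150, 50, 80, 0),    # Sub-agent B
]
```

Because the cached-token weight is a floating-point value, a conformance test would compare against the expected totals (195 and 3150) with a small tolerance rather than with exact equality.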
+
 ---
 
 ## Model Multiplier Registry
@@ -715,6 +782,17 @@ The table below maps the normative sections of this specification to the impleme
 | §7.1 OTel Attribute Requirements | OpenTelemetry span attribute emission for ET metrics | `pkg/cli/token_usage.go`, `pkg/cli/logs_run_processor.go` |
 | §8 Implementation Requirements | Completeness, determinism, versioning, partial visibility safeguards | `pkg/cli/effective_tokens.go`, `pkg/cli/forecast_montecarlo.go` |
 
+### §7.1 OTel Attribute Row-to-Code Mapping
+
+| §7.1 Attribute Key | Implementation Mapping |
+|---|---|
+| `llm.token.effective_total` | `pkg/cli/token_usage.go` → `TokenUsageSummary.TotalEffectiveTokens`, populated by `populateEffectiveTokensWithCustomWeights` |
+| `llm.token.input` | `pkg/cli/token_usage.go` → `TokenUsageEntry.InputTokens` and `ModelTokenUsage.InputTokens`, aggregated in `parseTokenUsageFile` |
+| `llm.token.output` | `pkg/cli/token_usage.go` → `TokenUsageEntry.OutputTokens` and `ModelTokenUsage.OutputTokens`, aggregated in `parseTokenUsageFile` |
+| `llm.token.cached_input` | `pkg/cli/token_usage.go` → `TokenUsageEntry.CacheReadTokens` and `ModelTokenUsage.CacheReadTokens`, aggregated in `parseTokenUsageFile` |
+| `llm.token.base_weighted` | `pkg/cli/effective_tokens.go` → base token weighting in `computeModelEffectiveTokensWithWeights` (pre-multiplier term) |
+| `llm.model.multiplier` | `pkg/cli/effective_tokens.go` → multiplier resolution in `computeModelEffectiveTokensWithWeights` (`mult` selection by model key/prefix) |
+
 ### §4–§8 Sync Procedure
 
 To keep the specification and implementation synchronized:
diff --git a/docs/src/content/docs/reference/forecast-specification.md b/docs/src/content/docs/reference/forecast-specification.md
@@ -171,6 +171,16 @@ yield = observed_runs_per_period × success_rate
 
 Where `success_rate = successful_run_count / total_sampled_run_count`.
 
+Example:
+
+If `successful_run_count = 18`, `total_sampled_run_count = 24`, and
+`observed_runs_per_period = 20`, then:
+
+```
+success_rate = 18 / 24 = 0.75
+yield = observed_runs_per_period × success_rate = 20 × 0.75 = 15
+```
+
 ### 3.10 Bootstrap Resampling
 
 An empirical resampling technique where individual observations are drawn with replacement from the observed sample. Used in Section 7 to model per-run token usage without parametric distribution assumptions.
@@ -189,6 +199,8 @@ A `.lock.yml` file located in `.github/workflows/` that declares a compiled agen
 gh aw forecast [workflow_id...] [flags]
 ```
 
+Security and operational safeguards for this command interface are defined in §10.7.
+
 ### 4.2 Positional Arguments
 
 | Argument | Type | Required | Description |
@@ -822,6 +834,36 @@ workflows):
   defined in Section 10.6; callers MUST treat its absence as equivalent to `false` (per
   §11.5 / **R-IMPL-041**, unknown fields in JSON output MUST be treated as ignorable).
 
+### 10.7 Safeguards
+
+#### 10.7.1 Threat Model
+
+- **Credential scope abuse**: Over-scoped credentials could allow unauthorized repository access.
+- **Artifact privacy leakage**: `aw_info.json` artifacts may contain operationally sensitive ET
+  metadata and prompt-adjacent context.
+- **Rate-limit abuse**: Aggressive polling or unrestricted retries can amplify API pressure and
+  trigger organizational throttling.
+
+#### 10.7.2 Required Mitigations
+
+- **Credential scope**: The forecast command accesses the GitHub Actions API using `gh` CLI
+  credentials. Token permissions MUST include only the minimum required scope (`actions:read` for
+  target repositories).
+- **Artifact privacy**: Implementations MUST NOT log raw artifact payloads at default verbosity and
+  SHOULD redact prompt-adjacent fields in diagnostic output.
+- **Rate-limit abuse controls**: Implementations MUST implement bounded retry/backoff behavior and
+  MUST stop retrying when the retry budget is exhausted.
+- **Remote repository access**: When `--repo` targets a repository the caller does not own, the
+  caller MUST have explicit read access. Implementations MUST NOT bypass repository access controls.
+- **JSON output handling**: The JSON schema can expose model and usage topology; operators SHOULD
+  treat it as internal data and apply least-privilege access controls.
+
+#### 10.7.3 Residual Risk
+
+Even with these safeguards, operators with valid read access can still infer workload intensity
+from forecast outputs. This residual risk is accepted and MUST be managed through repository
+visibility and access-governance controls.
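Review note: the "bounded retry/backoff" mitigation in §10.7.2 could look roughly like the Python sketch below. The attempt count, delays, and exception type are illustrative assumptions, not values from `pkg/cli/forecast.go`. For brevity it retries on any exception, whereas a real client would presumably retry only on rate-limit responses.

```python
import time


class RetryBudgetExhausted(Exception):
    """Raised once the retry budget is spent (hypothetical error type)."""


def call_with_bounded_retry(operation, max_attempts=4, base_delay=1.0,
                            max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying with capped exponential backoff.

    The loop is bounded by max_attempts and never retries after the final
    attempt, which is the "stop when the budget is exhausted" requirement.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as err:  # Simplification: treat every error as retryable.
            last_error = err
            if attempt == max_attempts - 1:
                break  # Budget spent: do not sleep or retry again.
            # Delay doubles per attempt (1s, 2s, 4s, ...) up to max_delay.
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise RetryBudgetExhausted(
        f"gave up after {max_attempts} attempts") from last_error
```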
+
 ---
 
 ## 11. Implementation Requirements
@@ -950,6 +992,9 @@ This section maps normative forecast requirements to implementation files.
 
 | Normative Area | Implementation File(s) |
 |---|---|
+| §4.5 Exit codes | `pkg/cli/forecast_command.go` |
+| §6 Data Sampling | `pkg/cli/forecast.go` |
+| §7 Monte Carlo Projection Engine | `pkg/cli/forecast_montecarlo.go` |
 | Monte Carlo engine (Poisson/Bootstrap/Bernoulli) | `pkg/cli/forecast_montecarlo.go` |
 | Forecast command orchestration and output fields | `pkg/cli/forecast.go`, `pkg/cli/forecast_command.go` |
 | Workflow discovery, rate-limit backoff, and run sampling | `pkg/cli/forecast.go` |
@@ -1044,35 +1089,9 @@ Conforming implementations SHOULD:
 2. Treat per-workflow 404/410 responses as recoverable partial failures.
 3. Continue processing unaffected workflows and emit a warning for each raced workflow.
 
-### Appendix F: Safeguards
-
-#### F.1 Threat Model
-
-- **Credential scope abuse**: Over-scoped credentials could allow unauthorized repository access.
-- **Artifact privacy leakage**: `aw_info.json` artifacts may contain operationally sensitive ET
-  metadata and prompt-adjacent context.
-- **Rate-limit abuse**: Aggressive polling or unrestricted retries can amplify API pressure and
-  trigger organizational throttling.
-
-#### F.2 Required Mitigations
-
-- **Credential scope**: The forecast command accesses the GitHub Actions API using `gh` CLI
-  credentials. Token permissions MUST include only the minimum required scope (`actions:read` for
-  target repositories).
-- **Artifact privacy**: Implementations MUST NOT log raw artifact payloads at default verbosity and
-  SHOULD redact prompt-adjacent fields in diagnostic output.
-- **Rate-limit abuse controls**: Implementations MUST implement bounded retry/backoff behavior and
-  MUST stop retrying when the retry budget is exhausted.
-- **Remote repository access**: When `--repo` targets a repository the caller does not own, the
-  caller MUST have explicit read access. Implementations MUST NOT bypass repository access controls.
-- **JSON output handling**: The JSON schema can expose model and usage topology; operators SHOULD
-  treat it as internal data and apply least-privilege access controls.
-
-#### F.3 Residual Risk
+### Appendix F: Safeguards (Moved)
 
-Even with these safeguards, operators with valid read access can still infer workload intensity
-from forecast outputs. This residual risk is accepted and MUST be managed through repository
-visibility and access-governance controls.
+Safeguard requirements for this specification are now defined in §10.7.
 
 ---
 
diff --git a/docs/src/content/docs/reference/frontmatter-hash-specification.md b/docs/src/content/docs/reference/frontmatter-hash-specification.md
@@ -240,6 +240,18 @@ hashes for any corpus member.
 
 ## Implementation Notes
 
+### Caller Operations
+
+`pkg/cli/hash_command.go` MUST invoke the hash API with the following operational sequence:
+
+1. The caller MUST resolve the target workflow markdown file path and fail with a descriptive error
+   when the file cannot be read.
+2. The caller MUST pass workflow content and repository path context to the frontmatter hash
+   implementation so imports can be traversed deterministically.
+3. The caller MUST return the computed 64-character lowercase SHA-256 hash string on success.
+4. The caller MUST surface deterministic hash-computation failures (parse/import/read errors) as
+   command errors without fallback hashing.
+
 ### Go Implementation
 
 The current Go implementation (`pkg/parser/frontmatter_hash.go`) uses a **text-based approach** that diverges from the field-selection model described in Section 2 ("Field Selection") of this specification:
@@ -318,7 +330,10 @@ The Go and JavaScript implementations must produce byte-for-byte identical canon
 
 Very large frontmatter payloads can cause excessive memory use and hash-computation latency during compilation and runtime verification. This can degrade CI reliability and increase stale-lock false positives due to timeout or resource pressure.
 
-**Mitigation**: Implementations SHOULD enforce a maximum cumulative frontmatter input size and MUST fail deterministically with a descriptive error when the limit is exceeded. A limit of 1 MiB for the combined normalized frontmatter input is RECOMMENDED unless repository-specific requirements justify a higher bound.
+**Mitigation**: Implementations MUST enforce a maximum cumulative normalized frontmatter input size
+of **1,048,576 bytes (1 MiB)** and MUST fail deterministically with a descriptive error when that
+limit is exceeded. This limit bounds worst-case memory/latency behavior while remaining well above
+typical workflow frontmatter sizes.
 
 ---
 
@@ -476,3 +491,15 @@ imports:
 
 # Import-based Workflow
 ```
+
+### FH-TV-NEG-001 (Oversized Input Rejection)
+
+Input description:
+
+- A workflow (including imported frontmatter content) whose cumulative normalized frontmatter input
+  exceeds **1,048,576 bytes**.
+
+Expected result:
+
+- Hash computation is rejected before digest generation.
+- Error text includes: `frontmatter input exceeds 1048576-byte limit`.
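Review note: FH-TV-NEG-001 and the 1 MiB mitigation can be exercised with a small guard like the Python sketch below. It assumes the normalized frontmatter (main file plus imports) is available as a sequence of byte strings. The function and exception names are hypothetical; only the limit and the error text come from the specification text in this diff.

```python
MAX_FRONTMATTER_BYTES = 1_048_576  # 1 MiB, per the mitigation text


class FrontmatterTooLargeError(ValueError):
    """Raised when cumulative normalized frontmatter exceeds the limit."""


def check_frontmatter_size(normalized_parts):
    """Return the cumulative size in bytes, or raise before any hashing.

    normalized_parts is an iterable of bytes objects, one per frontmatter
    source (an assumption about how the caller collects imports).
    """
    total = 0
    for part in normalized_parts:
        total += len(part)
        if total > MAX_FRONTMATTER_BYTES:
            # Fail as soon as the limit is crossed, before digest generation.
            raise FrontmatterTooLargeError(
                f"frontmatter input exceeds {MAX_FRONTMATTER_BYTES}-byte limit")
    return total
```

An input of exactly 1,048,576 bytes passes in this sketch, since the specification rejects inputs that exceed the limit. The check runs incrementally, so an oversized import is rejected without summing the remaining parts.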
diff --git a/docs/src/content/docs/reference/fuzzy-schedule-specification.md b/docs/src/content/docs/reference/fuzzy-schedule-specification.md
@@ -1245,6 +1245,8 @@ This section maps the fuzzy schedule specification to implementation files.
 
 | Normative Area | Implementation File(s) |
 |---|---|
+| §3.1 Grammar (`schedule: daily around HH[:MM][am/pm][ timezone]`) | `pkg/parser/schedule_parser.go` (`parseFuzzyScheduleExpression`, tokenizer/grammar helpers) |
+| §6 Scattering algorithm | `pkg/parser/schedule_fuzzy_scatter.go` (`scatterDailyTime`, weighted slot selection and deterministic hashing) |
 | Frontmatter schedule parsing and grammar handling | `pkg/parser/schedule_parser.go` |
 | Deterministic fuzzy scattering and peak-minute avoidance | `pkg/parser/schedule_fuzzy_scatter.go` |
 | Parser/scatter conformance tests | `pkg/parser/schedule_parser_test.go`, `pkg/parser/schedule_fuzzy_scatter_test.go` |
diff --git a/docs/src/content/docs/reference/mcp-scripts-specification.md b/docs/src/content/docs/reference/mcp-scripts-specification.md
@@ -351,6 +351,12 @@ Implementations SHOULD validate:
 4. **Description Length**: Tool descriptions are clear and concise (recommended 10-200 characters)
 5. **Timeout Reasonableness**: Timeout values are reasonable for tool purpose (warn if >600 seconds)
 
+**Sync note (2026-05-25)**: SM-IS-01 enforcement was reviewed against the current runtime path.
+`actions/setup/js/mcp_scripts_mcp_server_http.cjs` calls
+`validateRequiredFields` in `actions/setup/js/mcp_scripts_validation.cjs`, which validates required
+presence but does not yet enforce a per-string 10KB limit. A follow-up implementation issue SHOULD
+be opened before treating SM-IS-01 as fully implemented.
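Review note: for the follow-up issue, the missing per-string check might resemble the Python sketch below. It is not the behavior of `validateRequiredFields` and is written in Python only for illustration, since the runtime path is JavaScript. It assumes "10KB" means 10 × 1024 bytes of UTF-8; the specification text here does not say whether the unit is 10,000 or 10,240 bytes, nor whether the limit counts bytes or characters.

```python
# Assumption: 10KB is interpreted as 10 * 1024 bytes of UTF-8 encoded text.
MAX_STRING_BYTES = 10 * 1024


def find_oversized_strings(value, path="$"):
    """Return paths of all string values larger than MAX_STRING_BYTES.

    Walks nested dicts and lists so that strings inside structured tool
    inputs are checked too. The path syntax is a hypothetical convention.
    """
    if isinstance(value, str):
        too_big = len(value.encode("utf-8")) > MAX_STRING_BYTES
        return [path] if too_big else []
    found = []
    if isinstance(value, dict):
        for key, item in value.items():
            found.extend(find_oversized_strings(item, f"{path}.{key}"))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            found.extend(find_oversized_strings(item, f"{path}[{index}]"))
    return found
```

Returning every offending path, rather than failing on the first one, is a design choice that makes field-level diagnostics easier to report in a single error.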
+
 ---
 
 ## 5. Tool Execution
@@ -1486,6 +1492,14 @@ and run `go test ./pkg/workflow/...` to verify conformance.
 | §8 | Large Output Handling | `pkg/workflow/mcp_scripts_generator.go` (output truncation logic), `actions/setup/js/mcp_scripts_mcp_server_http.cjs` (HTTP transport output streaming) |
 | §9 | Integration with MCP Gateway | `pkg/workflow/mcp_scripts_renderer.go` (`renderMCPScriptsMCPConfigWithOptions`), `pkg/workflow/mcp_scripts_generator.go` (`GenerateMCPScriptsMCPServerScript`, `GenerateMCPScriptsToolsConfig`) |
 
+### Security Marker Sync Map
+
+| Marker | Implementation File(s) | Enforcement Path |
+|---|---|---|
+| SM-JS-01 | `pkg/workflow/mcp_scripts_generator.go`, `actions/setup/js/mcp_server_core.cjs` | `GenerateMCPScriptJavaScriptToolScript` emits per-tool JS handlers; `loadToolHandlers` in `mcp_server_core.cjs` executes handlers in isolated subprocesses |
+| SM-IS-01 | `actions/setup/js/mcp_scripts_mcp_server_http.cjs`, `actions/setup/js/mcp_scripts_validation.cjs` | `createMCPServer` → `validateRequiredFields` (presence checks only; 10KB string-length enforcement not currently implemented) |
+| SM-03 | `actions/setup/js/mcp_server_core.cjs`, `actions/setup/js/mcp_scripts_mcp_server_http.cjs` | Tool-call response path serializes handler output before returning MCP `content`; raw passthrough handling is centralized in server transport/handler pipeline |
+
 ### Implementation Notes
 
 - **§3 — Architecture**: The `MCPScriptsConfig` and `MCPScriptToolConfig` structs in