Commit cd692d1

Copilot and gh-aw-bot authored
SPDD: close spec drift gaps across Effective Tokens, Forecast, Frontmatter Hash, Fuzzy Schedule, and MCP Scripts (#34719)
* Initial plan
* docs: complete SPDD spec alignment updates

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
1 parent 9031e3e commit cd692d1

5 files changed

Lines changed: 171 additions & 31 deletions

docs/src/content/docs/reference/effective-tokens-specification.md

Lines changed: 80 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -361,8 +361,39 @@ When sub-agents are not fully observable, implementations MUST still report aggr
361361

362362
### 8.5 Safeguards
363363

364-
Implementations must prevent unbounded ET accumulation from producing non-finite or
365-
non-interoperable outputs.
364+
Implementations MUST apply the following safeguards to prevent unbounded ET accumulation from
365+
producing non-finite or non-interoperable outputs.
366+
367+
#### S-1: Overflow and Capping
368+
369+
**Threat**: Unbounded multi-invocation ET aggregation can exceed numeric interoperability limits and
370+
produce values that cannot be represented safely across systems.
371+
372+
**Mitigation**: Implementations MUST enforce the JavaScript-safe numeric ceiling and record
373+
deterministic overflow state when capping occurs, including the ceiling value in the emitted
374+
flag/error payload.
375+
376+
Normative requirements: **R-SAFE-002**, **R-SAFE-003**, **R-SAFE-003A**, **R-SAFE-004**
377+
378+
#### S-2: Non-Finite Numeric Rejection
379+
380+
**Threat**: `NaN`, `+Inf`, and `-Inf` values in multipliers or token class weights can silently
381+
corrupt ET outputs and break downstream serialization/aggregation.
382+
383+
**Mitigation**: Implementations MUST reject non-finite or invalid numeric registry values before ET
384+
computation begins.
385+
386+
Normative requirements: **R-SAFE-007**, **R-SAFE-008**
387+
388+
#### S-3: Registry Validation Failure Handling
389+
390+
**Threat**: Continuing ET computation after registry validation failure can produce inconsistent,
391+
partially parsed, or non-reproducible outputs.
392+
393+
**Mitigation**: Implementations MUST fail deterministically with field-level diagnostics and MUST NOT
394+
continue with partially parsed registry data.
395+
396+
Normative requirements: **R-SAFE-009**, **R-SAFE-010**
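
A minimal Go sketch of how safeguards S-1 and S-2 might fit together, offered as an illustration rather than the code in `pkg/cli/effective_tokens.go`. The names `validateMultipliers`, `aggregateET`, and `maxSafeET` are hypothetical. Treating negative multipliers as invalid and returning an error (rather than a flag payload) are assumptions of this sketch. The ceiling is 2^53 - 1, the largest integer JavaScript represents exactly.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// maxSafeET is 2^53 - 1, the JavaScript-safe integer ceiling.
const maxSafeET = 9007199254740991

// validateMultipliers rejects non-finite registry values before any ET
// computation runs. Rejecting negative values is an assumption of this sketch.
func validateMultipliers(registry map[string]float64) error {
	keys := make([]string, 0, len(registry))
	for k := range registry {
		keys = append(keys, k)
	}
	sort.Strings(keys) // fixed order, so the same bad registry yields the same error
	for _, k := range keys {
		v := registry[k]
		if math.IsNaN(v) || math.IsInf(v, 0) || v < 0 {
			return fmt.Errorf("multiplier %q: invalid value %v", k, v)
		}
	}
	return nil
}

// aggregateET sums per-invocation ET values. It fails on a non-finite running
// total and caps the result at maxSafeET, reporting whether capping occurred.
func aggregateET(values []float64) (total float64, capped bool, err error) {
	for i, v := range values {
		total += v
		if math.IsNaN(total) || math.IsInf(total, 0) {
			return 0, false, fmt.Errorf("non-finite ET total after invocation %d", i)
		}
		if total > maxSafeET {
			return maxSafeET, true, nil
		}
	}
	return total, false, nil
}

func main() {
	if err := validateMultipliers(map[string]float64{"model-a": 1.0, "model-b": math.NaN()}); err != nil {
		fmt.Println("registry rejected:", err)
	}
	total, capped, err := aggregateET([]float64{1600, 700, 850})
	fmt.Println(total, capped, err)
}
```

Validation runs over sorted keys so that a registry with several bad entries always reports the same field first, which is one way to keep the failure deterministic.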
366397

367398
**R-SAFE-001**: ET aggregation logic **MUST** detect overflow and non-finite arithmetic states
368399
(`NaN`, `+Inf`, `-Inf`) before serializing output.
@@ -638,6 +669,42 @@ ET_total = Σ [ m_i × (max(I_i - C_i, 0) + 0.1 C_i + 4 O_i + 4 R_i) ]
638669

639670
ET values are derived from token usage metadata. Implementations SHOULD treat per-invocation token data as potentially sensitive since usage patterns may reveal information about system prompts, model configurations, or user behavior. Aggregate ET values suitable for observability dashboards SHOULD be separated from detailed per-invocation data in access-controlled reporting systems.
640671

672+
### Appendix D: ET Test Vectors
673+
674+
#### ET-TV-001 (Single Invocation Baseline)
675+
676+
Input:
677+
678+
- `model multiplier m = 1.0`
679+
- `input_tokens = 200`
680+
- `cached_input_tokens = 50`
681+
- `output_tokens = 10`
682+
- `reasoning_tokens = 0`
683+
684+
Expected ET:
685+
686+
```
687+
base_weighted_tokens = max(200-50,0) + 0.1×50 + 4×10 + 4×0 = 150 + 5 + 40 + 0 = 195
688+
effective_tokens = 1.0 × 195 = 195
689+
```
690+
691+
#### ET-TV-002 (Three-Node Graph, Mixed Cached/Output Tokens)
692+
693+
Input invocation set:
694+
695+
1. Root: `m=2.0`, `I=500`, `C=200`, `O=120`, `R=0`
696+
2. Sub-agent A: `m=1.0`, `I=300`, `C=0`, `O=90`, `R=10`
697+
3. Sub-agent B: `m=2.0`, `I=150`, `C=50`, `O=80`, `R=0`
698+
699+
Expected ET:
700+
701+
```
702+
Root: base = max(500-200,0) + 0.1×200 + 4×120 + 4×0 = 300 + 20 + 480 + 0 = 800; ET = 2.0×800 = 1600
703+
Sub-agent A: base = max(300-0,0) + 0.1×0 + 4×90 + 4×10 = 300 + 0 + 360 + 40 = 700; ET = 1.0×700 = 700
704+
Sub-agent B: base = max(150-50,0) + 0.1×50 + 4×80 + 4×0 = 100 + 5 + 320 + 0 = 425; ET = 2.0×425 = 850
705+
ET_total = 1600 + 700 + 850 = 3150
706+
```
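
The two vectors above can be checked mechanically. The following Go sketch recomputes them using the weights shown in the vectors (0.1 for cached input, 4 for output and reasoning). The `Invocation` type and the function names are illustrative and are not taken from `pkg/cli`.

```go
package main

import (
	"fmt"
	"math"
)

// Invocation carries the per-invocation inputs used by the test vectors.
// The type and field names are hypothetical.
type Invocation struct {
	Multiplier float64 // m
	Input      float64 // I
	Cached     float64 // C
	Output     float64 // O
	Reasoning  float64 // R
}

// baseWeighted returns max(I-C, 0) + 0.1*C + 4*O + 4*R.
func baseWeighted(inv Invocation) float64 {
	return math.Max(inv.Input-inv.Cached, 0) + 0.1*inv.Cached + 4*inv.Output + 4*inv.Reasoning
}

// effectiveTokens scales the base weighted tokens by the model multiplier.
func effectiveTokens(inv Invocation) float64 {
	return inv.Multiplier * baseWeighted(inv)
}

// totalEffectiveTokens sums ET over every invocation in the graph.
func totalEffectiveTokens(invs []Invocation) float64 {
	total := 0.0
	for _, inv := range invs {
		total += effectiveTokens(inv)
	}
	return total
}

func main() {
	tv001 := Invocation{Multiplier: 1.0, Input: 200, Cached: 50, Output: 10}
	tv002 := []Invocation{
		{Multiplier: 2.0, Input: 500, Cached: 200, Output: 120},               // Root
		{Multiplier: 1.0, Input: 300, Cached: 0, Output: 90, Reasoning: 10}, // Sub-agent A
		{Multiplier: 2.0, Input: 150, Cached: 50, Output: 80},                // Sub-agent B
	}
	fmt.Println("ET-TV-001:", effectiveTokens(tv001))
	fmt.Println("ET-TV-002:", totalEffectiveTokens(tv002))
}
```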
707+
641708
---
642709

643710
## Model Multiplier Registry
@@ -715,6 +782,17 @@ The table below maps the normative sections of this specification to the impleme
715782
| §7.1 OTel Attribute Requirements | OpenTelemetry span attribute emission for ET metrics | `pkg/cli/token_usage.go`, `pkg/cli/logs_run_processor.go` |
716783
| §8 Implementation Requirements | Completeness, determinism, versioning, partial visibility safeguards | `pkg/cli/effective_tokens.go`, `pkg/cli/forecast_montecarlo.go` |
717784

785+
### §7.1 OTel Attribute Row-to-Code Mapping
786+
787+
| §7.1 Attribute Key | Implementation Mapping |
788+
|---|---|
789+
| `llm.token.effective_total` | `pkg/cli/token_usage.go` → `TokenUsageSummary.TotalEffectiveTokens`, populated by `populateEffectiveTokensWithCustomWeights` |
790+
| `llm.token.input` | `pkg/cli/token_usage.go` → `TokenUsageEntry.InputTokens` and `ModelTokenUsage.InputTokens`, aggregated in `parseTokenUsageFile` |
791+
| `llm.token.output` | `pkg/cli/token_usage.go` → `TokenUsageEntry.OutputTokens` and `ModelTokenUsage.OutputTokens`, aggregated in `parseTokenUsageFile` |
792+
| `llm.token.cached_input` | `pkg/cli/token_usage.go` → `TokenUsageEntry.CacheReadTokens` and `ModelTokenUsage.CacheReadTokens`, aggregated in `parseTokenUsageFile` |
793+
| `llm.token.base_weighted` | `pkg/cli/effective_tokens.go` → base token weighting in `computeModelEffectiveTokensWithWeights` (pre-multiplier term) |
794+
| `llm.model.multiplier` | `pkg/cli/effective_tokens.go` → multiplier resolution in `computeModelEffectiveTokensWithWeights` (`mult` selection by model key/prefix) |
795+
718796
### §4–§8 Sync Procedure
719797

720798
To keep the specification and implementation synchronized:

docs/src/content/docs/reference/forecast-specification.md

Lines changed: 47 additions & 28 deletions
@@ -171,6 +171,16 @@ yield = observed_runs_per_period × success_rate
171171

172172
Where `success_rate = successful_run_count / total_sampled_run_count`.
173173

174+
Example:
175+
176+
If `successful_run_count = 18`, `total_sampled_run_count = 24`, and
177+
`observed_runs_per_period = 20`, then:
178+
179+
```
180+
success_rate = 18 / 24 = 0.75
181+
yield = observed_runs_per_period × success_rate = 20 × 0.75 = 15
182+
```
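
The same calculation as a small Go sketch. The function name `forecastYield` is hypothetical, and returning an error for an empty or inconsistent sample is an assumption of this sketch, not something the definition above specifies.

```go
package main

import (
	"errors"
	"fmt"
)

// forecastYield returns observed_runs_per_period × success_rate, where
// success_rate = successful_run_count / total_sampled_run_count.
func forecastYield(successful, sampled int, runsPerPeriod float64) (float64, error) {
	if sampled <= 0 {
		// Assumption: an empty sample has no defined success rate.
		return 0, errors.New("total_sampled_run_count must be positive")
	}
	if successful < 0 || successful > sampled {
		return 0, errors.New("successful_run_count must be between 0 and total_sampled_run_count")
	}
	successRate := float64(successful) / float64(sampled)
	return runsPerPeriod * successRate, nil
}

func main() {
	y, err := forecastYield(18, 24, 20)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("yield:", y)
}
```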
183+
174184
### 3.10 Bootstrap Resampling
175185

176186
An empirical resampling technique where individual observations are drawn with replacement from the observed sample. Used in Section 7 to model per-run token usage without parametric distribution assumptions.
@@ -189,6 +199,8 @@ A `.lock.yml` file located in `.github/workflows/` that declares a compiled agen
189199
gh aw forecast [workflow_id...] [flags]
190200
```
191201

202+
Security and operational safeguards for this command interface are defined in §10.7.
203+
192204
### 4.2 Positional Arguments
193205

194206
| Argument | Type | Required | Description |
@@ -822,6 +834,36 @@ workflows):
822834
defined in Section 10.6; callers MUST treat its absence as equivalent to `false` (per
823835
§11.5 / **R-IMPL-041**, unknown fields in JSON output MUST be treated as ignorable).
824836

837+
### 10.7 Safeguards
838+
839+
#### 10.7.1 Threat Model
840+
841+
- **Credential scope abuse**: Over-scoped credentials could allow unauthorized repository access.
842+
- **Artifact privacy leakage**: `aw_info.json` artifacts may contain operationally sensitive ET
843+
metadata and prompt-adjacent context.
844+
- **Rate-limit abuse**: Aggressive polling or unrestricted retries can amplify API pressure and
845+
trigger organizational throttling.
846+
847+
#### 10.7.2 Required Mitigations
848+
849+
- **Credential scope**: The forecast command accesses the GitHub Actions API using `gh` CLI
850+
credentials. Token permissions MUST include only the minimum required scope (`actions:read` for
851+
target repositories).
852+
- **Artifact privacy**: Implementations MUST NOT log raw artifact payloads at default verbosity and
853+
SHOULD redact prompt-adjacent fields in diagnostic output.
854+
- **Rate-limit abuse controls**: Implementations MUST implement bounded retry/backoff behavior and
855+
MUST stop retrying when the retry budget is exhausted.
856+
- **Remote repository access**: When `--repo` targets a repository the caller does not own, the
857+
caller MUST have explicit read access. Implementations MUST NOT bypass repository access controls.
858+
- **JSON output handling**: The JSON schema can expose model and usage topology; operators SHOULD
859+
treat it as internal data and apply least-privilege access controls.
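
One way to read the rate-limit mitigation above is as a retry loop with a fixed attempt budget and capped exponential backoff. The Go sketch below is illustrative only: `retryWithBackoff` and `errRetryBudgetExhausted` are hypothetical names, and the doubling schedule is an assumption, since the mitigation requires only that retries be bounded and stop once the budget is spent.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errRetryBudgetExhausted marks the point where retrying stops for good.
var errRetryBudgetExhausted = errors.New("retry budget exhausted")

// retryWithBackoff calls op up to maxAttempts times. Between attempts it
// sleeps for a delay that doubles each time and never exceeds maxDelay.
func retryWithBackoff(maxAttempts int, baseDelay, maxDelay time.Duration, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	delay := baseDelay
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = op(); lastErr == nil {
			return nil
		}
		if attempt == maxAttempts {
			break // budget spent: do not sleep or retry again
		}
		time.Sleep(delay)
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return fmt.Errorf("%w after %d attempts: %v", errRetryBudgetExhausted, maxAttempts, lastErr)
}

func main() {
	calls := 0
	err := retryWithBackoff(3, 10*time.Millisecond, 40*time.Millisecond, func() error {
		calls++
		return errors.New("simulated rate-limit response")
	})
	fmt.Println(calls, err)
}
```

Wrapping the last error with a sentinel lets callers distinguish "gave up after the budget" from a first-attempt failure by using `errors.Is`.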
860+
861+
#### 10.7.3 Residual Risk
862+
863+
Even with these safeguards, operators with valid read access can still infer workload intensity
864+
from forecast outputs. This residual risk is accepted and MUST be managed through repository
865+
visibility and access-governance controls.
866+
825867
---
826868

827869
## 11. Implementation Requirements
@@ -950,6 +992,9 @@ This section maps normative forecast requirements to implementation files.
950992

951993
| Normative Area | Implementation File(s) |
952994
|---|---|
995+
| §4.5 Exit codes | `pkg/cli/forecast_command.go` |
996+
| §6 Data Sampling | `pkg/cli/forecast.go` |
997+
| §7 Monte Carlo Projection Engine | `pkg/cli/forecast_montecarlo.go` |
953998
| Monte Carlo engine (Poisson/Bootstrap/Bernoulli) | `pkg/cli/forecast_montecarlo.go` |
954999
| Forecast command orchestration and output fields | `pkg/cli/forecast.go`, `pkg/cli/forecast_command.go` |
9551000
| Workflow discovery, rate-limit backoff, and run sampling | `pkg/cli/forecast.go` |
@@ -1044,35 +1089,9 @@ Conforming implementations SHOULD:
10441089
2. Treat per-workflow 404/410 responses as recoverable partial failures.
10451090
3. Continue processing unaffected workflows and emit a warning for each raced workflow.
10461091

1047-
### Appendix F: Safeguards
1048-
1049-
#### F.1 Threat Model
1050-
1051-
- **Credential scope abuse**: Over-scoped credentials could allow unauthorized repository access.
1052-
- **Artifact privacy leakage**: `aw_info.json` artifacts may contain operationally sensitive ET
1053-
metadata and prompt-adjacent context.
1054-
- **Rate-limit abuse**: Aggressive polling or unrestricted retries can amplify API pressure and
1055-
trigger organizational throttling.
1056-
1057-
#### F.2 Required Mitigations
1058-
1059-
- **Credential scope**: The forecast command accesses the GitHub Actions API using `gh` CLI
1060-
credentials. Token permissions MUST include only the minimum required scope (`actions:read` for
1061-
target repositories).
1062-
- **Artifact privacy**: Implementations MUST NOT log raw artifact payloads at default verbosity and
1063-
SHOULD redact prompt-adjacent fields in diagnostic output.
1064-
- **Rate-limit abuse controls**: Implementations MUST implement bounded retry/backoff behavior and
1065-
MUST stop retrying when the retry budget is exhausted.
1066-
- **Remote repository access**: When `--repo` targets a repository the caller does not own, the
1067-
caller MUST have explicit read access. Implementations MUST NOT bypass repository access controls.
1068-
- **JSON output handling**: The JSON schema can expose model and usage topology; operators SHOULD
1069-
treat it as internal data and apply least-privilege access controls.
1070-
1071-
#### F.3 Residual Risk
1092+
### Appendix F: Safeguards (Moved)
10721093

1073-
Even with these safeguards, operators with valid read access can still infer workload intensity
1074-
from forecast outputs. This residual risk is accepted and MUST be managed through repository
1075-
visibility and access-governance controls.
1094+
Safeguard requirements for this specification are now defined in §10.7.
10761095

10771096
---
10781097

docs/src/content/docs/reference/frontmatter-hash-specification.md

Lines changed: 28 additions & 1 deletion
@@ -240,6 +240,18 @@ hashes for any corpus member.
240240

241241
## Implementation Notes
242242

243+
### Caller Operations
244+
245+
`pkg/cli/hash_command.go` MUST invoke the hash API with the following operational sequence:
246+
247+
1. The caller MUST resolve the target workflow markdown file path and fail with a descriptive error
248+
when the file cannot be read.
249+
2. The caller MUST pass workflow content and repository path context to the frontmatter hash
250+
implementation so imports can be traversed deterministically.
251+
3. The caller MUST return the computed 64-character lowercase SHA-256 hash string on success.
252+
4. The caller MUST surface deterministic hash-computation failures (parse/import/read errors) as
253+
command errors without fallback hashing.
254+
243255
### Go Implementation
244256

245257
The current Go implementation (`pkg/parser/frontmatter_hash.go`) uses a **text-based approach** that diverges from the field-selection model described in Section 2 ("Field Selection") of this specification:
@@ -318,7 +330,10 @@ The Go and JavaScript implementations must produce byte-for-byte identical canon
318330

319331
Very large frontmatter payloads can cause excessive memory use and hash-computation latency during compilation and runtime verification. This can degrade CI reliability and increase stale-lock false positives due to timeout or resource pressure.
320332

321-
**Mitigation**: Implementations SHOULD enforce a maximum cumulative frontmatter input size and MUST fail deterministically with a descriptive error when the limit is exceeded. A limit of 1 MiB for the combined normalized frontmatter input is RECOMMENDED unless repository-specific requirements justify a higher bound.
333+
**Mitigation**: Implementations MUST enforce a maximum cumulative normalized frontmatter input size
334+
of **1,048,576 bytes (1 MiB)** and MUST fail deterministically with a descriptive error when that
335+
limit is exceeded. This limit bounds worst-case memory/latency behavior while remaining well above
336+
typical workflow frontmatter sizes.
322337

323338
---
324339

@@ -476,3 +491,15 @@ imports:
476491

477492
# Import-based Workflow
478493
```
494+
495+
### FH-TV-NEG-001 (Oversized Input Rejection)
496+
497+
Input description:
498+
499+
- A workflow (including imported frontmatter content) whose cumulative normalized frontmatter input
500+
exceeds **1,048,576 bytes**.
501+
502+
Expected result:
503+
504+
- Hash computation is rejected before digest generation.
505+
- Error text includes: `frontmatter input exceeds 1048576-byte limit`.
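
A minimal Go sketch of the size check that FH-TV-NEG-001 exercises. It is not the code in `pkg/parser/frontmatter_hash.go`: the function name `hashNormalizedFrontmatter` is hypothetical, and hashing the normalized bytes directly is a simplifying assumption that skips the canonicalization described elsewhere in the specification.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// maxFrontmatterBytes is the 1 MiB limit (1,048,576 bytes).
const maxFrontmatterBytes = 1 << 20

// hashNormalizedFrontmatter rejects oversized input before any digest work
// and otherwise returns a 64-character lowercase hex SHA-256 string.
func hashNormalizedFrontmatter(normalized []byte) (string, error) {
	if len(normalized) > maxFrontmatterBytes {
		return "", fmt.Errorf("frontmatter input exceeds %d-byte limit", maxFrontmatterBytes)
	}
	sum := sha256.Sum256(normalized)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	if _, err := hashNormalizedFrontmatter(make([]byte, maxFrontmatterBytes+1)); err != nil {
		fmt.Println("rejected:", err)
	}
	digest, _ := hashNormalizedFrontmatter([]byte("engine: copilot\n"))
	fmt.Println(len(digest))
}
```

Checking the length first means an oversized payload never reaches the hash function, which matches the expectation that rejection happens before digest generation.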

docs/src/content/docs/reference/fuzzy-schedule-specification.md

Lines changed: 2 additions & 0 deletions
@@ -1245,6 +1245,8 @@ This section maps the fuzzy schedule specification to implementation files.
12451245

12461246
| Normative Area | Implementation File(s) |
12471247
|---|---|
1248+
| §3.1 Grammar (`schedule: daily around HH[:MM][am/pm][ timezone]`) | `pkg/parser/schedule_parser.go` (`parseFuzzyScheduleExpression`, tokenizer/grammar helpers) |
1249+
| §6 Scattering algorithm | `pkg/parser/schedule_fuzzy_scatter.go` (`scatterDailyTime`, weighted slot selection and deterministic hashing) |
12481250
| Frontmatter schedule parsing and grammar handling | `pkg/parser/schedule_parser.go` |
12491251
| Deterministic fuzzy scattering and peak-minute avoidance | `pkg/parser/schedule_fuzzy_scatter.go` |
12501252
| Parser/scatter conformance tests | `pkg/parser/schedule_parser_test.go`, `pkg/parser/schedule_fuzzy_scatter_test.go` |

docs/src/content/docs/reference/mcp-scripts-specification.md

Lines changed: 14 additions & 0 deletions
@@ -351,6 +351,12 @@ Implementations SHOULD validate:
351351
4. **Description Length**: Tool descriptions are clear and concise (recommended 10-200 characters)
352352
5. **Timeout Reasonableness**: Timeout values are reasonable for tool purpose (warn if >600 seconds)
353353

354+
**Sync note (2026-05-25)**: SM-IS-01 enforcement was reviewed against the current runtime path.
355+
`actions/setup/js/mcp_scripts_mcp_server_http.cjs` calls
356+
`validateRequiredFields` in `actions/setup/js/mcp_scripts_validation.cjs`, which validates required
357+
presence but does not yet enforce a per-string 10KB limit. A follow-up implementation issue SHOULD
358+
be opened before treating SM-IS-01 as fully implemented.
359+
354360
---
355361

356362
## 5. Tool Execution
@@ -1486,6 +1492,14 @@ and run `go test ./pkg/workflow/...` to verify conformance.
14861492
| §8 | Large Output Handling | `pkg/workflow/mcp_scripts_generator.go` (output truncation logic), `actions/setup/js/mcp_scripts_mcp_server_http.cjs` (HTTP transport output streaming) |
14871493
| §9 | Integration with MCP Gateway | `pkg/workflow/mcp_scripts_renderer.go` (`renderMCPScriptsMCPConfigWithOptions`), `pkg/workflow/mcp_scripts_generator.go` (`GenerateMCPScriptsMCPServerScript`, `GenerateMCPScriptsToolsConfig`) |
14881494

1495+
### Security Marker Sync Map
1496+
1497+
| Marker | Implementation File(s) | Enforcement Path |
1498+
|---|---|---|
1499+
| SM-JS-01 | `pkg/workflow/mcp_scripts_generator.go`, `actions/setup/js/mcp_server_core.cjs` | `GenerateMCPScriptJavaScriptToolScript` emits per-tool JS handlers; `loadToolHandlers` in `mcp_server_core.cjs` executes handlers in isolated subprocesses |
1500+
| SM-IS-01 | `actions/setup/js/mcp_scripts_mcp_server_http.cjs`, `actions/setup/js/mcp_scripts_validation.cjs` | `createMCPServer` → `validateRequiredFields` (presence checks only; 10KB string-length enforcement not currently implemented) |
1501+
| SM-03 | `actions/setup/js/mcp_server_core.cjs`, `actions/setup/js/mcp_scripts_mcp_server_http.cjs` | Tool-call response path serializes handler output before returning MCP `content`; raw passthrough handling is centralized in server transport/handler pipeline |
1502+
14891503
### Implementation Notes
14901504

14911505
- **§3 — Architecture**: The `MCPScriptsConfig` and `MCPScriptToolConfig` structs in
