GRC Data Contracts

The toolkit's Findings cache is generated by connectors. Everything else in grc-data/ is user-owned state: metrics, risks, exceptions, vendor records, policy lifecycle metadata, and any hand-curated history you want under version control.

This page defines the directory conventions that reporting, dashboard, and program-management workflows are expected to read.

Layout

grc-data/
├── metrics/
│   └── *.json | *.yaml
├── risks/
│   └── *.json | *.yaml
├── exceptions/
│   └── *.json | *.yaml
├── vendors/
│   └── *.json | *.yaml
├── policies/
│   └── *.json | *.yaml
└── incidents/
    └── *.md
  • metrics/ stores time-series KPI and KRI rows.
  • risks/ stores one risk register entry per file.
  • exceptions/ stores compensating-control or temporary exception records.
  • vendors/ stores vendor inventory and review metadata.
  • policies/ stores lifecycle metadata for policy documents that live elsewhere in the repo.
  • incidents/ remains Markdown by convention because postmortems are primarily narrative artifacts.

Schemas

Each JSON file in grc-data/ should validate against the schema for its directory under schemas/:

Directory              Schema
grc-data/metrics/      schemas/metric.schema.json
grc-data/risks/        schemas/risk.schema.json
grc-data/exceptions/   schemas/exception.schema.json
grc-data/vendors/      schemas/vendor.schema.json
grc-data/policies/     schemas/policy.schema.json

YAML is acceptable for operator-managed data as long as it parses to the same object shape as the JSON schema. The CI contract fixtures in tests/fixtures/ use JSON so the schemas stay easy to validate with ajv.
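A minimal sketch of the directory-to-schema lookup from the Directory/Schema table above, assuming repo-relative paths with forward slashes and only the *.json and *.yaml extensions shown in the layout. The helper name schemaFor is hypothetical, not part of the toolkit. incidents/ has no entry because it holds Markdown rather than schema-backed data.

```javascript
// Hypothetical helper: map a grc-data/ file to the schema it should validate against.
// The Map mirrors the Directory/Schema table; incidents/ (Markdown) is intentionally absent.
const SCHEMA_BY_DIR = new Map([
  ["metrics", "schemas/metric.schema.json"],
  ["risks", "schemas/risk.schema.json"],
  ["exceptions", "schemas/exception.schema.json"],
  ["vendors", "schemas/vendor.schema.json"],
  ["policies", "schemas/policy.schema.json"],
]);

const DATA_EXTENSIONS = [".json", ".yaml"];

function schemaFor(filePath) {
  // Only "grc-data/<dir>/<file>" paths are in contract; everything else returns null.
  const parts = filePath.split("/");
  if (parts.length !== 3 || parts[0] !== "grc-data") return null;
  const [, dir, file] = parts;
  if (!DATA_EXTENSIONS.some((ext) => file.endsWith(ext))) return null;
  return SCHEMA_BY_DIR.get(dir) ?? null;
}

console.log(schemaFor("grc-data/risks/R-001.yaml")); // schemas/risk.schema.json
console.log(schemaFor("grc-data/incidents/2026-04-outage.md")); // null
```

Using a Map rather than a plain object keeps lookups like "constructor" from resolving to inherited prototype properties.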

Reporting-Friendly Metric IDs

These IDs are not mandatory, but adopting them makes the reporting workflows much easier to compose consistently:

  • automation.coverage_pct
  • automation.controls_automated
  • automation.controls_manual
  • automation.controls_total
  • findings.open_critical
  • findings.open_high
  • risk.residual_total
  • policy.review_overdue
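The four automation IDs stay consistent most easily when three of them are derived from two observed counts. A minimal sketch, assuming (not confirmed by the metric schema) that controls_total = controls_automated + controls_manual and that coverage_pct is the automated share of the total as a percentage; automationMetrics is a hypothetical name. With the 22 automated of 64 total controls used in the bootstrap command further down this page, it yields 42 manual controls and 34.375% coverage.

```javascript
// Hypothetical helper: derive the automation.* metric values from two observed counts.
// Assumes total = automated + manual and coverage_pct = automated / total * 100.
function automationMetrics(controlsTotal, controlsAutomated) {
  if (!Number.isInteger(controlsTotal) || controlsTotal <= 0) {
    throw new RangeError("controls_total must be a positive integer");
  }
  if (
    !Number.isInteger(controlsAutomated) ||
    controlsAutomated < 0 ||
    controlsAutomated > controlsTotal
  ) {
    throw new RangeError("controls_automated must be an integer in [0, controls_total]");
  }
  return {
    "automation.controls_total": controlsTotal,
    "automation.controls_automated": controlsAutomated,
    "automation.controls_manual": controlsTotal - controlsAutomated,
    "automation.coverage_pct": (controlsAutomated / controlsTotal) * 100,
  };
}

console.log(automationMetrics(64, 22));
```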

Use dimensions to avoid inventing one metric ID per framework, source, or business unit. Example dimensions:

{
  "framework": "soc2",
  "source": "github-inspector",
  "business_unit": "platform"
}
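Because one metric ID is shared across frameworks, sources, and business units, consumers pick a slice by matching dimensions. A minimal sketch, assuming each row carries hypothetical metric_id, value, and dimensions fields (confirm the real names in schemas/metric.schema.json) and that a filter matches when every requested dimension is equal. Dimensions the filter does not mention are ignored, so one call can aggregate across business units. The sample rows and counts are made up for illustration.

```javascript
// Hypothetical row shape: { metric_id, value, dimensions: { framework, source, business_unit } }.
// Field names are illustrative; confirm them against schemas/metric.schema.json.
function selectRows(rows, metricId, filter = {}) {
  return rows.filter(
    (row) =>
      row.metric_id === metricId &&
      Object.entries(filter).every(([key, want]) => (row.dimensions ?? {})[key] === want),
  );
}

// Illustrative sample data, not real measurements.
const rows = [
  { metric_id: "findings.open_high", value: 4, dimensions: { framework: "soc2", source: "github-inspector", business_unit: "platform" } },
  { metric_id: "findings.open_high", value: 2, dimensions: { framework: "soc2", source: "github-inspector", business_unit: "payments" } },
  { metric_id: "findings.open_high", value: 7, dimensions: { framework: "fedramp-moderate", source: "github-inspector", business_unit: "platform" } },
];

// All SOC 2 open-high findings, summed across business units.
const soc2 = selectRows(rows, "findings.open_high", { framework: "soc2" });
console.log(soc2.reduce((sum, row) => sum + row.value, 0)); // 6
```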

Risk Register Expectations

Reporting workflows assume the risk register carries:

  • an accountable owner
  • inherent likelihood and impact
  • residual likelihood and impact when known
  • treatment status
  • linked findings or controls when the risk is rooted in current evidence

If you do not have residual scoring yet, omit the residual block rather than inventing numbers. The reporting layer should state that gap explicitly instead of faking precision.
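A minimal sketch of a reporting step that honors the no-invented-residual rule. It assumes a common likelihood × impact scoring convention and hypothetical field names (owner, inherent, residual, treatment_status); check both against schemas/risk.schema.json before relying on them. When a block is missing, the summary records a gap instead of substituting another score.

```javascript
// Hypothetical risk shape; field names and the likelihood x impact convention are
// assumptions, not taken from schemas/risk.schema.json.
function scoreBlock(block) {
  return block ? block.likelihood * block.impact : null;
}

function summarizeRisk(risk) {
  const gaps = [];
  if (!risk.owner) gaps.push("missing accountable owner");
  if (!risk.treatment_status) gaps.push("missing treatment status");
  const inherent = scoreBlock(risk.inherent);
  if (inherent === null) gaps.push("missing inherent scoring");
  // Never fall back to the inherent score: an absent residual block is reported as a gap.
  const residual = scoreBlock(risk.residual);
  if (residual === null) gaps.push("residual not yet scored");
  return { id: risk.id, inherent, residual, gaps };
}

const summary = summarizeRisk({
  id: "R-001",
  owner: "platform-lead",
  inherent: { likelihood: 4, impact: 5 },
  treatment_status: "mitigating",
});
console.log(summary.inherent, summary.residual, summary.gaps); // 20 null [ 'residual not yet scored' ]
```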

Validation

Run the full contract suite locally:

npm run test:contract

Currently this validates the committed JSON fixtures for Findings plus the user-owned state schemas. If you manage grc-data/ in YAML, convert it to JSON and validate that equivalent before shipping automation that depends on it.
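Validating the JSON equivalent of YAML data only helps if the parsed YAML survives conversion unchanged. One pitfall worth checking: some YAML parsers (js-yaml's default schema, for example) turn unquoted timestamps into Date objects, which JSON.stringify silently rewrites as strings, and .nan or .inf values become null. A minimal sketch that walks an already-parsed object and reports paths holding non-JSON values; findNonJsonValues is a hypothetical name, it does not parse YAML itself, and the sample record fields are illustrative.

```javascript
// Hypothetical pre-flight check: list paths whose values would not round-trip through JSON.
// Input is an object already produced by your YAML parser of choice.
function findNonJsonValues(value, path = "$") {
  if (value === null || typeof value === "string" || typeof value === "boolean") return [];
  if (typeof value === "number") {
    return Number.isFinite(value) ? [] : [`${path}: non-finite number`];
  }
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => findNonJsonValues(item, `${path}[${i}]`));
  }
  if (typeof value === "object") {
    const proto = Object.getPrototypeOf(value);
    if (proto === Object.prototype || proto === null) {
      return Object.entries(value).flatMap(([key, item]) => findNonJsonValues(item, `${path}.${key}`));
    }
    // Date, Map, Buffer, and similar objects are coerced or mangled by JSON.stringify.
    return [`${path}: ${value.constructor?.name ?? "object"}`];
  }
  // undefined, function, bigint, symbol
  return [`${path}: ${typeof value}`];
}

const parsed = { id: "V-001", last_review: new Date("2026-04-01"), tier: 2 };
console.log(findNonJsonValues(parsed)); // [ '$.last_review: Date' ]
```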

Bootstrap Automation Snapshots

To seed the reporting workflow with real automation history, use:

node plugins/grc-engineer/scripts/record-automation-metrics.js soc2 --controls-total=64 --controls-automated=22 --window-label=2026-W16

For framework plugins that publish framework_metadata.framework_controls_mapped, operators can skip hand-counting the total and provide only the observed automated count:

node plugins/grc-engineer/scripts/record-automation-metrics.js soc2 --controls-automated=22 --from-framework-metadata --window-label=2026-W16

Rows from this path still use measurement_scope=operator-observed. Their metadata records the plugin manifest that supplied controls_total, so reports can distinguish these rows from both fully manual snapshots and FedRAMP tooling-capability baselines.

If you want a tooling-capability baseline for FedRAMP evidence coverage, the writer can derive it from the evidence collector config:

node plugins/grc-engineer/scripts/record-automation-metrics.js fedramp-moderate aws --window-label=2026-W16

Treat derived rows as a capability baseline. For leadership reporting, prefer operator-observed counts unless you have validated that the automation is truly live in the target environment.
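A minimal sketch of that preference, assuming each metric row exposes measurement_scope with the two values used on this page (operator-observed and tooling-capability). The function name, the metric_id field, and the three output lists are illustrative. Capability rows are routed to a separate toolkit-coverage view rather than dropped, and unexpected scope values are surfaced instead of guessed. Observed rows are only leadership candidates; they still need validation against the target environment.

```javascript
// Hypothetical split of metric rows for reporting, keyed on measurement_scope.
function splitByScope(rows) {
  const observed = []; // leadership candidates, after validation in the target environment
  const capabilityBaseline = []; // toolkit coverage potential, not proof automation ran
  const unknown = []; // unexpected scopes are surfaced, not guessed
  for (const row of rows) {
    if (row.measurement_scope === "operator-observed") observed.push(row);
    else if (row.measurement_scope === "tooling-capability") capabilityBaseline.push(row);
    else unknown.push(row);
  }
  return { observed, capabilityBaseline, unknown };
}

// Illustrative rows; values are made up.
const { observed, capabilityBaseline, unknown } = splitByScope([
  { metric_id: "automation.controls_automated", value: 22, measurement_scope: "operator-observed" },
  { metric_id: "automation.controls_automated", value: 40, measurement_scope: "tooling-capability" },
]);
console.log(observed.length, capabilityBaseline.length, unknown.length); // 1 1 0
```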

For scheduler-friendly runs across multiple frameworks, use the bundled example config at plugins/grc-engineer/examples/automation-metrics.yaml with:

node plugins/grc-engineer/scripts/record-automation-metrics.js \
  --config=plugins/grc-engineer/examples/automation-metrics.yaml \
  --window-label=current-week

Scheduled Automation Snapshots

PR #54 introduced the reporting flows that consume automation-history metrics. To keep those reports useful without relying on someone to run the writer by hand, copy the scheduled GitHub Actions template at plugins/grc-engineer/examples/github-actions/automation-metrics-snapshot.yml into .github/workflows/ in the adopting repo.

The template runs record-automation-metrics.js in batch mode with --config=./automation-metrics.yaml, uploads the generated grc-data/metrics/ files as an artifact, and opens a pull request when the snapshots change. Reviewers should check:

  • the window.label matches the intended reporting period
  • measurement_scope: tooling-capability rows are treated as toolkit coverage potential, not as proof that automation ran in production
  • measurement_scope: operator-observed rows were validated against the target environment before they feed leadership reporting
  • unexpected deletions or count swings are explained before merge
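Reviewers can get a mechanical first pass on deletions and count swings. A minimal sketch, assuming each snapshot is reduced to a hypothetical map from a row key (for example metric ID plus framework) to a numeric value; the 20% threshold is an arbitrary placeholder, not a toolkit default. It flags keys that disappeared and values whose relative change exceeds the threshold. Explaining each flag before merge is still the reviewer's job.

```javascript
// Hypothetical review aid: compare previous and proposed snapshot values by key.
// threshold is a placeholder; tune it to your program's normal period-over-period noise.
function reviewSnapshotDiff(previous, proposed, threshold = 0.2) {
  const flags = [];
  for (const [key, before] of Object.entries(previous)) {
    if (!Object.hasOwn(proposed, key)) {
      flags.push({ key, kind: "deleted", before });
      continue;
    }
    const after = proposed[key];
    const base = Math.abs(before);
    // Relative change; any move away from zero counts as an unbounded swing.
    const change = base === 0 ? (after === 0 ? 0 : Infinity) : Math.abs(after - before) / base;
    if (change > threshold) flags.push({ key, kind: "swing", before, after });
  }
  return flags;
}

// Illustrative snapshots; keys and values are made up.
const previous = { "automation.controls_automated|soc2": 22, "automation.controls_total|soc2": 64, "findings.open_high|soc2": 5 };
const proposed = { "automation.controls_automated|soc2": 31, "automation.controls_total|soc2": 64 };
console.log(reviewSnapshotDiff(previous, proposed));
```

This sketch only inspects keys present in the previous snapshot; newly added rows would need a separate check.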

Teams with mature controls around generated GRC data may change the final step to commit directly to the default branch. If they do, keep the artifact upload or another audit trail so reviewers can reconstruct what the scheduled run generated and when.