[Proposal] Optional AI-Enhanced Assessment Checks #273

@vinayada1


Proposal: Optional AI-Enhanced Assessment Checks

Summary

Add optional AI support to improve the quality and accuracy of assessment checks that currently rely on brittle heuristics or require manual human review, and to provide post-evaluation remediation guidance when useful. Users opt in per evaluation with supported provider credentials. AI is never required.

Deterministic evaluation remains the baseline behavior. AI is an optional enhancement layer used only when configured and only for checks where deterministic logic is inconclusive or where advisory remediation guidance would be useful.

Non-Goals

This proposal does not attempt to do the following:

  • Replace deterministic checks as the primary evaluation model.
  • Introduce AI as a mandatory dependency for running the plugin.
  • Commit to identical behavior across every provider or model family in the initial implementation.
  • Solve broader evidence discovery or unrelated workflow changes outside the scope of this plugin proposal.

Motivation

Of the 57 assessment steps currently implemented in the plugin:

  • 2 unconditionally return NeedsReview, offering zero automated analysis:
    • DocumentsTestExecution (OSPS-QA-06.02)
    • DocumentsTestMaintenancePolicy (OSPS-QA-06.03)
  • 8 rely on brittle heuristics that miss valid cases or produce false results.
  • 18 currently return NotRun; some may be candidates for future AI-assisted implementation.

Examples of current gaps:

| Assessment ID | Current Approach | Limitation |
| --- | --- | --- |
| OSPS-BR-04.01 | Case-sensitive match for "Change Log" or "Changelog" | Misses "What's Changed", "Release Notes", "CHANGELOG", or links to CHANGELOG.md |
| OSPS-DO-04.01 | Exact heading match for "Support" in README | Misses "Getting Help", "Community Support", and similar headings |
| OSPS-SA-01.01 | Filename and directory matching | Finding docs/ does not prove design documentation exists |
| OSPS-GV-03.01 | File existence check only | Does not verify quality or completeness of a contribution guide |
| OSPS-QA-06.02 | Always returns NeedsReview | No automated analysis at all |
| OSPS-QA-06.03 | Always returns NeedsReview | No automated analysis at all |

Additionally, the output today is a flat list of per-check verdicts with no advisory remediation guidance. Users who see failed or NeedsReview checks must research the OSPS Baseline themselves to understand what to fix and how.

Proposed Capabilities

1. Per-step AI enhancement

Improve selected checks by adding AI when the deterministic result is inconclusive.

  • If AI is not configured, behavior is unchanged.
  • If AI is configured, deterministic logic still runs first.
  • AI is invoked only when a check remains inconclusive or when the heuristic is likely to miss valid evidence.
  • Each call returns a simple structured response.
  • On timeout or error, the step falls back to the deterministic result.
  • AI-assisted results are clearly labeled in output.
  • Each AI-assisted check defines which files it sends, how large content is trimmed, and how prompt size is kept bounded.
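The deterministic-first flow above can be sketched as follows. All names here (runStep, Verdict, aiAnalyze) are illustrative, not the plugin's actual API; the point is that an unconfigured or failing AI call always leaves the deterministic verdict intact:

```go
package main

import (
	"errors"
	"fmt"
)

// Verdict mirrors the plugin's step outcomes (names are illustrative).
type Verdict string

const (
	Passed      Verdict = "Passed"
	Failed      Verdict = "Failed"
	NeedsReview Verdict = "NeedsReview"
)

// aiAnalyze stands in for the provider call; it may error or time out.
type aiAnalyze func(prompt string) (Verdict, error)

// runStep runs the deterministic check first and consults AI only when the
// result is inconclusive. On AI error it falls back to the deterministic
// verdict, so behavior without AI is unchanged. The bool reports whether
// the result was AI-assisted, so the caller can label it in output.
func runStep(deterministic func() Verdict, ai aiAnalyze, prompt string) (Verdict, bool) {
	v := deterministic()
	if v != NeedsReview || ai == nil {
		return v, false // conclusive, or AI not configured
	}
	aiV, err := ai(prompt)
	if err != nil {
		return v, false // timeout/error: keep deterministic result
	}
	return aiV, true
}

func main() {
	inconclusive := func() Verdict { return NeedsReview }
	flaky := aiAnalyze(func(string) (Verdict, error) { return "", errors.New("timeout") })
	v, assisted := runStep(inconclusive, flaky, "check README for test instructions")
	fmt.Println(v, assisted) // NeedsReview false — fell back on AI error
}
```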

Initial high-impact candidates:

| Assessment ID | AI-enhanced behavior |
| --- | --- |
| OSPS-QA-06.02 | Analyze README and CONTRIBUTING content for test execution instructions |
| OSPS-QA-06.03 | Analyze repository docs for test maintenance expectations |
| OSPS-BR-04.01 | Detect changelog content semantically, not just by exact heading match |
| OSPS-SA-01.01 | Verify whether candidate docs actually contain architecture or design content |

2. AI-assisted remediation guidance

After deterministic checks complete, the plugin can optionally generate advisory remediation guidance for Failed and NeedsReview results.

This capability does not override verdicts or re-evaluate repository evidence. It turns existing results into more actionable guidance by generating:

  • a short explanation of why the requirement likely failed
  • specific remediation steps tied to the OSPS requirement
  • GitHub-specific guidance where relevant
  • a small set of recommended next actions

The input to this pass is intentionally narrow: requirement ID, current verdict, existing step message, recommendation text from the control catalog, and minimal supporting metadata when needed.
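That narrow input could be carried by a type like the following; field names are hypothetical, not the plugin's final structs:

```go
package main

import "fmt"

// RemediationInput carries the intentionally narrow payload for the
// remediation pass: no repository content, only existing results and
// catalog text. Field names are illustrative.
type RemediationInput struct {
	RequirementID  string            // e.g. "OSPS-QA-06.02"
	Verdict        string            // "Failed" or "NeedsReview"
	StepMessage    string            // existing message from the deterministic step
	Recommendation string            // recommendation text from the control catalog
	Metadata       map[string]string // minimal supporting metadata, optional
}

func main() {
	in := RemediationInput{
		RequirementID:  "OSPS-QA-06.02",
		Verdict:        "NeedsReview",
		StepMessage:    "No automated analysis performed",
		Recommendation: "Document how to run the project's tests",
	}
	fmt.Println(in.RequirementID, in.Verdict)
}
```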

Structured Outputs

Per-step AI checks and remediation guidance can use structured outputs when the configured provider supports them. This is the preferred response mode because it makes responses easier to validate and parse.

Initial per-step schema:

```json
{
  "type": "object",
  "properties": {
    "verdict": {
      "type": "string",
      "enum": ["PASS", "FAIL", "UNCERTAIN"]
    },
    "confidence": {
      "type": "number",
      "minimum": 0.0,
      "maximum": 1.0
    },
    "reasoning": {
      "type": "string"
    },
    "evidence_location": {
      "type": "string"
    }
  },
  "required": ["verdict", "confidence", "reasoning"],
  "additionalProperties": false
}
```
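A Go type mirroring this schema, with a validation pass enforcing the enum and range constraints even when the provider cannot guarantee structured outputs (the struct and `validate` helper are a sketch, not the SDK's actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StepResult mirrors the per-step schema; json tags match the property names.
type StepResult struct {
	Verdict          string  `json:"verdict"`
	Confidence       float64 `json:"confidence"`
	Reasoning        string  `json:"reasoning"`
	EvidenceLocation string  `json:"evidence_location,omitempty"`
}

// validate enforces the schema's enum, range, and required-field constraints,
// which matters for providers that only emit prompt-shaped JSON.
func validate(r StepResult) error {
	switch r.Verdict {
	case "PASS", "FAIL", "UNCERTAIN":
	default:
		return fmt.Errorf("invalid verdict %q", r.Verdict)
	}
	if r.Confidence < 0 || r.Confidence > 1 {
		return fmt.Errorf("confidence %v out of [0,1]", r.Confidence)
	}
	if r.Reasoning == "" {
		return fmt.Errorf("reasoning is required")
	}
	return nil
}

func main() {
	raw := `{"verdict":"PASS","confidence":0.92,"reasoning":"README documents a test command"}`
	var r StepResult
	if err := json.Unmarshal([]byte(raw), &r); err != nil {
		panic(err)
	}
	fmt.Println(validate(r) == nil) // true
}
```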

Provider Compatibility

Both capabilities share the same AI client infrastructure, and the SDK will be provider-neutral rather than centered on one AI provider.

The initial support model stays narrow:

| Backend | Scope |
| --- | --- |
| OpenAI | First-class native adapter |
| Anthropic | First-class native adapter |
| Gemini | First-class native adapter |

The SDK will expose one provider-neutral Analyze(prompt, content, schema) contract while keeping the initial backend list intentionally small. Structured outputs, authentication, request formats, and tool-calling behavior vary by provider, so broader compatibility should be added only after the first end-to-end checks prove useful.

User Configuration

AI support remains fully opt-in through plugin configuration:

```yaml
plugins:
  github-repo:
    vars:
      owner: "my-org"
      repo: "my-repo"
      token: "ghp_..."
      ai_provider: "openai" # openai | anthropic | gemini
      ai_api_key: "sk-..."
      ai_model: "gpt-4o"
      ai_timeout: "30s"
      ai_max_tokens: 256
      ai_max_calls: 20
```

AI is disabled unless ai_provider, ai_model, and ai_api_key are set. In dry-run mode, only ai_provider and ai_model are required.
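That gating rule reduces to a small predicate; this sketch uses an assumed AIConfig type, with field names following the ai_* vars above:

```go
package main

import "fmt"

// AIConfig holds the ai_* vars from plugin configuration (illustrative type).
type AIConfig struct {
	Provider string // ai_provider
	Model    string // ai_model
	APIKey   string // ai_api_key
}

// aiEnabled reports whether AI calls may be made. In dry-run mode the key
// is not required because no API calls are issued.
func aiEnabled(c AIConfig, dryRun bool) bool {
	if c.Provider == "" || c.Model == "" {
		return false
	}
	return dryRun || c.APIKey != ""
}

func main() {
	fmt.Println(aiEnabled(AIConfig{Provider: "openai", Model: "gpt-4o"}, false)) // false: no key
	fmt.Println(aiEnabled(AIConfig{Provider: "openai", Model: "gpt-4o"}, true))  // true: dry-run
}
```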

Where the Work Lives

The implementation spans two repos:

| Layer | Repo | What |
| --- | --- | --- |
| AI client infrastructure | privateerproj/privateer-sdk | Provider-neutral AIClient with native adapters for OpenAI, Anthropic, and Gemini |
| Per-step AI enhancements | ossf/pvtr-github-repo-scanner | Step-specific prompts, content extraction, fallback behavior, and [AI-Assisted] result labeling |
| Post-evaluation remediation | ossf/pvtr-github-repo-scanner | Advisory remediation guidance generated from evaluation results |

Security Considerations

  • API keys are handled like GitHub tokens: read from config and never logged.
  • Only publicly accessible repository content and evaluation results are included in prompts.
  • AI responses are constrained to structured verdicts for per-step checks.
  • AI calls go only to the configured provider endpoint for the selected native backend.

Cost Considerations

  • AI is invoked only when configured and only for inconclusive checks or remediation guidance.
  • Focused prompts and bounded ai_max_tokens keep calls constrained.
  • ai_max_calls provides a hard cap per evaluation.
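A per-evaluation hard cap can be a simple counter; the callBudget type here is a sketch, assuming steps that fail to reserve budget simply keep their deterministic result:

```go
package main

import (
	"errors"
	"fmt"
)

// callBudget enforces the ai_max_calls hard cap per evaluation.
type callBudget struct {
	max, used int
}

var errBudgetExhausted = errors.New("ai_max_calls budget exhausted")

// take reserves one call, or fails once the cap is hit; a step that cannot
// get budget falls back to its deterministic result.
func (b *callBudget) take() error {
	if b.used >= b.max {
		return errBudgetExhausted
	}
	b.used++
	return nil
}

func main() {
	b := &callBudget{max: 2}
	fmt.Println(b.take(), b.take(), b.take()) // <nil> <nil> ai_max_calls budget exhausted
}
```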

Implementation Phases

Phase 1: AI client infrastructure (privateer-sdk)

  • Add provider-neutral AI client configuration and backend selection
  • Add Analyze(prompt, content, schema) with structured output support when available and prompt-shaped JSON fallback otherwise
  • Add native adapters for OpenAI, Anthropic, and Gemini
  • Add shared handling for common failures so all checks fall back the same way
  • Add result metadata so AI-assisted results show which provider, model, prompt version, and schema version were used
  • Add call-budget controls with ai_max_calls
  • Add --dry-run-ai for prompt inspection and cost estimation without API calls

Phase 2: High-impact per-step checks (pvtr-github-repo-scanner)

  • DocumentsTestExecution
  • DocumentsTestMaintenancePolicy
  • EnsureLatestReleaseHasChangelog
  • HasDesignDocumentation

Phase 3: AI-assisted remediation guidance (pvtr-github-repo-scanner)

  • Add a post-processing pass over the output payload
  • Generate advisory remediation guidance for failed and NeedsReview results
  • Keep deterministic results unchanged

Phase 4: Medium-impact per-step checks

  • HasSupportDocs
  • HasContributionGuide
  • NoBinariesInRepo
  • CicdSanitizedInputParameters

Phase 5: Audit NotRun checks for AI-first candidates

Some NotRun checks may be better implemented as AI-first checks rather than heuristic checks. That audit should happen only after the first phases prove useful.

Open Questions

  1. Should AI-assisted results be weighted differently from deterministic results in aggregate scoring?
  2. Should remediation output live in a separate companion file or be embedded into the standard output?
  3. What is the right default budget for ai_max_calls once real usage data exists?
