Proposal: Optional AI-Enhanced Assessment Checks
Summary
Add optional AI support to improve the quality and accuracy of assessment checks that currently rely on brittle heuristics or require manual human review, and to provide post-evaluation remediation guidance when useful. Users opt in per evaluation with supported provider credentials. AI is never required.
Deterministic evaluation remains the baseline behavior. AI is an optional enhancement layer used only when configured and only for checks where deterministic logic is inconclusive or where advisory remediation guidance would be useful.
Non-Goals
This proposal does not attempt to do the following:
- Replace deterministic checks as the primary evaluation model.
- Introduce AI as a mandatory dependency for running the plugin.
- Commit to identical behavior across every provider or model family in the initial implementation.
- Solve broader evidence discovery or unrelated workflow changes outside the scope of this plugin proposal.
Motivation
Of the 57 assessment steps currently implemented in the plugin:
- 2 unconditionally return NeedsReview, offering zero automated analysis:
  - DocumentsTestExecution (OSPS-QA-06.02)
  - DocumentsTestMaintenancePolicy (OSPS-QA-06.03)
- 8 rely on brittle heuristics that miss valid cases or produce false results.
- 18 currently return NotRun and include possible future candidates for AI-assisted implementation.
Examples of current gaps:
| Assessment ID | Current Approach | Limitation |
| --- | --- | --- |
| OSPS-BR-04.01 | Case-sensitive match for "Change Log" or "Changelog" | Misses What's Changed, Release Notes, CHANGELOG, or links to CHANGELOG.md |
| OSPS-DO-04.01 | Exact heading match for "Support" in README | Misses Getting Help, Community Support, and similar headings |
| OSPS-SA-01.01 | Filename and directory matching | Finding docs/ does not prove design documentation exists |
| OSPS-GV-03.01 | File existence check only | Does not verify quality or completeness of a contribution guide |
| OSPS-QA-06.02 | Always returns NeedsReview | No automated analysis at all |
| OSPS-QA-06.03 | Always returns NeedsReview | No automated analysis at all |
Additionally, the output today is a flat list of per-check verdicts with no advisory remediation guidance. Users who see failed or NeedsReview checks must research the OSPS Baseline themselves to understand what to fix and how.
Proposed Capabilities
1. Per-step AI enhancement
Improve selected checks by adding AI when the deterministic result is inconclusive.
- If AI is not configured, behavior is unchanged.
- If AI is configured, deterministic logic still runs first.
- AI is invoked only when a check remains inconclusive or when the heuristic is likely to miss valid evidence.
- Each call returns a simple structured response.
- On timeout or error, the step falls back to the deterministic result.
- AI-assisted results are clearly labeled in output.
- Each AI-assisted check defines which files it sends, how large content is trimmed, and how prompt size is kept bounded.
Initial high-impact candidates:
| Assessment ID | AI-enhanced behavior |
| --- | --- |
| OSPS-QA-06.02 | Analyze README and CONTRIBUTING content for test execution instructions |
| OSPS-QA-06.03 | Analyze repository docs for test maintenance expectations |
| OSPS-BR-04.01 | Detect changelog content semantically, not just by exact heading match |
| OSPS-SA-01.01 | Verify whether candidate docs actually contain architecture or design content |
2. AI-assisted remediation guidance
After deterministic checks complete, the plugin can optionally generate advisory remediation guidance for Failed and NeedsReview results.
This capability does not override verdicts or re-evaluate repository evidence. It turns existing results into more actionable guidance by generating:
- a short explanation of why the requirement likely failed
- specific remediation steps tied to the OSPS requirement
- GitHub-specific guidance where relevant
- a small set of recommended next actions
The input to this pass is intentionally narrow: requirement ID, current verdict, existing step message, recommendation text from the control catalog, and minimal supporting metadata when needed.
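The narrow input described above could be represented as a small struct plus a prompt builder; no repository content crosses the boundary, only evaluation results and catalog text. Field and function names here are hypothetical.

```go
package main

import "fmt"

// remediationInput captures the intentionally narrow context sent to the
// model for advisory guidance. Field names are illustrative, not SDK code.
type remediationInput struct {
	RequirementID  string // e.g. "OSPS-QA-06.02"
	Verdict        string // "Failed" or "NeedsReview"
	StepMessage    string // existing deterministic step message
	Recommendation string // recommendation text from the control catalog
}

// buildPrompt turns the narrow input into a compact prompt. Note that no
// repository file content is included, keeping the pass cheap and bounded.
func buildPrompt(in remediationInput) string {
	return fmt.Sprintf(
		"Requirement %s is %s.\nStep message: %s\nCatalog recommendation: %s\n"+
			"Explain the likely gap and list concrete remediation steps.",
		in.RequirementID, in.Verdict, in.StepMessage, in.Recommendation)
}

func main() {
	p := buildPrompt(remediationInput{
		RequirementID:  "OSPS-QA-06.02",
		Verdict:        "NeedsReview",
		StepMessage:    "no test execution docs found",
		Recommendation: "Document how to run the test suite.",
	})
	fmt.Println(p)
}
```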
Structured Outputs
Per-step AI checks and remediation guidance can use structured outputs when the configured provider supports them. This is the preferred response mode because it makes responses easier to validate and parse.
Initial per-step schema:
```json
{
  "type": "object",
  "properties": {
    "verdict": {
      "type": "string",
      "enum": ["PASS", "FAIL", "UNCERTAIN"]
    },
    "confidence": {
      "type": "number",
      "minimum": 0.0,
      "maximum": 1.0
    },
    "reasoning": {
      "type": "string"
    },
    "evidence_location": {
      "type": "string"
    }
  },
  "required": ["verdict", "confidence", "reasoning"],
  "additionalProperties": false
}
```
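Decoding and validating a response against this schema is straightforward; a sketch in Go (struct and function names are illustrative) enforces the verdict enum, the confidence range, and the required fields before a result is accepted:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stepResponse mirrors the initial per-step schema. JSON tags match the
// schema's property names; the Go shape here is an illustrative decoding.
type stepResponse struct {
	Verdict          string  `json:"verdict"`
	Confidence       float64 `json:"confidence"`
	Reasoning        string  `json:"reasoning"`
	EvidenceLocation string  `json:"evidence_location,omitempty"`
}

// parseStepResponse decodes a model response and re-checks the schema's
// constraints, since not every provider enforces them natively.
func parseStepResponse(raw []byte) (stepResponse, error) {
	var r stepResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return r, err
	}
	switch r.Verdict {
	case "PASS", "FAIL", "UNCERTAIN":
	default:
		return r, fmt.Errorf("invalid verdict %q", r.Verdict)
	}
	if r.Confidence < 0 || r.Confidence > 1 {
		return r, fmt.Errorf("confidence %v out of range", r.Confidence)
	}
	if r.Reasoning == "" {
		return r, fmt.Errorf("missing required reasoning")
	}
	return r, nil
}

func main() {
	r, err := parseStepResponse([]byte(`{"verdict":"PASS","confidence":0.92,"reasoning":"README documents test execution"}`))
	fmt.Println(r.Verdict, err) // PASS <nil>
}
```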
Provider Compatibility
Both capabilities share the same AI client infrastructure, and the SDK will be provider-neutral rather than centered on one AI provider.
The initial support model stays narrow:
| Backend | Scope |
| --- | --- |
| OpenAI | First-class native adapter |
| Anthropic | First-class native adapter |
| Gemini | First-class native adapter |
The SDK will expose one provider-neutral Analyze(prompt, content, schema) contract while keeping the initial backend list intentionally small. Structured outputs, authentication, request formats, and tool-calling behavior vary by provider, so broader compatibility should be added only after the first end-to-end checks prove useful.
User Configuration
AI support remains fully opt-in through plugin configuration:
```yaml
plugins:
  github-repo:
    vars:
      owner: "my-org"
      repo: "my-repo"
      token: "ghp_..."
      ai_provider: "openai"   # openai | anthropic | gemini
      ai_api_key: "sk-..."
      ai_model: "gpt-4o"
      ai_timeout: "30s"
      ai_max_tokens: 256
      ai_max_calls: 20
```
AI is disabled unless ai_provider, ai_model, and ai_api_key are set. In dry-run mode, only ai_provider and ai_model are required.
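The gating rule above can be captured in one predicate (config field names here are illustrative): all three vars are required for real calls, while dry-run mode needs only the provider and model.

```go
package main

import "fmt"

// aiConfig mirrors the ai_* vars above; field names are illustrative.
type aiConfig struct {
	Provider string // ai_provider
	Model    string // ai_model
	APIKey   string // ai_api_key
}

// aiEnabled applies the opt-in rule: provider and model are always
// required; the API key is required only outside dry-run mode.
func aiEnabled(c aiConfig, dryRun bool) bool {
	if c.Provider == "" || c.Model == "" {
		return false
	}
	if dryRun {
		return true // dry-run inspects prompts without making API calls
	}
	return c.APIKey != ""
}

func main() {
	fmt.Println(aiEnabled(aiConfig{Provider: "openai", Model: "gpt-4o"}, true))  // true
	fmt.Println(aiEnabled(aiConfig{Provider: "openai", Model: "gpt-4o"}, false)) // false
}
```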
Where the Work Lives
The implementation spans two repos:
| Layer | Repo | What |
| --- | --- | --- |
| AI client infrastructure | privateerproj/privateer-sdk | Provider-neutral AIClient with native adapters for OpenAI, Anthropic, and Gemini |
| Per-step AI enhancements | ossf/pvtr-github-repo-scanner | Step-specific prompts, content extraction, fallback behavior, and [AI-Assisted] result labeling |
| Post-evaluation remediation | ossf/pvtr-github-repo-scanner | Advisory remediation guidance generated from evaluation results |
Security Considerations
- API keys are handled like GitHub tokens: read from config and never logged.
- Only publicly accessible repository content and evaluation results are included in prompts.
- AI responses are constrained to structured verdicts for per-step checks.
- AI calls go only to the configured provider endpoint for the selected native backend.
Cost Considerations
- AI is invoked only when configured and only for inconclusive checks or remediation guidance.
- Focused prompts and bounded ai_max_tokens keep calls constrained.
- ai_max_calls provides a hard cap per evaluation.
Implementation Phases
Phase 1: AI client infrastructure (privateer-sdk)
- Add provider-neutral AI client configuration and backend selection
- Add Analyze(prompt, content, schema) with structured output support when available and prompt-shaped JSON fallback otherwise
- Add native adapters for OpenAI, Anthropic, and Gemini
- Add shared handling for common failures so all checks fall back the same way
- Add result metadata so AI-assisted results show which provider, model, prompt version, and schema version were used
- Add call-budget controls with ai_max_calls
- Add --dry-run-ai for prompt inspection and cost estimation without API calls
Phase 2: High-impact per-step checks (pvtr-github-repo-scanner)
- DocumentsTestExecution
- DocumentsTestMaintenancePolicy
- EnsureLatestReleaseHasChangelog
- HasDesignDocumentation
Phase 3: AI-assisted remediation guidance (pvtr-github-repo-scanner)
- Add a post-processing pass over the output payload
- Generate advisory remediation guidance for Failed and NeedsReview results
- Keep deterministic results unchanged
Phase 4: Medium-impact per-step checks
- HasSupportDocs
- HasContributionGuide
- NoBinariesInRepo
- CicdSanitizedInputParameters
Phase 5: Audit NotRun checks for AI-first candidates
Some NotRun checks may be better implemented as AI-first checks rather than heuristic checks. That audit should happen only after the first phases prove useful.
Open Questions
- Should AI-assisted results be weighted differently from deterministic results in aggregate scoring?
- Should remediation output live in a separate companion file or be embedded into the standard output?
- What is the right default budget for ai_max_calls once real usage data exists?