feat(ci): implement documentation gap detection pipeline for InfluxDB 3 #7058
jstirnaman wants to merge 4 commits into master
Adds an automated documentation gap detection pipeline triggered by release events, scoped to InfluxDB 3 Core and Enterprise (Team Monolith).

## New modules

**`scripts/docs-cli/lib/doc-location-map.js`**

Inverted doc scanner: reads the content tree and extracts API references (operation-link frontmatter, api-endpoint shortcodes, curl commands, bare paths), then matches them against committed OpenAPI specs. Follows `source:` frontmatter pointers into shared content. Produces three artifact sets: confirmedMap (spec ops with prose coverage), orphaned (stale doc links), uncovered (ops with no coverage). Smoke-tested: 22/42 Core ops covered.

**`scripts/docs-cli/lib/gap-severity.js`**

Scores each undocumented operation: path-prefix tier × edition scope × change type → critical | high | medium | low. Health/ping/metrics are capped at low regardless of bumps. Write/query paths hitting both editions as new endpoints score critical.

**`scripts/docs-cli/lib/gap-reporter.js`**

Assembles severity-scored gap reports from a doc-location-map result and an optional spec delta (git diff on committed OpenAPI YAML between version tags; no source repo access required). Outputs structured JSON + markdown summary. Suggests doc paths from adjacent confirmed-map entries.

**`scripts/docs-cli/lib/issue-creator.js`**

Creates GitHub issues for high/critical gaps using the gh CLI. Builds structured issue bodies with spec claim, severity rationale, engineering verification ask, and definition-of-done checklist. Supports `--dry-run` mode (prints to stdout).

## Modified files

**`scripts/docs-cli/commands/audit.js`**

New flags: `--doc-location-map`, `--previous-version`, `--create-issue`, `--dry-run`. When `--doc-location-map` is set, runs the inverted scanner after the existing audit and optionally generates a gap report and files issues.

**`.github/workflows/influxdb3-release.yml`**

New job `audit-api-documentation` (between release notes and PR creation): uses git diff on committed OpenAPI specs to compute spec delta, runs doc-location-map, generates severity-scored gap report, uploads artifact. `create-documentation-pr` now shows gap summary in PR body. `create-audit-issue` now creates one GitHub issue per high/critical gap with full structured body. Release summary includes the new job.

**`.github/ISSUE_TEMPLATE/doc-gap-ticket.yml`**

GitHub Forms template for manually filing doc gaps: severity, edition scope, change type, operation ID, spec claim, suggested location, engineering verification ask, definition-of-done checklist.

https://claude.ai/code/session_01CpE2NxtgSre6spEHLrUw5M
Pull request overview
Implements an automated “documentation gap detection” pipeline for InfluxDB 3 releases by scanning docs content for API references, diffing committed OpenAPI specs between versions, scoring uncovered operations by severity, and generating artifacts/issues to drive remediation.
Changes:
- Added doc-location mapping + gap reporting modules to compute doc/spec coverage, spec deltas, and severity-scored gap reports.
- Extended `docs audit` with flags to run the inverted scanner, generate gap reports, and (optionally) create gap issues.
- Updated the InfluxDB 3 release workflow to run API gap analysis, upload artifacts, and create per-gap GitHub issues.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| `scripts/docs-cli/lib/doc-location-map.js` | Scans content for API signals and reconciles them against committed OpenAPI operationIds. |
| `scripts/docs-cli/lib/gap-severity.js` | Defines severity scoring rules and category labeling based on path/tags/scope/change type. |
| `scripts/docs-cli/lib/gap-reporter.js` | Computes spec deltas between git refs and produces JSON/markdown gap reports with suggestions. |
| `scripts/docs-cli/lib/issue-creator.js` | Adds a gh-CLI-based issue creator for actionable gaps (high/critical). |
| `scripts/docs-cli/commands/audit.js` | Adds CLI flags and wiring to run doc-location-map, gap report generation, and issue creation. |
| `.github/workflows/influxdb3-release.yml` | Introduces an API gap analysis job, threads results into PR body, and creates per-gap issues. |
| `.github/ISSUE_TEMPLATE/doc-gap-ticket.yml` | Adds a GitHub Forms template for manually filing documentation gaps. |
scripts/docs-cli/lib/gap-severity.js (outdated)

    {
      prefix: '/api/v3/configure/processing_engine_trigger/test',
      tier: 'low',
      cap: 'low',
    },

The PATH_TIER entry for /api/v3/configure/processing_engine_trigger/test will never be applied because an earlier, more general prefix (/api/v3/configure/processing_engine_trigger) matches first. Move the /test entry before the general trigger prefix (or adjust matching) so test endpoints are correctly capped at low severity.
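
One way to make the lookup independent of entry order is to pick the longest matching prefix rather than the first. The following is a minimal sketch under assumptions: the `matchTier` helper is hypothetical, the table is cut down to two entries, and the `medium` tier on the general trigger prefix is a placeholder, not the value used in `gap-severity.js`.

```javascript
// Hypothetical, shortened tier table. The general prefix is listed first on
// purpose to show that ordering no longer matters.
const PATH_TIER = [
  // 'medium' here is an assumed placeholder tier for illustration only.
  { prefix: '/api/v3/configure/processing_engine_trigger', tier: 'medium' },
  {
    prefix: '/api/v3/configure/processing_engine_trigger/test',
    tier: 'low',
    cap: 'low',
  },
];

// Return the entry whose prefix is the longest match for `path`,
// or null when no prefix matches.
function matchTier(path, tiers = PATH_TIER) {
  let best = null;
  for (const entry of tiers) {
    const matches = path.startsWith(entry.prefix);
    if (matches && (!best || entry.prefix.length > best.prefix.length)) {
      best = entry;
    }
  }
  return best;
}

const testEntry = matchTier('/api/v3/configure/processing_engine_trigger/test');
console.log(testEntry.cap);
```

With this approach the `/test` entry wins for test endpoints even though the general prefix appears earlier in the array. Reordering the array, as the comment above suggests, is the smaller change if first-match semantics are kept.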
scripts/docs-cli/lib/gap-severity.js (outdated)

    // Token management
    { prefix: '/api/v3/configure/token', tier: 'medium' },
    { prefix: '/api/v3/configure/token', tier: 'medium' },

Duplicate PATH_TIER entries for /api/v3/configure/token add noise and make future updates error-prone. Remove the duplicate entry so each prefix appears only once.
Suggested change:

    - { prefix: '/api/v3/configure/token', tier: 'medium' },

scripts/docs-cli/lib/gap-severity.js (outdated)

    if (
      path.startsWith('/api/v3/configure/token') ||
      path.startsWith('/api/v3/configure/token')
    )

This conditional checks the same /api/v3/configure/token prefix twice, which is redundant. Remove the duplicate check to keep the category mapping unambiguous and easier to maintain.
Suggested change:

    - if (
    -   path.startsWith('/api/v3/configure/token') ||
    -   path.startsWith('/api/v3/configure/token')
    - )
    + if (path.startsWith('/api/v3/configure/token'))

scripts/docs-cli/lib/issue-creator.js

     * @module issue-creator
     */

    import { execSync, execFileSync } from 'child_process';

execFileSync is imported but never used. Remove the unused import or switch the gh invocations to execFileSync (recommended) to avoid dead code and lint failures.
scripts/docs-cli/lib/issue-creator.js

      const labelArgs = labels.map((l) => `--label "${l}"`).join(' ');
      execSync(
        `gh issue create --title "${title.replace(/"/g, '\\"')}" --body "${body.replace(/"/g, '\\"').replace(/\n/g, '\\n')}" ${labelArgs}`,
        { stdio: ['pipe', 'inherit', 'pipe'] }
      );
      console.log(` ✓ Created: ${title}`);
    } catch (err) {
      // If label doesn't exist in repo, retry without product labels
      try {
        execSync(
          `gh issue create --title "${title.replace(/"/g, '\\"')}" --body "${body.replace(/"/g, '\\"').replace(/\n/g, '\\n')}" --label "documentation" --label "doc-gap"`,

The gh issue create call is built as a shell string and only escapes double-quotes/newlines. This is both fragile (issue bodies will likely contain literal \n instead of real newlines) and unsafe (shell expansion like $()/backticks can execute if spec-derived fields contain them). Use execFileSync('gh', [...]) (no shell) and pass the body as an argument with real newlines, or write the body to a temp file and use --body-file.
Suggested change:

    -   const labelArgs = labels.map((l) => `--label "${l}"`).join(' ');
    -   execSync(
    -     `gh issue create --title "${title.replace(/"/g, '\\"')}" --body "${body.replace(/"/g, '\\"').replace(/\n/g, '\\n')}" ${labelArgs}`,
    -     { stdio: ['pipe', 'inherit', 'pipe'] }
    -   );
    -   console.log(` ✓ Created: ${title}`);
    - } catch (err) {
    -   // If label doesn't exist in repo, retry without product labels
    -   try {
    -     execSync(
    -       `gh issue create --title "${title.replace(/"/g, '\\"')}" --body "${body.replace(/"/g, '\\"').replace(/\n/g, '\\n')}" --label "documentation" --label "doc-gap"`,
    +   const issueArgs = [
    +     'issue',
    +     'create',
    +     '--title',
    +     title,
    +     '--body',
    +     body,
    +     ...labels.flatMap((label) => ['--label', label]),
    +   ];
    +   execFileSync('gh', issueArgs, { stdio: ['pipe', 'inherit', 'pipe'] });
    +   console.log(` ✓ Created: ${title}`);
    + } catch (err) {
    +   // If label doesn't exist in repo, retry without product labels
    +   try {
    +     execFileSync(
    +       'gh',
    +       [
    +         'issue',
    +         'create',
    +         '--title',
    +         title,
    +         '--body',
    +         body,
    +         '--label',
    +         'documentation',
    +         '--label',
    +         'doc-gap',
    +       ],

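
To illustrate why an argument array avoids both problems named in the comment, here is a small sketch. `buildIssueArgs` is a hypothetical helper, not a function from the PR; it only assembles the argv for `gh issue create` and performs no escaping, because nothing is interpreted by a shell when the array is later handed to `execFileSync`.

```javascript
// Hypothetical helper: build the argv for `gh issue create` as an array.
// Each element becomes exactly one argument, so quotes, newlines, `$()` and
// backticks in spec-derived text are passed through literally.
function buildIssueArgs(title, body, labels = []) {
  return [
    'issue',
    'create',
    '--title',
    title,
    '--body',
    body,
    ...labels.flatMap((label) => ['--label', label]),
  ];
}

// Sample values chosen to include characters a shell would interpret.
const args = buildIssueArgs(
  'Document "POST /api/v3/write_lp"',
  'First line\nSecond line with $(whoami) and `backticks`',
  ['documentation', 'doc-gap']
);

console.log(args.length);
```

The resulting array would be passed as the second argument to `execFileSync('gh', args, ...)` from Node's `child_process` module. For very long bodies, writing the text to a temporary file and using `--body-file` is the alternative the comment mentions.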
scripts/docs-cli/commands/audit.js

    const mapResult = await runDocLocationMap(influxProduct, {
      ...(previousVersion && { previousVersion }),
    });

runDocLocationMap() doesn't accept or use a previousVersion option, so this argument currently has no effect despite the flag help text implying delta scoping. If you want delta-scoped scanning, compute a spec delta first and pass filterOperationIds, or remove the unused option to avoid confusing behavior.
Suggested change:

    - const mapResult = await runDocLocationMap(influxProduct, {
    -   ...(previousVersion && { previousVersion }),
    - });
    + const mapResult = await runDocLocationMap(influxProduct);

scripts/docs-cli/commands/audit.js

    } else if (arg === '--create-issue') {
      createIssueFlag = true;
    } else if (arg === '--dry-run') {

--create-issue is documented as requiring --doc-location-map, but if a user passes --create-issue alone the command silently does nothing. Add a validation error when createIssueFlag is true but runDocLocationMapFlag is false (and similarly for --dry-run without --create-issue if desired).
Suggested change:

    - } else if (arg === '--create-issue') {
    -   createIssueFlag = true;
    - } else if (arg === '--dry-run') {
    + } else if (arg === '--create-issue') {
    +   const hasDocLocationMapFlag =
    +     runDocLocationMapFlag || positionals.slice(i + 1).includes('--doc-location-map');
    +   if (!hasDocLocationMapFlag) {
    +     console.error('Error: --create-issue requires --doc-location-map');
    +     process.exit(1);
    +   }
    +   createIssueFlag = true;
    + } else if (arg === '--dry-run') {
    +   const hasCreateIssueFlag =
    +     createIssueFlag || positionals.slice(i + 1).includes('--create-issue');
    +   if (!hasCreateIssueFlag) {
    +     console.error('Error: --dry-run requires --create-issue');
    +     process.exit(1);
    +   }

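
An alternative to looking ahead inside the parsing loop is to validate flag dependencies once, after all arguments have been read. The sketch below assumes simple boolean flags; `parseFlags` and `validateFlags` are hypothetical names for illustration and do not mirror the actual parser in `audit.js`.

```javascript
// Hypothetical parser: record which boolean flags are present, in any order.
function parseFlags(argv) {
  const seen = new Set(argv);
  return {
    docLocationMap: seen.has('--doc-location-map'),
    createIssue: seen.has('--create-issue'),
    dryRun: seen.has('--dry-run'),
  };
}

// Check dependencies after parsing so flag order does not matter.
// Returns a list of error messages; an empty list means the flags are valid.
function validateFlags({ docLocationMap, createIssue, dryRun }) {
  const errors = [];
  if (createIssue && !docLocationMap) {
    errors.push('--create-issue requires --doc-location-map');
  }
  if (dryRun && !createIssue) {
    errors.push('--dry-run requires --create-issue');
  }
  return errors;
}

const errors = validateFlags(parseFlags(['--create-issue']));
for (const message of errors) {
  console.error(`Error: ${message}`);
}
```

A caller would typically exit with a non-zero status when the returned list is non-empty. Returning messages instead of exiting inside the check keeps the rule easy to unit test.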
scripts/docs-cli/lib/gap-reporter.js

    try {
      if (toRef === 'HEAD') {
        newContent = execSync(`cat "${join(repoRoot, specRelPath)}"`, {
          encoding: 'utf-8',
        });
      } else {

Using execSync('cat ...') to read the spec is unnecessary and breaks on Windows. Read the file via fs.readFileSync/fs.promises.readFile instead (and avoid invoking a shell).
scripts/docs-cli/lib/doc-location-map.js

    const results = new Map(); // operationId → [signal entries]
    const orphanedRefs = []; // { docPath, operationId } where operationId not in spec

orphanedRefs is initialized and returned from scanContentFiles() but is never populated or consumed (the caller only uses coverageMap). Remove it (and related comments) or implement it fully to avoid misleading dead code.
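
For context, orphan tracking can be derived in a single reconciliation pass instead of during the content scan. The sketch below is an illustration under assumptions and is not the PR's `reconcile()`: the function name `reconcileRefs`, the input shapes, and the sample operation IDs and doc paths are all hypothetical.

```javascript
// Hypothetical reconciliation: split doc references into confirmed and
// orphaned, then list spec operations that no doc page references.
function reconcileRefs(specOperationIds, docRefs) {
  const specIds = new Set(specOperationIds);
  const confirmed = new Map(); // operationId → [docPath, ...]
  const orphaned = []; // refs whose operationId is not in the spec

  for (const { docPath, operationId } of docRefs) {
    if (specIds.has(operationId)) {
      if (!confirmed.has(operationId)) confirmed.set(operationId, []);
      confirmed.get(operationId).push(docPath);
    } else {
      orphaned.push({ docPath, operationId });
    }
  }

  // Spec operations with no confirmed doc reference.
  const uncovered = [...specIds].filter((id) => !confirmed.has(id));
  return { confirmed, orphaned, uncovered };
}

// Sample data, invented for illustration.
const result = reconcileRefs(
  ['PostWrite', 'GetQuery', 'GetHealth'],
  [
    { docPath: 'content/write.md', operationId: 'PostWrite' },
    { docPath: 'content/old.md', operationId: 'RemovedOp' },
  ]
);

console.log(result.uncovered);
```

Keeping all three sets in one function means the scanner only has to collect raw references, which matches the comment's point that a second, unpopulated orphan list in the scanner is dead code.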
.github/workflows/influxdb3-release.yml

    ${{ needs.audit-api-documentation.outputs.gap_report_generated == 'true' && format('🔴 Critical: **{0}** | 🟠 High: **{1}**', needs.audit-api-documentation.outputs.critical_count, needs.audit-api-documentation.outputs.high_count) || 'Gap analysis did not run or was skipped.' }}

    See the [gap report artifact](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}) for the full list of uncovered operations with severity scores and suggested doc locations.
    High/critical gaps have been automatically filed as individual GitHub issues.

The PR body text says high/critical gaps "have been automatically filed" as issues, but issue creation happens in the separate create-audit-issue job which may not have run yet (or could fail) when the PR is created. Consider rephrasing to be conditional/future-tense or moving per-gap issue creation before PR creation so the statement is accurate.
Suggested change:

    - High/critical gaps have been automatically filed as individual GitHub issues.
    + High/critical gaps will be automatically filed as individual GitHub issues if the issue-creation job runs successfully.

- gap-severity: reorder PATH_TIER so test endpoint prefix matches before
its parent trigger prefix; remove duplicate /configure/token entry;
collapse redundant token check in deriveCategoryLabel to single startsWith
- issue-creator: replace execSync shell-string gh calls with execFileSync +
temp body file to prevent shell injection from spec-derived field values;
remove unused execFileSync import and replace with fs writeFileSync/unlinkSync
- audit: add early validation that --create-issue requires --doc-location-map;
wire --previous-version to computeSpecDelta so filterOperationIds is built
from the spec delta and passed to runDocLocationMap (was silently ignored)
- gap-reporter: replace execSync('cat ...') with readFileSync (cross-platform,
no subprocess); add readFileSync to existing fs import
- doc-location-map: remove never-populated orphanedRefs from scanContentFiles
(orphan tracking is handled correctly by reconcile())
- influxdb3-release.yml: add create-documentation-pr to create-audit-issue
needs so issue creation always runs after PR creation; fix inaccurate PR body
claim that issues 'have been filed' before create-audit-issue has run
https://claude.ai/code/session_01CpE2NxtgSre6spEHLrUw5M