feat: auto-bootstrap docs snapshot + safe refresh UX#24
Conversation
WalkthroughAdds an auto-bootstrap and managed refresh system for the Palantir docs snapshot (data/docs.parquet), structured progress/event reporting for snapshot and rescrape flows, new refresh commands, tests, and a manual GitHub Actions workflow that can regenerate the snapshot and open a PR. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Plugin as Plugin (MCP)
participant SnapMgr as Snapshot Manager
participant Net as Network
participant FS as File System
rect rgba(100, 150, 255, 0.5)
Note over User,Plugin: Auto-bootstrap or /refresh-docs flow (snapshot)
Plugin->>SnapMgr: ensureDocsParquet({force: false/true})
SnapMgr->>FS: stat dbPath
alt exists and valid (force=false)
FS-->>SnapMgr: ok
SnapMgr-->>Plugin: {dbPath, source: existing, changed:false}
else
SnapMgr->>Net: try download URLs
Net-->>SnapMgr: (success / fail)
alt download succeeds
SnapMgr->>FS: writeBufferAtomic -> docs.parquet
SnapMgr-->>Plugin: {dbPath, source: download, changed:true}
else download fails
SnapMgr->>FS: copy bundled snapshot
FS-->>SnapMgr: copied / fail
alt copy succeeds
SnapMgr-->>Plugin: {dbPath, source: bundled-copy, changed:true}
else
SnapMgr-->>Plugin: error (all sources failed)
end
end
end
Plugin-->>User: progress/events + final result
end
rect rgba(255, 150, 100, 0.5)
Note over User,Plugin: /refresh-docs-rescrape (unsafe)
User->>Plugin: /refresh-docs-rescrape
Plugin->>Plugin: warn: unsafe/experimental
Plugin->>Net: fetchAllDocs(..., onProgress)
Net-->>Plugin: progress events (discovered, progress, page-failed)
Plugin->>FS: write docs.parquet
Plugin-->>User: structured summary (fetched/failed/bytes)
end
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@README.md`:
- Around line 134-148: Change the heading text "First run behavior" to the
hyphenated form "First-run behavior" in README.md; locate the heading that
currently reads "### First run behavior" and update it to "### First-run
behavior" (preserve heading level and surrounding text).
In `@src/__tests__/index.test.ts`:
- Around line 333-364: The test is using an any[] for output.parts; define a
small OutputPart shape and use it instead: create a local type alias (e.g. type
OutputPart = { type: 'text'; text: string }) and replace occurrences of `const
output = { parts: [] as any[] }` with `const output: { parts: OutputPart[] } = {
parts: [] }` in the test case that uses hookFn and the two other similar cases
(the instances around the current block and the ones at the other two
locations), ensuring the variable name `output` and the assertions against
output.parts remain unchanged.
In `@src/docs/__tests__/snapshot.test.ts`:
- Around line 17-35: The test deletes OPENCODE_PALANTIR_DOCS_SNAPSHOT_URL and
OPENCODE_PALANTIR_DOCS_SNAPSHOT_URLS in beforeEach but does not restore them,
which can leak state; capture their original values at the start of beforeEach
(e.g., save originals like originalSnapshotUrl and originalSnapshotUrls) and
then restore them in afterEach (set
process.env.OPENCODE_PALANTIR_DOCS_SNAPSHOT_URL = originalSnapshotUrl or delete
it if undefined, likewise for OPENCODE_PALANTIR_DOCS_SNAPSHOT_URLS), updating
the existing beforeEach/afterEach blocks in snapshot.test.ts to save and restore
these env vars while keeping the current cleanup (globalThis.fetch,
vi.restoreAllMocks, fs.rmSync).
🧹 Nitpick comments (3)
src/docs/fetch.ts (1)
49-51: Consider extractingformatErrorto a shared utility module.This helper is duplicated identically in
src/index.ts(line 32) andsrc/docs/snapshot.ts(line 40). Extracting it to a shared utilities file would improve maintainability.src/index.ts (1)
165-184: Potential stale state ifresetDb()is called during pending initialization.If
resetDb()is called whiledbInitPromiseis still pending, the.then()callback will still execute and setdbInstanceto the created database afterresetDb()has cleared it. This could leave the state inconsistent.In practice, the current usage pattern (calling
resetDb()afterensureDocsAvailable()orfetchAllDocs()completes) should be safe, but the code isn't defensive against misuse.🛡️ Suggested defensive fix
+ let dbInitGeneration = 0; + function resetDb(): void { if (dbInstance) closeDatabase(dbInstance); dbInstance = null; dbInitPromise = null; + dbInitGeneration += 1; } async function getDb(): Promise<ParquetStore> { if (dbInstance) return dbInstance; if (dbInitPromise) return dbInitPromise; + const generation = dbInitGeneration; dbInitPromise = createDatabase(dbPath) .then((created) => { - dbInstance = created; + if (dbInitGeneration === generation) { + dbInstance = created; + } return created; }) .finally(() => { dbInitPromise = null; }); return dbInitPromise; }src/docs/snapshot.ts (1)
92-102: Atomic write operations don't clean up temp files on failure.If
fs.renamefails inwriteBufferAtomicorcopyFileAtomic, the temporary file will be left behind. This could accumulate orphaned.tmp.*files over time.🧹 Suggested fix with cleanup on failure
async function writeBufferAtomic(dbPath: string, bytes: Uint8Array): Promise<void> { const tmp = tempPathFor(dbPath); - await fs.writeFile(tmp, bytes); - await fs.rename(tmp, dbPath); + try { + await fs.writeFile(tmp, bytes); + await fs.rename(tmp, dbPath); + } catch (err) { + await fs.unlink(tmp).catch(() => {}); + throw err; + } } async function copyFileAtomic(sourcePath: string, dbPath: string): Promise<void> { const tmp = tempPathFor(dbPath); - await fs.copyFile(sourcePath, tmp); - await fs.rename(tmp, dbPath); + try { + await fs.copyFile(sourcePath, tmp); + await fs.rename(tmp, dbPath); + } catch (err) { + await fs.unlink(tmp).catch(() => {}); + throw err; + } }
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/__tests__/index.test.ts`:
- Around line 295-332: The tests import plugin at module scope so src/index.ts
captures direct bindings to ensureDocsParquet and fetchAllDocs, making later
vi.spyOn calls ineffective; fix each affected test by resetting modules
(vi.resetModules()), then locally importing the modules that export
ensureDocsParquet and fetchAllDocs, apply vi.spyOn or mockImplementation on
those module exports, and only after that import the plugin (e.g. await
import('.../src/index') or require) so the plugin captures the spied/mocked
functions; update the tests around the uses of ensureDocsParquet and
fetchAllDocs (and the affected test blocks referenced) to follow this pattern.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/__tests__/index.test.ts (1)
433-436: Redundantspy.mockRestore()call.The explicit
spy.mockRestore()on line 435 is unnecessary sincevi.restoreAllMocks()is already called inafterEach(line 35), which will restore all mocks including this spy.♻️ Suggested cleanup
expect(spy).not.toHaveBeenCalled(); expect(output.parts).toHaveLength(0); - spy.mockRestore(); });
Summary
data/docs.parquetvia a new concurrency-safeensureDocsParquet()path used by startup + doc tools/refresh-docsthe recommended prebuilt snapshot refresh command and add/refresh-docs-rescrapeas an explicit unsafe fallbackconsole.logoutput with structured progress events/output and tighten refresh/result summariesRefresh Docs Snapshotworkflow (workflow_dispatch) that regeneratesdata/docs.parquetand opens a PR with fetch countsdata/docs.parquetin release-please configIssue
Closes #19
Validation
mise run formatmise run lintmise run testmise run build/refresh-docsoutput includes structured snapshot source/indexed counts.github/workflows/*.ymlSummary by CodeRabbit
New Features
/refresh-docs-rescrape, improved/refresh-docs, and automatic docs snapshot bootstrap with progress/status reporting.Documentation
Tests
Chores