
feat: auto-bootstrap docs snapshot + safe refresh UX#24

Merged
anand-testcompare merged 3 commits into main from 19-auto-download-palantir-docs-database-on-plugin-install-post-install-hook on Feb 16, 2026

Conversation

@anand-testcompare anand-testcompare commented Feb 16, 2026

Owner

Summary

  • auto-bootstrap data/docs.parquet via a new concurrency-safe ensureDocsParquet() path used by startup + doc tools
  • make /refresh-docs the recommended prebuilt snapshot refresh command and add /refresh-docs-rescrape as an explicit unsafe fallback
  • replace noisy refresh console.log output with structured progress events/output and tighten refresh/result summaries
  • add manual Refresh Docs Snapshot workflow (workflow_dispatch) that regenerates data/docs.parquet and opens a PR with fetch counts
  • protect release flow from snapshot-only changes by excluding data/docs.parquet in release-please config
  • update README + tests (including new snapshot unit tests and fresh auto-bootstrap integration coverage)
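
The concurrency-safe behavior described for ensureDocsParquet() can be pictured as a single-flight wrapper, where concurrent callers join one in-flight promise instead of starting duplicate work. The sketch below is illustrative only and is not taken from the plugin: `singleFlight`, `bootstrapOnce`, and `bootstrapRuns` are hypothetical names, and the result shape is modeled on the {dbPath, source, changed} values shown in the walkthrough diagram.

```typescript
// Result shape assumed from the walkthrough diagram; the real type may differ.
type SnapshotResult = {
  dbPath: string;
  source: "existing" | "download" | "bundled-copy";
  changed: boolean;
};

// Hypothetical helper: wraps an async task so overlapping calls share one promise.
function singleFlight<T>(task: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight) return inFlight; // join the run that is already in progress
    inFlight = task().then(
      (value) => {
        inFlight = null; // allow a fresh run after success
        return value;
      },
      (error) => {
        inFlight = null; // allow a retry after failure
        throw error;
      },
    );
    return inFlight;
  };
}

// Stand-in for the real bootstrap work (download or bundled copy).
let bootstrapRuns = 0;
async function bootstrapOnce(): Promise<SnapshotResult> {
  bootstrapRuns += 1;
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return { dbPath: "data/docs.parquet", source: "download", changed: true };
}

const ensureSnapshot = singleFlight(bootstrapOnce);
```

Calling `ensureSnapshot()` twice before the first call settles runs `bootstrapOnce` once; a call made after it settles starts a new run.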

Issue

Closes #19

Validation

  • mise run format
  • mise run lint
  • mise run test
  • mise run build
  • fresh-project packed-plugin smoke (new temp repo): command registration + first-run docs bootstrap + tool execution
  • fresh-project packed-plugin command smoke: /refresh-docs output includes structured snapshot source/indexed counts
  • workflow YAML parse check via Ruby YAML loader across .github/workflows/*.yml

Summary by CodeRabbit

  • New Features

    • Added experimental /refresh-docs-rescrape, improved /refresh-docs, and automatic docs snapshot bootstrap with progress/status reporting.
  • Documentation

    • README updated with first-run behavior, new refresh commands, and refreshed docs workflow guidance.
  • Tests

    • Expanded tests for snapshot bootstrapping, rescrape flows, progress events, concurrency, and error handling.
  • Chores

    • Added a manual workflow to refresh docs snapshots and excluded the docs data file from release processing.

@anand-testcompare anand-testcompare linked an issue Feb 16, 2026 that may be closed by this pull request
11 tasks
coderabbitai Bot commented Feb 16, 2026


Walkthrough

Adds an auto-bootstrap and managed refresh system for the Palantir docs snapshot (data/docs.parquet), structured progress/event reporting for snapshot and rescrape flows, new refresh commands, tests, and a manual GitHub Actions workflow that can regenerate the snapshot and open a PR.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI Workflow & Release Config: `.github/workflows/refresh-docs-snapshot.yml`, `release-please-config.json` | Adds a manual workflow_dispatch job to rescrape docs and open a PR with a generated summary; excludes data/docs.parquet from release processing. |
| Documentation: `README.md` | Updates docs to describe first-run auto-bootstrap behavior and documents two refresh commands: /refresh-docs (recommended snapshot) and /refresh-docs-rescrape (unsafe/experimental rescrape). |
| Snapshot Manager (new): `src/docs/snapshot.ts` | New module exposing ensureDocsParquet() with prioritized download URLs, bundled-copy fallback, atomic write/copy, size validation, deduplication of concurrent callers, event emission, and rich error aggregation. |
| Fetch & CLI: `src/docs/fetch.ts`, `src/docs/fetch-cli.ts` | Makes fetchAllDocs accept options (progressEvery, onProgress), emits structured progress events, generates a runtime summary, and optionally writes summary to disk when configured. |
| Core Integration: `src/index.ts` | Integrates snapshot lifecycle and single-flight DB init, registers refresh-docs and refresh-docs-rescrape commands, implements progress/status formatting for snapshot/rescrape flows, and makes doc tools join/await snapshot bootstrap. |
| Tests: `src/__tests__/configHook.test.ts`, `src/__tests__/index.test.ts`, `src/docs/__tests__/fetch.test.ts`, `src/docs/__tests__/snapshot.test.ts` | Adds and updates tests to cover auto-bootstrap, command registration, progress callbacks, concurrency/deduplication, bundled fallback, rescrape reporting, and failure modes. |
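
The Snapshot Manager entry above mentions prioritized sources, a fallback, and aggregated errors. One way to picture that combination is an ordered list of loaders tried in turn, with every failure collected into the final error. This is a sketch under assumptions, not the contents of src/docs/snapshot.ts: `Source`, `firstSuccessful`, and the example loaders are hypothetical.

```typescript
// Hypothetical description of one place a snapshot could come from.
type Source<T> = { name: string; load: () => Promise<T> };

// Try each source in priority order; return the first success.
// If every source fails, throw one error that lists each failure.
async function firstSuccessful<T>(
  sources: Source<T>[],
): Promise<{ name: string; value: T }> {
  const failures: string[] = [];
  for (const source of sources) {
    try {
      return { name: source.name, value: await source.load() };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      failures.push(`${source.name}: ${message}`);
    }
  }
  throw new Error(`all sources failed\n${failures.join("\n")}`);
}

// Example wiring: a download that fails, then a bundled copy that works.
const exampleSources: Source<Uint8Array>[] = [
  {
    name: "download",
    load: async () => {
      throw new Error("network unreachable");
    },
  },
  { name: "bundled-copy", load: async () => new Uint8Array([1, 2, 3]) },
];
```

Keeping the per-source messages in the thrown error means a caller can report why each option was rejected rather than only the last failure.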

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Plugin as Plugin (MCP)
    participant SnapMgr as Snapshot Manager
    participant Net as Network
    participant FS as File System

    rect rgba(100, 150, 255, 0.5)
    Note over User,Plugin: Auto-bootstrap or /refresh-docs flow (snapshot)

    Plugin->>SnapMgr: ensureDocsParquet({force: false/true})
    SnapMgr->>FS: stat dbPath
    alt exists and valid (force=false)
        FS-->>SnapMgr: ok
        SnapMgr-->>Plugin: {dbPath, source: existing, changed:false}
    else
        SnapMgr->>Net: try download URLs
        Net-->>SnapMgr: (success / fail)
        alt download succeeds
            SnapMgr->>FS: writeBufferAtomic -> docs.parquet
            SnapMgr-->>Plugin: {dbPath, source: download, changed:true}
        else download fails
            SnapMgr->>FS: copy bundled snapshot
            FS-->>SnapMgr: copied / fail
            alt copy succeeds
                SnapMgr-->>Plugin: {dbPath, source: bundled-copy, changed:true}
            else
                SnapMgr-->>Plugin: error (all sources failed)
            end
        end
    end
    Plugin-->>User: progress/events + final result
    end

    rect rgba(255, 150, 100, 0.5)
    Note over User,Plugin: /refresh-docs-rescrape (unsafe)

    User->>Plugin: /refresh-docs-rescrape
    Plugin->>Plugin: warn: unsafe/experimental
    Plugin->>Net: fetchAllDocs(..., onProgress)
    Net-->>Plugin: progress events (discovered, progress, page-failed)
    Plugin->>FS: write docs.parquet
    Plugin-->>User: structured summary (fetched/failed/bytes)
    end
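
The rescrape half of the diagram reports progress through events (discovered, progress, page-failed) and ends with a fetched/failed/bytes summary. The following sketch shows one possible shape for that contract. The option names progressEvery and onProgress come from the change summary; the event payload fields, `fetchAllSketch`, and the injected `fetchPage` function are assumptions made for illustration and are not the real fetchAllDocs signature.

```typescript
// Event names follow the diagram; payload fields are assumed.
type ProgressEvent =
  | { type: "discovered"; total: number }
  | { type: "progress"; processed: number; total: number }
  | { type: "page-failed"; url: string; error: string };

interface FetchOptions {
  progressEvery?: number; // emit a progress event every N processed pages
  onProgress?: (event: ProgressEvent) => void;
}

interface FetchSummary {
  fetched: number;
  failed: number;
  bytes: number;
}

// Hypothetical fetch loop: pages are fetched one by one through an injected function.
async function fetchAllSketch(
  urls: string[],
  fetchPage: (url: string) => Promise<Uint8Array>,
  options: FetchOptions = {},
): Promise<FetchSummary> {
  const { progressEvery = 10, onProgress } = options;
  const summary: FetchSummary = { fetched: 0, failed: 0, bytes: 0 };
  onProgress?.({ type: "discovered", total: urls.length });

  let processed = 0;
  for (const url of urls) {
    try {
      const body = await fetchPage(url);
      summary.fetched += 1;
      summary.bytes += body.byteLength;
    } catch (err) {
      summary.failed += 1;
      const error = err instanceof Error ? err.message : String(err);
      onProgress?.({ type: "page-failed", url, error });
    }
    processed += 1;
    // Report at the configured interval and always after the last page.
    if (processed % progressEvery === 0 || processed === urls.length) {
      onProgress?.({ type: "progress", processed, total: urls.length });
    }
  }
  return summary;
}
```

Routing all output through `onProgress` lets the caller decide how to render it, which is the property that makes replacing ad hoc console.log calls straightforward.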

Poem

🐇 I hopped to fetch a snapshot bright,
No noisy logs to wake the night,
Two safe paths—download or brave rescrape,
Atomic writes and dedupe escape,
A tidy PR blooms—now time for a bite!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title 'feat: auto-bootstrap docs snapshot + safe refresh UX' directly and clearly summarizes the main changes: automatic docs snapshot bootstrapping and improved refresh user experience. |
| Linked Issues check | ✅ Passed | The pull request addresses all core coding requirements from issue #19: auto-bootstrap via ensureDocsParquet(), /refresh-docs and /refresh-docs-rescrape commands registered, structured progress output replacing console.log, concurrency-safe deduplication, CI workflow for manual snapshot refresh, and release-please config exclusion. |
| Out of Scope Changes check | ✅ Passed | All changes align with issue #19 scope: snapshot bootstrap implementation, command registration, progress event handling, CI workflow, release config, tests, and documentation updates. No unrelated modifications detected. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into main |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@README.md`:
- Around line 134-148: Change the heading text "First run behavior" to the
hyphenated form "First-run behavior" in README.md; locate the heading that
currently reads "### First run behavior" and update it to "### First-run
behavior" (preserve heading level and surrounding text).

In `@src/__tests__/index.test.ts`:
- Around line 333-364: The test types output.parts as any[]. Define a small
OutputPart shape and use it instead: create a local type alias (e.g. type
OutputPart = { type: 'text'; text: string }) and replace each occurrence of
`const output = { parts: [] as any[] }` with `const output: { parts:
OutputPart[] } = { parts: [] }`, both in the test case that uses hookFn and in
the two other similar test cases. Keep the variable name `output` and the
assertions against output.parts unchanged.

In `@src/docs/__tests__/snapshot.test.ts`:
- Around line 17-35: The test deletes OPENCODE_PALANTIR_DOCS_SNAPSHOT_URL and
OPENCODE_PALANTIR_DOCS_SNAPSHOT_URLS in beforeEach but does not restore them,
which can leak state; capture their original values at the start of beforeEach
(e.g., save originals like originalSnapshotUrl and originalSnapshotUrls) and
then restore them in afterEach (set
process.env.OPENCODE_PALANTIR_DOCS_SNAPSHOT_URL = originalSnapshotUrl or delete
it if undefined, likewise for OPENCODE_PALANTIR_DOCS_SNAPSHOT_URLS), updating
the existing beforeEach/afterEach blocks in snapshot.test.ts to save and restore
these env vars while keeping the current cleanup (globalThis.fetch,
vi.restoreAllMocks, fs.rmSync).
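
A minimal sketch of the save-and-restore pattern this comment asks for, written as a framework-independent helper so it can be read on its own. `captureEnv` and `EnvLike` are hypothetical names, and the plain object stands in for process.env; in the test file the capture call would sit in beforeEach and the returned function would run in afterEach.

```typescript
// Minimal view of an environment map; process.env satisfies this shape.
type EnvLike = Record<string, string | undefined>;

// Hypothetical helper: remember the current values of `keys` and return
// a function that puts them back, deleting keys that were originally unset.
function captureEnv(env: EnvLike, keys: readonly string[]): () => void {
  const saved = keys.map((key) => ({ key, value: env[key] }));
  return () => {
    for (const { key, value } of saved) {
      if (value === undefined) {
        delete env[key];
      } else {
        env[key] = value;
      }
    }
  };
}

// Example with a stand-in environment object.
const fakeEnv: EnvLike = { OPENCODE_PALANTIR_DOCS_SNAPSHOT_URL: "https://example.invalid/docs.parquet" };
const restoreEnv = captureEnv(fakeEnv, [
  "OPENCODE_PALANTIR_DOCS_SNAPSHOT_URL",
  "OPENCODE_PALANTIR_DOCS_SNAPSHOT_URLS",
]);

// A test would now be free to delete or overwrite the variables...
delete fakeEnv.OPENCODE_PALANTIR_DOCS_SNAPSHOT_URL;
fakeEnv.OPENCODE_PALANTIR_DOCS_SNAPSHOT_URLS = "a,b";

// ...and cleanup returns the environment to its original state.
restoreEnv();
```

Deleting a key that was originally unset matters because assigning undefined to process.env stores the string "undefined" rather than clearing the variable.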
🧹 Nitpick comments (3)
src/docs/fetch.ts (1)

49-51: Consider extracting formatError to a shared utility module.

This helper is duplicated identically in src/index.ts (line 32) and src/docs/snapshot.ts (line 40). Extracting it to a shared utilities file would improve maintainability.
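
As an illustration of the extraction, a shared helper could look like the sketch below. The body of the real formatError is not shown in this review, so the implementation here is an assumption (a common error-to-string normalization), and the suggested file location is hypothetical.

```typescript
// Hypothetical shared module, e.g. a utilities file imported by
// src/index.ts, src/docs/fetch.ts, and src/docs/snapshot.ts.
// Assumed behavior: turn any thrown value into a printable message.
function formatError(err: unknown): string {
  if (err instanceof Error) return err.message;
  return String(err);
}

// Example: both Error instances and non-Error throwables are handled.
const fromError = formatError(new Error("download failed"));
const fromString = formatError("timeout");
```

With a single definition, a later change to error formatting (for example, including a cause) would apply to all three call sites at once.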

src/index.ts (1)

165-184: Potential stale state if resetDb() is called during pending initialization.

If resetDb() is called while dbInitPromise is still pending, the .then() callback will still execute and set dbInstance to the created database after resetDb() has cleared it. This could leave the state inconsistent.

In practice, the current usage pattern (calling resetDb() after ensureDocsAvailable() or fetchAllDocs() completes) should be safe, but the code isn't defensive against misuse.

🛡️ Suggested defensive fix
+  let dbInitGeneration = 0;
+
   function resetDb(): void {
     if (dbInstance) closeDatabase(dbInstance);
     dbInstance = null;
     dbInitPromise = null;
+    dbInitGeneration += 1;
   }

   async function getDb(): Promise<ParquetStore> {
     if (dbInstance) return dbInstance;
     if (dbInitPromise) return dbInitPromise;

+    const generation = dbInitGeneration;
     dbInitPromise = createDatabase(dbPath)
       .then((created) => {
-        dbInstance = created;
+        if (dbInitGeneration === generation) {
+          dbInstance = created;
+        }
         return created;
       })
       .finally(() => {
         dbInitPromise = null;
       });
     return dbInitPromise;
   }
src/docs/snapshot.ts (1)

92-102: Atomic write operations don't clean up temp files on failure.

If fs.rename fails in writeBufferAtomic or copyFileAtomic, the temporary file will be left behind. This could accumulate orphaned .tmp.* files over time.

🧹 Suggested fix with cleanup on failure
 async function writeBufferAtomic(dbPath: string, bytes: Uint8Array): Promise<void> {
   const tmp = tempPathFor(dbPath);
-  await fs.writeFile(tmp, bytes);
-  await fs.rename(tmp, dbPath);
+  try {
+    await fs.writeFile(tmp, bytes);
+    await fs.rename(tmp, dbPath);
+  } catch (err) {
+    await fs.unlink(tmp).catch(() => {});
+    throw err;
+  }
 }

 async function copyFileAtomic(sourcePath: string, dbPath: string): Promise<void> {
   const tmp = tempPathFor(dbPath);
-  await fs.copyFile(sourcePath, tmp);
-  await fs.rename(tmp, dbPath);
+  try {
+    await fs.copyFile(sourcePath, tmp);
+    await fs.rename(tmp, dbPath);
+  } catch (err) {
+    await fs.unlink(tmp).catch(() => {});
+    throw err;
+  }
 }


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/__tests__/index.test.ts`:
- Around line 295-332: The tests import plugin at module scope so src/index.ts
captures direct bindings to ensureDocsParquet and fetchAllDocs, making later
vi.spyOn calls ineffective; fix each affected test by resetting modules
(vi.resetModules()), then locally importing the modules that export
ensureDocsParquet and fetchAllDocs, apply vi.spyOn or mockImplementation on
those module exports, and only after that import the plugin (e.g. await
import('.../src/index') or require) so the plugin captures the spied/mocked
functions; update the tests around the uses of ensureDocsParquet and
fetchAllDocs (and the affected test blocks referenced) to follow this pattern.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
src/__tests__/index.test.ts (1)

433-436: Redundant spy.mockRestore() call.

The explicit spy.mockRestore() on line 435 is unnecessary since vi.restoreAllMocks() is already called in afterEach (line 35), which will restore all mocks including this spy.

♻️ Suggested cleanup
     expect(spy).not.toHaveBeenCalled();
     expect(output.parts).toHaveLength(0);
-    spy.mockRestore();
   });

@anand-testcompare anand-testcompare merged commit 021b715 into main Feb 16, 2026
5 checks passed
@anand-testcompare anand-testcompare deleted the 19-auto-download-palantir-docs-database-on-plugin-install-post-install-hook branch February 16, 2026 05:58
Development

Successfully merging this pull request may close these issues.

Auto-download Palantir docs database on plugin install (post-install hook)

1 participant