CAMEL-23781: Generate an offline documentation bundle (camel-docs-offline.zip)#1666
k-krawczyk wants to merge 1 commit into
Add an offline documentation bundle so AI coding agents (and humans) with no or restricted internet access can read the Camel docs locally. After the Markdown files are generated, the generate-markdown task now zips all public/**/*.md files plus llms.txt - preserving the site directory structure - into public/camel-docs-offline.zip, using the system zip tool (no new dependency). The bundle can be downloaded once, unzipped (e.g. into /tmp) and read offline. The llms.txt template now points agents at this offline archive.
|
cc @davsclaus — implemented per your steer (system Reported by Claude Code on behalf of Karol Krawczyk |
|
Did you check how big the .zip file is? |
|
We may need to distribute the zip as a separate publish. Otherwise, if we just change a comma, we will struggle to publish the website because it then includes a new 500 MB zip file. Unless git can upload this in a clever way (I don't know). |
davsclaus
left a comment
Thanks for the contribution! The idea of providing offline documentation for AI agents is great.
A few observations from the review:
Branch naming: Per project conventions, feature branches should follow feature/<ISSUE_ID>-<short-slug> (e.g. feature/CAMEL-23781-offline-docs-bundle).
System dependency: The code uses execFileSync('zip', ...) which requires the system zip binary. While GitHub Actions runners have it, this is an undocumented runtime dependency.
Full build not tested: The PR body notes the full site build was not verified locally — worth confirming before merge.
Alternative approach: We've opened CAMEL-23788 / PR #1667 with a versioned approach: a manually-triggered workflow that generates per-version bundles (e.g. camel-docs-4.18.zip) uploaded as GitHub Release assets. This avoids running the zip on every build, keeps binaries out of git, and lets AI agents download the bundle matching their specific Camel version. Might be worth comparing the two approaches.
This review does not replace specialized tools such as CodeRabbit, Sourcery, or SonarCloud.
This review was generated by an AI agent and may contain inaccuracies. Please verify all suggestions before applying.
try {
  // run from public/ so paths stay relative to the site root; include only .md files and llms.txt
  execFileSync('zip', ['-r', '-q', BUNDLE_NAME, '.', '-i', '*.md', 'llms.txt'], {
    cwd: PUBLIC_DIR,
This requires the system zip binary to be installed. While common on Linux/macOS (and GitHub Actions runners), it's an undocumented dependency. If zip is missing, the error is caught and logged but the build continues silently — which could be confusing when the bundle is expected but missing.
Consider adding a pre-check:
    cwd: PUBLIC_DIR,
// verify zip is available
try {
  execFileSync('zip', ['--version'], { stdio: 'pipe' });
} catch {
  console.warn(`Skipping ${BUNDLE_NAME}: 'zip' command not found`);
  return;
}
// run from public/ so paths stay relative to the site root; include only .md files and llms.txt
execFileSync('zip', ['-r', '-q', BUNDLE_NAME, '.', '-i', '*.md', 'llms.txt'], {
@@ -0,0 +1,46 @@
const fs = require('fs');
This runs on every build (wired into generate-markdown). For a large doc set, zipping thousands of .md files adds processing time even for local dev builds where the bundle isn't needed. Consider gating it behind an environment variable (e.g. CAMEL_ENV=production) or making it a separate gulp task.
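A minimal sketch of the environment-variable gate suggested above. The variable name and value (CAMEL_ENV=production) are the example from this comment; the function names shouldBuildOfflineBundle and maybeBuildBundle are hypothetical and do not appear in the PR.

```javascript
// Hypothetical gate: decide whether the offline bundle step should run.
// CAMEL_ENV=production is only the example value from the review comment.
function shouldBuildOfflineBundle(env = process.env) {
  return env.CAMEL_ENV === 'production';
}

// Wrap any bundle-building function so that local dev builds skip it.
// Returns true when the bundle step ran, false when it was skipped.
function maybeBuildBundle(buildBundle, env = process.env) {
  if (!shouldBuildOfflineBundle(env)) {
    console.log('Skipping offline bundle: CAMEL_ENV is not "production"');
    return false;
  }
  buildBundle();
  return true;
}
```

Passing the environment as a parameter keeps the gate easy to exercise without touching the real process environment. Making the bundle a separate gulp task, the other option mentioned, would avoid the check entirely.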
|
@davsclaus good call — I measured it against the live site rather than guessing:
So uncompressed is ~250–480 MB (that's the ~500 MB you expected), and the So I agree it's too big to commit into
I'm happy to rework this PR towards (1). Which distribution mechanism do you prefer? Reported by Claude Code on behalf of Karol Krawczyk |
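For anyone who wants to reproduce an uncompressed figure against a local build, a rough sketch follows. It is not the method used for the numbers above (that measurement was taken against the live site); the function name totalMarkdownBytes is hypothetical, and it only sums .md file sizes on disk, so the compressed size would still need an actual zip run.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively sum the on-disk size (in bytes) of every .md file under dir.
function totalMarkdownBytes(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += totalMarkdownBytes(full);
    } else if (entry.isFile() && entry.name.endsWith('.md')) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}
```

Assuming the generated site lives in public/, calling totalMarkdownBytes('public') and dividing by 1024 * 1024 would give the size in MiB.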
|
See #1667 for another approach to publish on github-website |
|
Agreed — #1667's versioned, release-asset approach is the better fit. It's exactly the direction my size analysis pointed to: ~70–130 MB compressed is too much to ship in Reported by Claude Code on behalf of Karol Krawczyk |
|
Thanks @k-krawczyk, take a look at the PR, and we can try to make a release of 4.20.0 etc. to see if it works ;) |
|
closing this in favour of #1667 |
CAMEL-23781
Companies may restrict their AI coding agents from accessing the internet (or allow-list access only after a slow approval). This adds an offline documentation bundle so agents (and humans) can read the Camel docs locally without reaching camel.apache.org.
Changes
gulp/helpers/offline-bundle.js: after the Markdown files are generated, zips all public/**/*.md files plus public/llms.txt, preserving the site directory structure, into public/camel-docs-offline.zip. Uses the system zip tool, so no new dependency is added.
generate-markdown task: runs the helper right after llms.txt is generated.
llms-txt-template.md now points agents at the offline archive, so an AI that fetches llms.txt discovers the offline option.
Usage: download https://camel.apache.org/camel-docs-offline.zip, unzip it locally (for example into /tmp), and read the .md files from there.
Testing
The helper was verified in isolation (generate → extract round-trip): the archive preserves the nested directory structure, includes llms.txt, excludes .html/binary assets, and does not include itself. A full site build (Antora + Hugo) was not run locally.
Reported by Claude Code on behalf of Karol Krawczyk
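The content rules checked in that round-trip (only .md files plus llms.txt, no .html or binary assets, no self-inclusion) could be expressed as a small check over the archive's entry names. This is a sketch, not the test that was actually run: the function name unexpectedEntries is hypothetical, and obtaining the entry list from the archive is left out.

```javascript
// Return the entries that should NOT be in the offline bundle: anything that
// is neither a .md file nor the top-level llms.txt. The bundle itself
// (camel-docs-offline.zip) is caught by the same rule.
function unexpectedEntries(entries) {
  return entries
    .map((name) => name.replace(/^\.\//, '')) // tolerate a leading "./"
    .filter((name) => {
      if (name.endsWith('/')) return false; // directory entries are fine
      if (name === 'llms.txt') return false;
      return !name.endsWith('.md');
    });
}
```

An empty result means the archive contents match the rules described above; any returned name points at a file that slipped past the include filter.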