CAMEL-23781: Generate an offline documentation bundle (camel-docs-offline.zip) #1666

Closed
k-krawczyk wants to merge 1 commit into apache:main from k-krawczyk:CAMEL-23781-offline-docs-bundle

@k-krawczyk

CAMEL-23781

Companies may restrict their AI coding agents from accessing the internet (or allow-list access only after a slow approval). This adds an offline documentation bundle so agents — and humans — can read the Camel docs locally without reaching camel.apache.org.

Changes

  • New gulp/helpers/offline-bundle.js: after the Markdown files are generated, zips all public/**/*.md files plus public/llms.txt — preserving the site directory structure — into public/camel-docs-offline.zip. Uses the system zip tool, so no new dependency is added.
  • Wired into the generate-markdown task, right after llms.txt is generated.
  • llms-txt-template.md now points agents at the offline archive, so an AI that fetches llms.txt discovers the offline option.

Usage: download https://camel.apache.org/camel-docs-offline.zip, unzip it locally (for example into /tmp), and read the .md files from there.
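The helper described above can be sketched roughly as follows. This is a minimal illustration based only on the PR description and the diff fragment quoted in the review, not the actual contents of gulp/helpers/offline-bundle.js. The function names `buildZipArgs` and `createOfflineBundle` are hypothetical, and removing a stale archive before zipping is an assumption added here so a rerun does not update an old bundle in place.

```javascript
const { execFileSync } = require('child_process');
const fs = require('fs');
const path = require('path');

// Names mirror the PR description; the real helper may differ.
const BUNDLE_NAME = 'camel-docs-offline.zip';
const PUBLIC_DIR = path.resolve('public');

// Argument list for the system `zip` tool (hypothetical helper):
//   -r  recurse into directories
//   -q  quiet output
//   -i  include only entries matching the patterns that follow
function buildZipArgs(bundleName) {
  return ['-r', '-q', bundleName, '.', '-i', '*.md', 'llms.txt'];
}

// Creates the bundle inside publicDir and returns its path.
function createOfflineBundle(publicDir = PUBLIC_DIR, bundleName = BUNDLE_NAME) {
  const target = path.join(publicDir, bundleName);
  // Assumption: drop any previous archive so stale entries do not survive.
  fs.rmSync(target, { force: true });
  // Running with cwd at the site root keeps archive paths relative to it.
  execFileSync('zip', buildZipArgs(bundleName), { cwd: publicDir, stdio: 'inherit' });
  return target;
}

module.exports = { buildZipArgs, createOfflineBundle };
```

Calling `createOfflineBundle()` after the Markdown generation step would then leave the archive at public/camel-docs-offline.zip, provided the `zip` binary is on the PATH.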

Testing

The helper was verified in isolation (generate → extract round-trip): the archive preserves the nested directory structure, includes llms.txt, excludes .html/binary assets, and does not include itself. A full site build (Antora + Hugo) was not run locally.

Reported by Claude Code on behalf of Karol Krawczyk

Add an offline documentation bundle so AI coding agents (and humans) with no
or restricted internet access can read the Camel docs locally.

After the Markdown files are generated, the generate-markdown task now zips all
public/**/*.md files plus llms.txt - preserving the site directory structure -
into public/camel-docs-offline.zip, using the system zip tool (no new dependency).
The bundle can be downloaded once, unzipped (e.g. into /tmp) and read offline.

The llms.txt template now points agents at this offline archive.
@k-krawczyk

Author

cc @davsclaus — implemented per your steer (system zip, named camel-docs-offline.zip). The bundle keeps all docs as Markdown so it's readable by both agents and humans offline.

Reported by Claude Code on behalf of Karol Krawczyk

@davsclaus

Contributor

Did you check how big the .zip file is?

@davsclaus

Contributor

We may need to distribute the zip as a separate publish: if we change just a comma, we will struggle to publish the website if it then includes a new 500 MB zip file. Unless git can upload this cleverly (I don't know).

@davsclaus davsclaus left a comment

Contributor


Thanks for the contribution! The idea of providing offline documentation for AI agents is great.

A few observations from the review:

Branch naming: Per project conventions, feature branches should follow feature/<ISSUE_ID>-<short-slug> (e.g. feature/CAMEL-23781-offline-docs-bundle).

System dependency: The code uses execFileSync('zip', ...) which requires the system zip binary. While GitHub Actions runners have it, this is an undocumented runtime dependency.

Full build not tested: The PR body notes the full site build was not verified locally — worth confirming before merge.

Alternative approach: We've opened CAMEL-23788 / PR #1667 with a versioned approach: a manually-triggered workflow that generates per-version bundles (e.g. camel-docs-4.18.zip) uploaded as GitHub Release assets. This avoids running the zip on every build, keeps binaries out of git, and lets AI agents download the bundle matching their specific Camel version. Might be worth comparing the two approaches.

This review does not replace specialized tools such as CodeRabbit, Sourcery, or SonarCloud.

This review was generated by an AI agent and may contain inaccuracies. Please verify all suggestions before applying.

try {
  // run from public/ so paths stay relative to the site root; include only .md files and llms.txt
  execFileSync('zip', ['-r', '-q', BUNDLE_NAME, '.', '-i', '*.md', 'llms.txt'], {
    cwd: PUBLIC_DIR,


This requires the system zip binary to be installed. While common on Linux/macOS (and GitHub Actions runners), it's an undocumented dependency. If zip is missing, the error is caught and logged but the build continues silently — which could be confusing when the bundle is expected but missing.

Consider adding a pre-check:

Suggested change
cwd: PUBLIC_DIR,
  // verify zip is available
  try {
    execFileSync('zip', ['--version'], { stdio: 'pipe' });
  } catch {
    console.warn(`Skipping ${BUNDLE_NAME}: 'zip' command not found`);
    return;
  }
  // run from public/ so paths stay relative to the site root; include only .md files and llms.txt
  execFileSync('zip', ['-r', '-q', BUNDLE_NAME, '.', '-i', '*.md', 'llms.txt'], {
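The pre-check suggested above could also be factored into a small reusable probe. The sketch below is one possible shape, not code from the PR: `isCommandAvailable` is a hypothetical name, and it assumes the tool exits successfully when given `--version`, as the review suggestion does for `zip`.

```javascript
const { execFileSync } = require('child_process');

// Returns true when `command` can be spawned and exits with status 0
// for the probe arguments. Hypothetical helper, not part of the PR.
function isCommandAvailable(command, probeArgs = ['--version']) {
  try {
    // stdio: 'pipe' keeps the probe's output out of the build log.
    execFileSync(command, probeArgs, { stdio: 'pipe' });
    return true;
  } catch (err) {
    // Covers both "binary not found" (ENOENT) and a non-zero exit status.
    return false;
  }
}

module.exports = { isCommandAvailable };
```

A caller would check `isCommandAvailable('zip')` before bundling and log a clear warning (or fail the build) when it returns false, so a missing bundle is never silent.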

@@ -0,0 +1,46 @@
const fs = require('fs');


This runs on every build (wired into generate-markdown). For a large doc set, zipping thousands of .md files adds processing time even for local dev builds where the bundle isn't needed. Consider gating it behind an environment variable (e.g. CAMEL_ENV=production) or making it a separate gulp task.
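Gating the bundle step behind an environment variable might look like the sketch below. `CAMEL_ENV=production` is the example value from the comment above; whether the build actually sets that variable is an assumption, and both function names are hypothetical.

```javascript
// Decide whether the offline bundle should be built.
// Assumption: production builds set CAMEL_ENV=production.
function shouldBuildOfflineBundle(env = process.env) {
  return env.CAMEL_ENV === 'production';
}

// Runs `createBundle` only when the gate is open.
// Returns true if the bundle step ran, false if it was skipped.
function maybeCreateBundle(createBundle, env = process.env) {
  if (!shouldBuildOfflineBundle(env)) {
    return false;
  }
  createBundle();
  return true;
}

module.exports = { shouldBuildOfflineBundle, maybeCreateBundle };
```

Local development builds would then skip the zip step by default, while a production build opts in explicitly. Exposing the bundle as a separate gulp task, the other option mentioned above, would achieve the same separation without an environment variable.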

@github-actions

Contributor

🚀 Preview is available at

@k-krawczyk

Author

@davsclaus good call — I measured it against the live site rather than guessing:

  • ~5,570 doc pages in the sitemap (already spanning multiple versions: next, 4.18.x, 4.14.x, …)
  • average .md ~40–86 KB (component pages with big option tables are the large ones)
  • real zip ratio measured on a 12-page sample: ~26.5%

So uncompressed is ~250–480 MB (that's the ~500 MB you expected), and the .zip lands around ~70–130 MB. Plain zip compresses each file independently, so the cross-version duplication doesn't shrink it much — a solid .tar.gz would be smaller. If the on-disk build keeps more versions than the sitemap exposes, it'd scale up proportionally.
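The arithmetic behind this estimate can be reproduced from the three quoted figures. The sketch below assumes decimal units (1 MB = 1000 KB) and that the sampled 26.5% ratio holds for the whole doc set; `estimateBundleSize` is a hypothetical name. With these inputs the lower bounds come out slightly below the rounded figures in the comment, presumably because of rounding or a different averaging.

```javascript
// Rough size estimate: pages x average page size, then the zip ratio.
// Decimal units are assumed (1 MB = 1000 KB).
function estimateBundleSize({ pages, minPageKb, maxPageKb, zipRatio }) {
  const toMb = (kb) => kb / 1000;
  const uncompressed = [minPageKb, maxPageKb].map((kb) => toMb(pages * kb));
  const compressed = uncompressed.map((mb) => mb * zipRatio);
  return {
    uncompressedMb: uncompressed.map((mb) => Math.round(mb)),
    compressedMb: compressed.map((mb) => Math.round(mb)),
  };
}

const estimate = estimateBundleSize({
  pages: 5570,     // doc pages in the sitemap
  minPageKb: 40,   // lower average .md size
  maxPageKb: 86,   // upper average .md size
  zipRatio: 0.265, // ratio measured on the 12-page sample
});

console.log(estimate);
// { uncompressedMb: [ 223, 479 ], compressedMb: [ 59, 127 ] }
```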

So I agree it's too big to commit into public/ and redeploy on every change. Options:

  1. Build it only on release (or a scheduled job) and publish it as a GitHub Release asset, with llms.txt pointing at that URL. The project already consumes release binaries via the github-release-binary yarn plugin, so this fits the existing distribution model.
  2. Ship .tar.gz instead of .zip to roughly halve the size.
  3. Split into smaller per-area bundles.

I'm happy to rework this PR towards (1). Which distribution mechanism do you prefer?

Reported by Claude Code on behalf of Karol Krawczyk

@davsclaus

davsclaus commented Jun 17, 2026

Contributor

See #1667 for another approach to publish on github-website

@k-krawczyk

Author

Agreed — #1667's versioned, release-asset approach is the better fit. It's exactly the direction my size analysis pointed to: ~70–130 MB compressed is too much to ship in public/ on every build, and per-version bundles let agents grab only the version they need. I'm happy to defer to #1667 and help review/test it, and to fold in any useful llms.txt wording. Thanks @davsclaus!

Reported by Claude Code on behalf of Karol Krawczyk

@davsclaus

Contributor

Thanks @k-krawczyk, take a look at the PR, and we can try to make a release of 4.20.0 etc. to see if it works ;)

@davsclaus

Contributor

Closing this in favour of #1667.

@davsclaus davsclaus closed this Jun 18, 2026