Commit b780739
feat(annotate): support HTML files and URL annotation (#545)
* fix(annotate): sanitize dangerous link protocols in markdown renderer
Block javascript:, data:, and vbscript: URLs in InlineMarkdown link
rendering. Links with dangerous protocols render as plain text instead
of clickable anchors. Uses a blocklist approach so existing links with
custom protocols (obsidian://, vscode://, Windows C:\ paths) continue
to work.
For provenance purposes, this commit was AI assisted.
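A minimal sketch of the blocklist idea described above. The name `sanitizeLinkUrl` appears later in this commit message, but its signature, the `DANGEROUS_PROTOCOLS` constant, and the control-character stripping are assumptions made for illustration, not the actual implementation.

```typescript
// Hypothetical blocklist; the real renderer may match protocols differently.
const DANGEROUS_PROTOCOLS = /^(javascript|data|vbscript):/i;

// Returns the URL unchanged when it is safe to render as an anchor,
// or null when the caller should fall back to plain text.
function sanitizeLinkUrl(url: string): string | null {
  // Browsers ignore whitespace and control characters inside a scheme,
  // so strip them before testing (an assumption, not confirmed by the source).
  const normalized = url.replace(/[\u0000-\u0020\u007f]/g, "");
  return DANGEROUS_PROTOCOLS.test(normalized) ? null : url;
}

// Blocked protocols yield null; custom protocols and Windows paths pass through.
console.log(sanitizeLinkUrl("javascript:alert(1)"));
console.log(sanitizeLinkUrl("obsidian://open?vault=notes"));
console.log(sanitizeLinkUrl("C:\\Users\\me\\plan.md"));
```

A blocklist keeps unknown schemes working by default, at the cost of having to add each newly identified dangerous scheme by hand.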
* feat(annotate): add HTML-to-markdown and URL-to-markdown utilities
- html-to-markdown.ts: Turndown wrapper with GFM table rule, strips
script/style/noscript tags
- url-to-markdown.ts: Jina Reader (free, returns markdown) with
fetch+Turndown fallback. Warns on Jina failure, auto-skips Jina for
local/private URLs (localhost, 192.168.*, 10.*, etc.)
- config.ts: add jina setting and resolveUseJina() with priority chain:
--no-jina flag > PLANNOTATOR_JINA env > config.json > default true
For provenance purposes, this commit was AI assisted.
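The priority chain for `resolveUseJina()` could look roughly like the sketch below. The `JinaInputs` shape and the rule that `"0"` or `"false"` in `PLANNOTATOR_JINA` disables Jina are assumptions; the source only states the order of precedence.

```typescript
// Hypothetical input shape; the real function likely reads these values itself.
interface JinaInputs {
  noJinaFlag: boolean;              // true when --no-jina was passed
  envValue: string | undefined;     // value of PLANNOTATOR_JINA, if set
  configValue: boolean | undefined; // "jina" setting from config.json, if present
}

function resolveUseJina({ noJinaFlag, envValue, configValue }: JinaInputs): boolean {
  // 1. The CLI flag wins over everything else.
  if (noJinaFlag) return false;
  // 2. Then the environment variable (assumed: "0" or "false" disables).
  if (envValue !== undefined && envValue !== "") {
    const v = envValue.trim().toLowerCase();
    return v !== "0" && v !== "false";
  }
  // 3. Then config.json.
  if (configValue !== undefined) return configValue;
  // 4. Default: Jina Reader is enabled.
  return true;
}

console.log(resolveUseJina({ noJinaFlag: false, envValue: undefined, configValue: undefined }));
```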
* feat(annotate): support HTML files and URLs in annotate command
Extend the annotate subcommand to accept .html/.htm local files
(converted via Turndown) and https:// URLs (fetched via Jina Reader
with fetch+Turndown fallback). URL content is fetched terminal-side
before opening the browser.
Add --no-jina global flag to disable Jina Reader per-invocation.
Add 10MB file size guard for local HTML files.
For provenance purposes, this commit was AI assisted.
* feat(annotate): HTML files in folder browser and on-demand conversion
- Widen file browser glob to include .html/.htm alongside markdown
- handleDoc converts HTML files via Turndown on demand when selected
- hasMarkdownFiles accepts optional extensions param for folder validation
- Add sourceInfo field to annotate server API response
- Add _site/, public/, out/, .docusaurus/, .jekyll-cache/,
storybook-static/ to FILE_BROWSER_EXCLUDED
For provenance purposes, this commit was AI assisted.
* feat(annotate): source attribution badge for HTML/URL annotations
Show a subtle badge in DocBadges displaying the URL hostname or HTML
filename for converted content. Thread sourceInfo from API response
through App → Viewer → DocBadges.
Also update Pi extension to accept HTML-only folders in annotate mode.
For provenance purposes, this commit was AI assisted.
* test: update CLI help text assertion for HTML/URL annotate support
For provenance purposes, this commit was AI assisted.
* fix(annotate): address PR review findings
Security:
- Add project-root containment check for HTML files in /api/doc handler
using exported isWithinProjectRoot() from resolve-file.ts
- Blocks path traversal via absolute paths or ../ escapes
isLocalUrl fixes:
- Add bracketed IPv6 loopback [::1] detection
- Replace hostname.startsWith('10.') with proper IPv4 regex to avoid
matching public hostnames like 10.example.com
Revert Pi extension change:
- Pi server doesn't implement HTML file browsing or conversion yet
- Keep Pi folder validation markdown-only until both implementations
are updated per CLAUDE.md guidelines
Cleanup:
- Remove dead el.children || el.childNodes fallback in table rule
- Extract hostnameOrFallback() helper to @plannotator/shared/project
replacing duplicated try/catch IIFEs in DocBadges and index.ts
For provenance purposes, this commit was AI assisted.
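A containment check of the kind described under "Security" above might be sketched as follows. The name `isWithinProjectRoot` comes from the commit message, but the parameter order and the implementation here are assumptions.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Hypothetical signature: returns true when filePath, resolved against
// projectRoot, stays inside projectRoot.
function isWithinProjectRoot(filePath: string, projectRoot: string): boolean {
  const root = resolve(projectRoot);
  const target = resolve(root, filePath); // absolute inputs replace root here
  const rel = relative(root, target);
  if (rel === "") return true; // the root itself
  // Anything that has to climb out of root starts with ".." (or is absolute
  // on platforms where the two paths live on different drives).
  return rel !== ".." && !rel.startsWith(".." + sep) && !isAbsolute(rel);
}

console.log(isWithinProjectRoot("docs/page.html", "/srv/project"));
console.log(isWithinProjectRoot("../secrets.html", "/srv/project"));
```

Comparing the relative path, rather than checking a string prefix, avoids treating `/srv/project-old` as being inside `/srv/project`.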
* feat(annotate): Pi extension HTML annotation parity
Bring the Pi extension to full parity with the Bun server for HTML
annotation support:
- Vendor html-to-markdown and url-to-markdown via vendor.sh
- walkMarkdownFiles now scans .html/.htm alongside markdown
- handleDocRequest converts HTML files on-demand via Turndown with
isWithinProjectRoot containment check
- serverAnnotate includes sourceInfo in /api/plan response
- index.ts supports URL detection (Jina Reader + fallback), HTML file
detection with Turndown conversion, folder HTML validation, and 10MB
file size guard
- openMarkdownAnnotation accepts and threads sourceInfo
- Add turndown as a Pi extension dependency
For provenance purposes, this commit was AI assisted.
* fix(pi): Obsidian vault walks stay markdown-only, add try/catch for HTML
- Add extensions param to walkMarkdownFiles (default: HTML-inclusive)
- Obsidian callers pass /\.mdx?$/i to match Bun server behavior
- Add try/catch around HTML file reads in handleDocRequest
For provenance purposes, this commit was AI assisted.
* fix(annotate): address second review — base-block traversal, metadata IP, dead code
Security:
- Add isWithinProjectRoot check to the base-relative block for HTML
files in both Bun and Pi /api/doc handlers. Previously HTML files
served via the base query param bypassed the containment guard.
- Add 169.254.0.0/16 (link-local / cloud metadata) to isLocalUrl
private IP ranges
Cleanup:
- Remove dead hostname === "[::1]" check (WHATWG URL parser strips
brackets; hostname === "::1" already handles it)
- Remove dead parent?.childNodes fallback in table cell() function
For provenance purposes, this commit was AI assisted.
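The IPv4 side of `isLocalUrl` after these two review rounds could be approximated as below. The `PRIVATE_IPV4` constant, the inclusion of 172.16.0.0/12, and the overall shape are assumptions; the source confirms only the move from `startsWith('10.')` to a proper IPv4 regex and the addition of 169.254.0.0/16.

```typescript
// Hypothetical pattern: matches full dotted-quad addresses only, so a public
// hostname such as 10.example.com is not mistaken for a private IP.
const PRIVATE_IPV4 =
  /^(10\.\d{1,3}|169\.254|192\.168|172\.(1[6-9]|2\d|3[01]))\.\d{1,3}\.\d{1,3}$/;

function isLocalUrl(url: string): boolean {
  try {
    const { hostname } = new URL(url);
    return (
      hostname === "localhost" ||
      hostname === "127.0.0.1" ||
      PRIVATE_IPV4.test(hostname)
    );
  } catch {
    return false; // not a parseable URL
  }
}

console.log(isLocalUrl("http://10.0.0.5/"));
console.log(isLocalUrl("https://10.example.com/"));
console.log(isLocalUrl("http://169.254.169.254/latest/meta-data/"));
```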
* refactor(annotate): replace custom table rules with turndown-plugin-gfm
Drop ~60 lines of hand-rolled GFM table conversion that had a bug
(tables without explicit <thead> produced invalid GFM). Use the
official turndown-plugin-gfm plugin (24KB) which correctly handles
all table patterns plus adds strikethrough and task list support.
For provenance purposes, this commit was AI assisted.
* fix(annotate): handle all CommonMark backslash escapes in InlineMarkdown
Expand the backslash escape regex to cover all CommonMark-defined
escapable characters (. ) - # > + | { } &), not just the subset
the parser uses for formatting. Fixes literal backslashes appearing
in rendered output for Turndown-escaped content like "1\." → "1.".
For provenance purposes, this commit was AI assisted.
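For illustration, an unescape step covering every ASCII punctuation character (the set CommonMark allows after a backslash) might look like this. The function name and the range-based character class are assumptions; the actual regex in InlineMarkdown lists characters explicitly and may differ.

```typescript
// Hypothetical: a backslash followed by any ASCII punctuation character.
// The ranges cover ! through /, : through @, [ through `, and { through ~.
const ESCAPED_PUNCTUATION = /\\([!-\/:-@\[-`{-~])/g;

function unescapeCommonMark(text: string): string {
  // Drop the backslash, keep the escaped character.
  return text.replace(ESCAPED_PUNCTUATION, "$1");
}

console.log(unescapeCommonMark("1\\. First item"));
console.log(unescapeCommonMark("see \\(note\\)"));
```

Backslashes before letters or digits are left alone, since CommonMark treats those as literal backslashes.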
* fix(annotate): prevent SSRF via redirect to private/local URLs
Replace redirect: "follow" with redirect: "manual" in fetchViaTurndown
and validate each redirect hop against isLocalUrl. Blocks attacks where
an external URL redirects to cloud metadata endpoints (169.254.169.254)
or other private IPs. Limits redirect chain to 10 hops.
For provenance purposes, this commit was AI assisted.
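The per-hop validation described above can be sketched as follows. Everything here is illustrative: `fetchWithCheckedRedirects`, the `Fetcher` type, and the simplified `isLocalUrl` stand-in are hypothetical names, and the sketch assumes a server-side runtime (Bun or Node) where `redirect: "manual"` exposes the real 3xx status and `Location` header.

```typescript
// Simplified stand-in for the commit's isLocalUrl (IPv4 and localhost only).
function isLocalUrl(url: string): boolean {
  const { hostname } = new URL(url);
  return (
    hostname === "localhost" ||
    /^(127|10)\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(hostname) ||
    /^(169\.254|192\.168)\.\d{1,3}\.\d{1,3}$/.test(hostname)
  );
}

// Injectable so the loop can be exercised without network access.
type Fetcher = (url: string, init: { redirect: "manual" }) => Promise<Response>;

const MAX_REDIRECTS = 10;

async function fetchWithCheckedRedirects(
  url: string,
  fetcher: Fetcher = fetch,
): Promise<Response> {
  let current = url;
  let res = await fetcher(current, { redirect: "manual" });
  let hops = 0;
  while (res.status >= 300 && res.status < 400) {
    const location = res.headers.get("location");
    if (!location) return res; // a 3xx without Location is returned as-is
    await res.body?.cancel(); // release the connection before re-fetching
    if (++hops > MAX_REDIRECTS) throw new Error("Too many redirects");
    current = new URL(location, current).href; // resolve relative redirects
    if (isLocalUrl(current)) {
      throw new Error(`Blocked redirect to local/private URL: ${current}`);
    }
    res = await fetcher(current, { redirect: "manual" });
  }
  return res;
}
```

Checking each hop matters because only the first URL is supplied by the user; every later URL is chosen by a remote server.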
* chore: update lockfile for turndown-plugin-gfm in Pi extension
bun install needed to resolve turndown-plugin-gfm in the Pi extension
workspace after adding it to apps/pi-extension/package.json.
For provenance purposes, this commit was AI assisted.
* fix(annotate): switch to @joplin/turndown-plugin-gfm, fix TS errors
Replace unmaintained turndown-plugin-gfm (2017, v1.0.2) with the
actively maintained Joplin fork (2025, v1.0.64, 16KB).
Fix TypeScript errors that broke CI:
- Add @ts-expect-error for untyped @joplin/turndown-plugin-gfm import
- Restructure fetchViaTurndown redirect loop to avoid uninitialized
variable — first fetch before loop, loop only for redirects
For provenance purposes, this commit was AI assisted.
* fix(annotate): use proper declarations.d.ts instead of ts-expect-error
Add declarations.d.ts for @joplin/turndown-plugin-gfm with typed
function signatures, remove the ts-expect-error suppression.
For provenance purposes, this commit was AI assisted.
* fix: explicitly include declarations.d.ts in shared tsconfig
CI's tsc wasn't finding the ambient module declaration with implicit
include. Add explicit include to ensure declarations.d.ts is always
picked up regardless of environment.
For provenance purposes, this commit was AI assisted.
* fix: use ts-expect-error for @joplin/turndown-plugin-gfm types
CI's tsc does not pick up ambient declarations.d.ts files despite
local tsc finding them — likely a module resolution discrepancy
between environments. Revert to @ts-expect-error which passes in
both CI and local typecheck.
For provenance purposes, this commit was AI assisted.
* fix(annotate): body size limit for URL fetches, redirect error, file: protocol
- Add 10MB body size limit to both Jina and fetch+Turndown URL paths,
matching the local HTML file guard. Streams response body and aborts
if limit exceeded.
- Distinguish "Too many redirects" from a genuine 3xx response after
redirect loop exhaustion.
- Add file: to the dangerous protocol blocklist in sanitizeLinkUrl.
For provenance purposes, this commit was AI assisted.
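A streaming size guard along the lines of the first bullet might be sketched like this. The name `readBodyWithLimit` appears in this commit message, but the signature, the `MAX_BODY_BYTES` constant, and the error text are assumptions.

```typescript
const MAX_BODY_BYTES = 10 * 1024 * 1024; // 10MB, matching the local HTML file guard

// Hypothetical signature: reads the body as text, aborting once `limit` bytes
// have been exceeded instead of buffering an arbitrarily large response.
async function readBodyWithLimit(
  res: Response,
  limit: number = MAX_BODY_BYTES,
): Promise<string> {
  // Fast path: trust a declared Content-Length that is already too large.
  const declared = Number(res.headers.get("content-length") ?? 0);
  if (declared > limit) {
    await res.body?.cancel();
    throw new Error(`Response body exceeds ${limit} bytes`);
  }
  // No stream available: fall back to a length check on the decoded text.
  if (!res.body) {
    const text = await res.text();
    if (text.length > limit) throw new Error(`Response body exceeds ${limit} bytes`);
    return text;
  }
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let received = 0;
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    received += value.byteLength;
    if (received > limit) {
      await reader.cancel(); // stop the download
      throw new Error(`Response body exceeds ${limit} bytes`);
    }
    text += decoder.decode(value, { stream: true });
  }
  return text + decoder.decode(); // flush any buffered partial character
}
```

Counting bytes while streaming covers servers that omit or misreport `Content-Length`.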
* fix(annotate): HTML folder outside cwd, HTML linked doc navigation
- Remove containment check from base-relative block for HTML files in
both Bun and Pi /api/doc handlers. Matches markdown behavior so HTML
files in annotated folders outside cwd are served correctly.
Standalone block (no base) retains its cwd check as fallback.
- Widen isLocalMd → isLocalDoc to treat .html/.htm links as linked
documents. Clicking [Next](next.html) in a converted page now opens
it via /api/doc with Turndown conversion instead of a new browser tab.
For provenance purposes, this commit was AI assisted.
* fix(annotate): full loopback range, drain redirect bodies, document env vars
- Expand loopback check from just 127.0.0.1 to the full 127.0.0.0/8
range so all loopback addresses skip Jina Reader
- Cancel redirect response body before re-fetching to avoid leaking
TCP connections back to the pool
- Document PLANNOTATOR_JINA and JINA_API_KEY in CLAUDE.md env var table
For provenance purposes, this commit was AI assisted.
* fix(annotate): IPv6 loopback, readBodyWithLimit fallback, env var docs, comments
- Add [::1] back to isLocalUrl — WHATWG URL hostname getter preserves
brackets for IPv6 (verified: Bun and Node both return "[::1]").
Add comment explaining the empirical verification so future reviewers
don't re-flag.
- Fix readBodyWithLimit null-body fallback to still enforce the 10MB
limit via text length check instead of silently falling through.
- Document PLANNOTATOR_JINA and JINA_API_KEY in AGENTS.md env var table
(CLAUDE.md is a symlink to AGENTS.md).
- Add comments to base-relative blocks in both Bun and Pi handleDoc
explaining the intentional lack of containment check (matches
pre-existing markdown behavior, base is set server-side).
For provenance purposes, this commit was AI assisted.
* fix(annotate): block IPv4-mapped IPv6 and private IPv6 ranges in isLocalUrl
Add PRIVATE_IPV6 regex matching bracketed IPv6 private/reserved ranges:
- ::ffff: (IPv4-mapped — embeds private IPv4 as hex, e.g. [::ffff:c0a8:1])
- fe80: (link-local)
- fc00::/7 (unique-local, covers fc00:: through fdff::)
Closes the redirect-SSRF bypass where a public URL redirects to a
private address expressed as IPv4-mapped IPv6, e.g.
http://[::ffff:169.254.169.254]/latest/meta-data/
For provenance purposes, this commit was AI assisted.
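The three ranges listed above could be matched with a pattern like the one below. `PRIVATE_IPV6` is named in the commit message, but this exact regex and the `isPrivateIPv6Url` helper are assumptions. The sketch relies on the URL parser returning bracketed IPv6 hostnames and serializing an IPv4-mapped address in hex form, as the message describes.

```typescript
// Hypothetical pattern over the bracketed hostname returned by the URL parser:
//   ::ffff:            IPv4-mapped addresses
//   fe80:              link-local
//   fc00:: to fdff::   unique-local (fc00::/7)
const PRIVATE_IPV6 = /^\[(::ffff:|fe80:|f[cd][0-9a-f]{2}:)/i;

function isPrivateIPv6Url(url: string): boolean {
  try {
    const { hostname } = new URL(url);
    return hostname === "[::1]" || PRIVATE_IPV6.test(hostname);
  } catch {
    return false; // not a parseable URL
  }
}

// The dotted form below is normalized to hex by the URL parser before matching.
console.log(isPrivateIPv6Url("http://[::ffff:169.254.169.254]/latest/meta-data/"));
console.log(isPrivateIPv6Url("http://[2001:db8::1]/"));
```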
* fix(annotate): document IPv6 hostname verification, sourceInfo type, annotate flow
- Expand isLocalUrl comment with full empirical verification table
showing actual hostname getter output for every IPv6 format in both
Bun and Node — prevents false-positive review findings about brackets
- Add sourceInfo to /api/plan response type in App.tsx for type safety
- Update CLAUDE.md annotate flow diagram to reflect HTML/URL/folder
input types
For provenance purposes, this commit was AI assisted.
* fix(annotate): escape \(, cancel response bodies on error, doc sourceInfo
- Add ( to backslash escape regex alongside existing ) — Turndown
emits \( in link-adjacent contexts
- Cancel response body before throwing on !res.ok in both fetchViaJina
and fetchViaTurndown error paths (redirect loop already did this)
- Document sourceInfo field in AGENTS.md annotate server API table
For provenance purposes, this commit was AI assisted.
* fix(annotate): skip base injection for URL annotations, body cleanup
- Skip dirname(filePath) base injection when filePath is a URL in both
Bun and Pi annotate servers. dirname on a URL string produces a
nonsensical filesystem path, causing linked doc clicks to 404.
URL annotations now let links open normally instead.
- Cancel response body before throwing on content-type mismatch and
content-length overflow in fetchViaTurndown/readBodyWithLimit.
- Fix double parseInt in readBodyWithLimit content-length check.
- Correct AGENTS.md flow diagram: OpenCode not yet implemented for
HTML/URL annotation.
For provenance purposes, this commit was AI assisted.
* feat(annotate): OpenCode HTML file and URL annotation support
Add URL detection (Jina Reader + fallback), HTML file detection with
Turndown conversion, 10MB file size guard, and sourceInfo threading
to OpenCode's handleAnnotateCommand. Uses the same shared utilities
as the Bun CLI and Pi extension.
OpenCode uses the Bun server directly (startAnnotateServer from
@plannotator/server/annotate), so no server-side changes needed —
only the command handler routing was missing.
Note: folder annotation mode is not added (OpenCode didn't have it
before this PR for markdown either — separate scope).
For provenance purposes, this commit was AI assisted.
* chore(annotate): update slash command description, align fetch log messages
- OpenCode plannotator-annotate.md description now mentions HTML/URL
- Align fetch progress messages across all three clients: all now show
"(via Jina Reader)" or "(via fetch+Turndown)" consistently
For provenance purposes, this commit was AI assisted.
* fix(annotate): skip conversion for .md URLs, wikilink HTML targets, cleanup
- URLs ending in .md/.mdx are fetched raw — no Jina, no Turndown.
Content is already markdown. Removes text/plain from fetchViaTurndown
content-type whitelist since .md URLs are now short-circuited.
- Wikilink regex widened to preserve .html/.htm targets instead of
appending .md (e.g. [[page.html]] no longer becomes page.html.md)
- Remove redundant existsSync before statSync in OpenCode handler
For provenance purposes, this commit was AI assisted.
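The wikilink change in the second bullet amounts to "append .md only when the target has no recognized document extension". A rough sketch, in which `wikilinkToPath` and `HAS_DOC_EXTENSION` are hypothetical names and the set of preserved extensions is an assumption:

```typescript
// Hypothetical: extensions that are kept as-is (.md, .mdx, .htm, .html).
const HAS_DOC_EXTENSION = /\.(mdx?|html?)$/i;

function wikilinkToPath(target: string): string {
  // [[page.html]] stays page.html; [[note]] becomes note.md.
  return HAS_DOC_EXTENSION.test(target) ? target : `${target}.md`;
}

console.log(wikilinkToPath("page.html"));
console.log(wikilinkToPath("note"));
```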
* test(annotate): add htmlToMarkdown conversion tests
Tests cover the core conversion utility that all three clients depend on:
- Basic HTML → markdown (headings, paragraphs, links, code blocks)
- Tables with and without <thead> (the GFM plugin bug that was caught)
- Script/style/noscript stripping
- Strikethrough (GFM)
- Empty HTML handling
- Dangerous links preserved (sanitization is in the renderer, not here)
For provenance purposes, this commit was AI assisted.
* fix(annotate): check content-type before treating .md URLs as raw markdown
URLs ending in .md/.mdx (e.g. GitHub's viewer page for README.md)
may return HTML instead of raw markdown. fetchRawText now checks the
response content-type — if the server returns HTML, returns null so
the caller falls through to Jina/Turndown for proper conversion.
For provenance purposes, this commit was AI assisted.
* fix(annotate): add SSRF redirect protection to fetchRawText
fetchRawText (for .md/.mdx URLs) was using default redirect: "follow"
with no isLocalUrl validation on redirect hops — a .md URL redirecting
to 169.254.169.254 would be followed and credentials returned as
"markdown". Now uses redirect: "manual" with per-hop isLocalUrl checks,
matching fetchViaTurndown's SSRF protection.
For provenance purposes, this commit was AI assisted.
1 parent aa73295 commit b780739
27 files changed
Lines changed: 821 additions & 132 deletions
File tree
- apps
  - hook/server
  - opencode-plugin
    - commands
  - pi-extension
    - server
- packages
  - editor
  - server
  - ui/components