feat(browser): remove silent html truncation, add --as json #1102
Merged
`browser get html` had two agent-hostile defaults:
1. A silent 50000-char cap on the returned HTML — agents that got a
truncated page had no signal they were looking at half the DOM.
2. Only raw HTML string output, forcing agents to re-parse for
structured extraction.
Changes:
- Default output is now the full outerHTML, no truncation
- `--max <n>` opts in to a character cap; when the cap actually
trips, the HTML is prepended with
`<!-- opencli: truncated N of M chars; re-run without --max ... -->`
so agents always see the signal
- `--as json` returns `{selector, matched, tree}` where `tree` is
`{tag, attrs, text, children}` recursively. `matched` is the full
count of selector matches so agents know when more elements exist
beyond the first. `text` is the node's own direct text children,
whitespace-collapsed; child elements live in `children`.
- `--selector` not matching any element now emits structured
`{error:{code:"selector_not_found", ...}}` with a non-zero exit
code, in both raw and json modes (was `(empty)` stdout previously,
indistinguishable from empty element)
- Invalid `--as` / negative `--max` emit structured
`invalid_format` / `invalid_max` error codes
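The opt-in cap described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name `applyMax` is hypothetical, reading N as the cap and M as the full length is an assumption, and the marker wording after "chars" is abbreviated.

```typescript
// Hypothetical helper: the name, the meaning of N and M, and the marker
// wording after "chars" are assumptions made for illustration only.
function applyMax(html: string, max?: number): string {
  // No cap requested, or the cap does not trip: return the HTML untouched.
  if (max === undefined || html.length <= max) {
    return html;
  }
  // The cap tripped: prepend a marker so the reader can see the truncation.
  const marker = `<!-- opencli: truncated ${max} of ${html.length} chars; re-run without --max -->`;
  return `${marker}\n${html.slice(0, max)}`;
}

const fullHtml = "<div><p>hello</p></div>";
console.log(applyMax(fullHtml));    // full HTML, no marker
console.log(applyMax(fullHtml, 8)); // marker line, then the first 8 characters
```

The marker is only emitted when the cap actually trips, so a page shorter than `--max` is returned byte-for-byte.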
Extracted the tree serializer as `src/browser/html-tree.ts` so the
JS expression can be unit-tested against a DOM stub.
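A rough sketch of what such a serializer could look like when run against a DOM stub. The stub interfaces, the function names `serialize` and `buildEnvelope`, and the lowercasing of tag names are assumptions; the actual contents of `src/browser/html-tree.ts` are not shown in this PR text.

```typescript
// Minimal DOM stub types (assumed): just enough surface for the serializer.
interface StubNode {
  nodeType: number;
  textContent: string | null;
}
interface StubElement extends StubNode {
  tagName: string;
  attributes: { name: string; value: string }[];
  childNodes: StubNode[];
}
interface HtmlTree {
  tag: string;
  attrs: Record<string, string>;
  text: string;
  children: HtmlTree[];
}

const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

function isElement(node: StubNode): node is StubElement {
  return node.nodeType === ELEMENT_NODE;
}

// Direct text children are concatenated and whitespace-collapsed;
// element children are serialized recursively into `children`.
function serialize(el: StubElement): HtmlTree {
  const attrs: Record<string, string> = {};
  for (const attr of el.attributes) {
    attrs[attr.name] = attr.value;
  }
  let text = "";
  const children: HtmlTree[] = [];
  for (const child of el.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      text += child.textContent ?? "";
    } else if (isElement(child)) {
      children.push(serialize(child));
    }
  }
  return {
    tag: el.tagName.toLowerCase(),
    attrs,
    text: text.replace(/\s+/g, " ").trim(),
    children,
  };
}

// `matched` is the full match count; only the first match is serialized.
function buildEnvelope(selector: string, matches: StubElement[]) {
  const first = matches[0];
  return { selector, matched: matches.length, tree: first ? serialize(first) : null };
}

const textNode = (value: string): StubNode => ({ nodeType: TEXT_NODE, textContent: value });
const span: StubElement = {
  nodeType: ELEMENT_NODE, textContent: null, tagName: "SPAN",
  attributes: [], childNodes: [textNode("world")],
};
const hero: StubElement = {
  nodeType: ELEMENT_NODE, textContent: null, tagName: "DIV",
  attributes: [{ name: "class", value: "hero" }],
  childNodes: [textNode("  Hello \n"), span],
};
console.log(JSON.stringify(buildEnvelope(".hero", [hero]), null, 2));
```

Because the stub only needs `nodeType`, `attributes`, and `childNodes`, a test can build fixtures as plain objects without a browser.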
…t --max
Both edges previously bypassed the structured-error contract introduced in #1102, which agents rely on for branching:
- Invalid CSS selector: `querySelector(All)` would throw `SyntaxError` through `page.evaluate` into the generic exception path. Wrap the lookup in try/catch inside page context for both raw and `--as json` paths; surface as `{error:{code:"invalid_selector", message}}` + non-zero exit.
- `--max` validation: `parseInt` silently accepted "1.5" -> 1 and "10abc" -> 10. Switch to a strict `/^\d+$/` check so fractional, negative, and non-numeric values all return `{error:{code:"invalid_max"}}`; validation runs up front so bad values never reach the page.
Covered by new unit tests in cli.test.ts (fractional, non-numeric, invalid selector on raw + json) and html-tree.test.ts (SyntaxError -> invalidSelector envelope).
Co-authored-by: freemandealer <freeman.zhang1992@gmail.com>
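Both fixes can be illustrated with a small sketch. The names `parseMax` and `safeQueryAll`, the `ErrorEnvelope` type, and the message strings are hypothetical; only the error codes and the `/^\d+$/` check come from the commit message.

```typescript
type ErrorEnvelope = { error: { code: string; message: string } };

// Strict validation: digits only, so "1.5", "10abc", and "-5" are all rejected.
function parseMax(raw: string): number | ErrorEnvelope {
  if (!/^\d+$/.test(raw)) {
    return {
      error: { code: "invalid_max", message: `--max expects a non-negative integer, got "${raw}"` },
    };
  }
  return Number(raw);
}

// Wrap the lookup so a thrown SyntaxError becomes a structured envelope
// instead of escaping into a generic exception path.
function safeQueryAll<T>(query: (selector: string) => T[], selector: string): T[] | ErrorEnvelope {
  try {
    return query(selector);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { error: { code: "invalid_selector", message } };
  }
}

// For contrast, the lenient behaviour the commit message describes:
console.log(parseInt("1.5", 10), parseInt("10abc", 10)); // 1 10
console.log(parseMax("1.5"), parseMax("10abc"), parseMax("10"));

// Stub standing in for querySelectorAll; it throws on one known-bad selector.
const stubQuery = (selector: string): string[] => {
  if (selector === "###") throw new SyntaxError(`'${selector}' is not a valid selector`);
  return ["<div>"];
};
console.log(safeQueryAll(stubQuery, "###"));
```

Note that the regex accepts `"0"`; whether a zero cap is meaningful is left open here.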
luxiaolei pushed a commit to luxiaolei/OpenCLI that referenced this pull request on Apr 21, 2026
…r#1102)
* feat(browser): remove silent html truncation, add --as json tree output
* fix(browser get html): structured errors for invalid selector & strict --max
Co-authored-by: freemandealer <freeman.zhang1992@gmail.com>
(cherry picked from commit 6cf5cb2)
Summary
Per #opencli-browser discussion (follow-up to #1100), `browser get html` had two agent-hostile defaults: a silent 50000-char cap on the returned HTML, and raw HTML string output only.

Changes
- `outerHTML.slice(0, 50000)`, no marker → `outerHTML`, no cap
- `--max <n>`; truncation prepends `<!-- opencli: truncated N of M chars ... -->`
- `--as json` → `{selector, matched, tree}`, a tree of `{tag, attrs, text, children}`
- `(empty)` (indistinguishable from empty element) → `{error:{code:"selector_not_found"}}` + exit 1
- Invalid `--as` / `--max` → `invalid_format` / `invalid_max` error codes

`--as json` shape

    {
      "selector": ".hero",
      "matched": 1,
      "tree": {
        "tag": "div",
        "attrs": { "class": "hero", "id": "x" },
        "text": "Hello",
        "children": [
          { "tag": "span", "attrs": {}, "text": "world", "children": [] }
        ]
      }
    }

- `matched` is the full `querySelectorAll` count, not just 1, so agents know when more elements exist
- `text` is direct text-children concatenated + whitespace-collapsed; element children carry their own text, and ordering between text and elements is not preserved (agents who need ordering should use raw HTML mode)
- `attrs` pass through untouched

Code layout
- `src/browser/html-tree.ts`: the tree JS expression, unit-tested against a DOM stub
- `src/cli.ts`: command rewrite, structured error paths shared with `browser network`

Test plan
- `src/browser/html-tree.test.ts`: serializer on simple elements, whitespace collapse in direct text, recursion with attrs, multi-match first-wins with matched count, zero-match null tree (5 tests)
- `src/cli.test.ts` browser get html suite: full default, `--max` with truncation marker, negative `--max`, `--as json` envelope, `selector_not_found` (raw and json), bad `--as` format (7 tests)
- `npm test`: 221 files, 1661 tests, 2 skipped, all green
- `npm run build`: clean TypeScript build + manifest
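
On the consuming side, an agent might branch on the `--as json` envelope roughly as sketched here. The type names and the `summarize` and `collectText` helpers are hypothetical; only the field names and error code come from the description above.

```typescript
interface TreeNode {
  tag: string;
  attrs: Record<string, string>;
  text: string;
  children: TreeNode[];
}
type GetHtmlOutput =
  | { selector: string; matched: number; tree: TreeNode }
  | { error: { code: string; message?: string } };

// Depth-first walk: a node's own direct text first, then each child's text.
// Interleaving of text and elements is not recoverable from this shape.
function collectText(node: TreeNode): string[] {
  const parts: string[] = node.text ? [node.text] : [];
  for (const child of node.children) {
    parts.push(...collectText(child));
  }
  return parts;
}

// Branch on the structured error first, then on `matched`.
function summarize(stdout: string): string {
  const out = JSON.parse(stdout) as GetHtmlOutput;
  if ("error" in out) {
    return out.error.code === "selector_not_found"
      ? "retry with a different selector"
      : `failed: ${out.error.code}`;
  }
  const more = out.matched > 1 ? ` (+${out.matched - 1} more matches)` : "";
  return `${out.tree.tag}: ${collectText(out.tree).join(" ")}${more}`;
}

const sample = JSON.stringify({
  selector: ".hero",
  matched: 1,
  tree: {
    tag: "div",
    attrs: { class: "hero", id: "x" },
    text: "Hello",
    children: [{ tag: "span", attrs: {}, text: "world", children: [] }],
  },
});
console.log(summarize(sample)); // div: Hello world
console.log(summarize('{"error":{"code":"selector_not_found"}}'));
```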