
Commit 5f31f75

fix snapshot/get bug; update install cmd
1 parent 38c7be5 commit 5f31f75

14 files changed

Lines changed: 1016 additions & 47 deletions


README-CN.md

Lines changed: 20 additions & 11 deletions
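~~~diff
@@ -16,7 +16,7 @@
 - 📦 **Dual identity** — both a CLI tool and a Go library importable via `go get`
 - **Fast startup** — ~50ms (Go binary) vs ~500ms (Node.js-based solutions)
 - 🔢 **Concise element references** — `click 5`
-- 🔍 **Built-in OCR** — optional Tesseract integration for image-heavy pages
+- 🔍 **Optional OCR** — Tesseract enabled by building with `-tags=ocr`, for image-heavy pages
 - 🌐 **~86 commands** — full parity with agent-browser v0.19.0

 > 📖 Read the full [snapshot format specification](docs/snapshot-format.md) for detailed design decisions, BNF grammar, and examples. ([English Version](docs/snapshot-format-en.md))
~~~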
~~~diff
@@ -49,10 +49,17 @@ mv ko-browser-linux-amd64 /usr/local/bin/kbr
 ### Build from source

 ```bash
-go install github.com/libi/ko-browser@latest
-mv $(go env GOPATH)/bin/ko-browser $(go env GOPATH)/bin/kbr # optional rename
+# Install the kbr binary (no CGO dependency, no Tesseract required)
+go install github.com/libi/ko-browser/cmd/kbr@latest
+
+# With OCR support (install Tesseract first)
+CGO_ENABLED=1 go install -tags=ocr github.com/libi/ko-browser/cmd/kbr@latest
 ```

+> **OCR is optional.** The default build has zero CGO dependencies and works out of the box.
+> You only need `-tags=ocr` if you want `kbr snapshot --ocr` for image-heavy pages.
+> OCR depends on Tesseract: `brew install tesseract` (macOS) / `apt install libtesseract-dev` (Linux).
+
 ### Install a browser

 ```bash
~~~
~~~diff
@@ -332,15 +339,16 @@ kbr loads configuration in the following priority (low → high):

 ```
 kbr
-├── browser/   ★ Public package — core browser API (importable via go get)
-├── axtree/    ★ Public package — AX Tree extraction, filtering, formatting
-├── selector/  ★ Public package — element selector parsing (ID/CSS/XPath)
-├── ocr/       ★ Public package — optional Tesseract OCR engine
-├── cmd/       CLI layer — cobra command definitions
-└── internal/  Internal package — daemon, session management (CLI only)
+├── cmd/kbr/   ★ CLI entry point — `go install .../cmd/kbr@latest`
+├── browser/   ★ Public package — core browser API (importable via go get)
+├── axtree/    ★ Public package — AX Tree extraction, filtering, formatting
+├── selector/  ★ Public package — element selector parsing (ID/CSS/XPath)
+├── ocr/       ★ Public package — Tesseract OCR engine (requires build tag: ocr)
+├── cmd/       CLI layer — cobra command definitions
+└── internal/  Internal package — daemon, session management (CLI only)
 ```

-`browser/`, `axtree/`, `selector/`, and `ocr/` are all public packages that can be imported via `go get`. `internal/` is used only by the CLI daemon.
+`browser/`, `axtree/`, `selector/`, and `ocr/` are all public packages that can be imported via `go get`. `internal/` is used only by the CLI daemon. OCR requires `-tags=ocr` at build time.

 ---

~~~
~~~diff
@@ -351,7 +359,8 @@ kbr
 ```bash
 git clone https://github.com/libi/ko-browser.git
 cd ko-browser
-go build -o kbr .
+go build -o kbr ./cmd/kbr/ # without OCR
+go build -tags=ocr -o kbr ./cmd/kbr/ # with OCR
 go test ./tests/ -v -timeout 180s
 ```

~~~

README.md

Lines changed: 20 additions & 11 deletions
~~~diff
@@ -29,7 +29,7 @@
 - 📦 **Dual-use** — works as a CLI tool AND a Go library (`go get`)
 - **Fast startup** — ~50ms (Go binary) vs ~500ms (Node.js-based tools)
 - 🔢 **Simple element references** — `click 5`
-- 🔍 **Built-in OCR** — optional Tesseract integration for image-heavy pages
+- 🔍 **Optional OCR** — Tesseract integration via `-tags=ocr` build flag for image-heavy pages
 - 🌐 **~86 commands** — full parity with agent-browser v0.19.0

 ### Snapshot Format Comparison
~~~
~~~diff
@@ -74,10 +74,17 @@ mv ko-browser-linux-amd64 /usr/local/bin/kbr
 ### From source

 ```bash
-go install github.com/libi/ko-browser@latest
-mv $(go env GOPATH)/bin/ko-browser $(go env GOPATH)/bin/kbr # optional rename
+# Install kbr binary directly (no CGO, no Tesseract needed)
+go install github.com/libi/ko-browser/cmd/kbr@latest
+
+# With OCR support (requires Tesseract to be installed)
+CGO_ENABLED=1 go install -tags=ocr github.com/libi/ko-browser/cmd/kbr@latest
 ```

+> **OCR is optional.** The default build has zero CGO dependencies and works everywhere.
+> Only add `-tags=ocr` if you need `kbr snapshot --ocr` for image-heavy pages.
+> This requires Tesseract: `brew install tesseract` (macOS) / `apt install libtesseract-dev` (Linux).
+
 ### Install Chrome (if not already installed)

 ```bash
~~~
~~~diff
@@ -704,15 +711,16 @@ kbr loads configuration in this priority (low → high):

 ```
 kbr
-├── browser/   ★ Public Go library — core browser API
-├── axtree/    ★ Public — AX Tree extraction, filtering, formatting
-├── selector/  ★ Public — element selector parsing (ID/CSS/XPath)
-├── ocr/       ★ Public — optional Tesseract OCR engine
-├── cmd/       CLI — cobra command definitions
-└── internal/  CLI-only — daemon, session management
+├── cmd/kbr/   ★ CLI entry point — `go install .../cmd/kbr@latest`
+├── browser/   ★ Public Go library — core browser API
+├── axtree/    ★ Public — AX Tree extraction, filtering, formatting
+├── selector/  ★ Public — element selector parsing (ID/CSS/XPath)
+├── ocr/       ★ Public — Tesseract OCR engine (build tag: ocr)
+├── cmd/       CLI — cobra command definitions
+└── internal/  CLI-only — daemon, session management
 ```

-The `browser/`, `axtree/`, `selector/`, and `ocr/` packages are all public and importable via `go get`. The `internal/` package is only used by the CLI daemon.
+The `browser/`, `axtree/`, `selector/`, and `ocr/` packages are all public and importable via `go get`. The `internal/` package is only used by the CLI daemon. OCR requires `-tags=ocr` at build time.

 ---

~~~
~~~diff
@@ -723,7 +731,8 @@ Contributions are welcome! Please feel free to submit a Pull Request.
 ```bash
 git clone https://github.com/libi/ko-browser.git
 cd ko-browser
-go build -o kbr .
+go build -o kbr ./cmd/kbr/ # without OCR
+go build -tags=ocr -o kbr ./cmd/kbr/ # with OCR
 go test ./tests/ -v -timeout 180s
 ```

~~~
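
The README change above relies on Go build constraints: a file whose first line is `//go:build ocr` is compiled only when `-tags=ocr` is passed, and a `//go:build !ocr` counterpart covers the default build. The sketch below shows one common way the default-build side of such a split could look. It is illustrative only: `OCRAvailable`, `ErrOCRDisabled`, and `RecognizeImage` are hypothetical names, not the actual API of the `ocr/` package, which this commit does not show.

```go
//go:build !ocr

// Hypothetical default-build (no CGO) side of an OCR feature split by build
// tags. A sibling file starting with `//go:build ocr` would define the same
// identifiers backed by Tesseract; exactly one of the two is compiled.
package main

import "errors"

// OCRAvailable reports whether this binary was built with -tags=ocr.
const OCRAvailable = false

// ErrOCRDisabled is returned by the stub below (hypothetical name).
var ErrOCRDisabled = errors.New("OCR support not compiled in; rebuild with -tags=ocr")

// RecognizeImage is the default-build stub: it never touches Tesseract or
// CGO and always reports that OCR is unavailable.
func RecognizeImage(image []byte) (string, error) {
	return "", ErrOCRDisabled
}
```

Because exactly one of the two files is compiled, callers need no build tags of their own, and a plain `go install github.com/libi/ko-browser/cmd/kbr@latest` never links against Tesseract.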

browser/fallback.go

Lines changed: 46 additions & 0 deletions
~~~diff
@@ -55,6 +55,52 @@ func (b *Browser) interactiveOrdinal(id int) (int, error) {
 	return targetOrdinal, nil
 }

+// allElementOrdinal returns the ordinal position (0-based) of the element in a
+// TreeWalker traversal of all DOM elements. Returns -1 if not found.
+// This is used as a last-resort fallback when backend node IDs are stale and
+// the element is not interactive.
+func (b *Browser) allElementOrdinal(id int) int {
+	if b.lastSnap == nil {
+		return -1
+	}
+
+	displayID := 0
+	ordinal := -1
+	targetOrdinal := -1
+
+	var walk func(nodes []*axtree.Node) bool
+	walk = func(nodes []*axtree.Node) bool {
+		for _, node := range nodes {
+			if isRootRole(node.Role) {
+				if walk(node.Children) {
+					return true
+				}
+				continue
+			}
+
+			displayID++
+			// Count only element-like roles (skip pure text nodes)
+			roleLower := strings.ToLower(node.Role)
+			if roleLower != "statictext" && roleLower != "inlinetext" {
+				ordinal++
+			}
+			if displayID == id {
+				if roleLower != "statictext" && roleLower != "inlinetext" {
+					targetOrdinal = ordinal
+				}
+				return true
+			}
+			if walk(node.Children) {
+				return true
+			}
+		}
+		return false
+	}
+
+	walk(b.lastSnap.Nodes)
+	return targetOrdinal
+}
+
 func isInteractiveRole(role string) bool {
 	switch strings.ToLower(role) {
 	case "button", "link", "textbox", "searchbox", "checkbox", "radio", "combobox", "listbox", "menuitem", "menuitemcheckbox", "menuitemradio", "option", "switch", "slider", "spinbutton", "tab", "treeitem", "scrollbar":
~~~
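
`allElementOrdinal` maps a snapshot display ID to an element index by walking the AX tree in document order: every non-root node receives a display ID, but only non-text roles advance the element ordinal. The standalone sketch below reproduces that walk on a simplified tree so the numbering can be checked in isolation. `Node` is a stripped-down stand-in for `axtree.Node`, and the role list in `isRootRole` is an assumption, since the real helper is not part of this diff.

```go
package main

import "strings"

// Node is a simplified, hypothetical stand-in for axtree.Node.
type Node struct {
	Role     string
	Children []*Node
}

// isRootRole is an assumption: the real helper's role list is not shown here.
func isRootRole(role string) bool {
	switch strings.ToLower(role) {
	case "rootwebarea", "webarea":
		return true
	}
	return false
}

// isTextRole reports roles that correspond to text, not elements, in the DOM.
func isTextRole(role string) bool {
	r := strings.ToLower(role)
	return r == "statictext" || r == "inlinetext"
}

// allElementOrdinal follows the commit's walk: root nodes are transparent,
// every other node gets the next display ID, and only element-like nodes
// advance the ordinal. It returns -1 for unknown IDs and for text nodes.
func allElementOrdinal(roots []*Node, id int) int {
	displayID, ordinal, target := 0, -1, -1
	var walk func(nodes []*Node) bool
	walk = func(nodes []*Node) bool {
		for _, n := range nodes {
			if isRootRole(n.Role) {
				if walk(n.Children) {
					return true
				}
				continue
			}
			displayID++
			if !isTextRole(n.Role) {
				ordinal++
			}
			if displayID == id {
				if !isTextRole(n.Role) {
					target = ordinal
				}
				return true // stop the whole walk once the ID is found
			}
			if walk(n.Children) {
				return true
			}
		}
		return false
	}
	walk(roots)
	return target
}
```

The resulting index is later matched against a `document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT)` traversal in `evaluateOnElementFallback`. That walker visits element nodes only, and the accessibility tree can skip or merge DOM elements (for example hidden or purely presentational ones), so the two orderings agree only when the page structure is simple; this is why the commit treats it as a last resort.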

browser/query.go

Lines changed: 72 additions & 10 deletions
~~~diff
@@ -21,13 +21,34 @@ func (b *Browser) GetURL() (string, error) {
 }

 // GetText returns the inner text of the element with the given display ID.
+// For form elements (input, textarea, select), it returns the value or placeholder.
 func (b *Browser) GetText(id int) (string, error) {
-	return b.evaluateOnElement(id, `function() { return this.innerText || this.textContent || ''; }`)
+	return b.evaluateOnElement(id, `function() {
+		var tag = (this.tagName || '').toUpperCase();
+		if (tag === 'INPUT' || tag === 'TEXTAREA' || tag === 'SELECT') {
+			if (typeof this.value === 'string' && this.value !== '') return this.value;
+			if (tag === 'SELECT' && this.selectedOptions && this.selectedOptions.length > 0) {
+				return Array.from(this.selectedOptions).map(function(o) { return o.textContent; }).join(', ');
+			}
+			return this.placeholder || '';
+		}
+		return this.innerText || this.textContent || '';
+	}`)
 }

 // GetHTML returns the inner HTML of the element with the given display ID.
+// For void elements (input, img, br, etc.) where innerHTML is always empty,
+// it returns outerHTML instead.
 func (b *Browser) GetHTML(id int) (string, error) {
-	return b.evaluateOnElement(id, `function() { return this.innerHTML || ''; }`)
+	return b.evaluateOnElement(id, `function() {
+		var html = this.innerHTML;
+		if (html === '' || html === undefined) {
+			var tag = (this.tagName || '').toUpperCase();
+			var voidTags = {'INPUT':1,'IMG':1,'BR':1,'HR':1,'META':1,'LINK':1,'AREA':1,'BASE':1,'COL':1,'EMBED':1,'SOURCE':1,'TRACK':1,'WBR':1};
+			if (voidTags[tag]) return this.outerHTML || '';
+		}
+		return html || '';
+	}`)
 }

 // GetValue returns the value of a form element with the given display ID.
~~~
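
The new `GetText` and `GetHTML` scripts encode a small precedence order in JavaScript. For readers following along in Go, here is a transliteration of that order as plain functions. `Element` and its fields are a hypothetical stand-in for the DOM properties the scripts read; nothing here runs in the browser.

```go
package main

import "strings"

// Element is a hypothetical snapshot of the DOM properties the scripts read.
type Element struct {
	TagName         string
	Value           string
	SelectedOptions []string // textContent of each selected <option>
	Placeholder     string
	InnerText       string
	TextContent     string
}

// textFor mirrors GetText: form controls return their value, then (for
// <select>) the selected options' text, then the placeholder; all other
// elements return innerText, falling back to textContent.
func textFor(el Element) string {
	tag := strings.ToUpper(el.TagName)
	if tag == "INPUT" || tag == "TEXTAREA" || tag == "SELECT" {
		if el.Value != "" {
			return el.Value
		}
		if tag == "SELECT" && len(el.SelectedOptions) > 0 {
			return strings.Join(el.SelectedOptions, ", ")
		}
		return el.Placeholder
	}
	if el.InnerText != "" {
		return el.InnerText
	}
	return el.TextContent
}

// voidTags is the same tag set the GetHTML script treats as void.
var voidTags = map[string]bool{
	"INPUT": true, "IMG": true, "BR": true, "HR": true, "META": true,
	"LINK": true, "AREA": true, "BASE": true, "COL": true, "EMBED": true,
	"SOURCE": true, "TRACK": true, "WBR": true,
}

// htmlFor mirrors GetHTML: use innerHTML, except that an empty innerHTML on
// a void element is replaced by its outerHTML.
func htmlFor(tag, innerHTML, outerHTML string) string {
	if innerHTML == "" && voidTags[strings.ToUpper(tag)] {
		return outerHTML
	}
	return innerHTML
}
```

For a `<select>`, the DOM already exposes the selected option's value as `value`, so the `selectedOptions` branch is reached only when that value is empty, for example with a placeholder option declared as `<option value="">`.
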
~~~diff
@@ -160,11 +181,22 @@ func (b *Browser) evaluateString(expression string) (string, error) {

 // evaluateOnElement resolves a display ID to a remote object and calls a JS function on it.
 // Returns the string result.
+// If the backend node ID is stale (DOM has changed since last snapshot), it will
+// refresh the snapshot and retry once before falling back to interactive ordinal.
 func (b *Browser) evaluateOnElement(id int, function string, args ...any) (string, error) {
 	ctx, cancel := b.operationContext()
 	defer cancel()

 	remoteObj, _, err := b.resolveRemoteObject(ctx, id)
+	if err != nil {
+		// Backend node ID may be stale; refresh snapshot and retry once
+		if _, snapErr := b.Snapshot(); snapErr == nil {
+			if retryObj, _, retryErr := b.resolveRemoteObject(ctx, id); retryErr == nil {
+				remoteObj = retryObj
+				err = nil
+			}
+		}
+	}
 	if err != nil {
 		// Fallback: use interactiveOrdinal if available
 		return b.evaluateOnElementFallback(id, function, args...)
~~~
~~~diff
@@ -183,10 +215,18 @@ func (b *Browser) evaluateOnElement(id int, function string, args ...any) (strin
 		}
 		return nil
 	}))
+	// If callFunctionOn failed (e.g. node was collected), try fallback
+	if err != nil {
+		if fallbackResult, fallbackErr := b.evaluateOnElementFallback(id, function, args...); fallbackErr == nil {
+			return fallbackResult, nil
+		}
+	}
 	return result, err
 }

 // evaluateOnElementFallback uses querySelectorAll to find the element.
+// It first tries via interactive ordinal for interactive elements, then falls back
+// to a general tree-walker approach for any element.
 func (b *Browser) evaluateOnElementFallback(id int, function string, args ...any) (string, error) {
 	// Build args JSON array for JS
 	argsJSON := "[]"
~~~
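
With the two hunks above, `evaluateOnElement` follows a resolve, refresh-and-retry, call, fallback sequence. The sketch below isolates that control flow behind injectable hooks so each path can be exercised without a browser. The `resolver` type, its fields, and `ErrStale` are hypothetical; they stand in for `resolveRemoteObject`, `Snapshot`, the `callFunctionOn` step, and `evaluateOnElementFallback`, whose real signatures differ.

```go
package main

import "errors"

// ErrStale is a hypothetical error a resolve hook returns for a stale node ID.
var ErrStale = errors.New("stale backend node id")

// resolver bundles hypothetical hooks for each step of the lookup.
type resolver struct {
	resolve  func(id int) (string, error)     // display ID -> remote object handle
	refresh  func() error                     // re-take the AX snapshot
	call     func(obj string) (string, error) // run the JS function on the object
	fallback func(id int) (string, error)     // ordinal-based lookup
}

// evaluate mirrors the commit's flow: resolve; on failure refresh once and
// retry; if still unresolved use the fallback; if the call itself fails,
// prefer a successful fallback and otherwise report the original error.
func (r resolver) evaluate(id int) (string, error) {
	obj, err := r.resolve(id)
	if err != nil {
		if r.refresh() == nil {
			if retryObj, retryErr := r.resolve(id); retryErr == nil {
				obj, err = retryObj, nil
			}
		}
	}
	if err != nil {
		return r.fallback(id)
	}
	result, err := r.call(obj)
	if err != nil {
		if fb, fbErr := r.fallback(id); fbErr == nil {
			return fb, nil
		}
	}
	return result, err
}
```

One design consequence worth keeping in mind: if `Snapshot()` renumbers display IDs after the page has changed, the retried lookup resolves whichever element now carries `id`, which may not be the element the caller saw in the earlier snapshot.
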
~~~diff
@@ -199,18 +239,40 @@ func (b *Browser) evaluateOnElementFallback(id int, function string, args ...any
 	}

 	ordinal, err := b.interactiveOrdinal(id)
-	if err != nil {
-		// Not interactive, try all elements via tree walk
+	if err == nil {
+		// Interactive element: use querySelectorAll
+		expression := `(() => {
+			const elements = Array.from(document.querySelectorAll(` + mustJSON(interactiveQuery) + `));
+			const el = elements[` + mustJSON(ordinal) + `];
+			if (!el) throw new Error('element not found');
+			const fn = ` + function + `;
+			const args = ` + argsJSON + `;
+			return fn.apply(el, args);
+		})()`
+
+		return b.evaluateString(expression)
+	}
+
+	// Non-interactive element: use TreeWalker to find by position
+	// Walk all element/text nodes and match by ordinal position in the AX tree
+	allOrdinal := b.allElementOrdinal(id)
+	if allOrdinal < 0 {
 		return "", fmt.Errorf("element %d not found for query", id)
 	}

 	expression := `(() => {
-		const elements = Array.from(document.querySelectorAll(` + mustJSON(interactiveQuery) + `));
-		const el = elements[` + mustJSON(ordinal) + `];
-		if (!el) throw new Error('element not found');
-		const fn = ` + function + `;
-		const args = ` + argsJSON + `;
-		return fn.apply(el, args);
+		const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT);
+		let idx = -1;
+		let node;
+		while ((node = walker.nextNode())) {
+			idx++;
+			if (idx === ` + mustJSON(allOrdinal) + `) {
+				const fn = ` + function + `;
+				const args = ` + argsJSON + `;
+				return fn.apply(node, args);
+			}
+		}
+		throw new Error('element not found at ordinal ' + ` + mustJSON(allOrdinal) + `);
 	})()`

 	return b.evaluateString(expression)
~~~
File renamed without changes.

internal/axtree/format.go

Lines changed: 33 additions & 6 deletions
~~~diff
@@ -113,16 +113,21 @@ func formatNodeWithOptions(buf *strings.Builder, node *Node, depth int, counter

 	isInteractive := interactiveRoles[roleLower]

-	// InteractiveOnly: skip non-interactive nodes but still recurse their children
+	// InteractiveOnly: skip non-interactive nodes but still recurse their children.
+	// We still increment the counter to keep IDs consistent with BuildIDMap,
+	// and use depth+1 to preserve the tree hierarchy.
 	if opts.InteractiveOnly && !isInteractive {
+		*counter++
 		for _, child := range node.Children {
-			formatNodeWithOptions(buf, child, depth, counter, opts)
+			formatNodeWithOptions(buf, child, depth+1, counter, opts)
 		}
 		return
 	}

-	// Compact: skip structural wrappers without names that have children
+	// Compact: skip structural wrappers without names that have children.
+	// We still increment the counter to keep IDs consistent with BuildIDMap.
 	if opts.Compact && !isInteractive && node.Name == "" && len(node.Children) > 0 {
+		*counter++
 		for _, child := range node.Children {
 			formatNodeWithOptions(buf, child, depth, counter, opts)
 		}
~~~
~~~diff
@@ -131,8 +136,12 @@ func formatNodeWithOptions(buf *strings.Builder, node *Node, depth int, counter

 	*counter++

-	// MaxDepth: if we've exceeded the limit, still count but don't print
+	// MaxDepth: if we've exceeded the limit, still count children but don't print.
+	// We must recurse to keep counter consistent with BuildIDMap.
 	if opts.MaxDepth > 0 && depth >= opts.MaxDepth {
+		for _, child := range node.Children {
+			countNodeOnly(child, counter)
+		}
 		return
 	}

~~~
~~~diff
@@ -173,10 +182,28 @@ func formatNodeWithOptions(buf *strings.Builder, node *Node, depth int, counter
 	buf.WriteByte('\n')

 	// Recurse into children
-	if opts.MaxDepth <= 0 || depth+1 < opts.MaxDepth {
+	for _, child := range node.Children {
+		formatNodeWithOptions(buf, child, depth+1, counter, opts)
+	}
+}
+
+// countNodeOnly increments the counter for a node and all its descendants
+// without producing any output. Used to keep IDs consistent when nodes are
+// hidden by MaxDepth or other filters.
+func countNodeOnly(node *Node, counter *int) {
+	if node == nil {
+		return
+	}
+	roleLower := strings.ToLower(node.Role)
+	if rootRoles[roleLower] {
 		for _, child := range node.Children {
-			formatNodeWithOptions(buf, child, depth+1, counter, opts)
+			countNodeOnly(child, counter)
 		}
+		return
+	}
+	*counter++
+	for _, child := range node.Children {
+		countNodeOnly(child, counter)
 	}
 }

~~~
