# Snow CLI User Guide - Custom Search Engine

## Overview

Snow CLI's web search (the `web-search` MCP tool) is driven by a pluggable
search engine layer. The built-in engines are `duckduckgo` and `bing`. Both
scrape result pages in a headless browser instead of calling an official API.

If you want to use a different search provider, drop a JavaScript file into
your user plugin directory and Snow CLI registers it automatically. There is
no build step and no source code modification.

Use this feature when you want to:

- Use a regional search provider that isn't shipped by default
- Search your company's internal knowledge base or intranet
- Customize how an existing provider is scraped (e.g. fix a selector after a
  layout change)
- Temporarily mask a built-in engine without deleting any file

> The examples in this guide use a fictional provider `example-search.com`
> purely to illustrate the engine contract. You are responsible for
> complying with each target site's Terms of Service and `robots.txt` when
> writing a real plugin.

## Plugin Directory

Snow CLI loads search engine plugins from:

```bash
~/.snow/plugin/search_engines/
```

Supported file extensions:

- `.js`
- `.mjs` (recommended for plain ES Modules)
- `.cjs`

Notes:

- Plugins are loaded from the user directory only.
- Snow CLI sorts plugin files by filename and loads them on the first web
  search.
- Restart Snow CLI after adding or modifying a plugin file (the engine
  registry caches loaded modules for the lifetime of the process).
- Built-in engines (`duckduckgo`, `bing`) are always registered first; a
  plugin engine with the same `id` overrides the built-in one.

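The discovery rules above (three supported extensions, filename order) can be pictured as a small pure function. This is a hypothetical sketch, not Snow CLI's loader: `selectPluginFiles` is an invented name, and it assumes "sorted by filename" means JavaScript's default string ordering.

```js
// Hypothetical sketch of plugin discovery, not Snow CLI's real loader.
// Assumes "sorted by filename" means default JavaScript string ordering.
const PLUGIN_EXTENSIONS = ['.js', '.mjs', '.cjs'];

function selectPluginFiles(fileNames) {
  return (
    fileNames
      // Keep only the supported extensions ('.json' does not end with '.js').
      .filter(name => PLUGIN_EXTENSIONS.some(ext => name.endsWith(ext)))
      // filter() already returned a new array, so sorting leaves the input alone.
      .sort()
  );
}

// A directory listing such as fs.readdirSync() might return.
const listing = ['z-intranet.mjs', 'README.md', 'a-bing-mask.cjs', 'my-engine.js'];
console.log(selectPluginFiles(listing));
// → [ 'a-bing-mask.cjs', 'my-engine.js', 'z-intranet.mjs' ]
```

With this ordering, a same-`id` engine in `z-intranet.mjs` would override one registered by the files that sort before it.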
## Export Formats

A plugin module can export its engines in any of these forms. The loader
scans all of them, and the first non-empty match wins:

```js
export default { /* ... */ }
```

```js
export const searchEngine = { /* ... */ }
```

```js
export const searchEngines = [{ /* ... */ }, { /* ... */ }]
```

If multiple plugin files register the same engine `id`, the file loaded
later (in filename order) overrides the earlier one.

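One way to picture the export scan is the sketch below. It assumes the loader checks `default`, `searchEngine`, and `searchEngines` in the order listed above and normalizes each to an array. `collectEngines` is a hypothetical name, and the real loader may differ in detail (for example, in how it treats CommonJS `module.exports`).

```js
// Hypothetical sketch of export normalization; not Snow CLI's real loader.
// `mod` is a module namespace object, e.g. the result of `await import(url)`.
function collectEngines(mod) {
  // Assumed scan order: the three export forms, as listed above.
  const candidates = [mod.default, mod.searchEngine, mod.searchEngines];
  for (const candidate of candidates) {
    if (candidate == null) continue; // export not present
    // Normalize a single engine object to a one-element list.
    const list = Array.isArray(candidate) ? candidate : [candidate];
    if (list.length > 0) return list; // first non-empty match wins
  }
  return [];
}

const single = {default: {id: 'my-engine', name: 'My Engine', search: async () => []}};
const many = {searchEngines: [{id: 'engine-a'}, {id: 'engine-b'}]};
console.log(collectEngines(single).map(e => e.id)); // → [ 'my-engine' ]
console.log(collectEngines(many).map(e => e.id)); // → [ 'engine-a', 'engine-b' ]
```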
## Engine Structure

Every engine must satisfy this shape (written TypeScript-style for clarity,
but plugin files are plain JavaScript):

```ts
import type {Page} from 'puppeteer'; // Puppeteer's page type

interface SearchEngine {
  id: string; // stable identifier, e.g. 'my-engine'
  name: string; // human readable, shown in the picker
  enable?: boolean; // optional, defaults to true
  search(
    page: Page, // a Puppeteer Page already opened for you
    query: string, // the user's query string
    maxResults: number, // how many results to return at most
  ): Promise<SearchResult[]>;
}

interface SearchResult {
  title: string;
  url: string;
  snippet: string;
  displayUrl: string;
}
```
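
The Troubleshooting section notes that the loader rejects exports that are not a valid `SearchEngine`. A runtime check against the interface above might look like this sketch; `isValidEngine` is a hypothetical helper, and the real validation may check more or fewer fields.

```js
// Hypothetical runtime check mirroring the SearchEngine interface above.
// Snow CLI's actual validation may differ.
function isValidEngine(candidate) {
  return (
    candidate !== null &&
    typeof candidate === 'object' &&
    typeof candidate.id === 'string' &&
    candidate.id.length > 0 &&
    typeof candidate.name === 'string' &&
    typeof candidate.search === 'function' &&
    // `enable` is optional, but if present it should be a boolean.
    (candidate.enable === undefined || typeof candidate.enable === 'boolean')
  );
}

const good = {id: 'my-engine', name: 'My Engine', async search() { return []; }};
const missingName = {id: 'my-engine', async search() { return []; }};
console.log(isValidEngine(good)); // → true
console.log(isValidEngine(missingName)); // → false
```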

Field description:

- `id`: the value stored in the `searchEngine` field of
  `~/.snow/proxy-config.json`, whether it is set by the picker or by hand.
  Keep it stable.
- `name`: the label shown in the proxy config picker. Free-form.
- `enable` (optional): defaults to `true`. Set it to `false` to temporarily
  disable an engine without deleting its file. A disabled engine is invisible
  to `getSearchEngine`, `listSearchEngines`, and the UI picker.
  - Tip: declaring `{id: 'bing', name: 'Bing', enable: false, search() {}}`
    in a plugin masks the built-in `bing` engine, because the loader removes
    the same-id entry from the registry when it sees `enable: false`.
- `search(page, query, maxResults)`: does the actual work. Snow CLI:

  - launches or connects to the browser for you (respecting
    `~/.snow/proxy-config.json`)
  - opens a fresh `Page` and passes it in
  - closes the page after `search()` returns

  Your engine should:

  - navigate to its own search URL via `page.goto(...)`
  - wait for the DOM to settle
  - extract up to `maxResults` results via `page.evaluate(...)`
  - return them as an array of `SearchResult`

  Never call `browser.close()` or `page.close()` yourself; the page is
  owned by the caller.

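The registration rules described so far (built-ins first, later same-`id` entries override earlier ones, and `enable: false` removes any same-`id` entry) can be modelled with a `Map`. This is a hypothetical sketch: `buildRegistry` and the placeholder engines are invented for illustration, and the real registry may keep extra state.

```js
// Hypothetical model of the engine registry described above.
// Built-ins are registered first, then plugin engines in filename order.
function buildRegistry(builtIns, pluginEngines) {
  const registry = new Map();
  for (const engine of [...builtIns, ...pluginEngines]) {
    if (engine.enable === false) {
      // A disabled engine removes any same-id entry, built-in or plugin.
      registry.delete(engine.id);
      continue;
    }
    // Map.set replaces an existing key, so later registrations win.
    registry.set(engine.id, engine);
  }
  return registry;
}

const noop = async () => [];
const builtIns = [
  {id: 'duckduckgo', name: 'DuckDuckGo', search: noop},
  {id: 'bing', name: 'Bing', search: noop},
];
const plugins = [
  {id: 'bing', name: 'Bing', enable: false, search: noop}, // masks built-in bing
  {id: 'my-engine', name: 'My Search Engine', search: noop},
];
console.log([...buildRegistry(builtIns, plugins).keys()]);
// → [ 'duckduckgo', 'my-engine' ]
```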
## Lifecycle and Configuration

1. Drop the plugin file under `~/.snow/plugin/search_engines/`.
2. Start (or restart) Snow CLI.
3. Open the proxy configuration screen (`/settings` → Proxy and Browser
   Settings, or the dedicated entry point in your build). Your engine
   appears in the "Search Engine" picker under its `name`.
4. Select your engine and save. The choice is persisted in
   `~/.snow/proxy-config.json` as:

   ```json
   {
     "enabled": false,
     "port": 7890,
     "searchEngine": "my-engine"
   }
   ```

5. Any subsequent `web-search` MCP call uses your engine.

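If you would rather script the change than use the picker, the persisted shape above suggests a small JSON transform. This is a hypothetical sketch: `setSearchEngine` is an invented helper, and it assumes Snow CLI accepts a hand-edited `proxy-config.json` (restart Snow CLI afterwards to be safe).

```js
// Hypothetical helper: returns new proxy-config.json text with `searchEngine`
// set, keeping every other key. File I/O is deliberately left out.
function setSearchEngine(configText, engineId) {
  // Treat an empty or missing file as an empty config object.
  const config = configText && configText.trim() ? JSON.parse(configText) : {};
  config.searchEngine = engineId;
  return JSON.stringify(config, null, 2) + '\n';
}

const before = '{"enabled": false, "port": 7890, "searchEngine": "bing"}';
console.log(setSearchEngine(before, 'my-engine'));
```

To apply it, read and write `path.join(os.homedir(), '.snow', 'proxy-config.json')` with `fs.readFileSync` and `fs.writeFileSync`.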
## Example: A Minimal Plugin Template

Below is a complete, runnable template that targets a fictional provider
`example-search.com`. Replace the URL, selectors, and id with the values
that match your real target. Treat the selectors here as **placeholders**:
every search page has a different DOM, so you must inspect yours.

```js
// ~/.snow/plugin/search_engines/my-engine.mjs

// Strip zero-width characters before collapsing whitespace, so a removed
// zero-width character cannot leave a double space behind.
const cleanText = text =>
  (text || '')
    .replace(/[\u200B-\u200D\uFEFF]/g, '')
    .replace(/\s+/g, ' ')
    .trim();

export default {
  id: 'my-engine',
  name: 'My Search Engine',
  // Set to `false` to temporarily disable this engine without deleting the
  // file. Disabled engines are invisible to the picker and `getSearchEngine`.
  enable: true,

  async search(page, query, maxResults) {
    // 1. Build the search URL for your target provider. The example below
    //    uses a fictional host purely to illustrate the shape.
    const encodedQuery = encodeURIComponent(query);
    const searchUrl =
      `https://example-search.com/search?q=${encodedQuery}` +
      `&n=${Math.max(maxResults, 10)}`;

    // 2. Navigate. Prefer `domcontentloaded` over `networkidle2` because
    //    real search pages keep loading telemetry forever.
    try {
      await page.goto(searchUrl, {
        waitUntil: 'domcontentloaded',
        timeout: 30000,
      });
    } catch {
      // Navigation timeout: try whatever has already rendered.
    }

    // 3. Wait for a representative result selector. Never throw: if the
    //    wait times out, fall through and let extraction return whatever it
    //    finds (possibly an empty list) so the caller can fall back.
    try {
      await page.waitForSelector('.results .result-item', {timeout: 10000});
    } catch {
      // Best effort: extraction may still find something.
    }

    // 4. Extract inside the browser context. `maxResults` is passed in as an
    //    argument because the callback cannot see Node-side variables.
    const raw = await page.evaluate(maxLimit => {
      const out = [];
      const items = document.querySelectorAll('.results .result-item');
      const isHttpUrl = u => /^https?:\/\//i.test(u);

      for (const item of items) {
        if (out.length >= maxLimit) break;

        // Filter ads if the provider marks them.
        if (item.classList.contains('is-ad')) continue;

        const linkEl = item.querySelector('a.result-title');
        if (!linkEl) continue;

        const href = linkEl.getAttribute('href') || '';
        if (!isHttpUrl(href)) continue;

        const title = (linkEl.textContent || '').trim();
        if (!title) continue;

        const snippetEl = item.querySelector('.result-snippet');
        const snippet = snippetEl ? (snippetEl.textContent || '').trim() : '';

        const citeEl = item.querySelector('cite, .result-host');
        const displayUrl = citeEl ? (citeEl.textContent || '').trim() : '';

        out.push({title, url: href, snippet, displayUrl});
      }
      return out;
    }, maxResults);

    // 5. Normalize and return.
    return raw.map(r => ({
      title: cleanText(r.title),
      url: r.url || '',
      snippet: cleanText(r.snippet),
      displayUrl: cleanText(r.displayUrl),
    }));
  },
};
```

To adapt this template to a real provider, work out:

- the search URL pattern (often `?q=`, `?wd=`, or `?query=`, plus a
  result-count parameter);
- a stable container selector for organic results;
- the title / link selector inside each container;
- the snippet selector;
- the display-URL / host selector;
- how the provider marks ads or sponsored results, so you can skip them.

Open the provider's result page in a regular browser, inspect the DOM with
DevTools, then plug the selectors into the template above.

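For the URL pattern, the standard `URL` and `URLSearchParams` APIs (available in Node and in browsers) handle query encoding instead of manual string concatenation. The host and parameter names (`example-search.com`, `q`, `n`) are the template's placeholders, and `buildSearchUrl` is a hypothetical helper.

```js
// Sketch: build a provider search URL with the standard URL API.
// Host and parameter names are placeholders, as in the template.
function buildSearchUrl(query, maxResults) {
  const url = new URL('https://example-search.com/search');
  url.searchParams.set('q', query); // encoded automatically
  url.searchParams.set('n', String(Math.max(maxResults, 10)));
  return url.toString();
}

console.log(buildSearchUrl('snow cli & plugins', 5));
// → https://example-search.com/search?q=snow+cli+%26+plugins&n=10
```

`URLSearchParams` encodes spaces as `+` rather than the `%20` that `encodeURIComponent` produces. Most search endpoints accept both, but check yours.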
## Writing Your Own Engine: Checklist

1. **Pick a stable `id`**. Once users save it into `proxy-config.json`,
   renaming it will break their config.
2. **Open the target search URL with `domcontentloaded`**, not
   `networkidle2`. Most search pages keep loading telemetry scripts forever,
   so `networkidle2` times out before the results become usable.
3. **Wrap `page.goto` in `try/catch`**. A navigation timeout is recoverable:
   the DOM may already contain enough to extract.
4. **Always give `page.waitForSelector` a timeout**. Never `throw` if it
   fails; continue to extraction and return an empty list if nothing
   matches, so the caller can fall back.
5. **Extract inside `page.evaluate`**. The callback runs in the browser, so
   you have full DOM access, but it must return only serializable plain data
   (strings, numbers, arrays, plain objects), not DOM nodes.
6. **Filter ads / sponsored results**. Each provider marks them differently,
   so check the DOM yourself.
7. **Normalize text** (see the `cleanText` helper above): collapse whitespace
   and strip zero-width characters.
8. **Never call `browser.close()` or `page.close()`**. The page is owned by
   the caller, `WebSearchService`.
9. **Don't use Node-only modules or outer-scope variables inside
   `page.evaluate`'s callback**. It runs inside the browser; pass any values
   it needs as extra `evaluate` arguments, as the template does with
   `maxResults`.

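Checklist item 7, together with the template's URL check and `maxResults` cap, can also run as one Node-side post-processing step after `page.evaluate` returns. `finalizeResults` is a hypothetical helper, and de-duplicating by URL is an extra step the template does not perform; drop it if your provider never repeats links.

```js
// Hypothetical post-processing for raw results returned by page.evaluate.
const cleanText = text =>
  (text || '')
    .replace(/[\u200B-\u200D\uFEFF]/g, '') // zero-width characters first
    .replace(/\s+/g, ' ') // then collapse whitespace
    .trim();

function finalizeResults(raw, maxResults) {
  const seen = new Set();
  const out = [];
  for (const r of raw) {
    if (out.length >= maxResults) break;
    const url = (r.url || '').trim();
    // Keep only absolute http(s) links, and each URL only once.
    if (!/^https?:\/\//i.test(url) || seen.has(url)) continue;
    const title = cleanText(r.title);
    if (!title) continue; // skip entries without a usable title
    seen.add(url);
    out.push({title, url, snippet: cleanText(r.snippet), displayUrl: cleanText(r.displayUrl)});
  }
  return out;
}

const raw = [
  {title: ' First\n result ', url: 'https://a.example/1', snippet: 'one', displayUrl: 'a.example'},
  {title: 'Duplicate', url: 'https://a.example/1', snippet: '', displayUrl: ''},
  {title: 'Relative', url: '/local', snippet: '', displayUrl: ''},
  {title: 'Second', url: 'https://b.example/2', snippet: 'two', displayUrl: 'b.example'},
];
console.log(finalizeResults(raw, 5).map(r => r.title)); // → [ 'First result', 'Second' ]
```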
## Multi-Engine Plugins

You can register multiple engines from a single file:

```js
export const searchEngines = [
  {id: 'engine-a', name: 'Engine A', async search(page, query, maxResults) { /* ... */ }},
  {id: 'engine-b', name: 'Engine B', async search(page, query, maxResults) { /* ... */ }},
];
```

This is convenient for plugins that share a `cleanText` helper or a common
result-extraction routine.

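A small factory is one way to share code between engines in a single file: navigation and extraction are written once, and each engine only supplies its URL builder and selectors. `makeEngine`, its options, and the `a.example` / `b.example` hosts are hypothetical placeholders, and ad filtering is omitted for brevity. In a real `.mjs` plugin you would write `export const searchEngines = [...]`; the sketch drops `export` so it also runs as a plain script.

```js
// Hypothetical factory: engines share navigation/extraction, differ by config.
function makeEngine({id, name, buildUrl, itemSelector, linkSelector}) {
  return {
    id,
    name,
    async search(page, query, maxResults) {
      try {
        await page.goto(buildUrl(query), {waitUntil: 'domcontentloaded', timeout: 30000});
      } catch {
        // Navigation timeout: try whatever has already rendered.
      }
      try {
        await page.waitForSelector(itemSelector, {timeout: 10000});
      } catch {
        // Best effort, as in the template.
      }
      // Selectors travel as extra arguments: the callback runs in the
      // browser and cannot see variables from this Node scope.
      return page.evaluate(
        (itemSel, linkSel, limit) =>
          Array.from(document.querySelectorAll(itemSel))
            .map(item => item.querySelector(linkSel))
            .filter(link => link && /^https?:\/\//i.test(link.href))
            .slice(0, limit)
            .map(link => ({
              title: (link.textContent || '').trim(),
              url: link.href,
              snippet: '',
              displayUrl: new URL(link.href).host,
            })),
        itemSelector,
        linkSelector,
        maxResults,
      );
    },
  };
}

const searchEngines = [
  makeEngine({
    id: 'engine-a',
    name: 'Engine A',
    buildUrl: q => `https://a.example/search?q=${encodeURIComponent(q)}`,
    itemSelector: '.result',
    linkSelector: 'a.title',
  }),
  makeEngine({
    id: 'engine-b',
    name: 'Engine B',
    buildUrl: q => `https://b.example/find?query=${encodeURIComponent(q)}`,
    itemSelector: 'li.hit',
    linkSelector: 'h3 a',
  }),
];
console.log(searchEngines.map(e => e.id)); // → [ 'engine-a', 'engine-b' ]
```

Unlike the template, this sketch reads `link.href`, which the browser resolves to an absolute URL, so relative links are not silently dropped.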
## Troubleshooting

- **The plugin does not appear in the picker.**

  - Make sure the file is in `~/.snow/plugin/search_engines/`, its extension
    is `.js`, `.mjs`, or `.cjs`, and you restarted Snow CLI after adding it.
  - Check the Snow CLI logs for
    `[websearch] failed to load search engine plugin "..."`. Syntax errors
    fail loudly there.
  - Make sure your export is a plain object with `{id, name, search}`; the
    loader logs `did not export a valid SearchEngine` when validation fails.

- **Search always returns 0 results.**

  - The provider has probably changed its DOM. Open the page manually in a
    browser and inspect the new selectors.
  - Increase the `page.waitForSelector` timeout.
  - Some providers redirect bot traffic to a captcha page. Try setting a
    realistic `User-Agent` via `page.setUserAgent(...)` at the start of
    `search()` (`WebSearchService` already sets one before delegating, but
    you can override it).

- **I want to disable a built-in engine.**

  - Create a plugin file that exports
    `{id: 'bing', name: 'Bing', enable: false, async search() { return []; }}`.
    The loader sees `enable: false` and removes the same-id entry from the
    registry.

## Related

- [Proxy and Browser Settings](./03.Proxy%20and%20Browser%20Settings.md)
- [Custom StatusLine Guide](./21.Custom%20StatusLine%20Guide.md): the same
  plugin-loading philosophy applied to the status line