| 1 | +--- |
| 2 | +layout: default |
| 3 | +title: Tool Search |
| 4 | +nav_order: 9 |
| 5 | +description: Scale to hundreds of tools without blowing up your token budget. Defer tool schemas and let the model search for what it needs. |
| 6 | +redirect_from: |
| 7 | + - /guides/tool-search |
| 8 | +--- |
| 9 | + |
| 10 | +# {{ page.title }} |
| 11 | +{: .d-inline-block .no_toc } |
| 12 | + |
| 13 | +New in 1.15 |
| 14 | +{: .label .label-green } |
| 15 | + |
| 16 | +{{ page.description }} |
| 17 | +{: .fs-6 .fw-300 } |
| 18 | + |
| 19 | +## Table of contents |
| 20 | +{: .no_toc .text-delta } |
| 21 | + |
| 22 | +1. TOC |
| 23 | +{:toc} |
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +After reading this guide, you will know: |
| 28 | + |
| 29 | +* When tool search helps (and when it doesn't). |
| 30 | +* How to mark tools as deferred. |
| 31 | +* How the model discovers and loads deferred tools at runtime. |
| 32 | +* How providers differ: Anthropic's native path vs. client-side emulation. |
| 33 | +* How to plug in a custom search function (e.g. embeddings). |
| 34 | + |
| 35 | +## When to use it |
| 36 | + |
| 37 | +When a `RubyLLM::Chat` is wired to many tools — especially across one or more MCP servers — every tool's full JSON Schema ships in the system-prompt prefix on every turn. Three real costs follow: |
| 38 | + |
| 39 | +1. **Token bloat.** Hundreds of tools can add tens of thousands of tokens per request. |
| 40 | +2. **Prompt-cache eviction.** Adding or removing tools changes the prefix and invalidates the cache. |
| 41 | +3. **Selection accuracy.** Models choose worse tools when the menu is long. |
| 42 | + |
| 43 | +Tool search solves this by withholding the schemas of "deferred" tools from the prompt prefix and exposing a built-in `search_tools` function the model can call to load the schemas it actually needs. |
| 44 | + |
| 45 | +Reach for it when: |
| 46 | + |
| 47 | +* You use MCP servers with more than ~20 tools. |
| 48 | +* You have a handful of rarely-needed heavy tools (deep-research, large schemas) bloating your prefix. |
| 49 | +* You run on Anthropic and want to preserve prompt cache across long conversations. |
| 50 | + |
| 51 | +Skip it when: |
| 52 | + |
| 53 | +* You have fewer than a dozen tools and no prompt-cache concerns. |
| 54 | +* All your tools are always relevant to every turn. |
| 55 | + |
| 56 | +## Marking tools deferred |
| 57 | + |
| 58 | +### Per-call (best for MCP bulk registration) |
| 59 | + |
| 60 | +```ruby |
| 61 | +chat = RubyLLM.chat(model: "claude-sonnet-4-6") |
| 62 | +chat.with_tools(*mcp_client.tools, defer: true) |
| 63 | +``` |
| 64 | + |
| 65 | +### Per-class (best for tools that are intrinsically heavy) |
| 66 | + |
| 67 | +```ruby |
| 68 | +class DeepResearchTool < RubyLLM::Tool |
| 69 | +  description "Runs a multi-step web research task." |
| 70 | +  deferred |
| 71 | +  param :query, desc: "Research question" |
| 72 | +  def execute(query:) |
| 73 | +    # ... |
| 74 | +  end |
| 75 | +end |
| 76 | + |
| 77 | +chat.with_tool(DeepResearchTool) |
| 78 | +``` |
| 79 | + |
| 80 | +A `defer:` value passed at registration wins over the class-level `deferred` declaration, so you can override it for a single chat: |
| 81 | + |
| 82 | +```ruby |
| 83 | +chat.with_tool(DeepResearchTool, defer: false) # force it onto the active list |
| 84 | +``` |
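The precedence rule can be pictured as a one-line resolver. This is a hypothetical sketch, not ruby_llm's source. `effective_defer` and its arguments are illustrative names, and `nil` stands for "no `defer:` keyword was passed".

```ruby
# Hypothetical resolver illustrating the precedence rule (not ruby_llm's code).
# defer:          value of the per-registration `defer:` keyword, nil if omitted
# class_deferred: true when the tool class declared `deferred`
def effective_defer(defer, class_deferred)
  defer.nil? ? class_deferred : defer
end

effective_defer(nil, true)   # => true  (no keyword: inherit the class DSL)
effective_defer(false, true) # => false (per-call value wins)
effective_defer(true, false) # => true  (per-call value wins)
```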
| 85 | + |
| 86 | +## How the model discovers deferred tools |
| 87 | + |
| 88 | +When the catalog contains at least one deferred tool, `RubyLLM::Chat` automatically adds a built-in `search_tools` function to the active tool list. The model sees a compact description like: |
| 89 | + |
| 90 | +> **search_tools** — Search and load deferred tools by keyword, or load specific tools by name using `select:Name1,Name2`. |
| 91 | + |
| 92 | +The typical flow: |
| 93 | + |
| 94 | +``` |
| 95 | +user : "Delete the stale S3 bucket" |
| 96 | +assistant : tool_call(search_tools, query: "delete s3 bucket") |
| 97 | +tool : { loaded: [:s3_delete_bucket], descriptions: {...} } |
| 98 | +assistant : tool_call(s3_delete_bucket, bucket: "stale-bucket-2023") |
| 99 | +tool : "deleted" |
| 100 | +assistant : "Done — stale-bucket-2023 has been deleted." |
| 101 | +``` |
| 102 | + |
| 103 | +Once a tool is promoted by `search_tools`, it stays active for the rest of the conversation. |
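Promotion can be modelled with two collections: a hash of deferred tools and a set of loaded names. The sketch below is a simplified, hypothetical model. `CatalogSketch` and its methods are illustrative and are not ruby_llm's `tool_catalog`, although the reader names mirror the ones used under Observing activation.

```ruby
require "set"

# Hypothetical, simplified tool catalog: tools are either always active or
# deferred, and promoting a deferred tool makes it active for good.
class CatalogSketch
  attr_reader :loaded_tools

  def initialize(active:, deferred:)
    @active = active          # Array<Symbol>: schemas always sent in full
    @deferred = deferred      # Hash{Symbol => String}: name => description
    @loaded_tools = Set.new   # deferred names promoted so far
  end

  # Names whose full schema would go out on the next request.
  def active_names
    @active + @loaded_tools.to_a
  end

  # Deferred tools that are still hidden.
  def deferred_tools
    @deferred.reject { |name, _| @loaded_tools.include?(name) }
  end

  # Promote names returned by a search; unknown names are ignored and
  # promoting the same name twice has no further effect (it is a Set).
  def promote(names)
    names.each { |n| @loaded_tools << n if @deferred.key?(n) }
    @loaded_tools
  end
end

catalog = CatalogSketch.new(
  active: [:search_tools],
  deferred: { s3_delete_bucket: "Delete an S3 bucket", s3_list_buckets: "List S3 buckets" }
)
catalog.promote([:s3_delete_bucket])
catalog.active_names        # => [:search_tools, :s3_delete_bucket]
catalog.deferred_tools.keys # => [:s3_list_buckets]
```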
| 104 | + |
| 105 | +## The `select:` shortcut |
| 106 | + |
| 107 | +For deterministic loading — "when event X happens, load tools Y and Z" — prefix the query with `select:` and list the names exactly. No ranking, no LLM reasoning required: |
| 108 | + |
| 109 | +```ruby |
| 110 | +chat.tools[:search_tools].execute(query: "select:s3_delete_bucket,s3_list_buckets") |
| 111 | +``` |
| 112 | + |
| 113 | +You can also drive `select:` entirely from application code when you know which tools a task needs, skipping the discovery turn. |
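A `select:` query has to be recognised before any ranking happens. The hypothetical `parse_select` below shows one way to split it into tool-name symbols. Tolerating spaces after commas is this sketch's choice; the guide does not say whether ruby_llm does the same.

```ruby
# Hypothetical parser (not ruby_llm's code): returns Array<Symbol> for a
# `select:` query, or nil so the caller falls back to ranked search.
def parse_select(query)
  prefix = "select:"
  return nil unless query.start_with?(prefix)

  query.delete_prefix(prefix)
       .split(",")
       .map(&:strip)
       .reject(&:empty?)
       .map(&:to_sym)
end

parse_select("select:s3_delete_bucket, s3_list_buckets")
# => [:s3_delete_bucket, :s3_list_buckets]
parse_select("delete s3 bucket") # => nil
```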
| 114 | + |
| 115 | +## Anthropic: native server-side search |
| 116 | + |
| 117 | +On Anthropic models, ruby_llm forwards `defer_loading: true` to the API on every deferred tool and appends the native `tool_search_tool_bm25_20251119` server-side search primitive. The practical win: deferred tool schemas never enter the cached system-prompt prefix, so prompt caching stays intact across turns. |
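As an illustration, the `tools` array sent to Anthropic might look like the output of the sketch below. The field names (`input_schema`, `defer_loading`, and the `tool_search_tool_bm25` name paired with the `tool_search_tool_bm25_20251119` type) follow Anthropic's tool search documentation as we understand it. `anthropic_tools_payload` is hypothetical, and ruby_llm's real payload may differ.

```ruby
# Hypothetical payload builder (not ruby_llm's code). Each input tool is a
# Hash with :name, :description, :schema and :deferred.
def anthropic_tools_payload(tools)
  entries = tools.map do |t|
    entry = { name: t[:name], description: t[:description], input_schema: t[:schema] }
    entry[:defer_loading] = true if t[:deferred] # schema stays out of the cached prefix
    entry
  end
  # Append the server-side search tool only when something is deferred.
  if tools.any? { |t| t[:deferred] }
    entries << { type: "tool_search_tool_bm25_20251119", name: "tool_search_tool_bm25" }
  end
  entries
end

tools = [
  { name: "get_weather", description: "Current weather for a city",
    schema: { type: "object", properties: { city: { type: "string" } } }, deferred: false },
  { name: "s3_delete_bucket", description: "Delete an S3 bucket",
    schema: { type: "object", properties: { bucket: { type: "string" } } }, deferred: true }
]
payload = anthropic_tools_payload(tools)
```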
| 118 | + |
| 119 | +## Other providers: client-side emulation |
| 120 | + |
| 121 | +On OpenAI, Gemini, Bedrock, and providers that inherit from them (Azure, DeepSeek, Mistral, OpenRouter, Perplexity, xAI, Ollama, GPUStack, Vertex AI), ruby_llm emits deferred tools as name-plus-description stubs with an empty parameters schema. The model cannot invoke a stub directly — it must call `search_tools` first to load the full schema, at which point the tool becomes callable normally. |
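The stub-versus-full distinction can be sketched with an OpenAI Chat Completions-style function definition. The exact shape of ruby_llm's "empty parameters schema" is an assumption here, as is the hypothetical `tool_definition` helper.

```ruby
# Hypothetical sketch (not ruby_llm's code): a deferred, not-yet-loaded tool is
# sent with an empty object schema, so the model has no arguments to fill in.
def tool_definition(name:, description:, schema:, deferred:, loaded:)
  parameters =
    if deferred && !loaded
      { type: "object", properties: {} } # stub: name and description only
    else
      schema                             # full schema once active or loaded
    end
  { type: "function", function: { name: name.to_s, description: description, parameters: parameters } }
end

schema = { type: "object", properties: { bucket: { type: "string" } }, required: ["bucket"] }
stub = tool_definition(name: :s3_delete_bucket, description: "Delete an S3 bucket",
                       schema: schema, deferred: true, loaded: false)
full = tool_definition(name: :s3_delete_bucket, description: "Delete an S3 bucket",
                       schema: schema, deferred: true, loaded: true)
```

After `search_tools` loads the tool, the same definition is re-emitted with the real schema. That is the point at which the model can call it normally.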
| 122 | + |
| 123 | +The default client-side ranker is a pure-Ruby BM25 implementation over `"#{name} #{description}"`. No external gem, no embedding infrastructure. |
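For intuition, here is a minimal BM25 ranker over `"#{name} #{description}"` that uses the same call shape as a custom ranker, `(query, candidates, max:)`. It is a sketch, not ruby_llm's implementation. The tokenizer, the Lucene-style IDF, and the `k1`/`b` defaults are this example's choices.

```ruby
# Minimal BM25 sketch (not ruby_llm's code). Candidate values may be tool
# objects responding to #description, or plain description strings.
def bm25_rank(query, candidates, max:, k1: 1.2, b: 0.75)
  tokenize = ->(text) { text.downcase.scan(/[a-z0-9]+/) }

  # One token list per tool; underscores split names into words.
  docs = candidates.to_h do |name, tool|
    desc = tool.respond_to?(:description) ? tool.description : tool.to_s
    [name, tokenize.call("#{name} #{desc}")]
  end
  return [] if docs.empty?

  n = docs.size.to_f
  avgdl = docs.values.sum(&:size) / n
  avgdl = 1.0 if avgdl.zero?
  terms = tokenize.call(query).uniq

  # Document frequency: how many tools mention each query term.
  df = terms.to_h { |t| [t, docs.values.count { |toks| toks.include?(t) }] }

  scores = docs.map do |name, toks|
    score = terms.sum(0.0) do |t|
      tf = toks.count(t)
      next 0.0 if tf.zero?

      idf = Math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) # Lucene-style, never negative
      idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * toks.size / avgdl))
    end
    [name, score]
  end

  scores.select { |_, s| s.positive? }
        .sort_by { |name, s| [-s, name.to_s] } # ties broken by name
        .first(max)
        .map(&:first)
end

candidates = {
  s3_delete_bucket: "Delete an S3 bucket",
  s3_list_buckets: "List S3 buckets",
  send_email: "Send an email"
}
bm25_rank("delete s3 bucket", candidates, max: 2)
# => [:s3_delete_bucket, :s3_list_buckets]
```

Tools with no matching term score zero and are dropped rather than padded into the result.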
| 124 | + |
| 125 | +## Custom search |
| 126 | + |
| 127 | +If BM25 over name + description is not enough — for example, you want to rank by your own tool metadata or use embeddings — supply your own ranker: |
| 128 | + |
| 129 | +```ruby |
| 130 | +# Per-chat |
| 131 | +chat.with_tool_search do |query, candidates, max:| |
| 132 | +  MyEmbeddingIndex.rank(candidates, query, k: max) # returns Array<Symbol> |
| 133 | +end |
| 134 | + |
| 135 | +# Or, globally, once per process |
| 136 | +RubyLLM.configure do |c| |
| 137 | +  c.tool_search_function = lambda do |query, candidates, max:| |
| 138 | +    MyEmbeddingIndex.rank(candidates, query, k: max) |
| 139 | +  end |
| 140 | +end |
| 141 | +``` |
| 142 | + |
| 143 | +The block receives the query string, the hash of candidate tools (keyed by name symbol), and `max:`, the maximum number of results to return. It must return an array of tool-name symbols ordered from most to least relevant. |
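As a non-embedding example of "your own tool metadata", the hypothetical ranker below scores tools by the overlap between query words and application-owned tags. `TOOL_TAGS` and `tag_ranker` are illustrative names, not ruby_llm APIs. The lambda follows the contract described above: query, candidates hash, `max:`, and symbols returned most relevant first.

```ruby
# Hypothetical tag metadata owned by the application, not by ruby_llm.
TOOL_TAGS = {
  s3_delete_bucket: %w[s3 storage delete cleanup],
  s3_list_buckets:  %w[s3 storage list inventory],
  send_email:       %w[email notify]
}.freeze

# Custom ranker: count tag hits per candidate, best first, at most `max`.
tag_ranker = lambda do |query, candidates, max:|
  words = query.downcase.split
  candidates.keys
            .map { |name| [name, (TOOL_TAGS.fetch(name, []) & words).size] }
            .select { |_, hits| hits.positive? }
            .sort_by { |name, hits| [-hits, name.to_s] }
            .first(max)
            .map(&:first)
end

candidates = { s3_delete_bucket: :tool, s3_list_buckets: :tool, send_email: :tool }
tag_ranker.call("cleanup old s3 storage", candidates, max: 2)
# => [:s3_delete_bucket, :s3_list_buckets]
```

Because it is a lambda with the documented parameters, it could be assigned with `c.tool_search_function = tag_ranker` in `RubyLLM.configure`.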
| 144 | + |
| 145 | +## Kill switch |
| 146 | + |
| 147 | +Tool search is on by default. To disable it globally, set `tool_search_enabled` to `false`. With tool search off, `defer: true` and the `deferred` DSL become no-ops and a one-time warning is logged: |
| 148 | + |
| 149 | +```ruby |
| 150 | +RubyLLM.configure { |c| c.tool_search_enabled = false } |
| 151 | +``` |
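The kill-switch behavior can be modelled as a small policy object. `DeferralPolicy` and the warning text are hypothetical; the sketch only mirrors the documented effects: deferral requests are ignored and the warning is emitted once.

```ruby
require "logger"
require "stringio"

# Hypothetical model of the kill switch (not ruby_llm's code).
class DeferralPolicy
  def initialize(enabled:, logger:)
    @enabled = enabled
    @logger = logger
    @warned = false
  end

  # Returns whether a tool registered with `requested` should really be deferred.
  def defer?(requested)
    return requested if @enabled

    if requested && !@warned
      @logger.warn("tool search is disabled; ignoring deferred tools")
      @warned = true # warn only once per policy
    end
    false
  end
end

io = StringIO.new
policy = DeferralPolicy.new(enabled: false, logger: Logger.new(io))
policy.defer?(true) # => false, logs the warning
policy.defer?(true) # => false, no second warning
```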
| 152 | + |
| 153 | +## Observing activation |
| 154 | + |
| 155 | +```ruby |
| 156 | +chat.on_tool_search do |query:, results:| |
| 157 | +  Rails.logger.info("tool_search: #{query} → #{results.join(', ')}") |
| 158 | +end |
| 159 | + |
| 160 | +chat.tool_catalog.loaded_tools # => #<Set: {:s3_delete_bucket}> |
| 161 | +chat.tool_catalog.deferred_tools.keys # all tools still hidden |
| 162 | +``` |
| 163 | + |
| 164 | +## Configuration reference |
| 165 | + |
| 166 | +| Location | Setting | Default | Purpose | |
| 167 | +|---|---|---|---| |
| 168 | +| `RubyLLM.config` | `tool_search_enabled` | `true` | Global on/off kill switch | |
| 169 | +| `RubyLLM.config` | `tool_search_function` | `nil` (uses BM25) | Global default ranker | |
| 170 | +| `RubyLLM::Tool` | `deferred` (class DSL) | not set | Marks every instance deferred by default | |
| 171 | +| `RubyLLM::Chat#with_tool` | `defer:` kwarg | `nil` (inherits class) | Per-registration override | |
| 172 | +| `RubyLLM::Chat#with_tools` | `defer:` kwarg | `nil` | Bulk per-registration override | |
| 173 | +| `RubyLLM::Chat#with_tool_search` | block | — | Per-chat ranker | |
| 174 | +| `RubyLLM::Chat#on_tool_search` | block | — | Activation callback | |
| 175 | +| `RubyLLM::Chat#tool_catalog` | reader | — | Inspect deferred / loaded sets | |
| 176 | + |
| 177 | +## Further reading |
| 178 | + |
| 179 | +* [Tools guide]({% link _core_features/tools.md %}) |
| 180 | +* [Agents guide]({% link _core_features/agents.md %}) |
| 181 | +* [Anthropic tool search tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool) |