Mark tools as deferred — via the class-level `deferred` DSL or the
per-call `defer: true` kwarg — and RubyLLM forwards `defer_loading: true`
per tool plus the `tool_search_tool_bm25_20251119` primitive to the
Anthropic API. Claude's server-side search loads the tools it actually
needs via `tool_reference` blocks; the new parser promotes those tools
from `chat.tool_catalog` into `chat.tools` so the normal dispatch path
can call them on the next turn. `chat.on_tool_search` exposes which
tools were loaded.
Non-Anthropic providers log a one-time warning and treat `defer:` as a
regular registration; `RubyLLM.config.tool_search_enabled = false` is a
global kill switch with the same behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/_advanced/upgrading.md
+10 lines changed: 10 additions & 0 deletions
@@ -21,6 +21,16 @@ redirect_from:
{:toc}

---

# Upgrade to 1.15

## How to Upgrade

1.15 adds tool search in a fully additive way. There is no generator and no migration: upgrade the gem and continue using RubyLLM as before.

## What's New in 1.15

- **Tool Search (Anthropic)**: `RubyLLM::Chat#with_tool` / `#with_tools` accept a new `defer:` keyword argument, and `RubyLLM::Tool` exposes a class-level `deferred` DSL. On Anthropic this translates to the native `defer_loading: true` flag plus the `tool_search_tool_bm25_20251119` primitive: deferred tools stay out of the system-prompt prefix and Claude loads the ones it actually needs server-side. On other providers `defer:` is ignored with a one-time warning. If you don't use `defer:` or `deferred`, nothing changes. See [Tool Search]({% link _core_features/tool-search.md %}).
description: Keep large tool catalogs out of Claude's prompt prefix. Mark tools as deferred and let Anthropic's server-side tool-search primitive load them on demand.
redirect_from:
  - /guides/tool-search
---

# {{ page.title }}
{: .d-inline-block .no_toc }

New in 1.15
{: .label .label-green }

{{ page.description }}
{: .fs-6 .fw-300 }

## Table of contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

After reading this guide, you will know:

* When deferred tool loading helps.
* How to mark tools as deferred.
* How Anthropic loads deferred tools at runtime.
* How to observe which tools the model loaded.

## When to use it

When a `RubyLLM::Chat` is wired to many tools, especially across one or more MCP servers, every tool's full JSON Schema ships in the system-prompt prefix on every turn. Three real costs follow:

1. **Token bloat.** Hundreds of tools can add tens of thousands of tokens per request.
2. **Prompt-cache eviction.** Adding or removing tools changes the prefix and invalidates the cache.
3. **Selection accuracy.** Models choose worse tools when the menu is long.
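
To get a rough feel for the first cost, the sketch below builds a catalog of hypothetical tool schemas and estimates how large the prefix becomes. The schema shape, the tool counts, and the 4-characters-per-token ratio are all assumptions chosen for illustration; real numbers depend on your tools and on the model's tokenizer.

```ruby
require "json"

# Hypothetical schema for one tool; real MCP tools are often larger.
def sample_tool_schema(index)
  {
    name: "tool_#{index}",
    description: "Looks up records of type #{index} and returns matching rows as JSON.",
    input_schema: {
      type: "object",
      properties: {
        query: { type: "string", description: "Free-text search query." },
        limit: { type: "integer", description: "Maximum number of rows to return." }
      },
      required: ["query"]
    }
  }
end

# Rough heuristic, not a tokenizer: assume about 4 characters per token.
CHARS_PER_TOKEN = 4

# Serializes `tool_count` sample schemas and converts the size to tokens.
def estimated_prefix_tokens(tool_count)
  schemas = Array.new(tool_count) { |i| sample_tool_schema(i) }
  JSON.generate(schemas).length / CHARS_PER_TOKEN
end

puts "10 tools:  ~#{estimated_prefix_tokens(10)} tokens"
puts "300 tools: ~#{estimated_prefix_tokens(300)} tokens"
```

Because the estimate grows roughly linearly with the number of tools, even small schemas like these add up quickly once a catalog reaches the hundreds.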

RubyLLM's tool search builds on Anthropic's [tool search tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool) feature: mark tools as `deferred` and RubyLLM forwards `defer_loading: true` to Anthropic's API, which hides the schemas from Claude until a server-side BM25 primitive loads the tools the conversation actually needs.

**This feature currently only supports Anthropic.** On other providers, `defer: true` is coerced to regular registration (a warning is logged once).

## Marking tools as deferred

### Per-class DSL

```ruby
class DeepResearchTool < RubyLLM::Tool
  description "Runs a multi-step web search..."
  deferred # class-level DSL

  param :query, desc: "..."
  def execute(query:); ...; end
end
```

### Per-call, for bulk registration (MCP case)

```ruby
chat = RubyLLM.chat(model: "claude-sonnet-4-6")
chat.with_tools(*mcp_client.tools, defer: true)
```

Per-call `defer: true` overrides a non-deferred class; `defer: false` overrides a `deferred` class.
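
The override rule can be sketched as a small resolution function. `effective_deferred?` is a hypothetical helper written for this illustration, not part of RubyLLM, and it assumes that omitting `defer:` means "use the class-level setting".

```ruby
# Hypothetical helper, not RubyLLM's internals: resolves the effective
# deferred flag from a class-level default and an optional per-call override.
def effective_deferred?(class_deferred:, defer: nil)
  # nil is assumed to mean "no per-call override", so the class default wins.
  defer.nil? ? class_deferred : defer
end

puts effective_deferred?(class_deferred: false)              # no override, class default
puts effective_deferred?(class_deferred: false, defer: true) # per-call forces deferral
puts effective_deferred?(class_deferred: true, defer: false) # per-call forces eager loading
puts effective_deferred?(class_deferred: true)               # no override, class default
```

Treating `nil` as "not specified" keeps an explicit `defer: false` distinguishable from an omitted argument, which the override rule needs.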

## How Claude loads deferred tools

On Anthropic, `defer: true` translates to two things in the request payload:

1. `defer_loading: true` on each deferred tool's function entry.
2. A `tool_search_tool_bm25_20251119` primitive appended to the tools array.
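
As a rough illustration of those two payload changes, the sketch below assembles a tools array from plain Ruby data. `ToolSpec` and `build_tools_payload` are hypothetical names, the `name` value on the search entry is an assumption, and so is the choice to append the primitive only when at least one tool is deferred; RubyLLM's actual serialization may differ.

```ruby
require "json"

# Hypothetical in-memory tool description; not a RubyLLM class.
ToolSpec = Struct.new(:name, :description, :input_schema, :deferred, keyword_init: true)

# Assumed entry for the search primitive; the `name` value is an assumption.
TOOL_SEARCH_ENTRY = { type: "tool_search_tool_bm25_20251119", name: "tool_search_tool_bm25" }.freeze

def build_tools_payload(specs)
  entries = specs.map do |spec|
    entry = { name: spec.name, description: spec.description, input_schema: spec.input_schema }
    entry[:defer_loading] = true if spec.deferred # flag only the deferred tools
    entry
  end
  # Assumption: the primitive is only needed when something is deferred.
  entries << TOOL_SEARCH_ENTRY if specs.any?(&:deferred)
  entries
end

specs = [
  ToolSpec.new(name: "get_time", description: "Returns the current time.",
               input_schema: { type: "object", properties: {} }, deferred: false),
  ToolSpec.new(name: "deep_research", description: "Runs a multi-step web search.",
               input_schema: { type: "object", properties: { query: { type: "string" } } },
               deferred: true)
]

puts JSON.pretty_generate(build_tools_payload(specs))
```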

Claude then runs the search server-side, loads the matching tools via a `tool_reference` mechanism, and calls them directly. RubyLLM parses the `tool_search_tool_result` blocks and moves the referenced tools from `chat.tool_catalog.deferred_tools` into the active `chat.tools` so the next turn can dispatch them normally.
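
The promotion step described above can be sketched with plain Ruby objects. `SketchCatalog` and `referenced_tool_names` are hypothetical stand-ins, not RubyLLM classes, and the response block layout (a `content` hash holding a `tool_references` array) is an assumption modeled on Anthropic's tool search documentation.

```ruby
require "set"

# Hypothetical stand-in for the catalog described above; not RubyLLM's class.
class SketchCatalog
  attr_reader :deferred_tools, :active_tools, :loaded_tools

  def initialize(deferred_tools)
    @deferred_tools = deferred_tools.dup # name (Symbol) => tool object
    @active_tools = {}
    @loaded_tools = Set.new
  end

  # Moves each referenced tool out of the deferred set and returns the
  # names that were actually promoted. Unknown names are skipped.
  def promote(names)
    names.select do |name|
      tool = @deferred_tools.delete(name)
      next false unless tool

      @active_tools[name] = tool
      @loaded_tools << name
      true
    end
  end
end

# Extracts tool names from response content blocks (assumed layout).
def referenced_tool_names(blocks)
  blocks
    .select { |block| block["type"] == "tool_search_tool_result" }
    .flat_map { |block| block.dig("content", "tool_references") || [] }
    .select { |ref| ref["type"] == "tool_reference" }
    .map { |ref| ref["tool_name"].to_sym }
end

catalog = SketchCatalog.new({ get_weather: :weather_tool, search_docs: :docs_tool })
blocks = [
  { "type" => "text", "text" => "Let me look that up." },
  { "type" => "tool_search_tool_result",
    "content" => { "type" => "tool_search_tool_search_result",
                   "tool_references" => [{ "type" => "tool_reference", "tool_name" => "get_weather" }] } }
]

promoted = catalog.promote(referenced_tool_names(blocks))
p promoted
p catalog.active_tools.keys
p catalog.deferred_tools.keys
```

Returning only the names that were actually moved gives a callback such as `on_tool_search` something concrete to report.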

## Observing what was loaded

```ruby
chat.on_tool_search do |event|
  # event.query   # nil for Anthropic-native: Claude runs the search server-side
  # event.results # Array of promoted tool name Symbols
end

chat.tool_catalog.deferred_tools # Hash of deferred tool name => Tool
chat.tool_catalog.loaded_tools   # Set of promoted tool name symbols
```

## Kill switch

```ruby
RubyLLM.configure do |c|
  c.tool_search_enabled = false # default true
end
```

When false, `defer: true` is coerced to regular registration and a warning is logged once per chat.

## Non-Anthropic providers

On OpenAI, Gemini, and Bedrock, `defer: true` is ignored and a warning is logged once; the tool registers normally. A follow-up release may add client-side emulation for these providers.
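
The fallback shared by the kill switch and unsupported providers can be sketched as follows. `SketchRegistrar` is a hypothetical class written for this illustration, and the warning text and the once-per-instance scope are assumptions rather than RubyLLM's actual behavior.

```ruby
require "logger"
require "stringio"

# Hypothetical registrar illustrating the fallback; not RubyLLM's implementation.
class SketchRegistrar
  attr_reader :active, :deferred

  def initialize(provider:, tool_search_enabled: true, logger: Logger.new($stderr))
    @provider = provider
    @tool_search_enabled = tool_search_enabled
    @logger = logger
    @warned = false
    @active = []
    @deferred = []
  end

  def register(name, defer: false)
    if defer && !defer_supported?
      warn_once
      defer = false # coerce to regular registration
    end
    (defer ? @deferred : @active) << name
    self
  end

  private

  # Deferral needs both the global switch and an Anthropic provider.
  def defer_supported?
    @tool_search_enabled && @provider == :anthropic
  end

  def warn_once
    return if @warned

    @logger.warn("defer: true is not supported here; registering tools normally")
    @warned = true
  end
end

log = StringIO.new
registrar = SketchRegistrar.new(provider: :openai, logger: Logger.new(log))
registrar.register(:get_weather, defer: true).register(:search_docs, defer: true)

p registrar.active
p registrar.deferred
puts log.string.lines.count
```

Both tools end up in the active list, and the second `defer: true` call does not log a second warning.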
docs/_core_features/tools.md
+2 lines changed: 2 additions & 0 deletions
@@ -509,6 +509,8 @@ end

For MCP server integration, check out the community-maintained [`ruby_llm-mcp`](https://github.com/patvice/ruby_llm-mcp) gem.

When a chat is wired to many tools, especially across MCP servers, see [Tool Search]({% link _core_features/tool-search.md %}) for how to defer tool schemas and let the model load only the ones it needs.

## Debugging Tools

Set the `RUBYLLM_DEBUG` environment variable to see detailed logging, including tool calls and results.