Commit 635503b

swistaczek and claude committed
Translate Anthropic tool search feature (defer_loading + native BM25)
Mark tools as deferred, via the class-level `deferred` DSL or the per-call `defer: true` kwarg, and RubyLLM forwards `defer_loading: true` per tool plus the `tool_search_tool_bm25_20251119` primitive to the Anthropic API.

Claude's server-side search loads the tools it actually needs via `tool_reference` blocks; the new parser promotes those tools from `chat.tool_catalog` into `chat.tools` so the normal dispatch path can call them on the next turn. `chat.on_tool_search` exposes which tools were loaded.

Non-Anthropic providers log a one-time warning and treat `defer:` as a regular registration; `RubyLLM.config.tool_search_enabled = false` is a global kill switch with the same behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4371a1b commit 635503b

23 files changed

Lines changed: 1184 additions & 20 deletions

docs/_advanced/upgrading.md

Lines changed: 10 additions & 0 deletions
@@ -21,6 +21,16 @@ redirect_from:
2121
{:toc}
2222

2323
---
24+
# Upgrade to 1.15
25+
26+
## How to Upgrade
27+
28+
1.15 adds tool search in a fully additive way. No generator, no migration — upgrade the gem and continue using RubyLLM as before.
29+
30+
## What's New in 1.15
31+
32+
- **Tool Search (Anthropic)**: `RubyLLM::Chat#with_tool` / `#with_tools` accept a new `defer:` keyword argument, and `RubyLLM::Tool` exposes a class-level `deferred` DSL. On Anthropic this translates to the native `defer_loading: true` flag plus the `tool_search_tool_bm25_20251119` primitive: deferred tools stay out of the system-prompt prefix and Claude loads the ones it actually needs server-side. On other providers `defer:` is ignored with a one-time warning. If you don't use `defer:` or `deferred`, nothing changes. See [Tool Search]({% link _core_features/tool-search.md %}).
33+
2434
# Upgrade to 1.14
2535

2636
## How to Upgrade

docs/_core_features/tool-search.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
1+
---
2+
layout: default
3+
title: Tool Search
4+
nav_order: 9
5+
description: Keep large tool catalogs out of Claude's prompt prefix. Mark tools as deferred and let Anthropic's server-side tool-search primitive load them on demand.
6+
redirect_from:
7+
- /guides/tool-search
8+
---
9+
10+
# {{ page.title }}
11+
{: .d-inline-block .no_toc }
12+
13+
New in 1.15
14+
{: .label .label-green }
15+
16+
{{ page.description }}
17+
{: .fs-6 .fw-300 }
18+
19+
## Table of contents
20+
{: .no_toc .text-delta }
21+
22+
1. TOC
23+
{:toc}
24+
25+
---
26+
27+
After reading this guide, you will know:
28+
29+
* When deferred tool loading helps.
30+
* How to mark tools as deferred.
31+
* How Anthropic loads deferred tools at runtime.
32+
* How to observe which tools the model loaded.
33+
34+
## When to use it
35+
36+
When a `RubyLLM::Chat` is wired to many tools — especially across one or more MCP servers — every tool's full JSON Schema ships in the system-prompt prefix on every turn. Three real costs follow:
37+
38+
1. **Token bloat.** Hundreds of tools can add tens of thousands of tokens per request.
39+
2. **Prompt-cache eviction.** Adding or removing tools changes the prefix and invalidates the cache.
40+
3. **Selection accuracy.** Models choose worse tools when the menu is long.
41+
42+
RubyLLM's tool search maps directly onto Anthropic's [tool search tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool): mark tools as `deferred` and RubyLLM forwards `defer_loading: true` to Anthropic's API, which hides their schemas from Claude until a server-side BM25 search loads the tools the conversation actually needs.
43+
44+
**This feature currently supports only Anthropic.** On other providers, `defer: true` falls back to regular registration and a warning is logged once per chat.
45+
46+
## Marking tools as deferred
47+
48+
### Per-class DSL
49+
50+
```ruby
51+
class DeepResearchTool < RubyLLM::Tool
52+
description "Runs a multi-step web search..."
53+
deferred # class-level DSL
54+
55+
param :query, desc: "..."
56+
def execute(query:); ...; end
57+
end
58+
```
59+
60+
### Per-call, for bulk registration (MCP case)
61+
62+
```ruby
63+
chat = RubyLLM.chat(model: "claude-sonnet-4-6")
64+
chat.with_tools(*mcp_client.tools, defer: true)
65+
```
66+
67+
Per-call `defer: true` overrides a non-deferred class; `defer: false` overrides a `deferred` class.
68+
69+
## How Claude loads deferred tools
70+
71+
On Anthropic, `defer: true` translates to two things in the request payload:
72+
73+
1. `defer_loading: true` on each deferred tool's function entry.
74+
2. A `tool_search_tool_bm25_20251119` primitive appended to the tools array.
75+
76+
Claude then runs the search server-side, loads the matching tools via a `tool_reference` mechanism, and calls them directly. RubyLLM parses the `tool_search_tool_result` blocks and moves the referenced tools from `chat.tool_catalog.deferred_tools` into the active `chat.tools` so the next turn can dispatch them normally.
77+
78+
## Observing what was loaded
79+
80+
```ruby
81+
chat.on_tool_search do |event|
82+
# event.query # nil for Anthropic-native — Claude runs the search server-side
83+
# event.results # Array of promoted tool name Symbols
84+
Rails.logger.info("tool_search loaded: #{event.results}")
85+
end
86+
```
87+
88+
Inspect state:
89+
90+
```ruby
91+
chat.tool_catalog # => #<RubyLLM::ToolCatalog deferred=42 loaded=3>
92+
chat.tool_catalog.deferred_tools # Hash of deferred tool name => Tool
93+
chat.tool_catalog.loaded_tools # Set of promoted tool name symbols
94+
```
95+
96+
## Kill switch
97+
98+
```ruby
99+
RubyLLM.configure do |c|
100+
c.tool_search_enabled = false # default true
101+
end
102+
```
103+
104+
When false, `defer: true` is coerced to regular registration and a warning is logged once per chat.
105+
106+
## Non-Anthropic providers
107+
108+
On OpenAI, Gemini, and Bedrock, `defer: true` is ignored and the tool registers normally; a warning is logged once per chat. A follow-up release may add client-side emulation for these providers.
109+
110+
## Further reading
111+
112+
* [Anthropic tool search tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool)
113+
* [Tools guide]({% link _core_features/tools.md %})
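
Editor's note on the guide above: the precedence between the class-level `deferred` DSL, the per-call `defer:` kwarg, the kill switch, and provider support can be summarized in one small function. The sketch below is a standalone illustration, not RubyLLM's API; the method name `sketch_defer_allowed?` and its keyword arguments are hypothetical, chosen to mirror the rules stated in the guide.

```ruby
# Hypothetical standalone helper mirroring the documented rules.
# class_deferred:    whether the tool class declared `deferred`
# explicit:          the per-call `defer:` kwarg (nil when omitted)
# enabled:           stands in for RubyLLM.config.tool_search_enabled
# provider_supports: stands in for "the provider is Anthropic"
def sketch_defer_allowed?(class_deferred:, explicit:, enabled: true, provider_supports: true)
  # An explicit kwarg wins over the class-level setting; nil falls back to the class.
  wants_defer = explicit.nil? ? class_deferred : explicit == true
  return false unless wants_defer

  # Deferral only happens when the kill switch is on and the provider supports it.
  enabled && provider_supports
end

cases = [
  { class_deferred: true,  explicit: nil },
  { class_deferred: false, explicit: true },
  { class_deferred: true,  explicit: false },
  { class_deferred: true,  explicit: nil, enabled: false }
]
cases.each { |c| puts "#{c} => #{sketch_defer_allowed?(**c)}" }
```

In the cases where the function returns `false` despite a deferral request, the guide says the tool is registered normally and a warning is logged.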

docs/_core_features/tools.md

Lines changed: 2 additions & 0 deletions
@@ -509,6 +509,8 @@ end
509509

510510
For MCP server integration, check out the community-maintained [`ruby_llm-mcp`](https://github.com/patvice/ruby_llm-mcp) gem.
511511

512+
When a chat is wired to many tools — especially across MCP servers — see [Tool Search]({% link _core_features/tool-search.md %}) for how to defer tool schemas and let the model load only the ones it needs.
513+
512514
## Debugging Tools
513515

514516
Set the `RUBYLLM_DEBUG` environment variable to see detailed logging, including tool calls and results.

lib/ruby_llm/chat.rb

Lines changed: 80 additions & 11 deletions
@@ -5,7 +5,7 @@ module RubyLLM
55
class Chat
66
include Enumerable
77

8-
attr_reader :model, :messages, :tools, :tool_prefs, :params, :headers, :schema
8+
attr_reader :model, :messages, :tools, :tool_prefs, :params, :headers, :schema, :tool_catalog
99

1010
def initialize(model: nil, provider: nil, assume_model_exists: false, context: nil)
1111
if assume_model_exists && !provider
@@ -19,6 +19,7 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
1919
@temperature = nil
2020
@messages = []
2121
@tools = {}
22+
@tool_catalog = ToolCatalog.new
2223
@tool_prefs = { choice: nil, calls: nil }
2324
@params = {}
2425
@headers = {}
@@ -28,7 +29,8 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
2829
new_message: nil,
2930
end_message: nil,
3031
tool_call: nil,
31-
tool_result: nil
32+
tool_result: nil,
33+
tool_search: nil
3234
}
3335
end
3436

@@ -51,18 +53,18 @@ def with_instructions(instructions, append: false, replace: nil)
5153
self
5254
end
5355

54-
def with_tool(tool, choice: nil, calls: nil)
55-
unless tool.nil?
56-
tool_instance = tool.is_a?(Class) ? tool.new : tool
57-
@tools[tool_instance.name.to_sym] = tool_instance
58-
end
56+
def with_tool(tool, defer: nil, choice: nil, calls: nil)
57+
register_tool(tool, defer: defer) unless tool.nil?
5958
update_tool_options(choice:, calls:)
6059
self
6160
end
6261

63-
def with_tools(*tools, replace: false, choice: nil, calls: nil)
64-
@tools.clear if replace
65-
tools.compact.each { |tool| with_tool tool }
62+
def with_tools(*tools, replace: false, defer: nil, choice: nil, calls: nil)
63+
if replace
64+
@tools.clear
65+
@tool_catalog = ToolCatalog.new
66+
end
67+
tools.compact.each { |tool| with_tool tool, defer: defer }
6668
update_tool_options(choice:, calls:)
6769
self
6870
end
@@ -132,14 +134,19 @@ def on_tool_result(&block)
132134
self
133135
end
134136

137+
def on_tool_search(&block)
138+
@on[:tool_search] = block
139+
self
140+
end
141+
135142
def each(&)
136143
messages.each(&)
137144
end
138145

139146
def complete(&) # rubocop:disable Metrics/PerceivedComplexity
140147
response = @provider.complete(
141148
messages,
142-
tools: @tools,
149+
tools: effective_tools,
143150
tool_prefs: @tool_prefs,
144151
temperature: @temperature,
145152
model: @model,
@@ -161,6 +168,7 @@ def complete(&) # rubocop:disable Metrics/PerceivedComplexity
161168
end
162169

163170
add_message response
171+
promote_from_tool_references(response)
164172
@on[:end_message]&.call(response)
165173

166174
if response.tool_call?
@@ -176,6 +184,25 @@ def add_message(message_or_attributes)
176184
message
177185
end
178186

187+
# Promotes deferred tools that a provider's native tool-search primitive
188+
# loaded via +message.tool_references+. The resulting +SearchEvent+
189+
# carries +query: nil+ to signal the native path.
190+
def promote_from_tool_references(message)
191+
names = Array(message.tool_references)
192+
return self if names.empty? || @tool_catalog.empty?
193+
194+
promoted = names.filter_map do |name|
195+
tool = @tool_catalog.promote(name)
196+
next unless tool
197+
198+
@tools[tool.name.to_sym] = tool
199+
tool.name.to_sym
200+
end
201+
202+
@on[:tool_search]&.call(Tool::SearchEvent.new(nil, promoted)) unless promoted.empty?
203+
self
204+
end
205+
179206
def reset_messages!
180207
@messages.clear
181208
end
@@ -329,6 +356,48 @@ def content_like?(object)
329356
object.is_a?(Content) || object.is_a?(Content::Raw)
330357
end
331358

359+
def effective_tools
360+
active = @tools.transform_values { |t| Tool::Registration.new(t, deferred: false) }
361+
return active if @tool_catalog.empty?
362+
363+
deferred = @tool_catalog.available.transform_values { |t| Tool::Registration.new(t, deferred: true) }
364+
deferred.merge(active)
365+
end
366+
367+
def register_tool(tool, defer:)
368+
tool_instance = tool.is_a?(Class) ? tool.new : tool
369+
370+
if defer_allowed?(tool_instance, defer)
371+
@tool_catalog.add(tool_instance)
372+
else
373+
@tools[tool_instance.name.to_sym] = tool_instance
374+
end
375+
end
376+
377+
def defer_allowed?(tool, explicit)
378+
return false unless explicit.nil? ? tool.deferred? : explicit == true
379+
380+
unless @config.tool_search_enabled
381+
warn_deferred_ignored('tool_search_enabled is false')
382+
return false
383+
end
384+
385+
unless @provider.respond_to?(:supports_deferred_loading?) && @provider.supports_deferred_loading?
386+
warn_deferred_ignored("provider #{@provider.slug} does not support deferred tool loading")
387+
return false
388+
end
389+
390+
true
391+
end
392+
393+
def warn_deferred_ignored(reason)
394+
@deferred_warnings ||= Set.new
395+
return if @deferred_warnings.include?(reason)
396+
397+
@deferred_warnings << reason
398+
RubyLLM.logger.warn("Ignoring defer: true — #{reason}")
399+
end
400+
332401
def append_system_instruction(instructions)
333402
system_messages, non_system_messages = @messages.partition { |msg| msg.role == :system }
334403
system_messages << Message.new(role: :system, content: instructions)
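
Editor's note on `promote_from_tool_references`: the promotion loop can be exercised in isolation with stand-in objects. `ToolCatalog` itself is not part of the hunks shown here, so `SketchCatalog` below is a hypothetical stand-in whose behavior (in particular, removing a tool from the deferred set on promotion) is an assumption; only the `filter_map` loop follows the diff.

```ruby
require 'set'

# Hypothetical stand-in for RubyLLM::ToolCatalog; the real class may differ.
class SketchCatalog
  attr_reader :deferred_tools, :loaded_tools

  def initialize
    @deferred_tools = {}
    @loaded_tools = Set.new
  end

  def add(tool)
    @deferred_tools[tool.name.to_sym] = tool
  end

  def empty?
    @deferred_tools.empty?
  end

  # Returns the tool and records it as loaded, or nil for an unknown name.
  def promote(name)
    tool = @deferred_tools.delete(name.to_sym)
    @loaded_tools << name.to_sym if tool
    tool
  end
end

SketchTool = Struct.new(:name)

catalog = SketchCatalog.new
catalog.add(SketchTool.new('weather'))
catalog.add(SketchTool.new('deep_research'))
active = {} # plays the role of @tools

# Names as they would arrive in message.tool_references; one is unknown.
references = %w[weather unknown_tool]

promoted = references.filter_map do |name|
  tool = catalog.promote(name)
  next unless tool # unknown names are skipped, as in the diff

  active[tool.name.to_sym] = tool
  tool.name.to_sym
end

puts "promoted: #{promoted.inspect}"
```

Unknown references are dropped rather than raising, so a stale or misspelled name from the provider does not interrupt the turn.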

lib/ruby_llm/configuration.rb

Lines changed: 2 additions & 0 deletions
@@ -56,6 +56,8 @@ def defaults = @defaults ||= {}
5656
option :log_stream_debug, -> { ENV['RUBYLLM_STREAM_DEBUG'] == 'true' }
5757
option :log_regexp_timeout, -> { Regexp.respond_to?(:timeout) ? (Regexp.timeout || 1.0) : nil }
5858

59+
option :tool_search_enabled, true
60+
5961
def initialize
6062
self.class.send(:defaults).each do |key, default|
6163
value = default.respond_to?(:call) ? instance_exec(&default) : default

lib/ruby_llm/message.rb

Lines changed: 2 additions & 1 deletion
@@ -5,7 +5,7 @@ module RubyLLM
55
class Message
66
ROLES = %i[system user assistant tool].freeze
77

8-
attr_reader :role, :model_id, :tool_calls, :tool_call_id, :raw, :thinking, :tokens
8+
attr_reader :role, :model_id, :tool_calls, :tool_call_id, :raw, :thinking, :tokens, :tool_references
99
attr_writer :content
1010

1111
def initialize(options = {})
@@ -24,6 +24,7 @@ def initialize(options = {})
2424
)
2525
@raw = options[:raw]
2626
@thinking = options[:thinking]
27+
@tool_references = Array(options[:tool_references])
2728

2829
ensure_valid_role
2930
end

lib/ruby_llm/providers/anthropic.rb

Lines changed: 4 additions & 0 deletions
@@ -22,6 +22,10 @@ def headers
2222
}
2323
end
2424

25+
def supports_deferred_loading?
26+
true
27+
end
28+
2529
class << self
2630
def capabilities
2731
Anthropic::Capabilities

lib/ruby_llm/providers/anthropic/chat.rb

Lines changed: 6 additions & 3 deletions
@@ -65,7 +65,7 @@ def build_base_payload(chat_messages, model, stream, thinking)
6565

6666
def add_optional_fields(payload, system_content:, tools:, tool_prefs:, temperature:, schema: nil) # rubocop:disable Metrics/ParameterLists
6767
if tools.any?
68-
payload[:tools] = tools.values.map { |t| Tools.function_for(t) }
68+
payload[:tools] = Tools.format_tools(tools)
6969
unless tool_prefs[:choice].nil? && tool_prefs[:calls].nil?
7070
payload[:tool_choice] = Tools.build_tool_choice(tool_prefs)
7171
end
@@ -90,8 +90,10 @@ def parse_completion_response(response)
9090
thinking_content = extract_thinking_content(content_blocks)
9191
thinking_signature = extract_thinking_signature(content_blocks)
9292
tool_use_blocks = Tools.find_tool_uses(content_blocks)
93+
tool_references = Tools.find_tool_references(content_blocks)
9394

94-
build_message(data, text_content, thinking_content, thinking_signature, tool_use_blocks, response)
95+
build_message(data, text_content, thinking_content, thinking_signature, tool_use_blocks, tool_references,
96+
response)
9597
end
9698

9799
def extract_text_content(blocks)
@@ -111,7 +113,7 @@ def extract_thinking_signature(blocks)
111113
thinking_block&.dig('signature') || thinking_block&.dig('data')
112114
end
113115

114-
def build_message(data, content, thinking, thinking_signature, tool_use_blocks, response) # rubocop:disable Metrics/ParameterLists
116+
def build_message(data, content, thinking, thinking_signature, tool_use_blocks, tool_references, response) # rubocop:disable Metrics/ParameterLists
115117
usage = data['usage'] || {}
116118
cached_tokens = usage['cache_read_input_tokens']
117119
cache_creation_tokens = usage['cache_creation_input_tokens']
@@ -128,6 +130,7 @@ def build_message(data, content, thinking, thinking_signature, tool_use_blocks,
128130
content: content,
129131
thinking: Thinking.build(text: thinking, signature: thinking_signature),
130132
tool_calls: Tools.parse_tool_calls(tool_use_blocks),
133+
tool_references: tool_references,
131134
input_tokens: usage['input_tokens'],
132135
output_tokens: usage['output_tokens'],
133136
cached_tokens: cached_tokens,
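
Editor's note on `Tools.format_tools`: its body is not included in the hunks above, so the following is only a guess at the payload shape, based on the guide's description (`defer_loading: true` on each deferred tool, plus a `tool_search_tool_bm25_20251119` entry appended to the tools array). Every name here (`sketch_format_tools`, `PayloadTool`, `PayloadRegistration`) is hypothetical, and the `name: 'tool_search_tool_bm25'` field is an assumption about Anthropic's request format.

```ruby
require 'json'

# Hypothetical stand-ins for a tool and for Tool::Registration.
PayloadTool = Struct.new(:name, :description, :input_schema)
PayloadRegistration = Struct.new(:tool, :deferred)

# Assumed shape of the server-side search primitive entry.
SEARCH_PRIMITIVE = { type: 'tool_search_tool_bm25_20251119', name: 'tool_search_tool_bm25' }.freeze

# Builds the tools array: one entry per tool, flagged when deferred,
# with the search primitive appended only if something is deferred.
def sketch_format_tools(registrations)
  entries = registrations.values.map do |reg|
    entry = {
      name: reg.tool.name,
      description: reg.tool.description,
      input_schema: reg.tool.input_schema
    }
    entry[:defer_loading] = true if reg.deferred
    entry
  end
  entries << SEARCH_PRIMITIVE if registrations.values.any?(&:deferred)
  entries
end

schema = { type: 'object', properties: { query: { type: 'string' } }, required: ['query'] }
registrations = {
  calculator: PayloadRegistration.new(PayloadTool.new('calculator', 'Adds numbers', schema), false),
  deep_research: PayloadRegistration.new(PayloadTool.new('deep_research', 'Multi-step search', schema), true)
}

payload = sketch_format_tools(registrations)
puts JSON.pretty_generate(payload)
```

Appending the primitive only when at least one tool is deferred keeps the payload unchanged for chats that never use `defer:`, which matches the "fully additive" claim in the upgrade notes.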
