10 changes: 10 additions & 0 deletions docs/_advanced/upgrading.md
@@ -21,6 +21,16 @@ redirect_from:
{:toc}

---
# Upgrade to 1.15

## How to Upgrade

1.15 adds tool search in a fully additive way. No generator, no migration — upgrade the gem and continue using RubyLLM as before.

## What's New in 1.15

- **Tool Search (Anthropic)** — `RubyLLM::Chat#with_tool` / `#with_tools` accept a new `defer:` keyword argument, and `RubyLLM::Tool` exposes a class-level `deferred` DSL. On Anthropic this translates to the native `defer_loading: true` flag plus the `tool_search_tool_bm25_20251119` primitive: deferred tools stay out of the system-prompt prefix and Claude loads the ones it actually needs server-side. On other providers `defer:` is ignored with a one-time warning. If you don't use `defer:` or `deferred`, nothing changes. See [Tool Search]({% link _core_features/tool-search.md %}).

# Upgrade to 1.14

## How to Upgrade
113 changes: 113 additions & 0 deletions docs/_core_features/tool-search.md
@@ -0,0 +1,113 @@
---
layout: default
title: Tool Search
nav_order: 9
description: Keep large tool catalogs out of Claude's prompt prefix. Mark tools as deferred and let Anthropic's server-side tool-search primitive load them on demand.
redirect_from:
- /guides/tool-search
---

# {{ page.title }}
{: .d-inline-block .no_toc }

New in 1.15
{: .label .label-green }

{{ page.description }}
{: .fs-6 .fw-300 }

## Table of contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

After reading this guide, you will know:

* When deferred tool loading helps.
* How to mark tools as deferred.
* How Anthropic loads deferred tools at runtime.
* How to observe which tools the model loaded.

## When to use it

When a `RubyLLM::Chat` is wired to many tools — especially across one or more MCP servers — every tool's full JSON Schema ships in the system-prompt prefix on every turn. Three real costs follow:

1. **Token bloat.** Hundreds of tools can add tens of thousands of tokens per request.
2. **Prompt-cache eviction.** Adding or removing tools changes the prefix and invalidates the cache.
3. **Selection accuracy.** Models choose worse tools when the menu is long.
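
To get a feel for the first cost, the prefix size can be approximated from the serialized schemas. The sketch below is only a rough estimate: the tool definitions are made up, and the 4-characters-per-token ratio is a common rule of thumb rather than Claude's actual tokenizer.

```ruby
require "json"

# Hypothetical tool definition, standing in for a real tool or MCP schema.
def sample_tool(index)
  {
    name: "tool_#{index}",
    description: "Example tool number #{index} that looks up a record by id.",
    input_schema: {
      type: "object",
      properties: { id: { type: "string", description: "Record identifier" } },
      required: ["id"]
    }
  }
end

# Rough heuristic: about 4 characters per token. Real tokenizers differ.
def estimated_tokens(tools, chars_per_token: 4)
  (JSON.generate(tools).length / chars_per_token.to_f).ceil
end

catalog = Array.new(200) { |i| sample_tool(i) }
puts "#{catalog.size} tools is roughly #{estimated_tokens(catalog)} tokens per request"
```

Real MCP schemas are usually much larger than this toy example, so treat the result as a lower bound.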

Tool search builds on Anthropic's [tool search tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool) feature. Mark tools as `deferred` and RubyLLM forwards `defer_loading: true` to Anthropic's API, which hides their schemas from Claude until a server-side BM25 search primitive loads the tools the conversation actually needs.

**This feature currently supports only Anthropic.** On other providers, `defer: true` falls back to regular registration and a warning is logged once per chat.

## Marking tools as deferred

### Per-class DSL

```ruby
class DeepResearchTool < RubyLLM::Tool
  description "Runs a multi-step web search..."
  deferred # class-level DSL: instances are deferred by default

  param :query, desc: "..."

  def execute(query:)
    # ...
  end
end
```

### Per-call, for bulk registration (MCP case)

```ruby
chat = RubyLLM.chat(model: "claude-sonnet-4-6")
chat.with_tools(*mcp_client.tools, defer: true)
```

Per-call `defer: true` overrides a non-deferred class, and `defer: false` overrides a `deferred` class. When `defer:` is omitted, the class-level setting applies.
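
The precedence can be written as a single expression. This is a standalone sketch of the rule, with a hypothetical method name, not the library's own helper:

```ruby
# An explicit defer: argument wins; nil falls back to the class-level setting.
def resolve_deferred(class_default:, explicit: nil)
  explicit.nil? ? class_default : explicit == true
end

puts resolve_deferred(class_default: true)                   # class marked `deferred`, no override
puts resolve_deferred(class_default: true, explicit: false)  # per-call opt-out
puts resolve_deferred(class_default: false, explicit: true)  # per-call opt-in
```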

## How Claude loads deferred tools

On Anthropic, `defer: true` translates to two things in the request payload:

1. `defer_loading: true` on each deferred tool's function entry.
2. A `tool_search_tool_bm25_20251119` primitive appended to the tools array.

Claude then runs the search server-side, loads the matching tools via a `tool_reference` mechanism, and calls them directly. RubyLLM parses the `tool_search_tool_result` blocks and moves the referenced tools from `chat.tool_catalog.deferred_tools` into the active `chat.tools`, so tool calls in the same response and in later turns dispatch normally.
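
As a rough illustration of both directions, the sketch below builds a tools array and pulls referenced tool names out of a response. The helper names are hypothetical, and the primitive's `name` field and the shape of the `tool_search_tool_result` block are assumptions based on Anthropic's published examples, so check them against the current API reference before relying on them.

```ruby
# Assumed entry for the search primitive; the name value is an assumption.
SEARCH_PRIMITIVE = { type: "tool_search_tool_bm25_20251119", name: "tool_search_tool_bm25" }.freeze

# entries: array of [tool_hash, deferred_boolean] pairs (hypothetical input shape).
def build_tools_payload(entries)
  tools = entries.map do |tool, deferred|
    deferred ? tool.merge(defer_loading: true) : tool
  end
  # Only append the search primitive when at least one tool is deferred.
  tools << SEARCH_PRIMITIVE if entries.any? { |_tool, deferred| deferred }
  tools
end

# Collects tool names from tool_search_tool_result blocks (assumed block shape).
def referenced_tool_names(content_blocks)
  content_blocks
    .select { |block| block["type"] == "tool_search_tool_result" }
    .flat_map { |block| Array(block.dig("content", "tool_references")) }
    .map { |ref| ref["tool_name"] }
    .compact
end

payload = build_tools_payload([
  [{ name: "get_time", description: "Returns the current time" }, false],
  [{ name: "deep_research", description: "Runs a multi-step web search" }, true]
])
puts payload.map { |tool| tool[:name] }.inspect

blocks = [
  { "type" => "text", "text" => "Searching for a suitable tool." },
  { "type" => "tool_search_tool_result",
    "content" => { "tool_references" => [{ "type" => "tool_reference", "tool_name" => "deep_research" }] } }
]
puts referenced_tool_names(blocks).inspect
```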

## Observing what was loaded

```ruby
chat.on_tool_search do |event|
  # event.query   # nil for Anthropic-native: Claude runs the search server-side
  # event.results # Array of promoted tool name Symbols
  Rails.logger.info("tool_search loaded: #{event.results}")
end
```

Inspect state:

```ruby
chat.tool_catalog # => #<RubyLLM::ToolCatalog deferred=42 loaded=3>
chat.tool_catalog.deferred_tools # Hash of deferred tool name => Tool
chat.tool_catalog.loaded_tools # Set of promoted tool name symbols
```
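
The catalog's bookkeeping can be pictured with a small stand-in class. This is a hypothetical sketch built only from the method names used in this guide and in `chat.rb` (`add`, `promote`, `available`, `empty?`); the real `RubyLLM::ToolCatalog` may differ in its details.

```ruby
require "set"

# Minimal stand-in for a tool; real tools respond to #name as well.
SketchTool = Struct.new(:name)

# Hypothetical catalog: deferred tools wait here until they are promoted.
class SketchToolCatalog
  attr_reader :deferred_tools, :loaded_tools

  def initialize
    @deferred_tools = {}       # tool name Symbol => tool
    @loaded_tools = Set.new    # names of tools that were promoted
  end

  def add(tool)
    @deferred_tools[tool.name.to_sym] = tool
  end

  # Removes the tool from the deferred set and records it as loaded.
  # Returns the tool, or nil when the name is unknown.
  def promote(name)
    key = name.to_sym
    tool = @deferred_tools.delete(key)
    @loaded_tools << key if tool
    tool
  end

  def available
    @deferred_tools
  end

  def empty?
    @deferred_tools.empty?
  end
end

catalog = SketchToolCatalog.new
catalog.add(SketchTool.new("deep_research"))
catalog.add(SketchTool.new("get_weather"))
catalog.promote("get_weather")
puts "deferred: #{catalog.deferred_tools.keys.inspect}"
puts "loaded: #{catalog.loaded_tools.to_a.inspect}"
```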

## Kill switch

```ruby
RubyLLM.configure do |c|
  c.tool_search_enabled = false # default true
end
```

When false, `defer: true` is coerced to regular registration and a warning is logged once per chat.
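
The "logged once" behaviour amounts to remembering which reasons have already been reported. A minimal sketch of that idea, using a hypothetical class and message format rather than RubyLLM's internals:

```ruby
require "logger"
require "set"
require "stringio"

# Hypothetical helper: emits each distinct warning reason at most once.
class DeferWarnings
  def initialize(logger)
    @logger = logger
    @seen = Set.new
  end

  # Returns true when the warning was logged, false when it was a repeat.
  def warn_once(reason)
    return false unless @seen.add?(reason) # Set#add? returns nil for duplicates

    @logger.warn("Ignoring defer: true (#{reason})")
    true
  end
end

output = StringIO.new
warnings = DeferWarnings.new(Logger.new(output))
3.times { warnings.warn_once("tool_search_enabled is false") }
puts output.string
```

Keeping one such object per chat gives one warning per chat for each reason, however many tools are registered.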

## Non-Anthropic providers

On OpenAI, Gemini, and Bedrock, `defer: true` is ignored and a warning is logged once per chat; the tool registers normally. A follow-up release may add client-side emulation for these providers.
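
The fallback can be thought of as a capability check on the provider. The sketch below uses made-up provider structs and a hypothetical `register` helper to show the routing; only the `supports_deferred_loading?` method name comes from this change.

```ruby
# Made-up providers: one advertises deferred loading, the other does not define it.
AnthropicLike = Struct.new(:slug) do
  def supports_deferred_loading?
    true
  end
end
OtherProvider = Struct.new(:slug)

def supports_deferral?(provider)
  provider.respond_to?(:supports_deferred_loading?) && provider.supports_deferred_loading?
end

# Routes a tool name to the deferred list only when the provider supports it.
def register(tool_name, provider:, defer:, active:, deferred:)
  target = defer && supports_deferral?(provider) ? deferred : active
  target << tool_name
  target.equal?(deferred) ? :deferred : :active
end

active = []
deferred = []
puts register(:deep_research, provider: AnthropicLike.new("anthropic"), defer: true, active: active, deferred: deferred)
puts register(:deep_research, provider: OtherProvider.new("openai"), defer: true, active: active, deferred: deferred)
```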

## Further reading

* [Anthropic tool search tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool)
* [Tools guide]({% link _core_features/tools.md %})
2 changes: 2 additions & 0 deletions docs/_core_features/tools.md
@@ -509,6 +509,8 @@ end

For MCP server integration, check out the community-maintained [`ruby_llm-mcp`](https://github.com/patvice/ruby_llm-mcp) gem.

When a chat is wired to many tools — especially across MCP servers — see [Tool Search]({% link _core_features/tool-search.md %}) for how to defer tool schemas and let the model load only the ones it needs.

## Debugging Tools

Set the `RUBYLLM_DEBUG` environment variable to see detailed logging, including tool calls and results.
91 changes: 80 additions & 11 deletions lib/ruby_llm/chat.rb
@@ -5,7 +5,7 @@ module RubyLLM
class Chat
include Enumerable

attr_reader :model, :messages, :tools, :tool_prefs, :params, :headers, :schema
attr_reader :model, :messages, :tools, :tool_prefs, :params, :headers, :schema, :tool_catalog

def initialize(model: nil, provider: nil, assume_model_exists: false, context: nil)
if assume_model_exists && !provider
@@ -19,6 +19,7 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
@temperature = nil
@messages = []
@tools = {}
@tool_catalog = ToolCatalog.new
@tool_prefs = { choice: nil, calls: nil }
@params = {}
@headers = {}
@@ -28,7 +29,8 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
new_message: nil,
end_message: nil,
tool_call: nil,
tool_result: nil
tool_result: nil,
tool_search: nil
}
end

@@ -51,18 +53,18 @@ def with_instructions(instructions, append: false, replace: nil)
self
end

def with_tool(tool, choice: nil, calls: nil)
unless tool.nil?
tool_instance = tool.is_a?(Class) ? tool.new : tool
@tools[tool_instance.name.to_sym] = tool_instance
end
def with_tool(tool, defer: nil, choice: nil, calls: nil)
register_tool(tool, defer: defer) unless tool.nil?
update_tool_options(choice:, calls:)
self
end

def with_tools(*tools, replace: false, choice: nil, calls: nil)
@tools.clear if replace
tools.compact.each { |tool| with_tool tool }
def with_tools(*tools, replace: false, defer: nil, choice: nil, calls: nil)
if replace
@tools.clear
@tool_catalog = ToolCatalog.new
end
tools.compact.each { |tool| with_tool tool, defer: defer }
update_tool_options(choice:, calls:)
self
end
@@ -132,14 +134,19 @@ def on_tool_result(&block)
self
end

def on_tool_search(&block)
@on[:tool_search] = block
self
end

def each(&)
messages.each(&)
end

def complete(&) # rubocop:disable Metrics/PerceivedComplexity
response = @provider.complete(
messages,
tools: @tools,
tools: effective_tools,
tool_prefs: @tool_prefs,
temperature: @temperature,
model: @model,
@@ -161,6 +168,7 @@ def complete(&) # rubocop:disable Metrics/PerceivedComplexity
end

add_message response
promote_from_tool_references(response)
@on[:end_message]&.call(response)

if response.tool_call?
@@ -186,6 +194,25 @@ def instance_variables

private

# Promotes deferred tools that a provider's native tool-search primitive
# loaded via +message.tool_references+. The resulting +SearchEvent+
# carries +query: nil+ to signal the native path.
def promote_from_tool_references(message)
names = Array(message.tool_references)
return self if names.empty? || @tool_catalog.empty?

promoted = names.filter_map do |name|
tool = @tool_catalog.promote(name)
next unless tool

@tools[tool.name.to_sym] = tool
tool.name.to_sym
end

@on[:tool_search]&.call(Tool::SearchEvent.new(nil, promoted)) unless promoted.empty?
self
end

def normalize_schema_payload(raw_schema)
return nil if raw_schema.nil?
return raw_schema unless raw_schema.is_a?(Hash)
@@ -329,6 +356,48 @@ def content_like?(object)
object.is_a?(Content) || object.is_a?(Content::Raw)
end

def effective_tools
active = @tools.transform_values { |t| Tool::Registration.new(t, deferred: false) }
return active if @tool_catalog.empty?

deferred = @tool_catalog.available.transform_values { |t| Tool::Registration.new(t, deferred: true) }
deferred.merge(active)
end

def register_tool(tool, defer:)
tool_instance = tool.is_a?(Class) ? tool.new : tool

if defer_allowed?(tool_instance, defer)
@tool_catalog.add(tool_instance)
else
@tools[tool_instance.name.to_sym] = tool_instance
end
end

def defer_allowed?(tool, explicit)
return false unless explicit.nil? ? tool.deferred? : explicit == true

unless @config.tool_search_enabled
warn_deferred_ignored('tool_search_enabled is false')
return false
end

unless @provider.respond_to?(:supports_deferred_loading?) && @provider.supports_deferred_loading?
warn_deferred_ignored("provider #{@provider.slug} does not support deferred tool loading")
return false
end

true
end

def warn_deferred_ignored(reason)
@deferred_warnings ||= Set.new
return if @deferred_warnings.include?(reason)

@deferred_warnings << reason
RubyLLM.logger.warn("Ignoring defer: true — #{reason}")
end

def append_system_instruction(instructions)
system_messages, non_system_messages = @messages.partition { |msg| msg.role == :system }
system_messages << Message.new(role: :system, content: instructions)
2 changes: 2 additions & 0 deletions lib/ruby_llm/configuration.rb
@@ -56,6 +56,8 @@ def defaults = @defaults ||= {}
option :log_stream_debug, -> { ENV['RUBYLLM_STREAM_DEBUG'] == 'true' }
option :log_regexp_timeout, -> { Regexp.respond_to?(:timeout) ? (Regexp.timeout || 1.0) : nil }

option :tool_search_enabled, true

def initialize
self.class.send(:defaults).each do |key, default|
value = default.respond_to?(:call) ? instance_exec(&default) : default
3 changes: 2 additions & 1 deletion lib/ruby_llm/message.rb
@@ -5,7 +5,7 @@ module RubyLLM
class Message
ROLES = %i[system user assistant tool].freeze

attr_reader :role, :model_id, :tool_calls, :tool_call_id, :raw, :thinking, :tokens
attr_reader :role, :model_id, :tool_calls, :tool_call_id, :raw, :thinking, :tokens, :tool_references
attr_writer :content

def initialize(options = {})
@@ -24,6 +24,7 @@ def initialize(options = {})
)
@raw = options[:raw]
@thinking = options[:thinking]
@tool_references = Array(options[:tool_references])

ensure_valid_role
end
4 changes: 4 additions & 0 deletions lib/ruby_llm/providers/anthropic.rb
@@ -22,6 +22,10 @@ def headers
}
end

def supports_deferred_loading?
true
end

class << self
def capabilities
Anthropic::Capabilities
9 changes: 6 additions & 3 deletions lib/ruby_llm/providers/anthropic/chat.rb
@@ -65,7 +65,7 @@ def build_base_payload(chat_messages, model, stream, thinking)

def add_optional_fields(payload, system_content:, tools:, tool_prefs:, temperature:, schema: nil) # rubocop:disable Metrics/ParameterLists
if tools.any?
payload[:tools] = tools.values.map { |t| Tools.function_for(t) }
payload[:tools] = Tools.format_tools(tools)
unless tool_prefs[:choice].nil? && tool_prefs[:calls].nil?
payload[:tool_choice] = Tools.build_tool_choice(tool_prefs)
end
@@ -90,8 +90,10 @@ def parse_completion_response(response)
thinking_content = extract_thinking_content(content_blocks)
thinking_signature = extract_thinking_signature(content_blocks)
tool_use_blocks = Tools.find_tool_uses(content_blocks)
tool_references = Tools.find_tool_references(content_blocks)

build_message(data, text_content, thinking_content, thinking_signature, tool_use_blocks, response)
build_message(data, text_content, thinking_content, thinking_signature, tool_use_blocks, tool_references,
response)
end

def extract_text_content(blocks)
@@ -111,7 +113,7 @@ def extract_thinking_signature(blocks)
thinking_block&.dig('signature') || thinking_block&.dig('data')
end

def build_message(data, content, thinking, thinking_signature, tool_use_blocks, response) # rubocop:disable Metrics/ParameterLists
def build_message(data, content, thinking, thinking_signature, tool_use_blocks, tool_references, response) # rubocop:disable Metrics/ParameterLists
usage = data['usage'] || {}
cached_tokens = usage['cache_read_input_tokens']
cache_creation_tokens = usage['cache_creation_input_tokens']
@@ -128,6 +130,7 @@ def build_message(data, content, thinking, thinking_signature, tool_use_blocks,
content: content,
thinking: Thinking.build(text: thinking, signature: thinking_signature),
tool_calls: Tools.parse_tool_calls(tool_use_blocks),
tool_references: tool_references,
input_tokens: usage['input_tokens'],
output_tokens: usage['output_tokens'],
cached_tokens: cached_tokens,