layout

default

title

Chapter 7: Memory, Tool Retrieval, and Third-Party APIs

nav_order

parent

AutoAgent Tutorial

format_version

why

As your AutoAgent deployment accumulates tools, the LLM cannot fit every tool schema in its context window. The ToolMemory + ToolReranker pipeline ensures agents always see the most relevant tools — not all tools. Understanding this system lets you ingest large API catalogs and keep agents fast at scale.

mental_model

ToolMemory is a semantic search index over tool descriptions. When an agent runs, the query (user message) retrieves the top-K most relevant tools from ChromaDB, which are then reranked by a small LLM call. Only the final shortlist is passed to the agent's tool schema — the rest never reach the LLM.

learning_outcomes

Understand how ToolMemory indexes tool descriptions with ChromaDB + text-embedding-3-small

Configure ToolReranker for LLM-based relevance scoring with Pydantic RerankResult

Ingest RapidAPI tool documentation via process_tool_docs.py

Apply the 12,000-token output cap and truncate_output() wrapper correctly

Use CodeMemory and RAGMemory for codebase navigation and document retrieval

snapshot

source_repo	stars	language	license
https://github.com/HKUDS/AutoAgent	9116	Python	MIT

chapter_map

autoagent/tool_memory.py

autoagent/rag_memory.py

autoagent/rag_tools.py

autoagent/code_memory.py

autoagent/tool_retriever.py

autoagent/search_tools.py

autoagent/process_tool_docs.py

sources

https://github.com/HKUDS/AutoAgent

https://arxiv.org/abs/2502.05957

Chapter 7: Memory, Tool Retrieval, and Third-Party APIs

What Problem Does This Solve?

A production AutoAgent deployment might have hundreds of registered tools — scraped from RapidAPI, generated by the Agent Editor, or built by contributors. If every tool schema is included in every LLM call:

Context window overflows (GPT-4o: 128k tokens; each tool schema: ~200-500 tokens)
LLM attention diffuses — too many irrelevant tools confuse tool selection
Inference cost scales linearly with tool count even when most tools are irrelevant

AutoAgent solves this with a two-stage retrieval pipeline: semantic search (ChromaDB) narrows thousands of tools to ~20 candidates, then LLM-based reranking (ToolReranker) picks the top 5-10 for the actual tool schema list.

The Two-Stage Retrieval Pipeline

flowchart TD
    Q["User query\n'recommend a product based on budget'"]

    subgraph "Stage 1: Semantic Search"
        TM["ToolMemory\nChromaDB collection"]
        EMB["text-embedding-3-small\nquery → vector"]
        K20["Top-20 candidates\nby cosine similarity"]
    end

    subgraph "Stage 2: LLM Reranking"
        TR["ToolReranker\nLiteLLM call"]
        RR["RerankResult (Pydantic)\nscores + justifications"]
        K5["Top-5 tools selected"]
    end

    subgraph "Agent Invocation"
        SCHEMA["Tool schemas injected\ninto agent's function list"]
        AGENT["MetaChain.run()"]
    end

    Q --> EMB
    EMB --> TM
    TM --> K20
    K20 --> TR
    TR --> RR
    RR --> K5
    K5 --> SCHEMA
    SCHEMA --> AGENT

ToolMemory (`tool_memory.py`)

Indexing

# autoagent/tool_memory.py

import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

class ToolMemory:
    """Semantic search index over tool descriptions using ChromaDB."""

    def __init__(self, collection_name: str = "autoagent_tools"):
        self.client = chromadb.PersistentClient(path="./workspace/.chroma")
        self.embedding_fn = OpenAIEmbeddingFunction(
            api_key=os.getenv("OPENAI_API_KEY"),
            model_name="text-embedding-3-small",
        )
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            embedding_function=self.embedding_fn,
        )

    def index_tool(self, tool: Callable) -> None:
        """Add a tool to the semantic index."""
        # Create rich description for embedding
        schema = function_to_json(tool)
        description = f"""Tool: {schema['function']['name']}
Description: {schema['function']['description']}
Parameters: {json.dumps(schema['function']['parameters']['properties'], indent=2)}
"""
        self.collection.upsert(
            ids=[tool.__name__],
            documents=[description],
            metadatas=[{"module": tool.__module__, "is_plugin": True}],
        )

    def search(self, query: str, n_results: int = 20) -> list[dict]:
        """Semantic search: return top-n_results tools for the query."""
        results = self.collection.query(
            query_texts=[query],
            n_results=min(n_results, self.collection.count()),
        )

        tools = []
        for i, tool_id in enumerate(results["ids"][0]):
            tools.append({
                "name": tool_id,
                "description": results["documents"][0][i],
                "distance": results["distances"][0][i],
                "metadata": results["metadatas"][0][i],
            })

        return tools  # Sorted by relevance (lower distance = more relevant)

    def index_all_registry_tools(self) -> int:
        """Index all registered plugin tools. Returns count indexed."""
        registry = get_registry()
        count = 0
        for tool_name, tool_func in registry["plugin_tools"].items():
            self.index_tool(tool_func)
            count += 1
        return count

ToolMemory Lifecycle

sequenceDiagram
    participant CLI as cli.py startup
    participant TM as ToolMemory
    participant CR as ChromaDB
    participant AE as Agent Editor
    participant RA as RapidAPI Ingester

    CLI->>TM: index_all_registry_tools()
    TM->>CR: upsert(built-in tools)
    Note over CR: Built-in tools indexed

    AE->>TM: index_tool(new_plugin_tool)
    TM->>CR: upsert(new tool description)
    Note over CR: Plugin tool added live

    RA->>TM: index_tool(rapidapi_tool)
    TM->>CR: upsert(API tool description)
    Note over CR: RapidAPI tool added

    participant AG as Agent at runtime
    AG->>TM: search(query, n_results=20)
    TM->>CR: query(embedding)
    CR-->>TM: top-20 candidates
    TM-->>AG: candidates list

When Tools Are Indexed

Tools are indexed at three points:

At session startup (cli.py calls tool_memory.index_all_registry_tools())
After Agent Editor creates a new tool (ToolEditorAgent calls tool_memory.index_tool())
After RapidAPI ingestion (process_tool_docs.py calls tool_memory.index_tool() for each ingested tool)

ToolReranker (`tool_retriever.py`)

After semantic search returns 20 candidates, ToolReranker uses an LLM call to score each tool's relevance to the current query:

# autoagent/tool_retriever.py

from pydantic import BaseModel

class ToolScore(BaseModel):
    tool_name: str
    relevance_score: float  # 0.0 to 1.0
    justification: str

class RerankResult(BaseModel):
    """Pydantic model for LLM reranking output."""
    scores: list[ToolScore]
    selected_tools: list[str]  # Top-K tool names after reranking

class ToolReranker:
    """LLM-based tool reranker for precision after semantic search recall."""

    def __init__(self, model: str = "gpt-4o-mini", top_k: int = 5):
        self.model = model   # Use a small/fast model for reranking
        self.top_k = top_k

    def rerank(self, query: str, candidates: list[dict]) -> list[str]:
        """Score candidates and return top-k tool names."""
        candidates_text = "\n\n".join([
            f"Tool {i+1}: {c['name']}\n{c['description']}"
            for i, c in enumerate(candidates)
        ])

        prompt = f"""Given this user query:
"{query}"

Score each tool's relevance (0.0-1.0) and select the {self.top_k} most useful tools.

Tools to evaluate:
{candidates_text}

Return a JSON object matching this schema:
{{
  "scores": [
    {{"tool_name": "...", "relevance_score": 0.9, "justification": "..."}}
  ],
  "selected_tools": ["tool1", "tool2", ...]  // top {self.top_k} names
}}"""

        response = litellm.completion(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
        )

        result = RerankResult.model_validate_json(
            response.choices[0].message.content
        )
        return result.selected_tools[:self.top_k]

The key design choice: use gpt-4o-mini (fast, cheap) for reranking, not the full gpt-4o. Reranking is a classification task that doesn't need the full model's reasoning capability.

RAGMemory and Document Retrieval (`rag_memory.py`)

RAGMemory provides semantic search over document chunks for knowledge-intensive tasks:

# autoagent/rag_memory.py

class RAGMemory:
    """Document chunk storage and retrieval using ChromaDB."""

    def __init__(self, collection_name: str = "autoagent_docs"):
        self.client = chromadb.PersistentClient(path="./workspace/.chroma")
        self.embedding_fn = OpenAIEmbeddingFunction(
            api_key=os.getenv("OPENAI_API_KEY"),
            model_name="text-embedding-3-small",
        )
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            embedding_function=self.embedding_fn,
        )

    def add_document(
        self,
        text: str,
        doc_id: str,
        chunk_size: int = 500,
        overlap: int = 50,
    ) -> int:
        """Chunk a document and add all chunks to the index."""
        chunks = self._chunk_text(text, chunk_size, overlap)
        ids = [f"{doc_id}__chunk_{i}" for i in range(len(chunks))]

        self.collection.upsert(
            ids=ids,
            documents=chunks,
            metadatas=[{"doc_id": doc_id, "chunk_index": i} for i in range(len(chunks))],
        )
        return len(chunks)

    def query(self, query: str, n_results: int = 5) -> list[str]:
        """Return top-n_results relevant chunks."""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results,
        )
        return results["documents"][0]

    def _chunk_text(self, text: str, chunk_size: int, overlap: int) -> list[str]:
        """Split text into overlapping chunks by word count."""
        words = text.split()
        chunks = []
        for i in range(0, len(words), chunk_size - overlap):
            chunk = " ".join(words[i:i + chunk_size])
            if chunk:
                chunks.append(chunk)
        return chunks

RAG Tools (`rag_tools.py`)

# autoagent/rag_tools.py

from autoagent.registry import register_tool

@register_tool
def rag_search(query: str, context_variables: dict) -> str:
    """Search indexed documents for relevant passages.
    
    Use when you need to find specific information from previously
    loaded documents or knowledge bases.
    """
    rag_memory = context_variables.get("rag_memory")
    if not rag_memory:
        return "RAG memory not initialized. Load documents first."

    chunks = rag_memory.query(query, n_results=5)
    return "\n\n---\n\n".join(chunks)

@register_tool
def add_document_to_rag(
    file_path: str,
    doc_id: str,
    context_variables: dict,
) -> str:
    """Load a document into RAG memory for semantic search."""
    rag_memory = context_variables.get("rag_memory")
    file_env = context_variables.get("file_env")

    content = file_env.visit_page(file_path)
    chunk_count = rag_memory.add_document(content, doc_id)
    return f"Indexed {chunk_count} chunks from {file_path}"

CodeMemory (`code_memory.py`)

CodeMemory specializes in codebase navigation — indexing source files so agents can find relevant code by describing its function:

# autoagent/code_memory.py

class CodeMemory:
    """Semantic search over source code for codebase navigation tasks."""

    def __init__(self):
        self.client = chromadb.PersistentClient(path="./workspace/.chroma")
        # Use a code-specific embedding model
        self.embedding_fn = OpenAIEmbeddingFunction(
            api_key=os.getenv("OPENAI_API_KEY"),
            model_name="text-embedding-3-small",
        )
        self.collection = self.client.get_or_create_collection(
            name="autoagent_code",
            embedding_function=self.embedding_fn,
        )

    def index_repository(self, repo_path: str) -> int:
        """Index all Python files in a repository."""
        count = 0
        for py_file in Path(repo_path).rglob("*.py"):
            content = py_file.read_text()
            # Index at function/class level for precision
            for chunk in self._extract_code_chunks(content, str(py_file)):
                self.collection.upsert(
                    ids=[chunk["id"]],
                    documents=[chunk["content"]],
                    metadatas=[chunk["metadata"]],
                )
                count += 1
        return count

    def find_relevant_code(self, description: str, n_results: int = 5) -> list[dict]:
        """Find code chunks matching a natural language description."""
        results = self.collection.query(
            query_texts=[description],
            n_results=n_results,
        )
        return [
            {
                "file": results["metadatas"][0][i]["file"],
                "function": results["metadatas"][0][i].get("function", ""),
                "code": results["documents"][0][i],
            }
            for i in range(len(results["ids"][0]))
        ]

RapidAPI Ingestion (`process_tool_docs.py`)

AutoAgent can ingest tools from RapidAPI's 50,000+ API catalog:

# autoagent/process_tool_docs.py

class RapidAPIIngester:
    """Ingests RapidAPI tool documentation and generates AutoAgent tools."""

    def __init__(self, rapidapi_key: str):
        self.rapidapi_key = rapidapi_key
        self.headers = {
            "X-RapidAPI-Key": rapidapi_key,
            "X-RapidAPI-Host": "rapidapi.com",
        }

    def ingest_api(
        self,
        api_name: str,
        endpoint_docs: dict,
        tool_memory: ToolMemory,
    ) -> list[Callable]:
        """Convert RapidAPI endpoint documentation to AutoAgent tools."""
        tools = []

        for endpoint_name, endpoint_info in endpoint_docs.items():
            # Generate tool function from API docs
            tool_code = self._generate_tool_code(
                api_name=api_name,
                endpoint_name=endpoint_name,
                endpoint_info=endpoint_info,
            )

            # Execute to get function object
            namespace = {}
            exec(tool_code, namespace)
            tool_func = namespace[endpoint_name.replace("/", "_")]

            # Apply plugin tool decorator (adds 12k token cap)
            tool_func = register_plugin_tool(tool_func)

            # Index in ToolMemory
            tool_memory.index_tool(tool_func)
            tools.append(tool_func)

        return tools

    def _generate_tool_code(
        self,
        api_name: str,
        endpoint_name: str,
        endpoint_info: dict,
    ) -> str:
        """Generate a Python wrapper function for a RapidAPI endpoint."""
        params = endpoint_info.get("parameters", [])
        param_str = ", ".join([
            f"{p['name']}: {p.get('type', 'str')} = None"
            for p in params
        ])

        return f'''
import requests
from autoagent.registry import register_plugin_tool

@register_plugin_tool
def {endpoint_name.replace("/", "_")}({param_str}) -> str:
    """{endpoint_info.get("description", f"Call {api_name} {endpoint_name} endpoint")}"""
    url = "https://rapidapi.com/{api_name}/{endpoint_name}"
    headers = {{
        "X-RapidAPI-Key": os.getenv("RAPIDAPI_KEY"),
        "X-RapidAPI-Host": "{api_name}.rapidapi.com",
    }}
    params = {{{", ".join([f'"{p["name"]}": {p["name"]}' for p in params])}}}
    response = requests.get(url, headers=headers, params=params)
    return response.text
'''

The 12,000-Token Output Cap

All plugin tools automatically have their output truncated to 12,000 tokens via the @register_plugin_tool decorator:

# autoagent/registry.py

import tiktoken

def truncate_output(output: str, max_tokens: int = 12000) -> str:
    """Truncate output to max_tokens using tiktoken counting."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(output)

    if len(tokens) <= max_tokens:
        return output

    truncated = enc.decode(tokens[:max_tokens])
    return truncated + f"\n\n[OUTPUT TRUNCATED: {len(tokens) - max_tokens} tokens omitted]"

def register_plugin_tool(func: Callable) -> Callable:
    """Register a tool in the plugin_tools namespace with output truncation."""
    @wraps(func)
    def wrapped(*args, **kwargs):
        result = func(*args, **kwargs)
        return truncate_output(str(result))

    # Store original for introspection
    wrapped.__wrapped__ = func
    wrapped.__name__ = func.__name__

    # Register in global registry
    _registry["plugin_tools"][func.__name__] = wrapped
    return wrapped

The cap prevents runaway LLM costs when tools return large payloads (full web pages, large CSV files, API responses). The built-in @register_tool decorator does NOT apply this cap — it's only for plugin tools.

Token Budget Enforcement

Decorator	Output Cap	Use Case
`@register_plugin_tool`	12,000 tokens	User-generated and RapidAPI tools
`@register_tool`	None	Built-in system tools (trusted, controlled output)

GitHub Client (`github_client.py`)

For agent workflows that interact with GitHub:

# autoagent/github_client.py (simplified)

class GitHubClient:
    """Wrapper for common GitHub operations used in research workflows."""

    def __init__(self, token: str):
        from github import Github
        self.gh = Github(token)

    def get_repo_info(self, owner: str, repo: str) -> dict:
        """Get repository metadata including stars, language, license."""
        r = self.gh.get_repo(f"{owner}/{repo}")
        return {
            "name": r.name,
            "stars": r.stargazers_count,
            "language": r.language,
            "license": r.license.name if r.license else "Unknown",
            "description": r.description,
            "last_updated": r.updated_at.isoformat(),
        }

    def search_code(self, query: str, repo: str | None = None) -> list[dict]:
        """Search code across GitHub or within a specific repo."""
        search_query = f"{query} repo:{repo}" if repo else query
        results = self.gh.search_code(search_query)
        return [
            {"path": r.path, "url": r.html_url, "sha": r.sha}
            for r in results[:10]
        ]

Summary

Component	File	Role
`ToolMemory`	`tool_memory.py`	ChromaDB index over tool descriptions
`ToolReranker`	`tool_retriever.py`	LLM-based reranking of semantic search candidates
`RerankResult`	`tool_retriever.py`	Pydantic model for reranker output
`RAGMemory`	`rag_memory.py`	Chunked document index for knowledge retrieval
`rag_search()`	`rag_tools.py`	Tool to query RAG memory
`CodeMemory`	`code_memory.py`	Codebase semantic search at function/class level
`RapidAPIIngester`	`process_tool_docs.py`	Convert RapidAPI docs to AutoAgent tools
`truncate_output()`	`registry.py`	Enforce 12k token cap on plugin tool output
`@register_plugin_tool`	`registry.py`	Register with 12k cap (user-generated tools)
`@register_tool`	`registry.py`	Register without cap (built-in system tools)
`GitHubClient`	`github_client.py`	GitHub API operations for research workflows
`text-embedding-3-small`	(OpenAI API)	Embedding model for all ChromaDB collections

Continue to Chapter 8: Evaluation, Benchmarks, and Contributing to learn how to run GAIA benchmarks, add new evaluation suites, and contribute tools and agents to the ecosystem.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Chapter 7: Memory, Tool Retrieval, and Third-Party APIs

What Problem Does This Solve?

The Two-Stage Retrieval Pipeline

ToolMemory (`tool_memory.py`)

Indexing

ToolMemory Lifecycle

When Tools Are Indexed

ToolReranker (`tool_retriever.py`)

RAGMemory and Document Retrieval (`rag_memory.py`)

RAG Tools (`rag_tools.py`)

CodeMemory (`code_memory.py`)

RapidAPI Ingestion (`process_tool_docs.py`)

The 12,000-Token Output Cap

Token Budget Enforcement

GitHub Client (`github_client.py`)

Summary

FilesExpand file tree

07-memory-tool-retrieval-apis.md

Latest commit

History

07-memory-tool-retrieval-apis.md

File metadata and controls

Chapter 7: Memory, Tool Retrieval, and Third-Party APIs

What Problem Does This Solve?

The Two-Stage Retrieval Pipeline

ToolMemory (tool_memory.py)

Indexing

ToolMemory Lifecycle

When Tools Are Indexed

ToolReranker (tool_retriever.py)

RAGMemory and Document Retrieval (rag_memory.py)

RAG Tools (rag_tools.py)

CodeMemory (code_memory.py)

RapidAPI Ingestion (process_tool_docs.py)

The 12,000-Token Output Cap

Token Budget Enforcement

GitHub Client (github_client.py)

Summary

ToolMemory (`tool_memory.py`)

ToolReranker (`tool_retriever.py`)

RAGMemory and Document Retrieval (`rag_memory.py`)

RAG Tools (`rag_tools.py`)

CodeMemory (`code_memory.py`)

RapidAPI Ingestion (`process_tool_docs.py`)

GitHub Client (`github_client.py`)