feat: add Exa semantic search tool & enable Jina Reader free mode #1

Open: ignorejjj wants to merge 1 commit into RUC-NLPIR:main from ignorejjj:feat/exa-search-jina-free
@ignorejjj
Member

Add Exa semantic search & enable Jina Reader free mode

Summary

This PR adds Exa Search as a new search tool and updates the Jina Reader integration to properly support free-mode usage (no API key required). Both changes are non-breaking — they extend the existing architecture without modifying any core interfaces.


Motivation

  1. Search diversity: SearchClaw currently relies on Serper (Google) as its primary web search backend and falls back to DuckDuckGo HTML scraping when no API key is set. Exa adds a complementary semantic search engine with different ranking and coverage, which should improve research quality, especially for natural-language queries and topic-focused searches.

  2. Zero-config web fetching: The original code treated JINA_API_KEY as effectively required. Without it, the warning message implied Jina was unavailable and the system fell back to direct HTTP fetch (which can't handle JS-rendered pages or PDFs). In reality, Jina Reader works without an API key at 20 requests per minute (RPM). This PR makes that explicit, so SearchClaw works better out of the box.


Changes

New: src/tools/exa_search.py

A new ExaSearchTool that calls the official Exa MCP endpoint (https://mcp.exa.ai/mcp) using Streamable HTTP transport (JSON-RPC over HTTP). Key design decisions:

  • No extra dependencies — uses httpx (already a project dependency) to send JSON-RPC requests directly. No MCP client library or mcporter CLI needed.
  • Works without API key — the Exa MCP endpoint accepts unauthenticated requests (rate-limited). When EXA_API_KEY is set, it's appended to the URL for higher limits.
  • Follows existing patterns — implements the same Tool interface as all other tools (input_schema, prompt(), validate_input(), call()). Registered as an optional tool in build_default_registry() with a try/except ImportError guard.
  • Category filtering — supports Exa's category parameter (research paper, news, company, personal site) for focused searches.
  • Concurrency-safe — marked is_concurrency_safe = True, consistent with other search tools.
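
For reviewers unfamiliar with MCP over Streamable HTTP, here is a rough sketch of the kind of request described above. It is not the code in src/tools/exa_search.py. The `exaApiKey` query parameter, the `web_search_exa` tool name, and the `numResults` argument are assumptions made for illustration, and a real MCP server may also expect an `initialize` handshake before `tools/call`.

```python
import json
import os
from typing import Any, Optional
from urllib.parse import urlencode

EXA_MCP_URL = "https://mcp.exa.ai/mcp"  # endpoint named in this PR


def build_url(api_key: Optional[str] = None) -> str:
    """Return the endpoint URL, appending the key when one is provided."""
    if not api_key:
        return EXA_MCP_URL
    # Assumption: the key is passed as an `exaApiKey` query parameter.
    return f"{EXA_MCP_URL}?{urlencode({'exaApiKey': api_key})}"


def build_tools_call(query: str, num_results: int = 5, request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 envelope for an MCP `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "web_search_exa",  # assumed tool name
            "arguments": {"query": query, "numResults": num_results},  # assumed argument names
        },
    }


def parse_response(body: str) -> dict[str, Any]:
    """Decode a reply that is either plain JSON or a server-sent event stream."""
    for line in body.splitlines():
        if line.startswith("data:"):
            return json.loads(line[len("data:"):].strip())
    return json.loads(body)


async def exa_search(query: str) -> dict[str, Any]:
    """Send one search request; needs httpx and network access."""
    import httpx  # imported lazily so the helpers above work without it

    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP clients advertise both response formats.
        "Accept": "application/json, text/event-stream",
    }
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.post(
            build_url(os.environ.get("EXA_API_KEY")),
            json=build_tools_call(query),
            headers=headers,
        )
        resp.raise_for_status()
        return parse_response(resp.text)
```

Keeping the URL and payload builders separate from the network call makes the request shape testable without hitting the endpoint.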

Modified: src/tools/web_fetch.py

  • Updated _fetch_via_jina() docstring to document both modes: free mode (20 RPM, no key) and authenticated mode (200 RPM, with JINA_API_KEY).
  • Added debug-level log when running in free mode for observability.
  • Updated module docstring to reflect that Jina works without a key.
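
The two Jina modes differ only in whether an Authorization header is sent. A minimal sketch, assuming the reader is reached by prefixing the target URL with `https://r.jina.ai/`; the helper name `build_jina_request` and the logger name are hypothetical and do not mirror `_fetch_via_jina()`:

```python
import logging
from typing import Optional

logger = logging.getLogger("web_fetch")  # hypothetical logger name

JINA_READER_PREFIX = "https://r.jina.ai/"


def build_jina_request(target_url: str, api_key: Optional[str] = None) -> tuple[str, dict[str, str]]:
    """Return the reader URL and headers for free or authenticated mode."""
    headers: dict[str, str] = {}
    if api_key:
        # Authenticated mode: higher rate limit (200 RPM per this PR).
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        # Free mode: no key, lower rate limit (20 RPM per this PR).
        logger.debug("JINA_API_KEY not set; using Jina Reader free mode")
    return JINA_READER_PREFIX + target_url, headers
```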

Modified: src/core/tool.py

  • Added ExaSearchTool registration to build_default_registry(), following the same optional-tool pattern used by AcademicSearchTool, NewsSearchTool, and WeChatSearchTool.
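
The optional-tool pattern amounts to guarding the import and skipping registration when it fails. The sketch below shows the idea with a plain dict standing in for the real ToolRegistry; `register_optional` is a hypothetical helper, not a function in src/core/tool.py.

```python
import importlib
import logging
from typing import Any

logger = logging.getLogger("tool_registry")  # hypothetical logger name


def register_optional(registry: dict[str, Any], module_name: str, class_name: str, **kwargs: Any) -> bool:
    """Import a tool class and register it; return False if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # A missing optional tool must not stop the other tools from loading.
        logger.debug("Optional tool %s.%s unavailable, skipping", module_name, class_name)
        return False
    registry[class_name] = getattr(module, class_name)(**kwargs)
    return True
```

With this shape, a deployment that lacks the module still starts with the remaining tools registered.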

Modified: src/web/router.py

  • Added _exa_search_default_results and _exa_search_max_results config variables (with defaults).
  • Passes Exa config into build_default_registry().
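
One way to picture how the two config values interact: the default applies when the caller gives no count, and the maximum caps whatever is requested. This is an illustrative sketch only. The fallback numbers are placeholders rather than the defaults shipped in this PR, and both function names are hypothetical.

```python
from typing import Any, Mapping, Optional

FALLBACK_DEFAULT_RESULTS = 5  # placeholder, not the PR's actual default
FALLBACK_MAX_RESULTS = 10     # placeholder, not the PR's actual maximum


def load_exa_limits(settings: Mapping[str, Any]) -> tuple[int, int]:
    """Read the Exa limits from a parsed settings mapping, tolerating missing keys."""
    tools = settings.get("tools") or {}
    default_results = int(tools.get("exa_search_default_results", FALLBACK_DEFAULT_RESULTS))
    max_results = int(tools.get("exa_search_max_results", FALLBACK_MAX_RESULTS))
    # The default can never exceed the maximum.
    return min(default_results, max_results), max_results


def resolve_num_results(requested: Optional[int], default_results: int, max_results: int) -> int:
    """Pick the result count for one call, clamped to the range 1..max_results."""
    if requested is None:
        return default_results
    return max(1, min(requested, max_results))
```

Because missing keys fall back to defaults, a settings.yaml written before this PR loads unchanged.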

Modified: src/main.py

  • Lowered the log for a missing JINA_API_KEY from WARNING to INFO; the message now clarifies that Jina still works in free mode.
  • Added startup log for Exa search status (with/without API key).
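
A sketch of what such startup logging could look like. The message wording, logger name, and function names are invented for illustration and are not taken from src/main.py.

```python
import logging
from typing import Mapping

logger = logging.getLogger("searchclaw.startup")  # hypothetical logger name


def startup_messages(env: Mapping[str, str]) -> list[str]:
    """Describe which mode each optional service will run in."""
    messages = []
    if env.get("JINA_API_KEY"):
        messages.append("Jina Reader: authenticated mode (JINA_API_KEY set)")
    else:
        messages.append("Jina Reader: free mode (JINA_API_KEY not set)")
    if env.get("EXA_API_KEY"):
        messages.append("Exa search: using EXA_API_KEY")
    else:
        messages.append("Exa search: unauthenticated, rate-limited")
    return messages


def log_startup_status(env: Mapping[str, str]) -> None:
    """Emit every status line at INFO, since a missing key is not an error."""
    for message in startup_messages(env):
        logger.info(message)
```

Calling `log_startup_status(os.environ)` once during startup (after `import os`) would be enough.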

Modified: config/settings.yaml

  • Added exa_search_default_results and exa_search_max_results under tools:.
  • Updated environment variable documentation: Jina is now marked OPTIONAL (was RECOMMENDED), with a note that it works without a key. Added EXA_API_KEY documentation.

Files changed

| File | Type | Description |
|------|------|-------------|
| src/tools/exa_search.py | New | Exa semantic search tool via MCP |
| src/tools/web_fetch.py | Modified | Jina free mode documentation & logging |
| src/core/tool.py | Modified | Register ExaSearchTool |
| src/web/router.py | Modified | Pass Exa config to registry |
| src/main.py | Modified | Startup log updates |
| config/settings.yaml | Modified | Exa config + Jina docs update |

How to test

# Basic: verify all tools register correctly
python -c "
from src.core.tool import build_default_registry
registry = build_default_registry()
print([t.name for t in registry.all_tools()])
# Should include 'exa_search'
"

# Exa search (no API key needed):
python -c "
import asyncio
from src.tools.exa_search import ExaSearchTool
from src.core.tool import ToolUseContext

async def test():
    tool = ExaSearchTool()
    ctx = ToolUseContext(session_id='test')
    result = await tool.call({'query': 'latest advances in RAG'}, ctx)
    print(result.data[:500])

asyncio.run(test())
"

# Jina free mode (no API key needed):
python -c "
import asyncio
from src.tools.web_fetch import WebFetchTool
from src.core.tool import ToolUseContext

async def test():
    tool = WebFetchTool()
    ctx = ToolUseContext(session_id='test')
    result = await tool.call({'url': 'https://example.com'}, ctx)
    print(result.data[:500])

asyncio.run(test())
"

Non-breaking guarantees

  • No existing tool behavior is changed — web_search, web_fetch, and all other tools work exactly as before.
  • No new required dependencies — only uses httpx which is already in pyproject.toml.
  • No new required environment variables — both features work without any API keys.
  • No changes to the Tool base class, ToolRegistry, or agent loop.
  • Configuration additions have sensible defaults; existing settings.yaml files continue to work without changes.

- Add ExaSearchTool (src/tools/exa_search.py) using Exa's official MCP
  endpoint via Streamable HTTP transport (JSON-RPC). Works without API
  key; set EXA_API_KEY for higher rate limits.
- Update Jina Reader to properly support free mode (20 RPM without key).
  Change startup log from WARNING to INFO since Jina works without a key.
- Register ExaSearchTool in build_default_registry() with try/except guard.
- Add exa_search config (default/max results) to settings.yaml and router.
- No new dependencies required (uses existing httpx).