feat(llm): pluggable web search providers (Exa, Tavily, SearXNG) #9556

Open
tgonzalezc5 wants to merge 1 commit into TriliumNext:main from tgonzalezc5:feat/search-provider-registry

@tgonzalezc5

Summary

Adds a pluggable web-search provider abstraction for the LLM agent. Each provider is a user-configured instance stored in a new searchProviders option, mirroring the shape of the existing llmProviders registry so multiple instances of the same provider (for example two SearXNG URLs) can coexist and the same UI patterns are reused.

Three providers ship with the abstraction: Exa, Tavily and SearXNG. When any provider is configured the agent routes web_search through it; when the list is empty the existing per-provider built-in search (Anthropic, OpenAI, Google) is preserved, so this change is fully backward-compatible with current installs.
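
The routing rule described above can be sketched as a single decision. This is a minimal illustration only: `chooseWebSearchSource`, `ToolSource` and `SearchProviderEntry` are hypothetical names, not identifiers from this PR, and the real logic lives in `buildTools()`.

```typescript
// Hypothetical names throughout; this only illustrates the routing rule.
type ToolSource = "configured-search-provider" | "provider-native";

interface SearchProviderEntry {
    id: string;
    provider: string; // e.g. "exa", "tavily", "searxng"
}

// Decide where web_search is routed, given the parsed searchProviders list.
// An empty list keeps the LLM provider's own built-in search.
function chooseWebSearchSource(configured: SearchProviderEntry[]): ToolSource {
    return configured.length > 0 ? "configured-search-provider" : "provider-native";
}
```

With the default empty list the function returns `"provider-native"`, which is why existing installs see no change.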

What's in it

  • apps/server/src/services/search_providers/ — new module containing:
    • base_search_provider.ts: SearchProvider interface + normalized SearchResult shape (title, url, snippet, optional publishedDate, author) and a SearchOptions type exposing numResults, domain and date filters, and category.
    • exa.ts, tavily.ts, searxng.ts: raw-fetch implementations. Each returns typed SearchResult[]. Exa requests highlights, a summary and a 500-char text extract in a single call and cascades through them so the LLM always sees a meaningful snippet even when one field is missing.
    • index.ts: registry with getSearchProvider(id?), getFirstSearchProvider(), hasConfiguredSearchProviders(), clearSearchProviderCache(), and graceful handling of unknown provider types or bad config.
    • tool.ts: AI-SDK tool(...) wrapper that exposes any SearchProvider as the web_search tool.
  • base_provider.ts: buildTools() now consults the search registry first and only falls through to addWebSearchTool() (native) when nothing is configured.
  • packages/commons/src/lib/options_interface.ts: new searchProviders: string option (JSON-serialized array).
  • apps/server/src/services/options_init.ts: default searchProviders: "[]" (synced).
  • apps/server/src/routes/api/options.ts: option whitelisted for API updates.
  • apps/client/src/widgets/type_widgets/options/llm.tsx: new SearchProviderSettings section with add/delete flow that mirrors the existing ProviderSettings widget.
  • apps/client/src/widgets/type_widgets/options/llm/AddSearchProviderModal.tsx: add-provider modal; fields are driven by provider metadata (requiresApiKey for Exa/Tavily, requiresBaseUrl for SearXNG).
  • English translation keys under llm.*.
  • 28 new unit tests across exa.spec.ts, tavily.spec.ts, searxng.spec.ts and index.spec.ts covering response parsing, content-field fallbacks, filter propagation, error paths, disabled state, registry caching and unknown-type handling.
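
To make the normalized result shape and the Exa snippet cascade concrete, here is a rough sketch. The `SearchResult` fields follow the list above; `ExaLikeResult`, `pickSnippet` and `toSearchResult` are hypothetical names, the raw field names are assumed, and the cascade order (highlights, then summary, then text) is an assumption based on the order in which the description lists them.

```typescript
// Normalized result shape as described in the PR summary.
interface SearchResult {
    title: string;
    url: string;
    snippet: string;
    publishedDate?: string;
    author?: string;
}

// Assumed subset of a raw Exa result; field names are illustrative.
interface ExaLikeResult {
    title?: string;
    url: string;
    highlights?: string[];
    summary?: string;
    text?: string;
    publishedDate?: string;
    author?: string;
}

const MAX_TEXT_CHARS = 500;

// Cascade: first non-empty highlight, then summary, then truncated text.
function pickSnippet(r: ExaLikeResult): string {
    const highlight = r.highlights?.find((h) => h.trim().length > 0);
    if (highlight) return highlight.trim();
    if (r.summary && r.summary.trim()) return r.summary.trim();
    if (r.text && r.text.trim()) return r.text.trim().slice(0, MAX_TEXT_CHARS);
    return "";
}

// Map a raw result to the normalized shape, omitting absent optional fields.
function toSearchResult(r: ExaLikeResult): SearchResult {
    return {
        title: r.title ?? r.url,
        url: r.url,
        snippet: pickSnippet(r),
        ...(r.publishedDate ? { publishedDate: r.publishedDate } : {}),
        ...(r.author ? { author: r.author } : {})
    };
}
```

Keeping the cascade in one small function makes the content-field fallbacks easy to cover in unit tests without mocking any network call.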

Backward compatibility

  • searchProviders defaults to "[]", so every existing install falls through to the current provider-native search with no behaviour change.
  • No existing option is renamed or removed.

Usage

// Example configuration persisted in the searchProviders option:
[
  { id: "exa_1", name: "Exa", provider: "exa", apiKey: "..." },
  { id: "searxng_local", name: "Local SearXNG", provider: "searxng", baseUrl: "http://localhost:8888" }
]

Adding a provider is done through Options → AI / LLM → Web Search Providers → Add Search Provider.
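
The entries in the example configuration could be typed and checked against per-provider metadata roughly as follows. The `requiresApiKey` and `requiresBaseUrl` flags come from the description above; `SearchProviderType`, `PROVIDER_META` and `validateSetup` are hypothetical names used only for this sketch, and the setup shape is inferred from the example configuration.

```typescript
type SearchProviderType = "exa" | "tavily" | "searxng";

// Shape inferred from the example configuration; treat it as an assumption.
interface SearchProviderSetup {
    id: string;
    name: string;
    provider: SearchProviderType;
    apiKey?: string;
    baseUrl?: string;
}

// Assumed metadata table: which fields each provider type needs.
const PROVIDER_META: Record<SearchProviderType, { requiresApiKey: boolean; requiresBaseUrl: boolean }> = {
    exa: { requiresApiKey: true, requiresBaseUrl: false },
    tavily: { requiresApiKey: true, requiresBaseUrl: false },
    searxng: { requiresApiKey: false, requiresBaseUrl: true }
};

// Return a list of problems; an empty list means the entry looks usable.
function validateSetup(setup: SearchProviderSetup): string[] {
    const meta = PROVIDER_META[setup.provider];
    const problems: string[] = [];
    if (meta.requiresApiKey && !setup.apiKey) problems.push("apiKey is required");
    if (meta.requiresBaseUrl && !setup.baseUrl) problems.push("baseUrl is required");
    return problems;
}
```

Driving both the modal fields and the validation from one metadata table would keep the two from drifting apart when a new provider type is added.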

Files changed

  • apps/client/src/translations/en/translation.json
  • apps/client/src/widgets/type_widgets/options/llm.tsx
  • apps/client/src/widgets/type_widgets/options/llm/AddSearchProviderModal.tsx (new)
  • apps/server/src/routes/api/options.ts
  • apps/server/src/services/llm/providers/base_provider.ts
  • apps/server/src/services/options_init.ts
  • apps/server/src/services/search_providers/*.ts (new module, 10 files incl. specs)
  • packages/commons/src/lib/options_interface.ts

Test plan

  • pnpm -C apps/server exec vitest run src/services/search_providers/ — 28/28 new tests pass.
  • pnpm -C apps/server exec vitest run — full server suite: 725 passed / 1 expected fail / 26 skipped (same as main).
  • New files type-check in isolation (no errors from search_providers/ or AddSearchProviderModal.tsx).
  • Manual smoke: start server, add an Exa provider in Options → AI / LLM, trigger a chat with web search on, confirm the web_search tool hits Exa and returns results.
  • Manual smoke: remove all search providers, confirm LLM agent falls back to provider-native search.
  • Manual smoke: SearXNG against a running instance.
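
For the filter-propagation tests and the SearXNG smoke test, it may help to see how options could map onto a request URL. This is a sketch under stated assumptions: `buildSearxngUrl` and this trimmed `SearchOptions` are hypothetical, the `q`, `format=json` and `categories` query parameters are those of the SearXNG search API (the instance must have JSON output enabled), and expressing the domain filter as a `site:` query prefix is an assumption rather than what searxng.ts necessarily does.

```typescript
// Trimmed, hypothetical version of the options type described in the PR.
interface SearchOptions {
    includeDomains?: string[];
    category?: string;
}

// Build a SearXNG search URL; pure function, so it is easy to unit test.
function buildSearxngUrl(baseUrl: string, query: string, options: SearchOptions = {}): string {
    const domains = options.includeDomains ?? [];
    // Assumption: domain filters are expressed through site: operators.
    const scopedQuery = domains.length > 0
        ? `${query} ${domains.map((d) => `site:${d}`).join(" OR ")}`
        : query;

    const params: Array<[string, string]> = [
        ["q", scopedQuery],
        ["format", "json"]
    ];
    if (options.category) params.push(["categories", options.category]);

    const queryString = params
        .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
        .join("&");

    // Strip trailing slashes so "http://host/" and "http://host" behave the same.
    return `${baseUrl.replace(/\/+$/, "")}/search?${queryString}`;
}
```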

Introduces a search-provider registry that mirrors the existing
`llmProviders` pattern, letting users configure one or more third-party
web search engines for the LLM agent. When any provider is configured it
replaces each LLM provider's built-in web search; when empty, behaviour
is unchanged (Anthropic, OpenAI and Google continue to use their native
search).

- New `apps/server/src/services/search_providers/` module with a common
  `SearchProvider` interface, raw-fetch implementations of Exa, Tavily
  and SearXNG, and an AI-SDK tool wrapper.
- Single `searchProviders` option (JSON array of setups) persisted and
  synced, whitelisted in the options API, added to `OptionDefinitions`.
- `BaseProvider.buildTools` prefers a configured search provider and
  falls back to the provider-native web search if none is set.
- New `SearchProviderSettings` section in the AI / LLM options page
  with an add-modal that shows per-provider fields (API key for Exa
  and Tavily, base URL for SearXNG) and a list widget for managing
  configured instances.
- Unit specs for each provider covering response parsing, content-field
  fallbacks, error handling and the registry's caching and unknown-type
  behaviour.
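
The registry's caching and unknown-type behaviour could look roughly like the sketch below. All names here (`SearchProviderRegistry`, `Setup`, `Provider`, `Factory`) are illustrative and not the PR's actual exports; the real module exposes functions such as `getSearchProvider(id?)` and `clearSearchProviderCache()`.

```typescript
// Illustrative types; the real setup and provider shapes are richer.
interface Setup {
    id: string;
    provider: string;
}

interface Provider {
    setupId: string;
    type: string;
}

type Factory = (setup: Setup) => Provider;

class SearchProviderRegistry {
    private readonly cache = new Map<string, Provider>();
    private readonly loadSetups: () => Setup[];
    private readonly factories: Map<string, Factory>;

    constructor(loadSetups: () => Setup[], factories: Map<string, Factory>) {
        this.loadSetups = loadSetups;
        this.factories = factories;
    }

    // Without an id, the first configured setup is used.
    get(id?: string): Provider | null {
        const setups = this.loadSetups();
        const setup = id === undefined ? setups[0] : setups.find((s) => s.id === id);
        if (!setup) return null;

        const cached = this.cache.get(setup.id);
        if (cached) return cached;

        // Unknown provider types yield null instead of throwing.
        const factory = this.factories.get(setup.provider);
        if (!factory) return null;

        const instance = factory(setup);
        this.cache.set(setup.id, instance);
        return instance;
    }

    // Intended to be called whenever the searchProviders option changes.
    clearCache(): void {
        this.cache.clear();
    }
}
```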
@dosubot (bot) added the size:XL label (this PR changes 500-999 lines, ignoring generated files) on Apr 23, 2026
Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces pluggable web search providers (Exa, Tavily, and SearXNG) for the AI agent, allowing it to use third-party search engines instead of relying solely on built-in LLM search capabilities. The changes include new UI components for managing search providers, server-side registry and tool integration, and comprehensive unit tests. Feedback includes a correction for a CSS class typo and recommendations to add array validation when parsing search provider configurations to prevent potential runtime crashes.

if (!providersJson) {
    return [];
}
return JSON.parse(providersJson) as SearchProviderSetup[];

medium

The result of JSON.parse should be validated to ensure it is an array before returning. If the searchProviders option is manually set to a non-array JSON value (like an object or null), subsequent calls to .length or .find on the returned value will throw an error.

        const parsed = JSON.parse(providersJson);
        return Array.isArray(parsed) ? parsed : [];
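
A slightly fuller version of this suggestion, which also survives malformed JSON and drops entries missing the basic fields, might look like the sketch below. `parseSearchProviders` and `isSetup` are hypothetical helper names, and the minimal `SearchProviderSetup` shape is assumed from the PR description.

```typescript
// Minimal assumed shape; the PR's real SearchProviderSetup has more fields.
interface SearchProviderSetup {
    id: string;
    provider: string;
    [key: string]: unknown;
}

// Type guard: accept only objects that carry string id and provider fields.
function isSetup(value: unknown): value is SearchProviderSetup {
    if (typeof value !== "object" || value === null) return false;
    const record = value as Record<string, unknown>;
    return typeof record["id"] === "string" && typeof record["provider"] === "string";
}

// Never throws: any bad input degrades to an empty provider list.
function parseSearchProviders(providersJson: string | null | undefined): SearchProviderSetup[] {
    if (!providersJson) return [];
    try {
        const parsed: unknown = JSON.parse(providersJson);
        if (!Array.isArray(parsed)) return [];
        return parsed.filter(isSetup);
    } catch {
        // Malformed JSON is treated the same as "nothing configured".
        return [];
    }
}
```

Returning an empty list on every failure path means a corrupted option falls through to provider-native search rather than breaking the chat.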

const [providersJson, setProvidersJson] = useTriliumOption("searchProviders");
const providers = useMemo<SearchProviderConfig[]>(() => {
    try {
        return providersJson ? JSON.parse(providersJson) : [];

medium

Similar to the server-side registry, the parsed JSON should be validated as an array. If the option contains a non-array value, the useMemo hook will return a value that causes providers.filter or providers.map to crash the UI.

            const parsed = providersJson ? JSON.parse(providersJson) : [];
            return Array.isArray(parsed) ? parsed : [];


return (
    <div style={{ overflow: "auto" }}>
        <table className="table table-stripped">

medium

Typo in Bootstrap class name: table-stripped should be table-striped.

Suggested change
- <table className="table table-stripped">
+ <table className="table table-striped">

@eliandoran
Contributor

Hi, what's the difference compared to #9342?

@tgonzalezc5
Author

Hello @eliandoran! I saw your feedback on the other PR asking for a provider-type abstraction, multiple instances, and a few other things. In this PR I did the following:

  1. Modeled search engines as a registry rather than a flat set of options.
  2. Allowed multiple configured instances, including several of the same provider.
  3. Kept the UI in line with the existing provider flow.
  4. Added another search option.
  5. Added unit tests for the search providers.

I also made sure it is backward compatible by defaulting to an empty config.

Happy to have you fold this into #9342, whichever you prefer.

@eliandoran
Contributor

@tgonzalezc5, what's your opinion of the LLM chat we have so far?

Labels

merge-conflicts size:XL This PR changes 500-999 lines, ignoring generated files.

2 participants