Add TinyFish search and fetch tools #2820

Open
pranavjana wants to merge 11 commits into ag2ai:main from pranavjana:feat/add-tinyfish-search-fetch

Conversation

@pranavjana

Summary

This PR expands the TinyFish integration so AG2 agents can use all three TinyFish web APIs that matter for agent workflows (Agent, Search, and Fetch):

  • TinyFishTool for goal-directed Agent API automation on a target URL
  • TinyFishSearchTool for ranked web search results in the existing autogen.tools.experimental API
  • TinyFishFetchTool for browser-rendered content extraction from known URLs in the existing autogen.tools.experimental API
  • TinyFishSearchToolkit for beta agents, exposing both tinyfish_search and tinyfish_fetch

The main use case is a research workflow: an agent first discovers candidate pages with Search, then reads selected pages with Fetch, and can still fall back on the goal-directed TinyFish Agent API when it needs TinyFish to operate a website from a natural-language goal.
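The search-then-fetch flow can be sketched in plain Python. This is a minimal illustration, assuming `search` and `fetch` are callables standing in for the `tinyfish_search` and `tinyfish_fetch` tools; `SearchHit`, `research`, and `max_pages` are hypothetical names, not part of the TinyFish SDK or AG2.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SearchHit:
    # Hypothetical shape of one ranked search result (not the TinyFish SDK model).
    url: str
    title: str


def research(
    query: str,
    search: Callable[[str], list[SearchHit]],
    fetch: Callable[[list[str]], dict[str, str]],
    max_pages: int = 3,
) -> dict[str, str]:
    """Discover pages with `search`, then read the top few with `fetch`.

    `search` and `fetch` stand in for the tinyfish_search / tinyfish_fetch
    tools; their signatures here are assumptions for illustration only.
    """
    urls: list[str] = []
    for hit in search(query):  # results are assumed to arrive in rank order
        if len(urls) >= max_pages:
            break
        if hit.url not in urls:  # skip duplicate URLs
            urls.append(hit.url)
    return fetch(urls) if urls else {}
```

In a real agent run the LLM decides which results to read; the fixed top-N cut here only makes the two-step shape explicit.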

Changes

  • Adds autogen.beta.tools.search.TinyFishSearchToolkit with structured search and fetch result dataclasses.
  • Re-exports the beta toolkit from autogen.beta.tools.search and autogen.beta.tools with the existing optional-dependency fallback pattern.
  • Adds TinyFishSearchTool and TinyFishFetchTool to autogen.tools.experimental.tinyfish, alongside the existing TinyFishTool.
  • Updates experimental exports so the new tools can be imported from autogen.tools.experimental and autogen.tools.experimental.tinyfish.
  • Adds tinyfish to the docs/search optional extras groups.
  • Extends existing TinyFish reference docs to cover Agent, Search, and Fetch, and adds beta toolkit docs under common toolkits.
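The optional-dependency fallback mentioned above follows a common shape: try the import, and on ImportError expose a placeholder that only fails when used. The sketch below is a generic illustration under that assumption; `_missing_dependency_stub` and `load_optional` are hypothetical helpers, not AG2's `missing_optional_dependency`, whose signature may differ.

```python
import importlib
from typing import Any


def _missing_dependency_stub(name: str, extra: str) -> type:
    """Return a placeholder class that raises only when someone instantiates it.

    Hypothetical helper for illustration; AG2's missing_optional_dependency
    may differ in signature and behaviour.
    """

    class _Missing:
        def __init__(self, *args: Any, **kwargs: Any) -> None:
            raise ImportError(f"{name} needs the optional '{extra}' extra to be installed")

    _Missing.__name__ = name
    _Missing.__qualname__ = name
    return _Missing


def load_optional(module: str, attr: str, extra: str) -> Any:
    # Try the real import; if the optional package is absent, hand back a stub
    # so that importing the parent package still succeeds.
    try:
        return getattr(importlib.import_module(module), attr)
    except ImportError:
        return _missing_dependency_stub(attr, extra)


# Usage: the name is always importable, but only usable with the extra installed.
TinyFishLike = load_optional("tinyfish_not_installed_example", "TinyFishSearchToolkit", "tinyfish")
```

Deferring the failure to instantiation keeps `import autogen...` working for users who never install the `tinyfish` extra.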

Validation

  • git diff --check
  • uv run ruff format --check autogen/tools/experimental/tinyfish/tinyfish_tool.py autogen/beta/tools/search/tinyfish.py test/tools/experimental/tinyfish/test_tinyfish.py test/beta/tools/search/test_tinyfish.py
  • uv run ruff check autogen/tools/experimental/tinyfish/tinyfish_tool.py autogen/beta/tools/search/tinyfish.py test/tools/experimental/tinyfish/test_tinyfish.py test/beta/tools/search/test_tinyfish.py
  • uv run --extra tinyfish --extra tavily --extra exa --extra ddgs --extra perplexity pytest test/beta/tools/search test/tools/experimental/tinyfish/test_tinyfish.py -q

The pytest run reported 99 passed.

Documentation

Docs were updated in:

  • website/docs/user-guide/reference-tools/tinyfish.mdx
  • website/docs/user-guide/reference-tools/index.mdx
  • website/docs/beta/tools/common_toolkits.mdx

I attempted a local docs build with the docs extra, but this environment lacks Quarto, so notebook metadata generation fails before the docs site finishes building.

@pranavjana pranavjana requested a review from Lancetnik as a code owner May 14, 2026 08:35
@CLAassistant

CLAassistant commented May 14, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions Bot added the documentation, dependencies, and beta labels May 14, 2026
@pranavjana
Author

pranavjana commented May 14, 2026

Addressed the Codecov patch coverage feedback with focused tests for both TinyFish implementations:

  • beta TinyFishSearchToolkit client option forwarding and fetch link fallback behavior
  • legacy TinyFishTool / TinyFishSearchTool / TinyFishFetchTool SDK response mapping helpers, including close calls and Agent API result branches

Validation after the update:

git diff --check
uv run ruff check test/tools/experimental/tinyfish/test_tinyfish.py test/beta/tools/search/test_tinyfish.py
uv run --extra tinyfish --extra tavily --extra exa --extra ddgs --extra perplexity pytest test/beta/tools/search test/tools/experimental/tinyfish/test_tinyfish.py -q

Result: 107 passed.

@marklysze
Collaborator

Thanks for the TinyFish search + fetch additions. The try/except ImportError guard and missing_optional_dependency fallback correctly follow the existing pattern used by ExaToolkit / TavilySearchTool.

Merge conflict to resolve

PR #2859 also modified website/docs/beta/tools/common_toolkits.mdx (it added a ## RAGToolkit section). The two changes don't conflict semantically, since they add independent sections, but GitHub will flag a merge conflict once either PR lands. Whoever merges second will need a quick rebase.

TinyFishTool vs TinyFishSearchToolkit — one API or two?

The diff removes from .tinyfish import TinyFishTool from autogen/tools/experimental/tinyfish/__init__.py and adds TinyFishSearchToolkit to the beta tools. Is TinyFishTool (the goal-directed Agent API wrapper) still present in tinyfish_tool.py, just no longer re-exported? Or is it dropped entirely? The PR body mentions it as a separate tool, but I want to confirm it's still accessible to users who need the goal-directed path.

Minor nit

TinyFishFetchResult has latency_ms: float | None = None — is this populated in practice, or always None? If the upstream API doesn't return latency, removing the field avoids silent confusion in tool outputs.

Otherwise the toolkit structure is clean, the test suite uses pytest.importorskip correctly, and the pyproject.toml optional extras look right. LGTM pending the above clarifications.

Collaborator

@marklysze marklysze left a comment

Tests are now thorough and the implementation is clean. Approving.

Code quality: good. The factory method pattern (search()/fetch()) cleanly separates per-tool params from client params. try/finally await client.close() is correct async resource management. resolve_variable for context injection follows the established beta pattern.
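The factory-method split and try/finally cleanup described above can be illustrated with a small self-contained sketch. `FakeAsyncClient`, `SketchToolkit`, and their parameters are hypothetical stand-ins, not the real AsyncTinyFish client or the PR's TinyFishSearchToolkit signature.

```python
import asyncio
from typing import Awaitable, Callable


class FakeAsyncClient:
    """Hypothetical stand-in for an async SDK client; not the AsyncTinyFish API."""

    def __init__(self, api_key: str, timeout: float) -> None:
        self.api_key = api_key
        self.timeout = timeout
        self.closed = False

    async def search(self, query: str, limit: int) -> list[str]:
        if not query:
            raise ValueError("empty query")
        return [f"result {i} for {query}" for i in range(limit)]

    async def close(self) -> None:
        self.closed = True


class SketchToolkit:
    """Hypothetical toolkit: client params on the instance, tool params on factories."""

    def __init__(self, api_key: str, timeout: float = 30.0) -> None:
        # Client-level settings shared by every tool the toolkit produces.
        self._api_key = api_key
        self._timeout = timeout
        self.clients: list[FakeAsyncClient] = []  # kept only so tests can inspect them

    def _new_client(self) -> FakeAsyncClient:
        client = FakeAsyncClient(self._api_key, self._timeout)
        self.clients.append(client)
        return client

    def search(self, limit: int = 5) -> Callable[[str], Awaitable[list[str]]]:
        # Per-tool settings (here `limit`) are bound when the tool is created.
        async def tool(query: str) -> list[str]:
            client = self._new_client()
            try:
                return await client.search(query, limit)
            finally:
                # Runs on success and on error, so the client is always closed.
                await client.close()

        return tool


if __name__ == "__main__":
    kit = SketchToolkit(api_key="demo-key")
    print(asyncio.run(kit.search(limit=2)("ag2")))
```

Creating a client per call keeps each tool invocation independent; a shared long-lived client would instead need its own lifecycle management.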

One remaining action for the contributor: resolve the merge conflict on website/docs/beta/tools/common_toolkits.mdx with PR #2859 before this can be merged. The conflict is straightforward — both PRs add independent sections, so a rebase should resolve it cleanly.

Collaborator

@marklysze marklysze left a comment

Good contribution — the structure follows existing toolkit patterns well (lazy imports, missing_optional_dependency guard, try/finally close, resolve_variable for Context-bound defaults).

One security nit before this lands:

URL scheme validation in tinyfish_fetch

tinyfish_fetch accepts a list[str] of URLs from the LLM and passes them straight to AsyncTinyFish. An adversarial prompt or a confused agent could pass file:///etc/passwd, javascript:..., or data: URLs. Even if the TinyFish SDK rejects them server-side, we should validate at our layer.

Pattern from the just-merged Crawl4AIToolkit fix:

from urllib.parse import urlparse

_SAFE_SCHEMES = {"http", "https"}

def _safe_url(url: str) -> bool:
    return urlparse(url).scheme.lower() in _SAFE_SCHEMES

Then in tinyfish_fetch, before calling the client:

invalid = [u for u in urls if not _safe_url(u)]
if invalid:
    return ToolResult({"error": f"Only http/https URLs are supported; rejected: {invalid}"})

Otherwise LGTM — structure is clean, docs table is helpful, and the try/finally resource handling is correct.

Collaborator

@marklysze marklysze left a comment

Good contribution — the structure follows existing toolkit patterns well (lazy imports, missing_optional_dependency guard, try/finally close, resolve_variable for Context-bound defaults).

One security nit before this lands:

URL scheme validation in tinyfish_fetch

tinyfish_fetch accepts a list[str] of URLs from the LLM and passes them straight to AsyncTinyFish. An adversarial prompt or a confused agent could pass file:///etc/passwd, javascript:..., or data: URLs. Even if the TinyFish SDK rejects them server-side, we should validate at our layer (same fix was applied to Crawl4AIToolkit in #2860).

Pattern used in #2860:

from urllib.parse import urlparse

_SAFE_SCHEMES = {"http", "https"}

def _safe_url(url: str) -> bool:
    return urlparse(url).scheme.lower() in _SAFE_SCHEMES

Then in tinyfish_fetch, filter before calling the client:

invalid = [u for u in urls if not _safe_url(u)]
if invalid:
    return ToolResult({"error": f"Only http/https URLs are supported; rejected: {invalid}"})

Otherwise LGTM — structure is clean, docs table is helpful, try/finally resource handling is correct, and the split between beta TinyFishSearchToolkit and experimental TinyFishSearchTool/FetchTool matches the existing AG2 dual-path pattern.

@codecov

codecov Bot commented May 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines Coverage Δ
autogen/beta/tools/__init__.py 100.00% <100.00%> (ø)
autogen/beta/tools/search/__init__.py 63.63% <100.00%> (-36.37%) ⬇️
autogen/beta/tools/search/tinyfish.py 100.00% <100.00%> (ø)

... and 389 files with indirect coverage changes

@pranavjana
Author

Addressed the maintainer follow-ups:

  • Added URL scheme validation for tinyfish_fetch so only http and https URLs are accepted before calling the TinyFish SDK.
  • Added beta and legacy tests proving unsafe schemes are rejected before the client/helper call.
  • Confirmed TinyFishTool remains exported from both autogen.tools.experimental.tinyfish and autogen.tools.experimental.
  • Removed latency_ms from the exposed beta and legacy fetch results. The TinyFish SDK model does include it, but it does not need to be part of AG2's public tool result contract.
  • Cleaned up the TinyFish docs section after the upstream merge so the Variable note is no longer misplaced in the Perplexity section.
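Dropping latency_ms amounts to mapping the SDK's richer response onto a narrower public result type. A minimal sketch, assuming the SDK response can be read as a mapping; `FetchResult`, `to_public`, and the `url`/`content`/`title` fields are hypothetical and not the PR's actual result classes.

```python
from dataclasses import asdict, dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class FetchResult:
    """Public tool result (hypothetical fields, not the PR's actual class)."""

    url: str
    content: str
    title: Optional[str] = None


def to_public(raw: Mapping[str, Any]) -> FetchResult:
    # Copy only fields that are part of the tool's contract; SDK-side extras
    # such as latency_ms are dropped instead of surfacing as a possibly-None field.
    return FetchResult(
        url=raw["url"],
        content=raw.get("content", ""),
        title=raw.get("title"),
    )


if __name__ == "__main__":
    sdk_response = {"url": "https://example.com", "content": "Hello", "latency_ms": 412.0}
    print(asdict(to_public(sdk_response)))
```

An explicit allowlist of fields also means new SDK fields do not leak into tool output unless someone opts them in.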

Validation:

git diff --check
uv run ruff format --check autogen/beta/tools/search/tinyfish.py autogen/tools/experimental/tinyfish/tinyfish_tool.py test/beta/tools/search/test_tinyfish.py test/tools/experimental/tinyfish/test_tinyfish.py
uv run ruff check autogen/beta/tools/search/tinyfish.py autogen/tools/experimental/tinyfish/tinyfish_tool.py test/beta/tools/search/test_tinyfish.py test/tools/experimental/tinyfish/test_tinyfish.py
uv run --extra tinyfish --extra tavily --extra exa --extra ddgs --extra perplexity pytest test/beta/tools/search test/tools/experimental/tinyfish/test_tinyfish.py -q

Result: 110 passed.

@pranavjana
Author

Added the TinyFish integration marker follow-up.

What changed:

  • Agent and Fetch POST calls now set TinyFish's SDK-supported integration marker to ag2 for the duration of the request.
  • The marker is internal and not user-configurable from AG2 tool constructors or function arguments.
  • Any existing TF_API_INTEGRATION value is restored after the TinyFish call.
  • Search is unchanged because the TinyFish Search SDK call is a GET request, so there is no request body field to inject.
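Setting the marker for the duration of a call and restoring any prior value maps naturally onto a context manager. A minimal sketch, assuming the SDK reads `TF_API_INTEGRATION` from the environment when it builds the request; `integration_marker` is a hypothetical name, not the PR's actual helper.

```python
import os
from contextlib import contextmanager
from typing import Iterator

_MARKER_VAR = "TF_API_INTEGRATION"


@contextmanager
def integration_marker(value: str = "ag2") -> Iterator[None]:
    """Temporarily set TF_API_INTEGRATION, restoring the previous state afterwards."""
    previous = os.environ.get(_MARKER_VAR)  # None means "was not set"
    os.environ[_MARKER_VAR] = value
    try:
        yield
    finally:
        # Restore exactly what was there before, including "unset".
        if previous is None:
            os.environ.pop(_MARKER_VAR, None)
        else:
            os.environ[_MARKER_VAR] = previous
```

Because `os.environ` is process-wide, overlapping async calls can interleave: one call's `finally` may restore the old value while another call is still in flight.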

Validation:

git diff --check
uv run ruff check autogen/beta/tools/search/tinyfish.py autogen/tools/experimental/tinyfish/tinyfish_tool.py test/beta/tools/search/test_tinyfish.py test/tools/experimental/tinyfish/test_tinyfish.py
uv run --extra tinyfish pytest test/beta/tools/search/test_tinyfish.py test/tools/experimental/tinyfish/test_tinyfish.py -q

Result: 45 passed.

@marklysze
Collaborator

marklysze commented May 18, 2026

@pranavjana thanks again for creating this.

We've recently compiled a Contribution Policy for beta, and we'd like this to go in as an extension as we move towards V1.0.

Could you please move the beta code, tests, and documentation into the extension locations below, so that we can relocate them accordingly when we get to V1.0?

E.g. Code to:
autogen/beta/extensions/tools/search

E.g. Documentation to:
website/docs/beta/extensions/tools/search

E.g. Tests to:
test/beta/extensions/tools/search

@pranavjana
Author

Addressed the beta extension policy request.

What changed:

  • Moved the beta TinyFishSearchToolkit implementation to autogen/beta/extensions/tools/search.
  • Moved the beta TinyFish tests to test/beta/extensions/tools/search.
  • Moved the beta docs section out of Common Tools into website/docs/beta/extensions/tools/search/tinyfish.mdx and added it to the docs navigation template.
  • Updated beta examples to import from autogen.beta.extensions.tools.search.
  • Removed the beta TinyFish re-export from autogen.beta.tools / autogen.beta.tools.search so it is treated as an extension.
  • Left the legacy autogen.tools.experimental.tinyfish implementation unchanged.

Validation:

git diff --check
uv run ruff format autogen/beta/extensions/tools/search/tinyfish.py autogen/beta/extensions/tools/__init__.py autogen/beta/extensions/tools/search/__init__.py autogen/beta/tools/__init__.py autogen/beta/tools/search/__init__.py test/beta/extensions/tools/conftest.py test/beta/extensions/tools/search/test_tinyfish.py test/beta/extensions/tools/__init__.py test/beta/extensions/tools/search/__init__.py
uv run ruff check autogen/beta/extensions/tools/search/tinyfish.py autogen/beta/extensions/tools/__init__.py autogen/beta/extensions/tools/search/__init__.py autogen/beta/tools/__init__.py autogen/beta/tools/search/__init__.py test/beta/extensions/tools/conftest.py test/beta/extensions/tools/search/test_tinyfish.py test/beta/extensions/tools/__init__.py test/beta/extensions/tools/search/__init__.py
uv run --extra tinyfish --extra tavily --extra exa --extra ddgs --extra perplexity pytest test/beta/tools/search test/beta/extensions/tools/search test/tools/experimental/tinyfish/test_tinyfish.py -q

Result: 112 passed.


Labels

beta dependencies Pull requests that update a dependency file documentation Improvements or additions to documentation

4 participants