Add TinyFish search and fetch tools #2820
Addressed the Codecov patch coverage feedback with focused tests for both TinyFish implementations:
Validation after the update:
git diff --check
uv run ruff check test/tools/experimental/tinyfish/test_tinyfish.py test/beta/tools/search/test_tinyfish.py
uv run --extra tinyfish --extra tavily --extra exa --extra ddgs --extra perplexity pytest test/beta/tools/search test/tools/experimental/tinyfish/test_tinyfish.py -q
Result:
|
Thanks for the TinyFish search + fetch additions. The Merge conflict to resolve PR #2859 also modified TinyFishTool vs TinyFishSearchToolkit — one API or two? The diff removes Minor nit
Otherwise the toolkit structure is clean, the test suite uses |
marklysze
left a comment
Tests are now thorough and the implementation is clean. Approving.
Code quality: good. The factory method pattern (search()/fetch()) cleanly separates per-tool params from client params. try/finally await client.close() is correct async resource management. resolve_variable for context injection follows the established beta pattern.
One remaining action for the contributor: resolve the merge conflict on website/docs/beta/tools/common_toolkits.mdx with PR #2859 before this can be merged. The conflict is straightforward — both PRs add independent sections, so a rebase should resolve it cleanly.
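The try/finally close pattern mentioned in the review above can be sketched as follows. This is a minimal illustration, assuming a hypothetical FakeAsyncClient stand-in; it is not the TinyFish SDK and does not reproduce the PR's code. It only shows why close() still runs when the call raises.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical stand-in for an async SDK client (not the TinyFish SDK)."""

    def __init__(self) -> None:
        self.closed = False

    async def fetch(self, urls: list[str]) -> dict[str, str]:
        # Simulate a failure mode so the cleanup path can be exercised.
        if not urls:
            raise ValueError("no URLs given")
        return {url: f"content of {url}" for url in urls}

    async def close(self) -> None:
        self.closed = True


async def fetch_with_cleanup(client: FakeAsyncClient, urls: list[str]) -> dict[str, str]:
    # try/finally guarantees close() runs whether fetch() returns or raises.
    try:
        return await client.fetch(urls)
    finally:
        await client.close()


def demo() -> tuple[bool, bool]:
    # Success path: the client is closed after a normal return.
    ok_client = FakeAsyncClient()
    asyncio.run(fetch_with_cleanup(ok_client, ["https://example.com"]))

    # Failure path: the client is closed even though fetch() raised.
    failing_client = FakeAsyncClient()
    try:
        asyncio.run(fetch_with_cleanup(failing_client, []))
    except ValueError:
        pass
    return ok_client.closed, failing_client.closed
```

Calling demo() reports whether each client was closed on the success path and on the failure path.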
marklysze
left a comment
Good contribution — the structure follows existing toolkit patterns well (lazy imports, missing_optional_dependency guard, try/finally close, resolve_variable for Context-bound defaults).
One security nit before this lands:
URL scheme validation in tinyfish_fetch
tinyfish_fetch accepts a list[str] of URLs from the LLM and passes them straight to AsyncTinyFish. An adversarial prompt or a confused agent could pass file:///etc/passwd, javascript:..., or data: URLs. Even if the TinyFish SDK rejects them server-side, we should validate at our layer.
Pattern from the just-merged Crawl4AIToolkit fix:
from urllib.parse import urlparse

_SAFE_SCHEMES = {"http", "https"}

def _safe_url(url: str) -> bool:
    return urlparse(url).scheme.lower() in _SAFE_SCHEMES

Then in tinyfish_fetch, before calling the client:

invalid = [u for u in urls if not _safe_url(u)]
if invalid:
    return ToolResult({"error": f"Only http/https URLs are supported; rejected: {invalid}"})

Otherwise LGTM: structure is clean, docs table is helpful, and the try/finally resource handling is correct.
marklysze
left a comment
Good contribution — the structure follows existing toolkit patterns well (lazy imports, missing_optional_dependency guard, try/finally close, resolve_variable for Context-bound defaults).
One security nit before this lands:
URL scheme validation in tinyfish_fetch
tinyfish_fetch accepts a list[str] of URLs from the LLM and passes them straight to AsyncTinyFish. An adversarial prompt or a confused agent could pass file:///etc/passwd, javascript:..., or data: URLs. Even if the TinyFish SDK rejects them server-side, we should validate at our layer (same fix was applied to Crawl4AIToolkit in #2860).
Pattern used in #2860:
from urllib.parse import urlparse

_SAFE_SCHEMES = {"http", "https"}

def _safe_url(url: str) -> bool:
    return urlparse(url).scheme.lower() in _SAFE_SCHEMES

Then in tinyfish_fetch, filter before calling the client:

invalid = [u for u in urls if not _safe_url(u)]
if invalid:
    return ToolResult({"error": f"Only http/https URLs are supported; rejected: {invalid}"})

Otherwise LGTM: structure is clean, docs table is helpful, try/finally resource handling is correct, and the split between beta TinyFishSearchToolkit and experimental TinyFishSearchTool/FetchTool matches the existing AG2 dual-path pattern.
|
Codecov Report
✅ All modified and coverable lines are covered by tests.
... and 389 files with indirect coverage changes
|
|
Addressed the maintainer follow-ups:
Validation:
git diff --check
uv run ruff format --check autogen/beta/tools/search/tinyfish.py autogen/tools/experimental/tinyfish/tinyfish_tool.py test/beta/tools/search/test_tinyfish.py test/tools/experimental/tinyfish/test_tinyfish.py
uv run ruff check autogen/beta/tools/search/tinyfish.py autogen/tools/experimental/tinyfish/tinyfish_tool.py test/beta/tools/search/test_tinyfish.py test/tools/experimental/tinyfish/test_tinyfish.py
uv run --extra tinyfish --extra tavily --extra exa --extra ddgs --extra perplexity pytest test/beta/tools/search test/tools/experimental/tinyfish/test_tinyfish.py -q
Result:
|
Added the TinyFish integration marker follow-up. What changed:
Validation:
git diff --check
uv run ruff check autogen/beta/tools/search/tinyfish.py autogen/tools/experimental/tinyfish/tinyfish_tool.py test/beta/tools/search/test_tinyfish.py test/tools/experimental/tinyfish/test_tinyfish.py
uv run --extra tinyfish pytest test/beta/tools/search/test_tinyfish.py test/tools/experimental/tinyfish/test_tinyfish.py -q
Result:
|
@pranavjana thanks again for creating this. We've recently compiled a Contribution Policy for beta, and we'd like to include this as an extension as we move towards V1.0. Could you please move the beta code, tests, and documentation so that we can relocate them accordingly when we get to V1.0?
E.g. Code to:
E.g. Documentation to:
E.g. Tests to:
|
Addressed the beta extension policy request. What changed:
Validation:
git diff --check
uv run ruff format autogen/beta/extensions/tools/search/tinyfish.py autogen/beta/extensions/tools/__init__.py autogen/beta/extensions/tools/search/__init__.py autogen/beta/tools/__init__.py autogen/beta/tools/search/__init__.py test/beta/extensions/tools/conftest.py test/beta/extensions/tools/search/test_tinyfish.py test/beta/extensions/tools/__init__.py test/beta/extensions/tools/search/__init__.py
uv run ruff check autogen/beta/extensions/tools/search/tinyfish.py autogen/beta/extensions/tools/__init__.py autogen/beta/extensions/tools/search/__init__.py autogen/beta/tools/__init__.py autogen/beta/tools/search/__init__.py test/beta/extensions/tools/conftest.py test/beta/extensions/tools/search/test_tinyfish.py test/beta/extensions/tools/__init__.py test/beta/extensions/tools/search/__init__.py
uv run --extra tinyfish --extra tavily --extra exa --extra ddgs --extra perplexity pytest test/beta/tools/search test/beta/extensions/tools/search test/tools/experimental/tinyfish/test_tinyfish.py -q
Result:
Summary
This PR expands the TinyFish integration so AG2 agents can use all three TinyFish web APIs that are useful in agent workflows:
- TinyFishTool for goal-directed Agent API automation on a target URL
- TinyFishSearchTool for ranked web search results in the existing autogen.tools.experimental API
- TinyFishFetchTool for browser-rendered content extraction from known URLs in the existing autogen.tools.experimental API
- TinyFishSearchToolkit for beta agents, exposing both tinyfish_search and tinyfish_fetch
The main use case is a research workflow where an agent first discovers candidate pages with Search, then reads selected pages with Fetch, and can still use the existing goal-directed TinyFish Agent API when it needs TinyFish to operate a website from a natural-language goal.
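The search-then-fetch research workflow described above can be sketched roughly as follows. Everything in this snippet is a hypothetical stand-in: fake_search, fake_fetch, SearchHit, and the in-memory index are invented for illustration and are not the TinyFish SDK or the toolkit's actual signatures. It only shows the discover, select, read sequence.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SearchHit:
    """Hypothetical search result record (not the PR's dataclass)."""

    url: str
    title: str


# Hypothetical in-memory "web": url -> (title, page text).
_FAKE_INDEX = {
    "https://example.com/a": ("Intro to widgets", "Widgets are small parts."),
    "https://example.com/b": ("Widget pricing", "Widgets cost one coin."),
    "https://example.com/c": ("Gadget news", "Gadgets are unrelated."),
}


async def fake_search(query: str) -> list[SearchHit]:
    # Stand-in for a search tool: match the query against page titles.
    q = query.lower()
    return [SearchHit(url, title) for url, (title, _) in _FAKE_INDEX.items() if q in title.lower()]


async def fake_fetch(urls: list[str]) -> dict[str, str]:
    # Stand-in for a fetch tool: return page text for known URLs.
    return {url: _FAKE_INDEX[url][1] for url in urls}


async def research(query: str, max_pages: int = 2) -> dict[str, str]:
    hits = await fake_search(query)               # step 1: discover candidate pages
    selected = [h.url for h in hits[:max_pages]]  # step 2: select which pages to read
    return await fake_fetch(selected)             # step 3: read the selected pages


pages = asyncio.run(research("widget"))
```

In a real agent the selection step would typically be made by the model rather than by a fixed slice; the slice here just keeps the sketch deterministic.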
Changes
- autogen.beta.tools.search.TinyFishSearchToolkit with structured search and fetch result dataclasses.
- autogen.beta.tools.search and autogen.beta.tools with the existing optional-dependency fallback pattern.
- TinyFishSearchTool and TinyFishFetchTool to autogen.tools.experimental.tinyfish, alongside the existing TinyFishTool.
- autogen.tools.experimental and autogen.tools.experimental.tinyfish.
- tinyfish to the docs/search optional extras groups.
Validation
git diff --check
uv run ruff format --check autogen/tools/experimental/tinyfish/tinyfish_tool.py autogen/beta/tools/search/tinyfish.py test/tools/experimental/tinyfish/test_tinyfish.py test/beta/tools/search/test_tinyfish.py
uv run ruff check autogen/tools/experimental/tinyfish/tinyfish_tool.py autogen/beta/tools/search/tinyfish.py test/tools/experimental/tinyfish/test_tinyfish.py test/beta/tools/search/test_tinyfish.py
uv run --extra tinyfish --extra tavily --extra exa --extra ddgs --extra perplexity pytest test/beta/tools/search test/tools/experimental/tinyfish/test_tinyfish.py -q
The pytest run passed with 99 passed.
Documentation
Docs were updated in:
- website/docs/user-guide/reference-tools/tinyfish.mdx
- website/docs/user-guide/reference-tools/index.mdx
- website/docs/beta/tools/common_toolkits.mdx
I attempted the local docs build with the docs extra, but this environment is missing Quarto, so notebook metadata generation fails before the docs site can complete building.