feat: add bocha web search component#13322

Open
NiTingKY wants to merge 3 commits into
langflow-ai:mainfrom
NiTingKY:add-bocha-web-search

Conversation

@NiTingKY NiTingKY commented May 25, 2026

Like the other search engine integrations already in the project, this PR adds the Bocha Web Search API.

Summary

Added Bocha Web Search integration as a new Langflow component.

Changes

  • Added Bocha component package
  • Added Bocha Web Search component
  • Added documentation page for the Bocha bundle

Testing

  • Tested locally with Langflow
  • Verified the component appears in the UI

Summary by CodeRabbit

  • New Features

    • Added Bocha Web Search component enabling web searches with configurable API key, query parameters, result summaries, freshness filtering, and result count limits.
  • Documentation

    • Added comprehensive documentation for the Bocha Web Search component with parameter descriptions and usage examples.

Contributor

coderabbitai Bot commented May 25, 2026

Walkthrough

This PR adds a new Bocha web search component to the Langflow component system. It includes the component implementation with API integration, module registration for proper imports, a catalog entry with embedded schema, and user documentation with sidebar navigation integration.

Changes

Bocha Web Search Component Integration

| Layer | File(s) | Summary |
|------|------|-------------|
| BochaSearchComponent implementation | `src/lfx/src/lfx/components/bocha/bocha_web_search.py` | BochaSearchComponent posts queries to Bocha's /v1/web-search API, parses returned webPages into Data records (title, url, snippet, summary, site, date), caps results at 50, and handles errors (timeout, HTTP, request) by returning a single Data entry with error message. The fetch_content_dataframe method wraps results as a DataFrame. |
| Module registration and exports | `src/lfx/src/lfx/components/__init__.py`, `src/lfx/src/lfx/components/bocha/__init__.py` | bocha is registered in TYPE_CHECKING, added to _dynamic_imports as a module-level entry, included in __all__ for discovery, and re-exported from the bocha package's __init__.py as BochaSearchComponent. |
| Component asset catalog | `src/lfx/src/lfx/_assets/component_index.json` | A new bochaBochaWebSearch entry is added with input schema (api_key, query, count, freshness, summary), output dataframe config, and an embedded implementation. Catalog metadata counters and checksum are updated. |
| User documentation and navigation | `docs/docs/Components/bundles-bocha.mdx`, `docs/sidebars.js` | A new MDX page documents the Bocha bundle and its web search component, listing inputs (api_key, query, count, freshness, summary) and output columns (title, url, snippet, summary, site_name, site_icon, date_published), with links to related web search bundles. The page is added to the sidebar under Components → Core components → Bundles. |
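
The parsing step summarized in the table above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `extract_web_pages` is a hypothetical helper, the nested `data.webPages.value` path is taken from the extraction expression quoted later in the review, and the per-item keys (`name`, `siteName`, `siteIcon`, `datePublished`) are assumptions about the Bocha response shape. The output keys follow the column names listed in the documentation row.

```python
from typing import Any


def extract_web_pages(result: dict[str, Any], limit: int = 50) -> list[dict[str, Any]]:
    """Flatten a Bocha-style response into row dicts (hypothetical helper)."""
    # Null-tolerant variant of result.get("data", {}).get("webPages", {}).get("value", []):
    # `or {}` also covers keys that are present but set to null.
    data = result.get("data") or {}
    web_pages = data.get("webPages") or {}
    pages = web_pages.get("value") or []

    rows = []
    for page in pages[:limit]:  # cap the number of rows, mirroring the 50-result cap
        rows.append(
            {
                "title": page.get("name", ""),  # assumed response key
                "url": page.get("url", ""),
                "snippet": page.get("snippet", ""),
                "summary": page.get("summary", ""),
                "site_name": page.get("siteName", ""),  # assumed response key
                "site_icon": page.get("siteIcon", ""),  # assumed response key
                "date_published": page.get("datePublished", ""),  # assumed response key
            }
        )
    return rows
```

Keeping the parsing in a pure function like this would also make it straightforward to unit-test without any network access, which is what the failed coverage check asks for.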

🎯 3 (Moderate) | ⏱️ ~20 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings)

| Check name | Status | Explanation | Resolution |
|------|------|-------------|-------------|
| Test Coverage For New Implementations | ❌ Error | PR adds BochaSearchComponent (112 lines) without corresponding test files. Repository has established test directories for components with no Bocha tests. | Add unit tests for API calls, error handling, and response parsing; add integration tests for end-to-end functionality with Bocha API. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Quality And Coverage | ⚠️ Warning | No tests found for the new BochaSearchComponent; component makes HTTP API calls and requires test coverage per project standards. | Add unit tests covering component metadata, mock httpx calls for success/error scenarios, JSON parsing, Data object creation, and DataFrame output. |
| Test File Naming And Structure | ⚠️ Warning | No test files exist for the Bocha component. Expected test directory src/lfx/tests/unit/components/bocha/ with test_*.py files is missing from PR. | Add test file src/lfx/tests/unit/components/bocha/test_bocha_web_search.py with pytest tests covering success cases, error handling (timeout, HTTP errors, JSON parse errors), and edge cases. |

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|------|------|-------------|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add bocha web search component' directly and clearly describes the main change: adding a new Bocha Web Search component to the codebase. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Excessive Mock Usage Warning | ✅ Passed | No test files are present in this PR; therefore, there is no mock usage to review. The check is not applicable to PRs without test files. |


@github-actions github-actions Bot added the enhancement New feature or request label May 25, 2026
@NiTingKY
Author

The remaining failing check is Update Component Index, but it fails before running the index update with:

A branch or tag with the name 'add-bocha-web-search' could not be found

This branch exists in my fork (NiTingKY/langflow:add-bocha-web-search), not in langflow-ai/langflow. The PR already includes src/lfx/src/lfx/_assets/component_index.json and src/lfx/src/lfx/components/__init__.py.

@NiTingKY NiTingKY marked this pull request as ready for review May 25, 2026 09:45
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/docs/Components/bundles-bocha.mdx`:
- Line 29: Update the docs table row for the count parameter (the "count" table
cell) to mention the hard cap of 50 in addition to the default of 10—e.g.,
change the description to "Input parameter. The maximum number of search results
to return. Default: `10`. Maximum: `50`." so users know the results are capped
at 50.

In `@src/lfx/src/lfx/components/bocha/bocha_web_search.py`:
- Line 66: The payload currently uses "count": min(int(self.count), 50) which
only caps the upper bound; change it to clamp to a valid lower bound as well
(e.g., "count": max(1, min(int(self.count), 50))) so zero or negative values
cannot be sent; update the construction in bocha_web_search.py (the code that
sets the "count" field) to use this two-sided clamp and ensure int(self.count)
is still applied before clamping.
- Around line 74-75: The code calls response.json() and then accesses webPages
but doesn't handle JSON parsing errors; wrap the response.json() call in a
try/except that catches json.decoder.JSONDecodeError, ValueError (and optionally
TypeError) around the call in the function in bocha_web_search.py, log or build
a clear message and return the structured Data(text=msg, data={"error": msg})
instead of letting the exception escape; keep the rest of the httpx exception
handling intact and continue to use the same web_pages extraction
(result.get("data", {}).get("webPages", {}).get("value", [])) when parsing
succeeds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e4f80bee-7da3-43e4-bcdb-1b18132cbeb8

📥 Commits

Reviewing files that changed from the base of the PR and between b0b5849 and 2379d1a.

📒 Files selected for processing (6)
  • docs/docs/Components/bundles-bocha.mdx
  • docs/sidebars.js
  • src/lfx/src/lfx/_assets/component_index.json
  • src/lfx/src/lfx/components/__init__.py
  • src/lfx/src/lfx/components/bocha/__init__.py
  • src/lfx/src/lfx/components/bocha/bocha_web_search.py

|------|------|-------------|
| api_key | SecretString | Input parameter. The API key for authenticating with Bocha. |
| query | String | Input parameter. The search query to send to Bocha. |
| count | Integer | Input parameter. The maximum number of search results to return. Default: `10`. |

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document the maximum count limit.

The parameter table shows the default count is 10, but according to the review context, the result count is capped at 50. Users should be informed of this maximum limit to set appropriate expectations.

📝 Proposed fix to document the maximum limit
-| count | Integer | Input parameter. The maximum number of search results to return. Default: `10`. |
+| count | Integer | Input parameter. The maximum number of search results to return. Default: `10`. Maximum: `50`. |

"query": self.query,
"summary": self.summary,
"freshness": self.freshness or "noLimit",
"count": min(int(self.count), 50),

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clamp count to a valid lower bound before issuing the request.

Line 66 only enforces an upper cap. Zero/negative values can still be sent and cause avoidable API failures.

Suggested fix
-        payload = {
+        count = max(1, min(int(self.count), 50))
+        payload = {
             "query": self.query,
             "summary": self.summary,
             "freshness": self.freshness or "noLimit",
-            "count": min(int(self.count), 50),
+            "count": count,
         }
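
The two-sided clamp suggested above can be isolated in a small helper. This is a sketch only: `clamp_count` is a hypothetical name, the bounds 1 and 50 come from the review comment, and falling back to the documented default of 10 on unparseable input is an extra assumption that goes beyond what the reviewer proposed.

```python
def clamp_count(raw: object, low: int = 1, high: int = 50, default: int = 10) -> int:
    """Coerce a user-supplied count to an int within [low, high] (hypothetical helper)."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Assumption: fall back to the documented default rather than raising.
        return default
    # min() enforces the upper cap, max() enforces the lower bound.
    return max(low, min(value, high))
```

With this helper the payload line would read `"count": clamp_count(self.count)`, so zero, negative, and oversized values are all normalized before the request is sent.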

Comment on lines +74 to +75
result = response.json()
web_pages = result.get("data", {}).get("webPages", {}).get("value", [])

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does `httpx.Response.json()` raise `ValueError`/`JSONDecodeError` when the response body is not valid JSON?

💡 Result:

Yes, the httpx.Response.json method raises a json.decoder.JSONDecodeError if the response body is not valid JSON [1][2]. Because httpx uses the standard library's json module to parse the response body, any failure to decode the content as JSON will propagate the exception raised by json.loads [1][2]. This includes scenarios where the response body is empty, malformed, or contains content that does not conform to JSON syntax [1][3][2]. Additionally, depending on the response content and encoding, it is also possible for the method to raise a UnicodeDecodeError if the response body contains bytes that cannot be decoded into a string using the detected or default character encoding [1][4].


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/lfx/src/lfx/components/bocha/bocha_web_search.py"

# Show the relevant sections around the reported line ranges
nl -ba "$FILE" | sed -n '50,130p'

Repository: langflow-ai/langflow

Length of output: 107


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/lfx/src/lfx/components/bocha/bocha_web_search.py"

# Print with line numbers (cat -n exists) and focus around the referenced areas
cat -n "$FILE" | sed -n '60,120p'

Repository: langflow-ai/langflow

Length of output: 2389


Handle invalid JSON responses so bocha_web_search fails gracefully
response.json() exceptions (e.g., json.decoder.JSONDecodeError / ValueError) aren’t caught by the current httpx exception handlers, so malformed/empty JSON will escape the component instead of returning the structured Data(text=msg, data={"error": msg}).

Suggested fix
         except httpx.TimeoutException:
             msg = "Bocha request timed out."
+        except ValueError as exc:
+            msg = f"Bocha response parse error: {exc}"
         except httpx.HTTPStatusError as exc:
             msg = f"Bocha HTTP error: {exc.response.status_code} - {exc.response.text}"
         except httpx.RequestError as exc:
             msg = f"Bocha request failed: {exc}"
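
Why catching `ValueError` is enough here can be shown with the standard library alone. In the sketch below, `json.loads` stands in for `response.json()` (which, per the analysis above, relies on the standard `json` module), and `parse_body` is a hypothetical helper rather than code from the PR. The extra check for a non-object payload is an added assumption: it guards the later `.get(...)` chain against a body such as `[]`.

```python
import json
from typing import Any, Optional


def parse_body(body: str) -> tuple[dict[str, Any], Optional[str]]:
    """Return (parsed, error_message); exactly one of them is meaningful (hypothetical helper)."""
    try:
        parsed = json.loads(body)
    except ValueError as exc:
        # json.JSONDecodeError is a subclass of ValueError, so this branch
        # covers empty and malformed bodies alike.
        return {}, f"Bocha response parse error: {exc}"
    if not isinstance(parsed, dict):
        # Assumption: a valid Bocha payload is a JSON object at the top level.
        return {}, "Bocha response parse error: expected a JSON object."
    return parsed, None
```

On failure the returned message could be wrapped in the same `Data(text=msg, data={"error": msg})` structure the component already uses for HTTP errors.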
