
docs(hgraph): document brute_force_threshold and add example #2124

Open
wxyucs wants to merge 1 commit into main from docs/hgraph-brute-force-threshold

Conversation

Collaborator

@wxyucs wxyucs commented May 29, 2026

Change Type

  • Bug fix
  • New feature
  • Improvement/Refactor
  • Documentation
  • CI/Build/Infra

Linked Issue

Closes: #2123

(Tracks the docs/example gap left by the now-merged feature issue #1631,
which only covered the code change for brute_force_threshold and did not
include user-facing documentation or an example.)

What Changed

The HGraph search-time parameter brute_force_threshold already ships in
code (src/algorithm/hgraph/hgraph_parameter.{h,cpp},
src/algorithm/hgraph/hgraph_search.cpp, tests at
tests/test_hgraph.cpp:2480-2619) but is not mentioned in any user-facing
doc and has no example. This PR closes that discoverability gap without
touching any code path.

  • docs/docs/{en,zh}/src/indexes/hgraph.md: new row in the Search
    parameters table plus a dedicated subsection explaining trigger
    condition, storage-preference order, reorder-skip behavior, the
    intentional iterator-search opt-out, and how to pick a value.
  • docs/hgraph.md: new brute_force_threshold reference entry and an
    extra JSON snippet.
  • docs/docs/{en,zh}/src/resources/index_parameters.md: mention the key
    in the HGraph search-params snippet.
  • docs/docs/{en,zh}/src/advanced/filtered_search.md: cross-link from the
    Performance Notes section so users hitting "very selective filter"
    guidance see the new option alongside the existing "raise ef_search"
    advice.
  • New example examples/cpp/322_feature_hgraph_brute_force_threshold.cpp
    with three runs (baseline graph search, threshold-below-ratio →
    fallback NOT triggered, threshold-above-ratio → fallback triggered) plus
    a hand-rolled exact reference. Registered in
    examples/cpp/CMakeLists.txt; listed in examples/cpp/README.md.

Test Evidence

  • make fmt
  • make lint
  • make test
  • make cov, run tests, and collect coverage
  • Other (describe below)

Test details:

Docs-only + new example. The existing functional tests at
tests/test_hgraph.cpp:2480-2619 already cover the parameter behaviour:
- "(PR) HGraph brute_force_threshold" — verifies the fallback's exact-scan
  results match an independent hand-rolled exhaustive scan.
- "(PR) HGraph brute_force_threshold default is no-op" — verifies that
  threshold=0.0 leaves graph-search results bit-identical.
The new example exercises the same API surface end-to-end.
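
For readers unfamiliar with what a "hand-rolled exhaustive scan" reference looks like, here is a minimal sketch. It is not the code from the test file or the example; the names `L2Sqr` and `ExactFilteredKnn` are hypothetical, the metric is assumed to be squared L2, and the filter is modelled as a plain callable rather than a `vsag::Filter`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Squared L2 distance between two vectors of length dim.
float L2Sqr(const float* a, const float* b, std::size_t dim) {
    float sum = 0.0F;
    for (std::size_t i = 0; i < dim; ++i) {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Hypothetical reference scan (not VSAG API): score every id accepted by
// `is_valid` and keep the k closest. `base` holds n vectors of length dim,
// stored row by row. Returns (distance, id) pairs in ascending distance order.
std::vector<std::pair<float, std::int64_t>> ExactFilteredKnn(
    const std::vector<float>& base, std::size_t dim, const float* query, std::size_t k,
    const std::function<bool(std::int64_t)>& is_valid) {
    std::vector<std::pair<float, std::int64_t>> hits;
    const std::size_t n = base.size() / dim;
    for (std::size_t i = 0; i < n; ++i) {
        const auto id = static_cast<std::int64_t>(i);
        if (!is_valid(id)) {
            continue;  // filtered out: never scored
        }
        hits.emplace_back(L2Sqr(base.data() + i * dim, query, dim), id);
    }
    const std::size_t keep = std::min(k, hits.size());
    std::partial_sort(hits.begin(), hits.begin() + static_cast<std::ptrdiff_t>(keep),
                      hits.end());
    hits.resize(keep);
    return hits;
}
```

Comparing the ids returned by this kind of scan against the ids returned by the index is enough to confirm that the fallback branch produced exact results.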

Local environment is macOS, where CI tooling (clang-format-15 /
clang-tidy-15) is not available; fmt/lint/test will be exercised by the
project CI checks on this PR.

Compatibility Impact

  • API/ABI compatibility: none — docs and a new standalone example only.
  • Behavior changes: none — no code modified; the parameter, its default
    (0.0, disabled), and its dispatch logic are unchanged.

Performance and Concurrency Impact

  • Performance impact: none.
  • Concurrency/thread-safety impact: none.

Documentation Impact

  • No docs update needed
  • Updated docs:
    • README.md
    • DEVELOPMENT.md
    • CONTRIBUTING.md
    • Other: docs/docs/{en,zh}/src/indexes/hgraph.md,
      docs/docs/{en,zh}/src/resources/index_parameters.md,
      docs/docs/{en,zh}/src/advanced/filtered_search.md,
      docs/hgraph.md, examples/cpp/README.md,
      examples/cpp/322_feature_hgraph_brute_force_threshold.cpp,
      examples/cpp/CMakeLists.txt

Risk and Rollback

  • Risk level: low (docs + new opt-in example; no existing binary or build
    target changed besides adding a new add_executable).
  • Rollback plan: revert the single commit.

Checklist

  • I have linked the relevant issue (Closes: #2123)
  • I have added/updated tests for new behavior or bug fixes
    (N/A — pre-existing functional tests cover the parameter; the new
    example serves as additional manual verification)
  • I have considered API compatibility impact
  • I have updated docs if behavior/workflow changed
  • My commit messages follow project conventions (Conventional Commits)

The HGraph search-time parameter `brute_force_threshold` (added in #1631 to
let HGraph skip the graph walk and run an exact scan when the active
filter's `ValidRatio()` is small) ships in code but was never documented
or demonstrated, leaving the feature undiscoverable to users.

This commit closes that gap without changing any code paths:

- Add a parameter row + dedicated subsection to the EN and ZH HGraph index
  pages (`docs/docs/{en,zh}/src/indexes/hgraph.md`) covering trigger
  condition, storage-preference order, reorder-skip behavior, the
  iterator-search opt-out, and value-picking guidance.
- Add a `brute_force_threshold` entry plus an extra JSON snippet to the
  top-level `docs/hgraph.md` reference.
- Mention the new key in the HGraph search-params snippet of
  `docs/docs/{en,zh}/src/resources/index_parameters.md`.
- Cross-link from the EN/ZH `advanced/filtered_search.md` Performance Notes
  section, since that is where users hitting "very selective filter"
  guidance currently land.
- Add `examples/cpp/322_feature_hgraph_brute_force_threshold.cpp`, register
  it in `examples/cpp/CMakeLists.txt`, and list it in
  `examples/cpp/README.md`. The example builds a 10k-vector HGraph,
  defines a `Filter` with `ValidRatio() = 0.02`, and shows three runs:
  baseline graph search, threshold-below-ratio (fallback NOT triggered),
  threshold-above-ratio (fallback triggered). A hand-rolled exhaustive
  reference is printed so users can verify the exact-scan branch matches.
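
The three runs can be summarised by a small decision rule. The sketch below is an illustration only: `WouldUseBruteForce` is a hypothetical helper, not a VSAG function, and it assumes that a threshold of `0.0` disables the fallback and that the fallback fires when the valid ratio is strictly below the threshold. How the real dispatch treats exact equality is not stated in this PR.

```cpp
// Hypothetical helper, not part of the VSAG API. Models the dispatch
// described above under two assumptions: threshold 0.0 means "disabled",
// and the exact scan is chosen when valid_ratio is below the threshold.
// The behaviour at valid_ratio == threshold is an assumption here.
bool WouldUseBruteForce(float valid_ratio, float threshold) {
    if (threshold <= 0.0F) {
        return false;  // default: always use graph search
    }
    return valid_ratio < threshold;
}
```

With the example's `ValidRatio() = 0.02`, an illustrative threshold of `0.01` would keep the graph search, while an illustrative threshold of `0.05` would select the exact scan.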

No source code changes; the parameter, its default (`0.0`, disabled),
dispatch logic, and tests already exist
(`src/algorithm/hgraph/hgraph_parameter.{h,cpp}`,
`src/algorithm/hgraph/hgraph_search.cpp`, `tests/test_hgraph.cpp`).

Closes: #2123

Signed-off-by: Xiangyu Wang <wxy407827@antgroup.com>
Assisted-by: OpenCode:claude-opus-4.7
Copilot AI review requested due to automatic review settings May 29, 2026 07:00
@wxyucs wxyucs added the kind/documentation (Improvements or additions to documentation) and version/1.0 labels May 29, 2026
Contributor

mergify Bot commented May 29, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Require kind label

Wonderful, this rule succeeded.
  • label~=^kind/

🟢 Require version label

Wonderful, this rule succeeded.
  • label~=^version/

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces documentation and a runnable C++ example for the HGraph search-time parameter brute_force_threshold, which triggers an exact brute-force scan when the filter is highly selective (its ValidRatio() is small). The review comments identify a few issues in the newly added files: missing standard library headers (<algorithm>, <memory>, and <utility>) and an unchecked .value() call on CreateIndex in the C++ example, as well as a broken relative link in docs/hgraph.md.

Comment on lines +46 to +48
#include <iostream>
#include <random>
#include <vector>

medium

The example uses std::sort (which requires <algorithm>), std::shared_ptr/std::make_shared (which requires <memory>), and std::pair (which requires <utility>). These headers are not explicitly included, which can lead to compilation failures on compilers/standard libraries that do not transitively include them. Please include them explicitly.

#include <algorithm>
#include <iostream>
#include <memory>
#include <random>
#include <utility>
#include <vector>

"ef_construction": 200
}
})";
auto index = vsag::Factory::CreateIndex("hgraph", build_params).value();

medium

The return value of vsag::Factory::CreateIndex is a tl::expected (or similar), but .value() is called directly without checking if the creation succeeded. If index creation fails (e.g., due to invalid parameters), this will cause an unhandled exception or crash. It is safer and more consistent with the rest of the example to check has_value() and handle the error gracefully.

Suggested change
- auto index = vsag::Factory::CreateIndex("hgraph", build_params).value();
+ auto index_res = vsag::Factory::CreateIndex("hgraph", build_params);
+ if (not index_res.has_value()) {
+     std::cerr << "Failed to create index: " << index_res.error().message << std::endl;
+     return -1;
+ }
+ auto index = index_res.value();

Comment thread docs/hgraph.md
- **Optional Values**: any float in `[0.0, 1.0]`
- **Default Value**: 0.0 (disabled — preserves legacy behavior)
- **Applies to**: `KnnSearch` (non-iterator overload, also used by `SearchWithRequest`) and `RangeSearch`. The iterator-style `KnnSearch` does not use this parameter.
- **Note**: The decision relies on `Filter::ValidRatio()` returning a meaningful selectivity estimate; see [filtered search](docs/docs/en/src/advanced/filtered_search.md). The brute-force scan visits every indexed id once to call `CheckValid`, so its cost is roughly `O(N × dim)` regardless of selectivity. A runnable example is [`322_feature_hgraph_brute_force_threshold.cpp`](https://github.com/antgroup/vsag/blob/main/examples/cpp/322_feature_hgraph_brute_force_threshold.cpp).

medium

The relative link to filtered_search.md is broken. Since docs/hgraph.md is located in the docs/ directory, the path docs/docs/en/src/advanced/filtered_search.md resolves to docs/docs/docs/en/... which results in a 404 error. It should be corrected to docs/en/src/advanced/filtered_search.md.

Suggested change
- - **Note**: The decision relies on `Filter::ValidRatio()` returning a meaningful selectivity estimate; see [filtered search](docs/docs/en/src/advanced/filtered_search.md). The brute-force scan visits every indexed id once to call `CheckValid`, so its cost is roughly `O(N × dim)` regardless of selectivity. A runnable example is [`322_feature_hgraph_brute_force_threshold.cpp`](https://github.com/antgroup/vsag/blob/main/examples/cpp/322_feature_hgraph_brute_force_threshold.cpp).
+ - **Note**: The decision relies on `Filter::ValidRatio()` returning a meaningful selectivity estimate; see [filtered search](docs/en/src/advanced/filtered_search.md). The brute-force scan visits every indexed id once to call `CheckValid`, so its cost is roughly `O(N × dim)` regardless of selectivity. A runnable example is [`322_feature_hgraph_brute_force_threshold.cpp`](https://github.com/antgroup/vsag/blob/main/examples/cpp/322_feature_hgraph_brute_force_threshold.cpp).
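
The link arithmetic behind this comment can be checked mechanically. The sketch below is only an illustration using `std::filesystem` (C++17); `ResolveRelativeLink` is a hypothetical helper and does not model how GitHub or the docs site actually renders links. It joins a relative link onto the directory of the file that contains it.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: resolve a relative Markdown link against the
// directory of the file that contains it, purely lexically (no disk access).
std::string ResolveRelativeLink(const std::string& source_file, const std::string& link) {
    namespace fs = std::filesystem;
    const fs::path resolved = fs::path(source_file).parent_path() / link;
    return resolved.lexically_normal().generic_string();
}
```

Called with `docs/hgraph.md` as the source file, the original link target resolves to `docs/docs/docs/en/src/advanced/filtered_search.md`, while the suggested target resolves to `docs/docs/en/src/advanced/filtered_search.md`, which matches the file path listed in this PR.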

Contributor

Copilot AI left a comment

Pull request overview

Documentation-only PR that closes the discoverability gap for the already-shipped HGraph search-time parameter brute_force_threshold. It adds an English/Chinese reference, cross-links from the filtered-search guide, updates the top-level HGraph doc, and ships a runnable C++ example that contrasts graph-search vs. fallback-triggered vs. fallback-not-triggered modes against a hand-rolled exact reference.

Changes:

  • New parameter row + dedicated subsection in docs/docs/{en,zh}/src/indexes/hgraph.md and docs/hgraph.md, plus mentions in index_parameters.md and a cross-link from filtered_search.md (en/zh).
  • New example examples/cpp/322_feature_hgraph_brute_force_threshold.cpp registered in CMakeLists.txt and README.md.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
examples/cpp/README.md Lists the new 322 example.
examples/cpp/CMakeLists.txt Registers the new example executable.
examples/cpp/322_feature_hgraph_brute_force_threshold.cpp New runnable example demonstrating the fallback.
docs/hgraph.md Adds brute_force_threshold reference entry and extra JSON snippet.
docs/docs/en/src/indexes/hgraph.md Adds parameter row and dedicated subsection (English).
docs/docs/zh/src/indexes/hgraph.md Adds parameter row and dedicated subsection (Chinese).
docs/docs/en/src/resources/index_parameters.md Mentions the new key in HGraph search-params snippet (English).
docs/docs/zh/src/resources/index_parameters.md Mentions the new key in HGraph search-params snippet (Chinese).
docs/docs/en/src/advanced/filtered_search.md Cross-links to the new option from Performance Notes (English).
docs/docs/zh/src/advanced/filtered_search.md Cross-links to the new option from Performance Notes (Chinese).

Comment on lines +44 to +48
#include <vsag/vsag.h>

#include <iostream>
#include <random>
#include <vector>

Development

Successfully merging this pull request may close these issues.

[docs](hgraph): document brute_force_threshold search parameter and add example

2 participants