Adapt [vector_async_index] add collection.flush_index()#209
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds BaseClient feature detection for a refresh capability, hides the Collection refresh attribute when unsupported, implements Collection.refresh_index() to call the DB index manager refresh procedure, and updates integration tests to call refresh before queries. ChangesIndex Refresh Operation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/pyseekdb/client/collection.py`:
- Around line 596-604: Collection.flush currently calls a non-existent
procedure; update the implementation so it invokes OceanBase's
DBMS_VECTOR.REFRESH_INDEX with the required parameters instead of "CALL
dbms_index_manager.refresh();": build the CALL statement using the collection's
index name and table name (and optional vector column if available) and execute
it via self._client._execute (e.g., "CALL DBMS_VECTOR.REFRESH_INDEX(idx_name,
table_name, idx_vector_col, refresh_threshold, refresh_type);"), or if the
intent was to defer parameters, raise/validate and document required args in
Collection.flush and accept them as parameters to the method before calling
self._client._execute.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 03433250-9858-4b86-874e-547d26a2cc43
📒 Files selected for processing (1)
src/pyseekdb/client/collection.py
| def flush(self) -> None: | ||
| """ | ||
| Flush async vector index build tasks. | ||
|
|
||
| This executes ``CALL dbms_index_manager.refresh();`` and returns only | ||
| after the database completes the refresh procedure. | ||
| """ | ||
| self._client._execute("CALL dbms_index_manager.refresh();") | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
OceanBase dbms_index_manager.refresh procedure signature and scope: does it refresh all async vector indexes globally, or can it be scoped to a specific table/index? Please provide official documentation links and examples.
💡 Result:
No evidence found for a dbms_index_manager.refresh procedure in OceanBase documentation across official OceanBase Database (en.oceanbase.com) and seekdb (oceanbase.ai) sites [1][2][3][4][5][6]. Searches for the exact package and procedure returned no matching results [1-30]. OceanBase (seekdb) uses DBMS_VECTOR.REFRESH_INDEX for incremental refresh of async vector indexes. It is scoped to a specific index, table, and vector column, not global [1]. Procedure signature [1]: PROCEDURE refresh_index( IN idx_name VARCHAR(65535), IN table_name VARCHAR(65535), IN idx_vector_col VARCHAR(65535) DEFAULT NULL, IN refresh_threshold INT DEFAULT 10000, IN refresh_type VARCHAR(65535) DEFAULT NULL); Example [1]: CALL DBMS_VECTOR.REFRESH_INDEX('idx1', 'vector_index_test', 'c2', 1, 'FAST'); Official docs: https://www.oceanbase.ai/docs/zh-CN/refresh-index [1], https://www.oceanbase.ai/docs/vector-index-monitor-and-maintain [2]. dbms_index_manager may be a misremembered or non-existent package; standard index management uses DBMS_STATS or ALTER INDEX [7]. Vector indexes are async by design with automatic background refresh, manual via DBMS_VECTOR [2].
Citations:
- 1: https://www.oceanbase.ai/docs/zh-CN/refresh-index
- 2: https://www.oceanbase.ai/docs/vector-index-monitor-and-maintain
- 3: https://en.oceanbase.com/docs
- 4: https://en.oceanbase.com/docs/oceanbase-database
- 5: https://en.oceanbase.com/docs/common-oceanbase-database-10000000001229686
- 6: https://en.oceanbase.com/docs/common-oceanbase-database-10000000001230686
- 7: https://en.oceanbase.com/docs/common-oceanbase-database-10000000001232087
🏁 Script executed:
# Search for any usage of dbms_index_manager in the codebase
rg "dbms_index_manager" --type py --type sql
# Search for any DBMS_VECTOR usages that might indicate the correct pattern
rg "DBMS_VECTOR" --type py --type sql
# Check for test files related to flush or index refresh
fd "test.*collection" --type f | head -20
# Check the client._execute calls to understand the pattern
rg "_execute\(" src/pyseekdb/client/collection.py -A 2 -B 2Repository: oceanbase/pyseekdb
Length of output: 1030
🏁 Script executed:
# Search for any tests of flush method
rg "\.flush\(" tests/ --type py -B 3 -A 3
# Check git log for recent changes to this file
cd src/pyseekdb/client && git log --oneline collection.py | head -20
# Check if there are any comments or docstrings that mention refresh or flush
rg "refresh|flush" src/pyseekdb/client/collection.py --context 5Repository: oceanbase/pyseekdb
Length of output: 185
🏁 Script executed:
# Search for flush tests anywhere in the repo
rg "flush" tests/ --type py -B 2 -A 2
# Get commit details
git show c561754 --stat
# Also check if there's any OceanBase-specific documentation in the repo
find . -type f -name "*.md" -o -name "*.rst" | xargs rg "refresh|flush|dbms_index_manager" 2>/dev/null | head -20Repository: oceanbase/pyseekdb
Length of output: 8781
🏁 Script executed:
# Search more broadly in tests for flush
rg "flush" . --type py
# Check the collection.py file around the flush method for context
sed -n '590,620p' src/pyseekdb/client/collection.py
# Also check if there's any comment or documentation about refresh behavior
rg "refresh" . --type py --type md -B 2 -A 2 | head -50Repository: oceanbase/pyseekdb
Length of output: 1683
Collection.flush() calls a non-existent database procedure and will fail at runtime
The procedure dbms_index_manager.refresh() does not exist in OceanBase. The correct procedure for refreshing async vector indexes is DBMS_VECTOR.REFRESH_INDEX(), which requires specific parameters: index name, table name, and optional vector column name.
Per OceanBase documentation (https://www.oceanbase.ai/docs/zh-CN/refresh-index), the signature is:
CALL DBMS_VECTOR.REFRESH_INDEX(idx_name, table_name, idx_vector_col DEFAULT NULL, refresh_threshold INT DEFAULT 10000, refresh_type VARCHAR DEFAULT NULL);
Update the implementation to use the correct procedure with appropriate parameters, or clarify the intent if this is meant to call a custom stored procedure.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/pyseekdb/client/collection.py` around lines 596 - 604, Collection.flush
currently calls a non-existent procedure; update the implementation so it
invokes OceanBase's DBMS_VECTOR.REFRESH_INDEX with the required parameters
instead of "CALL dbms_index_manager.refresh();": build the CALL statement using
the collection's index name and table name (and optional vector column if
available) and execute it via self._client._execute (e.g., "CALL
DBMS_VECTOR.REFRESH_INDEX(idx_name, table_name, idx_vector_col,
refresh_threshold, refresh_type);"), or if the intent was to defer parameters,
raise/validate and document required args in Collection.flush and accept them as
parameters to the method before calling self._client._execute.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/pyseekdb/client/collection.py (1)
620-634:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift
refresh_index()is bound to a likely invalid refresh procedure/signatureLine 633 executes
CALL dbms_index_manager.refresh();with no index/table context. This is a high-risk runtime failure if the backend expects vector-index-scoped refresh parameters.What is the official manual refresh procedure/signature for async vector indexes in OceanBase/SeekDB (e.g., package name, required arguments, and examples)? Is `CALL dbms_index_manager.refresh();` valid?🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pyseekdb/client/collection.py` around lines 620 - 634, The call in refresh_index() uses a bare "CALL dbms_index_manager.refresh();" which may be invalid if the backend requires an index or table context; update refresh_index (and related helper _is_refresh_available) to construct and execute the correct procedure signature per the backend docs: either call the refresh with the index/table identifier (e.g., "CALL dbms_index_manager.refresh('<index_name>');") or iterate over known async indexes and call refresh for each, and validate the response; if the exact signature is unknown, replace the direct _client._execute call with a helper that reads the documented procedure name/parameters from configuration or raises a clear NotImplementedError indicating the required index parameter so callers supply it. Ensure you reference refresh_index, _is_refresh_available, and _client._execute when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/pyseekdb/client/collection.py`:
- Around line 118-132: The attribute gating in Collection.__getattribute__ and
Collection.__dir__ checks for "refresh" but the real public API is
"refresh_index", causing wrong introspection and AttributeError messages; update
both checks to look for "refresh_index" (and the error message raised in
__getattribute__ to mention 'refresh_index') and keep using the existing
_is_refresh_available() helper to decide removal/raising so introspection,
exception text, and runtime behavior are consistent with the actual method name.
In `@tests/integration_tests/test_collection_hybrid_search_source_inference.py`:
- Around line 123-125: The assertion assumes embeddings_only["embeddings"][0] is
iterable but a backend may return row-level None; update the checks in
test_collection_hybrid_search_source_inference (the assertions that iterate
embeddings_only["embeddings"][0] around the current lines) to first guard
against None per row (e.g., iterate over embeddings_only["embeddings"] and for
each row check if the row is None then accept it, otherwise assert it is a list
of length dimension), and apply the same guard/approach to the other similar
block around lines 131-134 so TypeError is avoided when a row is None.
In `@tests/integration_tests/test_collection_query.py`:
- Line 113: The unconditional call to collection.refresh_index() should be
guarded because refresh support varies by DB type/version; modify the test to
check the collection or client capability before invoking refresh_index (e.g.,
use a boolean capability check like collection.supports_refresh() or
client.supports_feature("refresh") or a runtime gate such as hasattr(collection,
"refresh_index") and a version check) and only call collection.refresh_index()
when the capability is present so the parameterized test won't fail for
unsupported modes.
---
Duplicate comments:
In `@src/pyseekdb/client/collection.py`:
- Around line 620-634: The call in refresh_index() uses a bare "CALL
dbms_index_manager.refresh();" which may be invalid if the backend requires an
index or table context; update refresh_index (and related helper
_is_refresh_available) to construct and execute the correct procedure signature
per the backend docs: either call the refresh with the index/table identifier
(e.g., "CALL dbms_index_manager.refresh('<index_name>');") or iterate over known
async indexes and call refresh for each, and validate the response; if the exact
signature is unknown, replace the direct _client._execute call with a helper
that reads the documented procedure name/parameters from configuration or raises
a clear NotImplementedError indicating the required index parameter so callers
supply it. Ensure you reference refresh_index, _is_refresh_available, and
_client._execute when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 054e3cc6-e23b-4b19-8dcb-ba188f892197
📒 Files selected for processing (7)
src/pyseekdb/client/client_base.pysrc/pyseekdb/client/collection.pytests/integration_tests/test_collection_hybrid_search_source_inference.pytests/integration_tests/test_collection_query.pytests/integration_tests/test_collection_v1_compatibility.pytests/integration_tests/test_offical_case.pytests/integration_tests/test_sparse_vector_index.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/pyseekdb/client/collection.py`:
- Around line 133-140: The capability check in _is_refresh_available currently
returns True when the _refresh_enabled attribute is missing, exposing
refresh_index on clients that don't support it; change the behavior so that if
getattr(self._client, "_refresh_enabled", None) yields None the method returns
False (i.e., unknown => unavailable), and preserve the existing callable
handling (call and bool cast inside try/except) to return False on exceptions;
update _is_refresh_available to only return True when a callable
_refresh_enabled exists and evaluates truthy.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7cdbbcea-b978-4a44-9ce5-860802d622bf
📒 Files selected for processing (2)
src/pyseekdb/client/collection.pytests/integration_tests/test_collection_hybrid_search_source_inference.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/integration_tests/test_collection_hybrid_search_source_inference.py
| def __repr__(self) -> str: | ||
| return f"Collection(name='{self._name}', dimension={self._dimension}, client={self._client.mode})" | ||
|
|
||
| def __getattribute__(self, name: str): |
There was a problem hiding this comment.
拐的弯太大了吧。实在不行你把refresh_index函数代理给client来实现好了,比如:
class Collection:
def refresh_index(self):
self._client.refresh_index()
class Client:
def _refresh_index_enabled(self) -> bool:
return version >= Version("1.3.0")
def refresh_index():
if not _refresh_index_enabled():
return
try:
self._execute('call dbms_index_manager.refresh()')
except:
# do something.There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/pyseekdb/client/collection.py (1)
621-634:⚠️ Potential issue | 🔴 Critical
refresh_index()still calls a non-existent database procedure.This issue was previously flagged: Line 634 calls
dbms_index_manager.refresh(), which does not exist in OceanBase. According to the earlier review with web search verification, the correct procedure isDBMS_VECTOR.REFRESH_INDEX(idx_name, table_name, idx_vector_col, refresh_threshold, refresh_type).The method will fail at runtime when invoked on databases that support refresh.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pyseekdb/client/collection.py` around lines 621 - 634, The refresh_index implementation currently invokes a non-existent procedure; update refresh_index (and keep the availability guard _is_refresh_available) to call the correct OceanBase procedure via self._client._execute using the DBMS_VECTOR signature: "CALL DBMS_VECTOR.REFRESH_INDEX(idx_name, table_name, idx_vector_col, refresh_threshold, refresh_type)". Use the collection's actual index/table/column identifiers (fetch from this collection's metadata or attributes) to populate idx_name, table_name and idx_vector_col, and choose sensible defaults or parameters for refresh_threshold and refresh_type (or expose them as method args) before executing the CALL.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/pyseekdb/client/collection.py`:
- Around line 621-634: The refresh_index implementation currently invokes a
non-existent procedure; update refresh_index (and keep the availability guard
_is_refresh_available) to call the correct OceanBase procedure via
self._client._execute using the DBMS_VECTOR signature: "CALL
DBMS_VECTOR.REFRESH_INDEX(idx_name, table_name, idx_vector_col,
refresh_threshold, refresh_type)". Use the collection's actual
index/table/column identifiers (fetch from this collection's metadata or
attributes) to populate idx_name, table_name and idx_vector_col, and choose
sensible defaults or parameters for refresh_threshold and refresh_type (or
expose them as method args) before executing the CALL.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2984fea4-3fc2-4b4d-8779-48851c6801b4
📒 Files selected for processing (5)
src/pyseekdb/client/collection.pytests/integration_tests/test_collection_query.pytests/integration_tests/test_collection_v1_compatibility.pytests/integration_tests/test_offical_case.pytests/integration_tests/test_sparse_vector_index.py
✅ Files skipped from review due to trivial changes (1)
- tests/integration_tests/test_sparse_vector_index.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/integration_tests/test_collection_v1_compatibility.py
- tests/integration_tests/test_offical_case.py
## Summary <!-- Please clearly and concisely describe the purpose of this pull request. If this pull request resolves an issue, please link it via "close #xxx" or "fix #xxx". --> to adapt vector_async_index "call dbms_index_manager.refresh()" , add collection.refresh_index() ## Solution Description <!-- Please clearly and concisely describe your solution. --> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Manual collection index refresh exposed when supported by the connected database. * **Behavior** * Hybrid search results may include embeddings by default; embedding entries can be None and dimensions validated only when present. * **Tests** * Integration tests updated to call refresh when available and to relax expectations around returned result fields. [](https://app.coderabbit.ai/change-stack/oceanbase/pyseekdb/pull/209) <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
to adapt vector_async_index "call dbms_index_manager.refresh()" , add collection.refresh_index()
Solution Description
Summary by CodeRabbit
New Features
Behavior
Tests