Add namespace-scoped collections with IVF vector indexing for agent database by JingYuChe · Pull Request #240 · oceanbase/pyseekdb

JingYuChe · 2026-06-22T03:53:19Z

Summary

Solution Description

Summary by CodeRabbit

New Features
- Added namespace-enabled collections with namespace-scoped CRUD, querying, and prewarm, including stricter use_namespace routing and validation.
- Introduced IVF dense vector indexing via IVFConfiguration (IVFIndexType/IVFIndexLib) and added partition_count + has_vector_index support.
- Expanded schema/index configuration to accept IVF in addition to HNSW.
Bug Fixes
- Improved remote connection robustness and added more user-friendly namespace error messages.
Documentation
- Updated query() include to support distances and extended where_document docs with $regex (and updated logical operator guidance).
Tests
- Significantly expanded namespace/IVF/hybrid-search and lifecycle concurrency coverage, including prewarm and optional-index scenarios.

namespace调用oceanbase的内核DBMS_LOGIC_TABLE包，内核自行管理元数据表

新增 namespace 统计 catalog 表

…_dev

…must OceanBase rejects a bool with only must_not when it is nested under must. Combine $not_contains with metadata filters via outer filter + must_not instead.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (4)

tests/unit_tests/test_document_query_builder.py (2)
50-56: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Also assert that $not_contains survives wrapping.

document_expr_as_knn_filter() is supposed to preserve the original must_not clause and add the permissive exists filter. This test only checks the added filter, so an implementation that drops must_not would still pass.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit_tests/test_document_query_builder.py` around lines 50 - 56, The
test for document KNN wrapping only verifies the added exists filter and can
miss a regression where the original $not_contains / must_not clause is dropped.
Update test_pure_must_not_adds_exists_filter in document_expr_as_knn_filter /
build_document_hybrid_expression to also assert that the wrapped bool query
still contains the original must_not from $not_contains while keeping the
permissive exists filter.
29-40: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Strengthen the nested $and/$or assertion.

This test only proves that some bool.must wrapper exists. It would still pass if the nested $or branch or the "c" term were dropped during translation, which is the main behavior this case is meant to protect. Please assert the concrete nested structure here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit_tests/test_document_query_builder.py` around lines 29 - 40, The
nested `$and`/`$or` test in `test_and_or_nested` is too weak because it only
checks for a generic `bool.must` wrapper and could miss dropped branches during
translation. Update the assertions to verify the concrete structure returned by
`build_document_hybrid_expression`, including the nested `$or` branch and the
`"c"` term, so the test specifically protects the full nested boolean
translation.
src/pyseekdb/client/client_base.py (1)
149-154: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Parenthesize the mixed and/or for clarity. The final clause "1061" in message and "duplicate" in message binds tighter than the surrounding or, which is the intended grouping but is easy to misread.
♻️ Suggested tweak
     if (
         "already exists" in message
         or "duplicate key name" in message
         or "code=1061" in message
-        or "1061" in message and "duplicate" in message
+        or ("1061" in message and "duplicate" in message)
     ):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pyseekdb/client/client_base.py` around lines 149 - 154, The conditional
in client_base.py mixes or and and in a way that is correct but hard to read;
update the message-checking logic in the relevant exception handler to
explicitly parenthesize the final "1061" in message and "duplicate" in message
clause alongside the other or conditions. Keep the behavior the same, but make
the grouping obvious by clarifying the boolean expression used in that
duplicate-detection check.
Source: Linters/SAST tools
tests/integration_tests/namespace_hybrid_search_helpers.py (1)
137-139: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Use strict=True in zip for the ground-truth distance. Since l2_squared computes the expected-KNN baseline, a query/corpus dimension mismatch would silently truncate and yield a wrong expected ordering, masking a real regression instead of failing loudly.
♻️ Suggested tweak
-    return sum((x - y) ** 2 for x, y in zip(a, b))
+    return sum((x - y) ** 2 for x, y in zip(a, b, strict=True))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration_tests/namespace_hybrid_search_helpers.py` around lines 137
- 139, The l2_squared helper used for the expected KNN baseline should fail on
vector length mismatches instead of silently truncating. Update the zip call
inside l2_squared in namespace_hybrid_search_helpers.py to use strict=True so
any query/corpus dimension mismatch raises immediately, and keep the behavior
localized to this ground-truth distance helper.
Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/pyseekdb/client/document_query_builder.py`:
- Around line 254-275: The unused merge_into_knn_filter helper should be removed
because knn filter merging is already handled elsewhere and client_base does not
use it. Delete the merge_into_knn_filter function from document_query_builder
and remove its import from client_base, then verify _build_knn_expression and
_inject_filter_into_knn continue to cover all merge behavior.
- Around line 14-22: The query-string escaping in _query_string_contains is
using SQL-style escaping instead of search-syntax escaping, so
Lucene/Elasticsearch reserved characters can be misread. Update
_query_string_contains to use a query_string-aware escaper for the query
payload, and apply the same escaping approach anywhere the document query
builder combines terms with $and/$or so those joins preserve search syntax
correctly.

In `@src/pyseekdb/client/kernel_errors.py`:
- Line 64: The boolean expression in the kernel error check mixes or and and
without parentheses, which Ruff flags in the kernel_errors logic. Update the
condition in the relevant text-matching branch to group the and subexpression
explicitly, keeping the existing behavior while making the intent clear and
lint-safe. Use the surrounding kernel_errors.py check that references "being
dropped" and "namespace" to locate the expression.

In `@tests/integration_tests/test_namespace_query_where_document.py`:
- Around line 98-119: The document filter tests can pass vacuously when the
query returns no rows, so add a non-empty result assertion before checking for
violations. Update test_where_document_not_contains, test_where_document_and,
test_where_document_or, test_where_document_regex, and the two nested tests to
assert that docs is not empty (like the baseline test) after ns.query, then keep
the existing exclusion checks so regressions that filter everything out are
caught.

---

Nitpick comments:
In `@src/pyseekdb/client/client_base.py`:
- Around line 149-154: The conditional in client_base.py mixes or and and in a
way that is correct but hard to read; update the message-checking logic in the
relevant exception handler to explicitly parenthesize the final "1061" in
message and "duplicate" in message clause alongside the other or conditions.
Keep the behavior the same, but make the grouping obvious by clarifying the
boolean expression used in that duplicate-detection check.

In `@tests/integration_tests/namespace_hybrid_search_helpers.py`:
- Around line 137-139: The l2_squared helper used for the expected KNN baseline
should fail on vector length mismatches instead of silently truncating. Update
the zip call inside l2_squared in namespace_hybrid_search_helpers.py to use
strict=True so any query/corpus dimension mismatch raises immediately, and keep
the behavior localized to this ground-truth distance helper.

In `@tests/unit_tests/test_document_query_builder.py`:
- Around line 50-56: The test for document KNN wrapping only verifies the added
exists filter and can miss a regression where the original $not_contains /
must_not clause is dropped. Update test_pure_must_not_adds_exists_filter in
document_expr_as_knn_filter / build_document_hybrid_expression to also assert
that the wrapped bool query still contains the original must_not from
$not_contains while keeping the permissive exists filter.
- Around line 29-40: The nested `$and`/`$or` test in `test_and_or_nested` is too
weak because it only checks for a generic `bool.must` wrapper and could miss
dropped branches during translation. Update the assertions to verify the
concrete structure returned by `build_document_hybrid_expression`, including the
nested `$or` branch and the `"c"` term, so the test specifically protects the
full nested boolean translation.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3d4fbc95-a5a6-43f4-aae7-2525641835cf

📥 Commits

Reviewing files that changed from the base of the PR and between c688423 and 908186a.

📒 Files selected for processing (50)

src/pyseekdb/client/client_base.py
src/pyseekdb/client/client_seekdb_server.py
src/pyseekdb/client/collection.py
src/pyseekdb/client/document_query_builder.py
src/pyseekdb/client/filters.py
src/pyseekdb/client/kernel_errors.py
src/pyseekdb/client/namespace.py
src/pyseekdb/client/schema.py
src/pyseekdb/client/validators.py
tests/integration_tests/namespace_dml_helpers.py
tests/integration_tests/namespace_fts_helpers.py
tests/integration_tests/namespace_hybrid_search_helpers.py
tests/integration_tests/test_get_or_create_collection_concurrency.py
tests/integration_tests/test_get_or_create_collection_multiprocess.py
tests/integration_tests/test_logic_table_monitoring_p1.py
tests/integration_tests/test_namespace_constraints.py
tests/integration_tests/test_namespace_dml.py
tests/integration_tests/test_namespace_dml_multi_coll.py
tests/integration_tests/test_namespace_dml_multi_coll_multi_ns.py
tests/integration_tests/test_namespace_dml_multi_ns.py
tests/integration_tests/test_namespace_drop_validation.py
tests/integration_tests/test_namespace_get_delete_filters.py
tests/integration_tests/test_namespace_hybrid_search_combined.py
tests/integration_tests/test_namespace_hybrid_search_fulltext.py
tests/integration_tests/test_namespace_hybrid_search_fulltext_multi_coll_multi_ns.py
tests/integration_tests/test_namespace_hybrid_search_multi_coll_multi_ns.py
tests/integration_tests/test_namespace_hybrid_search_search_index.py
tests/integration_tests/test_namespace_hybrid_search_triple_branch.py
tests/integration_tests/test_namespace_hybrid_search_triple_branch_multi_coll_multi_ns.py
tests/integration_tests/test_namespace_hybrid_search_vector.py
tests/integration_tests/test_namespace_lifecycle.py
tests/integration_tests/test_namespace_lifecycle_concurrency.py
tests/integration_tests/test_namespace_optional_indexes.py
tests/integration_tests/test_namespace_prewarm.py
tests/integration_tests/test_namespace_prewarm_lob.py
tests/integration_tests/test_namespace_query.py
tests/integration_tests/test_namespace_query_where_document.py
tests/integration_tests/test_namespace_reconnect.py
tests/integration_tests/test_namespace_session_vars.py
tests/integration_tests/test_namespace_upsert_concurrency.py
tests/unit_tests/test_build_document_query.py
tests/unit_tests/test_build_knn_filter.py
tests/unit_tests/test_build_query_expression.py
tests/unit_tests/test_document_query_builder.py
tests/unit_tests/test_get_or_create_collection_concurrency.py
tests/unit_tests/test_get_or_create_namespace_concurrency.py
tests/unit_tests/test_namespace.py
tests/unit_tests/test_namespace_kernel_errors.py
tests/unit_tests/test_namespace_lifecycle_lock.py
tests/unit_tests/test_namespace_upsert_reconcile.py

🚧 Files skipped from review as they are similar to previous changes (36)

tests/unit_tests/test_build_knn_filter.py
tests/integration_tests/test_namespace_hybrid_search_search_index.py
tests/integration_tests/test_namespace_hybrid_search_combined.py
tests/integration_tests/test_namespace_lifecycle_concurrency.py
tests/integration_tests/test_namespace_hybrid_search_triple_branch.py
tests/integration_tests/test_namespace_constraints.py
tests/unit_tests/test_namespace_lifecycle_lock.py
src/pyseekdb/client/client_seekdb_server.py
tests/integration_tests/test_namespace_dml_multi_coll_multi_ns.py
tests/unit_tests/test_build_document_query.py
tests/unit_tests/test_get_or_create_namespace_concurrency.py
tests/integration_tests/test_logic_table_monitoring_p1.py
tests/integration_tests/test_namespace_session_vars.py
tests/integration_tests/test_namespace_dml_multi_ns.py
tests/integration_tests/test_get_or_create_collection_concurrency.py
tests/unit_tests/test_build_query_expression.py
src/pyseekdb/client/schema.py
tests/integration_tests/test_namespace_hybrid_search_vector.py
tests/integration_tests/test_namespace_dml_multi_coll.py
tests/integration_tests/test_namespace_reconnect.py
tests/integration_tests/test_namespace_hybrid_search_fulltext_multi_coll_multi_ns.py
tests/integration_tests/test_namespace_upsert_concurrency.py
tests/integration_tests/test_namespace_query.py
tests/integration_tests/test_namespace_prewarm.py
src/pyseekdb/client/collection.py
src/pyseekdb/client/namespace.py
tests/integration_tests/test_namespace_get_delete_filters.py
tests/integration_tests/test_namespace_hybrid_search_fulltext.py
tests/integration_tests/test_namespace_prewarm_lob.py
tests/unit_tests/test_get_or_create_collection_concurrency.py
tests/integration_tests/test_namespace_hybrid_search_multi_coll_multi_ns.py
tests/integration_tests/test_namespace_hybrid_search_triple_branch_multi_coll_multi_ns.py
tests/integration_tests/namespace_dml_helpers.py
tests/integration_tests/test_namespace_drop_validation.py
tests/unit_tests/test_namespace.py
tests/integration_tests/namespace_fts_helpers.py

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/pyseekdb/client/document_query_builder.py (1)
149-159: 🎯 Functional Correctness | 🔴 Critical

The fast path for $and/$or combining $contains clauses alters boolean semantics for multi-word terms.

The optimization at lines 149-159 (and 173-183) joins all $contains values with spaces and forces a default_operator.

Non-optimized path: Generates separate query_string clauses for each condition. In Lucene/Elasticsearch, a raw string like "foo bar" without an explicit operator defaults to foo OR bar. Thus {"$and": [...]} results in (foo OR bar) AND baz.

Optimized path: Joins to "foo bar baz" and sets default_operator: "and". Lucene parses this as foo AND bar AND baz.

This changes the semantics for any multi-word term. The optimization should only apply if every $contains value is a single atomic term (no spaces or operators), or it should be removed to fall back to the correct bool composition.

Example discrepancy

Input: {"$and": [{"$contains": "foo bar"}, {"$contains": "baz"}]}

Expected (current fallback): (foo OR bar) AND baz

Current Fast Path: foo AND bar AND baz
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pyseekdb/client/document_query_builder.py` around lines 149 - 159, The
`$and`/$`or` fast-path in `document_query_builder` changes semantics for
multi-word `$contains` values by collapsing them into one `query_string` and
forcing `default_operator`; update the optimization in the `$and` and `$or`
branches to only trigger for single-token atomic terms, or remove it and use the
existing bool composition fallback. Keep the logic aligned with the surrounding
query-building path in `document_query_builder` so multi-word phrases still
behave the same as the non-optimized `query_string` clauses.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/pyseekdb/client/document_query_builder.py`:
- Around line 14-17: The backslash handling in _escape_query_string_term is
duplicated because _QUERY_STRING_RESERVED_RE already matches backslashes, so
remove the manual text.replace("\\", "\\\\") and rely on the regex substitution
alone. Update the logic in _escape_query_string_term so every reserved
character, including backslash, is escaped exactly once and consistently with
the other Lucene query_string characters.

---

Outside diff comments:
In `@src/pyseekdb/client/document_query_builder.py`:
- Around line 149-159: The `$and`/$`or` fast-path in `document_query_builder`
changes semantics for multi-word `$contains` values by collapsing them into one
`query_string` and forcing `default_operator`; update the optimization in the
`$and` and `$or` branches to only trigger for single-token atomic terms, or
remove it and use the existing bool composition fallback. Keep the logic aligned
with the surrounding query-building path in `document_query_builder` so
multi-word phrases still behave the same as the non-optimized `query_string`
clauses.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 78a0309a-8d7d-49fa-8794-0debfa57241b

📥 Commits

Reviewing files that changed from the base of the PR and between 908186a and 1ac0899.

📒 Files selected for processing (10)

docs/guide/dql.md
src/pyseekdb/client/client_base.py
src/pyseekdb/client/configuration.py
src/pyseekdb/client/document_query_builder.py
src/pyseekdb/client/kernel_errors.py
tests/integration_tests/namespace_hybrid_search_helpers.py
tests/integration_tests/test_namespace_constraints.py
tests/integration_tests/test_namespace_query_where_document.py
tests/unit_tests/test_document_query_builder.py
tests/unit_tests/test_namespace.py

🚧 Files skipped from review as they are similar to previous changes (5)

src/pyseekdb/client/kernel_errors.py
tests/integration_tests/test_namespace_query_where_document.py
tests/unit_tests/test_document_query_builder.py
src/pyseekdb/client/configuration.py
tests/unit_tests/test_namespace.py

coderabbitai

🧹 Nitpick comments (1)

tests/unit_tests/test_namespace_no_index_embedding_dim.py (1)

11-47: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add coverage for None embeddings.

_validate_namespace_explicit_embedding_dimensions() now has a dedicated if vec is None: continue branch, but these tests only cover full-vector match/mismatch paths. A small regression test here would lock in the intended mixed-payload behavior for both indexed and no-index namespaces.

Suggested test

 class TestNoIndexExplicitEmbeddingDimension:
+    def test_ignores_none_embeddings(self):
+        _validate_namespace_no_index_explicit_embeddings(
+            [None, [0.0] * 384],
+            expected_dimension=384,
+        )
+
     def test_accepts_matching_dimension(self):
         _validate_namespace_no_index_explicit_embeddings(
             [[0.0] * 384, [1.0] * 384],
             expected_dimension=384,
         )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit_tests/test_namespace_no_index_embedding_dim.py` around lines 11 -
47, Add regression coverage for None embeddings in the existing namespace
dimension tests. Extend TestNoIndexExplicitEmbeddingDimension and
TestIvfExplicitEmbeddingDimension to call
_validate_namespace_no_index_explicit_embeddings and
_validate_namespace_explicit_embedding_dimensions with a payload that mixes None
and valid vectors, and assert that None entries are skipped while the remaining
embeddings still validate correctly. Use the existing helper names and
expected_dimension/has_vector_index parameters so the new test locks in the vec
is None branch behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit_tests/test_namespace_no_index_embedding_dim.py`:
- Around line 11-47: Add regression coverage for None embeddings in the existing
namespace dimension tests. Extend TestNoIndexExplicitEmbeddingDimension and
TestIvfExplicitEmbeddingDimension to call
_validate_namespace_no_index_explicit_embeddings and
_validate_namespace_explicit_embedding_dimensions with a payload that mixes None
and valid vectors, and assert that None entries are skipped while the remaining
embeddings still validate correctly. Use the existing helper names and
expected_dimension/has_vector_index parameters so the new test locks in the vec
is None branch behavior.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 941c6c59-88ef-4cbc-b37c-81fdf3d6b2e8

📥 Commits

Reviewing files that changed from the base of the PR and between 1ac0899 and b39abbd.

📒 Files selected for processing (9)

src/pyseekdb/client/client_base.py
src/pyseekdb/client/collection.py
src/pyseekdb/client/configuration.py
src/pyseekdb/client/namespace.py
src/pyseekdb/client/validators.py
tests/integration_tests/test_namespace_constraints.py
tests/integration_tests/test_namespace_optional_indexes.py
tests/unit_tests/test_namespace.py
tests/unit_tests/test_namespace_no_index_embedding_dim.py

🚧 Files skipped from review as they are similar to previous changes (5)

tests/integration_tests/test_namespace_optional_indexes.py
src/pyseekdb/client/collection.py
src/pyseekdb/client/configuration.py
src/pyseekdb/client/namespace.py
tests/unit_tests/test_namespace.py

hnwyllmm

One API ergonomics issue: create_collection(name, use_namespace=True) currently appears to fail on the default path. When schema is omitted, create_collection() first builds the legacy default HNSW schema via _prepare_schema_parameters(), and then _create_namespace_collection() rejects it with ValueError("use_namespace=True only supports IVF index type, HNSW is not allowed").

That error is technically accurate, but confusing for users because they did not explicitly choose HNSW; the SDK chose it as the default. For a namespace-enabled collection, the default path should either construct the namespace-compatible default schema, for example IVF plus the default fulltext config, or fail earlier with an actionable message such as: use_namespace=True requires an IVF schema. The default non-namespace schema is HNSW. Please pass Schema(vector_index=VectorIndexConfig(ivf=IVFConfiguration(...))) or remove use_namespace=True.

I would prefer making the default namespace path work if this is meant to match the Agent Database design shorthand, but at minimum the current error should explain why the default path became HNSW and how users should fix it.

hnwyllmm

Another compatibility concern: namespace record IDs are currently validated with the same identifier-like pattern used for names ([A-Za-z0-9_]). That is stricter than the existing collection path, where _id supports arbitrary string formats and is safely escaped/cast before being written.

For record IDs, real applications may reasonably use UUIDs, file paths, URLs, external document IDs, or other opaque business keys, for example repo/src/foo.py, https://example.com/doc/123, or 550e8400-e29b-41d4-a716-446655440000. These would work in a normal collection today but fail after moving to a namespace collection.

Could we relax namespace record-id validation to match the existing collection semantics, e.g. validate only type/non-empty/length and rely on parameterization/escaping plus JSON storage for safety? The stricter [A-Za-z0-9_] rule seems appropriate for resource names like collection/namespace/ltable names, but too restrictive for per-record business IDs.

hnwyllmm

One more API-safety issue: SparseVectorIndexConfig appears to be silently ignored when creating a namespace-enabled collection.

The design doc says sparse vector support is out of scope for this phase, which is fine. But the public Schema still allows sparse_vector_index=..., and the namespace creation path does not reject it, persist it, or create a sparse index. That means a user can pass a sparse config, receive a successfully created collection, and only later discover that sparse retrieval is not available.

Could we explicitly reject schema.sparse_vector_index is not None when use_namespace=True, with a clear error such as use_namespace=True does not support SparseVectorIndexConfig yet? Silent feature drop is risky here, especially because the same schema works for normal collections.

hnwyllmm

Test coverage is broad, but I think we still need a few public API compatibility tests for the namespace feature. Many current tests exercise validators, DDL fragments, include/n_results handling, or internal helpers such as _create_namespace_collection(). Those are useful, but they do not fully protect the real user-facing behavior.

Could we add tests around the public APIs, for example:

client.create_collection(name, use_namespace=True) with no explicit schema: either succeeds with the intended namespace default schema, or fails with a clear actionable error.
client.get_or_create_collection(name, use_namespace=True) when a normal collection with the same name already exists: should not silently return a non-namespace collection; it should raise a type/mode mismatch or otherwise make the behavior explicit.
client.create_collection(..., use_namespace=True, schema=...) with unsupported configs such as HNSW or SparseVectorIndexConfig: should fail explicitly rather than silently ignoring unsupported pieces.
Namespace record IDs should preserve ordinary collection compatibility, including IDs like UUIDs, paths, and URLs if we relax the validation.

These tests should go through Client/BaseClient.create_collection() and get_or_create_collection() rather than only calling internal methods, because the current regressions are mostly about public API composition and defaults.

hnwyllmm

The integration coverage is impressive across lifecycle, DML, query, hybrid search, prewarm, drop, concurrency, and session variables. One gap I still see is compatibility and misuse coverage at the integration level.

Most namespace integration helpers appear to create collections with an explicit IVF schema, which is the happy path. Could we also add a small set of integration tests for edge/user-migration behavior, such as:

Creating a namespace collection through the default public API path, or verifying the error is clear if default schema creation is intentionally unsupported.
Attempting to reuse a normal collection name with get_or_create_collection(..., use_namespace=True) and asserting an explicit mode/type mismatch instead of returning the wrong collection type.
Passing unsupported namespace schema options, such as HNSW or sparse vector index, and asserting they fail explicitly.
Verifying namespace record IDs accept the same practical business-key shapes as normal collections, such as UUIDs, paths, and URLs, if validation is relaxed.
If multi-LTable is part of the intended design scope, covering more than one logical table under the same namespace; if not, documenting/test-asserting the current single-default-LTable limitation.

These would complement the current happy-path namespace tests and protect backward compatibility for users migrating from normal collections.

hnwyllmm

Could we also add user-facing documentation and example code for the namespace feature before merging?

This PR introduces a fairly large new API surface and some important constraints, so users will need a dedicated guide rather than only tests. I suggest adding a namespace guide that covers:

Creating a namespace-enabled collection, including the required/recommended schema.
Creating/getting/listing/deleting namespaces.
Writing data with add/update/upsert/delete through the namespace API.
Querying and hybrid_search, including where, where_document, include, and distances.
prewarm behavior and deployment limitations.
Deleting namespaces/collections and the async cleanup semantics if applicable.
Compatibility and limitations: OceanBase/version requirement, unsupported schema options such as HNSW/sparse vector if they remain unsupported, record ID compatibility, and the current LTable behavior/scope.

A small runnable example under examples or docs would also help a lot, ideally showing the full flow from create_collection(..., use_namespace=True) to namespace CRUD/query and cleanup.

JingYuChe and others added 30 commits April 28, 2026 21:11

pyseekdb collection支持按namespace隔离

8b7a956

pyseekdb支持设置namespace_id相关的session缓存

c4897dd

代码检视问题修改

da41d77

delete

5f0e1a9

namespace调用oceanbase的内核DBMS_LOGIC_TABLE包，内核自行管理元数据表

pyseekdb修改ob的search index with parser

f7f6b1c

新增 namespace 统计 catalog 表

ca1a069

Merge pull request #1 from JingYuChe/feat/sdk-namespaces-stats

ef3d119

新增 namespace 统计 catalog 表

prewarm集成测试用例加强

0622ed9

提供set_namespace_partition_count动态设置分区数

5eb9b44

屏蔽hybrid_search用search_para解析成sql，直接向内核输入hybrid_search接口

2d901a4

混搜支持data_content过滤

750f581

kv table到key 改为varbinary

0c62edc

namespace.add,get等操作显示设置session

765dcee

namespace.add,get等操作显示设置session

941a0c9

pyseekdb新增dbms_logic_table.drop_namespace用例

4ee60df

logic table id采用真实分配的值

69542a1

fts适配default_operator:and

33ab2c3

Fix conflict

17743d8

or修复

4ae23cd

修复json_search双引号问题

5d10d73

add namespace case

4afb949

Merge branch 'develop' of https://github.com/skylhd/pyseekdb into adb…

dafc36d

…_dev

修复scalar term query in must/should clause not supported

13c0101

fix dsl

f82317f

sdk改名

accc8ef

fix(hybrid_search): hoist $not_contains must_not out of scoring bool …

6ea284c

…must OceanBase rejects a bool with only must_not when it is nested under must. Combine $not_contains with metadata filters via outer filter + must_not instead.

open spfresh and add case

ee11730

open spfresh and add case

65bf56c

fix conflict

79c0b9c

fix case

c18230d

coderabbitai Bot reviewed Jun 26, 2026

View reviewed changes

Comment thread src/pyseekdb/client/document_query_builder.py

Comment thread src/pyseekdb/client/document_query_builder.py Outdated

Comment thread src/pyseekdb/client/kernel_errors.py Outdated

Comment thread tests/integration_tests/test_namespace_query_where_document.py

fix:issue

a0c9599

coderabbitai Bot reviewed Jun 26, 2026

View reviewed changes

Comment thread src/pyseekdb/client/document_query_builder.py Outdated

fix:【pyseekdb】ob内核不支持ivf索引超越2048维，但是sdk给的范围是[1,4096]

513a33a

JingYuChe force-pushed the develop branch from 1ac0899 to 513a33a Compare June 26, 2026 07:53

fix:【pyseekdb】不传入向量索引配置的时候，向量列是sdk内部指定的384维，ns.add报错信息不明确

b39abbd

coderabbitai Bot reviewed Jun 26, 2026

View reviewed changes

JingYuChe added 5 commits June 26, 2026 22:35

issue_fix

e06c500

ut适配collection_dimension

251eb05

非namespace路径不去校验_validate_include

7abf5c7

dont run namespace testcases on workflow

ec803f2

import:_validate_namespace_name

b59159a

hnwyllmm reviewed Jun 29, 2026

View reviewed changes

fix_quality

843e8bd

hnwyllmm reviewed Jun 29, 2026

View reviewed changes

Comment thread .github/workflows/ci.yml

JingYuChe added 5 commits June 29, 2026 17:31

skip_lakebase

6d2f357

namespace模式只支持lakebase

15ab7fc

适配内核计算lob头

dc95767

fix_workflow

6860dbd

fix_issue_hnwyllmm

b071f5d

JingYuChe requested a review from hnwyllmm June 30, 2026 03:18

JingYuChe added 3 commits June 30, 2026 11:45

without ignore

7811b2b

quality_fix

3546102

fix_test

fb2c7cf

hnwyllmm merged commit 44d74c3 into oceanbase:develop Jun 30, 2026
9 checks passed

Uh oh!

Conversation

JingYuChe commented Jun 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Solution Description

Summary by CodeRabbit

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

hnwyllmm left a comment

Choose a reason for hiding this comment

Uh oh!

hnwyllmm left a comment

Choose a reason for hiding this comment

Uh oh!

hnwyllmm left a comment

Choose a reason for hiding this comment

Uh oh!

hnwyllmm left a comment

Choose a reason for hiding this comment

Uh oh!

hnwyllmm left a comment

Choose a reason for hiding this comment

Uh oh!

hnwyllmm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

JingYuChe commented Jun 22, 2026 •

edited by coderabbitai Bot

Loading