Add namespace-scoped collections with IVF vector indexing for agent database#240
Conversation
新增 namespace 统计 catalog 表
…must OceanBase rejects a bool with only must_not when it is nested under must. Combine $not_contains with metadata filters via outer filter + must_not instead.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (4)
tests/unit_tests/test_document_query_builder.py (2)
50-56: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAlso assert that
$not_containssurvives wrapping.
document_expr_as_knn_filter()is supposed to preserve the originalmust_notclause and add the permissiveexistsfilter. This test only checks the addedfilter, so an implementation that dropsmust_notwould still pass.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit_tests/test_document_query_builder.py` around lines 50 - 56, The test for document KNN wrapping only verifies the added exists filter and can miss a regression where the original $not_contains / must_not clause is dropped. Update test_pure_must_not_adds_exists_filter in document_expr_as_knn_filter / build_document_hybrid_expression to also assert that the wrapped bool query still contains the original must_not from $not_contains while keeping the permissive exists filter.
29-40: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winStrengthen the nested
$and/$orassertion.This test only proves that some
bool.mustwrapper exists. It would still pass if the nested$orbranch or the"c"term were dropped during translation, which is the main behavior this case is meant to protect. Please assert the concrete nested structure here.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit_tests/test_document_query_builder.py` around lines 29 - 40, The nested `$and`/`$or` test in `test_and_or_nested` is too weak because it only checks for a generic `bool.must` wrapper and could miss dropped branches during translation. Update the assertions to verify the concrete structure returned by `build_document_hybrid_expression`, including the nested `$or` branch and the `"c"` term, so the test specifically protects the full nested boolean translation.src/pyseekdb/client/client_base.py (1)
149-154: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueParenthesize the mixed
and/orfor clarity. The final clause"1061" in message and "duplicate" in messagebinds tighter than the surroundingor, which is the intended grouping but is easy to misread.♻️ Suggested tweak
if ( "already exists" in message or "duplicate key name" in message or "code=1061" in message - or "1061" in message and "duplicate" in message + or ("1061" in message and "duplicate" in message) ):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pyseekdb/client/client_base.py` around lines 149 - 154, The conditional in client_base.py mixes or and and in a way that is correct but hard to read; update the message-checking logic in the relevant exception handler to explicitly parenthesize the final "1061" in message and "duplicate" in message clause alongside the other or conditions. Keep the behavior the same, but make the grouping obvious by clarifying the boolean expression used in that duplicate-detection check.Source: Linters/SAST tools
tests/integration_tests/namespace_hybrid_search_helpers.py (1)
137-139: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winUse
strict=Trueinzipfor the ground-truth distance. Sincel2_squaredcomputes the expected-KNN baseline, a query/corpus dimension mismatch would silently truncate and yield a wrong expected ordering, masking a real regression instead of failing loudly.♻️ Suggested tweak
- return sum((x - y) ** 2 for x, y in zip(a, b)) + return sum((x - y) ** 2 for x, y in zip(a, b, strict=True))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration_tests/namespace_hybrid_search_helpers.py` around lines 137 - 139, The l2_squared helper used for the expected KNN baseline should fail on vector length mismatches instead of silently truncating. Update the zip call inside l2_squared in namespace_hybrid_search_helpers.py to use strict=True so any query/corpus dimension mismatch raises immediately, and keep the behavior localized to this ground-truth distance helper.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/pyseekdb/client/document_query_builder.py`:
- Around line 254-275: The unused merge_into_knn_filter helper should be removed
because knn filter merging is already handled elsewhere and client_base does not
use it. Delete the merge_into_knn_filter function from document_query_builder
and remove its import from client_base, then verify _build_knn_expression and
_inject_filter_into_knn continue to cover all merge behavior.
- Around line 14-22: The query-string escaping in _query_string_contains is
using SQL-style escaping instead of search-syntax escaping, so
Lucene/Elasticsearch reserved characters can be misread. Update
_query_string_contains to use a query_string-aware escaper for the query
payload, and apply the same escaping approach anywhere the document query
builder combines terms with $and/$or so those joins preserve search syntax
correctly.
In `@src/pyseekdb/client/kernel_errors.py`:
- Line 64: The boolean expression in the kernel error check mixes or and and
without parentheses, which Ruff flags in the kernel_errors logic. Update the
condition in the relevant text-matching branch to group the and subexpression
explicitly, keeping the existing behavior while making the intent clear and
lint-safe. Use the surrounding kernel_errors.py check that references "being
dropped" and "namespace" to locate the expression.
In `@tests/integration_tests/test_namespace_query_where_document.py`:
- Around line 98-119: The document filter tests can pass vacuously when the
query returns no rows, so add a non-empty result assertion before checking for
violations. Update test_where_document_not_contains, test_where_document_and,
test_where_document_or, test_where_document_regex, and the two nested tests to
assert that docs is not empty (like the baseline test) after ns.query, then keep
the existing exclusion checks so regressions that filter everything out are
caught.
---
Nitpick comments:
In `@src/pyseekdb/client/client_base.py`:
- Around line 149-154: The conditional in client_base.py mixes or and and in a
way that is correct but hard to read; update the message-checking logic in the
relevant exception handler to explicitly parenthesize the final "1061" in
message and "duplicate" in message clause alongside the other or conditions.
Keep the behavior the same, but make the grouping obvious by clarifying the
boolean expression used in that duplicate-detection check.
In `@tests/integration_tests/namespace_hybrid_search_helpers.py`:
- Around line 137-139: The l2_squared helper used for the expected KNN baseline
should fail on vector length mismatches instead of silently truncating. Update
the zip call inside l2_squared in namespace_hybrid_search_helpers.py to use
strict=True so any query/corpus dimension mismatch raises immediately, and keep
the behavior localized to this ground-truth distance helper.
In `@tests/unit_tests/test_document_query_builder.py`:
- Around line 50-56: The test for document KNN wrapping only verifies the added
exists filter and can miss a regression where the original $not_contains /
must_not clause is dropped. Update test_pure_must_not_adds_exists_filter in
document_expr_as_knn_filter / build_document_hybrid_expression to also assert
that the wrapped bool query still contains the original must_not from
$not_contains while keeping the permissive exists filter.
- Around line 29-40: The nested `$and`/`$or` test in `test_and_or_nested` is too
weak because it only checks for a generic `bool.must` wrapper and could miss
dropped branches during translation. Update the assertions to verify the
concrete structure returned by `build_document_hybrid_expression`, including the
nested `$or` branch and the `"c"` term, so the test specifically protects the
full nested boolean translation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3d4fbc95-a5a6-43f4-aae7-2525641835cf
📒 Files selected for processing (50)
src/pyseekdb/client/client_base.pysrc/pyseekdb/client/client_seekdb_server.pysrc/pyseekdb/client/collection.pysrc/pyseekdb/client/document_query_builder.pysrc/pyseekdb/client/filters.pysrc/pyseekdb/client/kernel_errors.pysrc/pyseekdb/client/namespace.pysrc/pyseekdb/client/schema.pysrc/pyseekdb/client/validators.pytests/integration_tests/namespace_dml_helpers.pytests/integration_tests/namespace_fts_helpers.pytests/integration_tests/namespace_hybrid_search_helpers.pytests/integration_tests/test_get_or_create_collection_concurrency.pytests/integration_tests/test_get_or_create_collection_multiprocess.pytests/integration_tests/test_logic_table_monitoring_p1.pytests/integration_tests/test_namespace_constraints.pytests/integration_tests/test_namespace_dml.pytests/integration_tests/test_namespace_dml_multi_coll.pytests/integration_tests/test_namespace_dml_multi_coll_multi_ns.pytests/integration_tests/test_namespace_dml_multi_ns.pytests/integration_tests/test_namespace_drop_validation.pytests/integration_tests/test_namespace_get_delete_filters.pytests/integration_tests/test_namespace_hybrid_search_combined.pytests/integration_tests/test_namespace_hybrid_search_fulltext.pytests/integration_tests/test_namespace_hybrid_search_fulltext_multi_coll_multi_ns.pytests/integration_tests/test_namespace_hybrid_search_multi_coll_multi_ns.pytests/integration_tests/test_namespace_hybrid_search_search_index.pytests/integration_tests/test_namespace_hybrid_search_triple_branch.pytests/integration_tests/test_namespace_hybrid_search_triple_branch_multi_coll_multi_ns.pytests/integration_tests/test_namespace_hybrid_search_vector.pytests/integration_tests/test_namespace_lifecycle.pytests/integration_tests/test_namespace_lifecycle_concurrency.pytests/integration_tests/test_namespace_optional_indexes.pytests/integration_tests/test_namespace_prewarm.pytests/integration_tests/test_namespace_prewarm_lob.pytests/integration_tests/test_namespace_query.pytests/integration_tests/test_namespace_query_where_document.pytests/integration_tests/test_namespace_reconnect.pytests/integration_tests/test_namespace_session_vars.pytests/integration_tests/test_namespace_upsert_concurrency.pytests/unit_tests/test_build_document_query.pytests/unit_tests/test_build_knn_filter.pytests/unit_tests/test_build_query_expression.pytests/unit_tests/test_document_query_builder.pytests/unit_tests/test_get_or_create_collection_concurrency.pytests/unit_tests/test_get_or_create_namespace_concurrency.pytests/unit_tests/test_namespace.pytests/unit_tests/test_namespace_kernel_errors.pytests/unit_tests/test_namespace_lifecycle_lock.pytests/unit_tests/test_namespace_upsert_reconcile.py
🚧 Files skipped from review as they are similar to previous changes (36)
- tests/unit_tests/test_build_knn_filter.py
- tests/integration_tests/test_namespace_hybrid_search_search_index.py
- tests/integration_tests/test_namespace_hybrid_search_combined.py
- tests/integration_tests/test_namespace_lifecycle_concurrency.py
- tests/integration_tests/test_namespace_hybrid_search_triple_branch.py
- tests/integration_tests/test_namespace_constraints.py
- tests/unit_tests/test_namespace_lifecycle_lock.py
- src/pyseekdb/client/client_seekdb_server.py
- tests/integration_tests/test_namespace_dml_multi_coll_multi_ns.py
- tests/unit_tests/test_build_document_query.py
- tests/unit_tests/test_get_or_create_namespace_concurrency.py
- tests/integration_tests/test_logic_table_monitoring_p1.py
- tests/integration_tests/test_namespace_session_vars.py
- tests/integration_tests/test_namespace_dml_multi_ns.py
- tests/integration_tests/test_get_or_create_collection_concurrency.py
- tests/unit_tests/test_build_query_expression.py
- src/pyseekdb/client/schema.py
- tests/integration_tests/test_namespace_hybrid_search_vector.py
- tests/integration_tests/test_namespace_dml_multi_coll.py
- tests/integration_tests/test_namespace_reconnect.py
- tests/integration_tests/test_namespace_hybrid_search_fulltext_multi_coll_multi_ns.py
- tests/integration_tests/test_namespace_upsert_concurrency.py
- tests/integration_tests/test_namespace_query.py
- tests/integration_tests/test_namespace_prewarm.py
- src/pyseekdb/client/collection.py
- src/pyseekdb/client/namespace.py
- tests/integration_tests/test_namespace_get_delete_filters.py
- tests/integration_tests/test_namespace_hybrid_search_fulltext.py
- tests/integration_tests/test_namespace_prewarm_lob.py
- tests/unit_tests/test_get_or_create_collection_concurrency.py
- tests/integration_tests/test_namespace_hybrid_search_multi_coll_multi_ns.py
- tests/integration_tests/test_namespace_hybrid_search_triple_branch_multi_coll_multi_ns.py
- tests/integration_tests/namespace_dml_helpers.py
- tests/integration_tests/test_namespace_drop_validation.py
- tests/unit_tests/test_namespace.py
- tests/integration_tests/namespace_fts_helpers.py
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pyseekdb/client/document_query_builder.py (1)
149-159: 🎯 Functional Correctness | 🔴 CriticalThe fast path for
$and/$orcombining$containsclauses alters boolean semantics for multi-word terms.The optimization at lines 149-159 (and 173-183) joins all
$containsvalues with spaces and forces adefault_operator.
- Non-optimized path: Generates separate
query_stringclauses for each condition. In Lucene/Elasticsearch, a raw string like"foo bar"without an explicit operator defaults tofoo OR bar. Thus{"$and": [...]}results in(foo OR bar) AND baz.- Optimized path: Joins to
"foo bar baz"and setsdefault_operator: "and". Lucene parses this asfoo AND bar AND baz.This changes the semantics for any multi-word term. The optimization should only apply if every
$containsvalue is a single atomic term (no spaces or operators), or it should be removed to fall back to the correctboolcomposition.Example discrepancy
Input:
{"$and": [{"$contains": "foo bar"}, {"$contains": "baz"}]}
- Expected (current fallback):
(foo OR bar) AND baz- Current Fast Path:
foo AND bar AND baz🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pyseekdb/client/document_query_builder.py` around lines 149 - 159, The `$and`/$`or` fast-path in `document_query_builder` changes semantics for multi-word `$contains` values by collapsing them into one `query_string` and forcing `default_operator`; update the optimization in the `$and` and `$or` branches to only trigger for single-token atomic terms, or remove it and use the existing bool composition fallback. Keep the logic aligned with the surrounding query-building path in `document_query_builder` so multi-word phrases still behave the same as the non-optimized `query_string` clauses.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/pyseekdb/client/document_query_builder.py`:
- Around line 14-17: The backslash handling in _escape_query_string_term is
duplicated because _QUERY_STRING_RESERVED_RE already matches backslashes, so
remove the manual text.replace("\\", "\\\\") and rely on the regex substitution
alone. Update the logic in _escape_query_string_term so every reserved
character, including backslash, is escaped exactly once and consistently with
the other Lucene query_string characters.
---
Outside diff comments:
In `@src/pyseekdb/client/document_query_builder.py`:
- Around line 149-159: The `$and`/$`or` fast-path in `document_query_builder`
changes semantics for multi-word `$contains` values by collapsing them into one
`query_string` and forcing `default_operator`; update the optimization in the
`$and` and `$or` branches to only trigger for single-token atomic terms, or
remove it and use the existing bool composition fallback. Keep the logic aligned
with the surrounding query-building path in `document_query_builder` so
multi-word phrases still behave the same as the non-optimized `query_string`
clauses.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 78a0309a-8d7d-49fa-8794-0debfa57241b
📒 Files selected for processing (10)
docs/guide/dql.mdsrc/pyseekdb/client/client_base.pysrc/pyseekdb/client/configuration.pysrc/pyseekdb/client/document_query_builder.pysrc/pyseekdb/client/kernel_errors.pytests/integration_tests/namespace_hybrid_search_helpers.pytests/integration_tests/test_namespace_constraints.pytests/integration_tests/test_namespace_query_where_document.pytests/unit_tests/test_document_query_builder.pytests/unit_tests/test_namespace.py
🚧 Files skipped from review as they are similar to previous changes (5)
- src/pyseekdb/client/kernel_errors.py
- tests/integration_tests/test_namespace_query_where_document.py
- tests/unit_tests/test_document_query_builder.py
- src/pyseekdb/client/configuration.py
- tests/unit_tests/test_namespace.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit_tests/test_namespace_no_index_embedding_dim.py (1)
11-47: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd coverage for
Noneembeddings.
_validate_namespace_explicit_embedding_dimensions()now has a dedicatedif vec is None: continuebranch, but these tests only cover full-vector match/mismatch paths. A small regression test here would lock in the intended mixed-payload behavior for both indexed and no-index namespaces.Suggested test
class TestNoIndexExplicitEmbeddingDimension: + def test_ignores_none_embeddings(self): + _validate_namespace_no_index_explicit_embeddings( + [None, [0.0] * 384], + expected_dimension=384, + ) + def test_accepts_matching_dimension(self): _validate_namespace_no_index_explicit_embeddings( [[0.0] * 384, [1.0] * 384], expected_dimension=384, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit_tests/test_namespace_no_index_embedding_dim.py` around lines 11 - 47, Add regression coverage for None embeddings in the existing namespace dimension tests. Extend TestNoIndexExplicitEmbeddingDimension and TestIvfExplicitEmbeddingDimension to call _validate_namespace_no_index_explicit_embeddings and _validate_namespace_explicit_embedding_dimensions with a payload that mixes None and valid vectors, and assert that None entries are skipped while the remaining embeddings still validate correctly. Use the existing helper names and expected_dimension/has_vector_index parameters so the new test locks in the vec is None branch behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unit_tests/test_namespace_no_index_embedding_dim.py`:
- Around line 11-47: Add regression coverage for None embeddings in the existing
namespace dimension tests. Extend TestNoIndexExplicitEmbeddingDimension and
TestIvfExplicitEmbeddingDimension to call
_validate_namespace_no_index_explicit_embeddings and
_validate_namespace_explicit_embedding_dimensions with a payload that mixes None
and valid vectors, and assert that None entries are skipped while the remaining
embeddings still validate correctly. Use the existing helper names and
expected_dimension/has_vector_index parameters so the new test locks in the vec
is None branch behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 941c6c59-88ef-4cbc-b37c-81fdf3d6b2e8
📒 Files selected for processing (9)
src/pyseekdb/client/client_base.pysrc/pyseekdb/client/collection.pysrc/pyseekdb/client/configuration.pysrc/pyseekdb/client/namespace.pysrc/pyseekdb/client/validators.pytests/integration_tests/test_namespace_constraints.pytests/integration_tests/test_namespace_optional_indexes.pytests/unit_tests/test_namespace.pytests/unit_tests/test_namespace_no_index_embedding_dim.py
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/integration_tests/test_namespace_optional_indexes.py
- src/pyseekdb/client/collection.py
- src/pyseekdb/client/configuration.py
- src/pyseekdb/client/namespace.py
- tests/unit_tests/test_namespace.py
hnwyllmm
left a comment
There was a problem hiding this comment.
One API ergonomics issue: create_collection(name, use_namespace=True) currently appears to fail on the default path. When schema is omitted, create_collection() first builds the legacy default HNSW schema via _prepare_schema_parameters(), and then _create_namespace_collection() rejects it with ValueError("use_namespace=True only supports IVF index type, HNSW is not allowed").
That error is technically accurate, but confusing for users because they did not explicitly choose HNSW; the SDK chose it as the default. For a namespace-enabled collection, the default path should either construct the namespace-compatible default schema, for example IVF plus the default fulltext config, or fail earlier with an actionable message such as: use_namespace=True requires an IVF schema. The default non-namespace schema is HNSW. Please pass Schema(vector_index=VectorIndexConfig(ivf=IVFConfiguration(...))) or remove use_namespace=True.
I would prefer making the default namespace path work if this is meant to match the Agent Database design shorthand, but at minimum the current error should explain why the default path became HNSW and how users should fix it.
hnwyllmm
left a comment
There was a problem hiding this comment.
Another compatibility concern: namespace record IDs are currently validated with the same identifier-like pattern used for names ([A-Za-z0-9_]). That is stricter than the existing collection path, where _id supports arbitrary string formats and is safely escaped/cast before being written.
For record IDs, real applications may reasonably use UUIDs, file paths, URLs, external document IDs, or other opaque business keys, for example repo/src/foo.py, https://example.com/doc/123, or 550e8400-e29b-41d4-a716-446655440000. These would work in a normal collection today but fail after moving to a namespace collection.
Could we relax namespace record-id validation to match the existing collection semantics, e.g. validate only type/non-empty/length and rely on parameterization/escaping plus JSON storage for safety? The stricter [A-Za-z0-9_] rule seems appropriate for resource names like collection/namespace/ltable names, but too restrictive for per-record business IDs.
hnwyllmm
left a comment
There was a problem hiding this comment.
One more API-safety issue: SparseVectorIndexConfig appears to be silently ignored when creating a namespace-enabled collection.
The design doc says sparse vector support is out of scope for this phase, which is fine. But the public Schema still allows sparse_vector_index=..., and the namespace creation path does not reject it, persist it, or create a sparse index. That means a user can pass a sparse config, receive a successfully created collection, and only later discover that sparse retrieval is not available.
Could we explicitly reject schema.sparse_vector_index is not None when use_namespace=True, with a clear error such as use_namespace=True does not support SparseVectorIndexConfig yet? Silent feature drop is risky here, especially because the same schema works for normal collections.
hnwyllmm
left a comment
There was a problem hiding this comment.
Test coverage is broad, but I think we still need a few public API compatibility tests for the namespace feature. Many current tests exercise validators, DDL fragments, include/n_results handling, or internal helpers such as _create_namespace_collection(). Those are useful, but they do not fully protect the real user-facing behavior.
Could we add tests around the public APIs, for example:
client.create_collection(name, use_namespace=True)with no explicit schema: either succeeds with the intended namespace default schema, or fails with a clear actionable error.client.get_or_create_collection(name, use_namespace=True)when a normal collection with the same name already exists: should not silently return a non-namespace collection; it should raise a type/mode mismatch or otherwise make the behavior explicit.client.create_collection(..., use_namespace=True, schema=...)with unsupported configs such as HNSW orSparseVectorIndexConfig: should fail explicitly rather than silently ignoring unsupported pieces.- Namespace record IDs should preserve ordinary collection compatibility, including IDs like UUIDs, paths, and URLs if we relax the validation.
These tests should go through Client/BaseClient.create_collection() and get_or_create_collection() rather than only calling internal methods, because the current regressions are mostly about public API composition and defaults.
hnwyllmm
left a comment
There was a problem hiding this comment.
The integration coverage is impressive across lifecycle, DML, query, hybrid search, prewarm, drop, concurrency, and session variables. One gap I still see is compatibility and misuse coverage at the integration level.
Most namespace integration helpers appear to create collections with an explicit IVF schema, which is the happy path. Could we also add a small set of integration tests for edge/user-migration behavior, such as:
- Creating a namespace collection through the default public API path, or verifying the error is clear if default schema creation is intentionally unsupported.
- Attempting to reuse a normal collection name with
get_or_create_collection(..., use_namespace=True)and asserting an explicit mode/type mismatch instead of returning the wrong collection type. - Passing unsupported namespace schema options, such as HNSW or sparse vector index, and asserting they fail explicitly.
- Verifying namespace record IDs accept the same practical business-key shapes as normal collections, such as UUIDs, paths, and URLs, if validation is relaxed.
- If multi-LTable is part of the intended design scope, covering more than one logical table under the same namespace; if not, documenting/test-asserting the current single-default-LTable limitation.
These would complement the current happy-path namespace tests and protect backward compatibility for users migrating from normal collections.
hnwyllmm
left a comment
There was a problem hiding this comment.
Could we also add user-facing documentation and example code for the namespace feature before merging?
This PR introduces a fairly large new API surface and some important constraints, so users will need a dedicated guide rather than only tests. I suggest adding a namespace guide that covers:
- Creating a namespace-enabled collection, including the required/recommended schema.
- Creating/getting/listing/deleting namespaces.
- Writing data with
add/update/upsert/deletethrough the namespace API. - Querying and
hybrid_search, includingwhere,where_document,include, and distances. prewarmbehavior and deployment limitations.- Deleting namespaces/collections and the async cleanup semantics if applicable.
- Compatibility and limitations: OceanBase/version requirement, unsupported schema options such as HNSW/sparse vector if they remain unsupported, record ID compatibility, and the current LTable behavior/scope.
A small runnable example under examples or docs would also help a lot, ideally showing the full flow from create_collection(..., use_namespace=True) to namespace CRUD/query and cleanup.
Summary
Solution Description
Summary by CodeRabbit
prewarm, including stricteruse_namespacerouting and validation.IVFConfiguration(IVFIndexType/IVFIndexLib) and addedpartition_count+has_vector_indexsupport.query()includeto supportdistancesand extendedwhere_documentdocs with$regex(and updated logical operator guidance).