fix(postgres): ensure pgvector extension before vector registration#135
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughUpdated Changes
Sequence Diagram(s)sequenceDiagram
participant Test
participant Backend as PostgresEntityBackend
participant DB as psycopg.connect
participant Ext as _ensure_pgvector_extension
participant Reg as register_vector
participant Model as SentenceTransformer
Test->>Backend: instantiate(config)
Backend->>DB: connect(...)
Backend->>Ext: ensure pgvector extension(conn)
alt extension ensured
Backend->>Reg: register_vector(conn)
Backend->>Model: SentenceTransformer(self._settings.embedding_model)
Model-->>Backend: get_sentence_embedding_dimension() -> dim
Backend->>Backend: validate dim -> set self.embedding_dim
Backend-->>Test: initialization complete
else extension/register_vector fails or dim invalid
Reg-->>Backend: raises Exception
Backend->>DB: close()
Backend-->>Test: re-raise Exception
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
evolve/backend/postgres.py (1)
91-93:⚠️ Potential issue | 🟠 Major
details()uses global settings instead of the resolved config.The
__init__method now correctly resolves settings from the passedconfig, butdetails()still hardcodes access to the globalpostgres_db_settingssingleton. If a customPostgresDBSettingsis passed (as shown in context snippet 1 fromevolve_client.py), this method will report the wronghostandport.Consider storing the resolved settings as an instance attribute and using it in
details().🐛 Proposed fix
Store the resolved settings in
__init__:def __init__(self, config: BaseSettings | None = None): super().__init__(config) settings = config if isinstance(config, type(postgres_db_settings)) else postgres_db_settings + self._settings = settings self.conn = psycopg.connect(Then use it in
details():def details(self) -> dict: """Return details about the backend.""" - return {"backend": "postgres", "host": postgres_db_settings.host, "port": postgres_db_settings.port} + return {"backend": "postgres", "host": self._settings.host, "port": self._settings.port}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@evolve/backend/postgres.py` around lines 91 - 93, The details() method is reading the global postgres_db_settings instead of the resolved config; update the class constructor (__init__) to store the resolved PostgresDBSettings on the instance (e.g., self._settings) when you call resolve_postgres_db_settings_from_config(config) and then change details() to return host/port from that instance attribute (self._settings.host, self._settings.port) rather than the global postgres_db_settings so custom settings passed to the backend are reported correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@evolve/backend/postgres.py`:
- Around line 91-93: The details() method is reading the global
postgres_db_settings instead of the resolved config; update the class
constructor (__init__) to store the resolved PostgresDBSettings on the instance
(e.g., self._settings) when you call
resolve_postgres_db_settings_from_config(config) and then change details() to
return host/port from that instance attribute (self._settings.host,
self._settings.port) rather than the global postgres_db_settings so custom
settings passed to the backend are reported correctly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8eb0f73c-370a-4e2e-8fa2-e1029c2df3dd
📒 Files selected for processing (3)
.secrets.baselineevolve/backend/postgres.pytests/unit/test_postgres_backend.py
| port=5432, | ||
| user="postgres", | ||
| password="postgres", # pragma: allowlist secret | ||
| dbname="kaizen", |
There was a problem hiding this comment.
is it possible to change this name?
db4e9c0 to
d10789f
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@evolve/backend/postgres.py`:
- Around line 51-60: The PostgreSQL connection self.conn is opened before
calling _ensure_pgvector_extension(), register_vector(), and
SentenceTransformer() so failures there leak the socket; wrap the
post-connection initialization steps (calls to
self._ensure_pgvector_extension(), register_vector(self.conn), and
SentenceTransformer() initialization) in a try/except (or try/finally) block and
ensure that on any exception you call self.conn.close() (or
self.conn.__exit__/close-equivalent) before re-raising the error so the
connection is not leaked; update the __init__ (or the method that sets
self.conn) to perform this cleanup around those exact symbols.
- Line 61: The Postgres schema hardcodes vector(384) but the embedding model
loaded in __init__ via SentenceTransformer(self._settings.embedding_model) may
produce different dimensions; update __init__ to call
SentenceTransformer.get_sentence_embedding_dimension() after loading the model,
store that dimension on the instance (e.g., self.embedding_dim), and then use
that value in create_namespace() when defining the vector column instead of the
literal 384; also optionally validate in __init__ that the model dimension
matches expectations or raise a clear error so _add_entity()/_update_entity()
won't fail silently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6dfb0118-acb3-45d0-85a5-89ac22298fb4
📒 Files selected for processing (3)
.secrets.baselineevolve/backend/postgres.pytests/unit/test_postgres_backend.py
✅ Files skipped from review due to trivial changes (1)
- .secrets.baseline
|
@gaodan-fang can you address the reviews from code rabbit? |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
evolve/backend/postgres.py (1)
49-49: Avoid silently falling back to global settings on invalid config types.At Line 49, unsupported
configtypes are ignored and replaced withpostgres_db_settings. This can hide caller bugs and accidentally connect to the wrong database. Prefer explicit type validation and fail fast.♻️ Suggested refactor
- self._settings = config if isinstance(config, type(postgres_db_settings)) else postgres_db_settings + if config is None: + self._settings = postgres_db_settings + elif isinstance(config, PostgresDBSettings): + self._settings = config + else: + raise TypeError( + f"`config` must be `PostgresDBSettings | None`, got `{type(config).__name__}`" + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@evolve/backend/postgres.py` at line 49, The assignment to self._settings currently silently falls back to postgres_db_settings when config is not the expected type; instead validate config's type explicitly in the constructor (the code touching self._settings, config, postgres_db_settings) and raise a clear TypeError if config is an unsupported type (include the actual type in the error message) so callers fail fast rather than accidentally using the global postgres_db_settings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@evolve/backend/postgres.py`:
- Line 49: The assignment to self._settings currently silently falls back to
postgres_db_settings when config is not the expected type; instead validate
config's type explicitly in the constructor (the code touching self._settings,
config, postgres_db_settings) and raise a clear TypeError if config is an
unsupported type (include the actual type in the error message) so callers fail
fast rather than accidentally using the global postgres_db_settings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 056c68d3-e0fe-41d7-ab4a-f0a61e04eabf
📒 Files selected for processing (2)
evolve/backend/postgres.pytests/unit/test_postgres_backend.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/test_postgres_backend.py
The postgres backend now keeps the resolved settings it was constructed with, so details() reports the actual host and port for custom configurations. A regression test covers both initialization order and custom details output, and the detect-secrets baseline was refreshed by the hook. Constraint: The repository expects Conventional Commits and pre-commit validation before commit Rejected: Leave details() tied to global settings | breaks custom backend reporting Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep backend metadata derived from the resolved instance settings, not module globals Tested: .venv/bin/pytest -v tests/unit/test_postgres_backend.py Tested: .venv/bin/pre-commit run --all-files Not-tested: Full repository test suite
The backend now records the loaded model's embedding dimension during initialization, validates it, and uses that value when creating pgvector tables. Initialization also closes the PostgreSQL connection if post-connect setup fails, and the unit tests now cover custom details reporting, cleanup on setup failure, invalid dimensions, and dimension-aware schema creation. Constraint: The repository expects Conventional Commits and pre-commit validation before commit Rejected: Keep a hardcoded vector width of 384 | breaks custom embedding models Rejected: Discover the dimension lazily during writes | leaves schema creation inconsistent with the loaded model Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep the table vector width derived from the initialized model and preserve connection cleanup on setup failures Tested: .venv/bin/pytest -v tests/unit/test_postgres_backend.py Tested: .venv/bin/pre-commit run --all-files Not-tested: Full repository test suite
|
Do we need to add any configuration in the environment for postgres? https://github.com/AgentToolkit/altk-evolve/blob/main/docs/guides/configuration.md |
The branch rebased cleanly onto origin/main, but detect-secrets updated its generated baseline metadata because tracked line numbers shifted during the replay. This commit captures that generated refresh so the branch stays clean after verification. Constraint: detect-secrets rewrites the baseline when line tracking changes Rejected: Leave the generated baseline unstaged | keeps the rebased branch dirty and fails hooks Confidence: high Scope-risk: narrow Reversibility: clean Directive: Treat .secrets.baseline as generated state and refresh it after history rewrites when hooks request it Tested: .venv/bin/pytest -v tests/unit/test_postgres_backend.py Tested: .venv/bin/pre-commit run --all-files (before baseline refresh, hook requested generated update) Not-tested: Full repository test suite
a4cde37 to
aeba1cd
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
evolve/backend/postgres.py (1)
49-49: Prefer using the class name directly inisinstance()check.Using
type(postgres_db_settings)is less readable than usingPostgresDBSettingsdirectly. Since the class is already imported at line 14, use it for clarity.🔧 Suggested improvement
- self._settings = config if isinstance(config, type(postgres_db_settings)) else postgres_db_settings + self._settings = config if isinstance(config, PostgresDBSettings) else postgres_db_settings🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@evolve/backend/postgres.py` at line 49, Change the isinstance check to use the class name directly: replace the use of type(postgres_db_settings) with PostgresDBSettings when assigning self._settings (i.e., evaluate config with isinstance(config, PostgresDBSettings) and fall back to postgres_db_settings otherwise) so the condition is more readable and explicit; update the expression that sets self._settings accordingly (references: self._settings, config, postgres_db_settings, PostgresDBSettings).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@evolve/backend/postgres.py`:
- Line 49: Change the isinstance check to use the class name directly: replace
the use of type(postgres_db_settings) with PostgresDBSettings when assigning
self._settings (i.e., evaluate config with isinstance(config,
PostgresDBSettings) and fall back to postgres_db_settings otherwise) so the
condition is more readable and explicit; update the expression that sets
self._settings accordingly (references: self._settings, config,
postgres_db_settings, PostgresDBSettings).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3f6b2b6e-a461-4a90-a2a3-62d19156ecc9
📒 Files selected for processing (3)
.secrets.baselineevolve/backend/postgres.pytests/unit/test_postgres_backend.py
✅ Files skipped from review due to trivial changes (1)
- .secrets.baseline
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/test_postgres_backend.py
The configuration guide now reflects that postgres is a supported backend and documents its separate EVOLVE_PG_* environment variables instead of implying the Milvus settings apply to it. Constraint: Documentation must match the current backend options exposed by the code Rejected: Reuse the Milvus settings section for Postgres | variable names and semantics differ Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep backend-specific config docs aligned with the actual settings classes and prefixes Tested: .venv/bin/pre-commit run --files docs/guides/configuration.md Not-tested: Full repository test suite
yes, we should, updated it |
Summary by CodeRabbit
Refactor
Tests
Chores