Skip to content

fix(postgres): ensure pgvector extension before vector registration#135

Merged
visahak merged 8 commits into
mainfrom
postgres-fix
Apr 2, 2026
Merged

fix(postgres): ensure pgvector extension before vector registration#135
visahak merged 8 commits into
mainfrom
postgres-fix

Conversation

@gaodan-fang

@gaodan-fang gaodan-fang commented Apr 2, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • Refactor

    • PostgreSQL backend startup improved: resolved configuration drives embedding model selection, embedding dimension determined at runtime, initialization order updated, and connection cleanup added on failures.
  • Tests

    • New unit tests cover backend initialization sequencing, failure cleanup behavior, and invalid embedding-dimension handling.
  • Chores

    • Updated secrets baseline timestamp and adjusted line mapping for two existing entries.

@coderabbitai

coderabbitai Bot commented Apr 2, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Updated .secrets.baseline metadata timestamp and a test result line number; refactored Postgres backend to store resolved DB settings, ensure pgvector extension before registering vectors, derive and validate embedding dimension at runtime, close connections on init failures, and added unit tests covering these behaviors.

Changes

Cohort / File(s) Summary
Secret Baseline Update
​.secrets.baseline
Bumped generated_at timestamp and adjusted one results line_number (tests/unit/test_extract_trajectories.py 408→407).
Postgres Backend Initialization
evolve/backend/postgres.py
Store resolved PostgresDBSettings on self._settings; remove module EMBEDDING_DIM; call _ensure_pgvector_extension() before register_vector(self.conn); instantiate SentenceTransformer from self._settings.embedding_model; obtain and validate self.embedding_dim via get_sentence_embedding_dimension(); ensure connection is closed if init fails.
Backend Initialization Tests
tests/unit/test_postgres_backend.py
Added tests validating init sequence with PostgresDBSettings, embedding model usage, embedding-dimension validation (raises on invalid/None), and that connection is closed on register_vector failure; updated fixture to mock embedding-dimension and modified test_create_namespace to assert psycopg.sql.Literal is called with postgres_backend.embedding_dim.

Sequence Diagram(s)

sequenceDiagram
    participant Test
    participant Backend as PostgresEntityBackend
    participant DB as psycopg.connect
    participant Ext as _ensure_pgvector_extension
    participant Reg as register_vector
    participant Model as SentenceTransformer

    Test->>Backend: instantiate(config)
    Backend->>DB: connect(...)
    Backend->>Ext: ensure pgvector extension(conn)
    alt extension ensured
        Backend->>Reg: register_vector(conn)
        Backend->>Model: SentenceTransformer(self._settings.embedding_model)
        Model-->>Backend: get_sentence_embedding_dimension() -> dim
        Backend->>Backend: validate dim -> set self.embedding_dim
        Backend-->>Test: initialization complete
    else extension/register_vector fails or dim invalid
        Reg-->>Backend: raises Exception
        Backend->>DB: close()
        Backend-->>Test: re-raise Exception
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through configs, checked a line,
Stored settings snug where keys align.
I prodded pgvector, then called the model,
Measured a vector, then closed the wobbled throttle.
Tests tapped paws — the init held fine.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: ensuring pgvector extension is set up before vector registration in PostgreSQL initialization.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch postgres-fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
evolve/backend/postgres.py (1)

91-93: ⚠️ Potential issue | 🟠 Major

details() uses global settings instead of the resolved config.

The __init__ method now correctly resolves settings from the passed config, but details() still hardcodes access to the global postgres_db_settings singleton. If a custom PostgresDBSettings is passed (as shown in context snippet 1 from evolve_client.py), this method will report the wrong host and port.

Consider storing the resolved settings as an instance attribute and using it in details().

🐛 Proposed fix

Store the resolved settings in __init__:

     def __init__(self, config: BaseSettings | None = None):
         super().__init__(config)
         settings = config if isinstance(config, type(postgres_db_settings)) else postgres_db_settings
+        self._settings = settings
         self.conn = psycopg.connect(

Then use it in details():

     def details(self) -> dict:
         """Return details about the backend."""
-        return {"backend": "postgres", "host": postgres_db_settings.host, "port": postgres_db_settings.port}
+        return {"backend": "postgres", "host": self._settings.host, "port": self._settings.port}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@evolve/backend/postgres.py` around lines 91 - 93, The details() method is
reading the global postgres_db_settings instead of the resolved config; update
the class constructor (__init__) to store the resolved PostgresDBSettings on the
instance (e.g., self._settings) when you call
resolve_postgres_db_settings_from_config(config) and then change details() to
return host/port from that instance attribute (self._settings.host,
self._settings.port) rather than the global postgres_db_settings so custom
settings passed to the backend are reported correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@evolve/backend/postgres.py`:
- Around line 91-93: The details() method is reading the global
postgres_db_settings instead of the resolved config; update the class
constructor (__init__) to store the resolved PostgresDBSettings on the instance
(e.g., self._settings) when you call
resolve_postgres_db_settings_from_config(config) and then change details() to
return host/port from that instance attribute (self._settings.host,
self._settings.port) rather than the global postgres_db_settings so custom
settings passed to the backend are reported correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8eb0f73c-370a-4e2e-8fa2-e1029c2df3dd

📥 Commits

Reviewing files that changed from the base of the PR and between 5652d4b and 476c895.

📒 Files selected for processing (3)
  • .secrets.baseline
  • evolve/backend/postgres.py
  • tests/unit/test_postgres_backend.py

Comment thread tests/unit/test_postgres_backend.py Outdated
port=5432,
user="postgres",
password="postgres", # pragma: allowlist secret
dbname="kaizen",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to change this name?

visahak
visahak previously approved these changes Apr 2, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@evolve/backend/postgres.py`:
- Around line 51-60: The PostgreSQL connection self.conn is opened before
calling _ensure_pgvector_extension(), register_vector(), and
SentenceTransformer() so failures there leak the socket; wrap the
post-connection initialization steps (calls to
self._ensure_pgvector_extension(), register_vector(self.conn), and
SentenceTransformer() initialization) in a try/except (or try/finally) block and
ensure that on any exception you call self.conn.close() (or
self.conn.__exit__/close-equivalent) before re-raising the error so the
connection is not leaked; update the __init__ (or the method that sets
self.conn) to perform this cleanup around those exact symbols.
- Line 61: The Postgres schema hardcodes vector(384) but the embedding model
loaded in __init__ via SentenceTransformer(self._settings.embedding_model) may
produce different dimensions; update __init__ to call
SentenceTransformer.get_sentence_embedding_dimension() after loading the model,
store that dimension on the instance (e.g., self.embedding_dim), and then use
that value in create_namespace() when defining the vector column instead of the
literal 384; also optionally validate in __init__ that the model dimension
matches expectations or raise a clear error so _add_entity()/_update_entity()
won't fail silently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6dfb0118-acb3-45d0-85a5-89ac22298fb4

📥 Commits

Reviewing files that changed from the base of the PR and between d10789f and f3b1127.

📒 Files selected for processing (3)
  • .secrets.baseline
  • evolve/backend/postgres.py
  • tests/unit/test_postgres_backend.py
✅ Files skipped from review due to trivial changes (1)
  • .secrets.baseline

Comment thread evolve/backend/postgres.py Outdated
Comment thread evolve/backend/postgres.py Outdated
@visahak

visahak commented Apr 2, 2026

Copy link
Copy Markdown
Collaborator

@gaodan-fang can you address the reviews from code rabbit?

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
evolve/backend/postgres.py (1)

49-49: Avoid silently falling back to global settings on invalid config types.

At Line 49, unsupported config types are ignored and replaced with postgres_db_settings. This can hide caller bugs and accidentally connect to the wrong database. Prefer explicit type validation and fail fast.

♻️ Suggested refactor
-        self._settings = config if isinstance(config, type(postgres_db_settings)) else postgres_db_settings
+        if config is None:
+            self._settings = postgres_db_settings
+        elif isinstance(config, PostgresDBSettings):
+            self._settings = config
+        else:
+            raise TypeError(
+                f"`config` must be `PostgresDBSettings | None`, got `{type(config).__name__}`"
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@evolve/backend/postgres.py` at line 49, The assignment to self._settings
currently silently falls back to postgres_db_settings when config is not the
expected type; instead validate config's type explicitly in the constructor (the
code touching self._settings, config, postgres_db_settings) and raise a clear
TypeError if config is an unsupported type (include the actual type in the error
message) so callers fail fast rather than accidentally using the global
postgres_db_settings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@evolve/backend/postgres.py`:
- Line 49: The assignment to self._settings currently silently falls back to
postgres_db_settings when config is not the expected type; instead validate
config's type explicitly in the constructor (the code touching self._settings,
config, postgres_db_settings) and raise a clear TypeError if config is an
unsupported type (include the actual type in the error message) so callers fail
fast rather than accidentally using the global postgres_db_settings.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 056c68d3-e0fe-41d7-ab4a-f0a61e04eabf

📥 Commits

Reviewing files that changed from the base of the PR and between f3b1127 and a4cde37.

📒 Files selected for processing (2)
  • evolve/backend/postgres.py
  • tests/unit/test_postgres_backend.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/test_postgres_backend.py

The postgres backend now keeps the resolved settings it was constructed with, so details() reports the actual host and port for custom configurations. A regression test covers both initialization order and custom details output, and the detect-secrets baseline was refreshed by the hook.

Constraint: The repository expects Conventional Commits and pre-commit validation before commit
Rejected: Leave details() tied to global settings | breaks custom backend reporting
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep backend metadata derived from the resolved instance settings, not module globals
Tested: .venv/bin/pytest -v tests/unit/test_postgres_backend.py
Tested: .venv/bin/pre-commit run --all-files
Not-tested: Full repository test suite
The backend now records the loaded model's embedding dimension during initialization, validates it, and uses that value when creating pgvector tables. Initialization also closes the PostgreSQL connection if post-connect setup fails, and the unit tests now cover custom details reporting, cleanup on setup failure, invalid dimensions, and dimension-aware schema creation.

Constraint: The repository expects Conventional Commits and pre-commit validation before commit
Rejected: Keep a hardcoded vector width of 384 | breaks custom embedding models
Rejected: Discover the dimension lazily during writes | leaves schema creation inconsistent with the loaded model
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep the table vector width derived from the initialized model and preserve connection cleanup on setup failures
Tested: .venv/bin/pytest -v tests/unit/test_postgres_backend.py
Tested: .venv/bin/pre-commit run --all-files
Not-tested: Full repository test suite
@visahak

visahak commented Apr 2, 2026

Copy link
Copy Markdown
Collaborator

Do we need to add any configuration in the environment for postgres? https://github.com/AgentToolkit/altk-evolve/blob/main/docs/guides/configuration.md

The branch rebased cleanly onto origin/main, but detect-secrets updated its generated baseline metadata because tracked line numbers shifted during the replay. This commit captures that generated refresh so the branch stays clean after verification.

Constraint: detect-secrets rewrites the baseline when line tracking changes
Rejected: Leave the generated baseline unstaged | keeps the rebased branch dirty and fails hooks
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Treat .secrets.baseline as generated state and refresh it after history rewrites when hooks request it
Tested: .venv/bin/pytest -v tests/unit/test_postgres_backend.py
Tested: .venv/bin/pre-commit run --all-files (before baseline refresh, hook requested generated update)
Not-tested: Full repository test suite

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
evolve/backend/postgres.py (1)

49-49: Prefer using the class name directly in isinstance() check.

Using type(postgres_db_settings) is less readable than using PostgresDBSettings directly. Since the class is already imported at line 14, use it for clarity.

🔧 Suggested improvement
-        self._settings = config if isinstance(config, type(postgres_db_settings)) else postgres_db_settings
+        self._settings = config if isinstance(config, PostgresDBSettings) else postgres_db_settings
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@evolve/backend/postgres.py` at line 49, Change the isinstance check to use
the class name directly: replace the use of type(postgres_db_settings) with
PostgresDBSettings when assigning self._settings (i.e., evaluate config with
isinstance(config, PostgresDBSettings) and fall back to postgres_db_settings
otherwise) so the condition is more readable and explicit; update the expression
that sets self._settings accordingly (references: self._settings, config,
postgres_db_settings, PostgresDBSettings).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@evolve/backend/postgres.py`:
- Line 49: Change the isinstance check to use the class name directly: replace
the use of type(postgres_db_settings) with PostgresDBSettings when assigning
self._settings (i.e., evaluate config with isinstance(config,
PostgresDBSettings) and fall back to postgres_db_settings otherwise) so the
condition is more readable and explicit; update the expression that sets
self._settings accordingly (references: self._settings, config,
postgres_db_settings, PostgresDBSettings).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f6b2b6e-a461-4a90-a2a3-62d19156ecc9

📥 Commits

Reviewing files that changed from the base of the PR and between a4cde37 and aeba1cd.

📒 Files selected for processing (3)
  • .secrets.baseline
  • evolve/backend/postgres.py
  • tests/unit/test_postgres_backend.py
✅ Files skipped from review due to trivial changes (1)
  • .secrets.baseline
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/test_postgres_backend.py

visahak and others added 2 commits April 2, 2026 16:38
The configuration guide now reflects that postgres is a supported backend and documents its separate EVOLVE_PG_* environment variables instead of implying the Milvus settings apply to it.

Constraint: Documentation must match the current backend options exposed by the code
Rejected: Reuse the Milvus settings section for Postgres | variable names and semantics differ
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep backend-specific config docs aligned with the actual settings classes and prefixes
Tested: .venv/bin/pre-commit run --files docs/guides/configuration.md
Not-tested: Full repository test suite
@gaodan-fang

Copy link
Copy Markdown
Collaborator Author

Do we need to add any configuration in the environment for postgres? https://github.com/AgentToolkit/altk-evolve/blob/main/docs/guides/configuration.md

yes, we should, updated it

@visahak visahak self-requested a review April 2, 2026 23:39

@visahak visahak left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@visahak visahak merged commit a6b7b33 into main Apr 2, 2026
16 checks passed
@visahak visahak deleted the postgres-fix branch April 2, 2026 23:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants