fix(dab): restart dab-postgres on-failure (mirror the dab-mongo fix)#12

Merged: kentwelcome merged 1 commit into main from fix/dab-postgres-restart-on-failure on Jun 16, 2026

Conversation

@kentwelcome

Contributor

Problem

The dab-postgres service in the DAB plugin's compose generation has a pg_isready healthcheck and a service_healthy startup gate, but no restart: policy — unlike dab-mongo, which already has restart: on-failure.

When postgres:17 dies mid-run (crash/OOM), it is never restarted, the container leaves dab-net, and dab-postgres stops resolving for the rest of the trial (could not translate host name "dab-postgres": Temporary failure in name resolution). For hybrid Postgres+DuckDB datasets like PANCANCER_ATLAS, the clinical data then becomes unreachable and the agent can only abstain.

Observed impact

In a codex/gpt-5.5 full DAB run, PANCANCER_ATLAS q2 and q3 both returned UNABLE TO DETERMINE (reward 0). Trial evidence:

  • pre-run healthcheck passed (dab-postgres:5432 reachable at start),
  • mid-run the worker hit repeated could not translate host name "dab-postgres" on every psycopg2 retry,
  • a sibling worker (q1) in the same run reached dab-postgres fine minutes earlier → transient, restart-drops-DNS signature.

These two queries alone account for ~0.056 of stratified pass@1 (≈ the whole observed gap vs the Opus baseline on this surface).
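
The name-resolution signature in the trial evidence can be told apart from an ordinary refused connection with a small probe. The following is a minimal sketch using only the Python standard library; the function name `classify_db_endpoint` and its return labels are hypothetical and not part of the plugin.

```python
import socket


def classify_db_endpoint(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify why a TCP endpoint such as dab-postgres:5432 is (un)reachable.

    Hypothetical helper for illustration only; the labels are assumptions.
    """
    try:
        # Name resolution is the step that fails once the container has left
        # the network and its DNS entry is gone.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"

    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "reachable"
        except OSError:
            # The name resolved but this address did not accept a connection.
            continue
    return "resolved-but-unreachable"
```

An "unresolvable" result would match the `could not translate host name` error seen mid-run, whereas a database that is merely restarting would more likely show up as "resolved-but-unreachable".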

Fix

Add "restart": "on-failure" to the dab-postgres service, mirroring the existing dab-mongo fix. The populated data dir comes straight back up and main's service_healthy gate recovers instead of the trial losing the DB for good. No other change — the healthcheck and depends_on: {condition: service_healthy} were already present.
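
For readers without the diff at hand, the shape of the change can be sketched as a Python dict that would be serialized to compose YAML. This is an illustration under stated assumptions, not the plugin's actual code: the function names, the healthcheck command arguments and timings, and the layout of the `main` service are hypothetical. Only the `postgres:17` image, the `dab-net` network, the `pg_isready` healthcheck, the `service_healthy` gate, and the `"restart": "on-failure"` entry come from the description above.

```python
from typing import Any


def build_postgres_service() -> dict[str, Any]:
    """Hypothetical builder for the dab-postgres compose service."""
    return {
        "image": "postgres:17",
        "networks": ["dab-net"],
        "healthcheck": {
            # Assumed command and timings; the real values are not shown in the PR.
            "test": ["CMD-SHELL", "pg_isready -U postgres"],
            "interval": "5s",
            "timeout": "3s",
            "retries": 10,
        },
        # The one-line fix: restart the container if it exits with an error.
        "restart": "on-failure",
    }


def build_compose() -> dict[str, Any]:
    """Hypothetical top-level compose document with the startup gate on main."""
    return {
        "services": {
            "dab-postgres": build_postgres_service(),
            "main": {
                "networks": ["dab-net"],
                "depends_on": {
                    "dab-postgres": {"condition": "service_healthy"},
                },
            },
        },
        "networks": {"dab-net": {}},
    }
```

Compose's `on-failure` policy restarts a container only when it exits with a non-zero code, which covers the crash and OOM cases described here while leaving a deliberate clean stop alone.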

Test

Adds test_postgres_has_restart_policy (mirrors test_mongo_has_restart_and_cache_cap). test_compose_postgres.py + test_compose_mongo.py → 11 passed.
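
A test of this kind might look roughly like the sketch below. The test body in the PR is not shown, so everything apart from the test name and the asserted `on-failure` value is an assumption; `emit_compose_stub` is a hypothetical stand-in for the plugin's real compose generator.

```python
from typing import Any


def emit_compose_stub() -> dict[str, Any]:
    """Hypothetical stand-in for the plugin's compose generator."""
    return {
        "services": {
            "dab-postgres": {
                "image": "postgres:17",
                "healthcheck": {"test": ["CMD-SHELL", "pg_isready"]},
                "restart": "on-failure",
            },
        },
    }


def test_postgres_has_restart_policy() -> None:
    service = emit_compose_stub()["services"]["dab-postgres"]
    # The regression guard: the restart policy must be present and exact.
    assert service.get("restart") == "on-failure"
    # The pre-existing healthcheck should still be emitted alongside it.
    assert "healthcheck" in service
```

Asserting on the parsed structure instead of on raw YAML text keeps the check independent of key ordering and quoting in the emitted file.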

🤖 Generated with Claude Code

…lose the DB

The dab-postgres service had a pg_isready healthcheck and a service_healthy
startup gate but no restart policy (unlike dab-mongo). When postgres:17 dies
mid-run (crash/OOM) it is never restarted, the container leaves dab-net, and
`dab-postgres` stops resolving for the rest of the trial ('could not translate
host name dab-postgres'). For hybrid pg+duckdb datasets like PANCANCER_ATLAS the
clinical data then becomes unreachable and the agent can only abstain — observed
on PANCANCER_ATLAS q2/q3 (both 'UNABLE TO DETERMINE', reward 0), which alone
account for ~0.056 of stratified pass@1.

Add 'restart: on-failure' to dab-postgres, mirroring the dab-mongo fix, so the
populated data dir comes straight back up and main's healthcheck recovers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 16, 2026 16:50

Copilot AI left a comment

Pull request overview

This PR improves the reliability of the DAB plugin’s generated docker-compose stack by ensuring the dab-postgres container is automatically restarted if it crashes mid-run, matching the existing behavior for dab-mongo.

Changes:

  • Add restart: on-failure to the generated dab-postgres service in compose.py.
  • Add a unit test asserting the postgres service includes the restart policy.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| packages/razorback-plugin-dab/src/razorback_plugin_dab/generate/compose.py | Adds restart: on-failure to the generated dab-postgres service to recover from mid-run crashes/OOMs. |
| packages/razorback-plugin-dab/tests/unit/test_compose_postgres.py | Adds a unit test to enforce the postgres restart policy in emitted compose YAML. |

@kentwelcome merged commit f4438e3 into main on Jun 16, 2026 (1 check passed)
@kentwelcome deleted the fix/dab-postgres-restart-on-failure branch on June 16, 2026 16:52
2 participants