Skip to content

fix(core): add busy_timeout + WAL to workspace SQLite connections (#648)#686

Merged
frankbria merged 5 commits into
mainfrom
fix/648-workspace-sqlite-busy-timeout-wal
Jun 15, 2026
Merged

fix(core): add busy_timeout + WAL to workspace SQLite connections (#648)#686
frankbria merged 5 commits into
mainfrom
fix/648-workspace-sqlite-busy-timeout-wal

Conversation

@frankbria

@frankbria frankbria commented Jun 15, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes #648. Core workspace SQLite connections used bare sqlite3.connect() with no busy_timeout and no WAL journaling. Under parallel batch execution (multiple subprocesses + server background agent threads writing the same workspace DB), a concurrent writer got an immediate database is locked OperationalError instead of waiting. This brings core in line with platform_store/database.py, which already sets WAL + busy_timeout=5000.

This PR also fixes the failing Deploy workflow (per request) — see the second section.

Changes (#648)

  • core/workspace.py: new private _open_db() helper applying PRAGMA journal_mode=WAL + PRAGMA busy_timeout=5000. All 6 direct connects (_init_database, _ensure_schema_upgrades, create_or_load_workspace, get_workspace, get_db_connection, update_workspace_tech_stack) route through it.
    • Deviation from the CodeRabbit plan: the helper passes db_path through unchanged rather than str(db_path). str() would coerce a non-path (e.g. a test's MagicMock) into a literal filename and silently create a junk DB file instead of raising — diverging from the prior sqlite3.connect() semantics. (str() was safe in platform_store only because its path is always real.)
  • adapters/e2b/budget.py: record_cloud_run / get_cloud_run use get_db_connection(workspace).
  • ui/routers/costs_v2.py: _query_costs / _open_workspace_conn use the shared _open_db(db_path) helper (these take db_path: str, not a Workspace, so routing through get_db_connection(workspace) as the plan suggested wasn't applicable).
  • Test: TestConcurrentWriters in tests/core/test_workspace.py — a ThreadPoolExecutor parallel-writers test (8 threads × 15 writes) that asserts no OperationalError and that all writes persist, plus a PRAGMA assertion. Verified RED before the fix (reproduced database is locked), GREEN after.

Bonus: fix failing staging/production Deploy

The Deploy workflow has been failing the health check on every push to main since #643 merged. Root cause (reproduced locally): #643 made the backend hard-fail on startup with the default AUTH_SECRET when auth is enabled (the default), but deploy.yml never wrote AUTH_SECRET into the generated .env.staging/.env.production. The server crash-looped under PM2, so /health never responded → "Verify deployment" failed after 12 attempts.

  • Write AUTH_SECRET (from the existing AUTH_SECRET GitHub Actions secret) into the generated env files for both staging and production.
  • Fail fast with a clear message if the secret is unset/empty, instead of surfacing as a 60s health-check timeout.
  • Document AUTH_SECRET in .env.staging.example.

Acceptance criteria

  • Core workspace connections set busy_timeout and WAL.
  • A concurrency test (parallel writers) no longer raises immediate database is locked.

Testing

  • uv run pytest tests/core tests/adapters tests/integration2330 passed, 2 skipped, no stray files created.
  • uv run ruff check → clean.
  • Deploy fix verified by reproducing the startup RuntimeError locally under staging-like config (no AUTH_SECRET, auth default-on, self-hosted).

Known limitations

  • The new concurrency test exercises WAL + busy_timeout via threads; it does not spawn separate OS processes (the production multi-writer scenario), but the connection-level pragmas are identical across both.
  • The Deploy fix depends on the AUTH_SECRET repo secret already being set (it is, since 2026-01-03); the new guard makes a missing secret fail loudly.

Summary by CodeRabbit

  • Documentation

    • Updated the staging environment configuration example with required AUTH_SECRET guidance, including secure random generation notes.
  • Chores

    • Staging and production deployments now inject the authentication secret and fail fast if it’s missing or blank.
    • Improved workspace SQLite connection handling (WAL + busy timeout) for more reliable budget/cost access and concurrent writer behavior.
  • Tests

    • Added concurrency tests to verify the journaling/timeout configuration and that concurrent writers wait instead of failing.

Core workspace connections used bare sqlite3.connect() with no busy_timeout
and no WAL journaling, so under parallel batch execution a concurrent writer
got an immediate "database is locked" OperationalError instead of waiting.
This matches platform_store/database.py, which already sets WAL + 5s timeout.

- Add private _open_db() helper in workspace.py (WAL + busy_timeout=5000) and
  route all 6 direct connects + get_db_connection through it. Pass db_path
  unchanged (no str() coercion) to preserve the prior connect semantics.
- Route external callers through the shared path: budget.py via
  get_db_connection(workspace); costs_v2.py via _open_db(db_path).
- Add a ThreadPoolExecutor parallel-writers test asserting no immediate
  database-locked error and that all writes persist.

Also fix the staging/production Deploy workflow, which has been failing the
health check since #643: the backend now hard-fails on the default AUTH_SECRET
when auth is enabled (the default), but deploy.yml never wrote AUTH_SECRET into
the generated env file. Write AUTH_SECRET (from the existing GitHub secret) into
.env.staging/.env.production, fail fast with a clear message when it is unset,
and document it in .env.staging.example.
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Introduces a centralized _open_db SQLite helper in workspace.py that applies journal_mode=WAL and busy_timeout=5000ms to every connection, replacing all direct sqlite3.connect calls across workspace, budget, and costs_v2 modules. Adds concurrency tests. Separately, injects AUTH_SECRET from a GitHub secret into staging and production environment files during deployment, with a fail-fast guard when the secret is absent.

Changes

SQLite WAL and busy_timeout centralization

Layer / File(s) Summary
_open_db helper and workspace.py adoption
codeframe/core/workspace.py
Adds _open_db(db_path) applying journal_mode=WAL and busy_timeout=5000ms, then replaces all direct sqlite3.connect calls in _init_database, _ensure_schema_upgrades, create_or_load_workspace, get_workspace, get_db_connection, and update_workspace_tech_stack. Exports new get_db_connection_by_path for path-based access.
budget.py and costs_v2.py adopt shared helpers
codeframe/adapters/e2b/budget.py, codeframe/ui/routers/costs_v2.py
budget.py replaces direct sqlite3.connect with get_db_connection in record_cloud_run and get_cloud_run; costs_v2.py replaces direct sqlite3.connect with get_db_connection_by_path in _query_costs and _open_workspace_conn.
Concurrency tests
tests/core/test_workspace.py
Adds v2 marker and TestConcurrentWriters class with a PRAGMA assertion test verifying WAL mode and 5s busy_timeout, and a deterministic lock-contention test where a holder thread acquires a write lock and a waiter thread verifies it blocks instead of raising database is locked, with both writes succeeding and final row count 2.

AUTH_SECRET deployment injection

Layer / File(s) Summary
AUTH_SECRET example entry
.env.staging.example
Adds AUTH_SECRET placeholder with generation instructions and a note that the backend hard-fails on startup when it is unset.
CI/CD fail-fast guard and env-file injection
.github/workflows/deploy.yml
Exports ENV_AUTH_SECRET from the GitHub secret for both staging and production "Create environment file" steps, adds bash fail-fast guards that abort deployment when the value is empty or whitespace, and appends AUTH_SECRET=${ENV_AUTH_SECRET} to the generated .env.staging and .env.production files.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • frankbria/codeframe#95: Both PRs modify .github/workflows/deploy.yml to generate staging and production .env files and gate deployments on required environment variable values.

Poem

🐇 Hop hop, no more locked rows at night,
WAL mode shines like a warm morning light.
Five seconds to wait before giving up the fight,
And AUTH_SECRET guards each deploy just right.
The rabbit writes tests so the threads won't bite! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(core): add busy_timeout + WAL to workspace SQLite connections (#648)' accurately and specifically describes the primary change in the PR.
Linked Issues check ✅ Passed All acceptance criteria from issue #648 are met: core workspace connections now set busy_timeout and WAL via _open_db() helper, and TestConcurrentWriters validates parallel writers no longer fail with database locked errors.
Out of Scope Changes check ✅ Passed The Deploy workflow AUTH_SECRET fix in .env.staging.example and deploy.yml is related to issue context but not explicitly required by #648 acceptance criteria; however, it is justified as a prerequisite for testing parallel batch execution in production.
Docstring Coverage ✅ Passed Docstring coverage is 88.24% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

📋 Issue Planner

Built with CodeRabbit's Coding Plans for faster development and fewer bugs.

View plan used: #648

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/648-workspace-sqlite-busy-timeout-wal

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude

claude Bot commented Jun 15, 2026

Copy link
Copy Markdown

Code Review — fix(core): busy_timeout + WAL for workspace SQLite (#648)

This is a clean, well-scoped fix. The root cause is correctly identified, the centralisation via _open_db() is the right pattern, and the parallel-writers test gives real regression coverage. A few items worth calling out:


Correctness / Architecture

costs_v2.py imports a private symbol from core (_open_db)

codeframe/ui/routers/costs_v2.py now does:

from codeframe.core.workspace import Workspace, _open_db

The PR description explains why get_db_connection(workspace) wasn't directly usable here (these helpers take db_path: str, not a Workspace), but importing a name-mangled private function across module boundaries is a code smell — it will silently break if _open_db is ever renamed or inlined.

The simplest fix: drop the leading underscore and make it open_db(). It is already used in two separate packages, so it is effectively public API. Alternatively, add a thin public wrapper get_db_connection_by_path(db_path) -> sqlite3.Connection in workspace.py and import that instead.


Test coverage

Missing @pytest.mark.v2 on the new class (if not set module-wide)

CLAUDE.md requires @pytest.mark.v2 on new v2 tests. The diff does not show the top of tests/core/test_workspace.py, so I cannot confirm whether pytestmark = pytest.mark.v2 is already set at module level. If it is not, TestConcurrentWriters should carry it explicitly:

class TestConcurrentWriters:
    pytestmark = pytest.mark.v2

Thread-based test vs. process-based reality (acknowledged)

The PR description notes this limitation clearly — threads share a process and the GIL, so contention is lighter than the production multi-process scenario. The 8-thread x 15-write x 1ms sleep approach is a reasonable proxy. Consider adding a one-line comment inside the test itself referencing this, so future readers do not assume it covers multi-process contention.

OperationalError catch scope is broad

except sqlite3.OperationalError as exc:  # pragma: no cover - failure path
    errors.append(exc)

Any OperationalError (schema error, disk full) would appear here as a WAL failure. Narrowing to 'database is locked' in str(exc) before appending — or at least including that check in the final assert not errors message — would give a clearer diagnostic if the test ever fails for an unrelated reason.


Deploy workflow

The fail-fast guard is correct and the error message is actionable. The check is duplicated verbatim across the staging and production jobs, which is fine for two environments; a composite action would be worth it if a third environment is ever added.


What is good

  • _open_db() passes db_path through unchanged — the MagicMock reasoning in the comment is exactly right.
  • All 6 callsites in workspace.py are migrated consistently.
  • budget.py correctly reuses the public get_db_connection(workspace) rather than reaching into _open_db directly.
  • WAL + busy_timeout settings mirror platform_store/database.py, keeping the two codepaths in sync.
  • Test was verified RED before the fix and GREEN after — that is the right bar.

Verdict

Approve with minor follow-up suggested. The one item worth addressing before merge is making _open_db public (rename to open_db) since it already crosses module boundaries into costs_v2.py. Everything else is solid.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/deploy.yml (1)

83-91: 💤 Low value

Consider enhancing the fail-fast guard to catch whitespace-only secrets.

The current check uses -z "${ENV_AUTH_SECRET}" which only catches truly empty values. Whitespace-only secrets would pass this CI check but fail later at backend startup when _read_auth_secret() strips the whitespace and falls back to DEFAULT_SECRET (see codeframe/auth/manager.py:26-28). While the backend will still refuse to start insecurely, catching it earlier in CI would provide clearer feedback.

🛡️ Optional enhancement to catch whitespace-only secrets

For staging (line 86):

-          if [ -z "${ENV_AUTH_SECRET}" ]; then
+          if [ -z "${ENV_AUTH_SECRET}" ] || [ -z "${ENV_AUTH_SECRET// /}" ]; then
             echo "❌ AUTH_SECRET GitHub Actions secret is not set (or empty)."

For production (line 314):

-          if [ -z "${ENV_AUTH_SECRET}" ]; then
+          if [ -z "${ENV_AUTH_SECRET}" ] || [ -z "${ENV_AUTH_SECRET// /}" ]; then
             echo "❌ AUTH_SECRET GitHub Actions secret is not set (or empty)."

Or use a more readable pattern:

if [ -z "${ENV_AUTH_SECRET}" ] || ! echo "${ENV_AUTH_SECRET}" | grep -q '[^[:space:]]'; then

Based on learnings from codeframe/auth/manager.py, which treats blank/whitespace AUTH_SECRET values as unset and falls back to the default secret, making the startup guard catch them as a hard-fail in hosted mode.

Also applies to: 311-319

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/deploy.yml around lines 83 - 91, The ENV_AUTH_SECRET
validation check in both the staging and production deployment sections
currently only catches empty values using the `-z` test, but whitespace-only
secrets would pass this check and fail later when the backend's
_read_auth_secret() function strips whitespace and falls back to DEFAULT_SECRET.
Enhance the condition at the staging check (around line 86) and the production
check (around line 314) to also detect whitespace-only secrets by adding an
additional test that verifies the secret contains at least one non-whitespace
character, either by using a grep pattern like `! echo "${ENV_AUTH_SECRET}" |
grep -q '[^[:space:]]'` or another suitable bash pattern that validates the
secret is not just whitespace before proceeding with the deployment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/core/test_workspace.py`:
- Around line 180-249: The TestConcurrentWriters class at the beginning of the
diff is missing the required pytest v2 marker decorator. Add the `@pytest.mark.v2`
decorator to the TestConcurrentWriters class to ensure these concurrency
regression tests are included when running tests with marker-based selection
(e.g., uv run pytest -m v2), as required by the coding guidelines for Python
tests in the tests directory.

---

Nitpick comments:
In @.github/workflows/deploy.yml:
- Around line 83-91: The ENV_AUTH_SECRET validation check in both the staging
and production deployment sections currently only catches empty values using the
`-z` test, but whitespace-only secrets would pass this check and fail later when
the backend's _read_auth_secret() function strips whitespace and falls back to
DEFAULT_SECRET. Enhance the condition at the staging check (around line 86) and
the production check (around line 314) to also detect whitespace-only secrets by
adding an additional test that verifies the secret contains at least one
non-whitespace character, either by using a grep pattern like `! echo
"${ENV_AUTH_SECRET}" | grep -q '[^[:space:]]'` or another suitable bash pattern
that validates the secret is not just whitespace before proceeding with the
deployment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9a915826-9c38-45ca-bf0e-dd180f7e96dd

📥 Commits

Reviewing files that changed from the base of the PR and between bb393b7 and 219df06.

📒 Files selected for processing (6)
  • .env.staging.example
  • .github/workflows/deploy.yml
  • codeframe/adapters/e2b/budget.py
  • codeframe/core/workspace.py
  • codeframe/ui/routers/costs_v2.py
  • tests/core/test_workspace.py

Comment thread tests/core/test_workspace.py Outdated
The original ThreadPoolExecutor stress test (8 writers) was flaky: SQLite's
busy handler is not fair, so under heavy contention one writer can be starved
past the 5s busy_timeout and still raise "database is locked" — even with the
fix. Investigation also showed Python's sqlite3.connect already defaults to a
5s busy_timeout, so the substantive fix here is enabling WAL journaling (which
removes the rollback-journal reader/writer case that fails immediately).

Replace the stress test with a deterministic test: one thread holds the write
lock (BEGIN IMMEDIATE) briefly while a second writer must wait for it and then
succeed. Update the _open_db docstring to reflect that WAL is the real change.
Addresses CodeRabbit review: _read_auth_secret() treats blank/whitespace-only
AUTH_SECRET as unset and falls back to the default secret, so a whitespace-only
GitHub secret would pass the `-z` guard yet still hard-fail the backend at
startup. Catch it in CI for clearer feedback (both staging and production).
@frankbria

Copy link
Copy Markdown
Owner Author

Thanks @coderabbitai — triaged both findings:

1. @pytest.mark.v2 on TestConcurrentWriters — skipping (false positive for this repo).
tests/core/conftest.py already auto-marks every test in tests/core/ as v2 via pytest_collection_modifyitems. Verified: uv run pytest tests/core/test_workspace.py -m v2 --co collects both new tests. Adding an explicit marker would be redundant and inconsistent with the rest of test_workspace.py (no class in the file carries an explicit v2 marker).

2. Whitespace-only AUTH_SECRET — fixed in 6a4ab84. Both the staging and production guards now reject blank/whitespace-only values (! printf '%s' "$ENV_AUTH_SECRET" | grep -q '[^[:space:]]'), matching _read_auth_secret()'s blank-is-unset handling so it fails fast in CI instead of at backend startup.

Also note: while addressing this I found Python's sqlite3.connect() already defaults timeout=5.0 (busy_timeout=5000), so the substantive change in this PR is WAL journaling (it removes the rollback-journal reader/writer case that fails immediately). The flaky 8-thread stress test was replaced with a deterministic held-lock test (commit 0b5ff85), since SQLite's busy handler isn't fair and can starve a writer past any fixed timeout.

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@frankbria: Thanks for the thorough triage and the technical clarification.

  1. @pytest.mark.v2 — withdrawing this finding. The auto-marking in tests/core/conftest.py via pytest_collection_modifyitems is a clean pattern and the explicit marker would indeed be redundant noise. Good call skipping it.

  2. Whitespace-only AUTH_SECRET — acknowledged. The printf | grep '[^[:space:]]' guard is the right level of strictness and correctly mirrors _read_auth_secret()'s semantics.

  3. SQLite timeout note — that's a useful clarification. The fact that sqlite3.connect(timeout=5.0) is the Python default means the existing codebase was already getting 5 s of spinning under the rollback-journal mode; what it was missing was WAL, which eliminates the reader-blocks-writer scenario entirely. The deterministic held-lock test is also a much more reliable signal than a thread-count stress test against a fair-scheduler assumption.

🐇 ✨


✏️ Learnings added
Learnt from: frankbria
URL: https://github.com/frankbria/codeframe/pull/686

Timestamp: 2026-06-15T21:22:13.719Z
Learning: In the `frankbria/codeframe` repository, `tests/core/conftest.py` auto-marks every test under `tests/core/` as `v2` via `pytest_collection_modifyitems`. Explicit `pytest.mark.v2` markers are therefore redundant for any test class or function placed in that directory; no test class in `tests/core/test_workspace.py` carries an explicit `v2` marker.

Learnt from: frankbria
URL: https://github.com/frankbria/codeframe/pull/686

Timestamp: 2026-06-15T21:22:13.719Z
Learning: Python's `sqlite3.connect()` already defaults `timeout=5.0`, which is equivalent to a 5000 ms busy_timeout. In the `frankbria/codeframe` repository's `codeframe/core/workspace.py` refactor (PR `#686`), the main concurrency improvement from adding `_open_db()` is enabling WAL journaling (`PRAGMA journal_mode=WAL`), not the busy_timeout—WAL removes the rollback-journal mode where readers can block writers immediately.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@claude

claude Bot commented Jun 15, 2026

Copy link
Copy Markdown

Follow-up review (commits 2 & 3)

Addressed since my last review:

  • Whitespace-only AUTH_SECRET guard added in commit 3 — correct approach using printf '%s' | grep -q '[^[:space:]]'.
  • Flaky stress test replaced with a deterministic lock-hold test in commit 2 — much better. The docstring explains the design trade-off clearly.
  • _open_db docstring now correctly notes that busy_timeout is set for explicitness but the real fix is WAL.

Two items still open:

1. _open_db is still a private symbol imported across module boundaries (costs_v2.py)

This was flagged in my first review and remains unchanged. Since _open_db is now consumed by two separate packages (adapters/ via get_db_connection and ui/ directly), it is effectively public API. Renaming it to open_db is a one-line change with no functional risk. Without it, any internal refactor that renames or inlines _open_db will silently break costs_v2.py (and won't surface as a linter warning since Python doesn't enforce private-by-convention across packages).

2. TestConcurrentWriters is missing the @pytest.mark.v2 marker

Per CLAUDE.md: "New v2 tests: add @pytest.mark.v2 or pytestmark = pytest.mark.v2 at module level."

The existing module (test_workspace.py) has no module-level pytestmark, and the new class adds none either. Running uv run pytest -m v2 will skip both new tests. CodeRabbit flagged this inline; it still applies. Add @pytest.mark.v2 at the class level, or add a module-level pytestmark if that is intentional for the pre-existing tests too.


Everything else looks good. These two items are the only blockers before merge.

@claude

claude Bot commented Jun 15, 2026

Copy link
Copy Markdown

Follow-up Review

Picking up from my previous review — good iteration. Here is where things stand.

What's resolved

  • Whitespace-only AUTH_SECRET guard — the printf | grep '[^[:space:]]' pattern in the deploy workflow now correctly matches _read_auth_secret()'s blank-is-unset semantics. Solid fix.
  • @pytest.mark.v2 — the conftest auto-marking explanation is correct and consistent with the rest of test_workspace.py. No change needed.
  • Deterministic lock test — replacing the aggressively-contending 8-thread stress test with a single held-lock test is the right call. SQLite's unfair busy handler means the stress test was a flakiness risk, not a correctness signal. The held-lock test directly tests what matters (waiter does not fail immediately), and the 0.5 s hold is comfortably under the 5 s timeout.
  • Python default timeout clarificationsqlite3.connect(timeout=5.0) is already the Python default, so the headline improvement here is WAL mode, not the busy_timeout pragma. Good to have this on record.

Still outstanding from my previous review

costs_v2.py imports the private _open_db symbol — unchanged in the current diff:

from codeframe.core.workspace import Workspace, _open_db

_open_db is now imported by two separate packages (adapters/e2b/budget.py via get_db_connection, and ui/routers/costs_v2.py directly), so the underscore prefix is actively misleading. The fix is a one-line rename in workspace.py plus one import update in costs_v2.py:

# workspace.py — drop the underscore
def open_db(db_path: str | Path) -> sqlite3.Connection: ...

# costs_v2.py
from codeframe.core.workspace import Workspace, open_db

This does not need to block merge, but it is worth a follow-up. A private-named function crossing a module boundary means anyone refactoring workspace.py has no signal it is used externally.

Verdict

Ready to merge. The core fix (WAL centralisation across all workspace connections) is correct and well-tested. The deploy bonus fix is also correct. The _open_db visibility issue is a naming concern only, not a correctness risk — tracking it as a follow-up is sufficient.

…module import

Addresses Claude PR review: costs_v2.py imported the private _open_db() across
package boundaries (fragile if renamed/inlined). Add a public
get_db_connection_by_path(db_path) accessor in workspace.py (parallels
get_db_connection but takes a raw path, for the costs router's fresh/locked-DB
tolerant helpers) and route costs_v2.py through it. _open_db stays the private
primitive used internally.
@frankbria

Copy link
Copy Markdown
Owner Author

Thanks @claude — addressed the actionable item:

Private cross-module import (_open_db) → fixed in 1aeca53. Added a public get_db_connection_by_path(db_path) in workspace.py (parallels get_db_connection but takes a raw path, for the costs router's fresh/locked-DB tolerant helpers); costs_v2.py now uses it. _open_db stays the private primitive used internally.

The other points are resolved by the test rework (commit 0b5ff85): the broad OperationalError catch and thread-vs-process stress test are gone — replaced with a deterministic held-lock test. And @pytest.mark.v2 is already applied automatically by tests/core/conftest.py (verified via pytest -m v2 --co), so an explicit marker would be redundant.

@claude

claude Bot commented Jun 15, 2026

Copy link
Copy Markdown

Claude finished @frankbria's task in 53s —— View job


Follow-up Review

  • Read changed files to verify the get_db_connection_by_path fix in commit 1aeca53
  • Confirm costs_v2.py no longer imports private _open_db
  • Confirm workspace.py exposes the new public function correctly
  • Final verdict

All previous review items resolved. Here's the final state:

_open_db cross-module visibility (commit 1aeca53) — resolved correctly. workspace.py:867 now exposes get_db_connection_by_path(db_path) as a thin public wrapper over _open_db, parallel to get_db_connection(workspace). costs_v2.py:26 imports get_db_connection_by_path — no more private symbol crossing module boundaries. _open_db stays as the private primitive it should be.

Deterministic lock test (commit 0b5ff85) — confirmed resolved in the prior round. The broad OperationalError catch and thread-pool stress test are gone.

@pytest.mark.v2 — auto-applied by tests/core/conftest.py. No change needed.

Verdict: ready to merge. The WAL centralisation is correct, well-tested, and the module boundary is now clean.

@claude

claude Bot commented Jun 15, 2026

Copy link
Copy Markdown

Follow-up Review

Both issues flagged in my initial review have been addressed:

Addressed: Private symbol import fixed - get_db_connection_by_path is now a proper public function in workspace.py, and costs_v2.py imports it correctly. _open_db correctly remains private and never crosses module boundaries.

Addressed: Whitespace-only secret guard added - The deploy workflow now uses printf '%s' "${ENV_AUTH_SECRET}" | grep -q '[^[:space:]]' to catch secrets that are blank or whitespace-only, matching the CodeRabbit suggestion and guarding against the backend's own whitespace-stripping behaviour.


One remaining nit: pytest.mark.v2 on TestConcurrentWriters

CLAUDE.md requires new v2 tests to carry this marker. The module has no module-level pytestmark = pytest.mark.v2, and the existing classes predate the convention - but the new class should set the example:

class TestConcurrentWriters:
    pytestmark = pytest.mark.v2
    """Concurrency guards on workspace connections (issue #648)."""

Class-level pytestmark is idiomatic pytest and does not require a module-level opt-in.


Verdict: Approve. Everything else is solid - the public wrapper pattern, the WAL+busy_timeout centralisation, the deterministic lock-contention test, and the fail-fast deploy guard are all good work. The marker nit can be addressed here or in a follow-up.

CLAUDE.md asks new v2 tests to carry the marker explicitly (module-level
pytestmark), even though tests/core/ is also auto-marked by conftest. Addresses
the repeated CodeRabbit finding by following the documented project convention.
@claude

claude Bot commented Jun 15, 2026

Copy link
Copy Markdown

Final Review - all previous items resolved.

Reviewing the current diff against all outstanding items from the review thread.

Correction on my previous nit about pytest.mark.v2:

The module-level pytestmark = pytest.mark.v2 I flagged in my last comment as still outstanding is already present in this diff. That was a misread — the marker was added to test_workspace.py at module level. Combined with the auto-marking in tests/core/conftest.py, TestConcurrentWriters is doubly covered. No change needed.

All items resolved:

  • Private _open_db crossing module boundary: Fixed via get_db_connection_by_path public wrapper; costs_v2.py imports it cleanly, _open_db stays private
  • Flaky 8-thread stress test: Fixed — replaced with deterministic held-lock test (correct design choice)
  • Whitespace-only AUTH_SECRET guard: Fixed — printf | grep pattern mirrors the backend blank-is-unset check
  • pytest.mark.v2 on TestConcurrentWriters: Present — module-level pytestmark is in the diff
  • _open_db docstring accuracy: Correct — WAL is the real fix; busy_timeout is explicit-for-clarity since Python sqlite3.connect already defaults to timeout=5.0

Verdict: Approve and merge. The WAL centralisation is correct and well-tested, module boundaries are clean, and the fail-fast deploy guard is solid. Nothing outstanding.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/core/test_workspace.py (1)

227-255: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the holder lock until the waiter is actually contending.

The fixed time.sleep(0.5) leaves a false-positive path: if the test thread is paused after Line 244, the holder can commit before Line 251 runs, so the test passes without proving that the second writer waited on a held lock. Hold the transaction with a release event, run the waiter in its own thread, assert it does not finish immediately while the lock is held, then release and join both threads.

🐛 Proposed deterministic contention check
-        lock_acquired = threading.Event()
-        hold_seconds = 0.5  # comfortably under the 5s busy timeout
+        lock_acquired = threading.Event()
+        release_lock = threading.Event()
+        waiter_finished = threading.Event()
+        errors: list[BaseException] = []

         def hold_write_lock() -> None:
             conn = get_db_connection(ws)
             conn.isolation_level = None  # take manual control of the transaction
             try:
                 conn.execute("BEGIN IMMEDIATE")
                 conn.execute("INSERT INTO lock_probe (id, label) VALUES (1, 'holder')")
                 lock_acquired.set()
-                time.sleep(hold_seconds)
+                assert release_lock.wait(timeout=5), "test never released holder lock"
                 conn.execute("COMMIT")
+            except BaseException as exc:
+                errors.append(exc)
+                raise
             finally:
                 conn.close()
 
         holder = threading.Thread(target=hold_write_lock)
         holder.start()
         assert lock_acquired.wait(timeout=5), "holder never acquired the write lock"
 
-        # The lock is held now. Without the busy timeout this write would raise
-        # ``database is locked`` immediately; with it, the write blocks until the
-        # holder commits (~hold_seconds) and then succeeds.
-        conn2 = get_db_connection(ws)
-        try:
-            conn2.execute("INSERT INTO lock_probe (id, label) VALUES (2, 'waiter')")
-            conn2.commit()
-        finally:
-            conn2.close()
-        holder.join(timeout=5)
+        def wait_for_write_lock() -> None:
+            conn2 = get_db_connection(ws)
+            try:
+                conn2.execute("INSERT INTO lock_probe (id, label) VALUES (2, 'waiter')")
+                conn2.commit()
+            except BaseException as exc:
+                errors.append(exc)
+                raise
+            finally:
+                conn2.close()
+                waiter_finished.set()
+
+        waiter = threading.Thread(target=wait_for_write_lock)
+        waiter.start()
+        time.sleep(0.1)
+        assert not waiter_finished.is_set(), "waiter finished while holder still held lock"
+
+        release_lock.set()
+        holder.join(timeout=5)
+        waiter.join(timeout=5)
+        assert not holder.is_alive(), "holder thread did not finish"
+        assert not waiter.is_alive(), "waiter thread did not finish"
+        assert not errors
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/core/test_workspace.py` around lines 227 - 255, The test uses a fixed
sleep duration in hold_write_lock() to hold a write lock, but this creates a
race condition: if the test thread pauses after the lock is acquired, the holder
can commit before the waiter actually attempts to write, resulting in a
false-positive test pass. Replace the time.sleep(hold_seconds) call in
hold_write_lock() with a wait on a new threading.Event (release_lock), create a
separate thread to run the waiter code (the conn2 insert and commit), use
threading.Event.wait(timeout=...) to assert the waiter thread does not complete
immediately while the holder lock is held, then signal the release_lock event to
allow the holder to commit and join both threads to verify they completed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tests/core/test_workspace.py`:
- Around line 227-255: The test uses a fixed sleep duration in hold_write_lock()
to hold a write lock, but this creates a race condition: if the test thread
pauses after the lock is acquired, the holder can commit before the waiter
actually attempts to write, resulting in a false-positive test pass. Replace the
time.sleep(hold_seconds) call in hold_write_lock() with a wait on a new
threading.Event (release_lock), create a separate thread to run the waiter code
(the conn2 insert and commit), use threading.Event.wait(timeout=...) to assert
the waiter thread does not complete immediately while the holder lock is held,
then signal the release_lock event to allow the holder to commit and join both
threads to verify they completed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 412c23a4-e6cd-4ecb-9665-7f73a2b13938

📥 Commits

Reviewing files that changed from the base of the PR and between 1aeca53 and 3ca36c3.

📒 Files selected for processing (1)
  • tests/core/test_workspace.py

@frankbria frankbria merged commit 461d353 into main Jun 15, 2026
17 checks passed
@frankbria frankbria deleted the fix/648-workspace-sqlite-busy-timeout-wal branch June 15, 2026 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[P6.6.1] Add busy_timeout + WAL to core workspace SQLite connections

1 participant