fix(core): add busy_timeout + WAL to workspace SQLite connections (#648) by frankbria · Pull Request #686 · frankbria/codeframe

frankbria · 2026-06-15T20:42:04Z

Summary

Fixes #648. Core workspace SQLite connections used bare sqlite3.connect() with no busy_timeout and no WAL journaling. Under parallel batch execution (multiple subprocesses + server background agent threads writing the same workspace DB), a concurrent writer got an immediate database is locked OperationalError instead of waiting. This brings core in line with platform_store/database.py, which already sets WAL + busy_timeout=5000.

This PR also fixes the failing Deploy workflow (per request) — see the second section.

Changes (#648)

core/workspace.py: new private _open_db() helper applying PRAGMA journal_mode=WAL + PRAGMA busy_timeout=5000. All 6 direct connects (_init_database, _ensure_schema_upgrades, create_or_load_workspace, get_workspace, get_db_connection, update_workspace_tech_stack) route through it.
- Deviation from the CodeRabbit plan: the helper passes db_path through unchanged rather than str(db_path). str() would coerce a non-path (e.g. a test's MagicMock) into a literal filename and silently create a junk DB file instead of raising — diverging from the prior sqlite3.connect() semantics. (str() was safe in platform_store only because its path is always real.)
adapters/e2b/budget.py: record_cloud_run / get_cloud_run use get_db_connection(workspace).
ui/routers/costs_v2.py: _query_costs / _open_workspace_conn use the shared _open_db(db_path) helper (these take db_path: str, not a Workspace, so routing through get_db_connection(workspace) as the plan suggested wasn't applicable).
Test: TestConcurrentWriters in tests/core/test_workspace.py — a ThreadPoolExecutor parallel-writers test (8 threads × 15 writes) that asserts no OperationalError and that all writes persist, plus a PRAGMA assertion. Verified RED before the fix (reproduced database is locked), GREEN after.

Bonus: fix failing staging/production Deploy

The Deploy workflow has been failing the health check on every push to main since #643 merged. Root cause (reproduced locally): #643 made the backend hard-fail on startup with the default AUTH_SECRET when auth is enabled (the default), but deploy.yml never wrote AUTH_SECRET into the generated .env.staging/.env.production. The server crash-looped under PM2, so /health never responded → "Verify deployment" failed after 12 attempts.

Write AUTH_SECRET (from the existing AUTH_SECRET GitHub Actions secret) into the generated env files for both staging and production.
Fail fast with a clear message if the secret is unset/empty, instead of surfacing as a 60s health-check timeout.
Document AUTH_SECRET in .env.staging.example.

Acceptance criteria

Core workspace connections set busy_timeout and WAL.
A concurrency test (parallel writers) no longer raises immediate database is locked.

Testing

uv run pytest tests/core tests/adapters tests/integration → 2330 passed, 2 skipped, no stray files created.
uv run ruff check → clean.
Deploy fix verified by reproducing the startup RuntimeError locally under staging-like config (no AUTH_SECRET, auth default-on, self-hosted).

Known limitations

The new concurrency test exercises WAL + busy_timeout via threads; it does not spawn separate OS processes (the production multi-writer scenario), but the connection-level pragmas are identical across both.
The Deploy fix depends on the AUTH_SECRET repo secret already being set (it is, since 2026-01-03); the new guard makes a missing secret fail loudly.

Summary by CodeRabbit

Documentation
- Updated the staging environment configuration example with required AUTH_SECRET guidance, including secure random generation notes.
Chores
- Staging and production deployments now inject the authentication secret and fail fast if it’s missing or blank.
- Improved workspace SQLite connection handling (WAL + busy timeout) for more reliable budget/cost access and concurrent writer behavior.
Tests
- Added concurrency tests to verify the journaling/timeout configuration and that concurrent writers wait instead of failing.

Core workspace connections used bare sqlite3.connect() with no busy_timeout and no WAL journaling, so under parallel batch execution a concurrent writer got an immediate "database is locked" OperationalError instead of waiting. This matches platform_store/database.py, which already sets WAL + 5s timeout. - Add private _open_db() helper in workspace.py (WAL + busy_timeout=5000) and route all 6 direct connects + get_db_connection through it. Pass db_path unchanged (no str() coercion) to preserve the prior connect semantics. - Route external callers through the shared path: budget.py via get_db_connection(workspace); costs_v2.py via _open_db(db_path). - Add a ThreadPoolExecutor parallel-writers test asserting no immediate database-locked error and that all writes persist. Also fix the staging/production Deploy workflow, which has been failing the health check since #643: the backend now hard-fails on the default AUTH_SECRET when auth is enabled (the default), but deploy.yml never wrote AUTH_SECRET into the generated env file. Write AUTH_SECRET (from the existing GitHub secret) into .env.staging/.env.production, fail fast with a clear message when it is unset, and document it in .env.staging.example.

coderabbitai · 2026-06-15T20:42:18Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Introduces a centralized _open_db SQLite helper in workspace.py that applies journal_mode=WAL and busy_timeout=5000ms to every connection, replacing all direct sqlite3.connect calls across workspace, budget, and costs_v2 modules. Adds concurrency tests. Separately, injects AUTH_SECRET from a GitHub secret into staging and production environment files during deployment, with a fail-fast guard when the secret is absent.

Changes

SQLite WAL and busy_timeout centralization

Layer / File(s)	Summary
`_open_db` helper and workspace.py adoption `codeframe/core/workspace.py`	Adds `_open_db(db_path)` applying `journal_mode=WAL` and `busy_timeout=5000ms`, then replaces all direct `sqlite3.connect` calls in `_init_database`, `_ensure_schema_upgrades`, `create_or_load_workspace`, `get_workspace`, `get_db_connection`, and `update_workspace_tech_stack`. Exports new `get_db_connection_by_path` for path-based access.
budget.py and costs_v2.py adopt shared helpers `codeframe/adapters/e2b/budget.py`, `codeframe/ui/routers/costs_v2.py`	`budget.py` replaces direct `sqlite3.connect` with `get_db_connection` in `record_cloud_run` and `get_cloud_run`; `costs_v2.py` replaces direct `sqlite3.connect` with `get_db_connection_by_path` in `_query_costs` and `_open_workspace_conn`.
Concurrency tests `tests/core/test_workspace.py`	Adds v2 marker and `TestConcurrentWriters` class with a PRAGMA assertion test verifying WAL mode and 5s busy_timeout, and a deterministic lock-contention test where a holder thread acquires a write lock and a waiter thread verifies it blocks instead of raising `database is locked`, with both writes succeeding and final row count 2.

AUTH_SECRET deployment injection

Layer / File(s)	Summary
AUTH_SECRET example entry `.env.staging.example`	Adds `AUTH_SECRET` placeholder with generation instructions and a note that the backend hard-fails on startup when it is unset.
CI/CD fail-fast guard and env-file injection `.github/workflows/deploy.yml`	Exports `ENV_AUTH_SECRET` from the GitHub secret for both staging and production "Create environment file" steps, adds bash fail-fast guards that abort deployment when the value is empty or whitespace, and appends `AUTH_SECRET=${ENV_AUTH_SECRET}` to the generated `.env.staging` and `.env.production` files.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

frankbria/codeframe#95: Both PRs modify .github/workflows/deploy.yml to generate staging and production .env files and gate deployments on required environment variable values.

Poem

🐇 Hop hop, no more locked rows at night,
WAL mode shines like a warm morning light.
Five seconds to wait before giving up the fight,
And AUTH_SECRET guards each deploy just right.
The rabbit writes tests so the threads won't bite! 🎉

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(core): add busy_timeout + WAL to workspace SQLite connections (`#648`)' accurately and specifically describes the primary change in the PR.
Linked Issues check	✅ Passed	All acceptance criteria from issue `#648` are met: core workspace connections now set busy_timeout and WAL via _open_db() helper, and TestConcurrentWriters validates parallel writers no longer fail with database locked errors.
Out of Scope Changes check	✅ Passed	The Deploy workflow AUTH_SECRET fix in .env.staging.example and deploy.yml is related to issue context but not explicitly required by `#648` acceptance criteria; however, it is justified as a prerequisite for testing parallel batch execution in production.
Docstring Coverage	✅ Passed	Docstring coverage is 88.24% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

📋 Issue Planner

Built with CodeRabbit's Coding Plans for faster development and fewer bugs.

View plan used: #648

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/648-workspace-sqlite-busy-timeout-wal

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2026-06-15T20:44:34Z

Code Review — fix(core): busy_timeout + WAL for workspace SQLite (#648)

This is a clean, well-scoped fix. The root cause is correctly identified, the centralisation via _open_db() is the right pattern, and the parallel-writers test gives real regression coverage. A few items worth calling out:

Correctness / Architecture

costs_v2.py imports a private symbol from core (_open_db)

codeframe/ui/routers/costs_v2.py now does:

from codeframe.core.workspace import Workspace, _open_db

The PR description explains why get_db_connection(workspace) wasn't directly usable here (these helpers take db_path: str, not a Workspace), but importing a name-mangled private function across module boundaries is a code smell — it will silently break if _open_db is ever renamed or inlined.

The simplest fix: drop the leading underscore and make it open_db(). It is already used in two separate packages, so it is effectively public API. Alternatively, add a thin public wrapper get_db_connection_by_path(db_path) -> sqlite3.Connection in workspace.py and import that instead.

Test coverage

Missing @pytest.mark.v2 on the new class (if not set module-wide)

CLAUDE.md requires @pytest.mark.v2 on new v2 tests. The diff does not show the top of tests/core/test_workspace.py, so I cannot confirm whether pytestmark = pytest.mark.v2 is already set at module level. If it is not, TestConcurrentWriters should carry it explicitly:

class TestConcurrentWriters:
    pytestmark = pytest.mark.v2

Thread-based test vs. process-based reality (acknowledged)

The PR description notes this limitation clearly — threads share a process and the GIL, so contention is lighter than the production multi-process scenario. The 8-thread x 15-write x 1ms sleep approach is a reasonable proxy. Consider adding a one-line comment inside the test itself referencing this, so future readers do not assume it covers multi-process contention.

OperationalError catch scope is broad

except sqlite3.OperationalError as exc:  # pragma: no cover - failure path
    errors.append(exc)

Any OperationalError (schema error, disk full) would appear here as a WAL failure. Narrowing to 'database is locked' in str(exc) before appending — or at least including that check in the final assert not errors message — would give a clearer diagnostic if the test ever fails for an unrelated reason.

Deploy workflow

The fail-fast guard is correct and the error message is actionable. The check is duplicated verbatim across the staging and production jobs, which is fine for two environments; a composite action would be worth it if a third environment is ever added.

What is good

_open_db() passes db_path through unchanged — the MagicMock reasoning in the comment is exactly right.
All 6 callsites in workspace.py are migrated consistently.
budget.py correctly reuses the public get_db_connection(workspace) rather than reaching into _open_db directly.
WAL + busy_timeout settings mirror platform_store/database.py, keeping the two codepaths in sync.
Test was verified RED before the fix and GREEN after — that is the right bar.

Verdict

Approve with minor follow-up suggested. The one item worth addressing before merge is making _open_db public (rename to open_db) since it already crosses module boundaries into costs_v2.py. Everything else is solid.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

.github/workflows/deploy.yml (1)
83-91: 💤 Low value

Consider enhancing the fail-fast guard to catch whitespace-only secrets.

The current check uses -z "${ENV_AUTH_SECRET}" which only catches truly empty values. Whitespace-only secrets would pass this CI check but fail later at backend startup when _read_auth_secret() strips the whitespace and falls back to DEFAULT_SECRET (see codeframe/auth/manager.py:26-28). While the backend will still refuse to start insecurely, catching it earlier in CI would provide clearer feedback.
🛡️ Optional enhancement to catch whitespace-only secrets

For staging (line 86):
-          if [ -z "${ENV_AUTH_SECRET}" ]; then
+          if [ -z "${ENV_AUTH_SECRET}" ] || [ -z "${ENV_AUTH_SECRET// /}" ]; then
             echo "❌ AUTH_SECRET GitHub Actions secret is not set (or empty)."
For production (line 314):
-          if [ -z "${ENV_AUTH_SECRET}" ]; then
+          if [ -z "${ENV_AUTH_SECRET}" ] || [ -z "${ENV_AUTH_SECRET// /}" ]; then
             echo "❌ AUTH_SECRET GitHub Actions secret is not set (or empty)."
Or use a more readable pattern:
if [ -z "${ENV_AUTH_SECRET}" ] || ! echo "${ENV_AUTH_SECRET}" | grep -q '[^[:space:]]'; then
Based on learnings from codeframe/auth/manager.py, which treats blank/whitespace AUTH_SECRET values as unset and falls back to the default secret, making the startup guard catch them as a hard-fail in hosted mode.

Also applies to: 311-319
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/deploy.yml around lines 83 - 91, The ENV_AUTH_SECRET
validation check in both the staging and production deployment sections
currently only catches empty values using the `-z` test, but whitespace-only
secrets would pass this check and fail later when the backend's
_read_auth_secret() function strips whitespace and falls back to DEFAULT_SECRET.
Enhance the condition at the staging check (around line 86) and the production
check (around line 314) to also detect whitespace-only secrets by adding an
additional test that verifies the secret contains at least one non-whitespace
character, either by using a grep pattern like `! echo "${ENV_AUTH_SECRET}" |
grep -q '[^[:space:]]'` or another suitable bash pattern that validates the
secret is not just whitespace before proceeding with the deployment.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/core/test_workspace.py`:
- Around line 180-249: The TestConcurrentWriters class at the beginning of the
diff is missing the required pytest v2 marker decorator. Add the `@pytest.mark.v2`
decorator to the TestConcurrentWriters class to ensure these concurrency
regression tests are included when running tests with marker-based selection
(e.g., uv run pytest -m v2), as required by the coding guidelines for Python
tests in the tests directory.

---

Nitpick comments:
In @.github/workflows/deploy.yml:
- Around line 83-91: The ENV_AUTH_SECRET validation check in both the staging
and production deployment sections currently only catches empty values using the
`-z` test, but whitespace-only secrets would pass this check and fail later when
the backend's _read_auth_secret() function strips whitespace and falls back to
DEFAULT_SECRET. Enhance the condition at the staging check (around line 86) and
the production check (around line 314) to also detect whitespace-only secrets by
adding an additional test that verifies the secret contains at least one
non-whitespace character, either by using a grep pattern like `! echo
"${ENV_AUTH_SECRET}" | grep -q '[^[:space:]]'` or another suitable bash pattern
that validates the secret is not just whitespace before proceeding with the
deployment.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9a915826-9c38-45ca-bf0e-dd180f7e96dd

📥 Commits

Reviewing files that changed from the base of the PR and between bb393b7 and 219df06.

📒 Files selected for processing (6)

.env.staging.example
.github/workflows/deploy.yml
codeframe/adapters/e2b/budget.py
codeframe/core/workspace.py
codeframe/ui/routers/costs_v2.py
tests/core/test_workspace.py

The original ThreadPoolExecutor stress test (8 writers) was flaky: SQLite's busy handler is not fair, so under heavy contention one writer can be starved past the 5s busy_timeout and still raise "database is locked" — even with the fix. Investigation also showed Python's sqlite3.connect already defaults to a 5s busy_timeout, so the substantive fix here is enabling WAL journaling (which removes the rollback-journal reader/writer case that fails immediately). Replace the stress test with a deterministic test: one thread holds the write lock (BEGIN IMMEDIATE) briefly while a second writer must wait for it and then succeed. Update the _open_db docstring to reflect that WAL is the real change.

Addresses CodeRabbit review: _read_auth_secret() treats blank/whitespace-only AUTH_SECRET as unset and falls back to the default secret, so a whitespace-only GitHub secret would pass the `-z` guard yet still hard-fail the backend at startup. Catch it in CI for clearer feedback (both staging and production).

frankbria · 2026-06-15T21:21:39Z

Thanks @coderabbitai — triaged both findings:

1. @pytest.mark.v2 on TestConcurrentWriters — skipping (false positive for this repo).
tests/core/conftest.py already auto-marks every test in tests/core/ as v2 via pytest_collection_modifyitems. Verified: uv run pytest tests/core/test_workspace.py -m v2 --co collects both new tests. Adding an explicit marker would be redundant and inconsistent with the rest of test_workspace.py (no class in the file carries an explicit v2 marker).

2. Whitespace-only AUTH_SECRET — fixed in 6a4ab84. Both the staging and production guards now reject blank/whitespace-only values (! printf '%s' "$ENV_AUTH_SECRET" | grep -q '[^[:space:]]'), matching _read_auth_secret()'s blank-is-unset handling so it fails fast in CI instead of at backend startup.

Also note: while addressing this I found Python's sqlite3.connect() already defaults timeout=5.0 (busy_timeout=5000), so the substantive change in this PR is WAL journaling (it removes the rollback-journal reader/writer case that fails immediately). The flaky 8-thread stress test was replaced with a deterministic held-lock test (commit 0b5ff85), since SQLite's busy handler isn't fair and can starve a writer past any fixed timeout.

coderabbitai · 2026-06-15T21:22:14Z

Tip

For best results, initiate chat on the files or code changes.

@frankbria: Thanks for the thorough triage and the technical clarification.

@pytest.mark.v2 — withdrawing this finding. The auto-marking in tests/core/conftest.py via pytest_collection_modifyitems is a clean pattern and the explicit marker would indeed be redundant noise. Good call skipping it.
Whitespace-only AUTH_SECRET — acknowledged. The printf | grep '[^[:space:]]' guard is the right level of strictness and correctly mirrors _read_auth_secret()'s semantics.
SQLite timeout note — that's a useful clarification. The fact that sqlite3.connect(timeout=5.0) is the Python default means the existing codebase was already getting 5 s of spinning under the rollback-journal mode; what it was missing was WAL, which eliminates the reader-blocks-writer scenario entirely. The deterministic held-lock test is also a much more reliable signal than a thread-count stress test against a fair-scheduler assumption.

🐇 ✨

✏️ Learnings added

Learnt from: frankbria
URL: https://github.com/frankbria/codeframe/pull/686

Timestamp: 2026-06-15T21:22:13.719Z
Learning: In the `frankbria/codeframe` repository, `tests/core/conftest.py` auto-marks every test under `tests/core/` as `v2` via `pytest_collection_modifyitems`. Explicit `pytest.mark.v2` markers are therefore redundant for any test class or function placed in that directory; no test class in `tests/core/test_workspace.py` carries an explicit `v2` marker.

Learnt from: frankbria
URL: https://github.com/frankbria/codeframe/pull/686

Timestamp: 2026-06-15T21:22:13.719Z
Learning: Python's `sqlite3.connect()` already defaults `timeout=5.0`, which is equivalent to a 5000 ms busy_timeout. In the `frankbria/codeframe` repository's `codeframe/core/workspace.py` refactor (PR `#686`), the main concurrency improvement from adding `_open_db()` is enabling WAL journaling (`PRAGMA journal_mode=WAL`), not the busy_timeout—WAL removes the rollback-journal mode where readers can block writers immediately.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

claude · 2026-06-15T21:23:18Z

Follow-up review (commits 2 & 3)

Addressed since my last review:

Whitespace-only AUTH_SECRET guard added in commit 3 — correct approach using printf '%s' | grep -q '[^[:space:]]'.
Flaky stress test replaced with a deterministic lock-hold test in commit 2 — much better. The docstring explains the design trade-off clearly.
_open_db docstring now correctly notes that busy_timeout is set for explicitness but the real fix is WAL.

Two items still open:

1. _open_db is still a private symbol imported across module boundaries (costs_v2.py)

This was flagged in my first review and remains unchanged. Since _open_db is now consumed by two separate packages (adapters/ via get_db_connection and ui/ directly), it is effectively public API. Renaming it to open_db is a one-line change with no functional risk. Without it, any internal refactor that renames or inlines _open_db will silently break costs_v2.py (and won't surface as a linter warning since Python doesn't enforce private-by-convention across packages).

2. TestConcurrentWriters is missing the @pytest.mark.v2 marker

Per CLAUDE.md: "New v2 tests: add @pytest.mark.v2 or pytestmark = pytest.mark.v2 at module level."

The existing module (test_workspace.py) has no module-level pytestmark, and the new class adds none either. Running uv run pytest -m v2 will skip both new tests. CodeRabbit flagged this inline; it still applies. Add @pytest.mark.v2 at the class level, or add a module-level pytestmark if that is intentional for the pre-existing tests too.

Everything else looks good. These two items are the only blockers before merge.

claude · 2026-06-15T21:24:46Z

Follow-up Review

Picking up from my previous review — good iteration. Here is where things stand.

What's resolved

Whitespace-only AUTH_SECRET guard — the printf | grep '[^[:space:]]' pattern in the deploy workflow now correctly matches _read_auth_secret()'s blank-is-unset semantics. Solid fix.
@pytest.mark.v2 — the conftest auto-marking explanation is correct and consistent with the rest of test_workspace.py. No change needed.
Deterministic lock test — replacing the aggressively-contending 8-thread stress test with a single held-lock test is the right call. SQLite's unfair busy handler means the stress test was a flakiness risk, not a correctness signal. The held-lock test directly tests what matters (waiter does not fail immediately), and the 0.5 s hold is comfortably under the 5 s timeout.
Python default timeout clarification — sqlite3.connect(timeout=5.0) is already the Python default, so the headline improvement here is WAL mode, not the busy_timeout pragma. Good to have this on record.

Still outstanding from my previous review

costs_v2.py imports the private _open_db symbol — unchanged in the current diff:

from codeframe.core.workspace import Workspace, _open_db

_open_db is now imported by two separate packages (adapters/e2b/budget.py via get_db_connection, and ui/routers/costs_v2.py directly), so the underscore prefix is actively misleading. The fix is a one-line rename in workspace.py plus one import update in costs_v2.py:

# workspace.py — drop the underscore
def open_db(db_path: str | Path) -> sqlite3.Connection: ...

# costs_v2.py
from codeframe.core.workspace import Workspace, open_db

This does not need to block merge, but it is worth a follow-up. A private-named function crossing a module boundary means anyone refactoring workspace.py has no signal it is used externally.

Verdict

Ready to merge. The core fix (WAL centralisation across all workspace connections) is correct and well-tested. The deploy bonus fix is also correct. The _open_db visibility issue is a naming concern only, not a correctness risk — tracking it as a follow-up is sufficient.

…module import Addresses Claude PR review: costs_v2.py imported the private _open_db() across package boundaries (fragile if renamed/inlined). Add a public get_db_connection_by_path(db_path) accessor in workspace.py (parallels get_db_connection but takes a raw path, for the costs router's fresh/locked-DB tolerant helpers) and route costs_v2.py through it. _open_db stays the private primitive used internally.

frankbria · 2026-06-15T22:06:03Z

Thanks @claude — addressed the actionable item:

Private cross-module import (_open_db) → fixed in 1aeca53. Added a public get_db_connection_by_path(db_path) in workspace.py (parallels get_db_connection but takes a raw path, for the costs router's fresh/locked-DB tolerant helpers); costs_v2.py now uses it. _open_db stays the private primitive used internally.

The other points are resolved by the test rework (commit 0b5ff85): the broad OperationalError catch and thread-vs-process stress test are gone — replaced with a deterministic held-lock test. And @pytest.mark.v2 is already applied automatically by tests/core/conftest.py (verified via pytest -m v2 --co), so an explicit marker would be redundant.

claude · 2026-06-15T22:06:27Z

Claude finished @frankbria's task in 53s —— View job

Follow-up Review

Read changed files to verify the get_db_connection_by_path fix in commit 1aeca53
Confirm costs_v2.py no longer imports private _open_db
Confirm workspace.py exposes the new public function correctly
Final verdict

All previous review items resolved. Here's the final state:

_open_db cross-module visibility (commit 1aeca53) — resolved correctly. workspace.py:867 now exposes get_db_connection_by_path(db_path) as a thin public wrapper over _open_db, parallel to get_db_connection(workspace). costs_v2.py:26 imports get_db_connection_by_path — no more private symbol crossing module boundaries. _open_db stays as the private primitive it should be.

Deterministic lock test (commit 0b5ff85) — confirmed resolved in the prior round. The broad OperationalError catch and thread-pool stress test are gone.

@pytest.mark.v2 — auto-applied by tests/core/conftest.py. No change needed.

Verdict: ready to merge. The WAL centralisation is correct, well-tested, and the module boundary is now clean.

claude · 2026-06-15T22:09:28Z

Follow-up Review

Both issues flagged in my initial review have been addressed:

Addressed: Private symbol import fixed - get_db_connection_by_path is now a proper public function in workspace.py, and costs_v2.py imports it correctly. _open_db correctly remains private and never crosses module boundaries.

Addressed: Whitespace-only secret guard added - The deploy workflow now uses printf '%s' "${ENV_AUTH_SECRET}" | grep -q '[^[:space:]]' to catch secrets that are blank or whitespace-only, matching the CodeRabbit suggestion and guarding against the backend's own whitespace-stripping behaviour.

One remaining nit: pytest.mark.v2 on TestConcurrentWriters

CLAUDE.md requires new v2 tests to carry this marker. The module has no module-level pytestmark = pytest.mark.v2, and the existing classes predate the convention - but the new class should set the example:

class TestConcurrentWriters:
    pytestmark = pytest.mark.v2
    """Concurrency guards on workspace connections (issue #648)."""

Class-level pytestmark is idiomatic pytest and does not require a module-level opt-in.

Verdict: Approve. Everything else is solid - the public wrapper pattern, the WAL+busy_timeout centralisation, the deterministic lock-contention test, and the fail-fast deploy guard are all good work. The marker nit can be addressed here or in a follow-up.

CLAUDE.md asks new v2 tests to carry the marker explicitly (module-level pytestmark), even though tests/core/ is also auto-marked by conftest. Addresses the repeated CodeRabbit finding by following the documented project convention.

claude · 2026-06-15T22:14:03Z

Final Review - all previous items resolved.

Reviewing the current diff against all outstanding items from the review thread.

Correction on my previous nit about pytest.mark.v2:

The module-level pytestmark = pytest.mark.v2 I flagged in my last comment as still outstanding is already present in this diff. That was a misread — the marker was added to test_workspace.py at module level. Combined with the auto-marking in tests/core/conftest.py, TestConcurrentWriters is doubly covered. No change needed.

All items resolved:

Private _open_db crossing module boundary: Fixed via get_db_connection_by_path public wrapper; costs_v2.py imports it cleanly, _open_db stays private
Flaky 8-thread stress test: Fixed — replaced with deterministic held-lock test (correct design choice)
Whitespace-only AUTH_SECRET guard: Fixed — printf | grep pattern mirrors the backend blank-is-unset check
pytest.mark.v2 on TestConcurrentWriters: Present — module-level pytestmark is in the diff
_open_db docstring accuracy: Correct — WAL is the real fix; busy_timeout is explicit-for-clarity since Python sqlite3.connect already defaults to timeout=5.0

Verdict: Approve and merge. The WAL centralisation is correct and well-tested, module boundaries are clean, and the fail-fast deploy guard is solid. Nothing outstanding.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/core/test_workspace.py (1)

227-255: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the holder lock until the waiter is actually contending.

The fixed time.sleep(0.5) leaves a false-positive path: if the test thread is paused after Line 244, the holder can commit before Line 251 runs, so the test passes without proving that the second writer waited on a held lock. Hold the transaction with a release event, run the waiter in its own thread, assert it does not finish immediately while the lock is held, then release and join both threads.

🐛 Proposed deterministic contention check

-        lock_acquired = threading.Event()
-        hold_seconds = 0.5  # comfortably under the 5s busy timeout
+        lock_acquired = threading.Event()
+        release_lock = threading.Event()
+        waiter_finished = threading.Event()
+        errors: list[BaseException] = []

         def hold_write_lock() -> None:
             conn = get_db_connection(ws)
             conn.isolation_level = None  # take manual control of the transaction
             try:
                 conn.execute("BEGIN IMMEDIATE")
                 conn.execute("INSERT INTO lock_probe (id, label) VALUES (1, 'holder')")
                 lock_acquired.set()
-                time.sleep(hold_seconds)
+                assert release_lock.wait(timeout=5), "test never released holder lock"
                 conn.execute("COMMIT")
+            except BaseException as exc:
+                errors.append(exc)
+                raise
             finally:
                 conn.close()
 
         holder = threading.Thread(target=hold_write_lock)
         holder.start()
         assert lock_acquired.wait(timeout=5), "holder never acquired the write lock"
 
-        # The lock is held now. Without the busy timeout this write would raise
-        # ``database is locked`` immediately; with it, the write blocks until the
-        # holder commits (~hold_seconds) and then succeeds.
-        conn2 = get_db_connection(ws)
-        try:
-            conn2.execute("INSERT INTO lock_probe (id, label) VALUES (2, 'waiter')")
-            conn2.commit()
-        finally:
-            conn2.close()
-        holder.join(timeout=5)
+        def wait_for_write_lock() -> None:
+            conn2 = get_db_connection(ws)
+            try:
+                conn2.execute("INSERT INTO lock_probe (id, label) VALUES (2, 'waiter')")
+                conn2.commit()
+            except BaseException as exc:
+                errors.append(exc)
+                raise
+            finally:
+                conn2.close()
+                waiter_finished.set()
+
+        waiter = threading.Thread(target=wait_for_write_lock)
+        waiter.start()
+        time.sleep(0.1)
+        assert not waiter_finished.is_set(), "waiter finished while holder still held lock"
+
+        release_lock.set()
+        holder.join(timeout=5)
+        waiter.join(timeout=5)
+        assert not holder.is_alive(), "holder thread did not finish"
+        assert not waiter.is_alive(), "waiter thread did not finish"
+        assert not errors

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/core/test_workspace.py` around lines 227 - 255, The test uses a fixed
sleep duration in hold_write_lock() to hold a write lock, but this creates a
race condition: if the test thread pauses after the lock is acquired, the holder
can commit before the waiter actually attempts to write, resulting in a
false-positive test pass. Replace the time.sleep(hold_seconds) call in
hold_write_lock() with a wait on a new threading.Event (release_lock), create a
separate thread to run the waiter code (the conn2 insert and commit), use
threading.Event.wait(timeout=...) to assert the waiter thread does not complete
immediately while the holder lock is held, then signal the release_lock event to
allow the holder to commit and join both threads to verify they completed.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tests/core/test_workspace.py`:
- Around line 227-255: The test uses a fixed sleep duration in hold_write_lock()
to hold a write lock, but this creates a race condition: if the test thread
pauses after the lock is acquired, the holder can commit before the waiter
actually attempts to write, resulting in a false-positive test pass. Replace the
time.sleep(hold_seconds) call in hold_write_lock() with a wait on a new
threading.Event (release_lock), create a separate thread to run the waiter code
(the conn2 insert and commit), use threading.Event.wait(timeout=...) to assert
the waiter thread does not complete immediately while the holder lock is held,
then signal the release_lock event to allow the holder to commit and join both
threads to verify they completed.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 412c23a4-e6cd-4ecb-9665-7f73a2b13938

📥 Commits

Reviewing files that changed from the base of the PR and between 1aeca53 and 3ca36c3.

📒 Files selected for processing (1)

tests/core/test_workspace.py

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

Comment thread tests/core/test_workspace.py Outdated

frankbria added 2 commits June 15, 2026 14:19

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

frankbria merged commit 461d353 into main Jun 15, 2026
17 checks passed

frankbria deleted the fix/648-workspace-sqlite-busy-timeout-wal branch June 15, 2026 22:17

coderabbitai Bot mentioned this pull request Jun 21, 2026

[P7.0.2] Hosted multi-tenancy hardening: session REST owner-scoping + TOCTOU path isolation #704

Closed

Conversation

frankbria commented Jun 15, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes (#648)

Bonus: fix failing staging/production Deploy

Acceptance criteria

Testing

Known limitations

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

claude Bot commented Jun 15, 2026

Code Review — fix(core): busy_timeout + WAL for workspace SQLite (#648)

Correctness / Architecture

Test coverage

Deploy workflow

What is good

Verdict

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

frankbria commented Jun 15, 2026

Uh oh!

coderabbitai Bot commented Jun 15, 2026

Uh oh!

claude Bot commented Jun 15, 2026

Follow-up review (commits 2 & 3)

Uh oh!

claude Bot commented Jun 15, 2026

Follow-up Review

What's resolved

Still outstanding from my previous review

Verdict

Uh oh!

frankbria commented Jun 15, 2026

Uh oh!

claude Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Follow-up Review

Uh oh!

claude Bot commented Jun 15, 2026

Follow-up Review

One remaining nit: pytest.mark.v2 on TestConcurrentWriters

Uh oh!

claude Bot commented Jun 15, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

frankbria commented Jun 15, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading

claude Bot commented Jun 15, 2026 •

edited

Loading