Skip to content

fix: replace two-step acquire_lock with atomic upsert to prevent TOCTOU race#20

Merged
patrick-chinchill merged 1 commit into
mainfrom
fix/acquire-lock-toctou-race
Apr 7, 2026
Merged

fix: replace two-step acquire_lock with atomic upsert to prevent TOCTOU race#20
patrick-chinchill merged 1 commit into
mainfrom
fix/acquire-lock-toctou-race

Conversation

@patrick-chinchill
Copy link
Copy Markdown
Collaborator

Summary

  • Replaced the two-step INSERT/UPDATE lock acquisition in PostgresStateAdapter.acquire_lock() with a single atomic INSERT ... ON CONFLICT DO UPDATE WHERE chat_state_locks.expires_at <= now() upsert
  • The previous approach had a TOCTOU race: two concurrent callers could both see the INSERT fail, then both attempt the UPDATE on the expired row
  • The new single-statement approach lets Postgres acquire a row lock on the conflicting row, guaranteeing only one caller wins
  • Added test_acquire_lock_uses_single_atomic_upsert test that verifies exactly one SQL statement is issued per acquire attempt (new lock, contended lock, and expired lock scenarios)

Closes #19

Test plan

  • All 59 existing tests pass (including lock acquire/release/extend/force tests)
  • New test verifies single-query execution for all three lock scenarios
  • Manual verification against real Postgres with concurrent connections

🤖 Generated with Claude Code

…OU race

The previous two-step approach (INSERT ... DO NOTHING, then UPDATE ... WHERE
expired) had a time-of-check-to-time-of-use race: two concurrent callers
could both see the INSERT fail (row exists), then both attempt the UPDATE
on the expired row, with both potentially succeeding depending on timing.

The fix uses a single INSERT ... ON CONFLICT DO UPDATE ... WHERE
chat_state_locks.expires_at <= now() statement. Postgres acquires a row
lock on the conflicting row, so exactly one concurrent caller wins.

Closes #19

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the acquire_lock method in the Postgres state adapter to use a single atomic upsert, replacing the previous two-step process to eliminate a potential TOCTOU race condition. The changes include updating the SQL query to use ON CONFLICT DO UPDATE with an expiration check and adjusting the test suite to verify the single-query behavior. Review feedback suggests simplifying the SQL interval calculation for consistency and replacing a blocking time.sleep() call with await asyncio.sleep() in the asynchronous test.

"""INSERT INTO chat_state_locks (key_prefix, thread_id, token, expires_at)
VALUES ($1, $2, $3, $4)
ON CONFLICT (key_prefix, thread_id) DO NOTHING
VALUES ($1, $2, $3, now() + make_interval(secs => $4::float / 1000))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with the extend_lock implementation (line 269) and to simplify the SQL, you can use the interval multiplication syntax instead of make_interval. This also avoids the explicit cast to float and division.

Suggested change
VALUES ($1, $2, $3, now() + make_interval(secs => $4::float / 1000))
VALUES ($1, $2, $3, now() + $4 * interval '1 millisecond')


# Third acquire after expiry: should succeed in single query
mock_pool.executed_queries.clear()
time.sleep(0.005)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using time.sleep() in an asynchronous test blocks the event loop. While it might not cause issues in this specific isolated test, it is better practice to use await asyncio.sleep() to allow the event loop to continue running other tasks if necessary.

Suggested change
time.sleep(0.005)
await asyncio.sleep(0.005)

@patrick-chinchill patrick-chinchill merged commit fcfe55b into main Apr 7, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix Postgres acquire_lock TOCTOU race condition

1 participant