fix: replace two-step acquire_lock with atomic upsert to prevent TOCTOU race#20
Conversation
…OU race The previous two-step approach (INSERT ... DO NOTHING, then UPDATE ... WHERE expired) had a time-of-check-to-time-of-use race: two concurrent callers could both see the INSERT fail (row exists), then both attempt the UPDATE on the expired row, with both potentially succeeding depending on timing. The fix uses a single INSERT ... ON CONFLICT DO UPDATE ... WHERE chat_state_locks.expires_at <= now() statement. Postgres acquires a row lock on the conflicting row, so exactly one concurrent caller wins. Closes #19 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request refactors the acquire_lock method in the Postgres state adapter to use a single atomic upsert, replacing the previous two-step process to eliminate a potential TOCTOU race condition. The changes include updating the SQL query to use ON CONFLICT DO UPDATE with an expiration check and adjusting the test suite to verify the single-query behavior. Review feedback suggests simplifying the SQL interval calculation for consistency and replacing a blocking time.sleep() call with await asyncio.sleep() in the asynchronous test.
| """INSERT INTO chat_state_locks (key_prefix, thread_id, token, expires_at) | ||
| VALUES ($1, $2, $3, $4) | ||
| ON CONFLICT (key_prefix, thread_id) DO NOTHING | ||
| VALUES ($1, $2, $3, now() + make_interval(secs => $4::float / 1000)) |
There was a problem hiding this comment.
For consistency with the extend_lock implementation (line 269) and to simplify the SQL, you can use the interval multiplication syntax instead of make_interval. This also avoids the explicit cast to float and division.
| VALUES ($1, $2, $3, now() + make_interval(secs => $4::float / 1000)) | |
| VALUES ($1, $2, $3, now() + $4 * interval '1 millisecond') |
|
|
||
| # Third acquire after expiry: should succeed in single query | ||
| mock_pool.executed_queries.clear() | ||
| time.sleep(0.005) |
There was a problem hiding this comment.
Using time.sleep() in an asynchronous test blocks the event loop. While it might not cause issues in this specific isolated test, it is better practice to use await asyncio.sleep() to allow the event loop to continue running other tasks if necessary.
| time.sleep(0.005) | |
| await asyncio.sleep(0.005) |
Summary
PostgresStateAdapter.acquire_lock()with a single atomicINSERT ... ON CONFLICT DO UPDATE WHERE chat_state_locks.expires_at <= now()upserttest_acquire_lock_uses_single_atomic_upserttest that verifies exactly one SQL statement is issued per acquire attempt (new lock, contended lock, and expired lock scenarios)Closes #19
Test plan
🤖 Generated with Claude Code