fix: Add retry logic for KeyValueStore.set_value and Dataset.push_data #1838

Mantisus wants to merge 3 commits into apify:master

Conversation
Pull request overview
This PR introduces a unified StorageWriteError and adds retry behavior for write operations (KeyValueStore.set_value and Dataset.push_data) to reduce the risk of data loss on transient storage failures (notably SQL/Redis).
Changes:
- Add `StorageWriteError` and update SQL/Redis/FS storage clients to raise it on write failures instead of swallowing errors.
- Add retry parameters (`max_attempts`, `wait_time_between_retries`) to `KeyValueStore.set_value` and `Dataset.push_data`.
- Add unit tests asserting that client-level write failures surface as `StorageWriteError`.
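A retry loop with these parameters could be wired up roughly as follows. This is a minimal sketch with hypothetical helper names (`write_with_retries`, `write_once`), not the actual crawlee implementation; it re-raises after the final attempt, which is one of the behaviors debated in the review below:

```python
import asyncio


class StorageWriteError(Exception):
    """Unified error raised when a storage client fails to persist a write."""


async def write_with_retries(
    write_once,  # hypothetical coroutine function performing one write attempt
    *,
    max_attempts: int = 3,
    wait_time_between_retries: float = 1.0,
) -> None:
    """Retry a failing write, re-raising the last StorageWriteError when exhausted."""
    last_exception = None
    for attempt in range(1, max_attempts + 1):
        try:
            await write_once()
        except StorageWriteError as exc:
            last_exception = exc
            if attempt < max_attempts:
                await asyncio.sleep(wait_time_between_retries)
        else:
            return  # write succeeded
    raise last_exception
```

Transient failures are absorbed by the retries; only a failure that persists through every attempt surfaces to the caller.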
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/crawlee/errors.py` | Introduces `StorageWriteError` for consistent write-failure signaling. |
| `src/crawlee/storages/_key_value_store.py` | Adds retry loop to `KeyValueStore.set_value` (currently logs instead of raising after retries). |
| `src/crawlee/storages/_dataset.py` | Adds retry loop to `Dataset.push_data` (currently logs instead of raising after retries). |
| `src/crawlee/storage_clients/_sql/_key_value_store_client.py` | Converts SQL write failures into `StorageWriteError` and avoids the `with_simple_commit` swallow behavior. |
| `src/crawlee/storage_clients/_sql/_dataset_client.py` | Converts SQL write failures into `StorageWriteError` and avoids the `with_simple_commit` swallow behavior. |
| `src/crawlee/storage_clients/_redis/_key_value_store_client.py` | Wraps Redis write failures as `StorageWriteError`. |
| `src/crawlee/storage_clients/_redis/_dataset_client.py` | Wraps Redis write failures as `StorageWriteError`. |
| `src/crawlee/storage_clients/_file_system/_key_value_store_client.py` | Wraps filesystem write failures as `StorageWriteError`. |
| `src/crawlee/storage_clients/_file_system/_dataset_client.py` | Wraps filesystem write failures as `StorageWriteError`. |
| `src/crawlee/storage_clients/_sql/_request_queue_client.py` | Adds an extra `requests_db` emptiness guard (currently redundant). |
| `tests/unit/storage_clients/_sql/test_sql_kvs_client.py` | Adds test ensuring SQL KVS write errors raise `StorageWriteError`. |
| `tests/unit/storage_clients/_sql/test_sql_dataset_client.py` | Adds test ensuring SQL Dataset write errors raise `StorageWriteError`. |
| `tests/unit/storage_clients/_redis/test_redis_kvs_client.py` | Adds test ensuring Redis KVS write errors raise `StorageWriteError`. |
| `tests/unit/storage_clients/_redis/test_redis_dataset_client.py` | Adds test ensuring Redis Dataset write errors raise `StorageWriteError`. |
| `tests/unit/storage_clients/_file_system/test_fs_kvs_client.py` | Adds test ensuring FS KVS write errors raise `StorageWriteError`. |
| `tests/unit/storage_clients/_file_system/test_fs_dataset_client.py` | Adds test ensuring FS Dataset write errors raise `StorageWriteError`. |
```python
logger.warning(
    f'Failed to set value for key "{key}" after {max_attempts} attempts '
    f'with error: {last_exception.cause}'
)
```
After exhausting retries, this method only logs a warning and then returns successfully. That will silently drop failed writes (similar to the original bug report) and callers will think the value was persisted. After the final attempt, re-raise the last caught StorageWriteError (or wrap/raise a new one) instead of returning.
```diff
 )
+raise last_exception
```
This is expected behavior. This PR allows users to configure retry settings, but propagating errors could cause issues.
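For context on the trade-off in this thread: if the library re-raises after retries are exhausted, best-effort callers can still ignore the failure explicitly, which keeps data loss an opt-in decision rather than a silent default. A small sketch (`set_value` here is a stub standing in for a store whose retries were exhausted, not the crawlee API):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class StorageWriteError(Exception):
    """Unified error raised when a write cannot be persisted."""


async def set_value(key: str, value: object) -> None:
    # Stub: simulates a KeyValueStore whose retries were all exhausted.
    raise StorageWriteError(f'could not persist {key!r}')


async def save_state() -> str:
    try:
        await set_value('state', {'page': 42})
    except StorageWriteError as exc:
        # Best-effort caller: log and continue, but the choice is explicit.
        logger.warning('Persisting state failed, continuing: %s', exc)
        return 'skipped'
    return 'persisted'
```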
```python
            await asyncio.sleep(wait_time_between_retries_seconds)
    else:
        if last_exception:
            logger.warning(f'Failed to push data after {max_attempts} attempts with error: {last_exception.cause}')
```
After exhausting retries, this method only logs a warning and then returns successfully. That means push_data failures can be silently ignored and data can be lost while the caller believes the write succeeded. After the final attempt, re-raise the last StorageWriteError (or raise a new one) rather than returning.
```diff
 logger.warning(f'Failed to push data after {max_attempts} attempts with error: {last_exception.cause}')
+raise last_exception
```
This is expected behavior. This PR allows users to configure retry settings, but propagating errors could cause issues.
```python
if not requests_db:
    return None
```
This guard is redundant/unreachable because both the PostgreSQL/MySQL branch and the fallback branch already return early when requests_db is empty. Keeping it adds noise and suggests requests_db might be unset here when it shouldn't be.
```diff
-if not requests_db:
-    return None
```
If an error occurs that triggers a rollback, this will prevent the failing
Description

Adds retry logic to `Dataset` and `KeyValueStore` write operations (`push_data`/`set_value`). This is especially critical for `SqlStorageClient` and `RedisStorageClient` due to the risk of data loss on transient failures.

Issues