Commit 040c354

update docs
1 parent 83d4a4c commit 040c354

1 file changed

Lines changed: 10 additions & 7 deletions

docs/guides/storage_clients.mdx

@@ -28,7 +28,7 @@ Crawlee provides three main storage client implementations:
 
 - <ApiLink to="class/FileSystemStorageClient">`FileSystemStorageClient`</ApiLink> - Provides persistent file system storage with in-memory caching.
 - <ApiLink to="class/MemoryStorageClient">`MemoryStorageClient`</ApiLink> - Stores data in memory with no persistence.
-- <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> - Provides persistent storage using a SQL database ([SQLite](https://sqlite.org/) or [PostgreSQL](https://www.postgresql.org/)). Requires installing the extra dependency: `crawlee[sql_sqlite]` for SQLite or `crawlee[sql_postgres]` for PostgreSQL.
+- <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> - Provides persistent storage using a SQL database ([SQLite](https://sqlite.org/), [PostgreSQL](https://www.postgresql.org/), [MySQL](https://www.mysql.com/) or [MariaDB](https://mariadb.org/)). Requires installing the extra dependency: `crawlee[sql_sqlite]` for SQLite, `crawlee[sql_postgres]` for PostgreSQL or `crawlee[sql_mysql]` for MySQL and MariaDB.
 - <ApiLink to="class/RedisStorageClient">`RedisStorageClient`</ApiLink> - Provides persistent storage using a [Redis](https://redis.io/) database v8.0+. Requires installing the extra dependency `crawlee[redis]`.
 - [`ApifyStorageClient`](https://docs.apify.com/sdk/python/reference/class/ApifyStorageClient) - Manages storage on the [Apify platform](https://apify.com), implemented in the [Apify SDK](https://github.com/apify/apify-sdk-python).
 
@@ -144,7 +144,7 @@ The `MemoryStorageClient` does not persist data between runs. All data is lost w
 The `SqlStorageClient` is experimental. Its API and behavior may change in future releases.
 :::
 
-The <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> provides persistent storage using a SQL database (SQLite by default, or PostgreSQL). It supports all Crawlee storage types and enables concurrent access from multiple independent clients or processes.
+The <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> provides persistent storage using a SQL database (SQLite by default, or PostgreSQL, MySQL, MariaDB). It supports all Crawlee storage types and enables concurrent access from multiple independent clients or processes.
 
 :::note dependencies
 The <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> is not included in the core Crawlee package.
@@ -154,10 +154,12 @@ To use it, you need to install Crawlee with the appropriate extra dependency:
 <code>pip install 'crawlee[sql_sqlite]'</code>
 - For PostgreSQL support, run:
 <code>pip install 'crawlee[sql_postgres]'</code>
+- For MySQL or MariaDB support, run:
+<code>pip install 'crawlee[sql_mysql]'</code>
 :::
 
 By default, <ApiLink to="class/SqlStorageClient">SqlStorageClient</ApiLink> uses SQLite.
-To use PostgreSQL instead, just provide a PostgreSQL connection string via the `connection_string` parameter. No other code changes are needed—the same client works for both databases.
+To use a different database, just provide the appropriate connection string via the `connection_string` parameter. No other code changes are needed—the same client works for all supported databases.
 
 <RunnableCodeBlock className="language-python" language="python">
 {SQLStorageClientBasicExample}
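
The hunk above says that switching databases only requires a different `connection_string`. A minimal sketch of that idea follows. The helper names `build_connection_string` and `make_client` are hypothetical and not part of Crawlee; the URL shapes are taken from the example connection strings in this diff, and the import path `crawlee.storage_clients` is an assumption.

```python
from urllib.parse import quote

# URL templates mirroring the example connection strings quoted in this diff.
_TEMPLATES = {
    'sqlite': 'sqlite+aiosqlite:///{database}',
    'postgresql': 'postgresql+asyncpg://{user}:{password}@{host}/{database}',
    'mysql': 'mysql+aiomysql://{user}:{password}@{host}/{database}',
    'mariadb': 'mariadb+aiomysql://{user}:{password}@{host}/{database}',
}


def build_connection_string(backend, database, user='', password='', host='localhost'):
    """Hypothetical helper: return a SQLAlchemy-style URL for a supported backend."""
    try:
        template = _TEMPLATES[backend]
    except KeyError:
        raise ValueError(f'unsupported backend: {backend!r}') from None
    # Percent-encode credentials so characters such as '@' or '/' cannot break the URL.
    return template.format(
        user=quote(user, safe=''),
        password=quote(password, safe=''),
        host=host,
        database=database,
    )


def make_client(connection_string):
    """Hypothetical helper: build the storage client for the given URL."""
    # Imported lazily so the URL helper works without Crawlee installed.
    # Assumption: SqlStorageClient is importable from crawlee.storage_clients.
    from crawlee.storage_clients import SqlStorageClient

    return SqlStorageClient(connection_string=connection_string)
```

Only the URL differs between backends, for example `build_connection_string('mysql', 'db', 'user', 'pass', 'host')` versus `build_connection_string('sqlite', 'my.db')`; the matching extra (`crawlee[sql_mysql]`, `crawlee[sql_sqlite]`, and so on) still has to be installed.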
@@ -214,7 +216,6 @@ class dataset_metadata_buffer {
 + id (PK)
 + accessed_at
 + modified_at
-+ dataset_id (FK)
 + delta_item_count
 }
 
@@ -247,7 +248,6 @@ class key_value_store_metadata_buffer {
 + id (PK)
 + accessed_at
 + modified_at
-+ key_value_store_id (FK)
 }
 
 %% ========================
@@ -321,7 +321,6 @@ class request_queue_metadata_buffer {
 + id (PK)
 + accessed_at
 + modified_at
-+ request_queue_id (FK)
 + client_id
 + delta_handled_count
 + delta_pending_count
@@ -346,11 +345,15 @@ Configuration options for the <ApiLink to="class/SqlStorageClient">`SqlStorageCl
 
 Configuration options for the <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> can be set via constructor arguments:
 
-- **`connection_string`** (default: SQLite in <ApiLink to="class/Configuration">`Configuration`</ApiLink> storage dir) - SQLAlchemy connection string, e.g. `sqlite+aiosqlite:///my.db` or `postgresql+asyncpg://user:pass@host/db`.
+- **`connection_string`** (default: SQLite in <ApiLink to="class/Configuration">`Configuration`</ApiLink> storage dir) - SQLAlchemy connection string, e.g. `sqlite+aiosqlite:///my.db`, `postgresql+asyncpg://user:pass@host/db`, `mysql+aiomysql://user:pass@host/db` or `mariadb+aiomysql://user:pass@host/db`.
 - **`engine`** - Pre-configured SQLAlchemy AsyncEngine (optional).
 
 For advanced scenarios, you can configure <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> with a custom SQLAlchemy engine and additional options via the <ApiLink to="class/Configuration">`Configuration`</ApiLink> class. This is useful, for example, when connecting to an external PostgreSQL database or customizing connection pooling.
 
+:::warning
+If you use MySQL or MariaDB, pass the `isolation_level='READ COMMITTED'` argument to `create_async_engine`.
+:::
+
 <CodeBlock className="language-python" language="python">
 {SQLStorageClientConfigurationExample}
 </CodeBlock>
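
The warning added in this hunk can be applied mechanically when the engine is built from a connection string. The following is a rough sketch, not the project's own example: `engine_options` and `make_engine` are hypothetical helper names, and the dialect is inferred from the URL scheme on the assumption that it follows the `dialect+driver://` shape shown in the option list.

```python
def engine_options(connection_string):
    """Hypothetical helper: extra keyword arguments for create_async_engine."""
    # 'mysql+aiomysql://...' -> dialect 'mysql'; the driver suffix is ignored.
    scheme = connection_string.split('://', 1)[0]
    dialect = scheme.split('+', 1)[0].lower()
    options = {}
    if dialect in ('mysql', 'mariadb'):
        # Per the warning above: MySQL and MariaDB need READ COMMITTED.
        options['isolation_level'] = 'READ COMMITTED'
    return options


def make_engine(connection_string):
    """Hypothetical helper: create an AsyncEngine with the dialect-specific options."""
    # Imported lazily so engine_options() can be used without SQLAlchemy installed.
    from sqlalchemy.ext.asyncio import create_async_engine

    return create_async_engine(connection_string, **engine_options(connection_string))
```

The resulting engine would then be handed to the client through the `engine` constructor argument listed above, in place of `connection_string`.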
