Commit fea23b2

Add documentation for AioS3FileSystem and S3Executor
Add API reference entries for AioS3FileSystem, AioS3File, S3Executor, S3ThreadPoolExecutor, and S3AioExecutor to docs/api/filesystem.rst. Add user guide section to docs/aio.md explaining AioS3FileSystem's asyncio-native parallel operations, the executor strategy pattern, and usage examples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 93ae890 commit fea23b2

2 files changed (+96 -0 lines)

docs/aio.md

Lines changed: 75 additions & 0 deletions
@@ -193,3 +193,78 @@ The `as_pandas()`, `as_arrow()`, and `as_polars()` convenience methods operate o
already-loaded data and remain synchronous.

See each cursor's documentation page for detailed usage examples.

(aio-s3-filesystem)=

## AioS3FileSystem

`AioS3FileSystem` is a native asyncio filesystem interface for Amazon S3, built on
fsspec's `AsyncFileSystem`. It provides the same functionality as `S3FileSystem` but
uses `asyncio.gather` with `asyncio.to_thread` for parallel operations instead of
`ThreadPoolExecutor`.

### Why AioS3FileSystem?

The synchronous `S3FileSystem` uses `ThreadPoolExecutor` for parallel S3 operations
(batch deletes, multipart uploads, range reads). When it is used from an asyncio
application via `AioS3FSCursor`, this creates a thread-in-thread pattern: the cursor
wraps each call in `asyncio.to_thread()`, and inside that worker thread `S3FileSystem`
spawns additional threads via `ThreadPoolExecutor`.

`AioS3FileSystem` removes this nesting by scheduling all parallel operations on the
asyncio event loop: blocking calls are run with `asyncio.to_thread` and awaited
together with `asyncio.gather`, rather than through a second, dedicated thread pool.

| | S3FileSystem | AioS3FileSystem |
|---|---|---|
| **Parallelism** | `ThreadPoolExecutor` | `asyncio.gather` + `asyncio.to_thread` |
| **File handles** | `S3File` with thread pool | `AioS3File` with `S3AioExecutor` |
| **Bulk delete** | Thread pool per batch | `asyncio.gather` per batch |
| **Multipart copy** | Thread pool per part | `asyncio.gather` per part |
| **Best for** | Synchronous applications | Async frameworks (FastAPI, aiohttp, etc.) |
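
The `asyncio.gather` + `asyncio.to_thread` pattern from the table can be sketched with the standard library alone. This is an illustration, not PyAthena's implementation: `delete_batch` is a hypothetical stand-in for a blocking S3 `DeleteObjects` call, and the batch size of 1,000 reflects the maximum number of keys S3 accepts per `DeleteObjects` request.

```python
import asyncio
from typing import List

# S3's DeleteObjects API accepts at most 1,000 keys per request.
MAX_KEYS_PER_REQUEST = 1000


def delete_batch(keys: List[str]) -> List[str]:
    """Hypothetical stand-in for a blocking S3 DeleteObjects call.

    A real implementation would call boto3 here; this stub simply
    returns the keys it was asked to delete.
    """
    return list(keys)


async def bulk_delete(keys: List[str]) -> List[str]:
    """Delete keys in request-sized batches that run concurrently."""
    batches = [
        keys[i : i + MAX_KEYS_PER_REQUEST]
        for i in range(0, len(keys), MAX_KEYS_PER_REQUEST)
    ]
    # Each blocking call runs in the event loop's default executor via
    # asyncio.to_thread; gather awaits all of them and keeps input order.
    results = await asyncio.gather(
        *(asyncio.to_thread(delete_batch, batch) for batch in batches)
    )
    # Flatten the per-batch results back into a single list of keys.
    return [key for batch in results for key in batch]


if __name__ == "__main__":
    keys = [f"data/old/part-{i:05d}.csv" for i in range(2500)]
    deleted = asyncio.run(bulk_delete(keys))
    print(len(deleted))  # 2500 (sent as three batches: 1000, 1000, 500)
```

Because `asyncio.to_thread` reuses the event loop's default executor, the batches run concurrently without the caller creating or managing a thread pool of its own.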

### Executor strategy

`S3FileSystem` and `S3File` use a pluggable executor abstraction (`S3Executor`) for
parallel operations. Two implementations are provided:

- `S3ThreadPoolExecutor`: wraps `ThreadPoolExecutor` (the default for synchronous usage)
- `S3AioExecutor`: dispatches work via `asyncio.run_coroutine_threadsafe` + `asyncio.to_thread`

`AioS3FileSystem` automatically uses `S3AioExecutor` for its file handles, so multipart
uploads and parallel range reads are scheduled on the event loop, with blocking calls
running in the loop's default executor via `asyncio.to_thread`, instead of in a
dedicated `ThreadPoolExecutor`.
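
The strategy pattern described above can be sketched as follows. The names `ExecutorStrategy`, `ThreadPoolStrategy`, `AioStrategy`, and `fake_range_read` are hypothetical and do not reproduce the real `S3Executor` interface; the sketch only shows how calling code that expects a `concurrent.futures.Future` can be served either by a private thread pool or by an event loop running in another thread.

```python
import asyncio
import threading
from abc import ABC, abstractmethod
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, List, Tuple


class ExecutorStrategy(ABC):
    """Hypothetical strategy interface (not PyAthena's real S3Executor)."""

    @abstractmethod
    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        """Schedule fn(*args) and return a concurrent.futures.Future."""

    @abstractmethod
    def shutdown(self) -> None:
        """Release any resources owned by the strategy."""


class ThreadPoolStrategy(ExecutorStrategy):
    """Runs work on a private ThreadPoolExecutor (sync-style strategy)."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        return self._pool.submit(fn, *args)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


class AioStrategy(ExecutorStrategy):
    """Hands work to an event loop that is running in another thread."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self._loop = loop

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        # run_coroutine_threadsafe schedules the coroutine on the loop's
        # thread; asyncio.to_thread then runs the blocking fn in that
        # loop's default executor.
        return asyncio.run_coroutine_threadsafe(
            asyncio.to_thread(fn, *args), self._loop
        )

    def shutdown(self) -> None:
        # The loop belongs to the caller, so there is nothing to release.
        pass


def fake_range_read(start: int, end: int) -> bytes:
    """Hypothetical stand-in for a blocking ranged GET of [start, end)."""
    return bytes(end - start)


def read_ranges(
    executor: ExecutorStrategy, ranges: List[Tuple[int, int]]
) -> List[bytes]:
    # Calling code depends only on the strategy interface.
    futures = [executor.submit(fake_range_read, s, e) for s, e in ranges]
    return [future.result() for future in futures]


if __name__ == "__main__":
    ranges = [(0, 10), (10, 25)]

    pool = ThreadPoolStrategy()
    print([len(chunk) for chunk in read_ranges(pool, ranges)])  # [10, 15]
    pool.shutdown()

    loop = asyncio.new_event_loop()
    loop_thread = threading.Thread(target=loop.run_forever, daemon=True)
    loop_thread.start()
    print([len(chunk) for chunk in read_ranges(AioStrategy(loop), ranges)])  # [10, 15]
    loop.call_soon_threadsafe(loop.stop)
    loop_thread.join()
    loop.close()
```

In this sketch, `AioStrategy.submit` must be called from a thread other than the one running `loop`: blocking on `.result()` inside the loop's own thread would deadlock.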

### Usage with AioS3FSCursor

`AioS3FSCursor` automatically uses `AioS3FileSystem` internally, so no additional
configuration is needed:

```python
import asyncio

from pyathena import aio_connect
from pyathena.aio.s3fs.cursor import AioS3FSCursor


async def main():
    async with await aio_connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                                 region_name="us-west-2") as conn:
        cursor = conn.cursor(AioS3FSCursor)
        await cursor.execute("SELECT * FROM many_rows")
        async for row in cursor:
            print(row)


asyncio.run(main())
```

### Standalone usage

`AioS3FileSystem` can also be used directly for S3 operations:

```python
import asyncio

from pyathena.filesystem.s3_async import AioS3FileSystem


async def main():
    # Async context: await the underscore-prefixed coroutine methods
    fs = AioS3FileSystem(asynchronous=True)
    files = await fs._ls("s3://my-bucket/data/")
    data = await fs._cat_file("s3://my-bucket/data/file.csv")
    await fs._rm("s3://my-bucket/data/old/", recursive=True)


asyncio.run(main())

# Sync wrappers are auto-generated by fsspec. Use them on an instance created
# without asynchronous=True, from code that is not running inside an event loop
fs = AioS3FileSystem()
files = fs.ls("s3://my-bucket/data/")
```
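
fsspec derives each blocking method (for example `ls`) from its underscore-prefixed coroutine (`_ls`) and runs that coroutine on an event loop living in a background thread. A simplified, standard-library-only sketch of the idea follows; `run_sync` and `MiniAsyncFS` are hypothetical and are not fsspec's actual code.

```python
import asyncio
import threading
from typing import Any, Coroutine, List, TypeVar

T = TypeVar("T")

# A single event loop that runs forever in a daemon thread and serves
# every blocking call made through run_sync().
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, name="io-loop", daemon=True).start()


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Block the calling thread until `coro` completes on the background loop."""
    return asyncio.run_coroutine_threadsafe(coro, _loop).result()


class MiniAsyncFS:
    """Hypothetical filesystem with one coroutine method and its sync mirror."""

    async def _ls(self, path: str) -> List[str]:
        await asyncio.sleep(0)  # stand-in for awaiting a real S3 listing
        base = path.rstrip("/")
        return [f"{base}/a.csv", f"{base}/b.csv"]

    def ls(self, path: str) -> List[str]:
        # The blocking method simply delegates to the coroutine version.
        return run_sync(self._ls(path))


if __name__ == "__main__":
    print(MiniAsyncFS().ls("s3://my-bucket/data/"))
    # ['s3://my-bucket/data/a.csv', 's3://my-bucket/data/b.csv']
```

In this sketch, calling `ls` from a coroutine already running on `_loop` would deadlock, which is why asynchronous code should `await` the `_`-prefixed methods directly.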

docs/api/filesystem.rst

Lines changed: 21 additions & 0 deletions
@@ -14,6 +14,27 @@ S3 FileSystem
.. autoclass:: pyathena.filesystem.s3.S3File
   :members:

Async S3 FileSystem
-------------------

.. autoclass:: pyathena.filesystem.s3_async.AioS3FileSystem
   :members:

.. autoclass:: pyathena.filesystem.s3_async.AioS3File
   :members:

S3 Executor
-----------

.. autoclass:: pyathena.filesystem.s3_executor.S3Executor
   :members:

.. autoclass:: pyathena.filesystem.s3_executor.S3ThreadPoolExecutor
   :members:

.. autoclass:: pyathena.filesystem.s3_executor.S3AioExecutor
   :members:

S3 Objects
----------
