The `as_pandas()`, `as_arrow()`, and `as_polars()` convenience methods operate on
already-loaded data and remain synchronous.

See each cursor's documentation page for detailed usage examples.

(aio-s3-filesystem)=

## AioS3FileSystem

`AioS3FileSystem` is a native asyncio filesystem interface for Amazon S3, built on
fsspec's `AsyncFileSystem`. It provides the same functionality as `S3FileSystem` but
uses `asyncio.gather` with `asyncio.to_thread` for parallel operations instead of
`ThreadPoolExecutor`.

### Why AioS3FileSystem?

The synchronous `S3FileSystem` uses `ThreadPoolExecutor` for parallel S3 operations
(batch deletes, multipart uploads, range reads). When used from within an asyncio
application via `AioS3FSCursor`, this creates a thread-in-thread pattern:
the cursor wraps calls in `asyncio.to_thread()`, and inside that thread
`S3FileSystem` spawns additional threads via `ThreadPoolExecutor`.

`AioS3FileSystem` eliminates this inefficiency by dispatching all parallel
operations through the asyncio event loop.
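
The contrast between the two fan-out styles can be sketched with a toy example. `fetch_range` here is a hypothetical stand-in for a blocking S3 range read, not the pyathena API:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def fetch_range(offset: int) -> int:
    # Stand-in for a blocking S3 range read.
    return offset * 2


def sync_fan_out(offsets):
    # S3FileSystem-style parallelism: a dedicated thread pool.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fetch_range, offsets))


async def async_fan_out(offsets):
    # AioS3FileSystem-style parallelism: each blocking call is dispatched
    # through the running event loop with asyncio.to_thread and the results
    # are collected with asyncio.gather, so no extra pool is spawned.
    return await asyncio.gather(
        *(asyncio.to_thread(fetch_range, o) for o in offsets)
    )


print(sync_fan_out([0, 1, 2]))                # [0, 2, 4]
print(asyncio.run(async_fan_out([0, 1, 2])))  # [0, 2, 4]
```

In the second form, a cursor that is already running `async_fan_out` inside an event loop never needs to nest a thread pool inside `asyncio.to_thread()`.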

| | S3FileSystem | AioS3FileSystem |
| --- | --- | --- |
| **Parallelism** | `ThreadPoolExecutor` | `asyncio.gather` + `asyncio.to_thread` |
| **File handles** | `S3File` with thread pool | `AioS3File` with `S3AioExecutor` |
| **Bulk delete** | Thread pool per batch | `asyncio.gather` per batch |
| **Multipart copy** | Thread pool per part | `asyncio.gather` per part |
| **Best for** | Synchronous applications | Async frameworks (FastAPI, aiohttp, etc.) |

### Executor strategy

`S3FileSystem` and `S3File` use a pluggable executor abstraction (`S3Executor`) for
parallel operations. Two implementations are provided:

- `S3ThreadPoolExecutor` — wraps `ThreadPoolExecutor` (default for sync usage)
- `S3AioExecutor` — dispatches work via `asyncio.run_coroutine_threadsafe` + `asyncio.to_thread`

`AioS3FileSystem` automatically uses `S3AioExecutor` for file handles, so multipart
uploads and parallel range reads are executed on the event loop without spawning
additional threads.
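
The `run_coroutine_threadsafe` + `to_thread` dispatch can be sketched as follows. `AioExecutorSketch` and `worker` are illustrative names under assumed behavior, not the actual pyathena classes:

```python
import asyncio
from concurrent.futures import Future


class AioExecutorSketch:
    """Illustrative sketch of the S3AioExecutor idea (not the real class):
    submit() may be called from any worker thread and hands the blocking
    function back to the event loop, which runs it via asyncio.to_thread."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self._loop = loop

    def submit(self, fn, *args) -> Future:
        # run_coroutine_threadsafe is safe to call off the loop thread,
        # so file handles can schedule work from wherever they run.
        return asyncio.run_coroutine_threadsafe(
            asyncio.to_thread(fn, *args), self._loop
        )


def worker(executor: AioExecutorSketch) -> int:
    # Runs in a worker thread, as a file handle might during a multipart
    # upload; blocking on fut.result() here does not block the event loop.
    fut = executor.submit(pow, 2, 10)
    return fut.result()


async def main() -> int:
    executor = AioExecutorSketch(asyncio.get_running_loop())
    return await asyncio.to_thread(worker, executor)


print(asyncio.run(main()))  # 1024
```

The key property is that all parallel work funnels through one event loop, whichever thread requested it.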

### Usage with AioS3FSCursor

`AioS3FSCursor` automatically uses `AioS3FileSystem` internally. No additional
configuration is needed:

```python
from pyathena import aio_connect
from pyathena.aio.s3fs.cursor import AioS3FSCursor

async with await aio_connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                             region_name="us-west-2") as conn:
    cursor = conn.cursor(AioS3FSCursor)
    await cursor.execute("SELECT * FROM many_rows")
    async for row in cursor:
        print(row)
```

### Standalone usage

`AioS3FileSystem` can also be used directly for S3 operations:

```python
from pyathena.filesystem.s3_async import AioS3FileSystem

# Async context
fs = AioS3FileSystem(asynchronous=True)

files = await fs._ls("s3://my-bucket/data/")
data = await fs._cat_file("s3://my-bucket/data/file.csv")
await fs._rm("s3://my-bucket/data/old/", recursive=True)

# Sync wrappers are auto-generated by fsspec
files = fs.ls("s3://my-bucket/data/")
```