PyAthena is a Python DB API 2.0 (PEP 249) compliant client for Amazon Athena. See `pyproject.toml` for Python version support and dependencies.

## Rules and Constraints

### Git Workflow
+
- **NEVER** commit directly to `master`; always create a feature branch and PR
- Create PRs as drafts: `gh pr create --draft`

### Import Rules
+
- **NEVER** use runtime imports (inside functions, methods, or conditional blocks)
- All imports must be at the top of the file, after the license header
- Exception: the existing codebase uses runtime imports for optional dependencies (`pyarrow`, `pandas`, etc.) in source code. For new code, use `TYPE_CHECKING` instead when possible
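
A minimal sketch of the `TYPE_CHECKING` pattern the exception above recommends. The helper `column_names` is hypothetical and not part of PyAthena; the point is only that the optional dependency is imported for type checkers and never at runtime.

```python
from __future__ import annotations  # annotations are kept as strings at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers such as mypy; this branch never runs,
    # so pandas does not have to be installed to import the module.
    import pandas as pd


def column_names(df: pd.DataFrame) -> list[str]:
    """Hypothetical helper: return the column labels of a DataFrame as strings."""
    return [str(name) for name in df.columns]
```

Because the annotation is never evaluated, the function accepts any object with a `columns` attribute at runtime, while mypy still checks callers against `pd.DataFrame`.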
### Code Quality — Always Run Before Committing
+
```bash
make format  # Auto-fix formatting and imports
make lint    # Lint + format check + mypy
```

### Testing
+
```bash
# ALWAYS run `make lint` first — tests will fail if lint doesn't pass
-make test             # Unit tests (runs chk first)
-make test-sqla        # SQLAlchemy dialect tests
+make test/pyathena    # Unit tests (runs lint first)
+make test/sqla        # SQLAlchemy dialect tests
+make test/sqla-async  # SQLAlchemy async dialect tests
```

Tests require AWS environment variables. Use a `.env` file (gitignored):
+
```bash
AWS_DEFAULT_REGION=<region>
AWS_ATHENA_S3_STAGING_DIR=s3://<bucket>/<path>/
AWS_ATHENA_WORKGROUP=<workgroup>
AWS_ATHENA_SPARK_WORKGROUP=<spark-workgroup>
```
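
Before running integration tests it can help to confirm that these variables are actually set. The sketch below is illustrative only: `missing_env_vars` is a hypothetical helper that the repository does not provide, and treating all four variables as required for every test is an assumption.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the .env example above.
REQUIRED_ENV_VARS = (
    "AWS_DEFAULT_REGION",
    "AWS_ATHENA_S3_STAGING_DIR",
    "AWS_ATHENA_WORKGROUP",
    "AWS_ATHENA_SPARK_WORKGROUP",
)


def missing_env_vars(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no argument inspects the current process environment; passing a mapping makes the check easy to unit test.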
+
```bash
export $(cat .env | xargs) && uv run pytest tests/pyathena/test_file.py -v
```
- New features require tests; changes to SQLAlchemy dialects must pass `make test-sqla`

#### Test Conventions
+
- **Class-based tests** for integration tests that use fixtures (cursors, engines): `class TestCursor:` with methods like `def test_fetchone(self, cursor):`
- **Standalone functions** for unit tests of pure logic (converters, parsers, utils): `def test_to_struct_json_formats(input_value, expected):`
- Test file naming mirrors source: `pyathena/parser.py` → `tests/pyathena/test_parser.py`
- **Fixtures**: Cursor/engine fixtures are defined in `conftest.py` and injected by name (e.g., `cursor`, `engine`, `async_cursor`). Use `indirect=True` parametrization to pass connection options:
- **Parametrize** with `@pytest.mark.parametrize(("input", "expected"), [...])` for data-driven tests
- **Integration tests** (need AWS) use cursor/engine fixtures with real Athena queries; **unit tests** (no AWS) call functions directly with test data
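
The file-naming convention in the list above can be expressed as a small path mapping. This is a sketch under assumptions: `expected_test_path` is a hypothetical helper, and extending the rule to nested subpackages is inferred from the single example given, not confirmed by the source.

```python
from pathlib import PurePosixPath


def expected_test_path(source_path: str) -> str:
    """Map a source module path to the test file path that mirrors it."""
    source = PurePosixPath(source_path)
    # Keep the package directories, prefix the file name with "test_",
    # and root the result under tests/.
    return str(PurePosixPath("tests") / source.parent / f"test_{source.name}")
```

For the example from the list, `expected_test_path("pyathena/parser.py")` returns `"tests/pyathena/test_parser.py"`.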

+### Markdown Lint
+
+`docs/**/*.md` and project-root `*.md` files are linted with [markdownlint-cli2](https://github.com/DavidAnson/markdownlint-cli2). The config lives at `.markdownlint-cli2.jsonc`. CI runs lint + Sphinx build on PRs that touch docs (`.github/workflows/docs-lint.yaml`).
+
+`markdownlint-cli2` is pinned in `.mise.toml`, so [`mise`](https://mise.jdx.dev/) installs the exact version used in CI. Run locally:
+
+```bash
+mise install      # one-time: installs markdownlint-cli2
+make docs/lint    # check
+make docs/format  # auto-fix what's possible
+make docs/build   # build the Sphinx site under docs/_build/html
+```
+

## Architecture — Key Design Decisions
These are non-obvious conventions that can't be discovered by reading code alone.

### PEP 249 Compliance
+
All cursor types must implement: `execute()`, `fetchone()`, `fetchmany()`, `fetchall()`, `close()`. New cursor features must follow the DB API 2.0 specification.
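
One way to check the method list above is a small duck-typing helper. The sketch below is not PyAthena code: `missing_cursor_methods` is a hypothetical name, and the standard library's `sqlite3` cursor stands in for an Athena cursor because it also follows DB API 2.0.

```python
import sqlite3
from typing import List

# The five methods the paragraph above requires of every cursor type.
REQUIRED_CURSOR_METHODS = ("execute", "fetchone", "fetchmany", "fetchall", "close")


def missing_cursor_methods(cursor: object) -> List[str]:
    """Return the required DB API 2.0 method names that `cursor` lacks."""
    return [
        name
        for name in REQUIRED_CURSOR_METHODS
        if not callable(getattr(cursor, name, None))
    ]


# Stand-in cursor: sqlite3 is used here only because it needs no AWS access.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
missing = missing_cursor_methods(cursor)
cursor.execute("SELECT 1 AS value")
row = cursor.fetchone()
cursor.close()
connection.close()
```

A check like this only verifies that the methods exist; it says nothing about whether their behavior matches the specification.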

### Cursor Module Pattern
+
Each cursor type lives in its own subpackage (`pandas/`, `arrow/`, `polars/`, `s3fs/`, `spark/`) with a consistent structure: `cursor.py`, `async_cursor.py`, `converter.py`, `result_set.py`. When adding features, consider impact on all cursor types.
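
The structure described above can be checked mechanically when a new cursor subpackage is added. This is a rough sketch: `missing_modules` is a hypothetical helper, and the assumption that every subpackage contains all four files comes from the description above rather than from inspecting the repository.

```python
from pathlib import Path
from typing import List

# Module names listed in the paragraph above.
EXPECTED_MODULES = ("cursor.py", "async_cursor.py", "converter.py", "result_set.py")


def missing_modules(subpackage_dir: Path) -> List[str]:
    """Return the expected module file names that are absent from a subpackage."""
    return [
        name
        for name in EXPECTED_MODULES
        if not (subpackage_dir / name).is_file()
    ]
```

For example, `missing_modules(Path("pyathena/pandas"))` run from the repository root would list any of the four files that the `pandas/` subpackage lacks.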

### Filesystem (fsspec) Compatibility
+
`pyathena/filesystem/s3.py` implements fsspec's `AbstractFileSystem`. When modifying:
+
- Match `s3fs` library behavior where possible (users migrate from it)
- Use `delimiter="/"` in S3 API calls to minimize requests
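
To illustrate why a delimiter reduces listing work, the sketch below simulates delimiter-based listing over an in-memory set of keys. It is not PyAthena's implementation and makes no S3 calls; `list_one_level` is a hypothetical function that mimics how S3 rolls nested keys up into common prefixes.

```python
from typing import Iterable, List, Tuple


def list_one_level(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Simulate a delimiter-based listing.

    Returns (keys directly under `prefix`, common prefixes one level down).
    """
    contents: List[str] = []
    common_prefixes: List[str] = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Everything below the next delimiter collapses into one entry.
            rolled_up = prefix + head + delimiter
            if rolled_up not in common_prefixes:
                common_prefixes.append(rolled_up)
        else:
            contents.append(key)
    return contents, common_prefixes
```

With a delimiter, a directory containing thousands of nested objects is reported as a single prefix entry, so a one-level listing needs far fewer result pages than enumerating every key.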

docs/pandas.md: 4 additions & 5 deletions

````diff
@@ -33,7 +33,7 @@ df = as_pandas(cursor)
 print(df.describe())
 ```
 
-If you want to use the query results output to S3 directly, you can use [PandasCursor](#pandas-cursor).
+If you want to use the query results output to S3 directly, you can use {ref}`pandas-cursor`.
 This cursor fetches query results faster than the default cursor. (See [benchmark results](https://github.com/pyathena-dev/PyAthena/tree/master/benchmarks).)
 
 (to-sql)=
@@ -392,7 +392,7 @@ for df in df_iter:
 print(df.head())
 ```
 
-**Memory-efficient iteration with iter_chunks()**
+#### Memory-efficient iteration with iter_chunks()
 
 PandasCursor provides an `iter_chunks()` method for convenient chunked processing:
 
@@ -456,7 +456,7 @@ df_iter.get_chunk(10)
 df_iter.get_chunk(10) # raise StopIteration
 ```
 
-**Auto-optimization of chunksize**
+#### Auto-optimization of chunksize
 
 PandasCursor can automatically determine optimal chunksize based on result file size when enabled:
````