ci: introduce markdownlint-cli2 and verify docs build on PRs (closes #717)

Lint Markdown under docs/ and project-root *.md, and run sphinx-multiversion
build on docs-touching PRs to catch syntax/reference errors before merge.

- Pin markdownlint-cli2 0.18.1 via .mise.toml (npm backend; node auto-resolved)
- Add .markdownlint-cli2.jsonc tuned to existing PyAthena docs style:
  dash bullets, 2-space indent, allow inline HTML used in MyST/README
- Add make docs-lint / docs-lint-fix targets that shell out via mise exec
- Add .github/workflows/docs-lint.yaml: lint + sphinx-multiversion build,
  triggered only on docs/**, **.md, and the workflow/config files themselves
- Fix 5 remaining manual violations after auto-fix:
  - MD051: use MyST {ref}`pandas-cursor` instead of broken HTML anchor
  - MD036: promote three bold-as-heading spans in pandas.md to h4
  - MD001: add "Basic usage" h2 wrapper in sqlalchemy.md
- Auto-fix touched README.md, CLAUDE.md, docs/*.md (mostly bullet style)
- Document mise-based local workflow in CLAUDE.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CLAUDE.md: 23 additions & 0 deletions
@@ -1,26 +1,31 @@
 # PyAthena Development Guide for AI Assistants

 ## Project Overview
+
 PyAthena is a Python DB API 2.0 (PEP 249) compliant client for Amazon Athena. See `pyproject.toml` for Python version support and dependencies.

 ## Rules and Constraints

 ### Git Workflow
+
 - **NEVER** commit directly to `master` — always create a feature branch and PR
 - Create PRs as drafts: `gh pr create --draft`

 ### Import Rules
+
 - **NEVER** use runtime imports (inside functions, methods, or conditional blocks)
 - All imports must be at the top of the file, after the license header
 - Exception: the existing codebase uses runtime imports for optional dependencies (`pyarrow`, `pandas`, etc.) in source code. For new code, use `TYPE_CHECKING` instead when possible

 ### Code Quality — Always Run Before Committing
+
 ```bash
 make format # Auto-fix formatting and imports
 make lint # Lint + format check + mypy
 ```

 ### Testing
+
 ```bash
 # ALWAYS run `make lint` first — tests will fail if lint doesn't pass
 - New features require tests; changes to SQLAlchemy dialects must pass `make test-sqla`

 #### Test Conventions
+
 - **Class-based tests** for integration tests that use fixtures (cursors, engines): `class TestCursor:` with methods like `def test_fetchone(self, cursor):`
 - **Standalone functions** for unit tests of pure logic (converters, parsers, utils): `def test_to_struct_json_formats(input_value, expected):`
 - Test file naming mirrors source: `pyathena/parser.py` → `tests/pyathena/test_parser.py`
 - **Parametrize** with `@pytest.mark.parametrize(("input", "expected"), [...])` for data-driven tests
 - **Integration tests** (need AWS) use cursor/engine fixtures with real Athena queries; **unit tests** (no AWS) call functions directly with test data

+### Markdown Lint
+
+`docs/**/*.md` and project-root `*.md` files are linted with [markdownlint-cli2](https://github.com/DavidAnson/markdownlint-cli2). The config lives at `.markdownlint-cli2.jsonc`. CI runs lint + Sphinx build on PRs that touch docs (`.github/workflows/docs-lint.yaml`).
+
+`markdownlint-cli2` is pinned in `.mise.toml` (along with `node`), so [`mise`](https://mise.jdx.dev/) installs the exact version used in CI. Run locally:
+
+```bash
+mise install # one-time: installs node + markdownlint-cli2
+make docs-lint # check
+make docs-lint-fix # auto-fix what's possible
+```
+
 ## Architecture — Key Design Decisions

 These are non-obvious conventions that can't be discovered by reading code alone.

 ### PEP 249 Compliance
+
 All cursor types must implement: `execute()`, `fetchone()`, `fetchmany()`, `fetchall()`, `close()`. New cursor features must follow the DB API 2.0 specification.

 ### Cursor Module Pattern
+
 Each cursor type lives in its own subpackage (`pandas/`, `arrow/`, `polars/`, `s3fs/`, `spark/`) with a consistent structure: `cursor.py`, `async_cursor.py`, `converter.py`, `result_set.py`. When adding features, consider impact on all cursor types.

 ### Filesystem (fsspec) Compatibility
+
 `pyathena/filesystem/s3.py` implements fsspec's `AbstractFileSystem`. When modifying:
 - Match `s3fs` library behavior where possible (users migrate from it)
 - Use `delimiter="/"` in S3 API calls to minimize requests
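
The PEP 249 cursor surface that CLAUDE.md requires (`execute()`, `fetchone()`, `fetchmany()`, `fetchall()`, `close()`) can be illustrated with a minimal sketch. It uses the standard-library `sqlite3` module purely as a stand-in DB API 2.0 driver, not PyAthena itself. The table `t` and the helper `fetch_demo` are hypothetical names introduced only for this illustration.

```python
import sqlite3


def fetch_demo():
    """Exercise the five PEP 249 cursor methods named in CLAUDE.md."""
    # sqlite3 is only a stand-in DB API 2.0 driver; PyAthena is not used here.
    conn = sqlite3.connect(":memory:")
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE t (n INTEGER)")
    cursor.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(5)])

    cursor.execute("SELECT n FROM t ORDER BY n")
    first = cursor.fetchone()       # a single row, or None when exhausted
    next_two = cursor.fetchmany(2)  # up to 2 further rows
    rest = cursor.fetchall()        # every row that remains
    exhausted = cursor.fetchone()   # None: the result set is consumed

    cursor.close()
    conn.close()
    return first, next_two, rest, exhausted
```

The three fetch methods share one position in the result set, so each call continues where the previous one stopped. Any new cursor type would presumably need to preserve that behaviour to stay within the specification.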
docs/pandas.md: 4 additions & 5 deletions
@@ -33,7 +33,7 @@ df = as_pandas(cursor)
 print(df.describe())
 ```

-If you want to use the query results output to S3 directly, you can use [PandasCursor](#pandas-cursor).
+If you want to use the query results output to S3 directly, you can use {ref}`pandas-cursor`.
 This cursor fetches query results faster than the default cursor. (See [benchmark results](https://github.com/pyathena-dev/PyAthena/tree/master/benchmarks).)

 (to-sql)=
@@ -392,7 +392,7 @@ for df in df_iter:
     print(df.head())
 ```

-**Memory-efficient iteration with iter_chunks()**
+#### Memory-efficient iteration with iter_chunks()

 PandasCursor provides an `iter_chunks()` method for convenient chunked processing:

@@ -456,7 +456,7 @@ df_iter.get_chunk(10)
 df_iter.get_chunk(10) # raise StopIteration
 ```

-**Auto-optimization of chunksize**
+#### Auto-optimization of chunksize

 PandasCursor can automatically determine optimal chunksize based on result file size when enabled:
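
The MD036 changes above (a bold-only line promoted to an h4 heading) were made by hand according to the commit message. As a rough sketch of the same transformation, and not of how markdownlint itself detects the rule, the following Python rewrites lines that consist solely of one bold span. The names `BOLD_ONLY` and `promote_bold_headings` are hypothetical, and the handling of fenced code blocks is a simplifying assumption.

```python
import re

# A line consisting solely of one bold span, e.g. "**Some title**".
BOLD_ONLY = re.compile(r"^\*\*(?P<title>[^*]+)\*\*$")


def promote_bold_headings(text, level=4):
    """Rewrite bold-only lines as ATX headings, skipping fenced code blocks."""
    out = []
    in_fence = False
    for line in text.splitlines():
        # Toggle on every ``` line so fenced content is left untouched.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else BOLD_ONLY.match(line.strip())
        out.append(f"{'#' * level} {match['title']}" if match else line)
    return "\n".join(out)
```

Bold text inside a sentence is left alone because the pattern must match the whole line, which is the distinction the rule draws between emphasis and a heading substitute.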