Commit 31af950

ci: introduce markdownlint-cli2 and verify docs build on PRs (closes #717)
Lint Markdown under docs/ and project-root *.md, and run sphinx-multiversion build on docs-touching PRs to catch syntax/reference errors before merge.

- Pin markdownlint-cli2 0.18.1 via .mise.toml (npm backend; node auto-resolved)
- Add .markdownlint-cli2.jsonc tuned to existing PyAthena docs style: dash bullets, 2-space indent, allow inline HTML used in MyST/README
- Add make docs-lint / docs-lint-fix targets that shell out via mise exec
- Add .github/workflows/docs-lint.yaml: lint + sphinx-multiversion build, triggered only on docs/**, **.md, and the workflow/config files themselves
- Fix 5 remaining manual violations after auto-fix:
  - MD051: use MyST {ref}`pandas-cursor` instead of broken HTML anchor
  - MD036: promote three bold-as-heading spans in pandas.md to h4
  - MD001: add "Basic usage" h2 wrapper in sqlalchemy.md
- Auto-fix touched README.md, CLAUDE.md, docs/*.md (mostly bullet style)
- Document mise-based local workflow in CLAUDE.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 7a6326f commit 31af950

12 files changed

Lines changed: 175 additions & 68 deletions

.github/workflows/docs-lint.yaml

Lines changed: 36 additions & 0 deletions

````diff
@@ -0,0 +1,36 @@
+name: Docs Lint
+
+on:
+  pull_request:
+    paths:
+      - 'docs/**'
+      - '**.md'
+      - '.markdownlint-cli2.jsonc'
+      - '.mise.toml'
+      - 'Makefile'
+      - '.github/workflows/docs-lint.yaml'
+
+permissions:
+  contents: read
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+      - uses: jdx/mise-action@1648a7812b9aeae629881980618f079932869151 # v4.0.1
+      - run: make docs-lint
+
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          fetch-depth: 0 # Fetch all history for sphinx-multiversion
+      - uses: jdx/mise-action@1648a7812b9aeae629881980618f079932869151 # v4.0.1
+      - uses: astral-sh/setup-uv@37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7.6.0
+        with:
+          enable-cache: true
+      - run: |
+          uv sync --group dev
+          make docs
````
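The `paths` list decides whether a pull request runs this workflow at all. As a rough illustration, the Python sketch below approximates that matching under an assumed simplification of GitHub's filter syntax: `**` matches any characters including `/`, `*` matches any characters except `/`, and `?`, `!` negation, and character classes are not handled. `glob_to_regex` and `triggers_docs_lint` are hypothetical helpers, not part of GitHub Actions or PyAthena.

```python
import re

# Path filters copied from .github/workflows/docs-lint.yaml.
PATHS = [
    "docs/**",
    "**.md",
    ".markdownlint-cli2.jsonc",
    ".mise.toml",
    "Makefile",
    ".github/workflows/docs-lint.yaml",
]


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob that uses only `*` and `**` into a compiled regex."""
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")  # `**` may cross directory separators
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # `*` stays within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # everything else is literal
            i += 1
    return re.compile("".join(parts))


def triggers_docs_lint(changed_files: list[str]) -> bool:
    """Return True if any changed file matches at least one path filter."""
    regexes = [glob_to_regex(p) for p in PATHS]
    return any(rx.fullmatch(path) for path in changed_files for rx in regexes)


print(triggers_docs_lint(["docs/pandas.md"]))      # True
print(triggers_docs_lint(["pyathena/cursor.py"]))  # False
```

Under this approximation, `'**.md'` is the entry that catches root-level files such as README.md and CLAUDE.md, which `'docs/**'` alone would miss.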

.markdownlint-cli2.jsonc

Lines changed: 40 additions & 0 deletions

````diff
@@ -0,0 +1,40 @@
+{
+  // https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md
+  "config": {
+    "default": true,
+    // Line length: docs have long URLs, code blocks, and prose lines
+    "MD013": false,
+    // Match existing style: dash bullets, 2-space nested indent
+    "MD004": { "style": "dash" },
+    "MD007": { "indent": 2 },
+    // Allow inline HTML used by README (centered images) and MyST admonitions
+    "MD033": {
+      "allowed_elements": ["details", "summary", "br", "kbd", "sub", "sup", "div", "img", "p", "a"]
+    },
+    // Cursor docs intentionally repeat subsection names ("Basic usage") under different cursors
+    "MD024": { "siblings_only": true },
+    // Don't require fenced code blocks to specify a language
+    "MD040": false,
+    // Don't require blank lines around code fences / lists — existing docs intermix them
+    "MD031": false,
+    "MD032": false,
+    // Allow `$ command` style in shell snippets (testing.md, README)
+    "MD014": false,
+    // Allow bare URLs
+    "MD034": false,
+    // First line need not be a top-level heading (some files start with MyST ref targets)
+    "MD041": false
+  },
+  "globs": [
+    "docs/**/*.md",
+    "*.md"
+  ],
+  "ignores": [
+    "docs/_build/**",
+    "node_modules/**",
+    ".venv/**",
+    ".tox/**",
+    ".pytest_cache/**",
+    ".serena/**"
+  ]
+}
````
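markdownlint-cli2 reads this file as JSONC, so the `//` comments are valid here but would make `json.loads` fail. If a Python script or test needs to inspect the config, one option is to strip comments first. The sketch below is a hypothetical helper, not part of markdownlint-cli2; it is string-aware because a naive approach would break on values such as `"docs/**/*.md"`, which contains `/*`. Trailing commas are not handled, and `CONFIG` is an abridged excerpt of the file.

```python
import json


def strip_jsonc_comments(text: str) -> str:
    """Remove // and /* */ comments that appear outside string literals."""
    out: list[str] = []
    i = 0
    in_string = False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                out.append(text[i + 1])  # keep escaped characters such as \" verbatim
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)  # skip to end of line, keep the newline
            i = len(text) if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)  # skip the whole block comment
            i = len(text) if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Abridged excerpt of .markdownlint-cli2.jsonc.
CONFIG = """
{
  // https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md
  "config": {
    "default": true,
    "MD013": false,
    "MD004": { "style": "dash" },
    "MD007": { "indent": 2 }
  },
  "globs": ["docs/**/*.md", "*.md"]
}
"""

cfg = json.loads(strip_jsonc_comments(CONFIG))
disabled = sorted(k for k, v in cfg["config"].items() if v is False)
print(disabled)  # ['MD013']
```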

.mise.toml

Lines changed: 1 addition & 0 deletions

````diff
@@ -1,2 +1,3 @@
 [tools]
 python = "3.12"
+"npm:markdownlint-cli2" = "0.18.1"
````

CLAUDE.md

Lines changed: 23 additions & 0 deletions

````diff
@@ -1,26 +1,31 @@
 # PyAthena Development Guide for AI Assistants
 
 ## Project Overview
+
 PyAthena is a Python DB API 2.0 (PEP 249) compliant client for Amazon Athena. See `pyproject.toml` for Python version support and dependencies.
 
 ## Rules and Constraints
 
 ### Git Workflow
+
 - **NEVER** commit directly to `master` — always create a feature branch and PR
 - Create PRs as drafts: `gh pr create --draft`
 
 ### Import Rules
+
 - **NEVER** use runtime imports (inside functions, methods, or conditional blocks)
 - All imports must be at the top of the file, after the license header
 - Exception: the existing codebase uses runtime imports for optional dependencies (`pyarrow`, `pandas`, etc.) in source code. For new code, use `TYPE_CHECKING` instead when possible
 
 ### Code Quality — Always Run Before Committing
+
 ```bash
 make format # Auto-fix formatting and imports
 make lint # Lint + format check + mypy
 ```
 
 ### Testing
+
 ```bash
 # ALWAYS run `make lint` first — tests will fail if lint doesn't pass
 make test # Unit tests (runs chk first)
@@ -43,6 +48,7 @@ export $(cat .env | xargs) && uv run pytest tests/pyathena/test_file.py -v
 - New features require tests; changes to SQLAlchemy dialects must pass `make test-sqla`
 
 #### Test Conventions
+
 - **Class-based tests** for integration tests that use fixtures (cursors, engines): `class TestCursor:` with methods like `def test_fetchone(self, cursor):`
 - **Standalone functions** for unit tests of pure logic (converters, parsers, utils): `def test_to_struct_json_formats(input_value, expected):`
 - Test file naming mirrors source: `pyathena/parser.py``tests/pyathena/test_parser.py`
@@ -55,24 +61,41 @@ export $(cat .env | xargs) && uv run pytest tests/pyathena/test_file.py -v
 - **Parametrize** with `@pytest.mark.parametrize(("input", "expected"), [...])` for data-driven tests
 - **Integration tests** (need AWS) use cursor/engine fixtures with real Athena queries; **unit tests** (no AWS) call functions directly with test data
 
+### Markdown Lint
+
+`docs/**/*.md` and project-root `*.md` files are linted with [markdownlint-cli2](https://github.com/DavidAnson/markdownlint-cli2). The config lives at `.markdownlint-cli2.jsonc`. CI runs lint + Sphinx build on PRs that touch docs (`.github/workflows/docs-lint.yaml`).
+
+`markdownlint-cli2` is pinned in `.mise.toml` (along with `node`), so [`mise`](https://mise.jdx.dev/) installs the exact version used in CI. Run locally:
+
+```bash
+mise install # one-time: installs node + markdownlint-cli2
+make docs-lint # check
+make docs-lint-fix # auto-fix what's possible
+```
+
 ## Architecture — Key Design Decisions
 
 These are non-obvious conventions that can't be discovered by reading code alone.
 
 ### PEP 249 Compliance
+
 All cursor types must implement: `execute()`, `fetchone()`, `fetchmany()`, `fetchall()`, `close()`. New cursor features must follow the DB API 2.0 specification.
 
 ### Cursor Module Pattern
+
 Each cursor type lives in its own subpackage (`pandas/`, `arrow/`, `polars/`, `s3fs/`, `spark/`) with a consistent structure: `cursor.py`, `async_cursor.py`, `converter.py`, `result_set.py`. When adding features, consider impact on all cursor types.
 
 ### Filesystem (fsspec) Compatibility
+
 `pyathena/filesystem/s3.py` implements fsspec's `AbstractFileSystem`. When modifying:
 - Match `s3fs` library behavior where possible (users migrate from it)
 - Use `delimiter="/"` in S3 API calls to minimize requests
 - Handle edge cases: empty paths, trailing slashes, bucket-only paths
 
 ### Version Management
+
 Versions are derived from git tags via `hatch-vcs` — never edit `pyathena/_version.py` manually.
 
 ### Google-style Docstrings
+
 Use Google-style docstrings for public methods. See existing code for examples.
````
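Most of the CLAUDE.md hunks only add a blank line directly below a heading, the spacing that markdownlint's MD022 (blanks-around-headings) rule checks. The sketch below illustrates that fix for ATX headings only; it is a simplified assumption, not markdownlint's implementation, and `blank_after_headings` is a hypothetical name. It tracks fenced code blocks so that a `# comment` line inside a `bash` fence is not mistaken for a heading.

```python
import re

ATX_HEADING = re.compile(r"^ {0,3}#{1,6}(\s|$)")
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def blank_after_headings(markdown: str) -> str:
    """Insert a blank line after ATX headings that lack one, outside code fences."""
    lines = markdown.split("\n")
    out: list[str] = []
    fence: str | None = None  # opening fence marker while inside a code block
    for i, line in enumerate(lines):
        out.append(line)
        m = FENCE.match(line)
        if fence is None:
            if m:
                fence = m.group(1)  # entering a fenced block
                continue
        else:
            # A closing fence repeats the opening character, is at least as
            # long, and carries no info string.
            closing = (
                m is not None
                and m.group(1)[0] == fence[0]
                and len(m.group(1)) >= len(fence)
                and line.strip() == m.group(1)
            )
            if closing:
                fence = None
            continue
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if ATX_HEADING.match(line) and next_line.strip():
            out.append("")
    return "\n".join(out)


print(blank_after_headings("### Git Workflow\n- Create PRs as drafts"))
```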

Makefile

Lines changed: 8 additions & 0 deletions

````diff
@@ -36,6 +36,14 @@ docs:
 	echo 'pyathena.dev' > docs/_build/html/CNAME
 	touch docs/_build/html/.nojekyll
 
+.PHONY: docs-lint
+docs-lint:
+	mise exec -- markdownlint-cli2
+
+.PHONY: docs-lint-fix
+docs-lint-fix:
+	mise exec -- markdownlint-cli2 --fix
+
 .PHONY: tool
 tool:
 	uv tool install ruff@$(RUFF_VERSION)
````

README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -21,7 +21,7 @@ PyAthena is a Python [DB API 2.0 (PEP 249)](https://www.python.org/dev/peps/pep-
 
 ## Requirements
 
-* Python
+- Python
 
 - CPython 3.10, 3.11, 3.12, 3.13, 3.14
 
````
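The README change swaps the `*` list marker for `-`, which is what `"MD004": { "style": "dash" }` in `.markdownlint-cli2.jsonc` enforces. A simplified sketch of that check and fix, assuming list items start a line (optionally indented) and ignoring code blocks and thematic breaks such as `* * *`; `non_dash_bullets` and `to_dash_bullets` are hypothetical names:

```python
import re

# A `*` or `+` marker at line start (after optional indentation) followed by whitespace.
BULLET = re.compile(r"^(\s*)([*+])(\s+)")


def non_dash_bullets(markdown: str) -> list[int]:
    """Return 1-based line numbers of list items that use `*` or `+`."""
    return [
        n for n, line in enumerate(markdown.splitlines(), start=1)
        if BULLET.match(line)
    ]


def to_dash_bullets(markdown: str) -> str:
    """Rewrite `*`/`+` markers to `-`, keeping indentation and spacing."""
    return "\n".join(
        BULLET.sub(lambda m: m.group(1) + "-" + m.group(3), line)
        for line in markdown.split("\n")
    )


before = "## Requirements\n\n* Python\n\n- CPython 3.10, 3.11, 3.12, 3.13, 3.14\n"
print(non_dash_bullets(before))  # [3]
print(to_dash_bullets(before))
```

Requiring whitespace after the marker keeps bold labels such as `**Core Features:**` from being treated as list items.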

docs/cursor.md

Lines changed: 0 additions & 1 deletion

````diff
@@ -293,7 +293,6 @@ cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
 region_name="us-west-2").cursor(cursor=AsyncDictCursor, dict_type=OrderedDict)
 ```
 
-
 ## AioCursor
 
 See {ref}`aio-cursor`.
````
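The cursor.md hunk removes one of two consecutive blank lines before `## AioCursor`, the pattern markdownlint's MD012 (no-multiple-blanks) reports with its default maximum of one. A sketch of the equivalent cleanup, assuming fences are neither nested nor mismatched and leaving blank lines inside fenced code alone; `collapse_blank_lines` is a hypothetical helper:

```python
import re

FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def collapse_blank_lines(markdown: str, maximum: int = 1) -> str:
    """Limit runs of consecutive blank lines to `maximum`, outside fenced code."""
    out: list[str] = []
    blank_run = 0
    in_fence = False
    for line in markdown.split("\n"):
        if FENCE.match(line):
            # Every fence line toggles the state (assumes well-formed, unnested fences).
            in_fence = not in_fence
            blank_run = 0
        elif not in_fence and not line.strip():
            blank_run += 1
            if blank_run > maximum:
                continue  # drop the surplus blank line
        else:
            blank_run = 0
        out.append(line)
    return "\n".join(out)


print(repr(collapse_blank_lines("a\n\n\n\nb")))  # 'a\n\nb'
```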

docs/introduction.md

Lines changed: 14 additions & 14 deletions

````diff
@@ -6,7 +6,7 @@
 
 ## Requirements
 
-* Python
+- Python
 
 - CPython 3.10, 3.11, 3.12, 3.13, 3.14
 
@@ -35,23 +35,23 @@ Extra packages:
 PyAthena provides comprehensive support for Amazon Athena's data types and features:
 
 **Core Features:**
-- **DB API 2.0 Compliance**: Full PEP 249 compatibility for database operations
-- **SQLAlchemy Integration**: Native dialect support with table reflection and ORM capabilities
-- **Multiple Cursor Types**: Standard, Pandas, Arrow, Polars, S3FS and Spark cursor implementations
-- **Async Support**: Asynchronous query execution for non-blocking operations
+- **DB API 2.0 Compliance**: Full PEP 249 compatibility for database operations
+- **SQLAlchemy Integration**: Native dialect support with table reflection and ORM capabilities
+- **Multiple Cursor Types**: Standard, Pandas, Arrow, Polars, S3FS and Spark cursor implementations
+- **Async Support**: Asynchronous query execution for non-blocking operations
 
 **Data Type Support:**
-- **STRUCT/ROW Types**: {ref}`Complete support <sqlalchemy>` for complex nested data structures
-- **ARRAY Types**: {ref}`Complete support <sqlalchemy>` for ordered collections with automatic Python list conversion
-- **MAP Types**: {ref}`Complete support <sqlalchemy>` for key-value dictionary-like data structures
-- **JSON Integration**: Seamless JSON data parsing and conversion
-- **Performance Optimized**: Smart format detection for efficient data processing
+- **STRUCT/ROW Types**: {ref}`Complete support <sqlalchemy>` for complex nested data structures
+- **ARRAY Types**: {ref}`Complete support <sqlalchemy>` for ordered collections with automatic Python list conversion
+- **MAP Types**: {ref}`Complete support <sqlalchemy>` for key-value dictionary-like data structures
+- **JSON Integration**: Seamless JSON data parsing and conversion
+- **Performance Optimized**: Smart format detection for efficient data processing
 
 **Additional Features:**
-- **Connection Management**: Efficient connection pooling and configuration
-- **Result Caching**: Athena query result reuse capabilities
-- **Error Handling**: Comprehensive exception handling and recovery
-- **S3 Integration**: Direct S3 data access and staging support
+- **Connection Management**: Efficient connection pooling and configuration
+- **Result Caching**: Athena query result reuse capabilities
+- **Error Handling**: Comprehensive exception handling and recovery
+- **S3 Integration**: Direct S3 data access and staging support
 
 (license)=
 
````

docs/pandas.md

Lines changed: 4 additions & 5 deletions

````diff
@@ -33,7 +33,7 @@ df = as_pandas(cursor)
 print(df.describe())
 ```
 
-If you want to use the query results output to S3 directly, you can use [PandasCursor](#pandas-cursor).
+If you want to use the query results output to S3 directly, you can use {ref}`pandas-cursor`.
 This cursor fetches query results faster than the default cursor. (See [benchmark results](https://github.com/pyathena-dev/PyAthena/tree/master/benchmarks).)
 
 (to-sql)=
@@ -392,7 +392,7 @@ for df in df_iter:
 print(df.head())
 ```
 
-**Memory-efficient iteration with iter_chunks()**
+#### Memory-efficient iteration with iter_chunks()
 
 PandasCursor provides an `iter_chunks()` method for convenient chunked processing:
 
@@ -456,7 +456,7 @@ df_iter.get_chunk(10)
 df_iter.get_chunk(10) # raise StopIteration
 ```
 
-**Auto-optimization of chunksize**
+#### Auto-optimization of chunksize
 
 PandasCursor can automatically determine optimal chunksize based on result file size when enabled:
 
@@ -506,7 +506,7 @@ AthenaPandasResultSet.AUTO_CHUNK_SIZE_LARGE = 200_000 # Larger chunks
 AthenaPandasResultSet.AUTO_CHUNK_SIZE_MEDIUM = 100_000
 ```
 
-**Performance tuning options**
+#### Performance tuning options
 
 PandasCursor accepts additional pandas.read_csv() options for performance optimization:
 
@@ -829,4 +829,3 @@ async with await aio_connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
 await cursor.execute("SELECT * FROM many_rows")
 df = cursor.as_pandas()
 ```
-
````
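Three pandas.md hunks promote a bold line such as `**Auto-optimization of chunksize**` to an `####` heading to satisfy MD036 (no-emphasis-as-heading). The rule targets a paragraph made only of emphasized text, and by default it exempts text that ends in punctuation, which is consistent with `**Core Features:**` in introduction.md staying unchanged. A rough sketch, assuming one-line paragraphs surrounded by blank lines, the default punctuation set `.,;:!?。，；：！？`, and no special handling of code blocks; `emphasis_as_heading` is a hypothetical name and this is not markdownlint's implementation:

```python
import re

# Assumed default of MD036's `punctuation` parameter.
PUNCTUATION = ".,;:!?。，；：！？"
# Whole line wrapped in one emphasis delimiter: **text**, __text__, *text*, or _text_.
EMPHASIS_ONLY = re.compile(r"^(\*\*|__|\*|_)(?P<text>\S(?:.*\S)?)\1$")


def emphasis_as_heading(markdown: str) -> list[int]:
    """Return 1-based line numbers of one-line paragraphs that are only emphasis."""
    lines = markdown.split("\n")
    hits: list[int] = []
    for i, raw in enumerate(lines):
        m = EMPHASIS_ONLY.match(raw.strip())
        if not m:
            continue
        delim, text = m.group(1), m.group("text")
        if delim in text:
            continue  # e.g. "**a** and **b**" is not a single emphasis span
        prev_blank = i == 0 or not lines[i - 1].strip()
        next_blank = i + 1 == len(lines) or not lines[i + 1].strip()
        if prev_blank and next_blank and text[-1] not in PUNCTUATION:
            hits.append(i + 1)
    return hits


doc = "**Performance tuning options**\n\nPandasCursor accepts additional options:\n"
print(emphasis_as_heading(doc))  # [1]
```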

docs/polars.md

Lines changed: 0 additions & 1 deletion

````diff
@@ -649,4 +649,3 @@ async with await aio_connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
 await cursor.execute("SELECT * FROM many_rows")
 df = cursor.as_polars()
 ```
-
````
