Commit 1e5e7b5

timsaucer and claude committed
docs: execute examples via myst-nb; native tables and validated refs
Removes the last RST-syntax islands from the converted MyST markdown so the docs are markdown-native for both human and LLM authors.

Executable examples (A): replace IPython.sphinxext.ipython_directive with myst-nb. The 83 `{eval-rst}` + `.. ipython:: python` blocks become native `{code-cell} ipython3` blocks, and the 14 pages that carry them gain jupytext/kernelspec front matter so myst-nb runs them. conf.py routes .md through myst-nb with nb_execution_mode="force" and nb_execution_raise_on_error=True, so a failing example now fails the build.

myst-nb gives each page its own kernel instead of the IPython directive's single namespace shared across all documents in build order. That isolation surfaced expressions.md, which only ever worked by inheriting `col`/`lit` from an earlier-built page — it now imports them itself. It also changes the execution working directory to each page's own folder, so build.sh symlinks the example data next to every page that reads it by relative name and registers the python3 kernel; CI now calls build.sh so it matches local.

Tables (B): the 3 `.. list-table::` directives become GFM markdown tables.

Cross-references (C): the two intra-page links in distributing-work.md that the conversion left as undefined markdown references (and that built green while rendering literal brackets) become `{ref}` roles backed by explicit `(label)=` targets, so a future break fails the build instead of shipping silently.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
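The kernel-isolation point in the message can be illustrated without Sphinx. What follows is a minimal sketch, not myst-nb's actual machinery: plain dictionaries stand in for kernels, `math.sqrt` stands in for `col`/`lit`, and the `run_page` helper and page strings are hypothetical.

```python
def run_page(source: str, namespace: dict) -> None:
    """Execute one page's example code inside the given namespace."""
    exec(source, namespace)


# Two hypothetical pages: the second silently relies on the first's import.
page_a = "from math import sqrt"
page_b = "result = sqrt(16)"

# Shared namespace (the old IPython-directive behaviour): page_b works only
# because page_a happened to run before it.
shared = {}
run_page(page_a, shared)
run_page(page_b, shared)
shared_result = shared["result"]

# One namespace per page (the per-page kernel behaviour): the hidden
# dependency now surfaces as a NameError.
try:
    run_page(page_b, {})
    isolated_error = None
except NameError as exc:
    isolated_error = exc

# The fix mirrors what the commit does for expressions.md: the page
# imports what it uses.
fixed = {}
run_page("from math import sqrt\n" + page_b, fixed)
fixed_result = fixed["result"]
```

With `nb_execution_raise_on_error=True`, the equivalent of that `NameError` would presumably fail the build instead of rendering a traceback into the page.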
1 parent 36d8dcc commit 1e5e7b5

21 files changed

Lines changed: 1631 additions & 907 deletions

.github/workflows/build.yml

Lines changed: 4 additions & 3 deletions
@@ -552,9 +552,10 @@ jobs:
         run: |
           set -x
           cd docs
-          curl -O https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv
-          curl -O https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet
-          uv run --no-project make html
+          # build.sh downloads the example data, registers the Jupyter kernel
+          # myst-nb needs, symlinks the data next to each executed page, and
+          # runs sphinx. Using it here keeps CI identical to a local build.
+          uv run --no-project bash ./build.sh
 
       - name: Copy & push the generated HTML
         if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref_type == 'tag')

docs/build.sh

Lines changed: 15 additions & 0 deletions
@@ -36,6 +36,21 @@ rm -rf build 2> /dev/null
 rm -rf temp 2> /dev/null
 mkdir temp
 cp -rf source/* temp/
+
+# myst-nb executes each page as a notebook from the directory that page
+# lives in, so the example data files must sit alongside every page that
+# loads them by relative name (e.g. `ctx.read_csv("pokemon.csv")`). Symlink
+# them into each directory that has such a page rather than copying the
+# 20 MB parquet repeatedly.
+for d in temp temp/user-guide temp/user-guide/common-operations; do
+  ln -sf "$script_dir/pokemon.csv" "$d/pokemon.csv"
+  ln -sf "$script_dir/yellow_tripdata_2021-01.parquet" "$d/yellow_tripdata_2021-01.parquet"
+done
+
+# myst-nb runs `{code-cell}` blocks against a Jupyter kernel named "python3".
+# Register the active environment's interpreter as that kernel (idempotent).
+python -m ipykernel install --sys-prefix --name python3 --display-name "Python 3"
+
 make SOURCEDIR=`pwd`/temp html
 
 cd "$original_dir" || exit
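For readers who want to experiment with the symlink step outside the shell script, here is a rough Python equivalent. It is only a sketch under stated assumptions: the `link_example_data` helper is hypothetical, and a temporary directory with placeholder files stands in for the real `docs/` folder and its downloaded data.

```python
import tempfile
from pathlib import Path

# File and directory names are taken from the build.sh hunk above.
DATA_FILES = ["pokemon.csv", "yellow_tripdata_2021-01.parquet"]
PAGE_DIRS = [".", "user-guide", "user-guide/common-operations"]


def link_example_data(data_dir: Path, build_root: Path) -> list[Path]:
    """Symlink each data file into every directory that holds an executed page."""
    links = []
    for rel in PAGE_DIRS:
        page_dir = build_root / rel
        page_dir.mkdir(parents=True, exist_ok=True)
        for name in DATA_FILES:
            link = page_dir / name
            # Mirror `ln -sf`: drop any existing entry so reruns are safe.
            if link.is_symlink() or link.exists():
                link.unlink()
            link.symlink_to(data_dir / name)
            links.append(link)
    return links


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "docs"
    data_dir.mkdir()
    for name in DATA_FILES:
        (data_dir / name).write_text("placeholder")  # stand-in for real data

    build_root = data_dir / "temp"
    link_example_data(data_dir, build_root)
    links = link_example_data(data_dir, build_root)  # second run must not fail

    link_count = len(links)
    all_resolve = all(link.read_text() == "placeholder" for link in links)
```

Symlinking rather than copying keeps a single copy of the parquet file on disk, which is the trade-off the script's comment describes.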

docs/source/conf.py

Lines changed: 18 additions & 5 deletions
@@ -48,20 +48,33 @@
 extensions = [
     "sphinx.ext.mathjax",
     "sphinx.ext.napoleon",
-    "myst_parser",
-    "IPython.sphinxext.ipython_directive",
+    # myst_nb is a superset of myst_parser: it provides the MyST markdown
+    # parser plus executable `{code-cell}` notebook directives. Do NOT also
+    # list "myst_parser" — myst_nb activates it internally and listing both
+    # raises an extension conflict.
+    "myst_nb",
     "autoapi.extension",
 ]
 
 # NOTE: .rst stays alongside .md because sphinx-autoapi generates RST
 # under autoapi/ and Sphinx needs the suffix to parse it. The human-
-# authored docs are all MyST .md now; the .rst entry is only for the
-# autoapi build artifacts.
+# authored docs are all MyST .md now. ".md" is routed through myst-nb so
+# pages carrying jupytext/kernelspec front matter execute their
+# `{code-cell}` blocks; pages without that front matter render as plain
+# MyST markdown. The ".rst" entry is only for the autoapi build artifacts.
 source_suffix = {
     ".rst": "restructuredtext",
-    ".md": "markdown",
+    ".md": "myst-nb",
 }
 
+# Execute notebook code cells at build time and fail the build if any cell
+# raises — this replaces the old IPython sphinx directive, whose executed
+# examples are now `{code-cell}` blocks. "force" re-executes every build so
+# stale cached output can never ship.
+nb_execution_mode = "force"
+nb_execution_timeout = 120
+nb_execution_raise_on_error = True
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ["_templates"]
 

docs/source/index.md

Lines changed: 14 additions & 7 deletions
@@ -1,3 +1,12 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+kernelspec:
+  name: python3
+  display_name: Python 3
+---
 <!---
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements. See the NOTICE file
@@ -39,16 +48,14 @@ pip install datafusion
 
 ## Example
 
-```{eval-rst}
-.. ipython:: python
+```{code-cell} ipython3
+from datafusion import SessionContext
 
-    from datafusion import SessionContext
+ctx = SessionContext()
 
-    ctx = SessionContext()
+df = ctx.read_csv("pokemon.csv")
 
-    df = ctx.read_csv("pokemon.csv")
-
-    df.show()
+df.show()
 
 ```
 

docs/source/user-guide/basics.md

Lines changed: 19 additions & 12 deletions
@@ -1,3 +1,12 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+kernelspec:
+  name: python3
+  display_name: Python 3
+---
 <!---
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements. See the NOTICE file
@@ -25,22 +34,20 @@ In this section, we will cover a basic example to introduce a few key concepts.
 2021 Yellow Taxi Trip Records ([download](https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet)),
 from the [TLC Trip Record Data](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page).
 
-```{eval-rst}
-.. ipython:: python
+```{code-cell} ipython3
+from datafusion import SessionContext, col, lit, functions as f
 
-    from datafusion import SessionContext, col, lit, functions as f
+ctx = SessionContext()
 
-    ctx = SessionContext()
-
-    df = ctx.read_parquet("yellow_tripdata_2021-01.parquet")
+df = ctx.read_parquet("yellow_tripdata_2021-01.parquet")
 
-    df = df.select(
-        "trip_distance",
-        col("total_amount").alias("total"),
-        (f.round(lit(100.0) * col("tip_amount") / col("total_amount"), lit(1))).alias("tip_percent"),
-    )
+df = df.select(
+    "trip_distance",
+    col("total_amount").alias("total"),
+    (f.round(lit(100.0) * col("tip_amount") / col("total_amount"), lit(1))).alias("tip_percent"),
+)
 
-    df.show()
+df.show()
 ```
 
 ## Session Context