
Commit ea13eeb

ci: gate the tree-sitter job on tree-sitter/** changes; parallelize generate
The derived tree-sitter parser is a pure function of the committed tree-sitter/** (grammar.js + scanner.c + queries), and the `test` job fails if those drift from the grammar sources, so every grammar change necessarily lands as a tree-sitter/** diff. Re-running the ~5-min `tree-sitter generate` on every push when nothing under tree-sitter/** had changed was pure waste.

- Gate the job's expensive steps on a tree-sitter/** diff. The job still runs and reports success, so a required status check is never left pending.
- Run the 6-grammar conflict gate in parallel (previously sequential at ~12 min; now roughly the time of the slowest single grammar) and build the wasms from the parser.c just generated, dropping the redundant per-grammar re-generate.
- schedule (nightly) + workflow_dispatch force a full run, covering the one input the diff can't see (a tree-sitter-cli bump in the lockfile) and re-verifying the "beats official" accuracy claim.

The state count is already at the floor for a unified-grammar-derived parser (#46), so this addresses the generate cost at the test-harness layer instead.
1 parent c17c521 commit ea13eeb

1 file changed

Lines changed: 63 additions & 31 deletions
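The gating rule described in the message can be restated as a small pure function, which makes its fail-safe branches easy to check in isolation. This is a sketch, not part of the commit: `ts_gate_decision` is a hypothetical name, it reads the changed-path list on stdin (what `git diff --name-only "$base" HEAD` would print), and its third argument is a yes/no flag standing in for the `git cat-file -e "$base^{commit}"` check, so it can be exercised without a repository.

```shell
#!/usr/bin/env bash
# Hypothetical, git-free restatement of the "Did the tree-sitter inputs change?" step.
#   $1 = event name (push, pull_request, schedule, workflow_dispatch, ...)
#   $2 = base SHA ("" when the event carries none)
#   $3 = "yes" if the base commit exists locally (stand-in for `git cat-file -e`)
#   stdin = changed paths, one per line
# Prints "true" (run generate/build/bench) or "false" (skip them; the job still succeeds).
ts_gate_decision() {
  local event="$1" base="$2" base_ok="$3"
  # schedule / workflow_dispatch: always the full run (lockfile / tree-sitter-cli backstop)
  if [ "$event" != "push" ] && [ "$event" != "pull_request" ]; then
    echo true; return
  fi
  # No usable base (e.g. the all-zero "before" SHA of a new branch): fail safe and run.
  if [ -z "$base" ] || [ "$base_ok" != "yes" ]; then
    echo true; return
  fi
  # Otherwise run only if some changed path is under the top-level tree-sitter/ directory.
  if grep -qE '^tree-sitter/'; then echo true; else echo false; fi
}
```

Every branch that cannot prove "nothing under tree-sitter/ changed" answers `true`, so the cheap path is taken only when the diff is known and clean. The anchored `^tree-sitter/` pattern matters: a path such as `docs/tree-sitter/notes.md` does not trigger the gate.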


.github/workflows/ci.yml

@@ -4,6 +4,12 @@ on:
   push:
     branches: [master]
   pull_request:
+  # Nightly + on-demand FULL run: the tree-sitter job below only generates when tree-sitter/**
+  # changed (the materialized grammar is its sole input), so these backstop the one input it can't
+  # see in that diff — a tree-sitter-cli bump (lockfile) — and re-verify the "beats official" claim.
+  schedule:
+    - cron: '0 9 * * *'
+  workflow_dispatch:
 
 permissions:
   contents: read
@@ -53,43 +59,69 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0 # need history to diff against the base for the path gate below
+
+      # `tree-sitter generate` is ~5 min for the TS grammar (issue #46: the state count is at the
+      # floor for a unified-grammar-derived parser, so the cost is irreducible) — but the generated
+      # parser is a PURE FUNCTION of the committed tree-sitter/** (grammar.js + scanner.c + queries),
+      # and the `test` job fails if those drift from the grammar sources, so EVERY grammar change
+      # necessarily lands as a tree-sitter/** diff. Re-running generate when nothing under
+      # tree-sitter/** changed is pure waste, so gate the expensive steps on it. The job still RUNS
+      # (reports success) — only the steps are skipped — so a required status check is never pending.
+      # schedule / workflow_dispatch force the full run regardless (the lockfile/cli-bump backstop).
+      - name: Did the tree-sitter inputs change?
+        id: changed
+        run: |
+          if [ "${{ github.event_name }}" != "push" ] && [ "${{ github.event_name }}" != "pull_request" ]; then
+            echo "value=true" >> "$GITHUB_OUTPUT"; echo "forced full run (${{ github.event_name }})"; exit 0
+          fi
+          if [ "${{ github.event_name }}" = "pull_request" ]; then base="${{ github.event.pull_request.base.sha }}"; else base="${{ github.event.before }}"; fi
+          if [ -z "$base" ] || ! git cat-file -e "$base^{commit}" 2>/dev/null; then
+            echo "value=true" >> "$GITHUB_OUTPUT"; echo "no usable base — running the gate"; exit 0
+          fi
+          if git diff --name-only "$base" HEAD | grep -qE '^tree-sitter/'; then
+            echo "value=true" >> "$GITHUB_OUTPUT"; echo "tree-sitter/** changed — running the gate"
+          else
+            echo "value=false" >> "$GITHUB_OUTPUT"; echo "no tree-sitter/** change — skipping generate/build/bench"
+          fi
+
       - uses: actions/setup-node@v4
+        if: steps.changed.outputs.value == 'true'
         with:
           node-version: 24
-      - run: npm ci
+      - if: steps.changed.outputs.value == 'true'
+        run: npm ci
 
-      # Cheap LR-conflict gate: `tree-sitter generate` (no wasm) for every derived
-      # grammar that is a tree-sitter target, so a conflict introduced by a grammar
-      # change is caught even for the dialects whose wasm is not built below (tsx/js/jsx)
-      # exactly the gap that let an unresolved `type`/`class_heritage` conflict ship.
-      # yaml is now included (issue #3): its indent/scalar tokens are wired as tree-sitter
-      # externals and the C indentation scanner is implemented, so its grammar generates + builds.
-      - name: Generate every derived tree-sitter grammar (conflict gate, no wasm)
+      # Conflict gate: `tree-sitter generate` for every derived grammar IN PARALLEL (was sequential
+      # ~12 min; parallel ≈ the slowest single grammar, ts/tsx ~5 min). A conflict introduced by a
+      # grammar change is caught even for the dialects whose wasm is not built below (tsx/js/jsx)
+      # exactly the gap that once let an unresolved `type`/`class_heritage` conflict ship. yaml
+      # included (issue #3): its indent/scalar externals + C scanner make it generate + build.
+      - name: Generate every derived tree-sitter grammar (parallel conflict gate)
+        if: steps.changed.outputs.value == 'true'
         run: |
-          for g in typescript typescriptreact javascript javascriptreact html yaml; do
-            echo "── tree-sitter generate: $g"
-            ( cd "tree-sitter/$g" && npx tree-sitter generate )
+          langs=(typescript typescriptreact javascript javascriptreact html yaml)
+          pids=()
+          for g in "${langs[@]}"; do
+            ( cd "tree-sitter/$g" && npx tree-sitter generate ) >"/tmp/gen-$g.log" 2>&1 &
+            pids+=($!)
+          done
+          fail=0
+          for i in "${!langs[@]}"; do
+            if wait "${pids[$i]}"; then echo "✓ ${langs[$i]}"; else echo "✗ ${langs[$i]}"; cat "/tmp/gen-${langs[$i]}.log"; fail=1; fi
           done
+          exit $fail
 
-      - name: Build the derived tree-sitter grammar to wasm
+      # Build the gated wasms FROM the parser.c just generated (no re-generate) and run the accuracy
+      # benches: ts must beat official (the thesis proof), html vs parse5. The YAML wasm is built to
+      # prove its C indentation scanner compiles + links; its accuracy bench needs the yaml-test-suite
+      # checkout, so it runs in the readme-bench workflow.
+      - name: Build wasm + accuracy gate (typescript / html / yaml)
+        if: steps.changed.outputs.value == 'true'
         run: |
-          cd tree-sitter/typescript
-          npx tree-sitter generate
-          npx tree-sitter build --wasm .
-      - name: Tree-sitter accuracy gate (≥ floor, must beat official)
-        run: node test/treesitter-bench.ts
-      - name: Build + gate the derived HTML tree-sitter grammar (v1, vs parse5)
-        run: |
-          cd tree-sitter/html
-          npx tree-sitter generate
-          npx tree-sitter build --wasm .
-          cd ../..
+          ( cd tree-sitter/typescript && npx tree-sitter build --wasm . )
+          ( cd tree-sitter/html && npx tree-sitter build --wasm . )
+          ( cd tree-sitter/yaml && npx tree-sitter build --wasm . )
+          node test/treesitter-bench.ts
           node test/html-treesitter.ts
-      # The derived YAML tree-sitter (issue #3) — build the wasm (its C indentation scanner must
-      # compile + link). The accuracy bench (test/treesitter-yaml-bench.ts) needs the yaml-test-suite
-      # checkout, so it runs in the readme-bench workflow where the suite is already cloned.
-      - name: Build the derived YAML tree-sitter grammar to wasm
-        run: |
-          cd tree-sitter/yaml
-          npx tree-sitter generate
-          npx tree-sitter build --wasm .
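The parallel conflict gate depends on waiting for each PID individually: a bare `wait` with no arguments returns 0 even when a background job failed, so it would let a conflicting grammar pass. A minimal standalone sketch of the same fan-out/collect pattern follows; `run_parallel` and `fake_generate` are hypothetical names, and `fake_generate` merely stands in for `( cd "tree-sitter/$g" && npx tree-sitter generate )`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for one `tree-sitter generate` run: sleeps briefly,
# then fails (with a message on stderr) only for the item named "bad".
fake_generate() {
  sleep 0.2
  if [ "$1" = bad ]; then echo "conflict in $1" >&2; return 1; fi
  echo "generated $1"
}

# Start one background job per item, then wait on each PID in turn so every
# job's exit status is collected. Logs are printed only for failed jobs.
#   $1 = command to run per item (receives the item as its only argument)
#   remaining args = items
# Returns 0 if all jobs succeeded, 1 otherwise.
run_parallel() {
  local cmd="$1"; shift
  local items=("$@") pids=() logdir fail=0 i
  logdir="$(mktemp -d)"
  for i in "${!items[@]}"; do
    "$cmd" "${items[$i]}" >"$logdir/$i.log" 2>&1 &
    pids+=($!)
  done
  for i in "${!items[@]}"; do
    if wait "${pids[$i]}"; then
      echo "ok ${items[$i]}"
    else
      echo "FAIL ${items[$i]}"; cat "$logdir/$i.log"; fail=1
    fi
  done
  rm -rf "$logdir"
  return "$fail"
}
```

For example, `run_parallel fake_generate typescript html yaml` prints `ok typescript`, `ok html`, `ok yaml` and returns 0. Results come out in input order because the second loop waits on the PIDs sequentially, yet, given enough cores, the wall time is still roughly that of the slowest job, since every job was started before the first `wait`.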
