Skip to content

Commit feb932c

Browse files
committed
Merge origin/master into feat/1263-freeform-intermediate-match
# Conflicts: # CHANGELOG.md
2 parents 0c65e20 + d12b440 commit feb932c

3 files changed

Lines changed: 256 additions & 2 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
3131

3232
### Internal
3333
- **GFQL / Cypher lowering — scoped failfast for free-form intermediate MATCH after WITH (#1263, #999 partial)**: Replaced the generic `Cypher MATCH after WITH currently requires the trailing MATCH to start from the same carried node alias` error at `_compile_bounded_reentry_query` with a `#1263`-scoped error that names the LDBC SNB IC3 endpoint and points at the design doc for closure. The scoped wording calls out the gap shape — trailing MATCH whose first alias is not in the prefix `WITH`'s carried set — so users can identify whether their query falls into this lane or one of the other open IC3 sub-cases. Two regression-lock tests added: `test_string_cypher_failfast_rejects_simple_freeform_intermediate_reentry_match` (single-alias prefix; isolates the free-form gate without the slice 4.3a/b bare-ref interaction) and `test_string_cypher_failfast_rejects_intermediate_reentry_match_with_no_carried_source` retargeted from the prior #1256 wording. Closure of the gap (admitting the trailing MATCH as a fresh seed pattern that cross-joins with the carried row table at runtime) remains follow-up under #1263.
34+
- **AI skills / workflow**: Updated the local `plan` skill template to reflect current pygraphistry validation commands, and added a new `review` skill with phased multi-wave review flow, adversarial confirmation, and explicit GPU validation guidance (local `docker/test-gpu-local.sh` or `dgx-spark` evidence path when no local GPU is available).
3435
- **GFQL / Cypher lowering — secondary whole-row carry survives chained reentry boundaries (#1256, #989, #999 partial)**: Two extensions to `_demote_secondary_whole_row_aliases` in `graphistry/compute/gfql/cypher/lowering.py` so multi-alias carry survives a chained reentry compile. (1) **Forwarding-item drop** — bare-identifier projection items in downstream `WITH` stages whose name is a secondary alias are dropped at compile time (the same intent as slice 4.3c, integrated into the post-#1071 active rewrite path). Without this, a forwarding pattern like `WITH a, x, friend, ...` triggered the bare-ref failfast even though the carry already lives as a hidden scalar on the reentry-source's row table. (2) **Hidden-column forwarding** — for every `(secondary_alias, prop)` reference collected from trailing clauses, the synthesized `__cypher_reentry_<S>_<X>__` hidden alias is appended as a bare passthrough item to every downstream `WITH` stage so each recursive `compile_cypher_query` call sees it as a scalar carry. Without this, the inner compile failed alias resolution on the rewritten hidden identifier (`Unknown Cypher alias '__cypher_reentry_x_id__' in RETURN clause`) once a chained reentry stage narrowed the projection scope. New positive tests cover (a) chained reentry where the trailing MATCH continues to use the same primary alias, (b) chained reentry where the source is rebound between boundaries (`MATCH (a)-[:R]->(friend) ... MATCH (friend)-[:S]->(c)`), and (c) multi-alias `DISTINCT` forwarding through a single boundary. New failfast test pins the remaining gap — a trailing MATCH that does not start from any carried alias (free-form intermediate MATCH, the LDBC SNB IC3 prefix shape) — to the existing scoped error so future closure of that lane is regression-locked. Closes the chained-reentry portion of #1256; the free-form intermediate-MATCH case remains open follow-up.
3536
- **GFQL / Cypher lowering — `ReentryPlan` IR + multi-alias whole-row carry slices (#989, #1026, #999 partial)**: Introduces an explicit `ReentryPlan` + `CarriedAlias` dataclass (`graphistry/compute/gfql/cypher/reentry_plan.py`) as the compile-time contract between a prefix `WITH` stage and the trailing `MATCH`, replacing the implicit handshake spread across tuple returns from `_bounded_reentry_carry_columns`, the `scalar_reentry_alias` / `scalar_reentry_columns` fields on `CompiledCypherExecutionExtras`, and runtime contract re-extraction in `_compiled_query_reentry_contract` (#987 step 1). Plan is exposed via `compiled_query.reentry_plan` and threaded through `_map_terminal_reentry_query` + `_attach_graph_context`. Builds three additional admit slices on top of the #1071 lift: (1) **slice 4.3a** lifts the residual single-whole-row gate at the compile site (`lowering.py`) and runtime contract (`gfql_unified.py`) so `WITH a, x` admits whenever only the trailing-MATCH source alias is referenced downstream — `ReentryPlan.aliases` records all whole-row aliases as `CarriedAlias` entries, with downstream non-source-alias bare references emitting an actionable failfast pointing to #989; (2) **slice 4.3b** adds a compile-time prefix rewrite (`_rewrite_multi_whole_row_prefix`) that turns `WITH a, x` into `WITH a, x.id AS __carry_x__id__` for every property of `x` referenced in trailing clauses (collected via `_collect_non_source_alias_property_refs` walking `WhereClause.expr_tree`, `ProjectionStage.where`, `ReturnItem`, `OrderItem`), and AST-rewrites trailing `<non_source>.<prop>` references to property access on the reentry-alias's hidden column — closes the multi-alias case of `#1026` regression-lock (`MATCH (a), (x) WITH a, x OPTIONAL MATCH (a)-->(b) RETURN x.id, b.id` now executes correctly with left-outer-join semantics, previously raised `GFQLValidationError`); (3) **slice 4.3c** drops bare carried-alias items at compile time when downstream `WITH a, x, y, collect(...)` re-projects them, so the bare-ref failfast does not false-positive on forwarding patterns. Adds 5 new positive/structure tests + 2 failfast-scope tests, retargets one prior `pytest.raises(GFQLValidationError)` regression-lock to a positive row assertion, and adds the multi-alias `OPTIONAL MATCH` regression test `test_issue_1026_multi_alias_with_optional_match_carries_secondary_property`. Cross-reentry-boundary forwarding (carry survival across `MATCH (a)-[:KNOWS]-(friend)`, the IC3 LDBC SNB shape) remains follow-up work tracked as #999 / slice 4.3d (#989, #1026, #999, #987).
3637
- **GFQL / Cypher lowering — multi-alias carry through `WITH` before MATCH re-entry (#1071)**: Lifted the residual constraint that `WITH` before a MATCH re-entry must project exactly one whole-row node alias. The local Cypher compiler now accepts patterns like `MATCH (p)-[:KNOWS]-(friend) WITH p, friend MATCH (friend)-[:IS_LOCATED_IN]->(c) RETURN p.firstName, c.name` (LDBC SNB IC1 shape). Implementation pre-rewrites the prefix `WITH` in `_compile_bounded_reentry_query`: the trailing-MATCH primary alias is preserved as the sole whole-row carry; secondary aliases are demoted to scalar property carries via synthesized `S.X AS __cypher_reentry_<S>_<X>__` items, with downstream `S.X` references rewritten to bare hidden identifiers that compose with the existing single-alias carried-scalar machinery (#1047 / #1068). Returning a secondary alias as a whole-row entity (`RETURN s`) and re-binding a secondary alias as a node variable in the trailing MATCH remain unsupported with precise errors. Two prior failfast tests retargeted to assert correct multi-alias behavior; new tests cover IC1-shape, three-alias carry, and the secondary-whole-row-return rejection.

agents/skills/plan/SKILL.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,7 @@ STOP → reload plan → find last ✅ step → continue from there
7070

7171
```markdown
7272
# [Task] Plan
73-
🔴 COLD START: reload skill first → `agents/skills/plan/SKILL.md`
73+
🔴 COLD START: reload skill first → `.agents/skills/plan/SKILL.md`
7474
File: `plans/[task]/plan.md` | Date: [DATE TZ] | Branch: [branch] | PR#[N][target] | Base: [branch @ SHA]
7575

7676
## Context (read-only)
@@ -90,8 +90,10 @@ Record ALL commands verbatim — exact flags, paths, env vars. No paraphrasing.
9090
Preferred:
9191
```bash
9292
cat plans/[task]/plan.md | head -200 # reload plan
93-
PYTHONPATH=. uv run --no-project --with lark --with pytest python -m pytest [test_path] -q
93+
python -m pytest -q [test_path] # targeted tests
94+
./bin/test-minimal-lite.sh # fast repo sentinel tests
9495
./bin/ruff.sh && ./bin/typecheck.sh # lint + types
96+
./bin/test.sh [optional_test_args] # broader test pass
9597
gh pr checks [PR] --watch # CI
9698
gh run view [RUN_ID] --json status,conclusion,jobs
9799
git log --oneline -10

agents/skills/review/SKILL.md

Lines changed: 251 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,251 @@
1+
---
2+
name: review
3+
description: |
4+
Structured PR review for pygraphistry. Input: PR number/branch (default current branch PR).
5+
Output: findings and convergence artifacts under plans/<task>/.
6+
Method: multi-wave, evidence-first review across spec, correctness, tests, security,
7+
code quality, DRY, concurrency, performance, architecture, operability, and conventions.
8+
---
9+
10+
# PR Review (pygraphistry)
11+
12+
## Invocation
13+
14+
```text
15+
/review [<PR-number-or-branch>] [mode=findings|pr-comments|both] [fixes=deferred|inline]
16+
```
17+
18+
Defaults:
19+
- Target: current branch PR (`gh pr view --json number,headRefName,baseRefName,url`)
20+
- `mode=findings`
21+
- `fixes=deferred`
22+
23+
`mode`:
24+
- `findings`: local artifacts only under `plans/<task>/`
25+
- `pr-comments`: draft locally, post only after explicit user confirmation in the same session
26+
- `both`: run findings then comment flow
27+
28+
`fixes`:
29+
- `deferred`: read-only review
30+
- `inline`: after each converged wave, apply confirmed `BLOCKER`/`IMPORTANT` fixes in separate commits
31+
32+
## Runtime Assumptions
33+
34+
- Run from repo root.
35+
- `gh` is authenticated (`gh auth status`).
36+
- Local branch reflects PR head (`origin/<base>...HEAD` matches intended review scope).
37+
- `plans/` is local working memory and normally gitignored.
38+
39+
## Plan-First Requirement
40+
41+
Always use the plan skill flow:
42+
43+
1. If `plans/<task>/plan.md` exists, reuse it and append a review section.
44+
2. Else reload `.agents/skills/plan/SKILL.md` and create `plans/<task>/plan.md`.
45+
3. Record PR metadata, `mode`, `fixes`, and timestamp.
46+
4. Reload plan before every step; update plan immediately after every step.
47+
48+
`<task>`: prefer `review-pr-<N>` or `<branch>-review`.
49+
50+
## Phase 0: Resolve Scope + Stack Context
51+
52+
1. Resolve PR context (number/title/url/head/base).
53+
2. Record stack context:
54+
55+
```bash
56+
gh pr view <PR> --json baseRefName,headRefName,title,body
57+
gh pr list --base <headRefName>
58+
```
59+
60+
3. If stacked, explicitly mark out-of-scope upstream/downstream work in `plan.md`.
61+
4. Set diff range reference: `origin/<base>...HEAD`.
62+
63+
## Phase 1: Research Criteria Before Findings
64+
65+
Create `plans/<task>/research/` with:
66+
- `context.md`
67+
- `policies.md`
68+
- `credentials.md`
69+
- `canvas-<dimension>.md` (only for dimensions that apply)
70+
71+
### 1a) Collect context + changed files
72+
73+
```bash
74+
gh pr view <PR> --json number,title,headRefName,baseRefName,url,body
75+
git fetch origin <baseRefName> <headRefName>
76+
git diff --name-only origin/<base>...HEAD
77+
git log --oneline origin/<base>..HEAD
78+
```
79+
80+
Record linked issue/spec refs from PR body and summarize PR intent.
81+
82+
### 1b) Discover applicable repo rules
83+
84+
Always inspect:
85+
- `AGENTS.md`, `DEVELOP.md`, `ARCHITECTURE.md`, `CONTRIBUTING.md`, `README.md`, `CHANGELOG.md`
86+
- `docs/source/**` relevant to changed areas
87+
- CI/workflow context under `.github/workflows/` when checks/publish behavior are touched
88+
- Tooling configs (`pyproject.toml`, `mypy.ini`, `pytest.ini`) and helper scripts in `bin/`
89+
90+
Walk up from each changed file to repo root and include nearby `.md` guidance.
91+
92+
### 1c) Credentials gate (always, early)
93+
94+
Run before wave analysis:
95+
96+
```bash
97+
git diff origin/<base>...HEAD -- '*.env*' 'custom.env*' 'docker-compose*.y*ml' '*.conf' '*.config.*' | \
98+
grep -iE '(api_key|secret|token|password|bearer|authorization|aws_|azure_|openai_|anthropic_).{0,4}=' || \
99+
echo "[clean] no obvious credential strings"
100+
101+
git diff origin/<base>...HEAD | \
102+
grep -oE '(sk-[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16}|Bearer\s+[A-Za-z0-9._-]{20,})' | head
103+
```
104+
105+
If any likely secret is found, raise `BLOCKER` and stop until surfaced.
106+
107+
## Phase 2: Multi-Wave Review Loop
108+
109+
Run waves until 2 consecutive waves show no significant advance.
110+
111+
Severity:
112+
- `BLOCKER`: merge must not proceed
113+
- `IMPORTANT`: should fix before merge
114+
- `SUGGESTION`: non-blocking improvement
115+
116+
Suggested folder layout:
117+
118+
```text
119+
plans/<task>/waves/wave-<N>/
120+
<dimension>/findings-<file-slug>.md
121+
<dimension>/report.md
122+
adversarial/<finding-id>.md
123+
adversarial/report.md
124+
wave-report.md
125+
```
126+
127+
### Wave gate (start of each wave)
128+
129+
1. Reload `.agents/skills/plan/SKILL.md`
130+
2. Reload `plans/<task>/plan.md`
131+
3. Confirm current diff SHA and wave number
132+
4. If prior wave had substantive findings, perform both:
133+
- targeted amplification pass on prior findings/fixes
134+
- clean-slate full pass on current diff
135+
136+
### Dimensions
137+
138+
Apply only relevant dimensions per PR:
139+
- Spec conformance
140+
- Correctness
141+
- Testing
142+
- Security
143+
- Code quality
144+
- DRY / reuse
145+
- Concurrency / parallelism
146+
- Performance
147+
- Architecture
148+
- Operability
149+
- Repo conventions
150+
151+
Guidance:
152+
- Keep dimensions independent (avoid blended "general review" prompts).
153+
- For file-heavy diffs, parallelize by `(dimension, file)` and aggregate.
154+
- Verify pre-existing patterns are not misreported as regressions:
155+
156+
```bash
157+
git show origin/<base>:<file>
158+
```
159+
160+
### Pygraphistry-specific review checks
161+
162+
- Test coverage should mirror changed areas in `graphistry/tests/**`.
163+
- Prefer behavioral tests over implementation-detail assertions.
164+
- Run focused validation before escalating severity when feasible:
165+
166+
```bash
167+
python -m pytest -q [targeted_test]
168+
./bin/ruff.sh [changed_path_or_pkg]
169+
./bin/typecheck.sh [changed_path_or_pkg]
170+
```
171+
172+
- For GPU-affecting PRs, require GPU-path validation evidence:
173+
- Local GPU path: `cd docker && ./test-gpu-local.sh [targeted_test_or_path]`
174+
- No local GPU available: run equivalent GPU validation on `dgx-spark` and record exact command + output artifact path in wave evidence.
175+
- If startup/runtime claims are made, verify entrypoints/scripts in `bin/` and workflow behavior.
176+
- For docs-only PRs, prioritize spec/documentation accuracy and navigability (toctree links, anchors, cross-refs).
177+
178+
### Per-file findings format
179+
180+
```markdown
181+
## <file>
182+
### Findings
183+
- [BLOCKER] <description> — <file>:<line>
184+
- [IMPORTANT] <description> — <file>:<line>
185+
- [SUGGESTION] <description> — <file>:<line>
186+
187+
### Evidence
188+
- <diff/code/test evidence with file:line references>
189+
```
190+
191+
### Adversarial pass (required)
192+
193+
For each finding, attempt disproof and write verdict:
194+
- `CONFIRMED`
195+
- `DOWNGRADED`
196+
- `REJECTED`
197+
198+
Only `CONFIRMED` and `DOWNGRADED` findings carry forward.
199+
200+
Each adversarial file should include:
201+
- original claim
202+
- disprove attempt
203+
- verdict
204+
- concrete proof (`file:line`, diff excerpt, or test output)
205+
206+
### Wave summary
207+
208+
Each wave writes `wave-report.md` with:
209+
- inputs (diff SHA/range, dimensions, files)
210+
- post-adversarial counts by severity
211+
- fixes applied (if `fixes=inline`)
212+
- finding resolution ledger (`FIXED`, `UNFIXED`, `DEFERRED`)
213+
- convergence signal
214+
215+
## Phase 3: Convergence Report
216+
217+
Write `plans/<task>/final-report.md` with:
218+
- summary (waves run, convergence status)
219+
- per-wave table (new findings + advance signal)
220+
- resolution matrix (status of non-rejected findings)
221+
- sections: `Blockers`, `Important`, `Suggestions`, `Rejected/False Positives`, `Methodology`
222+
223+
## Phase 4: Output Behavior
224+
225+
`findings` mode:
226+
- Keep full artifacts in `plans/<task>/`
227+
- Provide concise terminal summary with:
228+
- target + mode + fixes policy
229+
- waves + convergence status
230+
- per-wave counts
231+
- `FIXED` vs `UNFIXED`/`DEFERRED`
232+
- path to `final-report.md`
233+
234+
`pr-comments` mode:
235+
1. Draft local comments in `plans/<task>/comment-drafts/`
236+
2. Ask for explicit confirmation
237+
3. Post inline + top-level comments with `gh`
238+
239+
`both` mode:
240+
- Complete findings artifacts first, then comment flow.
241+
242+
## Guardrails
243+
244+
- `fixes=deferred`: read-only; do not edit source files.
245+
- `fixes=inline`: edit only files in PR diff and only for confirmed `BLOCKER`/`IMPORTANT` findings.
246+
- Never skip lint/tests when applying inline fixes.
247+
- Never force-push from review flow.
248+
- Never post PR comments without explicit user confirmation.
249+
- Always run credentials scan in Phase 1 before waves.
250+
- Prefer exact file/line evidence over broad statements.
251+
- If uncertain whether issue is new, compare against `origin/<base>` before filing.

0 commit comments

Comments
 (0)