11# PPL Bugfix Harness
22
3- ---
4-
5- ## Phase 0: Triage & Classification
3+ ## Phase 0: Triage
64
7- ### 0.0 Load Issue Context
5+ ### 0.1 Load & Reproduce
86
97``` bash
108gh issue view < issue_number> --repo opensearch-project/sql
119```
1210
13- Read the issue title, description, and any reproduction steps before proceeding.
14-
15- ### 0.1 Reproduce the Bug
16-
17- ``` bash
18- # Write a failing integration test, or run an existing one
19- ./gradlew :integ-test:integTest -Dtests.class=" *<TestClass>" -Dtests.method=" <testMethod>"
20- ```
11+ Write a failing test or run an existing one to reproduce the bug on ` main ` .
2112
22- ** If the bug does not reproduce on latest ` main ` ** — meaning the test scenario runs but produces correct results instead of the reported error — the bug is already fixed. This is distinct from "can't set up the scenario" (which is an infra issue).
13+ If the bug ** does not reproduce** ( correct results, not infra failure):
2314
2415| Finding | Action |
2516| ---------| --------|
26- | Already fixed by another commit | ` gh issue comment ` with fixing commit/PR, then ` gh issue close ` |
27- | Only on older version | ` gh issue comment ` with version where it's fixed, then ` gh issue close ` |
28- | Intermittent / environment-specific | Label ` flaky ` or ` needs-info ` , do NOT close |
29- | Insufficient info to reproduce | Comment asking for repro steps, label ` needs-info ` |
17+ | Already fixed | ` gh issue comment ` + ` gh issue close ` |
18+ | Older version only | ` gh issue comment ` + ` gh issue close ` |
19+ | Intermittent | Label ` flaky ` or ` needs-info ` , do NOT close |
20+ | Can't reproduce | Comment asking for repro steps, label ` needs-info ` |
3021
31- ** HARD STOP** : Do NOT proceed to Phase 1, 2, or 3. Do NOT write tests. Do NOT create a PR. Report back with the finding and the action taken (comment/close). Your job is done .
22+ ** HARD STOP** — do not proceed. Report back.
3223
33- ### 0.2 Classify and Route
24+ ### 0.2 Classify
3425
35- Identify the bug layer, then use the routing table to determine the fix path and required tests:
26+ Identify the bug layer (Grammar, AST/Functions, Type System, Optimizer, Execution, DI/Resource) and record it. Consult ` .claude/harness/ppl-bugfix-reference.md ` for fix- path-specific guidance if needed.
3627
37- | Layer | Typical Symptoms | Fix Path | Required Tests |
38- | -------| -----------------| ----------| ---------------|
39- | ** Grammar/Parser** | ` SyntaxCheckException ` , unrecognized syntax | Path A | AstBuilderTest |
40- | ** AST/Functions** | Parses OK but AST wrong, function output wrong | Path B | CalcitePPL\* Test + IT + YAML |
41- | ** Semantic Analysis** | ` SemanticCheckException ` , type mismatch | Path C | Dedicated UT + CalcitePPLIT + YAML |
42- | ** Type System** | Field type lost, implicit conversion errors | Path C | Dedicated UT + CalcitePPLIT + YAML |
43- | ** Optimizer** | Plan bloat, predicate pushdown failure | Path D | CalcitePPL\* Test + CalcitePPLIT + NoPushdownIT + YAML |
44- | ** Execution** | Wrong results, OOM, memory leaks | Path E | IT + YAML |
45- | ** DI / Resource** | NPE, extensions not loaded, long-running OOM | Path E | IT + YAML |
28+ ### 0.3 Guardrails
4629
47- Record before proceeding:
30+ Stop and report back if:
31+ - Root cause unclear after reading 15+ source files
32+ - Fix breaks 5+ unrelated tests
33+ - Same build error 3 times in a row
4834
49- ```
50- Bug Layer: <from table above>
51- Fix Path: <A / B / C / D / E>
52- Issue: #XXXX
53- ```
54-
55- ### 0.3 Execution Flow
56-
57- Phases interleave TDD-style — write a failing test first, then fix, then complete test coverage:
35+ ### 0.4 Execution Flow
5836
5937```
60- Phase 0: Triage → Classify → Route
61- → Phase 2 (partial): Write FAILING test reproducing the bug
62- → Phase 1: Implement fix via routed Path
63- → Phase 2 (remaining): Happy path, edge cases, YAML REST test
64- → Phase 3: Verify → Commit → PR → Decision Log
38+ Triage → Write FAILING test → Fix → Remaining tests → Verify → Commit → PR → Decision Log → Completion Gate
6539```
6640
6741---
6842
69- ## Phase 1: Fix Implementation
70-
71- ### Path A — Grammar / Parser
72-
73- 1 . ** Update grammar files** (must stay in sync):
74- - ` language-grammar/src/main/antlr4/OpenSearchPPLParser.g4 ` (primary)
75- - ` ppl/src/main/antlr/OpenSearchPPLParser.g4 `
76- - ` async-query-core/src/main/antlr/OpenSearchPPLParser.g4 ` (if applicable)
77- 2 . ** Regenerate** : ` ./gradlew generateGrammarSource `
78- 3 . ** Update AstBuilder** : ` ppl/.../parser/AstBuilder.java ` — modify ` visit*() ` to match new rule
79- 4 . ** Test** : ` AstBuilderTest ` using ` assertEqual(pplQuery, expectedAstNode) ` pattern
80-
81- > Ref: ` 734394d ` — fixed ` renameClasue ` → ` renameClause ` across 3 grammars + AstBuilder
82-
83- ### Path B — AST / Function Implementation
84-
85- 1 . ** Locate** : AST nodes in ` core/.../ast/tree/ ` , functions in ` core/.../expression/function/ ` or ` PPLBuiltinOperators `
86- 2 . ** Fix** : Watch Visitor pattern — changes may need syncing to ` AbstractNodeVisitor ` , ` Analyzer ` , ` CalciteRelNodeVisitor ` , ` PPLQueryDataAnonymizer `
87- 3 . ** Test** : ` verifyLogical() ` , ` verifyPPLToSparkSQL() ` , ` verifyResult() `
88-
89- > Ref: ` 26674f9 ` — rex nested capture group fix, ordinal index → named group extraction
90-
91- ### Path C — Type System / Semantic Analysis
92-
93- 1 . ** Locate** : ` OpenSearchTypeFactory.java ` (Calcite type factory), ` Analyzer.java ` / ` ExpressionAnalyzer.java `
94- 2 . ** Fix** : Preserve nullable semantics when overriding Calcite methods; protect UDT from ` leastRestrictive() ` downgrade
95- 3 . ** Test (critical)** : Cover type preservation, nullable propagation, fallback to parent, mixed types — every edge case
96-
97- > Ref: ` ada2e34 ` — UNION lost timestamp UDT, fixed ` leastRestrictive() ` , added 8 UTs + 4 ITs
98-
99- ### Path D — Optimizer / Predicate Pushdown
100-
101- 1 . ** Locate** : ` PredicateAnalyzer.java ` , ` LogicalPlanOptimizer ` , ` QueryService.java `
102- 2 . ** Fix** : Watch ` nullAs ` semantics (TRUE/FALSE/UNKNOWN); for plan bloat consider Calcite rules like ` FilterMergeRule `
103- 3 . ** Verify** : ` EXPLAIN ` output comparison + integration test result correctness
43+ ## Phase 1: Fix
10444
105- > Ref: ` b4df010 ` — ` isnotnull() ` not pushed down with multiple ` != ` ; ` e045d15 ` — multi-filter OOM, inserted ` FilterMergeRule `
106-
107- ### Path E — Execution / Resource Management
108-
109- 1 . ** Locate** : DQE operators in ` dqe/ ` , ` OpenSearchExecutionEngine.java ` , ` SQLPlugin.java ` , ` OpenSearchPluginModule.java `
110- 2 . ** Common patterns** :
111-
112- | Problem | Fix | Example |
113- | ---------| -----| ---------|
114- | Cache key collision | ` IndexReader.CacheHelper.getKey() ` | ` 97d5d26 ` |
115- | Memory leak (no eviction) | Close listener + upper bound | ` 97d5d26 ` |
116- | Unbounded growth | ` MAX_CAPACITY ` check, throw with user guidance | ` f024b4f ` |
117- | Non-singleton repeated registration | ` @Singleton ` ; ` put ` instead of ` compute/append ` | ` 90393bf ` |
118- | DI not injected | Holder class → Guice → constructor injection | ` f6be830 ` |
119-
120- > Ref: ` f024b4f ` — ` LongOpenHashSet ` capacity 1024→8 (8KB→64B per group), 8M cap on 5 HashMap variants
45+ Find and fix the root cause. Consult ` .claude/harness/ppl-bugfix-reference.md ` for path-specific patterns and examples.
12146
12247---
12348
124- ## Phase 2: Writing Tests
125-
126- ### 2.1 Test Templates
127-
128- ** Unit test** (extend ` CalcitePPLAbstractTest ` ):
129- ``` java
130- public class CalcitePPLYourFixTest extends CalcitePPLAbstractTest {
131- public CalcitePPLYourFixTest () {
132- super (CalciteAssert . SchemaSpec . SCOTT_WITH_TEMPORAL );
133- }
134-
135- @Before
136- public void init () {
137- doReturn(true ). when(settings)
138- .getSettingValue(Settings . Key . CALCITE_ENGINE_ENABLED );
139- }
140-
141- @Test
142- public void testBugScenario () {
143- verifyLogical(" source=EMP | where SAL > 1000" ,
144- " LogicalFilter(condition=[>($5, 1000)])\n "
145- + " LogicalTableScan(table=[[scott, EMP]])\n " );
146- }
147- }
148- ```
149-
150- ** Integration test** (extend ` CalcitePPLIT ` ):
151- ``` java
152- public class CalcitePPLYourFixIT extends CalcitePPLIT {
153- @Override
154- public void init () throws IOException {
155- super . init();
156- enableCalcite();
157- }
158-
159- @Test
160- public void testBugFixEndToEnd () throws IOException {
161- JSONObject result = executeQuery(" source=<index> | <your PPL>" );
162- verifySchema(result, schema(" field" , " alias" , " type" ));
163- verifyDataRows(result, rows(" expected_value_1" ), rows(" expected_value_2" ));
164- }
165- }
166- ```
167-
168- ** YAML REST test** — place at ` integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/<ISSUE>.yml ` , auto-discovered by ` RestHandlerClientYamlTestSuiteIT ` :
169- ``` yaml
170- setup :
171- - do :
172- indices.create :
173- index : test_issue_<ISSUE>
174- body :
175- settings : { number_of_shards: 1, number_of_replicas: 0 }
176- mappings : { properties: { <field>: { type: <type> } } }
177- - do :
178- query.settings :
179- body : { transient: { plugins.calcite.enabled: true } }
180- ---
181- teardown :
182- - do :
183- query.settings :
184- body : { transient: { plugins.calcite.enabled: false } }
185- ---
186- " <Bug description for issue #ISSUE> " :
187- - skip : { features: [headers, allowed_warnings] }
188- - do :
189- bulk : { index: test_issue_<ISSUE>, refresh: true, body: ['{"index": {}}', '{"<field>": "<value>"}'] }
190- - do :
191- headers : { Content-Type: 'application/json' }
192- ppl : { body: { query: "source=test_issue_<ISSUE> | <your PPL>" } }
193- - match : { total: <expected> }
194- - length : { datarows: <expected> }
195- ` ` `
49+ ## Phase 2: Tests
19650
197- ### 2.2 Test Checklist
51+ Consult ` .claude/harness/ppl-bugfix-reference.md ` for test templates.
19852
199- - [ ] Failing test reproducing the original bug (write FIRST, before fixing)
200- - [ ] Happy path after fix
201- - [ ] Edge cases (null, empty, extreme volumes)
202- - [ ] YAML REST test (named by issue number)
203- - [ ] Optimizer bugs: both pushdown enabled and disabled
204- - [ ] Type system bugs: nullable / non-nullable combinations
205- - [ ] New AST nodes: update ` PPLQueryDataAnonymizerTest`
53+ Required deliverables:
54+ - Failing test reproducing the bug (written BEFORE the fix)
55+ - Unit tests covering happy path and edge cases
56+ - Integration test (` *IT.java ` extending ` CalcitePPLIT ` )
57+ - YAML REST test at ` integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/<ISSUE>.yml `
20658
20759---
20860
209- # # Phase 3: Verification & Submission
61+ ## Phase 3: Verify & Submit
21062
211- # ## 3.1 Local Verification
63+ ### 3.1 Verify
21264
21365``` bash
214- ./gradlew spotlessApply # 1. Format
215- ./gradlew :<module>:test --tests "<TestClass>" # 2. Affected module UT
216- ./gradlew test # 3. Full UT regression
217- ./gradlew :integ-test:integTest -Dtests.class="*<YourIT>" # 4. New IT
218- ./gradlew :integ-test:integTest # 5. Full IT regression
219- ./gradlew :integ-test:yamlRestTest # 6. YAML REST tests
220- # If grammar modified:
221- ./gradlew generateGrammarSource && ./gradlew :ppl:test # 7. Parser tests
66+ ./gradlew spotlessApply
67+ ./gradlew :< module> :test --tests " <TestClass>"
68+ ./gradlew test
69+ ./gradlew :integ-test:integTest -Dtests.class=" *<YourIT>"
22270```
22371
224- # ## 3.2 Commit & Push
72+ Run ` ./gradlew :integ-test:yamlRestTest ` if YAML tests were added. Run ` ./gradlew generateGrammarSource && ./gradlew :ppl:test ` if grammar was modified.
73+
74+ ### 3.2 Commit & PR
22575
22676``` bash
227- # Commit with DCO sign-off — do NOT add Co-Authored-By lines
22877git add < changed_files>
22978git commit -s -m " [BugFix] Fix <description> (#<issue_number>)"
230-
231- # Sync with main (merge, not rebase — PRs use squash-merge)
23279git fetch origin && git merge origin/main
233- # Always re-run tests after merge — even a trivial merge can introduce regressions
23480./gradlew test && ./gradlew :integ-test:integTest -Dtests.class=" *<YourIT>"
23581
236- # Resolve fork remote — find or add it
237- # 1. Check if a fork remote already exists (not opensearch-project/sql)
238- git remote -v
239- # 2. If only origin (upstream) exists, add the fork:
82+ # Resolve fork remote (check git remote -v; add if missing)
24083git remote add fork https://github.com/< fork_owner> /sql.git
241- # Use the git user name to infer the fork owner, or fall back to "qianheng-aws"
242-
243- # Push
24484git push -u fork < branch_name>
24585```
24686
247- # ## 3.3 Create Draft PR
87+ Do NOT add Co-Authored-By lines. Use the git user name to infer the fork owner, or fall back to "qianheng-aws".
24888
24989``` bash
250- gh pr create --draft --repo opensearch-project/sql --title "[BugFix] Fix <description> (#<issue_number>)" --body "$(cat <<'EOF'
90+ gh pr create --draft --repo opensearch-project/sql \
91+ --title " [BugFix] Fix <description> (#<issue_number>)" \
92+ --body " $( cat << 'EOF '
25193### Description
25294<Brief description of fix and root cause>
25395
@@ -256,19 +98,17 @@ Resolves #<issue_number>
25698
25799### Check List
258100- [x] New functionality includes testing
259- - [x] Javadoc added for new public API
260101- [x] Commits signed per DCO (`-s`)
261102- [x] `spotlessCheck` passed
262103- [x] Unit tests passed
263104- [x] Integration tests passed
264- - [ ] Grammar changes : all three `.g4` files synced (if applicable)
265105EOF
266106) "
267107```
268108
269- ### 3.4 Persist Decision Context
109+ ### 3.3 Decision Log
270110
271- Before the subagent completes, persist the * why * behind key decisions — the PR diff only shows the * what * . The ** PR comment is the single source of truth ** — it survives across sessions, GA runs, and is visible to both human reviewers and follow-up agents.
111+ Post as a PR comment:
272112
273113``` bash
274114gh pr comment < pr_number> --body " $( cat << 'EOF '
@@ -284,48 +124,28 @@ EOF
284124
285125---
286126
287- ## Phase 4: Retrospective
127+ ## Completion Gate
288128
289- After each bugfix, check if the harness itself needs updating :
129+ Do NOT report "done" until every item below is checked. List each in your final report :
290130
291- - ** Template wrong** (API name, assertion field)? → Fix in Phase 2 templates
292- - ** New bug pattern** not covered? → Add fix path in Phase 1 or symptom in Quick Reference
293- - ** Verification gap** caused rework? → Add step to 3.1 or 2.2 checklist
294- - ** Representative fix** ? → Add row to Case Index below
131+ - [ ] ** Unit tests** : New test class or methods
132+ - [ ] ** Integration test** : New ` *IT.java ` test
133+ - [ ] ** YAML REST test** : ` issues/<ISSUE>.yml `
134+ - [ ] ** spotlessApply** : Ran successfully
135+ - [ ] ** Tests pass** : Affected modules
136+ - [ ] ** Commit** : DCO sign-off, ` [BugFix] ` prefix, no Co-Authored-By
137+ - [ ] ** Draft PR** : ` --draft ` , body contains ` Resolves #<issue> `
138+ - [ ] ** Decision Log** : PR comment posted
295139
296- Include harness improvements in the same PR as the bugfix — they share the same review context .
140+ If any item is blocked, report which and why .
297141
298142---
299143
300- ## Quick Reference: Symptom → Fix Path
301-
302- ```
303- SyntaxCheckException / unrecognized syntax → Path A: Grammar/Parser
304- SemanticCheckException / type mismatch → Path C: Type System / Analysis
305- Field type wrong (timestamp→string) → Path C: check leastRestrictive / coercion
306- EXPLAIN shows predicate not pushed down → Path D: Optimizer / Pushdown
307- Multi-condition query: missing/extra rows → Path D: PredicateAnalyzer nullAs handling
308- OOM / memory growth over time → Path E: singletons, cache eviction, bounds
309- NPE in Transport layer → Path E: DI / Guice injection chain
310- "node must be boolean/number, found XXX" → Path B: OpenSearchJsonContent parse*Value
311- Regex/function extraction offset → Path B: ordinal vs named references
312- ```
144+ ## Phase 4: Retrospective
313145
314- ---
146+ - [ ] Symptom in Quick Reference? Add if missing.
147+ - [ ] Classification correct? Fix routing if misleading.
148+ - [ ] Test template worked as-is? Fix if broken.
149+ - [ ] New pattern? Add to Case Index.
315150
316- ## Appendix: Case Index
317-
318- | Commit | Bug | Layer | Tests |
319- | --------| -----| -------| -------|
320- | ` ada2e34 ` | UNION loses UDT type | Type System | 8 UT + 4 IT |
321- | ` 26674f9 ` | rex capture group index shift | AST/Functions | Multiple UTs |
322- | ` b4df010 ` | isnotnull not pushed down with != | Optimizer | 2 UT + IT |
323- | ` e045d15 ` | Multiple filters OOM | Optimizer | 26 output updates |
324- | ` f024b4f ` | High-cardinality GROUP BY OOM | Execution | Benchmark |
325- | ` 97d5d26 ` | OrdinalMap cache collision + leak | Execution | — |
326- | ` 90393bf ` | Non-singleton ExecutionEngine leak | Resource | — |
327- | ` f6be830 ` | Transport extensions not injected | DI | — |
328- | ` 734394d ` | Grammar rule typo | Grammar | — |
329- | ` 246ed0d ` | Float precision flaky test | Test Infra | — |
330- | ` d56b8fa ` | Wildcard index type conflict | Value Parsing | 3 UT + 1 IT + 1 YAML |
331- | ` 5a78b78 ` | Boolean coercion from numeric in wildcard queries | Value Parsing | 3 UT + 1 IT + 1 YAML |
151+ Include harness improvements in the same PR.
0 commit comments