| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +OpenSearch SQL plugin — enables SQL and PPL (Piped Processing Language) queries against OpenSearch. This is a multi-module Gradle project (Java 21) that functions as an OpenSearch plugin. |
| 8 | + |
| 9 | +## Build Commands |
| 10 | + |
| 11 | +```bash |
| 12 | +# Full build (compiles, tests, checks) |
| 13 | +./gradlew build |
| 14 | + |
| 15 | +# Fast build (skip integration tests) |
| 16 | +./gradlew build -x integTest |
| 17 | + |
| 18 | +# Build specific module |
| 19 | +./gradlew :core:build |
| 20 | +./gradlew :sql:build |
| 21 | +./gradlew :ppl:build |
| 22 | + |
| 23 | +# Run unit tests only |
| 24 | +./gradlew test |
| 25 | + |
| 26 | +# Run a single unit test class |
| 27 | +./gradlew :core:test --tests "org.opensearch.sql.analysis.AnalyzerTest" |
| 28 | + |
| 29 | +# Run integration tests |
| 30 | +./gradlew :integ-test:integTest |
| 31 | + |
| 32 | +# Run integration test classes whose names match a pattern |
| 33 | +./gradlew :integ-test:integTest -Dtests.class="*QueryIT" |
| 34 | + |
| 35 | +# Skip Prometheus if unavailable |
| 36 | +./gradlew :integ-test:integTest -DignorePrometheus |
| 37 | + |
| 38 | +# Code formatting |
| 39 | +./gradlew spotlessCheck # Check |
| 40 | +./gradlew spotlessApply # Auto-fix |
| 41 | + |
| 42 | +# Regenerate ANTLR parsers from grammar files |
| 43 | +./gradlew generateGrammarSource |
| 44 | + |
| 45 | +# Run plugin locally with OpenSearch |
| 46 | +./gradlew :opensearch-sql-plugin:run |
| 47 | +./gradlew :opensearch-sql-plugin:run -DdebugJVM # With remote debug on port 5005 |
| 48 | + |
| 49 | +# Run doctests |
| 50 | +./gradlew :doctest:doctest |
| 51 | +./gradlew :doctest:doctest -Pdocs=search # Single file |
| 52 | +``` |
| 53 | + |
| 54 | +## Code Style |
| 55 | + |
| 56 | +- **Google Java Format** enforced via Spotless (2-space indent, 100 char line limit) |
| 57 | +- **Lombok** is used throughout — `@Getter`, `@Builder`, `@RequiredArgsConstructor`, etc. |
| 58 | +- **License header** required on all Java files (Apache 2.0). Missing headers fail the build. |
| 59 | +- Pre-commit hooks run `spotlessApply` automatically |
| 60 | + |
| 61 | +## Architecture |
| 62 | + |
| 63 | +### Query Pipeline |
| 64 | + |
| 65 | +``` |
| 66 | +User Query (SQL/PPL) |
| 67 | + → Parsing (ANTLR) — produces parse tree |
| 68 | + → AST Construction (AstBuilder visitor) — produces UnresolvedPlan |
| 69 | + → Semantic Analysis (Analyzer) — resolves symbols/types → LogicalPlan |
| 70 | + → Planning (Planner + LogicalPlanOptimizer) — produces PhysicalPlan |
| 71 | + → Execution (ExecutionEngine) — streams ExprValue results |
| 72 | + → Response Formatting (ResponseFormatter — JSON/CSV/JDBC) |
| 73 | +``` |
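
The staged flow above can be pictured as a chain of functions, each consuming the previous stage's output. The following is a minimal sketch only: every type and stage in it (`ParseTree`, `UnresolvedPlan`, `PARSE`, and so on) is a hypothetical stand-in invented for illustration, not the plugin's real classes, and the "query language" is a toy `source = <table>` string.

```java
import java.util.List;
import java.util.function.Function;

/** Toy stand-ins for the pipeline stages; none of these are the plugin's real classes. */
final class PipelineSketch {
  record ParseTree(String text) {}
  record UnresolvedPlan(String table) {}
  record LogicalPlan(String table) {}
  record PhysicalPlan(String table) {}

  // Parsing: the real plugin uses ANTLR; here we only trim whitespace.
  static final Function<String, ParseTree> PARSE = q -> new ParseTree(q.trim());

  // AST construction: strip the leading "source =" to get the table name.
  static final Function<ParseTree, UnresolvedPlan> BUILD_AST =
      t -> new UnresolvedPlan(t.text().replaceFirst("(?i)^source\\s*=\\s*", ""));

  // Semantic analysis: reject plans that cannot be resolved.
  static final Function<UnresolvedPlan, LogicalPlan> ANALYZE = u -> {
    if (u.table().isEmpty()) {
      throw new IllegalArgumentException("unknown table");
    }
    return new LogicalPlan(u.table());
  };

  // Planning: a one-to-one mapping in this toy version.
  static final Function<LogicalPlan, PhysicalPlan> PLAN = l -> new PhysicalPlan(l.table());

  // Execution: produce two fake rows for the table.
  static final Function<PhysicalPlan, List<String>> EXECUTE =
      p -> List.of(p.table() + "-row-1", p.table() + "-row-2");

  // Response formatting: CSV-like join.
  static final Function<List<String>, String> FORMAT = rows -> String.join(",", rows);

  static String run(String query) {
    return PARSE.andThen(BUILD_AST).andThen(ANALYZE).andThen(PLAN)
        .andThen(EXECUTE).andThen(FORMAT).apply(query);
  }

  public static void main(String[] args) {
    System.out.println(run("source = accounts")); // prints accounts-row-1,accounts-row-2
  }
}
```

Keeping each stage's input and output types distinct means a stage cannot be skipped or reordered without a compile error, which is roughly the benefit of the separate `UnresolvedPlan` / `LogicalPlan` / `PhysicalPlan` hierarchy.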
| 74 | + |
| 75 | +### Module Dependency Graph |
| 76 | + |
| 77 | +``` |
| 78 | +plugin (OpenSearch plugin entry point, Guice DI wiring) |
| 79 | + ├── sql — SQL parsing (ANTLR → AST via SQLSyntaxParser/AstBuilder) |
| 80 | + ├── ppl — PPL parsing (ANTLR → AST via PPLSyntaxParser/AstBuilder) |
| 81 | + ├── core — Central module: Analyzer, Planner, ExecutionEngine interfaces, |
| 82 | + │ AST/LogicalPlan/PhysicalPlan node types, expression system, type system |
| 83 | + ├── opensearch — OpenSearch storage engine, execution engine, client |
| 84 | + ├── protocol — Response formatters (JSON, CSV, JDBC, YAML) |
| 85 | + ├── common — Shared settings and utilities |
| 86 | + ├── legacy — V1 SQL engine (backward compatibility fallback) |
| 87 | + ├── datasources — Multi-datasource support (Glue, Security Lake, Prometheus) |
| 88 | + ├── async-query / async-query-core — Spark-based async query execution |
| 89 | + ├── direct-query / direct-query-core — Direct external datasource queries |
| 90 | + └── language-grammar — Centralized ANTLR .g4 grammar files |
| 91 | +``` |
| 92 | + |
| 93 | +`core` has no dependency on other modules. `sql` and `ppl` depend on `core` and `language-grammar`. `opensearch` implements `core` interfaces. |
| 94 | + |
| 95 | +### Key Source Locations |
| 96 | + |
| 97 | +| Area | Key Files | |
| 98 | +|------|-----------| |
| 99 | +| Plugin entry | `plugin/.../SQLPlugin.java`, `plugin/.../OpenSearchPluginModule.java` | |
| 100 | +| SQL parsing | `sql/.../sql/parser/AstBuilder.java`, `sql/.../SQLService.java` | |
| 101 | +| PPL parsing | `ppl/.../ppl/parser/AstBuilder.java`, `ppl/.../PPLService.java` | |
| 102 | +| ANTLR grammars | `language-grammar/src/main/antlr4/` (OpenSearchSQLParser.g4, OpenSearchPPLParser.g4) | |
| 103 | +| Analysis | `core/.../analysis/Analyzer.java`, `core/.../analysis/ExpressionAnalyzer.java` | |
| 104 | +| Planning | `core/.../planner/Planner.java`, `core/.../planner/logical/LogicalPlan.java` | |
| 105 | +| Execution | `core/.../executor/ExecutionEngine.java`, `opensearch/.../OpenSearchExecutionEngine.java` | |
| 106 | +| Storage | `opensearch/.../storage/OpenSearchStorageEngine.java` | |
| 107 | +| Query orchestration | `core/.../executor/QueryService.java`, `core/.../executor/QueryPlanFactory.java` | |
| 108 | + |
| 109 | +### Core Abstractions |
| 110 | + |
| 111 | +- **`Node<T>`** — Base AST node with visitor pattern support |
| 112 | +- **`UnresolvedPlan`** / **`LogicalPlan`** / **`PhysicalPlan`** — Query plan hierarchy (unresolved → logical → physical) |
| 113 | +- **`Expression`** — Resolved expression with `valueOf()` and `type()` |
| 114 | +- **`ExprValue`** — Runtime value types (ExprIntegerValue, ExprStringValue, etc.) |
| 115 | +- **`ExprType`** — Type system (DATE, TIMESTAMP, DOUBLE, STRUCT, etc.) |
| 116 | +- **`StorageEngine`** / **`Table`** — Pluggable storage abstraction |
| 117 | +- **`ExecutionEngine`** — Executes physical plans, returns QueryResponse |
| 118 | + |
| 119 | +### Design Patterns |
| 120 | + |
| 121 | +- **Visitor pattern** used pervasively: `AbstractNodeVisitor`, `LogicalPlanNodeVisitor`, `PhysicalPlanNodeVisitor`, `ExpressionNodeVisitor` |
| 122 | +- **PhysicalPlan** implements `Iterator<ExprValue>` for streaming execution |
| 123 | +- **Guice** dependency injection in `OpenSearchPluginModule` |
| 124 | +- Storage engines implement `Table.optimize()` and `Table.implement()` for push-down optimization |
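
As an illustration of the iterator-based streaming style mentioned above, here is a minimal sketch of two pull-based operators. It assumes plain `Integer` rows instead of `ExprValue`, and the `Scan` and `Filter` classes are hypothetical names for this example, not the repository's operators.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Toy pull-based operators; rows are Integers rather than the plugin's ExprValue. */
final class IteratorPlanSketch {
  /** Leaf operator: yields rows from an in-memory list. */
  static final class Scan implements Iterator<Integer> {
    private final Iterator<Integer> source;

    Scan(List<Integer> rows) {
      this.source = rows.iterator();
    }

    @Override public boolean hasNext() { return source.hasNext(); }

    @Override public Integer next() { return source.next(); }
  }

  /** Pulls from its child on demand and drops rows that fail the predicate. */
  static final class Filter implements Iterator<Integer> {
    private final Iterator<Integer> child;
    private final Predicate<Integer> condition;
    private Integer buffered; // next matching row, or null if none is buffered yet

    Filter(Iterator<Integer> child, Predicate<Integer> condition) {
      this.child = child;
      this.condition = condition;
    }

    @Override public boolean hasNext() {
      // Advance the child only until one matching row is found.
      while (buffered == null && child.hasNext()) {
        Integer candidate = child.next();
        if (condition.test(candidate)) {
          buffered = candidate;
        }
      }
      return buffered != null;
    }

    @Override public Integer next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      Integer row = buffered;
      buffered = null;
      return row;
    }
  }

  /** Drains a plan into a list, the way a caller would consume streamed results. */
  static List<Integer> drain(Iterator<Integer> plan) {
    List<Integer> out = new ArrayList<>();
    plan.forEachRemaining(out::add);
    return out;
  }

  public static void main(String[] args) {
    Iterator<Integer> plan = new Filter(new Scan(List.of(1, 2, 3, 4, 5, 6)), n -> n % 2 == 0);
    System.out.println(drain(plan)); // prints [2, 4, 6]
  }
}
```

Because each operator only pulls from its child when asked, no intermediate result set is materialized between `Scan` and `Filter`.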
| 125 | + |
| 126 | +## Adding New PPL Commands |
| 127 | + |
| 128 | +Follow the checklist in `docs/dev/ppl-commands.md`: |
| 129 | +1. Update lexer/parser grammars (OpenSearchPPLLexer.g4, OpenSearchPPLParser.g4) |
| 130 | +2. Add AST node under `org.opensearch.sql.ast.tree` |
| 131 | +3. Add `visit*` method in `AbstractNodeVisitor`, override in `Analyzer`, `CalciteRelNodeVisitor`, `PPLQueryDataAnonymizer` |
| 132 | +4. Add unit tests extending `CalcitePPLAbstractTest` (include `verifyLogical()` and `verifyPPLToSparkSQL()`) |
| 133 | +5. Add integration tests extending `PPLIntegTestCase` |
| 134 | +6. Add user docs under `docs/user/ppl/cmd/` |
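
Step 3 of the checklist (a new `visit*` hook in the base visitor, overridden in each concrete visitor) can be sketched roughly as follows. This is a simplified, self-contained illustration: `NodeVisitor`, `Relation`, `MyCommand`, and `Describer` are hypothetical names, and the single type parameter here is an assumption made for brevity rather than the signature of the real `AbstractNodeVisitor`.

```java
/** Toy visitor hierarchy showing how a new command node plugs into existing visitors. */
final class VisitorSketch {
  /** Base visitor: every visit* method falls back to a shared default. */
  abstract static class NodeVisitor<R> {
    R visitDefault(Node node) { return null; }

    R visitRelation(Relation node) { return visitDefault(node); }

    // The new command gets its own hook; visitors that ignore it inherit the default.
    R visitMyCommand(MyCommand node) { return visitDefault(node); }
  }

  abstract static class Node {
    abstract <R> R accept(NodeVisitor<R> visitor);
  }

  static final class Relation extends Node {
    final String table;

    Relation(String table) { this.table = table; }

    @Override <R> R accept(NodeVisitor<R> visitor) { return visitor.visitRelation(this); }
  }

  /** Hypothetical new command node wrapping a child plan. */
  static final class MyCommand extends Node {
    final Node child;
    final int limit;

    MyCommand(Node child, int limit) {
      this.child = child;
      this.limit = limit;
    }

    @Override <R> R accept(NodeVisitor<R> visitor) { return visitor.visitMyCommand(this); }
  }

  /** Concrete visitor that overrides the new hook, as an analyzer or anonymizer would. */
  static final class Describer extends NodeVisitor<String> {
    @Override String visitDefault(Node node) { return "?"; }

    @Override String visitRelation(Relation node) { return "source=" + node.table; }

    @Override String visitMyCommand(MyCommand node) {
      return node.child.accept(this) + " | mycommand " + node.limit;
    }
  }

  public static void main(String[] args) {
    Node plan = new MyCommand(new Relation("accounts"), 5);
    System.out.println(plan.accept(new Describer())); // prints source=accounts | mycommand 5
  }
}
```

The default fallback is what lets a new node type be added without breaking visitors that have not been updated yet; each visitor that needs to handle the command then overrides the hook explicitly.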
| 135 | + |
| 136 | +## Adding New PPL Functions |
| 137 | + |
| 138 | +Follow `docs/dev/ppl-functions.md`. Three approaches: |
| 139 | +1. Reuse existing Calcite operators from `SqlStdOperatorTable`/`SqlLibraryOperators` |
| 140 | +2. Adapt static Java methods via `UserDefinedFunctionUtils.adapt*ToUDF` |
| 141 | +3. Implement `ImplementorUDF` interface from scratch, register in `PPLBuiltinOperators` |
| 142 | + |
| 143 | +## Calcite Engine |
| 144 | + |
| 145 | +The project has two execution engines: the legacy **v2 engine** and the newer **Calcite engine** (based on Apache Calcite). Calcite is toggled via the `plugins.calcite.enabled` setting (default: off in production, toggled per-test in integration tests). |
| 146 | + |
| 147 | +- In integration tests, call `enableCalcite()` in `init()` to activate the Calcite path |
| 148 | +- Some features (e.g., graphLookup) require pushdown optimization — use `enabledOnlyWhenPushdownIsEnabled()` to skip tests in the `CalciteNoPushdownIT` suite |
| 149 | +- `CalciteNoPushdownIT` is a JUnit `@Suite` that re-runs Calcite test classes with pushdown disabled; add new test classes to its `@Suite.SuiteClasses` list |
| 150 | + |
| 151 | +## Integration Tests |
| 152 | + |
| 153 | +Located in `integ-test/src/test/java/` and organized by area: `sql/`, `ppl/`, `calcite/`, `legacy/`, `jdbc/`, `datasource/`, `asyncquery/`, `security/`. Tests use the OpenSearch test framework (in-memory cluster per test class). YAML REST tests live in `integ-test/src/yamlRestTest/resources/rest-api-spec/test/`. |
| 154 | + |
| 155 | +Key base classes: |
| 156 | +- `PPLIntegTestCase` — base for PPL integration tests (v2 engine) |
| 157 | +- `CalcitePPLIT` — base for Calcite PPL integration tests (calls `enableCalcite()`) |
| 158 | +- `CalcitePPLAbstractTest` — base for Calcite PPL unit tests (`verifyLogical()`, `verifyPPLToSparkSQL()`) |
| 159 | +- `CalciteExplainIT` — explain plan tests using YAML expected output files in `integ-test/src/test/resources/expectedOutput/calcite/` |