Commit d8c1b6e

qianheng-aws authored and ahkcs committed
Init CLAUDE.md (opensearch-project#5259)
Signed-off-by: Heng Qian <qianheng@amazon.com>
1 parent ccd1665 commit d8c1b6e

1 file changed

Lines changed: 159 additions & 0 deletions

CLAUDE.md

@@ -0,0 +1,159 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

OpenSearch SQL plugin — enables SQL and PPL (Piped Processing Language) queries against OpenSearch. This is a multi-module Gradle project (Java 21) that functions as an OpenSearch plugin.

## Build Commands

```bash
# Full build (compiles, tests, checks)
./gradlew build

# Fast build (skip integration tests)
./gradlew build -x integTest

# Build specific module
./gradlew :core:build
./gradlew :sql:build
./gradlew :ppl:build

# Run unit tests only
./gradlew test

# Run a single unit test class
./gradlew :core:test --tests "org.opensearch.sql.analysis.AnalyzerTest"

# Run integration tests
./gradlew :integ-test:integTest

# Run a single integration test
./gradlew :integ-test:integTest -Dtests.class="*QueryIT"

# Skip Prometheus if unavailable
./gradlew :integ-test:integTest -DignorePrometheus

# Code formatting
./gradlew spotlessCheck # Check
./gradlew spotlessApply # Auto-fix

# Regenerate ANTLR parsers from grammar files
./gradlew generateGrammarSource

# Run plugin locally with OpenSearch
./gradlew :opensearch-sql-plugin:run
./gradlew :opensearch-sql-plugin:run -DdebugJVM # With remote debug on port 5005

# Run doctests
./gradlew :doctest:doctest
./gradlew :doctest:doctest -Pdocs=search # Single file
```

## Code Style

- **Google Java Format** enforced via Spotless (2-space indent, 100 char line limit)
- **Lombok** is used throughout — `@Getter`, `@Builder`, `@RequiredArgsConstructor`, etc.
- **License header** required on all Java files (Apache 2.0). Missing headers fail the build.
- Pre-commit hooks run `spotlessApply` automatically
- All commits must include a DCO sign-off: `Signed-off-by: Name <email>` (use `git commit -s`).

## Architecture

### Query Pipeline

```
User Query (SQL/PPL)
  → Parsing (ANTLR) — produces parse tree
  → AST Construction (AstBuilder visitor) — produces UnresolvedPlan
  → Semantic Analysis (Analyzer) — resolves symbols/types → LogicalPlan
  → Planning (Planner + LogicalPlanOptimizer) — produces PhysicalPlan
  → Execution (ExecutionEngine) — streams ExprValue results
  → Response Formatting (ResponseFormatter — JSON/CSV/JDBC)
```
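
As a rough illustration of the stage order above, the following Java sketch chains the stages as plain functions. Every type here (`PipelineSketch`, `ParseTreeStub`, `UnresolvedPlanStub`, `LogicalPlanStub`, `PhysicalPlanStub`) is a hypothetical stand-in, not a class from this repository, and the string manipulation only marks which stage has run. It is a sketch under those assumptions, not how the plugin wires its components.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: none of these types are the plugin's real classes.
final class PipelineSketch {
  // Each stage maps the previous stage's output to the next stage's input.
  static final Function<String, ParseTreeStub> PARSE = q -> new ParseTreeStub(q.trim());
  static final Function<ParseTreeStub, UnresolvedPlanStub> BUILD_AST =
      t -> new UnresolvedPlanStub("unresolved(" + t.text() + ")");
  static final Function<UnresolvedPlanStub, LogicalPlanStub> ANALYZE =
      u -> new LogicalPlanStub("logical(" + u.description() + ")");
  static final Function<LogicalPlanStub, PhysicalPlanStub> PLAN =
      l -> new PhysicalPlanStub(List.of("row from " + l.description()));
  static final Function<PhysicalPlanStub, String> EXECUTE_AND_FORMAT =
      p -> String.join("\n", p.rows());

  // Runs the stages in the same order as the pipeline diagram.
  static String run(String query) {
    return PARSE
        .andThen(BUILD_AST)
        .andThen(ANALYZE)
        .andThen(PLAN)
        .andThen(EXECUTE_AND_FORMAT)
        .apply(query);
  }

  public static void main(String[] args) {
    System.out.println(run(" source=accounts | fields firstname "));
  }
}

// Stand-in value types, one per intermediate representation in the diagram.
record ParseTreeStub(String text) {}

record UnresolvedPlanStub(String description) {}

record LogicalPlanStub(String description) {}

record PhysicalPlanStub(List<String> rows) {}
```

The point of the sketch is only that each stage consumes exactly the previous stage's output type, so a stage cannot be skipped or reordered without a compile error.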

### Module Dependency Graph

```
plugin (OpenSearch plugin entry point, Guice DI wiring)
├── sql — SQL parsing (ANTLR → AST via SQLSyntaxParser/AstBuilder)
├── ppl — PPL parsing (ANTLR → AST via PPLSyntaxParser/AstBuilder)
├── core — Central module: Analyzer, Planner, ExecutionEngine interfaces,
│          AST/LogicalPlan/PhysicalPlan node types, expression system, type system
├── opensearch — OpenSearch storage engine, execution engine, client
├── protocol — Response formatters (JSON, CSV, JDBC, YAML)
├── common — Shared settings and utilities
├── legacy — V1 SQL engine (backward compatibility fallback)
├── datasources — Multi-datasource support (Glue, Security Lake, Prometheus)
├── async-query / async-query-core — Spark-based async query execution
├── direct-query / direct-query-core — Direct external datasource queries
└── language-grammar — Centralized ANTLR .g4 grammar files
```

`core` has no dependency on other modules. `sql` and `ppl` depend on `core` and `language-grammar`. `opensearch` implements `core` interfaces.

### Key Source Locations

| Area | Key Files |
|------|-----------|
| Plugin entry | `plugin/.../SQLPlugin.java`, `plugin/.../OpenSearchPluginModule.java` |
| SQL parsing | `sql/.../sql/parser/AstBuilder.java`, `sql/.../SQLService.java` |
| PPL parsing | `ppl/.../ppl/parser/AstBuilder.java`, `ppl/.../PPLService.java` |
| ANTLR grammars | `language-grammar/src/main/antlr4/` (OpenSearchSQLParser.g4, OpenSearchPPLParser.g4) |
| Analysis | `core/.../analysis/Analyzer.java`, `core/.../analysis/ExpressionAnalyzer.java` |
| Planning | `core/.../planner/Planner.java`, `core/.../planner/logical/LogicalPlan.java` |
| Execution | `core/.../executor/ExecutionEngine.java`, `opensearch/.../OpenSearchExecutionEngine.java` |
| Storage | `opensearch/.../storage/OpenSearchStorageEngine.java` |
| Query orchestration | `core/.../executor/QueryService.java`, `core/.../executor/QueryPlanFactory.java` |

### Core Abstractions

- **`Node<T>`** — Base AST node with visitor pattern support
- **`UnresolvedPlan`** / **`LogicalPlan`** / **`PhysicalPlan`** — Query plan hierarchy (unresolved → logical → physical)
- **`Expression`** — Resolved expression with `valueOf()` and `type()`
- **`ExprValue`** — Runtime value types (ExprIntegerValue, ExprStringValue, etc.)
- **`ExprType`** — Type system (DATE, TIMESTAMP, DOUBLE, STRUCT, etc.)
- **`StorageEngine`** / **`Table`** — Pluggable storage abstraction
- **`ExecutionEngine`** — Executes physical plans, returns QueryResponse
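
To make "visitor pattern support" on a node concrete, here is a minimal, self-contained Java sketch of the double-dispatch idea. All names in it (`SketchNode`, `SketchVisitor`, `RelationNode`, `FilterNode`, `DescribeVisitor`) are hypothetical and chosen for illustration; the method shapes are an assumption and are not copied from the repository's node or visitor classes.

```java
// Hypothetical sketch of visitor-style double dispatch; not the plugin's real API.
final class VisitorSketch {
  public static void main(String[] args) {
    SketchNode plan = new FilterNode("age > 30", new RelationNode("accounts"));
    System.out.println(plan.accept(new DescribeVisitor(), ""));
  }
}

// One visit method per node type; R is the result type, C is a caller-supplied context.
interface SketchVisitor<R, C> {
  R visitRelation(RelationNode node, C context);

  R visitFilter(FilterNode node, C context);
}

// Every node knows how to hand itself to the matching visit method.
interface SketchNode {
  <R, C> R accept(SketchVisitor<R, C> visitor, C context);
}

record RelationNode(String table) implements SketchNode {
  @Override
  public <R, C> R accept(SketchVisitor<R, C> visitor, C context) {
    return visitor.visitRelation(this, context);
  }
}

record FilterNode(String condition, SketchNode child) implements SketchNode {
  @Override
  public <R, C> R accept(SketchVisitor<R, C> visitor, C context) {
    return visitor.visitFilter(this, context);
  }
}

// One concrete traversal: renders the tree as indented text, using the indent as context.
final class DescribeVisitor implements SketchVisitor<String, String> {
  @Override
  public String visitRelation(RelationNode node, String indent) {
    return indent + "Relation(" + node.table() + ")";
  }

  @Override
  public String visitFilter(FilterNode node, String indent) {
    return indent + "Filter(" + node.condition() + ")\n" + node.child().accept(this, indent + "  ");
  }
}
```

Adding a new traversal in this style means writing another visitor, while adding a new node type means adding one `visit*` method that every visitor must then handle.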

### Design Patterns

- **Visitor pattern** used pervasively: `AbstractNodeVisitor`, `LogicalPlanNodeVisitor`, `PhysicalPlanNodeVisitor`, `ExpressionNodeVisitor`
- **PhysicalPlan** implements `Iterator<ExprValue>` for streaming execution
- **Guice** dependency injection in `OpenSearchPluginModule`
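
The iterator-based streaming mentioned above can be sketched as follows. This is a simplified, self-contained illustration that assumes rows are plain non-null `Integer` values instead of `ExprValue`; `ScanOp`, `FilterOp`, and `StreamingSketch` are hypothetical names, not operators from this repository.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch of pull-based operators; rows are Integers here, not ExprValue.
final class StreamingSketch {
  // Pulls rows one at a time from the root operator until it is exhausted.
  static List<Integer> drain(Iterator<Integer> plan) {
    List<Integer> rows = new ArrayList<>();
    while (plan.hasNext()) {
      rows.add(plan.next());
    }
    return rows;
  }

  public static void main(String[] args) {
    Iterator<Integer> plan = new FilterOp(new ScanOp(List.of(1, 2, 3, 4, 5, 6)), n -> n % 2 == 0);
    System.out.println(drain(plan));
  }
}

// Leaf operator: yields rows from an in-memory list.
final class ScanOp implements Iterator<Integer> {
  private final Iterator<Integer> source;

  ScanOp(List<Integer> rows) {
    this.source = rows.iterator();
  }

  @Override
  public boolean hasNext() {
    return source.hasNext();
  }

  @Override
  public Integer next() {
    return source.next();
  }
}

// Filter operator: pulls from its child lazily and buffers at most one matching row.
final class FilterOp implements Iterator<Integer> {
  private final Iterator<Integer> child;
  private final Predicate<Integer> condition;
  private Integer buffered; // null means "no row buffered"; assumes rows are never null

  FilterOp(Iterator<Integer> child, Predicate<Integer> condition) {
    this.child = child;
    this.condition = condition;
  }

  @Override
  public boolean hasNext() {
    while (buffered == null && child.hasNext()) {
      Integer candidate = child.next();
      if (condition.test(candidate)) {
        buffered = candidate;
      }
    }
    return buffered != null;
  }

  @Override
  public Integer next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    Integer row = buffered;
    buffered = null;
    return row;
  }
}
```

Because each operator only asks its child for the next row when it needs one, no operator has to materialize the full result set before the caller sees the first row.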

## Adding New PPL Commands

Follow the checklist in `docs/dev/ppl-commands.md`:
1. Update lexer/parser grammars (OpenSearchPPLLexer.g4, OpenSearchPPLParser.g4)
2. Add AST node under `org.opensearch.sql.ast.tree`
3. Add `visit*` method in `AbstractNodeVisitor`, override in `Analyzer`, `CalciteRelNodeVisitor`, `PPLQueryDataAnonymizer`
4. Unit tests extending `CalcitePPLAbstractTest` (include `verifyLogical()` and `verifyPPLToSparkSQL()`)
5. Integration tests extending `PPLIntegTestCase`
6. Add user docs under `docs/user/ppl/cmd/`

## Adding New PPL Functions

Follow `docs/dev/ppl-functions.md`. Three approaches:
1. Reuse existing Calcite operators from `SqlStdOperatorTable`/`SqlLibraryOperators`
2. Adapt static Java methods via `UserDefinedFunctionUtils.adapt*ToUDF`
3. Implement `ImplementorUDF` interface from scratch, register in `PPLBuiltinOperators`

## Calcite Engine

The project has two execution engines: the legacy **v2 engine** and the newer **Calcite engine** (Apache Calcite-based). Calcite is toggled via the `plugins.calcite.enabled` setting (off by default in production; toggled per test in integration tests).

- In integration tests, call `enableCalcite()` in `init()` to activate the Calcite path
- Some features (e.g., graphLookup) require pushdown optimization — use `enabledOnlyWhenPushdownIsEnabled()` to skip tests in the `CalciteNoPushdownIT` suite
- `CalciteNoPushdownIT` is a JUnit `@Suite` that re-runs Calcite test classes with pushdown disabled; add new test classes to its `@Suite.SuiteClasses` list

## Integration Tests

Integration tests are located in `integ-test/src/test/java/` and organized by area: `sql/`, `ppl/`, `calcite/`, `legacy/`, `jdbc/`, `datasource/`, `asyncquery/`, `security/`. They use the OpenSearch test framework (in-memory cluster per test class). YAML REST tests live in `integ-test/src/yamlRestTest/resources/rest-api-spec/test/`.

Key base classes:
- `PPLIntegTestCase` — base for PPL integration tests (v2 engine)
- `CalcitePPLIT` — base for Calcite PPL integration tests (calls `enableCalcite()`)
- `CalcitePPLAbstractTest` — base for Calcite PPL unit tests (`verifyLogical()`, `verifyPPLToSparkSQL()`)
- `CalciteExplainIT` — explain plan tests using YAML expected output files in `integ-test/src/test/resources/expectedOutput/calcite/`
