You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: DEVELOPER_GUIDE.rst
+5-125Lines changed: 5 additions & 125 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -211,133 +211,13 @@ Java files are formatted using `Spotless <https://github.com/diffplug/spotless>`
211
211
* - Javadoc format can be maintained by wrapping javadoc with `<pre></pre>` HTML tags
212
212
* - Strings can be formatted on multiple lines with a `+` with the correct indentation for the string.
213
213
214
+
Development Guidelines
215
+
----------------------
214
216
215
-
New PPL Command Checklist
216
-
=========================
217
+
For detailed development documentation, please refer to the `development documentation <docs/dev/index.md>`_. For specific guidance on implementing PPL components, see the following resources:
217
218
218
-
If you are working on contributing a new PPL command, please read this guide and review all items in the checklist are done before code review. You also can leverage this checklist to guide how to add new PPL command.
219
-
220
-
Prerequisite
221
-
------------
222
-
223
-
|✅ Open an RFC issue before starting to code:
224
-
- Describe the purpose of the new command
225
-
- Include at least syntax definition, usage and examples
226
-
- Implementation options are welcome if you have multiple ways to implement it
227
-
228
-
|✅ Obtain PM review approval for the RFC:
229
-
- If PM unavailable, consult repository maintainers as alternative
230
-
- An offline meeting might be required to discuss the syntax and usage
231
-
232
-
Coding & Tests
233
-
--------------
234
-
235
-
|✅ Lexer/Parser Updates:
236
-
- Add new keywords to OpenSearchPPLLexer.g4
237
-
- Add grammar rules to OpenSearchPPLParser.g4
238
-
- Update ``commandName`` and ``keywordsCanBeId``
239
-
|✅ AST Implementation:
240
-
- Add new tree nodes under package ``org.opensearch.sql.ast.tree``
241
-
- Prefer reusing ``Argument`` for command arguments **over** creating new expression nodes under ``org.opensearch.sql.ast.expression``
242
-
|✅ Visitor Pattern:
243
-
- Add ``visit*`` in ``AbstractNodeVisitor``
244
-
- Overriding ``visit*`` in ``Analyzer``, ``CalciteRelNodeVisitor`` and ``PPLQueryDataAnonymizer``
245
-
|✅ Unit Tests:
246
-
- Extend ``CalcitePPLAbstractTest``
247
-
- Keep test queries minimal
248
-
- Include ``verifyLogical()`` and ``verifyPPLToSparkSQL()``
249
-
|✅ Integration tests (pushdown):
250
-
- Extend ``PPLIntegTestCase``
251
-
- Use complex real-world queries
252
-
- Include ``verifySchema()`` and ``verifyDataRows()``
253
-
|✅ Integration tests (Non-pushdown):
254
-
- Add test class to ``CalciteNoPushdownIT``
255
-
|✅ Explain tests:
256
-
- Add tests to ``ExplainIT`` or ``CalciteExplainIT``
257
-
|✅ Unsupported in v2 test:
258
-
- Add a test in ``NewAddedCommandsIT``
259
-
|✅ Anonymizer tests:
260
-
- Add a test in ``PPLQueryDataAnonymizerTest``
261
-
|✅ Cross-cluster Tests (optional, nice to have):
262
-
- Add a test in ``CrossClusterSearchIT``
263
-
|✅ User doc:
264
-
- Add a xxx.rst under ``docs/user/ppl/cmd`` and link the new doc to ``docs/user/ppl/index.rst``
265
-
266
-
Developing PPL Functions
267
-
========================
268
-
269
-
PPL functions include user-defined functions (UDFs) and user-defined aggregation functions (UDAFs). This section
270
-
provides guidance on implementing and integrating these functions with the OpenSearch SQL engine.
271
-
272
-
Prerequisites
273
-
-------------
274
-
275
-
|✅ Create an issue describing the purpose and expected behavior of the function
276
-
277
-
|✅ Ensure the function name is recognized by PPL syntax by checking ``OpenSearchPPLLexer.g4``, ``OpenSearchPPLParser.g4``, and ``BuiltinFunctionName.java``
278
-
279
-
|✅ Plan the documentation of the function under ``docs/user/ppl/functions/`` directory
280
-
281
-
Developing User-Defined Functions (UDFs)
282
-
----------------------------------------
283
-
284
-
|✅ Creating UDFs: A user-defined function is an instance of `SqlOperator <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlOperator.html>`_ that transforms input row expressions (`RexNode <https://calcite.apache.org/javadocAggregate/org/apache/calcite/rex/RexNode.html>`_) into a new one. There are three approaches to implementing UDFs:
285
-
286
-
- Use existing Calcite operators: Leverage operators already declared in Calcite's `SqlStdOperatorTable <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/fun/SqlStdOperatorTable.html>`_ or `SqlLibraryOperators <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/fun/SqlLibraryOperators.html>`_, and defined in `RexImpTable.java <https://calcite.apache.org/javadocAggregate/org/apache/calcite/adapter/enumerable/RexImpTable.html>`_
287
-
- Adapt existing static methods: Convert Java static methods to UDFs using utility functions like ``UserDefinedFunctionUtils.adaptExprMethodToUDF``
288
-
- Implement from scratch
289
-
290
-
* Implement the ``ImplementorUDF`` interface
291
-
* Instantiate and convert it to a ``SqlOperator`` in ``PPLBuiltinOperators``
292
-
* For optimal UDF performance, implement any data-independent logic during the compilation phase instead of at runtime. Specifically, use `linq4j expressions <https://calcite.apache.org/javadocAggregate/org/apache/calcite/linq4j/tree/Expression.html>`_ for these operations rather than internal static method calls, as expressions are evaluated during compilation.
293
-
294
-
|✅ Type Checking for UDFs
295
-
- Each ``SqlOperator`` provides an operand type checker via the ``getOperandTypeChecker`` method
296
-
- Calcite's built-in operators come with predefined type checkers of type ``SqlOperandTypeChecker``
297
-
- For custom UDFs, the ``UDFOperandMetadata`` interface is used to feed function type information so that a ``SqlOperandTypeChecker`` can be retrieved in a same way as Calcite's built-in operators. Most of the operand types are defined in ``PPLOperandTypes`` as instances of ``UDFOperandMetadata``.
298
-
- ``SqlOperandTypeChecker`` works on parsed SQL tree, which is not tapped in our architecture. Therefore, ``PPLTypeChecker`` interface is created to perform actual type checking. most of instances of ``PPLTypeChecker`` are created by wrapping Calcite's built-in type checkers.
299
-
300
-
|✅ Registering UDFs: UDF should be registered in ``PPLFuncImpTable``.
301
-
- The preferred API is ``AbstractBuilder::registerOperator(BuiltinFunctionName functionName, SqlOperator... operators)``
302
-
303
-
* This automatically extracts type checkers from operators and converts them to ``PPLTypeChecker`` instances
304
-
* Multiple implementations can be registered to the same function name for overloading
305
-
* The system will try to resolve functions based on argument types, with automatic coercion when needed
306
-
307
-
- A lower-level registration API is also available: ``AbstractBuilder::register(BuiltinFunctionName functionName, FunctionImp functionImp, PPLTypeChecker typeChecker)``
308
-
309
-
* This explicitly defines how ``RexNode`` expressions should be converted and checked
310
-
* Use this when you need a custom type checker or to customize an existing function by tweaking its arguments
311
-
* Setting ``typeChecker`` to ``null`` will bypass type checking (use with caution)
312
-
313
-
|✅ External Functions: Some functions require integration with underlying data sources:
314
-
- Register external functions using ``PPLFuncImpTable::registerExternalOperator``
315
-
- For example, the ``GEOIP`` function relies on the `opensearch-geospatial <https://github.com/opensearch-project/geospatial>`_ plugin.
316
-
It is registered as an external function in ``OpenSearchExecutionEngine``.
317
-
318
-
|✅ Testing UDFs
319
-
- Integration tests in ``Calcite*IT`` classes to verify function correctness
320
-
- Unit tests in ``CalcitePPLFunctionTypeTest`` to validate type checker behavior
321
-
- Push-down tests in ``CalciteExplainIT`` if the function can be pushed down as a domain-specific language (DSL)
If you are working on contributing a new PPL command, please read this guide and review all items in the checklist are done before code review. You also can leverage this checklist to guide how to add new PPL command.
4
+
5
+
## Prerequisite
6
+
7
+
-[ ]**Open an RFC issue before starting to code:**
8
+
- Describe the purpose of the new command
9
+
- Include at least syntax definition, usage and examples
10
+
- Implementation options are welcome if you have multiple ways to implement it
11
+
12
+
-[ ]**Obtain PM review approval for the RFC:**
13
+
- If PM unavailable, consult repository maintainers as alternative
14
+
- An offline meeting might be required to discuss the syntax and usage
15
+
16
+
## Coding & Tests
17
+
18
+
-[ ]**Lexer/Parser Updates:**
19
+
- Add new keywords to OpenSearchPPLLexer.g4
20
+
- Add grammar rules to OpenSearchPPLParser.g4
21
+
- Update `commandName` and `keywordsCanBeId`
22
+
23
+
-[ ]**AST Implementation:**
24
+
- Add new tree nodes under package `org.opensearch.sql.ast.tree`
25
+
- Prefer reusing `Argument` for command arguments **over** creating new expression nodes under `org.opensearch.sql.ast.expression`
26
+
27
+
-[ ]**Visitor Pattern:**
28
+
- Add `visit*` in `AbstractNodeVisitor`
29
+
- Overriding `visit*` in `Analyzer`, `CalciteRelNodeVisitor` and `PPLQueryDataAnonymizer`
30
+
31
+
-[ ]**Unit Tests:**
32
+
- Extend `CalcitePPLAbstractTest`
33
+
- Keep test queries minimal
34
+
- Include `verifyLogical()` and `verifyPPLToSparkSQL()`
35
+
36
+
-[ ]**Integration tests (pushdown):**
37
+
- Extend `PPLIntegTestCase`
38
+
- Use complex real-world queries
39
+
- Include `verifySchema()` and `verifyDataRows()`
40
+
41
+
-[ ]**Integration tests (Non-pushdown):**
42
+
- Add test class to `CalciteNoPushdownIT`
43
+
44
+
-[ ]**Explain tests:**
45
+
- Add tests to `ExplainIT` or `CalciteExplainIT`
46
+
47
+
-[ ]**Unsupported in v2 test:**
48
+
- Add a test in `NewAddedCommandsIT`
49
+
50
+
-[ ]**Anonymizer tests:**
51
+
- Add a test in `PPLQueryDataAnonymizerTest`
52
+
53
+
-[ ]**Cross-cluster Tests (optional, nice to have):**
54
+
- Add a test in `CrossClusterSearchIT`
55
+
56
+
-[ ]**User doc:**
57
+
- Add a xxx.rst under `docs/user/ppl/cmd` and link the new doc to `docs/user/ppl/index.rst`
0 commit comments