Skip to content

Commit b1e3ab4

Browse files
committed
Move ppl command and function dev guidance from DEVELOEPR_GUIDE to docs/dev
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
1 parent ce005e1 commit b1e3ab4

6 files changed

Lines changed: 315 additions & 154 deletions

File tree

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ Resolves #[Issue number to be closed when this PR is merged]
1010
- [ ] New functionality has been documented.
1111
- [ ] New functionality has javadoc added.
1212
- [ ] New functionality has a user manual doc added.
13-
- [ ] New PPL command [checklist](https://github.com/opensearch-project/sql/blob/main/DEVELOPER_GUIDE.rst#new-ppl-command-checklist) all confirmed.
13+
- [ ] New PPL command [checklist](https://github.com/opensearch-project/sql/blob/main/docs/dev/ppl-commands.md) all confirmed.
1414
- [ ] API changes companion pull request [created](https://github.com/opensearch-project/opensearch-api-specification/blob/main/DEVELOPER_GUIDE.md).
1515
- [ ] Commits are signed per the DCO using `--signoff` or `-s`.
1616
- [ ] Public documentation issue/PR [created](https://github.com/opensearch-project/documentation-website/issues/new/choose).

DEVELOPER_GUIDE.rst

Lines changed: 5 additions & 125 deletions
Original file line numberDiff line numberDiff line change
@@ -211,133 +211,13 @@ Java files are formatted using `Spotless <https://github.com/diffplug/spotless>`
211211
* - Javadoc format can be maintained by wrapping javadoc with `<pre></pre>` HTML tags
212212
* - Strings can be formatted on multiple lines with a `+` with the correct indentation for the string.
213213

214+
Development Guidelines
215+
----------------------
214216

215-
New PPL Command Checklist
216-
=========================
217+
For detailed development documentation, please refer to the `development documentation <docs/dev/index.md>`_. For specific guidance on implementing PPL components, see the following resources:
217218

218-
If you are working on contributing a new PPL command, please read this guide and review all items in the checklist are done before code review. You also can leverage this checklist to guide how to add new PPL command.
219-
220-
Prerequisite
221-
------------
222-
223-
| ✅ Open an RFC issue before starting to code:
224-
- Describe the purpose of the new command
225-
- Include at least syntax definition, usage and examples
226-
- Implementation options are welcome if you have multiple ways to implement it
227-
228-
| ✅ Obtain PM review approval for the RFC:
229-
- If PM unavailable, consult repository maintainers as alternative
230-
- An offline meeting might be required to discuss the syntax and usage
231-
232-
Coding & Tests
233-
--------------
234-
235-
| ✅ Lexer/Parser Updates:
236-
- Add new keywords to OpenSearchPPLLexer.g4
237-
- Add grammar rules to OpenSearchPPLParser.g4
238-
- Update ``commandName`` and ``keywordsCanBeId``
239-
| ✅ AST Implementation:
240-
- Add new tree nodes under package ``org.opensearch.sql.ast.tree``
241-
- Prefer reusing ``Argument`` for command arguments **over** creating new expression nodes under ``org.opensearch.sql.ast.expression``
242-
| ✅ Visitor Pattern:
243-
- Add ``visit*`` in ``AbstractNodeVisitor``
244-
- Overriding ``visit*`` in ``Analyzer``, ``CalciteRelNodeVisitor`` and ``PPLQueryDataAnonymizer``
245-
| ✅ Unit Tests:
246-
- Extend ``CalcitePPLAbstractTest``
247-
- Keep test queries minimal
248-
- Include ``verifyLogical()`` and ``verifyPPLToSparkSQL()``
249-
| ✅ Integration tests (pushdown):
250-
- Extend ``PPLIntegTestCase``
251-
- Use complex real-world queries
252-
- Include ``verifySchema()`` and ``verifyDataRows()``
253-
| ✅ Integration tests (Non-pushdown):
254-
- Add test class to ``CalciteNoPushdownIT``
255-
| ✅ Explain tests:
256-
- Add tests to ``ExplainIT`` or ``CalciteExplainIT``
257-
| ✅ Unsupported in v2 test:
258-
- Add a test in ``NewAddedCommandsIT``
259-
| ✅ Anonymizer tests:
260-
- Add a test in ``PPLQueryDataAnonymizerTest``
261-
| ✅ Cross-cluster Tests (optional, nice to have):
262-
- Add a test in ``CrossClusterSearchIT``
263-
| ✅ User doc:
264-
- Add a xxx.rst under ``docs/user/ppl/cmd`` and link the new doc to ``docs/user/ppl/index.rst``
265-
266-
Developing PPL Functions
267-
========================
268-
269-
PPL functions include user-defined functions (UDFs) and user-defined aggregation functions (UDAFs). This section
270-
provides guidance on implementing and integrating these functions with the OpenSearch SQL engine.
271-
272-
Prerequisites
273-
-------------
274-
275-
| ✅ Create an issue describing the purpose and expected behavior of the function
276-
277-
| ✅ Ensure the function name is recognized by PPL syntax by checking ``OpenSearchPPLLexer.g4``, ``OpenSearchPPLParser.g4``, and ``BuiltinFunctionName.java``
278-
279-
| ✅ Plan the documentation of the function under ``docs/user/ppl/functions/`` directory
280-
281-
Developing User-Defined Functions (UDFs)
282-
----------------------------------------
283-
284-
| ✅ Creating UDFs: A user-defined function is an instance of `SqlOperator <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlOperator.html>`_ that transforms input row expressions (`RexNode <https://calcite.apache.org/javadocAggregate/org/apache/calcite/rex/RexNode.html>`_) into a new one. There are three approaches to implementing UDFs:
285-
286-
- Use existing Calcite operators: Leverage operators already declared in Calcite's `SqlStdOperatorTable <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/fun/SqlStdOperatorTable.html>`_ or `SqlLibraryOperators <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/fun/SqlLibraryOperators.html>`_, and defined in `RexImpTable.java <https://calcite.apache.org/javadocAggregate/org/apache/calcite/adapter/enumerable/RexImpTable.html>`_
287-
- Adapt existing static methods: Convert Java static methods to UDFs using utility functions like ``UserDefinedFunctionUtils.adaptExprMethodToUDF``
288-
- Implement from scratch
289-
290-
* Implement the ``ImplementorUDF`` interface
291-
* Instantiate and convert it to a ``SqlOperator`` in ``PPLBuiltinOperators``
292-
* For optimal UDF performance, implement any data-independent logic during the compilation phase instead of at runtime. Specifically, use `linq4j expressions <https://calcite.apache.org/javadocAggregate/org/apache/calcite/linq4j/tree/Expression.html>`_ for these operations rather than internal static method calls, as expressions are evaluated during compilation.
293-
294-
| ✅ Type Checking for UDFs
295-
- Each ``SqlOperator`` provides an operand type checker via the ``getOperandTypeChecker`` method
296-
- Calcite's built-in operators come with predefined type checkers of type ``SqlOperandTypeChecker``
297-
- For custom UDFs, the ``UDFOperandMetadata`` interface is used to feed function type information so that a ``SqlOperandTypeChecker`` can be retrieved in a same way as Calcite's built-in operators. Most of the operand types are defined in ``PPLOperandTypes`` as instances of ``UDFOperandMetadata``.
298-
- ``SqlOperandTypeChecker`` works on parsed SQL tree, which is not tapped in our architecture. Therefore, ``PPLTypeChecker`` interface is created to perform actual type checking. most of instances of ``PPLTypeChecker`` are created by wrapping Calcite's built-in type checkers.
299-
300-
| ✅ Registering UDFs: UDF should be registered in ``PPLFuncImpTable``.
301-
- The preferred API is ``AbstractBuilder::registerOperator(BuiltinFunctionName functionName, SqlOperator... operators)``
302-
303-
* This automatically extracts type checkers from operators and converts them to ``PPLTypeChecker`` instances
304-
* Multiple implementations can be registered to the same function name for overloading
305-
* The system will try to resolve functions based on argument types, with automatic coercion when needed
306-
307-
- A lower-level registration API is also available: ``AbstractBuilder::register(BuiltinFunctionName functionName, FunctionImp functionImp, PPLTypeChecker typeChecker)``
308-
309-
* This explicitly defines how ``RexNode`` expressions should be converted and checked
310-
* Use this when you need a custom type checker or to customize an existing function by tweaking its arguments
311-
* Setting ``typeChecker`` to ``null`` will bypass type checking (use with caution)
312-
313-
| ✅ External Functions: Some functions require integration with underlying data sources:
314-
- Register external functions using ``PPLFuncImpTable::registerExternalOperator``
315-
- For example, the ``GEOIP`` function relies on the `opensearch-geospatial <https://github.com/opensearch-project/geospatial>`_ plugin.
316-
It is registered as an external function in ``OpenSearchExecutionEngine``.
317-
318-
| ✅ Testing UDFs
319-
- Integration tests in ``Calcite*IT`` classes to verify function correctness
320-
- Unit tests in ``CalcitePPLFunctionTypeTest`` to validate type checker behavior
321-
- Push-down tests in ``CalciteExplainIT`` if the function can be pushed down as a domain-specific language (DSL)
322-
323-
Developing User-Defined Aggregation Functions (UDAFs)
324-
-----------------------------------------------------
325-
326-
| ✅ User-defined aggregation functions aggregate data across multiple rows. There are two main approaches to create a UDAF
327-
- Use existing Calcite aggregation operators
328-
- Implement from scratch:
329-
330-
* Extend ``SqlUserDefinedAggFunction`` with custom aggregation logic
331-
* Instantiate the new aggregation function in ``PPLBuiltinOperators``
332-
333-
| ✅ Registering UDAFs
334-
- Use ``AggBuilder::registerOperator(BuiltinFunctionName functionName, SqlAggFunction aggFunction)`` for standard registration
335-
- For more control, use ``AggBuilder::register(BuiltinFunctionName functionName, AggHandler aggHandler, PPLTypeChecker typeChecker)``
336-
- For functions dependent on data engines, use ``PPLFuncImpTable::registerExternalAggOperator``
337-
338-
| ✅ Testing UDAFs
339-
- Verify result correctness in ``CalcitePPLAggregationIT``
340-
- Test logical plans in ``CalcitePPLAggregationTest``
219+
- `PPL Commands <docs/dev/ppl-commands.md>`_: Guidelines for adding new commands to PPL
220+
- `PPL Functions <docs/dev/ppl-functions.md>`_: Instructions for implementing and integrating custom functions
341221

342222
Building and Running Tests
343223
==========================

core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java

Lines changed: 32 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -228,6 +228,38 @@ public UDFOperandMetadata getOperandMetadata() {
228228
};
229229
}
230230

231+
/**
232+
* Adapts a method from the v2 implementation whose parameters include a {@link
233+
* FunctionProperties} at the beginning to a Calcite-compatible UserDefinedFunctionBuilder.
234+
*/
235+
public static ImplementorUDF adaptExprMethodWithPropertiesToUDF(
236+
java.lang.reflect.Type type,
237+
String methodName,
238+
SqlReturnTypeInference returnTypeInference,
239+
NullPolicy nullPolicy,
240+
UDFOperandMetadata operandMetadata) {
241+
NotNullImplementor implementor =
242+
(translator, call, translatedOperands) -> {
243+
List<Expression> operands =
244+
convertToExprValues(
245+
translatedOperands, call.getOperands().stream().map(RexNode::getType).toList());
246+
List<Expression> operandsWithProperties = prependFunctionProperties(operands, translator);
247+
Expression exprResult = Expressions.call(type, methodName, operandsWithProperties);
248+
return Expressions.call(exprResult, "valueForCalcite");
249+
};
250+
return new ImplementorUDF(implementor, nullPolicy) {
251+
@Override
252+
public SqlReturnTypeInference getReturnTypeInference() {
253+
return returnTypeInference;
254+
}
255+
256+
@Override
257+
public UDFOperandMetadata getOperandMetadata() {
258+
return operandMetadata;
259+
}
260+
};
261+
}
262+
231263
/**
232264
* Adapt a static math function (e.g., Math.expm1, Math.rint) to a UserDefinedFunctionBuilder.
233265
* This method generates a Calcite-compatible UDF by boxing the operand, converting it to a
@@ -278,32 +310,4 @@ public static List<Expression> prependFunctionProperties(
278310
operandsWithProperties.addFirst(properties);
279311
return Collections.unmodifiableList(operandsWithProperties);
280312
}
281-
282-
public static ImplementorUDF adaptExprMethodWithPropertiesToUDF(
283-
java.lang.reflect.Type type,
284-
String methodName,
285-
SqlReturnTypeInference returnTypeInference,
286-
NullPolicy nullPolicy,
287-
UDFOperandMetadata operandMetadata) {
288-
NotNullImplementor implementor =
289-
(translator, call, translatedOperands) -> {
290-
List<Expression> operands =
291-
convertToExprValues(
292-
translatedOperands, call.getOperands().stream().map(RexNode::getType).toList());
293-
List<Expression> operandsWithProperties = prependFunctionProperties(operands, translator);
294-
Expression exprResult = Expressions.call(type, methodName, operandsWithProperties);
295-
return Expressions.call(exprResult, "valueForCalcite");
296-
};
297-
return new ImplementorUDF(implementor, nullPolicy) {
298-
@Override
299-
public SqlReturnTypeInference getReturnTypeInference() {
300-
return returnTypeInference;
301-
}
302-
303-
@Override
304-
public UDFOperandMetadata getOperandMetadata() {
305-
return operandMetadata;
306-
}
307-
};
308-
}
309313
}

docs/dev/index.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,8 @@
3838
+ [Nested Function In Select Clause](sql-nested-function-select-clause.md): Nested function support in sql select clause
3939
+ [Nested Function In Where Clause](sql-nested-function-where-clause.md): Nested function support in sql where clause
4040
+ **Piped Processing Language**
41+
+ [PPL Command Checklist](ppl-commands.md): A checklist of developing a new PPL command
42+
+ [PPL Functions](ppl-functions.md): Guidance on developing a PPL function
4143

4244
### Query Processing
4345

docs/dev/ppl-commands.md

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
# New PPL Command Checklist
2+
3+
If you are working on contributing a new PPL command, please read this guide and review all items in the checklist are done before code review. You also can leverage this checklist to guide how to add new PPL command.
4+
5+
## Prerequisite
6+
7+
- [ ] **Open an RFC issue before starting to code:**
8+
- Describe the purpose of the new command
9+
- Include at least syntax definition, usage and examples
10+
- Implementation options are welcome if you have multiple ways to implement it
11+
12+
- [ ] **Obtain PM review approval for the RFC:**
13+
- If PM unavailable, consult repository maintainers as alternative
14+
- An offline meeting might be required to discuss the syntax and usage
15+
16+
## Coding & Tests
17+
18+
- [ ] **Lexer/Parser Updates:**
19+
- Add new keywords to OpenSearchPPLLexer.g4
20+
- Add grammar rules to OpenSearchPPLParser.g4
21+
- Update `commandName` and `keywordsCanBeId`
22+
23+
- [ ] **AST Implementation:**
24+
- Add new tree nodes under package `org.opensearch.sql.ast.tree`
25+
- Prefer reusing `Argument` for command arguments **over** creating new expression nodes under `org.opensearch.sql.ast.expression`
26+
27+
- [ ] **Visitor Pattern:**
28+
- Add `visit*` in `AbstractNodeVisitor`
29+
- Overriding `visit*` in `Analyzer`, `CalciteRelNodeVisitor` and `PPLQueryDataAnonymizer`
30+
31+
- [ ] **Unit Tests:**
32+
- Extend `CalcitePPLAbstractTest`
33+
- Keep test queries minimal
34+
- Include `verifyLogical()` and `verifyPPLToSparkSQL()`
35+
36+
- [ ] **Integration tests (pushdown):**
37+
- Extend `PPLIntegTestCase`
38+
- Use complex real-world queries
39+
- Include `verifySchema()` and `verifyDataRows()`
40+
41+
- [ ] **Integration tests (Non-pushdown):**
42+
- Add test class to `CalciteNoPushdownIT`
43+
44+
- [ ] **Explain tests:**
45+
- Add tests to `ExplainIT` or `CalciteExplainIT`
46+
47+
- [ ] **Unsupported in v2 test:**
48+
- Add a test in `NewAddedCommandsIT`
49+
50+
- [ ] **Anonymizer tests:**
51+
- Add a test in `PPLQueryDataAnonymizerTest`
52+
53+
- [ ] **Cross-cluster Tests (optional, nice to have):**
54+
- Add a test in `CrossClusterSearchIT`
55+
56+
- [ ] **User doc:**
57+
- Add a xxx.rst under `docs/user/ppl/cmd` and link the new doc to `docs/user/ppl/index.rst`

0 commit comments

Comments
 (0)