Skip to content

Commit ffefc08

Browse files
committed
Merge remote-tracking branch 'origin/main' into issues/4004
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
2 parents d60f68f + 8923551 commit ffefc08

74 files changed

Lines changed: 3967 additions & 212 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.github/CODEOWNERS

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,2 +1,2 @@
11
# This should match the owning team set up in https://github.com/orgs/opensearch-project/teams
2-
* @pjfitzgibbons @ps48 @kavithacm @derek-ho @joshuali925 @dai-chen @YANG-DB @rupal-bq @mengweieric @vamsi-amazon @swiddis @penghuo @seankao-az @MaxKsyunz @Yury-Fridlyand @anirudha @forestmvey @acarbonetto @GumpacG @ykmr1224 @LantaoJin @noCharger @qianheng-aws
2+
* @ps48 @kavithacm @derek-ho @joshuali925 @dai-chen @YANG-DB @mengweieric @vamsimanohar @swiddis @penghuo @seankao-az @MaxKsyunz @Yury-Fridlyand @anirudha @forestmvey @acarbonetto @GumpacG @ykmr1224 @LantaoJin @noCharger @qianheng-aws @yuancu

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,8 +10,9 @@ Resolves #[Issue number to be closed when this PR is merged]
1010
- [ ] New functionality has been documented.
1111
- [ ] New functionality has javadoc added.
1212
- [ ] New functionality has a user manual doc added.
13+
- [ ] New PPL command [checklist](https://github.com/opensearch-project/sql/blob/main/DEVELOPER_GUIDE.rst#new-ppl-command-checklist) all confirmed.
1314
- [ ] API changes companion pull request [created](https://github.com/opensearch-project/opensearch-api-specification/blob/main/DEVELOPER_GUIDE.md).
14-
- [ ] Commits are signed per the DCO using `--signoff`.
15+
- [ ] Commits are signed per the DCO using `--signoff` or `-s`.
1516
- [ ] Public documentation issue/PR [created](https://github.com/opensearch-project/documentation-website/issues/new/choose).
1617

1718
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

.github/maven-publish-utils.sh

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,10 @@
44

55
set -e
66

7+
# Flag to disable commit mapping functionality
8+
# Set to "true" to disable commit mapping operations
9+
DISABLE_COMMIT_MAPPING="${DISABLE_COMMIT_MAPPING:-true}"
10+
711
# Function to execute curl commands with retry and error handling
812
execute_curl_with_retry() {
913
local url="$1"
@@ -111,6 +115,11 @@ update_version_metadata() {
111115
local commit_id="$3"
112116
local snapshot_repo_url="${4:-$SNAPSHOT_REPO_URL}"
113117

118+
if [ "$DISABLE_COMMIT_MAPPING" = "true" ]; then
119+
echo "Skipping version metadata update (commit mapping disabled)"
120+
return 0
121+
fi
122+
114123
echo "Updating version metadata for ${artifact_id} version ${version} with commit ID ${commit_id}"
115124

116125
TEMP_DIR=$(mktemp -d)
@@ -204,6 +213,11 @@ update_commit_mapping() {
204213
local commit_map_filename="${5:-$COMMIT_MAP_FILENAME}"
205214
local snapshot_repo_url="${6:-$SNAPSHOT_REPO_URL}"
206215

216+
if [ "$DISABLE_COMMIT_MAPPING" = "true" ]; then
217+
echo "Skipping commit-version mapping update (commit mapping disabled)"
218+
return 0
219+
fi
220+
207221
echo "Updating commit-version mapping for ${artifact_id}"
208222

209223
# Create temp directory for work

DEVELOPER_GUIDE.rst

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -217,6 +217,56 @@ Java files are formatted using `Spotless <https://github.com/diffplug/spotless>`
217217
* - Javadoc format can be maintained by wrapping javadoc with `<pre></pre>` HTML tags
218218
* - Strings can be formatted on multiple lines with a `+` with the correct indentation for the string.
219219

220+
New PPL Command Checklist
221+
=========================
222+
223+
If you are working on contributing a new PPL command, please read this guide and review all items in the checklist are done before code review. You also can leverage this checklist to guide how to add new PPL command.
224+
225+
Prerequisite
226+
------------
227+
228+
| ✅ Open an RFC issue before starting to code:
229+
- Describe the purpose of the new command
230+
- Include at least syntax definition, usage and examples
231+
- Implementation options are welcome if you have multiple ways to implement it
232+
| ✅ Obtain PM review approval for the RFC:
233+
- If PM unavailable, consult repository maintainers as alternative
234+
- An offline meeting might be required to discuss the syntax and usage
235+
236+
Coding & Tests
237+
--------------
238+
239+
| ✅ Lexer/Parser Updates:
240+
- Add new keywords to OpenSearchPPLLexer.g4
241+
- Add grammar rules to OpenSearchPPLParser.g4
242+
- Update ``commandName`` and ``keywordsCanBeId``
243+
| ✅ AST Implementation:
244+
- Add new tree nodes under package ``org.opensearch.sql.ast.tree``
245+
- Prefer reusing ``Argument`` for command arguments **over** creating new expression nodes under ``org.opensearch.sql.ast.expression``
246+
| ✅ Visitor Pattern:
247+
- Add ``visit*`` in ``AbstractNodeVisitor``
248+
- Overriding ``visit*`` in ``Analyzer``, ``CalciteRelNodeVisitor`` and ``PPLQueryDataAnonymizer``
249+
| ✅ Unit Tests:
250+
- Extend ``CalcitePPLAbstractTest``
251+
- Keep test queries minimal
252+
- Include ``verifyLogical()`` and ``verifyPPLToSparkSQL()``
253+
| ✅ Integration tests (pushdown):
254+
- Extend ``PPLIntegTestCase``
255+
- Use complex real-world queries
256+
- Include ``verifySchema()`` and ``verifyDataRows()``
257+
| ✅ Integration tests (Non-pushdown):
258+
- Add test class to ``CalciteNoPushdownIT``
259+
| ✅ Explain tests:
260+
- Add tests to ``ExplainIT`` or ``CalciteExplainIT``
261+
| ✅ Unsupported in v2 test:
262+
- Add a test in ``NewAddedCommandsIT``
263+
| ✅ Anonymizer tests:
264+
- Add a test in ``PPLQueryDataAnonymizerTest``
265+
| ✅ Cross-cluster Tests (optional, nice to have):
266+
- Add a test in ``CrossClusterSearchIT``
267+
| ✅ User doc:
268+
- Add a xxx.rst under ``docs/user/ppl/cmd`` and link the new doc to ``docs/user/ppl/index.rst``
269+
220270
Building and Running Tests
221271
==========================
222272

MAINTAINERS.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -10,10 +10,8 @@ This document contains a list of maintainers in this repo. See [opensearch-proje
1010
| Joshua Li | [joshuali925](https://github.com/joshuali925) | Amazon |
1111
| Shenoy Pratik | [ps48](https://github.com/ps48) | Amazon |
1212
| Kavitha Mohan | [kavithacm](https://github.com/kavithacm) | Amazon |
13-
| Rupal Mahajan | [rupal-bq](https://github.com/rupal-bq) | Amazon |
1413
| Derek Ho | [derek-ho](https://github.com/derek-ho) | Amazon |
1514
| Lior Perry | [YANG-DB](https://github.com/YANG-DB) | Amazon |
16-
| Peter Fitzgibbons | [pjfitzgibbons](https://github.com/pjfitzgibbons) | Amazon |
1715
| Simeon Widdis | [swiddis](https://github.com/swiddis) | Amazon |
1816
| Chen Dai | [dai-chen](https://github.com/dai-chen) | Amazon |
1917
| Vamsi Manohar | [vamsimanohar](https://github.com/vamsimanohar) | Amazon |
@@ -24,6 +22,7 @@ This document contains a list of maintainers in this repo. See [opensearch-proje
2422
| Lantao Jin | [LantaoJin](https://github.com/LantaoJin) | Amazon |
2523
| Louis Chu | [noCharger](https://github.com/noCharger) | Amazon |
2624
| Heng Qian | [qianheng-aws](https://github.com/qianheng-aws) | Amazon |
25+
| Yuanchun Shen | [yuancu](https://github.com/yuancu) | Amazon |
2726
| Max Ksyunz | [MaxKsyunz](https://github.com/MaxKsyunz) | Improving |
2827
| Yury Fridlyand | [Yury-Fridlyand](https://github.com/Yury-Fridlyand) | Improving |
2928
| Andrew Carbonetto | [acarbonetto](https://github.com/acarbonetto) | Improving |
@@ -40,3 +39,6 @@ This document contains a list of maintainers in this repo. See [opensearch-proje
4039
| Eugene Lee | [eugenesk24](https://github.com/eugenesk24) | Amazon |
4140
| Zhongnan Su | [zhongnansu](https://github.com/zhongnansu) | Amazon |
4241
| Chloe Zhang | [chloe-zh](https://github.com/chloe-zh) | Amazon |
42+
| Peter Fitzgibbons | [pjfitzgibbons](https://github.com/pjfitzgibbons) | Amazon |
43+
| Rupal Mahajan | [rupal-bq](https://github.com/rupal-bq) | Amazon |
44+

async-query-core/src/main/java/org/opensearch/sql/spark/utils/SQLQueryUtils.java

Lines changed: 36 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -43,14 +43,24 @@ public class SQLQueryUtils {
4343
private static final Logger logger = LogManager.getLogger(SQLQueryUtils.class);
4444

4545
public static List<FullyQualifiedTableName> extractFullyQualifiedTableNames(String sqlQuery) {
46-
SqlBaseParser sqlBaseParser =
47-
new SqlBaseParser(
48-
new CommonTokenStream(new SqlBaseLexer(new CaseInsensitiveCharStream(sqlQuery))));
49-
sqlBaseParser.addErrorListener(new SyntaxAnalysisErrorListener());
46+
return extractFullyQualifiedTableNamesWithMetadata(sqlQuery).getFullyQualifiedTableNames();
47+
}
48+
49+
public static TableExtractionResult extractFullyQualifiedTableNamesWithMetadata(String sqlQuery) {
50+
SqlBaseParser sqlBaseParser = getBaseParser(sqlQuery);
5051
StatementContext statement = sqlBaseParser.statement();
51-
SparkSqlTableNameVisitor sparkSqlTableNameVisitor = new SparkSqlTableNameVisitor();
52-
statement.accept(sparkSqlTableNameVisitor);
53-
return sparkSqlTableNameVisitor.getFullyQualifiedTableNames();
52+
SparkSqlTableNameVisitor visitor = new SparkSqlTableNameVisitor();
53+
statement.accept(visitor);
54+
55+
// Remove duplicate table names
56+
List<FullyQualifiedTableName> uniqueFullyQualifiedTableNames = new LinkedList<>();
57+
for (FullyQualifiedTableName fullyQualifiedTableName : visitor.getFullyQualifiedTableNames()) {
58+
if (!uniqueFullyQualifiedTableNames.contains(fullyQualifiedTableName)) {
59+
uniqueFullyQualifiedTableNames.add(fullyQualifiedTableName);
60+
}
61+
}
62+
63+
return new TableExtractionResult(uniqueFullyQualifiedTableNames, visitor.isCreateTable());
5464
}
5565

5666
public static IndexQueryDetails extractIndexDetails(String sqlQuery) {
@@ -93,6 +103,8 @@ public static class SparkSqlTableNameVisitor extends SqlBaseParserBaseVisitor<Vo
93103
@Getter
94104
private final List<FullyQualifiedTableName> fullyQualifiedTableNames = new LinkedList<>();
95105

106+
@Getter private boolean isCreateTable = false;
107+
96108
public SparkSqlTableNameVisitor() {}
97109

98110
@Override
@@ -131,6 +143,12 @@ public Void visitCreateTableHeader(SqlBaseParser.CreateTableHeaderContext ctx) {
131143
}
132144
return super.visitCreateTableHeader(ctx);
133145
}
146+
147+
@Override
148+
public Void visitCreateTable(SqlBaseParser.CreateTableContext ctx) {
149+
isCreateTable = true;
150+
return super.visitCreateTable(ctx);
151+
}
134152
}
135153

136154
public static class FlintSQLIndexDetailsVisitor extends FlintSparkSqlExtensionsBaseVisitor<Void> {
@@ -381,4 +399,15 @@ public String removeUnwantedQuotes(String input) {
381399
return input.replaceAll("^\"|\"$", "");
382400
}
383401
}
402+
403+
public static class TableExtractionResult {
404+
@Getter private final List<FullyQualifiedTableName> fullyQualifiedTableNames;
405+
@Getter private final boolean isCreateTableQuery;
406+
407+
public TableExtractionResult(
408+
List<FullyQualifiedTableName> fullyQualifiedTableNames, boolean isCreateTableQuery) {
409+
this.fullyQualifiedTableNames = fullyQualifiedTableNames;
410+
this.isCreateTableQuery = isCreateTableQuery;
411+
}
412+
}
384413
}

async-query-core/src/test/java/org/opensearch/sql/spark/utils/SQLQueryUtilsTest.java

Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,7 @@
2525
import org.opensearch.sql.spark.dispatcher.model.IndexQueryActionType;
2626
import org.opensearch.sql.spark.dispatcher.model.IndexQueryDetails;
2727
import org.opensearch.sql.spark.flint.FlintIndexType;
28+
import org.opensearch.sql.spark.utils.SQLQueryUtils.TableExtractionResult;
2829

2930
@ExtendWith(MockitoExtension.class)
3031
public class SQLQueryUtilsTest {
@@ -444,6 +445,69 @@ void testRecoverIndex() {
444445
assertEquals(IndexQueryActionType.RECOVER, indexDetails.getIndexQueryActionType());
445446
}
446447

448+
@Test
449+
void testExtractFullyQualifiedTableNamesWithMetadata() {
450+
// Test CREATE TABLE queries
451+
String createTableQuery =
452+
"CREATE EXTERNAL TABLE\n"
453+
+ "myS3.default.alb_logs\n"
454+
+ "[ PARTITIONED BY (col_name [, … ] ) ]\n"
455+
+ "[ ROW FORMAT DELIMITED row_format ]\n"
456+
+ "STORED AS file_format\n"
457+
+ "LOCATION { 's3://bucket/folder/' }";
458+
459+
TableExtractionResult result =
460+
SQLQueryUtils.extractFullyQualifiedTableNamesWithMetadata(createTableQuery);
461+
assertTrue(result.isCreateTableQuery());
462+
assertEquals(1, result.getFullyQualifiedTableNames().size());
463+
assertFullyQualifiedTableName(
464+
"myS3", "default", "alb_logs", result.getFullyQualifiedTableNames().get(0));
465+
466+
String createTableQuery2 =
467+
"CREATE TABLE myS3.default.new_table (id INT, name STRING) USING PARQUET";
468+
result = SQLQueryUtils.extractFullyQualifiedTableNamesWithMetadata(createTableQuery2);
469+
assertTrue(result.isCreateTableQuery());
470+
assertEquals(1, result.getFullyQualifiedTableNames().size());
471+
assertFullyQualifiedTableName(
472+
"myS3", "default", "new_table", result.getFullyQualifiedTableNames().get(0));
473+
474+
// Test SELECT queries
475+
String selectQuery = "SELECT * FROM myS3.default.alb_logs";
476+
result = SQLQueryUtils.extractFullyQualifiedTableNamesWithMetadata(selectQuery);
477+
assertFalse(result.isCreateTableQuery());
478+
assertEquals(1, result.getFullyQualifiedTableNames().size());
479+
assertFullyQualifiedTableName(
480+
"myS3", "default", "alb_logs", result.getFullyQualifiedTableNames().get(0));
481+
482+
// Test DROP TABLE queries
483+
String dropTableQuery = "DROP TABLE myS3.default.alb_logs";
484+
result = SQLQueryUtils.extractFullyQualifiedTableNamesWithMetadata(dropTableQuery);
485+
assertFalse(result.isCreateTableQuery());
486+
assertEquals(1, result.getFullyQualifiedTableNames().size());
487+
assertFullyQualifiedTableName(
488+
"myS3", "default", "alb_logs", result.getFullyQualifiedTableNames().get(0));
489+
490+
// Test DESCRIBE TABLE queries
491+
String describeTableQuery = "DESCRIBE TABLE myS3.default.alb_logs";
492+
result = SQLQueryUtils.extractFullyQualifiedTableNamesWithMetadata(describeTableQuery);
493+
assertFalse(result.isCreateTableQuery());
494+
assertEquals(1, result.getFullyQualifiedTableNames().size());
495+
assertFullyQualifiedTableName(
496+
"myS3", "default", "alb_logs", result.getFullyQualifiedTableNames().get(0));
497+
498+
// Test JOIN queries
499+
String joinQuery =
500+
"SELECT * FROM myS3.default.alb_logs JOIN myS3.default.http_logs ON alb_logs.id ="
501+
+ " http_logs.id";
502+
result = SQLQueryUtils.extractFullyQualifiedTableNamesWithMetadata(joinQuery);
503+
assertFalse(result.isCreateTableQuery());
504+
assertEquals(2, result.getFullyQualifiedTableNames().size());
505+
assertFullyQualifiedTableName(
506+
"myS3", "default", "alb_logs", result.getFullyQualifiedTableNames().get(0));
507+
assertFullyQualifiedTableName(
508+
"myS3", "default", "http_logs", result.getFullyQualifiedTableNames().get(1));
509+
}
510+
447511
@Getter
448512
protected static class IndexQuery {
449513
private String query;

common/src/main/java/org/opensearch/sql/common/patterns/BrainLogParser.java

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,11 @@ public class BrainLogParser {
3939
"(\\d{4}-\\d{2}-\\d{2})[T"
4040
+ " ]?(\\d{2}:\\d{2}:\\d{2})(\\.\\d{3})?(Z|([+-]\\d{2}:?\\d{2}))?"),
4141
"<*DATETIME*>");
42+
// UUID
43+
DEFAULT_FILTER_PATTERN_VARIABLE_MAP.put(
44+
Pattern.compile(
45+
"\\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\\b"),
46+
"<*UUID*>");
4247
// Hex Decimal, letters followed by digits, float numbers
4348
DEFAULT_FILTER_PATTERN_VARIABLE_MAP.put(
4449
Pattern.compile(

common/src/test/java/org/opensearch/sql/common/patterns/BrainLogParserTest.java

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -104,6 +104,22 @@ public void testPreprocess() {
104104
assertEquals(expectedResult, result);
105105
}
106106

107+
@Test
108+
public void testPreprocessWithUUID() {
109+
String logMessage = "127.0.0.1 - 1234 something, user_id:c78ac970-f0c3-4954-8cf8-352a8458d01c";
110+
String logId = "log1";
111+
List<String> expectedResult =
112+
Arrays.asList("<*IP*>", "-", "<*>", "something", "user_id:<*UUID*>", "log1");
113+
List<String> result = parser.preprocess(logMessage, logId);
114+
assertEquals(expectedResult, result);
115+
// Test with different delimiter
116+
logMessage = "127.0.0.1=1234 something, user_id:c78ac970-f0c3-4954-8cf8-352a8458d01c";
117+
logId = "log2";
118+
expectedResult = Arrays.asList("<*IP*>=<*>", "something", "user_id:<*UUID*>", "log2");
119+
result = parser.preprocess(logMessage, logId);
120+
assertEquals(expectedResult, result);
121+
}
122+
107123
@Test
108124
public void testPreprocessWithIllegalInput() {
109125
String logMessage = "127.0.0.1 - 1234 something";

0 commit comments

Comments
 (0)