Skip to content

Commit 4409a00

Browse files
authored
Feat/4141 gql compliance (#4477)
* test: [#4141] add ISO GQL three-valued (trinary) logic regression matrix Locks in the existing UNKNOWN/NULL three-valued logic behavior with a 13-case end-to-end regression test covering AND/OR/NOT/XOR null propagation, comparisons against null, IN with null elements, WHERE non-match coercion vs RETURN null preservation, and IS NULL concrete booleans. Confirms ArcadeDB represents UNKNOWN as null-in-boolean-context per ISO/IEC 39075, so no separate UNKNOWN value type is required. Related to #4141 * feat: [#4141] reject deprecated Cypher syntax (PERIODIC COMMIT, {param}) with GQL hints ISO/IEC 39075 (GQL) / Cypher 25 cleanup. Adds a token-stream pre-scan in Cypher25AntlrParser that rejects removed legacy syntax with a clear, actionable message instead of a cryptic ANTLR error: - 'PERIODIC COMMIT' -> use 'CALL { ... } IN TRANSACTIONS [OF n ROWS]' - legacy '{param}' -> use '$param' The scan runs on the default token channel so matches inside string literals/comments are never flagged, distinguishes a legacy '{name}' from a map projection ('var {name}') and brace-block expressions (EXISTS/COUNT/COLLECT/CALL { ... }), and excludes numeric '{n}'/'{n,m}' quantified-path-pattern quantifiers. CypherStatementCache no longer double-wraps CommandParsingException so the hint surfaces at top level. Related to #4141 * test: [#4141] verify GQL grouping-key enforcement; surface validation errors Adds regression coverage for ISO GQL explicit-grouping rules: explicit single/multiple grouping keys work, an aggregation mixed with a grouping key is allowed, and an aggregation mixed with a non-grouping reference is rejected as 'AmbiguousAggregationExpression' (the enforcement already lived in CypherSemanticValidator; this proves it). Also covers rejection of aggregation in WHERE. Cypher25AntlrParser now rethrows CommandParsingException as-is instead of burying it under a generic 'Failed to parse Cypher query' wrapper, so semantic-validation and explicit parse messages surface at top level. Related to #4141 * refactor: [#4141] address PR review - hot-path pre-filter, idempotent tests Code review follow-ups for the GQL deprecated-syntax check: - Add an allocation-free pre-filter to checkDeprecatedSyntax so queries without '{' or 'PERIODIC' skip the second lex pass entirely (keeps GC pressure off the parse hot path). - Move checkDeprecatedSyntax inside the parse try so any unexpected exception also gets the wrapping context; CommandParsingException is still re-thrown as-is so hints surface at top level (single control path). - Document the intentional BRACE_BLOCK_KEYWORDS false-negative and the PARAM_NAME numeric-exclusion rationale. - Make the three Issue4141 test setUp methods idempotent (drop a leftover db from an interrupted run) to avoid flaky re-runs on a dirty workspace. - Add brace-block false-positive tests (EXISTS { }, COUNT { }) and a note on the sorting helper in the trinary-logic test. Related to #4141 * refactor: [#4141] single-lex deprecated-syntax scan; close map-projection false positive Second review round on the deprecated-syntax check: - Reuse the main parse's token stream instead of re-lexing, and walk it with a 3-token sliding window (no intermediate ArrayList) - removes the double-lex and the extra list allocation on the parse hot path. - Replace the name-token guard with a value-operator allowlist: a legacy '{name}' is only flagged when the '{' follows an operator/punctuation (=, comparison, arithmetic, ',', '(', '['), where a map projection is impossible. This closes the false positive on 'type{name}' where the projected-on variable is a keyword token (NAME/TYPE are legal variable names); the cost is only a harmless false negative after a keyword. - Trim what-comments per CLAUDE.md, keeping the non-obvious WHYs. - Align test import ordering with the project convention; add tests for the keyword-named-variable projection and for 'PERIODIC COMMIT' inside a string literal. Related to #4141
1 parent a9734ca commit 4409a00

5 files changed

Lines changed: 591 additions & 4 deletions

File tree

engine/src/main/java/com/arcadedb/query/opencypher/parser/Cypher25AntlrParser.java

Lines changed: 77 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,9 @@
2626
import org.antlr.v4.runtime.CommonTokenStream;
2727
import org.antlr.v4.runtime.Token;
2828

29+
import java.util.Set;
30+
import java.util.regex.Pattern;
31+
2932
/**
3033
* Cypher parser using ANTLR4 with the official Cypher 2.5 grammar.
3134
* This replaces the regex-based parser with a proper grammar-based implementation.
@@ -45,13 +48,14 @@ public CypherStatement parse(final String query) {
4548
throw new CommandParsingException("Query cannot be empty");
4649

4750
try {
48-
// Create lexer
4951
final Cypher25Lexer lexer = new Cypher25Lexer(CharStreams.fromString(query));
50-
51-
// Create token stream
5252
final CommonTokenStream tokens = new CommonTokenStream(lexer);
5353

54-
// Create parser
54+
// Reject deprecated/legacy Cypher syntax with an actionable hint before the generic ANTLR parse,
55+
// which would otherwise produce a cryptic "Unexpected input" error (issue #4141). Reuses this same
56+
// token stream, so no second lex. The catch below re-throws CommandParsingException as-is.
57+
checkDeprecatedSyntax(query, tokens);
58+
5559
final Cypher25Parser parser = new Cypher25Parser(tokens);
5660

5761
// Custom error handling
@@ -90,8 +94,77 @@ public CypherStatement parse(final String query) {
9094

9195
return statement;
9296

97+
} catch (final CommandParsingException e) {
98+
// semantic-validation and explicit parse errors already carry a clear, actionable message
99+
// (e.g. AmbiguousAggregationExpression, UndefinedVariable, "Unexpected input ...") - surface it
100+
throw e;
93101
} catch (final Exception e) {
94102
throw new CommandParsingException("Failed to parse Cypher query: " + query, e);
95103
}
96104
}
105+
106+
// A legacy named parameter name (the required leading letter/underscore also excludes purely numeric
107+
// braces, so quantified-path-pattern quantifiers '{n}'/'{n,m}' are never mistaken for one).
108+
private static final Pattern PARAM_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
109+
110+
// Operator/punctuation tokens after which a '{...}' begins a value, never a map projection (which must
111+
// attach to a variable). Restricting the legacy-'{name}' check to these positions guarantees we never
112+
// flag a real map projection - including 'type{name}', where the projected-on variable is a keyword
113+
// token (NAME, TYPE, ... are legal variable names) and so is indistinguishable from a clause keyword.
114+
// The cost is only a false negative (a legacy param after a keyword like IN/RETURN yields the generic
115+
// parser error instead of the hint), which is harmless.
116+
private static final Set<Integer> VALUE_PREFIX_TOKENS = Set.of(
117+
Cypher25Lexer.EQ, Cypher25Lexer.NEQ, Cypher25Lexer.LT, Cypher25Lexer.GT, Cypher25Lexer.LE, Cypher25Lexer.GE,
118+
Cypher25Lexer.PLUS, Cypher25Lexer.MINUS, Cypher25Lexer.TIMES, Cypher25Lexer.DIVIDE, Cypher25Lexer.POW,
119+
Cypher25Lexer.COMMA, Cypher25Lexer.LPAREN, Cypher25Lexer.LBRACKET);
120+
121+
/**
122+
* Rejects deprecated syntax removed under ISO/IEC 39075 (GQL) / Cypher 25 with a hint at the supported
123+
* replacement: {@code PERIODIC COMMIT} (use {@code CALL { ... } IN TRANSACTIONS}) and legacy
124+
* {@code {param}} (use {@code $param}). Scans {@code tokens} on the default channel, so occurrences
125+
* inside string literals or comments are never matched.
126+
*/
127+
private static void checkDeprecatedSyntax(final String query, final CommonTokenStream tokens) {
128+
// Allocation-free pre-filter: neither deprecated form is possible without a '{' or the word PERIODIC,
129+
// so the token walk is skipped for the vast majority of queries.
130+
if (query.indexOf('{') < 0 && !containsIgnoreCase(query, "PERIODIC"))
131+
return;
132+
133+
tokens.fill(); // the parser reuses this same fully-buffered stream
134+
135+
// Sliding window of the three most recent default-channel tokens: t1 is the immediate predecessor of
136+
// the current token, t2 the one before it, t3 the one before that. No intermediate list is allocated.
137+
Token t3 = null, t2 = null, t1 = null;
138+
for (final Token tok : tokens.getTokens()) {
139+
if (tok.getChannel() != Token.DEFAULT_CHANNEL || tok.getType() == Token.EOF)
140+
continue;
141+
142+
if (t1 != null && "PERIODIC".equalsIgnoreCase(t1.getText()) && "COMMIT".equalsIgnoreCase(tok.getText()))
143+
throw new CommandParsingException(
144+
"Deprecated syntax: 'PERIODIC COMMIT' is no longer supported. Use 'CALL { ... } IN TRANSACTIONS [OF n ROWS]' instead");
145+
146+
// legacy '{name}': t2='{', t1=name, tok='}', preceded (t3) by a value operator/punctuation
147+
if (tok.getType() == Cypher25Lexer.RCURLY
148+
&& t2 != null && t2.getType() == Cypher25Lexer.LCURLY
149+
&& t1 != null && PARAM_NAME.matcher(t1.getText()).matches()
150+
&& t3 != null && VALUE_PREFIX_TOKENS.contains(t3.getType())) {
151+
final String name = t1.getText();
152+
throw new CommandParsingException(
153+
"Deprecated syntax: legacy parameter '{" + name + "}' is no longer supported. Use '$" + name + "' instead");
154+
}
155+
156+
t3 = t2;
157+
t2 = t1;
158+
t1 = tok;
159+
}
160+
}
161+
162+
// Allocation-free case-insensitive substring search (avoids String.toUpperCase() on the parse hot path).
163+
private static boolean containsIgnoreCase(final String s, final String sub) {
164+
final int max = s.length() - sub.length();
165+
for (int i = 0; i <= max; i++)
166+
if (s.regionMatches(true, i, sub, 0, sub.length()))
167+
return true;
168+
return false;
169+
}
97170
}

engine/src/main/java/com/arcadedb/query/opencypher/query/CypherStatementCache.java

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -79,6 +79,9 @@ public CypherStatement get(final String query) {
7979
private CypherStatement parse(final String query) throws CommandParsingException {
8080
try {
8181
return parser.parse(query);
82+
} catch (final CommandParsingException e) {
83+
// already a clear, actionable parsing error (e.g. deprecated-syntax hints) - keep it at top level
84+
throw e;
8285
} catch (final Exception e) {
8386
throw new CommandParsingException("Error parsing OpenCypher query: " + query, e);
8487
}
Lines changed: 169 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,169 @@
1+
/*
2+
* Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
3+
*
4+
* Licensed under the Apache License, Version 2.0 (the "License");
5+
* you may not use this file except in compliance with the License.
6+
* You may obtain a copy of the License at
7+
*
8+
* http://www.apache.org/licenses/LICENSE-2.0
9+
*
10+
* Unless required by applicable law or agreed to in writing, software
11+
* distributed under the License is distributed on an "AS IS" BASIS,
12+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
* See the License for the specific language governing permissions and
14+
* limitations under the License.
15+
*
16+
* SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
17+
* SPDX-License-Identifier: Apache-2.0
18+
*/
19+
package com.arcadedb.query.opencypher;
20+
21+
import com.arcadedb.database.Database;
22+
import com.arcadedb.database.DatabaseFactory;
23+
import com.arcadedb.exception.CommandParsingException;
24+
import com.arcadedb.query.sql.executor.ResultSet;
25+
26+
import org.junit.jupiter.api.AfterEach;
27+
import org.junit.jupiter.api.BeforeEach;
28+
import org.junit.jupiter.api.Test;
29+
30+
import java.util.Map;
31+
32+
import static org.assertj.core.api.Assertions.assertThat;
33+
import static org.assertj.core.api.Assertions.assertThatThrownBy;
34+
35+
/**
36+
* Issue #4141 (ISO GQL / Cypher 25 cleanup): deprecated/legacy syntax must be rejected with a clear,
37+
* actionable error pointing at the supported replacement.
38+
* <ul>
39+
* <li>{@code PERIODIC COMMIT} -&gt; {@code CALL { ... } IN TRANSACTIONS}</li>
40+
* <li>legacy {@code {param}} -&gt; {@code $param}</li>
41+
* </ul>
42+
* Also guards against false positives: map projections ({@code n{name}}) and brace-block expressions
43+
* ({@code COUNT { ... }}) must still parse, and the modern {@code $param} form must keep working.
44+
*/
45+
public class Issue4141DeprecatedSyntaxTest {
46+
private Database database;
47+
48+
@BeforeEach
49+
void setUp() {
50+
final DatabaseFactory factory = new DatabaseFactory("./target/databases/testIssue4141DeprecatedSyntax");
51+
if (factory.exists())
52+
factory.open().drop(); // defend against a leftover db from a previously interrupted run
53+
database = factory.create();
54+
database.getSchema().createVertexType("Person");
55+
database.transaction(() -> {
56+
database.command("opencypher", "CREATE (p:Person {name: 'Alice', age: 30})");
57+
database.command("opencypher", "CREATE (p:Person {name: 'Bob', age: 25})");
58+
});
59+
}
60+
61+
@AfterEach
62+
void tearDown() {
63+
if (database != null) {
64+
database.drop();
65+
database = null;
66+
}
67+
}
68+
69+
// ---- Deprecated syntax is rejected with an actionable message -------------------------------
70+
71+
@Test
72+
void periodicCommitIsRejectedWithHint() {
73+
assertThatThrownBy(() ->
74+
database.command("opencypher", "USING PERIODIC COMMIT 500 MATCH (p:Person) RETURN p"))
75+
.isInstanceOf(CommandParsingException.class)
76+
.hasMessageContaining("PERIODIC COMMIT")
77+
.hasMessageContaining("IN TRANSACTIONS");
78+
}
79+
80+
@Test
81+
void legacyParameterIsRejectedWithHint() {
82+
assertThatThrownBy(() ->
83+
database.command("opencypher", "MATCH (p:Person) WHERE p.name = {name} RETURN p"))
84+
.isInstanceOf(CommandParsingException.class)
85+
.hasMessageContaining("{name}")
86+
.hasMessageContaining("$name");
87+
}
88+
89+
@Test
90+
void legacyParameterIsRejectedEvenWhenSuppliedAsArg() {
91+
assertThatThrownBy(() ->
92+
database.command("opencypher", "MATCH (p:Person) WHERE p.name = {name} RETURN p", Map.of("name", "Alice")))
93+
.isInstanceOf(CommandParsingException.class)
94+
.hasMessageContaining("$name");
95+
}
96+
97+
// ---- Supported syntax keeps working (no false positives) ------------------------------------
98+
99+
@Test
100+
void modernDollarParameterStillWorks() {
101+
try (final ResultSet rs = database.query("opencypher", "MATCH (p:Person) WHERE p.name = $name RETURN p.age AS age",
102+
Map.of("name", "Alice"))) {
103+
assertThat(rs.hasNext()).isTrue();
104+
assertThat((Integer) rs.next().getProperty("age")).isEqualTo(30);
105+
}
106+
}
107+
108+
@Test
109+
void mapProjectionIsNotMistakenForLegacyParameter() {
110+
// 'p{name}' has the same token shape as a legacy parameter ('{' name '}') but is a map projection
111+
// (variable LCURLY variable RCURLY). It must parse, projecting the in-scope 'name' variable.
112+
try (final ResultSet rs = database.query("opencypher",
113+
"MATCH (p:Person) WHERE p.name = 'Alice' WITH p, p.name AS name RETURN p{name} AS proj")) {
114+
assertThat(rs.hasNext()).isTrue();
115+
final Map<String, Object> proj = rs.next().getProperty("proj");
116+
assertThat(proj).containsEntry("name", "Alice");
117+
}
118+
}
119+
120+
@Test
121+
void mapLiteralIsNotMistakenForLegacyParameter() {
122+
try (final ResultSet rs = database.query("opencypher", "RETURN {name: 'Alice'} AS m")) {
123+
assertThat(rs.hasNext()).isTrue();
124+
final Map<String, Object> m = rs.next().getProperty("m");
125+
assertThat(m).containsEntry("name", "Alice");
126+
}
127+
}
128+
129+
@Test
130+
void existsBraceBlockIsNotMistakenForLegacyParameter() {
131+
// 'EXISTS { ... }' is a brace-block expression; the following '{' must not trigger the legacy-param check.
132+
try (final ResultSet rs = database.query("opencypher",
133+
"MATCH (p:Person) WHERE EXISTS { (p) } RETURN p.name AS name ORDER BY name")) {
134+
assertThat(rs.hasNext()).isTrue();
135+
}
136+
}
137+
138+
@Test
139+
void countBraceBlockIsNotMistakenForLegacyParameter() {
140+
// 'COUNT { ... }' is a brace-block expression; the following '{' must not trigger the legacy-param check.
141+
try (final ResultSet rs = database.query("opencypher",
142+
"MATCH (p:Person) RETURN p.name AS name, COUNT { (p) } AS c ORDER BY name")) {
143+
assertThat(rs.hasNext()).isTrue();
144+
assertThat(((Number) rs.next().getProperty("c")).longValue()).isEqualTo(1L);
145+
}
146+
}
147+
148+
@Test
149+
void mapProjectionOnKeywordNamedVariableIsNotMistakenForLegacyParameter() {
150+
// 'type' is a keyword token but a legal variable name, so 'type{name}' is a map projection whose '{'
151+
// is preceded by the variable (not a value operator) and must not be flagged as a legacy parameter.
152+
try (final ResultSet rs = database.query("opencypher",
153+
"MATCH (type:Person) WHERE type.name = 'Alice' WITH type, type.name AS name RETURN type{name} AS proj")) {
154+
assertThat(rs.hasNext()).isTrue();
155+
final Map<String, Object> proj = rs.next().getProperty("proj");
156+
assertThat(proj).containsEntry("name", "Alice");
157+
}
158+
}
159+
160+
@Test
161+
void periodicCommitInsideStringLiteralIsNotFlagged() {
162+
// The token scan runs on the default channel, so 'PERIODIC COMMIT' inside a string literal (a single
163+
// string token) is not two keyword tokens and must not be flagged.
164+
try (final ResultSet rs = database.query("opencypher", "RETURN 'PERIODIC COMMIT' AS s")) {
165+
assertThat(rs.hasNext()).isTrue();
166+
assertThat((String) rs.next().getProperty("s")).isEqualTo("PERIODIC COMMIT");
167+
}
168+
}
169+
}

0 commit comments

Comments
 (0)