
Commit dbdec41

[GEODE-10463] Fix lexical nondeterminism warning in OQL grammar between ALL_UNICODE and DIGIT rules (#7928)
* GEODE-10463: Fix lexical nondeterminism warning in OQL grammar between ALL_UNICODE and DIGIT rules

  Refactored ALL_UNICODE rule to exclude Unicode digit ranges that overlap with DIGIT rule, eliminating lexical ambiguity in RegionNameCharacter. The ALL_UNICODE range is now split into 15 non-overlapping segments that exclude Arabic-Indic, Devanagari, Bengali, and other Unicode digit ranges. This ensures deterministic tokenization where Unicode digits are always matched by DIGIT rule while other Unicode characters use ALL_UNICODE.

* GEODE-10463: Add clarifying comment for ALL_UNICODE lexer rule

  Add documentation comment to explain that the ALL_UNICODE character class excludes Unicode digit ranges to prevent lexical nondeterminism with the DIGIT rule in the OQL grammar lexer.
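The ambiguity described in the commit message comes down to two character ranges sharing code points. The following is a minimal Python sketch of that idea, not the analysis ANTLR itself performs. The helper names (`ranges_overlap`, `ambiguous_blocks`) are hypothetical, and the digit blocks are the Unicode Standard code points for the three scripts the message names; they are assumed, not confirmed, to match the corresponding DIGIT alternatives in `oql.g`.

```python
# Illustrative sketch only: models lexer character ranges as inclusive
# (low, high) code point pairs. All names are hypothetical, not from oql.g.

def ranges_overlap(a, b):
    """Return True if two inclusive code point ranges share any character."""
    return a[0] <= b[1] and b[0] <= a[1]

# ALL_UNICODE before the commit: a single range '\u0061'..'\ufffd'.
OLD_ALL_UNICODE = (0x0061, 0xFFFD)

# Digit blocks for the scripts named in the commit message, taken from the
# Unicode Standard (assumed to match the DIGIT rule's alternatives).
NAMED_DIGIT_BLOCKS = {
    "Arabic-Indic": (0x0660, 0x0669),
    "Devanagari": (0x0966, 0x096F),
    "Bengali": (0x09E6, 0x09EF),
}

def ambiguous_blocks(all_unicode, digit_blocks):
    """List the digit blocks whose characters could match both rules."""
    return sorted(
        name for name, block in digit_blocks.items()
        if ranges_overlap(all_unicode, block)
    )

if __name__ == "__main__":
    # Every named block falls inside the old single range.
    print(ambiguous_blocks(OLD_ALL_UNICODE, NAMED_DIGIT_BLOCKS))
```

With the old single range, every named digit block is reported as ambiguous. Ending the first segment at `'\u065f'`, as the commit does, removes the overlap with the Arabic-Indic block.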
1 parent 2699a03 commit dbdec41

1 file changed

Lines changed: 16 additions & 1 deletion


geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g

@@ -133,8 +133,23 @@ DIGIT : ('\u0030'..'\u0039' |
                '\u1040'..'\u1049')
     ;

+// Exclude Unicode digit ranges to prevent lexical nondeterminism with DIGIT rule
 protected
-ALL_UNICODE : ('\u0061'..'\ufffd')
+ALL_UNICODE : ('\u0061'..'\u065f' |  // exclude Arabic-Indic digits
+               '\u066a'..'\u06ef' |  // exclude Extended Arabic-Indic digits
+               '\u06fa'..'\u0965' |  // exclude Devanagari digits
+               '\u0970'..'\u09e5' |  // exclude Bengali digits
+               '\u09f0'..'\u0a65' |  // exclude Gurmukhi digits
+               '\u0a70'..'\u0ae5' |  // exclude Gujarati digits
+               '\u0af0'..'\u0b65' |  // exclude Oriya digits
+               '\u0b70'..'\u0be6' |  // exclude Tamil digits (note: Tamil starts at 0be7)
+               '\u0bf0'..'\u0c65' |  // exclude Telugu digits
+               '\u0c70'..'\u0ce5' |  // exclude Kannada digits
+               '\u0cf0'..'\u0d65' |  // exclude Malayalam digits
+               '\u0d70'..'\u0e4f' |  // exclude Thai digits
+               '\u0e5a'..'\u0ecf' |  // exclude Lao digits
+               '\u0eda'..'\u103f' |  // exclude Myanmar digits
+               '\u104a'..'\ufffd')   // rest of Unicode
     ;

 /*
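One way to sanity-check the new ALL_UNICODE alternatives is to confirm that the segments are ordered and disjoint, and to list the code points they leave out. The Python sketch below does this with the fifteen ranges copied from the hunk above; the function names are hypothetical. Only the first and last DIGIT alternatives are visible in this diff, so whether every gap matches a DIGIT alternative exactly is an assumption that would need checking against the full rule.

```python
# Illustrative sketch: the 15 ALL_UNICODE segments from the diff, as
# inclusive (low, high) code point pairs. Function names are hypothetical.
SEGMENTS = [
    (0x0061, 0x065F), (0x066A, 0x06EF), (0x06FA, 0x0965),
    (0x0970, 0x09E5), (0x09F0, 0x0A65), (0x0A70, 0x0AE5),
    (0x0AF0, 0x0B65), (0x0B70, 0x0BE6), (0x0BF0, 0x0C65),
    (0x0C70, 0x0CE5), (0x0CF0, 0x0D65), (0x0D70, 0x0E4F),
    (0x0E5A, 0x0ECF), (0x0EDA, 0x103F), (0x104A, 0xFFFD),
]

def is_sorted_and_disjoint(segments):
    """True if each range is well formed and ends before the next begins."""
    well_formed = all(lo <= hi for lo, hi in segments)
    ordered = all(prev[1] < nxt[0] for prev, nxt in zip(segments, segments[1:]))
    return well_formed and ordered

def gaps(segments):
    """Inclusive ranges of code points skipped between consecutive segments."""
    return [
        (prev[1] + 1, nxt[0] - 1)
        for prev, nxt in zip(segments, segments[1:])
        if nxt[0] - prev[1] > 1
    ]

if __name__ == "__main__":
    print(len(SEGMENTS), is_sorted_and_disjoint(SEGMENTS))
    for lo, hi in gaps(SEGMENTS):
        print(f"U+{lo:04X}..U+{hi:04X} ({hi - lo + 1} code points)")
```

The final gap, U+1040..U+1049, equals the last DIGIT alternative shown in the hunk context. The Tamil gap spans nine code points rather than ten, which is consistent with the diff's own note that the excluded Tamil range starts at 0be7.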
