fix(core): normalize boolean range predicates in Gremlin filters#2991
fix(core): normalize boolean range predicates in Gremlin filters#2991contrueCT wants to merge 5 commits intoapache:masterfrom
Conversation
| return ImmutableList.of(cq); | ||
| } | ||
| return ImmutableList.of(query); | ||
| return ImmutableList.of(); |
There was a problem hiding this comment.
flattenRelations returns empty list for ALL conflicting conditions, not just boolean
This change from ImmutableList.of(query) to ImmutableList.of() affects all data types, not just booleans. Previously, when optimizeRelations returned null (meaning conditions were mutually exclusive, e.g., age > 10 AND age == 9), the original query was still returned — the contradiction was preserved and evaluated downstream. Now it returns an empty list, silently dropping the query.
While returning an empty list for contradictory conditions is arguably more correct (no results possible), this is a behavioral change for numeric/date/string range contradictions too, not just booleans. This should be:
- Called out explicitly in the PR description since it changes existing behavior for all types
- Covered with a test for numeric contradictory ranges to ensure no regression
Was this intentional or is the fix only meant for the boolean case?
| private static Condition impossibleBooleanCondition(Id key) { | ||
| return Condition.and(Condition.eq(key, false), | ||
| Condition.eq(key, true)); | ||
| } |
There was a problem hiding this comment.
impossibleBooleanCondition creates an AND node that may not be optimized away cleanly
This creates AND(eq(key, false), eq(key, true)) to represent an impossible condition. This is clever but has a potential issue: this compound Condition.And flows into the query pipeline where ConditionQueryFlatten.flattenRelations extracts only Relation conditions. An And node is NOT a Relation, so it goes into nonRelations and may bypass the merge/optimize logic entirely, potentially reaching the backend as a real filter condition.
Consider whether returning an empty ConditionQuery or a sentinel "always-false" condition would be cleaner. Alternatively, verify that the And(eq(false), eq(true)) is properly handled in the flatten/optimize pipeline without passing contradictory conditions to the backend.
(If Condition.in(key, ImmutableList.of()) — i.e., IN with empty list — is supported by the backend, that might be a cleaner way to express "matches nothing".)
|
|
||
| private static int compareBoolean(Object first, Boolean second) { | ||
| if (first == null) { | ||
| first = false; |
There was a problem hiding this comment.
compareBoolean: null → false coercion is surprising
When first is null, it's silently treated as false. This is asymmetric with compareNumber and compareDate which don't do null coercion. If a property is absent (null), treating it as false could cause incorrect query results — e.g., a vertex with no boolean property set would match lte(false).
Consider whether null should throw or return a defined comparison result (e.g., null sorts before any non-null value), consistent with how other types handle it.
|
|
||
| throw new IllegalArgumentException(String.format( | ||
| "Can't compare between %s(%s) and %s(%s)", | ||
| first, first.getClass().getSimpleName(), |
There was a problem hiding this comment.
compareBoolean in two Condition classes
The exact same compareBoolean method is added in both:
hugegraph-server/.../backend/query/Condition.javahugegraph-struct/.../query/Condition.java
This is copy-paste duplication. If the null-handling semantics or error messages need to change later, both copies must be kept in sync. Is there a shared utility class or base class where this could live instead? If duplication is intentional (module isolation), consider at least adding a test for the hugegraph-struct version as well.
| Condition.eq(key, true); | ||
| case gte: | ||
| return value ? Condition.eq(key, true) : | ||
| Condition.in(key, ImmutableList.of(false, true)); |
There was a problem hiding this comment.
🧹 Minor: gte(false) and lte(true) normalize to IN(false, true) — could simplify to "no filter"
When gte(false) or lte(true) is applied to a boolean property, the code normalizes to Condition.in(key, [false, true]). Since these two values are the entire boolean domain, this is semantically equivalent to "match everything" (no filter). The IN(false, true) condition still gets evaluated at runtime for every candidate element.
This isn't a bug, but for a performance-conscious path, you could detect this case and skip adding the condition entirely. Low priority since boolean properties are typically small cardinality.
Purpose of the PR
This PR fixes HugeGraph's inconsistent handling of boolean range predicates and unifies the internal ordering semantics as false < true. Previously, filter-step queries such as where(__.has("xxx", P.lt(true))) were pushed down through HugeGraph's condition/query pipeline and incorrectly treated as numeric-style range conditions, while equivalent match() queries could still return results normally. As a result, semantically equivalent traversals behaved differently.
Main Changes
The fix has two parts. First, boolean support is added to backend condition comparison and range-flattening logic so gt/gte/lt/lte can be evaluated consistently. Second, runtime Gremlin boolean range predicates are normalized into equivalent eq/in/empty conditions before query planning, so they are no longer misclassified as ordinary numeric range queries that require a range index. This preserves the intended boolean ordering semantics while making has(), where(), and match() behave consistently for boolean range predicates.
Verifying these changes
Regression coverage was also added for:
Does this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need