Skip to content

fix(core): normalize boolean range predicates in Gremlin filters#2991

Open
contrueCT wants to merge 5 commits intoapache:masterfrom
contrueCT:task/fix-2934
Open

fix(core): normalize boolean range predicates in Gremlin filters#2991
contrueCT wants to merge 5 commits intoapache:masterfrom
contrueCT:task/fix-2934

Conversation

@contrueCT
Copy link
Copy Markdown
Contributor

Purpose of the PR

This PR fixes HugeGraph's inconsistent handling of boolean range predicates and unifies the internal ordering semantics as false < true. Previously, filter-step queries such as where(__.has("xxx", P.lt(true))) were pushed down through HugeGraph's condition/query pipeline and incorrectly treated as numeric-style range conditions, while equivalent match() queries could still return results normally. As a result, semantically equivalent traversals behaved differently.

Main Changes

The fix has two parts. First, boolean support is added to backend condition comparison and range-flattening logic so gt/gte/lt/lte can be evaluated consistently. Second, runtime Gremlin boolean range predicates are normalized into equivalent eq/in/empty conditions before query planning, so they are no longer misclassified as ordinary numeric range queries that require a range index. This preserves the intended boolean ordering semantics while making has(), where(), and match() behave consistently for boolean range predicates.

Verifying these changes

Regression coverage was also added for:

  • low-level boolean lt/lte/gt/gte comparison semantics
  • boolean range merge and conflicting-range flattening
  • end-to-end consistency across has(), where(), and match()
  • empty-result edge cases such as lt(false) and gt(true)
  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • mvn -Drat.skip=true -P core-test,memory -pl hugegraph-server/hugegraph-test,hugegraph-struct -am -DfailIfNoTests=false '-Dtest=ConditionTest,ConditionQueryFlattenTest,EdgeCoreTest#testQueryEdgeByBooleanRangePredicate' clean test

Does this PR potentially affect the following parts?

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working gremlin TinkerPop gremlin tests Add or improve test cases labels Apr 12, 2026
return ImmutableList.of(cq);
}
return ImmutableList.of(query);
return ImmutableList.of();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ Critical: flattenRelations returns empty list for ALL conflicting conditions, not just boolean

This change from ImmutableList.of(query) to ImmutableList.of() affects all data types, not just booleans. Previously, when optimizeRelations returned null (meaning conditions were mutually exclusive, e.g., age > 10 AND age == 9), the original query was still returned — the contradiction was preserved and evaluated downstream. Now it returns an empty list, silently dropping the query.

While returning an empty list for contradictory conditions is arguably more correct (no results possible), this is a behavioral change for numeric/date/string range contradictions too, not just booleans. This should be:

  1. Called out explicitly in the PR description since it changes existing behavior for all types
  2. Covered with a test for numeric contradictory ranges to ensure no regression

Was this intentional or is the fix only meant for the boolean case?

private static Condition impossibleBooleanCondition(Id key) {
return Condition.and(Condition.eq(key, false),
Condition.eq(key, true));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ impossibleBooleanCondition creates an AND node that may not be optimized away cleanly

This creates AND(eq(key, false), eq(key, true)) to represent an impossible condition. This is clever but has a potential issue: this compound Condition.And flows into the query pipeline where ConditionQueryFlatten.flattenRelations extracts only Relation conditions. An And node is NOT a Relation, so it goes into nonRelations and may bypass the merge/optimize logic entirely, potentially reaching the backend as a real filter condition.

Consider whether returning an empty ConditionQuery or a sentinel "always-false" condition would be cleaner. Alternatively, verify that the And(eq(false), eq(true)) is properly handled in the flatten/optimize pipeline without passing contradictory conditions to the backend.

(If Condition.in(key, ImmutableList.of()) — i.e., IN with empty list — is supported by the backend, that might be a cleaner way to express "matches nothing".)


private static int compareBoolean(Object first, Boolean second) {
if (first == null) {
first = false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ compareBoolean: null → false coercion is surprising

When first is null, it's silently treated as false. This is asymmetric with compareNumber and compareDate which don't do null coercion. If a property is absent (null), treating it as false could cause incorrect query results — e.g., a vertex with no boolean property set would match lte(false).

Consider whether null should throw or return a defined comparison result (e.g., null sorts before any non-null value), consistent with how other types handle it.


throw new IllegalArgumentException(String.format(
"Can't compare between %s(%s) and %s(%s)",
first, first.getClass().getSimpleName(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Duplicate compareBoolean in two Condition classes

The exact same compareBoolean method is added in both:

  • hugegraph-server/.../backend/query/Condition.java
  • hugegraph-struct/.../query/Condition.java

This is copy-paste duplication. If the null-handling semantics or error messages need to change later, both copies must be kept in sync. Is there a shared utility class or base class where this could live instead? If duplication is intentional (module isolation), consider at least adding a test for the hugegraph-struct version as well.

Condition.eq(key, true);
case gte:
return value ? Condition.eq(key, true) :
Condition.in(key, ImmutableList.of(false, true));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Minor: gte(false) and lte(true) normalize to IN(false, true) — could simplify to "no filter"

When gte(false) or lte(true) is applied to a boolean property, the code normalizes to Condition.in(key, [false, true]). Since these two values are the entire boolean domain, this is semantically equivalent to "match everything" (no filter). The IN(false, true) condition still gets evaluated at runtime for every candidate element.

This isn't a bug, but for a performance-conscious path, you could detect this case and skip adding the condition entirely. Low priority since boolean properties are typically small cardinality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working gremlin TinkerPop gremlin size:L This PR changes 100-499 lines, ignoring generated files. tests Add or improve test cases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Unexpected NumberFormatException for filter-step

2 participants