Multi-hop + WHERE Backward Prune Bugs
Summary
The cuDF same-path executor has architectural limitations with multi-hop edges combined with WHERE clauses. These were discovered during feature composition testing for PRs #846 and #852.
Bugs Identified
1. Backward prune doesn't trace through intermediate edges
Location: graphistry/compute/gfql/cudf_executor.py:312-393 (_backward_prune)
Issue: When backtracking from end nodes through multi-hop edges, the algorithm filters edges by dst in allowed_end_nodes. For multi-hop, this filters to edges connecting to the END of the path, but the START node connects to INTERMEDIATE nodes, not the end directly.
Result: allowed_nodes[0] = set() (empty) because allowed_src & allowed_tags['start'] has no overlap.
Example:
Graph: a -> b -> c -> d (3 hops)
Chain: n(start) -[min_hops=2, max_hops=3]-> n(end)
WHERE: start.value < end.value
Expected: start={'a'}, end={'c', 'd'}
Actual: start={}, end={'c', 'd'}
2. WHERE skipped for multi-hop edges
Location: graphistry/compute/gfql/cudf_executor.py:361-369
Issue: _is_single_hop() check gates WHERE clause filtering. Multi-hop edges skip this filtering entirely.
Result: WHERE comparisons connecting start/end of multi-hop edges are never applied during backward prune.
3. Non-adjacent alias WHERE not applied
Location: graphistry/compute/gfql/cudf_executor.py:404-408 (_filter_edges_by_clauses)
Issue: Only applies WHERE for adjacent aliases. WHERE between aliases 2+ edges apart is never enforced.
Example:
Chain: n(a) -> e -> n(b) -> e -> n(c)
WHERE: a.id == c.id (non-adjacent)
Result: WHERE never applied in backward prune
Tests Added
Added 8 tests in tests/gfql/ref/test_cudf_executor_inputs.py:
- 6 marked
xfail(strict=True) documenting the bugs
- 2 passing (output_slicing_with_where, label_seeds_with_output_min_hops)
Proposed Fix
The backward prune needs to be redesigned for multi-hop edges:
-
Trace full paths: Instead of filtering edges by destination, trace backward through ALL edges in the multi-hop traversal to find which start nodes connect to which end nodes.
-
Apply WHERE at end: After path tracing, apply WHERE clauses to filter valid start/end pairs.
-
Handle non-adjacent aliases: For WHERE between non-adjacent aliases, need to track which intermediate nodes connect them.
Related
Labels
bug, gfql, same-path-executor
Multi-hop + WHERE Backward Prune Bugs
Summary
The cuDF same-path executor has architectural limitations with multi-hop edges combined with WHERE clauses. These were discovered during feature composition testing for PRs #846 and #852.
Bugs Identified
1. Backward prune doesn't trace through intermediate edges
Location:
graphistry/compute/gfql/cudf_executor.py:312-393(_backward_prune)Issue: When backtracking from end nodes through multi-hop edges, the algorithm filters edges by
dst in allowed_end_nodes. For multi-hop, this filters to edges connecting to the END of the path, but the START node connects to INTERMEDIATE nodes, not the end directly.Result:
allowed_nodes[0] = set()(empty) becauseallowed_src & allowed_tags['start']has no overlap.Example:
2. WHERE skipped for multi-hop edges
Location:
graphistry/compute/gfql/cudf_executor.py:361-369Issue:
_is_single_hop()check gates WHERE clause filtering. Multi-hop edges skip this filtering entirely.Result: WHERE comparisons connecting start/end of multi-hop edges are never applied during backward prune.
3. Non-adjacent alias WHERE not applied
Location:
graphistry/compute/gfql/cudf_executor.py:404-408(_filter_edges_by_clauses)Issue: Only applies WHERE for adjacent aliases. WHERE between aliases 2+ edges apart is never enforced.
Example:
Tests Added
Added 8 tests in
tests/gfql/ref/test_cudf_executor_inputs.py:xfail(strict=True)documenting the bugsProposed Fix
The backward prune needs to be redesigned for multi-hop edges:
Trace full paths: Instead of filtering edges by destination, trace backward through ALL edges in the multi-hop traversal to find which start nodes connect to which end nodes.
Apply WHERE at end: After path tracing, apply WHERE clauses to filter valid start/end pairs.
Handle non-adjacent aliases: For WHERE between non-adjacent aliases, need to track which intermediate nodes connect them.
Related
Labels
bug, gfql, same-path-executor