Skip to content

Fix multi-hop + WHERE backward prune bugs in cuDF executor #872

@lmeyerov

Description

@lmeyerov

Multi-hop + WHERE Backward Prune Bugs

Summary

The cuDF same-path executor has architectural limitations with multi-hop edges combined with WHERE clauses. These were discovered during feature composition testing for PRs #846 and #852.

Bugs Identified

1. Backward prune doesn't trace through intermediate edges

Location: graphistry/compute/gfql/cudf_executor.py:312-393 (_backward_prune)

Issue: When backtracking from end nodes through multi-hop edges, the algorithm filters edges by dst in allowed_end_nodes. For multi-hop, this filters to edges connecting to the END of the path, but the START node connects to INTERMEDIATE nodes, not the end directly.

Result: allowed_nodes[0] = set() (empty) because allowed_src & allowed_tags['start'] has no overlap.

Example:

Graph: a -> b -> c -> d  (3 hops)
Chain: n(start) -[min_hops=2, max_hops=3]-> n(end)
WHERE: start.value < end.value

Expected: start={'a'}, end={'c', 'd'}
Actual: start={}, end={'c', 'd'}

2. WHERE skipped for multi-hop edges

Location: graphistry/compute/gfql/cudf_executor.py:361-369

Issue: _is_single_hop() check gates WHERE clause filtering. Multi-hop edges skip this filtering entirely.

Result: WHERE comparisons connecting start/end of multi-hop edges are never applied during backward prune.

3. Non-adjacent alias WHERE not applied

Location: graphistry/compute/gfql/cudf_executor.py:404-408 (_filter_edges_by_clauses)

Issue: Only applies WHERE for adjacent aliases. WHERE between aliases 2+ edges apart is never enforced.

Example:

Chain: n(a) -> e -> n(b) -> e -> n(c)
WHERE: a.id == c.id  (non-adjacent)

Result: WHERE never applied in backward prune

Tests Added

Added 8 tests in tests/gfql/ref/test_cudf_executor_inputs.py:

  • 6 marked xfail(strict=True) documenting the bugs
  • 2 passing (output_slicing_with_where, label_seeds_with_output_min_hops)

Proposed Fix

The backward prune needs to be redesigned for multi-hop edges:

  1. Trace full paths: Instead of filtering edges by destination, trace backward through ALL edges in the multi-hop traversal to find which start nodes connect to which end nodes.

  2. Apply WHERE at end: After path tracing, apply WHERE clauses to filter valid start/end pairs.

  3. Handle non-adjacent aliases: For WHERE between non-adjacent aliases, need to track which intermediate nodes connect them.

Related

Labels

bug, gfql, same-path-executor

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions