The WorkflowGraph class manages a DAG (Directed Acyclic Graph) of nodes connected by edges.
Key operations:
add_node()/remove_node()- node lifecycleadd_edge()with cycle detection via Kahn's algorithmmark_dirty()- cascading dirty flag propagation to all descendantstopological_order()- returns a valid execution order
The Engine class executes nodes against a graph with incremental caching.
Execution flow:
- Get topological order of the graph
- For each node up to the target: check cache
- If dirty or not cached: dispatch to handler based on node_type
- Store result in cache
- Return cached result for target node
Cache invalidation:
- When
mark_dirty()is called on a node, all downstream nodes are also marked dirty - On execution, only dirty nodes are recomputed
- Clean upstream nodes use cached results
40+ node types in 7 categories:
- Source: CSV, Parquet, JSON, XLSX readers, clipboard paste
- Transform: Filter, select, add column, rename, drop, cast, fill null, string ops, sort, sample
- Aggregate: Group by, rolling window, pivot table
- Join: Inner, left, right, full, cross, anti
- Chart: Bar, line, scatter, histogram, box, heatmap
- Output: Table view, export CSV/Parquet/JSON/XLSX