
Commit 8d02ca2

kmontemayor and claude committed
Skip Dataflow ThreadPoolExecutor on empty preprocessing spec
Same root cause as the enumerate fix: ThreadPoolExecutor(max_workers=0) raises ValueError. When both node_ref_to_preprocessing_spec and edge_ref_to_preprocessing_spec are empty, num_dataflow_jobs is 0 and the executor blows up in __init__. Early-return an empty PreprocessedMetadataReferences in that case, since there is no Dataflow work to do.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
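The failure mode the message describes can be reproduced with the standard library alone. This is a minimal sketch; `try_make_executor` is a hypothetical helper written only for illustration and is not part of GiGL. It relies on `concurrent.futures.ThreadPoolExecutor` validating `max_workers` in its constructor and rejecting values less than or equal to zero.

```python
import concurrent.futures


def try_make_executor(max_workers: int) -> str:
    """Hypothetical helper: return "ok" if the executor can be built,
    otherwise the text of the ValueError raised by its constructor."""
    try:
        # ThreadPoolExecutor checks max_workers in __init__ and raises
        # ValueError when it is <= 0, before any thread is started.
        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers):
            return "ok"
    except ValueError as exc:
        return f"ValueError: {exc}"


if __name__ == "__main__":
    print(try_make_executor(2))  # a positive worker count is accepted
    print(try_make_executor(0))  # zero workers is rejected with ValueError
```

Because the error is raised in `__init__`, it cannot be avoided by handling an empty job list inside the `with` block; the check has to happen before the executor is constructed.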
1 parent 88c4d35 commit 8d02ca2

1 file changed

Lines changed: 6 additions & 0 deletions


gigl/src/data_preprocessor/data_preprocessor.py

```diff
@@ -352,6 +352,12 @@ def __build_data_reference_str(references: Iterable[DataReference]) -> str:
             edge_ref_to_preprocessing_spec
         )
 
+        if num_dataflow_jobs == 0:
+            logger.info("No data references to preprocess; skipping Dataflow.")
+            return PreprocessedMetadataReferences(
+                node_data=node_refs_and_results, edge_data=edge_refs_and_results
+            )
+
         with concurrent.futures.ThreadPoolExecutor(
             max_workers=num_dataflow_jobs
         ) as executor:
```
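The guard added in this commit can be shown in a self-contained form. This is a sketch under stated assumptions, not the GiGL implementation: the hunk does not show how `num_dataflow_jobs` is computed or what the executor runs, so the sketch assumes it is the total number of node and edge specs, uses a hypothetical dataclass in place of the real `PreprocessedMetadataReferences`, and uses a hypothetical `run_dataflow_job` in place of the actual Dataflow launch. Plain strings stand in for the real reference and spec types.

```python
import concurrent.futures
import logging
from dataclasses import dataclass, field
from typing import Dict

logger = logging.getLogger(__name__)


@dataclass
class PreprocessedMetadataReferences:
    # Hypothetical stand-in for the GiGL type; field names mirror the diff.
    node_data: Dict[str, str] = field(default_factory=dict)
    edge_data: Dict[str, str] = field(default_factory=dict)


def run_dataflow_job(ref: str, spec: str) -> str:
    # Hypothetical placeholder for launching one Dataflow preprocessing job.
    return f"{ref}:{spec}:done"


def run_preprocessing(
    node_ref_to_preprocessing_spec: Dict[str, str],
    edge_ref_to_preprocessing_spec: Dict[str, str],
) -> PreprocessedMetadataReferences:
    node_refs_and_results: Dict[str, str] = {}
    edge_refs_and_results: Dict[str, str] = {}
    # Assumption: one Dataflow job per node spec and per edge spec.
    num_dataflow_jobs = len(node_ref_to_preprocessing_spec) + len(
        edge_ref_to_preprocessing_spec
    )

    # Guard: ThreadPoolExecutor(max_workers=0) raises ValueError, so return
    # the empty result before the executor is constructed.
    if num_dataflow_jobs == 0:
        logger.info("No data references to preprocess; skipping Dataflow.")
        return PreprocessedMetadataReferences(
            node_data=node_refs_and_results, edge_data=edge_refs_and_results
        )

    with concurrent.futures.ThreadPoolExecutor(
        max_workers=num_dataflow_jobs
    ) as executor:
        # Submit one job per reference, then collect results by reference.
        node_futures = {
            ref: executor.submit(run_dataflow_job, ref, spec)
            for ref, spec in node_ref_to_preprocessing_spec.items()
        }
        edge_futures = {
            ref: executor.submit(run_dataflow_job, ref, spec)
            for ref, spec in edge_ref_to_preprocessing_spec.items()
        }
        for ref, future in node_futures.items():
            node_refs_and_results[ref] = future.result()
        for ref, future in edge_futures.items():
            edge_refs_and_results[ref] = future.result()

    return PreprocessedMetadataReferences(
        node_data=node_refs_and_results, edge_data=edge_refs_and_results
    )
```

With empty inputs, `run_preprocessing({}, {})` returns a result whose `node_data` and `edge_data` are both empty, without ever constructing an executor. Returning the same (empty) containers that the non-empty path would fill keeps the return type identical on both paths, so callers need no special case.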
