fix: include multi_modal_context columns in required_columns by nabinchha · Pull Request #522 · NVIDIA-NeMo/DataDesigner

nabinchha · 2026-04-09T17:12:38Z

📋 Summary

required_columns on LLMTextColumnConfig and ImageColumnConfig only extracted dependencies from Jinja2 prompt/system_prompt templates, ignoring columns referenced via multi_modal_context. This caused the async engine's execution graph to dispatch LLM tasks before their multi-modal seed columns were loaded, resulting in KeyError failures under DATA_DESIGNER_ASYNC_ENGINE=1.

🔗 Related Issue

Fixes #520

🔄 Changes

LLMTextColumnConfig.required_columns now includes multi_modal_context[*].column_name in the returned dependency list
ImageColumnConfig.required_columns — same fix applied
Added 3 unit tests covering multi-modal context inclusion and deduplication for both config types

🧪 Testing

make test passes
Unit tests added/updated
E2E tests added/updated — N/A, covered by unit tests on the config property

✅ Checklist

Follows commit message conventions
Commits are signed off (DCO)
Architecture docs updated — N/A, no architectural change

Made with Cursor

`LLMTextColumnConfig.required_columns` and `ImageColumnConfig.required_columns` only extracted dependencies from Jinja2 prompt templates, missing columns referenced via `multi_modal_context`. This caused the async engine's execution graph to dispatch LLM tasks before their multi-modal seed columns were loaded, resulting in KeyError failures under DATA_DESIGNER_ASYNC_ENGINE=1. Made-with: Cursor

github-actions · 2026-04-09T17:13:39Z

Docs preview: https://e702c059.dd-docs-preview.pages.dev

Notebook tutorials are placeholder-only in previews.

greptile-apps · 2026-04-09T17:14:22Z

Greptile Summary

This PR fixes a missing dependency declaration in LLMTextColumnConfig.required_columns and ImageColumnConfig.required_columns: columns referenced via multi_modal_context were not included, causing the async engine to dispatch LLM tasks before their seed image columns were available, resulting in KeyError failures. The fix is minimal and correct — three targeted unit tests cover inclusion and deduplication for both config types.

Confidence Score: 5/5

This PR is safe to merge — the fix is minimal, targeted, and correct, with no P0/P1 findings.

The change is a straightforward two-property fix: multi_modal_context column names are now included in required_columns for both LLMTextColumnConfig and ImageColumnConfig. All subclasses inherit the fix automatically. Three focused unit tests cover the new behavior. No correctness, data-integrity, or security issues were identified.

No files require special attention.

Vulnerabilities

No security concerns identified.

Important Files Changed

Filename	Overview
packages/data-designer-config/src/data_designer/config/column_configs.py	Adds `multi_modal_context` column names to `required_columns` in both `LLMTextColumnConfig` and `ImageColumnConfig`, with deduplication via `set`; subclasses (`LLMCodeColumnConfig`, `LLMStructuredColumnConfig`, `LLMJudgeColumnConfig`) inherit the fix automatically.
packages/data-designer-config/tests/config/test_columns.py	Adds three new unit tests for multi-modal context inclusion and deduplication in `LLMTextColumnConfig` and `ImageColumnConfig`; tests are well-scoped and correct.

Sequence Diagram

sequenceDiagram
    participant DAG as Async Engine / DAG
    participant ColCfg as LLMTextColumnConfig / ImageColumnConfig
    participant SeedCol as Seed Image Column

    Note over ColCfg: required_columns (before fix)
    ColCfg->>DAG: returns only Jinja2 template deps
    DAG->>ColCfg: schedules LLM task (missing seed dep!)
    ColCfg->>SeedCol: KeyError — seed column not yet loaded

    Note over ColCfg: required_columns (after fix)
    ColCfg->>DAG: returns Jinja2 deps + multi_modal_context column names
    DAG->>SeedCol: waits for seed column to be ready
    SeedCol-->>DAG: seed column available
    DAG->>ColCfg: schedules LLM task with correct ordering

_{Reviews (1): Last reviewed commit: "fix: include multi_modal_context columns..." | Re-trigger Greptile}

github-actions · 2026-04-09T17:15:21Z

Code Review: PR #522 — fix: include multi_modal_context columns in required_columns

Summary

This PR fixes a bug where required_columns on LLMTextColumnConfig and ImageColumnConfig only extracted dependencies from Jinja2 prompt/system_prompt templates, ignoring columns referenced via multi_modal_context. This caused the async engine's DAG to dispatch LLM tasks before their multi-modal seed columns were loaded, resulting in KeyError failures under DATA_DESIGNER_ASYNC_ENGINE=1. The fix is minimal, correct, and well-tested.

Files changed: 2 (+44 / -5)

packages/data-designer-config/src/data_designer/config/column_configs.py
packages/data-designer-config/tests/config/test_columns.py

Findings

Correctness — No Issues

LLMTextColumnConfig.required_columns (line 186–199): The fix appends ctx.column_name from self.multi_modal_context to the dependency list before deduplicating via set(). This correctly ensures multi-modal context columns are treated as dependencies in the async engine's execution graph. The existing set() dedup at the end handles the case where a context column is also referenced in the prompt template.
ImageColumnConfig.required_columns (line 597–608): Same pattern applied. Previously this method returned directly from extract_keywords_from_jinja2_template without dedup — the fix also wraps the result in list(set(...)), which is a minor improvement to consistency even without multi-modal context.
Inheritance coverage: LLMCodeColumnConfig, LLMStructuredColumnConfig, and LLMJudgeColumnConfig all inherit from LLMTextColumnConfig without overriding required_columns, so they receive the fix automatically. No other column config classes have multi_modal_context as a field — only LLMTextColumnConfig (and its subclasses) and ImageColumnConfig. The fix is complete.

Tests — Adequate

test_llm_text_column_config_required_columns_includes_multi_modal_context: Validates that a column from multi-modal context is included alongside a column from the prompt template. Uses set() comparison, which is appropriate since list(set(...)) doesn't guarantee order.
test_llm_text_column_config_required_columns_deduplicates_multi_modal_and_prompt: Validates deduplication when the same column name appears in both the prompt template and multi-modal context. Asserts exact list equality ["image_col"] — this works because a single-element set always produces a single-element list, but is slightly fragile if the implementation changes. Acceptable for now.
test_image_column_config_required_columns_includes_multi_modal_context: Same pattern for ImageColumnConfig. Good coverage.

Minor Observations (Non-blocking)

No test for subclass inheritance: There's no explicit test that e.g. LLMCodeColumnConfig or LLMJudgeColumnConfig with multi_modal_context also returns the correct required_columns. Since these don't override the method this is low risk, but a paranoia test would catch accidental future overrides.
Set ordering in test (line 144): assert config.required_columns == ["image_col"] relies on the implementation detail that list(set(["image_col", "image_col"])) produces ["image_col"]. This is always true for single-element sets but the assertion style differs from the other two tests which use set() comparison. Minor inconsistency.

Verdict

LGTM. This is a clean, focused bug fix. The change is minimal, correct, and consistent with the existing code style. The three new tests cover the key scenarios (inclusion, deduplication, both config types). No architectural concerns — the fix stays within the config layer and doesn't violate import direction rules. Ship it.

nabinchha requested a review from a team as a code owner April 9, 2026 17:12

johnnygreco approved these changes Apr 9, 2026

View reviewed changes

nabinchha merged commit fd477a6 into main Apr 9, 2026
50 checks passed

nabinchha deleted the nmulepati/fix/520-multi-modal-context-missing-from-required-columns branch April 9, 2026 17:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: include multi_modal_context columns in required_columns#522

fix: include multi_modal_context columns in required_columns#522
nabinchha merged 1 commit intomainfrom
nmulepati/fix/520-multi-modal-context-missing-from-required-columns

nabinchha commented Apr 9, 2026

Uh oh!

github-actions bot commented Apr 9, 2026

Uh oh!

greptile-apps bot commented Apr 9, 2026

Confidence Score: 5/5

Vulnerabilities

Sequence Diagram

Uh oh!

github-actions bot commented Apr 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nabinchha commented Apr 9, 2026

📋 Summary

🔗 Related Issue

🔄 Changes

🧪 Testing

✅ Checklist

Uh oh!

github-actions bot commented Apr 9, 2026

Uh oh!

greptile-apps bot commented Apr 9, 2026

Greptile Summary

Confidence Score: 5/5

Vulnerabilities

Important Files Changed

Sequence Diagram

Uh oh!

github-actions bot commented Apr 9, 2026

Code Review: PR #522 — fix: include multi_modal_context columns in required_columns

Summary

Findings

Correctness — No Issues

Tests — Adequate

Minor Observations (Non-blocking)

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants