Skip to content

fix: include multi_modal_context columns in required_columns#522

Merged
nabinchha merged 1 commit intomainfrom
nmulepati/fix/520-multi-modal-context-missing-from-required-columns
Apr 9, 2026
Merged

fix: include multi_modal_context columns in required_columns#522
nabinchha merged 1 commit intomainfrom
nmulepati/fix/520-multi-modal-context-missing-from-required-columns

Conversation

@nabinchha
Copy link
Copy Markdown
Contributor

📋 Summary

required_columns on LLMTextColumnConfig and ImageColumnConfig only extracted dependencies from Jinja2 prompt/system_prompt templates, ignoring columns referenced via multi_modal_context. This caused the async engine's execution graph to dispatch LLM tasks before their multi-modal seed columns were loaded, resulting in KeyError failures under DATA_DESIGNER_ASYNC_ENGINE=1.

🔗 Related Issue

Fixes #520

🔄 Changes

🧪 Testing

  • make test passes
  • Unit tests added/updated
  • E2E tests added/updated — N/A, covered by unit tests on the config property

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated — N/A, no architectural change

Made with Cursor

`LLMTextColumnConfig.required_columns` and `ImageColumnConfig.required_columns`
only extracted dependencies from Jinja2 prompt templates, missing columns
referenced via `multi_modal_context`. This caused the async engine's execution
graph to dispatch LLM tasks before their multi-modal seed columns were loaded,
resulting in KeyError failures under DATA_DESIGNER_ASYNC_ENGINE=1.

Made-with: Cursor
@nabinchha nabinchha requested a review from a team as a code owner April 9, 2026 17:12
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

Docs preview: https://e702c059.dd-docs-preview.pages.dev

Notebook tutorials are placeholder-only in previews.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR fixes a missing dependency declaration in LLMTextColumnConfig.required_columns and ImageColumnConfig.required_columns: columns referenced via multi_modal_context were not included, causing the async engine to dispatch LLM tasks before their seed image columns were available, resulting in KeyError failures. The fix is minimal and correct — three targeted unit tests cover inclusion and deduplication for both config types.

Confidence Score: 5/5

This PR is safe to merge — the fix is minimal, targeted, and correct, with no P0/P1 findings.

The change is a straightforward two-property fix: multi_modal_context column names are now included in required_columns for both LLMTextColumnConfig and ImageColumnConfig. All subclasses inherit the fix automatically. Three focused unit tests cover the new behavior. No correctness, data-integrity, or security issues were identified.

No files require special attention.

Vulnerabilities

No security concerns identified.

Important Files Changed

Filename Overview
packages/data-designer-config/src/data_designer/config/column_configs.py Adds multi_modal_context column names to required_columns in both LLMTextColumnConfig and ImageColumnConfig, with deduplication via set; subclasses (LLMCodeColumnConfig, LLMStructuredColumnConfig, LLMJudgeColumnConfig) inherit the fix automatically.
packages/data-designer-config/tests/config/test_columns.py Adds three new unit tests for multi-modal context inclusion and deduplication in LLMTextColumnConfig and ImageColumnConfig; tests are well-scoped and correct.

Sequence Diagram

sequenceDiagram
    participant DAG as Async Engine / DAG
    participant ColCfg as LLMTextColumnConfig / ImageColumnConfig
    participant SeedCol as Seed Image Column

    Note over ColCfg: required_columns (before fix)
    ColCfg->>DAG: returns only Jinja2 template deps
    DAG->>ColCfg: schedules LLM task (missing seed dep!)
    ColCfg->>SeedCol: KeyError — seed column not yet loaded

    Note over ColCfg: required_columns (after fix)
    ColCfg->>DAG: returns Jinja2 deps + multi_modal_context column names
    DAG->>SeedCol: waits for seed column to be ready
    SeedCol-->>DAG: seed column available
    DAG->>ColCfg: schedules LLM task with correct ordering
Loading

Reviews (1): Last reviewed commit: "fix: include multi_modal_context columns..." | Re-trigger Greptile

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

Code Review: PR #522 — fix: include multi_modal_context columns in required_columns

Summary

This PR fixes a bug where required_columns on LLMTextColumnConfig and ImageColumnConfig only extracted dependencies from Jinja2 prompt/system_prompt templates, ignoring columns referenced via multi_modal_context. This caused the async engine's DAG to dispatch LLM tasks before their multi-modal seed columns were loaded, resulting in KeyError failures under DATA_DESIGNER_ASYNC_ENGINE=1. The fix is minimal, correct, and well-tested.

Files changed: 2 (+44 / -5)

  • packages/data-designer-config/src/data_designer/config/column_configs.py
  • packages/data-designer-config/tests/config/test_columns.py

Findings

Correctness — No Issues

  1. LLMTextColumnConfig.required_columns (line 186–199): The fix appends ctx.column_name from self.multi_modal_context to the dependency list before deduplicating via set(). This correctly ensures multi-modal context columns are treated as dependencies in the async engine's execution graph. The existing set() dedup at the end handles the case where a context column is also referenced in the prompt template.

  2. ImageColumnConfig.required_columns (line 597–608): Same pattern applied. Previously this method returned directly from extract_keywords_from_jinja2_template without dedup — the fix also wraps the result in list(set(...)), which is a minor improvement to consistency even without multi-modal context.

  3. Inheritance coverage: LLMCodeColumnConfig, LLMStructuredColumnConfig, and LLMJudgeColumnConfig all inherit from LLMTextColumnConfig without overriding required_columns, so they receive the fix automatically. No other column config classes have multi_modal_context as a field — only LLMTextColumnConfig (and its subclasses) and ImageColumnConfig. The fix is complete.

Tests — Adequate

  1. test_llm_text_column_config_required_columns_includes_multi_modal_context: Validates that a column from multi-modal context is included alongside a column from the prompt template. Uses set() comparison, which is appropriate since list(set(...)) doesn't guarantee order.

  2. test_llm_text_column_config_required_columns_deduplicates_multi_modal_and_prompt: Validates deduplication when the same column name appears in both the prompt template and multi-modal context. Asserts exact list equality ["image_col"] — this works because a single-element set always produces a single-element list, but is slightly fragile if the implementation changes. Acceptable for now.

  3. test_image_column_config_required_columns_includes_multi_modal_context: Same pattern for ImageColumnConfig. Good coverage.

Minor Observations (Non-blocking)

  1. No test for subclass inheritance: There's no explicit test that e.g. LLMCodeColumnConfig or LLMJudgeColumnConfig with multi_modal_context also returns the correct required_columns. Since these don't override the method this is low risk, but a paranoia test would catch accidental future overrides.

  2. Set ordering in test (line 144): assert config.required_columns == ["image_col"] relies on the implementation detail that list(set(["image_col", "image_col"])) produces ["image_col"]. This is always true for single-element sets but the assertion style differs from the other two tests which use set() comparison. Minor inconsistency.

Verdict

LGTM. This is a clean, focused bug fix. The change is minimal, correct, and consistent with the existing code style. The three new tests cover the key scenarios (inclusion, deduplication, both config types). No architectural concerns — the fix stays within the config layer and doesn't violate import direction rules. Ship it.

@nabinchha nabinchha merged commit fd477a6 into main Apr 9, 2026
50 checks passed
@nabinchha nabinchha deleted the nmulepati/fix/520-multi-modal-context-missing-from-required-columns branch April 9, 2026 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Async engine: multi_modal_context columns missing from required_columns breaks dependency graph

2 participants