fix(replace): drop workflow-internal columns and use COL_* constants#154
fix(replace): drop workflow-internal columns and use COL_* constants#154lipikaramaswamy wants to merge 2 commits into
Conversation
- Add COL_ENTITY_EXAMPLES, COL_ENTITIES_FOR_REPLACE, COL_ENTITIES_FOR_REPLACE_JSON to engine/constants.py and use them in llm_replace_workflow.py (closes #148). - Drop these three columns from the LlmReplaceWorkflow result before returning. They are prompt-construction scratch consumed only by the replacement-generator template, never read downstream, and carry pyarrow-backed pandas extension dtypes that broke result.trace_dataframe.to_parquet round-trip (closes #152). - Add regression tests asserting the columns are stripped in both return paths (entity rows present and absent). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: lipikaramaswamy <lramaswamy@nvidia.com>
Greptile SummaryThis PR resolves two linked issues by replacing bare string column-name literals with
Confidence Score: 5/5Safe to merge — the changes are mechanical string-to-constant replacements plus a targeted .drop() applied in both return paths, with no behavioural changes to the replacement logic itself. The constant migration is straightforward and the new _INTERNAL_COLUMNS drop is guarded with errors=ignore, so it cannot break existing callers. The Jinja template fix correctly threads COL_ENTITY_EXAMPLES through substitute_placeholders. The only gap is that the entity-rows regression test does not force the drop to fire, but this does not affect production correctness. The entity-rows regression test in tests/engine/test_llm_replace_workflow.py could be strengthened to mix entity and passthrough rows so the drop is genuinely exercised. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[generate_map_only called] --> B[working_df = dataframe.copy]
B --> C[Add COL_ENTITY_EXAMPLES\nCOL_ENTITIES_FOR_REPLACE\nCOL_ENTITIES_FOR_REPLACE_JSON\nto working_df]
C --> D{split_rows on\nCOL_ENTITIES_FOR_REPLACE}
D -->|all rows empty| E[passthrough_rows only\nset COL_REPLACEMENT_MAP = empty]
D -->|entity rows present| F[entity_rows to NDD adapter]
E --> G[merge_and_reorder passthrough_rows]
G --> H[.drop _INTERNAL_COLUMNS\nerrors=ignore]
F --> I[run_result from adapter]
I --> J[filter replacement map]
J --> K[merge_and_reorder output_df + passthrough_rows]
K --> L[.drop _INTERNAL_COLUMNS\nerrors=ignore]
H --> M[LlmReplaceResult returned\nclean of internal columns]
L --> M
Reviews (2): Last reviewed commit: "fix(replace): parameterize _entity_examp..." | Re-trigger Greptile |
…plate Address greptile review feedback on #154: the Jinja template still referenced `{{ _entity_examples }}` as a raw literal even after the COL_ENTITY_EXAMPLES constant was introduced. If the constant ever gets renamed, the DataFrame column would change but the template would silently render an empty/undefined Jinja variable. Route the column name through `substitute_placeholders` the same way `<<ENTITIES_COLUMN>>` already is, so the constant is the single source of truth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: lipikaramaswamy <lramaswamy@nvidia.com>
Summary
Both issues (#148 and #152) touch the same three lines in
engine/replace/llm_replace_workflow.py:53-55. Addresses both in one mechanical fix.#148 — string-literal column names violate the
STYLEGUIDE.mdrule that all column names live inengine/constants.py:COL_ENTITY_EXAMPLES,COL_ENTITIES_FOR_REPLACE,COL_ENTITIES_FOR_REPLACE_JSONtoengine/constants.py.llm_replace_workflow.py.#152 —
result.trace_dataframe.to_parquetfails to round-trip:read_parquet, which poisoned the entire trace file's readability.generate_map_onlyreturn paths (entity rows present and entity rows absent)._INTERNAL_COLUMNStuple so the drop and the regression tests stay in sync.Test plan
test_llm_replace_workflow.pycover both return paths.drop()calls makes both regression tests fail; restoring them makes them passpd.read_parquetsucceed; downstream trace columns (_validated_entities,_sensitivity_disposition, etc.) remain accessible as Python dictsmake format-checkpasses locallyCloses #148.
Closes #152.