Commit 0d51539

feat: add message trace support for LLM generation (#272)
Add support for capturing full conversation traces during LLM generation, enabling debugging and fine-tuning dataset creation.

Changes:
- Add `with_trace` field to LLMTextColumnConfig for per-column trace control
- Add `debug_override_save_all_column_traces` to RunConfig for global trace
- Introduce ChatMessage dataclass for structured message representation
- Update ModelFacade.generate() to return full message trace
- Rename trace column postfix from `__reasoning_trace` to `__trace`
- Add comprehensive traces documentation

Traces capture system/user/assistant messages in order, enabling visibility into the full generation conversation including correction retries.
1 parent 4fddb4d commit 0d51539
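
The commit message names a `ChatMessage` dataclass, and the engine change calls `message.to_dict()` on each trace entry, but the class itself is not part of the files shown here. The following is a minimal sketch of what such a dataclass could look like, not the project's actual implementation. The field names are taken from the message table in `docs/concepts/traces.md`; the choice to emit `reasoning_content` only for assistant messages is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChatMessage:
    """Hypothetical stand-in for the ChatMessage dataclass named in the commit message."""

    role: str  # "system", "user", or "assistant"
    content: str
    reasoning_content: Optional[str] = None  # only meaningful for assistant messages

    def to_dict(self) -> dict[str, Any]:
        # Assumption: system and user messages serialize to role/content only,
        # while assistant messages also carry reasoning_content (possibly None).
        message: dict[str, Any] = {"role": self.role, "content": self.content}
        if self.role == "assistant":
            message["reasoning_content"] = self.reasoning_content
        return message


# Build a short trace and serialize it the way the engine change does.
trace = [
    ChatMessage(role="user", content="What is the capital of France?"),
    ChatMessage(role="assistant", content="The capital of France is Paris."),
]
serialized = [message.to_dict() for message in trace]
```

Serializing to plain dicts keeps the trace column storable in tabular formats without requiring the reader to import the dataclass.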

14 files changed

Lines changed: 372 additions & 125 deletions

File tree

docs/code_reference/run_config.md

Lines changed: 15 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -3,4 +3,19 @@
33
The `run_config` module defines runtime settings that control dataset generation behavior,
44
including early shutdown thresholds, batch sizing, and non-inference worker concurrency.
55

6+
## Usage
7+
8+
```python
9+
import data_designer.config as dd
10+
from data_designer.interface import DataDesigner
11+
12+
data_designer = DataDesigner()
13+
data_designer.set_run_config(dd.RunConfig(
14+
buffer_size=500,
15+
max_conversation_restarts=3,
16+
))
17+
```
18+
19+
## API Reference
20+
621
::: data_designer.config.run_config

docs/concepts/columns.md

Lines changed: 3 additions & 3 deletions
@@ -38,8 +38,8 @@ LLM-Text columns generate natural language text: product descriptions, customer
3838

3939
Use **Jinja2 templating** in prompts to reference other columns. Data Designer automatically manages dependencies and injects the referenced column values into the prompt.
4040

41-
!!! note "Reasoning Traces"
42-
Models that support extended thinking (chain-of-thought reasoning) can capture their reasoning process in a separate `{column_name}__reasoning_trace` column–useful for understanding *why* the model generated specific content. This column is automatically added to the dataset if the model and service provider parse and return reasoning content.
41+
!!! note "Generation Traces"
42+
LLM columns can optionally capture a full message trace in a separate `{column_name}__trace` column. Enable traces per-column via `with_trace=True` on the column config, or globally for all columns via `RunConfig(debug_override_save_all_column_traces=True)`. The trace includes the ordered message history for the final generation attempt (system/user/assistant), and may include model reasoning fields when the provider exposes them.
4343

4444
### 💻 LLM-Code Columns
4545

@@ -147,6 +147,6 @@ You read this property for introspection but never set it—always computed from
147147

148148
### `side_effect_columns`
149149

150-
Computed property listing columns created implicitly alongside the primary column. Currently, only LLM columns produce side effects (reasoning trace columns like `{name}__reasoning_trace` when models use extended thinking).
150+
Computed property listing columns created implicitly alongside the primary column. Currently, only LLM columns produce side effects (trace columns like `{name}__trace` when `with_trace=True` is set on the column or `debug_override_save_all_column_traces` is enabled globally).
151151

152152
For detailed information on each column type, refer to the [column configuration code reference](../code_reference/column_configs.md).

docs/concepts/traces.md

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
1+
# Message Traces
2+
3+
Traces capture the full conversation history during LLM generation, including system prompts, user prompts, model reasoning, and the final response. This visibility is essential for understanding model behavior, debugging generation issues, and iterating on prompts.
4+
5+
## Overview
6+
7+
When generating content with LLM columns, you often need to understand what happened during generation:
8+
9+
- What system prompt was used?
10+
- What did the rendered user prompt look like?
11+
- Did the model provide any reasoning content?
12+
- Did the model retry after failures?
13+
- How did the model arrive at the final answer?
14+
15+
Traces provide this visibility by capturing the ordered message history for each generation, including any multi-turn conversations that occur during retry scenarios.
16+
17+
## Enabling Traces
18+
19+
### Per-Column (Recommended)
20+
21+
Enable `with_trace=True` on specific LLM columns:
22+
23+
```python
24+
import data_designer.config as dd
25+
26+
builder.add_column(
27+
dd.LLMTextColumnConfig(
28+
name="answer",
29+
prompt="Answer: {{ question }}",
30+
model_alias="nvidia-text",
31+
with_trace=True, # Enable trace for this column
32+
)
33+
)
34+
```
35+
36+
### Global Debug Override
37+
38+
Enable traces for ALL LLM columns (useful during development):
39+
40+
```python
41+
import data_designer.config as dd
42+
from data_designer.interface import DataDesigner
43+
44+
data_designer = DataDesigner()
45+
data_designer.set_run_config(
46+
dd.RunConfig(debug_override_save_all_column_traces=True)
47+
)
48+
```
49+
50+
## Trace Column Naming
51+
52+
When enabled, LLM columns produce an additional side-effect column:
53+
54+
- `{column_name}__trace`
55+
56+
For example, if your column is named `"answer"`, the trace column will be `"answer__trace"`.
57+
58+
## Trace Data Structure
59+
60+
Each trace is a `list[dict]` where each dict represents a message in the conversation.
61+
62+
### Message Fields by Role
63+
64+
| Role | Fields | Description |
65+
|------|--------|-------------|
66+
| `system` | `role`, `content` | System prompt setting model behavior |
67+
| `user` | `role`, `content` | User prompt (rendered from template) |
68+
| `assistant` | `role`, `content`, `reasoning_content` | Model response; may include reasoning from extended thinking models |
69+
70+
### Example Trace (Simple Generation)
71+
72+
A basic trace without retries:
73+
74+
```python
75+
[
76+
# System message (if configured)
77+
{
78+
"role": "system",
79+
"content": "You are a helpful assistant that provides clear, concise answers."
80+
},
81+
# User message (the rendered prompt)
82+
{
83+
"role": "user",
84+
"content": "What is the capital of France?"
85+
},
86+
# Final assistant response
87+
{
88+
"role": "assistant",
89+
"content": "The capital of France is Paris.",
90+
"reasoning_content": None # May contain reasoning if model supports it
91+
}
92+
]
93+
```
94+
95+
### Example Trace (With Correction Retry)
96+
97+
When `max_conversation_correction_steps > 0` in `RunConfig` and parsing fails, traces capture the retry conversation:
98+
99+
```python
100+
[
101+
# System message
102+
{
103+
"role": "system",
104+
"content": "Return only valid JSON."
105+
},
106+
# User message
107+
{
108+
"role": "user",
109+
"content": "Generate a person object with name and age."
110+
},
111+
# First attempt (invalid)
112+
{
113+
"role": "assistant",
114+
"content": "Here's a person: {name: 'John', age: 30}" # Invalid JSON
115+
},
116+
# Error feedback
117+
{
118+
"role": "user",
119+
"content": "JSONDecodeError: Expecting property name enclosed in double quotes"
120+
},
121+
# Corrected response
122+
{
123+
"role": "assistant",
124+
"content": "{\"name\": \"John\", \"age\": 30}"
125+
}
126+
]
127+
```
128+
129+
## See Also
130+
131+
- **[Run Config](../code_reference/run_config.md)**: Runtime options including `debug_override_save_all_column_traces`
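
Because the new documentation describes a trace as a `list[dict]` with a `role` key on every message, a trace can be inspected with plain Python. The sketch below is illustrative only: the helper names `final_assistant_message` and `count_correction_rounds` are hypothetical and not part of Data Designer, and counting "assistant turns beyond the first" as correction rounds is an assumption based on the retry example in `traces.md`.

```python
from typing import Any, Optional


def final_assistant_message(trace: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the last assistant message in a trace, or None if there is none."""
    for message in reversed(trace):
        if message.get("role") == "assistant":
            return message
    return None


def count_correction_rounds(trace: list[dict[str, Any]]) -> int:
    """Count assistant turns beyond the first; each extra turn implies one correction."""
    assistant_turns = sum(1 for message in trace if message.get("role") == "assistant")
    return max(assistant_turns - 1, 0)


# The correction-retry trace from the documentation example.
retry_trace = [
    {"role": "system", "content": "Return only valid JSON."},
    {"role": "user", "content": "Generate a person object with name and age."},
    {"role": "assistant", "content": "Here's a person: {name: 'John', age: 30}"},
    {"role": "user", "content": "JSONDecodeError: Expecting property name enclosed in double quotes"},
    {"role": "assistant", "content": '{"name": "John", "age": 30}'},
]

final = final_assistant_message(retry_trace)
rounds = count_correction_rounds(retry_trace)
```

Filtering a dataset for rows where the correction count is non-zero is one way to find prompts that regularly produce unparseable output.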

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ nav:
2020
- Validators: concepts/validators.md
2121
- Processors: concepts/processors.md
2222
- Person Sampling: concepts/person_sampling.md
23+
- Traces: concepts/traces.md
2324
- Tutorials:
2425
- Overview: notebooks/README.md
2526
- The Basics: notebooks/1-the-basics.ipynb

packages/data-designer-config/src/data_designer/config/column_configs.py

Lines changed: 13 additions & 7 deletions
@@ -14,7 +14,7 @@
1414
from data_designer.config.models import ImageContext
1515
from data_designer.config.sampler_params import SamplerParamsT, SamplerType
1616
from data_designer.config.utils.code_lang import CodeLang
17-
from data_designer.config.utils.constants import REASONING_TRACE_COLUMN_POSTFIX
17+
from data_designer.config.utils.constants import TRACE_COLUMN_POSTFIX
1818
from data_designer.config.utils.misc import assert_valid_jinja2_template, extract_keywords_from_jinja2_template
1919
from data_designer.config.validator_params import ValidatorParamsT, ValidatorType
2020

@@ -143,8 +143,8 @@ class LLMTextColumnConfig(SingleColumnConfig):
143143
144144
LLM text columns generate free-form text content using language models via LiteLLM.
145145
Prompts support Jinja2 templating to reference values from other columns, enabling
146-
context-aware generation. The generated text can optionally include reasoning traces
147-
when models support extended thinking.
146+
context-aware generation. The column can optionally emit a message trace
147+
capturing the full conversation history.
148148
149149
Attributes:
150150
prompt: Prompt template for text generation. Supports Jinja2 syntax to
@@ -159,13 +159,18 @@ class LLMTextColumnConfig(SingleColumnConfig):
159159
`LLMStructuredColumnConfig` for structured output, `LLMCodeColumnConfig` for code.
160160
multi_modal_context: Optional list of image contexts for multi-modal generation.
161161
Enables vision-capable models to generate text based on image inputs.
162+
with_trace: If True, creates a `{column_name}__trace` column containing the full
163+
ordered message history (system/user/assistant) for the generation.
164+
Can be overridden globally via `RunConfig.debug_override_save_all_column_traces`.
165+
Defaults to False.
162166
column_type: Discriminator field, always "llm-text" for this configuration type.
163167
"""
164168

165169
prompt: str
166170
model_alias: str
167171
system_prompt: str | None = None
168172
multi_modal_context: list[ImageContext] | None = None
173+
with_trace: bool = False
169174
column_type: Literal["llm-text"] = "llm-text"
170175

171176
@staticmethod
@@ -186,14 +191,15 @@ def required_columns(self) -> list[str]:
186191

187192
@property
188193
def side_effect_columns(self) -> list[str]:
189-
"""Returns the reasoning trace column, which may be generated alongside the main column.
194+
"""Returns the trace column, which may be generated alongside the main column.
190195
191-
Reasoning traces are only returned if the served model parses and returns reasoning content.
196+
Traces are generated when `with_trace=True` on the column config or
197+
when `RunConfig.debug_override_save_all_column_traces=True` globally.
192198
193199
Returns:
194-
List containing the reasoning trace column name.
200+
List containing the trace column name.
195201
"""
196-
return [f"{self.name}{REASONING_TRACE_COLUMN_POSTFIX}"]
202+
return [f"{self.name}{TRACE_COLUMN_POSTFIX}"]
197203

198204
@model_validator(mode="after")
199205
def assert_prompt_valid_jinja(self) -> Self:
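
Since this change renames the postfix from `__reasoning_trace` to `__trace`, downstream code that selects trace columns by name needs the new suffix. The sketch below shows the naming rule from `side_effect_columns` in isolation. The constant value comes from the `constants.py` change in this commit; the helpers `trace_column_name` and `split_trace_columns` are hypothetical illustrations, not Data Designer functions.

```python
TRACE_COLUMN_POSTFIX = "__trace"  # value from the constants.py change in this commit


def trace_column_name(column_name: str) -> str:
    """Mirror the f-string used by side_effect_columns for a single column."""
    return f"{column_name}{TRACE_COLUMN_POSTFIX}"


def split_trace_columns(columns: list[str]) -> tuple[list[str], list[str]]:
    """Separate primary columns from trace columns by suffix (hypothetical helper)."""
    primary = [name for name in columns if not name.endswith(TRACE_COLUMN_POSTFIX)]
    traces = [name for name in columns if name.endswith(TRACE_COLUMN_POSTFIX)]
    return primary, traces


primary_columns, trace_columns = split_trace_columns(["question", "answer", "answer__trace"])
```

Note that a suffix check like this would misclassify a user-defined column whose own name happens to end in `__trace`.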

packages/data-designer-config/src/data_designer/config/run_config.py

Lines changed: 5 additions & 0 deletions
@@ -33,6 +33,10 @@ class RunConfig(ConfigBase):
3333
max_conversation_correction_steps: Maximum number of correction rounds permitted within a
3434
single conversation when generation tasks call `ModelFacade.generate(...)`. Must be >= 0.
3535
Default is 0.
36+
debug_override_save_all_column_traces: If True, overrides per-column `with_trace` settings
37+
and includes `__trace` columns for ALL LLM generations, containing the full ordered
38+
message history (system/user/assistant) for the final generation attempt.
39+
Useful for debugging. Default is False.
3640
"""
3741

3842
disable_early_shutdown: bool = False
@@ -42,6 +46,7 @@ class RunConfig(ConfigBase):
4246
non_inference_max_parallel_workers: int = Field(default=4, ge=1)
4347
max_conversation_restarts: int = Field(default=5, ge=0)
4448
max_conversation_correction_steps: int = Field(default=0, ge=0)
49+
debug_override_save_all_column_traces: bool = False
4550

4651
@model_validator(mode="after")
4752
def normalize_shutdown_settings(self) -> Self:

packages/data-designer-config/src/data_designer/config/utils/constants.py

Lines changed: 1 addition & 1 deletion
@@ -166,7 +166,7 @@ class NordColor(Enum):
166166
MAX_TOP_P = 1.0
167167
MIN_TOP_P = 0.0
168168
MIN_MAX_TOKENS = 1
169-
REASONING_TRACE_COLUMN_POSTFIX = "__reasoning_trace"
169+
TRACE_COLUMN_POSTFIX = "__trace"
170170

171171
AVAILABLE_LOCALES = [
172172
"ar_AA",

packages/data-designer-config/tests/config/test_columns.py

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ def test_llm_text_column_config():
8585
assert llm_text_column_config.system_prompt == stub_system_prompt
8686
assert llm_text_column_config.column_type == DataDesignerColumnType.LLM_TEXT
8787
assert set(llm_text_column_config.required_columns) == {"some_column", "some_other_column"}
88-
assert llm_text_column_config.side_effect_columns == ["test_llm_text__reasoning_trace"]
88+
assert llm_text_column_config.side_effect_columns == ["test_llm_text__trace"]
8989

9090
# invalid prompt
9191
with pytest.raises(

packages/data-designer-engine/src/data_designer/engine/column_generators/generators/llm_completion.py

Lines changed: 7 additions & 4 deletions
@@ -12,7 +12,7 @@
1212
LLMStructuredColumnConfig,
1313
LLMTextColumnConfig,
1414
)
15-
from data_designer.config.utils.constants import REASONING_TRACE_COLUMN_POSTFIX
15+
from data_designer.config.utils.constants import TRACE_COLUMN_POSTFIX
1616
from data_designer.engine.column_generators.generators.base import ColumnGeneratorWithModel, GenerationStrategy
1717
from data_designer.engine.column_generators.utils.prompt_renderer import (
1818
PromptType,
@@ -66,7 +66,7 @@ def generate(self, data: dict) -> dict:
6666
for context in self.config.multi_modal_context:
6767
multi_modal_context.extend(context.get_contexts(deserialized_record))
6868

69-
response, reasoning_trace = self.model.generate(
69+
response, trace = self.model.generate(
7070
prompt=self.prompt_renderer.render(
7171
record=deserialized_record,
7272
prompt_template=self.config.prompt,
@@ -87,8 +87,11 @@ def generate(self, data: dict) -> dict:
8787
serialized_output = self.response_recipe.serialize_output(response)
8888
data[self.config.name] = self._process_serialized_output(serialized_output)
8989

90-
if reasoning_trace:
91-
data[self.config.name + REASONING_TRACE_COLUMN_POSTFIX] = reasoning_trace
90+
should_save_trace = (
91+
self.config.with_trace or self.resource_provider.run_config.debug_override_save_all_column_traces
92+
)
93+
if should_save_trace:
94+
data[self.config.name + TRACE_COLUMN_POSTFIX] = [message.to_dict() for message in trace]
9295

9396
return data
9497
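
The generator change above makes trace output opt-in: before this commit the trace column was written whenever the model returned a non-empty reasoning trace, and now it is written only when the column or the run config asks for it. The decision can be sketched in isolation as follows. `ColumnSettings`, `RunSettings`, and `attach_trace` are hypothetical stand-ins for `LLMTextColumnConfig`, `RunConfig`, and the tail of `generate()`, and the trace is assumed to be already serialized to dicts.

```python
from dataclasses import dataclass
from typing import Any

TRACE_COLUMN_POSTFIX = "__trace"  # value from the constants.py change in this commit


@dataclass
class ColumnSettings:
    """Hypothetical stand-in for the two LLMTextColumnConfig fields used here."""

    name: str
    with_trace: bool = False


@dataclass
class RunSettings:
    """Hypothetical stand-in for the RunConfig flag used here."""

    debug_override_save_all_column_traces: bool = False


def attach_trace(
    data: dict[str, Any],
    column: ColumnSettings,
    run: RunSettings,
    trace: list[dict[str, Any]],
) -> dict[str, Any]:
    """Write the trace column only if the column or the global override enables it."""
    should_save_trace = column.with_trace or run.debug_override_save_all_column_traces
    if should_save_trace:
        data[column.name + TRACE_COLUMN_POSTFIX] = list(trace)
    return data


messages = [{"role": "user", "content": "hi"}]
row_default = attach_trace({"answer": "x"}, ColumnSettings("answer"), RunSettings(), messages)
row_column = attach_trace({"answer": "x"}, ColumnSettings("answer", with_trace=True), RunSettings(), messages)
row_global = attach_trace({"answer": "x"}, ColumnSettings("answer"), RunSettings(True), messages)
```

Because the two flags are combined with `or`, the global override can only turn traces on; it cannot suppress a column that sets `with_trace=True`.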
