Commit 8be4ff7

feat: add RunConfig jinja rendering engine (#557)
1 parent b220f36 commit 8be4ff7

20 files changed

Lines changed: 586 additions & 21 deletions

docs/code_reference/run_config.md

Lines changed: 9 additions & 1 deletion
@@ -1,7 +1,14 @@
 # Run Config

 The `run_config` module defines runtime settings that control dataset generation behavior,
-including early shutdown thresholds, batch sizing, and non-inference worker concurrency.
+including early shutdown thresholds, batch sizing, non-inference worker concurrency,
+and the Jinja rendering engine used by the runtime.
+
+`JinjaRenderingEngine.SECURE` is the default. Set `JinjaRenderingEngine.NATIVE`
+when you want Jinja2's broader built-in sandbox behavior instead of Data Designer's
+hardened renderer.
+
+For guidance on when to use each mode, see [Security](../concepts/security.md).

 ## Usage

@@ -13,6 +20,7 @@ data_designer = DataDesigner()
 data_designer.set_run_config(dd.RunConfig(
     buffer_size=500,
     max_conversation_restarts=3,
+    jinja_rendering_engine=dd.JinjaRenderingEngine.NATIVE,
 ))
 ```

docs/concepts/deployment-options.md

Lines changed: 3 additions & 0 deletions
@@ -141,6 +141,8 @@ If you need to provide synthetic data generation as a shared service:
 - **Job management**: Queue, monitor, and manage generation jobs centrally
 - **Resource sharing**: Shared infrastructure for SDG workloads

+When users can submit configs containing Jinja templates to a shared engine, template rendering becomes a remote code execution concern and part of your security boundary. See [Security](security.md) for guidance on when to keep the default `JinjaRenderingEngine.SECURE` mode.
+
 ---

 ## 🧭 Decision Flowchart
@@ -181,3 +183,4 @@ If you need to provide synthetic data generation as a shared service:

 - **Library**: Continue with this documentation
 - **Microservice**: See the [NeMo Data Designer Microservice documentation](https://docs.nvidia.com/nemo/microservices/latest/design-synthetic-data-from-scratch-or-seeds/index.html){target="_blank"}
+- **Security model**: See [Security](security.md)

docs/concepts/security.md

Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
+# Security
+
+Data Designer can run in two very different trust models:
+
+- **Trusted / monolithic**: The same user or team writes the config and runs the engine.
+- **Untrusted / shared execution**: One user submits a config and a different process, service, or team executes it.
+
+That distinction matters for features that evaluate user-supplied configuration at runtime, such as Jinja template rendering. In a trusted local workflow, broader template flexibility may be acceptable. In a shared-service deployment, user-supplied Jinja becomes part of the engine's remote code execution surface. A template sandbox escape would execute inside the process running Data Designer.
+
+See [Deployment Options](deployment-options.md) for the architectures where that trust boundary changes.
+
+## Jinja Rendering Modes
+
+Data Designer exposes the renderer choice through `RunConfig`:
+
+```python
+import data_designer.config as dd
+
+run_config = dd.RunConfig(
+    jinja_rendering_engine=dd.JinjaRenderingEngine.SECURE,
+)
+```
+
+`SECURE` is the default. Opt into `NATIVE` only when you are comfortable treating the config author and the engine operator as the same trust domain.
+
+| Mode | What it uses | Best fit |
+|------|---------------|----------|
+| `SECURE` | Data Designer's hardened renderer built on top of Jinja2's sandbox | Shared services, microservices, internal platforms, or any deployment where config submission is separated from execution |
+| `NATIVE` | Jinja2's built-in sandbox with Data Designer's variable whitelist | Local library usage and other trusted, monolithic workflows that want broader Jinja behavior |
+
+!!! warning "Treat untrusted Jinja as a security boundary"
+    If many users can submit configs to one engine, or if configs are accepted over an API and executed elsewhere, keep `JinjaRenderingEngine.SECURE`. In that model, Jinja templates are no longer just prompt-formatting helpers. They are untrusted user programs being evaluated by your engine.
+
+## Compatibility Matrix
+
+`NATIVE` is not an unrestricted Python template engine. The matrix below shows what each mode permits, restricts, or adds on top of Jinja2's standard sandbox behavior.
+
+| Capability | `NATIVE` | `SECURE` |
+|------|------|----------|
+| Jinja2 `ImmutableSandboxedEnvironment` baseline | Yes | Yes |
+| References to explicitly provided dataset variables only | Yes | Yes |
+| Standard Jinja built-in filter set | Yes | Subset only |
+| Data Designer `jsonpath` filter | Yes | Yes |
+| `import`, `macro`, `set`, `extends`, `block` support | Yes | No |
+| Nested or recursive `for` loops | Yes | No |
+| Unbounded AST complexity | Yes | No |
+| Template context sanitized to JSON-compatible types before render | No | Yes |
+| Empty, oversized, or built-in-like rendered output is permitted | Yes | No |
+
+## What `SECURE` Adds on Top of Standard Jinja Sandbox
+
+The `SECURE` renderer uses a hardened environment implemented in the [renderer source file on GitHub](https://github.com/NVIDIA-NeMo/DataDesigner/blob/v0.5.6/packages/data-designer-engine/src/data_designer/engine/processing/ginja/environment.py). Compared with the standard Jinja sandbox, it adds several additional controls.
+
+### Record Sanitization Before Render
+
+Before rendering, `SECURE` forces template context through a JSON-compatible serialization step. That means remote templates operate on plain data, not arbitrary Python objects.
+
+```python
+# Intended shape for remote template context
+record = {
+    "user": {
+        "name": "alice",
+        "roles": ["admin", "reviewer"],
+    }
+}
+```
+
+```python
+# Not the kind of server-side object SECURE wants to expose directly
+record = {
+    "user": SomePythonObject(...),
+}
+```
+
+In a remote execution setting, exposing rich Python objects increases the risk of attribute- and method-based sandbox escapes. Jinja's [sandbox security considerations](https://jinja.palletsprojects.com/en/stable/sandbox/) note that the sandbox is not a complete security boundary, and past escapes have included [`str.format` (CVE-2016-10745)](https://nvd.nist.gov/vuln/detail/CVE-2016-10745), [`str.format_map` (CVE-2019-10906)](https://github.com/advisories/GHSA-462w-v97r-4m45), [indirect `str.format` references (CVE-2024-56326)](https://nvd.nist.gov/vuln/detail/CVE-2024-56326), and [`|attr`-based access to `format` (CVE-2025-27516)](https://nvd.nist.gov/vuln/detail/CVE-2025-27516); PortSwigger's [server-side template injection research](https://portswigger.net/research/server-side-template-injection) covers the broader object-traversal pattern.
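
Reviewer sketch (not part of the commit): the "JSON-compatible serialization step" described in the added docs could look roughly like the round trip below. `sanitize_record` is a hypothetical helper name, and raising on non-serializable values is an assumption; the real renderer may coerce or reject differently.

```python
import json
from typing import Any


def sanitize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Round-trip a record through JSON so only plain data survives.

    Hypothetical helper: json.dumps raises TypeError for values that have
    no JSON representation, so rich Python objects never reach a template.
    """
    return json.loads(json.dumps(record))


class SomePythonObject:
    """Stand-in for a server-side object that should not be exposed."""


# Plain dicts, lists, and strings pass through unchanged.
plain = sanitize_record({"user": {"name": "alice", "roles": ["admin", "reviewer"]}})

# An arbitrary object is refused before any template sees it.
try:
    sanitize_record({"user": SomePythonObject()})
    rejected = False
except TypeError:
    rejected = True
```

One side effect of this particular approach is that tuples come back as lists and non-string keys become strings, which may or may not match what the hardened renderer does.
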
+
+### Filter Allowlist
+
+`SECURE` keeps only a small approved subset of Jinja filters plus the Data Designer `jsonpath` filter. If a filter is not on that allowlist, the template is rejected. Common excluded filters are:
+
+| Disallowed filters | Why they are excluded in `SECURE` |
+| --- | --- |
+| `attr`, `xmlattr` | These add dynamic attribute lookup or attribute-name construction, which widens the object-traversal surface in untrusted templates. |
+| `map`, `select`, `reject`, `selectattr`, `rejectattr`, `groupby`, `batch`, `slice`, `sum` | These make templates behave more like a data-processing language and can multiply compute across large inputs. |
+| `join`, `format`, `indent`, `wordwrap`, `center`, `filesizeformat` | These expand presentation and composition logic inside the template. `SECURE` keeps formatting logic narrow so templates stay close to interpolation. |
+| `default`, `d`, `dictsort`, `count`, `wordcount`, `pprint`, `tojson` | These encourage fallback logic, secondary data shaping, or debug-style output inside the template rather than in the engine or config layer. |
+| `safe`, `striptags`, `urlize` | These are primarily HTML-oriented output transforms and are unnecessary for server-side dataset rendering. |
+
+Some omitted convenience filters, such as the `e` alias for `escape`, are excluded because `SECURE` uses a small explicit allowlist. The current implementation does not assign each omitted filter its own separate security rationale.
+
+Use `NATIVE` when full Jinja filter compatibility matters more than the additional restrictions used for untrusted template execution.
+
+### Template Features Removed
+
+`SECURE` rejects `import`, `macro`, `set`, `extends`, and `block`.
+
+```jinja
+{% macro render_name(name) %}{{ name }}{% endmacro %}
+{{ render_name(customer_name) }}
+```
+
+```jinja
+{% set temp = user_id %}
+{{ temp }}
+```
+
+Those features are useful in trusted authoring environments, but they also make user templates more expressive and stateful. In a remote execution model, `SECURE` intentionally narrows the language so templates stay closer to data interpolation than to a reusable programming layer.
+
+### Loop Restrictions
+
+`SECURE` rejects recursive loops and nested `for` loops.
+
+```jinja
+{% for row in rows %}
+{% for item in row %}
+{{ item }}
+{% endfor %}
+{% endfor %}
+```
+
+Nested and recursive loops are especially risky in shared execution because they can amplify compute cost and output size in ways that are hard to reason about from the outside.
+
+### AST Complexity Limits
+
+`SECURE` statically analyzes the parsed Jinja AST and rejects templates that exceed the current limits of 600 nodes or depth 10.
+
+```jinja
+{% if a %}
+{% if b %}
+{% if c %}
+{{ value }}
+{% endif %}
+{% endif %}
+{% endif %}
+```
+
+This is not about any one feature being unsafe by itself. It is about limiting how much control flow and composition untrusted templates can pack into a single server-side render operation, which helps prevent compute bombs in shared execution.
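
Reviewer sketch (not part of the commit): a node-count and depth check of the kind described above can be illustrated on a generic tree. The `Node` class below is a hypothetical stand-in for a parsed Jinja AST node, and counting the root as depth 1 is an assumption; only the 600 and 10 limits come from the added docs.

```python
from dataclasses import dataclass, field

MAX_NODES = 600  # limit quoted in the added documentation
MAX_DEPTH = 10   # limit quoted in the added documentation


@dataclass
class Node:
    """Hypothetical stand-in for a parsed template AST node."""
    kind: str
    children: list["Node"] = field(default_factory=list)


def measure(root: Node) -> tuple[int, int]:
    """Return (node_count, max_depth), walking the tree with an explicit stack."""
    count, deepest = 0, 0
    stack = [(root, 1)]  # assumption: the root sits at depth 1
    while stack:
        node, depth = stack.pop()
        count += 1
        deepest = max(deepest, depth)
        stack.extend((child, depth + 1) for child in node.children)
    return count, deepest


def within_limits(root: Node) -> bool:
    """Accept a tree only if it stays at or under both limits."""
    count, depth = measure(root)
    return count <= MAX_NODES and depth <= MAX_DEPTH


def nested_ifs(levels: int) -> Node:
    """Build `levels` nested If nodes wrapping a single Output node."""
    node = Node("Output")
    for _ in range(levels):
        node = Node("If", [node])
    return Node("Template", [node])


shallow = nested_ifs(3)   # mirrors the three nested `if` blocks in the docs example
deep = nested_ifs(20)     # deliberately deeper than MAX_DEPTH
```

Using an explicit stack instead of recursion means the check itself cannot hit Python's recursion limit on a hostile, very deep tree.
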
+
+### `self` References Blocked
+
+`SECURE` rejects references to `self`.
+
+```jinja
+{{ self }}
+```
+
+The point is to avoid exposing template internals back to the submitter. In a remote setting, even accidental access to those internals is unnecessary surface area.
+
+### Rendered Output Guards
+
+`SECURE` validates rendered output after template execution. It rejects empty output, very large output, and strings that look like Python built-in or function representations.
+
+```jinja
+{{ "" }}
+```
+
+```text
+<built-in method ...>
+<function ...>
+```
+
+These checks matter because not all bad outcomes come from parse-time behavior. Some templates are syntactically valid but still produce output that is clearly broken, oversized, or revealing internal implementation details.
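
Reviewer sketch (not part of the commit): the three post-render checks could be expressed as below. `MAX_OUTPUT_CHARS`, the regular expression, and treating whitespace-only output as empty are all assumptions made for illustration; the diff does not show the real thresholds or patterns.

```python
import re

# Hypothetical size limit; the real threshold is not stated in the docs.
MAX_OUTPUT_CHARS = 128_000

# Matches reprs such as "<built-in method ...>" or "<function ...>".
_INTERNAL_REPR = re.compile(
    r"<(built-in (method|function)|function|bound method)\b[^>]*>"
)


def check_rendered_output(rendered: str) -> str:
    """Return the rendered text, or raise ValueError if a guard trips."""
    if not rendered.strip():  # assumption: whitespace-only counts as empty
        raise ValueError("rendered output is empty")
    if len(rendered) > MAX_OUTPUT_CHARS:
        raise ValueError("rendered output is too large")
    if _INTERNAL_REPR.search(rendered):
        raise ValueError("rendered output looks like a Python internal repr")
    return rendered


def is_accepted(rendered: str) -> bool:
    """Convenience wrapper that turns the guard into a boolean."""
    try:
        check_rendered_output(rendered)
    except ValueError:
        return False
    return True
```
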
+
+### Sanitized User-Facing Errors
+
+At the engine boundary, `SECURE` normalizes most template failures into a generic invalid-template message.
+
+```text
+User provided prompt generation template is invalid.
+```
+
+That matters in remote execution because exception details can leak information about server-side implementation, supported objects, or internal execution paths that untrusted users do not need to see.
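
Reviewer sketch (not part of the commit): normalizing failures at a boundary usually means logging the detail server-side and raising a fixed message to the caller. `UserTemplateError` and `render_at_boundary` are hypothetical names; the docs say "most" failures are normalized, whereas this sketch simply catches everything.

```python
import logging

logger = logging.getLogger("template_rendering")

# Message text taken from the added documentation.
GENERIC_MESSAGE = "User provided prompt generation template is invalid."


class UserTemplateError(Exception):
    """Hypothetical error type that is safe to show to the config submitter."""


def render_at_boundary(render, template: str, record: dict) -> str:
    """Call `render` and replace any failure with the generic message."""
    try:
        return render(template, record)
    except Exception as exc:
        # Keep the detail in server-side logs for operators.
        logger.warning("template rendering failed: %r", exc)
        # `from None` keeps the original exception out of the displayed chain.
        raise UserTemplateError(GENERIC_MESSAGE) from None


def failing_render(template: str, record: dict) -> str:
    """Simulated renderer whose error mentions an internal name."""
    raise KeyError("internal_column_name")


try:
    render_at_boundary(failing_render, "{{ x }}", {})
    message = ""
except UserTemplateError as err:
    message = str(err)
```
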
+
+These controls exist because the standard sandbox is a good baseline, but shared-service deployments need a narrower and more defensive execution model.
+
+## Why This Matters in Multi-User Deployments
+
+The security posture changes as soon as config submission and execution are separated.
+
+Examples:
+
+- A centralized Data Designer service accepts configs from many users.
+- An internal platform lets users upload or edit configs that are executed by a background worker.
+- A REST API accepts Jinja-containing configs and runs them on server-side infrastructure.
+
+In those environments, templates are no longer just local convenience syntax. They are untrusted input being evaluated by infrastructure the submitter does not control. In practice, that makes Jinja rendering a remote code execution concern, which is why `SECURE` exists and why it remains the default.
+
+If you are deciding between local library usage and a shared service model, read [Deployment Options](deployment-options.md). The library patterns are often still "trusted" deployments. The shared microservice pattern is not.
+
+## When To Use `NATIVE`
+
+Use `NATIVE` when all of the following are true:
+
+- The person submitting the config is also the person running the engine, or they are in the same trusted operational boundary.
+- You want broader standard Jinja behavior than `SECURE` allows.
+- You understand that this is a flexibility tradeoff, not the safer default.
+
+For example, this is often reasonable in a notebook, local script, or other single-user library workflow.
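
Reviewer sketch (not part of the commit): the three-item checklist above reduces to "choose `NATIVE` only if every condition holds, otherwise stay on `SECURE`". The enum below is a local stand-in that mirrors the two values added in this commit, and `choose_engine` with its parameter names is hypothetical.

```python
from enum import Enum


class JinjaRenderingEngine(str, Enum):
    """Local stand-in mirroring the two values added in this commit."""
    NATIVE = "native"
    SECURE = "secure"


def choose_engine(
    *,
    same_trust_domain: bool,
    needs_broader_jinja: bool,
    accepts_tradeoff: bool,
) -> JinjaRenderingEngine:
    """Pick NATIVE only when every item on the checklist is true."""
    if same_trust_domain and needs_broader_jinja and accepts_tradeoff:
        return JinjaRenderingEngine.NATIVE
    return JinjaRenderingEngine.SECURE


# A single-user notebook that wants macros and accepts the tradeoff.
notebook_choice = choose_engine(
    same_trust_domain=True, needs_broader_jinja=True, accepts_tradeoff=True
)

# A shared service: submitter and operator differ, so the default stays.
service_choice = choose_engine(
    same_trust_domain=False, needs_broader_jinja=True, accepts_tradeoff=True
)
```
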
+
+## Related Reading
+
+- [Deployment Options](deployment-options.md)
+- [Run Config Reference](../code_reference/run_config.md)

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ nav:
 - Safety & Limits: concepts/mcp/safety-and-limits.md
 - Architecture & Performance: concepts/architecture-and-performance.md
 - Deployment Options: concepts/deployment-options.md
+- Security: concepts/security.md
 - Tutorials:
 - Overview: notebooks/README.md
 - The Basics: notebooks/1-the-basics.ipynb

packages/data-designer-config/src/data_designer/config/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -58,7 +58,7 @@
     ProcessorType,
     SchemaTransformProcessorConfig,
 )
-from data_designer.config.run_config import RunConfig, ThrottleConfig  # noqa: F401
+from data_designer.config.run_config import JinjaRenderingEngine, RunConfig, ThrottleConfig  # noqa: F401
 from data_designer.config.sampler_constraints import (  # noqa: F401
     ColumnInequalityConstraint,
     ConstraintType,
@@ -175,6 +175,7 @@
     "ProcessorType": (_MOD_PROCESSORS, "ProcessorType"),
     "SchemaTransformProcessorConfig": (_MOD_PROCESSORS, "SchemaTransformProcessorConfig"),
     # run_config
+    "JinjaRenderingEngine": (f"{_MOD_BASE}.run_config", "JinjaRenderingEngine"),
     "RunConfig": (f"{_MOD_BASE}.run_config", "RunConfig"),
     "ThrottleConfig": (f"{_MOD_BASE}.run_config", "ThrottleConfig"),
     # sampler_constraints

packages/data-designer-config/src/data_designer/config/run_config.py

Lines changed: 20 additions & 0 deletions
@@ -9,6 +9,14 @@
 from typing_extensions import Self

 from data_designer.config.base import ConfigBase
+from data_designer.config.utils.type_helpers import StrEnum
+
+
+class JinjaRenderingEngine(StrEnum):
+    """Template renderer used by the engine for user-supplied Jinja templates."""
+
+    NATIVE = "native"
+    SECURE = "secure"


 class ThrottleConfig(ConfigBase):
@@ -99,6 +107,11 @@ class RunConfig(ConfigBase):
             Default is False.
         progress_interval: How often (in seconds) the async progress reporter emits a
             consolidated log block. Must be > 0. Default is 5.0.
+        jinja_rendering_engine: Template renderer used for engine-side Jinja evaluation.
+            ``native`` uses Jinja2's built-in sandbox with the standard filter set and
+            fewer Data Designer-specific restrictions. ``secure`` uses Data Designer's
+            hardened sandbox with additional AST, filter, and output guards.
+            Default is ``secure``.
         throttle: AIMD throttle tuning parameters. See ``ThrottleConfig`` for details.
     """

@@ -112,6 +125,13 @@ class RunConfig(ConfigBase):
     async_trace: bool = False
     progress_bar: bool = False
     progress_interval: float = Field(default=5.0, gt=0.0)
+    jinja_rendering_engine: JinjaRenderingEngine = Field(
+        default=JinjaRenderingEngine.SECURE,
+        description=(
+            "Template renderer used for engine-side Jinja evaluation. "
+            "`native` uses Jinja2's built-in sandbox; `secure` uses Data Designer's hardened sandbox."
+        ),
+    )
     throttle: ThrottleConfig = Field(default_factory=ThrottleConfig)

     @model_validator(mode="after")
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+from data_designer.config.run_config import JinjaRenderingEngine, RunConfig
+
+
+def test_run_config_defaults_to_secure_jinja_renderer() -> None:
+    assert JinjaRenderingEngine(RunConfig().jinja_rendering_engine) == JinjaRenderingEngine.SECURE
+
+
+def test_run_config_accepts_native_renderer() -> None:
+    run_config = RunConfig(jinja_rendering_engine=JinjaRenderingEngine.NATIVE)
+    assert JinjaRenderingEngine(run_config.jinja_rendering_engine) == JinjaRenderingEngine.NATIVE
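
Reviewer sketch (not part of the commit): both tests wrap the field value in `JinjaRenderingEngine(...)` before comparing, which passes whether the field holds an enum member or its raw string. The stand-in below assumes the project's `StrEnum` behaves like a `str` plus `Enum` mixin; the real helper in `data_designer.config.utils.type_helpers` is not shown in this diff.

```python
from enum import Enum


class JinjaRenderingEngine(str, Enum):
    """Local stand-in; assumes the project's StrEnum mixes in `str`."""
    NATIVE = "native"
    SECURE = "secure"


# Constructing from a member returns that same member.
from_member = JinjaRenderingEngine(JinjaRenderingEngine.SECURE)

# Constructing from the raw string value looks the member up by value.
from_string = JinjaRenderingEngine("secure")

# An unknown value is rejected with ValueError.
try:
    JinjaRenderingEngine("unsafe")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```

Under that assumption, a config loaded from YAML or JSON with `jinja_rendering_engine: native` would normalize to the same member the tests compare against.
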

packages/data-designer-engine/src/data_designer/engine/column_generators/generators/llm_completion.py

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ def prompt_renderer(self) -> RecordBasedPromptRenderer:
                 "column_type": self.config.column_type,
                 "model_alias": self.config.model_alias,
             },
+            jinja_rendering_engine=self.resource_provider.run_config.jinja_rendering_engine,
         )

     def generate(self, data: dict) -> dict:

packages/data-designer-engine/src/data_designer/engine/column_generators/generators/samplers.py

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@ def _create_sampling_dataset_generator(self) -> SamplingDatasetGenerator:
         return SamplingDatasetGenerator(
             sampler_columns=self.config,
             person_generator_loader=(self._person_generator_loader if self._needs_person_generator else None),
+            jinja_rendering_engine=self.resource_provider.run_config.jinja_rendering_engine,
         )

     def _log_person_generation_if_needed(self) -> None:

packages/data-designer-engine/src/data_designer/engine/column_generators/utils/prompt_renderer.py

Lines changed: 9 additions & 1 deletion
@@ -9,6 +9,7 @@
 from data_designer.config.base import SingleColumnConfig
 from data_designer.config.column_types import DataDesignerColumnType
 from data_designer.config.models import ModelConfig
+from data_designer.config.run_config import JinjaRenderingEngine
 from data_designer.config.utils.code_lang import CodeLang
 from data_designer.config.utils.misc import extract_keywords_from_jinja2_template
 from data_designer.config.utils.type_helpers import StrEnum
@@ -36,9 +37,16 @@ class PromptType(StrEnum):


 class RecordBasedPromptRenderer(WithJinja2UserTemplateRendering):
-    def __init__(self, response_recipe: ResponseRecipe, *, error_message_context: dict[str, str] | None = None):
+    def __init__(
+        self,
+        response_recipe: ResponseRecipe,
+        *,
+        error_message_context: dict[str, str] | None = None,
+        jinja_rendering_engine: JinjaRenderingEngine = JinjaRenderingEngine.SECURE,
+    ):
         self.response_recipe = response_recipe
         self._error_message_context = error_message_context
+        self._jinja_rendering_engine = jinja_rendering_engine

     def render(self, *, prompt_template: str | None, record: dict, prompt_type: PromptType) -> str | None:
         self._prepare_environment(prompt_template=prompt_template, record=record, prompt_type=prompt_type)
