NVIDIA-NeMo
diff --git a/docs/code_reference/run_config.md b/docs/code_reference/run_config.md
Lines changed: 9 additions & 1 deletion
diff --git a/docs/concepts/deployment-options.md b/docs/concepts/deployment-options.md
Lines changed: 3 additions & 0 deletions
diff --git a/docs/concepts/security.md b/docs/concepts/security.md
Lines changed: 203 additions & 0 deletions
diff --git a/mkdocs.yml b/mkdocs.yml
Lines changed: 1 addition & 0 deletions
diff --git a/packages/data-designer-config/src/data_designer/config/__init__.py b/packages/data-designer-config/src/data_designer/config/__init__.py
Lines changed: 2 additions & 1 deletion
diff --git a/packages/data-designer-config/src/data_designer/config/run_config.py b/packages/data-designer-config/src/data_designer/config/run_config.py
Lines changed: 20 additions & 0 deletions
diff --git a/packages/data-designer-config/tests/config/test_run_config.py b/packages/data-designer-config/tests/config/test_run_config.py
Lines changed: 15 additions & 0 deletions
diff --git a/packages/data-designer-engine/src/data_designer/engine/column_generators/generators/llm_completion.py b/packages/data-designer-engine/src/data_designer/engine/column_generators/generators/llm_completion.py
Lines changed: 1 addition & 0 deletions
diff --git a/packages/data-designer-engine/src/data_designer/engine/column_generators/generators/samplers.py b/packages/data-designer-engine/src/data_designer/engine/column_generators/generators/samplers.py
Lines changed: 1 addition & 0 deletions
diff --git a/packages/data-designer-engine/src/data_designer/engine/column_generators/utils/prompt_renderer.py b/packages/data-designer-engine/src/data_designer/engine/column_generators/utils/prompt_renderer.py
Lines changed: 9 additions & 1 deletion
@@ -1,7 +1,14 @@
 # Run Config
 
 The `run_config` module defines runtime settings that control dataset generation behavior,
-including early shutdown thresholds, batch sizing, and non-inference worker concurrency.
+including early shutdown thresholds, batch sizing, non-inference worker concurrency,
+and the Jinja rendering engine used by the runtime.
+
+`JinjaRenderingEngine.SECURE` is the default. Set `JinjaRenderingEngine.NATIVE`
+when you want Jinja2's broader built-in sandbox behavior instead of Data Designer's
+hardened renderer.
+
+For guidance on when to use each mode, see [Security](../concepts/security.md).
 
 ## Usage
 
@@ -13,6 +20,7 @@ data_designer = DataDesigner()
 data_designer.set_run_config(dd.RunConfig(
     buffer_size=500,
     max_conversation_restarts=3,
+    jinja_rendering_engine=dd.JinjaRenderingEngine.NATIVE,
 ))
 ```
 
 
@@ -141,6 +141,8 @@ If you need to provide synthetic data generation as a shared service:
 - **Job management**: Queue, monitor, and manage generation jobs centrally
 - **Resource sharing**: Shared infrastructure for SDG workloads
 
+When users can submit configs containing Jinja templates to a shared engine, template rendering becomes a remote code execution concern and part of your security boundary. See [Security](security.md) for guidance on when to keep the default `JinjaRenderingEngine.SECURE` mode.
+
 ---
 
 ## 🧭 Decision Flowchart
@@ -181,3 +183,4 @@ If you need to provide synthetic data generation as a shared service:
 
 - **Library**: Continue with this documentation
 - **Microservice**: See the [NeMo Data Designer Microservice documentation](https://docs.nvidia.com/nemo/microservices/latest/design-synthetic-data-from-scratch-or-seeds/index.html){target="_blank"}
+- **Security model**: See [Security](security.md)
@@ -0,0 +1,203 @@
+# Security
+
+Data Designer can run in two very different trust models:
+
+- **Trusted / monolithic**: The same user or team writes the config and runs the engine.
+- **Untrusted / shared execution**: One user submits a config and a different process, service, or team executes it.
+
+That distinction matters for features that evaluate user-supplied configuration at runtime, such as Jinja template rendering. In a trusted local workflow, broader template flexibility may be acceptable. In a shared-service deployment, user-supplied Jinja becomes part of the engine's remote code execution surface: a template that escapes the sandbox would run code inside the process running Data Designer.
+
+See [Deployment Options](deployment-options.md) for the architectures where that trust boundary changes.
+
+## Jinja Rendering Modes
+
+Data Designer exposes the renderer choice through `RunConfig`:
+
+```python
+import data_designer.config as dd
+
+run_config = dd.RunConfig(
+    jinja_rendering_engine=dd.JinjaRenderingEngine.SECURE,
+)
+```
+
+`SECURE` is the default. Opt into `NATIVE` only when you are comfortable treating the config author and the engine operator as the same trust domain.
+
+| Mode | What it uses | Best fit |
+|------|---------------|----------|
+| `SECURE` | Data Designer's hardened renderer built on top of Jinja2's sandbox | Shared services, microservices, internal platforms, or any deployment where config submission is separated from execution |
+| `NATIVE` | Jinja2's built-in sandbox with Data Designer's variable whitelist | Local library usage and other trusted, monolithic workflows that want broader Jinja behavior |
+
+!!! warning "Treat untrusted Jinja as a security boundary"
+    If many users can submit configs to one engine, or if configs are accepted over an API and executed elsewhere, keep `JinjaRenderingEngine.SECURE`. In that model, Jinja templates are no longer just prompt-formatting helpers. They are untrusted user programs being evaluated by your engine.
+
+## Compatibility Matrix
+
+`NATIVE` is not an unrestricted Python template engine. The matrix below shows what each mode permits, restricts, or adds on top of Jinja2's standard sandbox behavior.
+
+| Capability | `NATIVE` | `SECURE` |
+|------|------|----------|
+| Jinja2 `ImmutableSandboxedEnvironment` baseline | Yes | Yes |
+| References to explicitly provided dataset variables only | Yes | Yes |
+| Standard Jinja built-in filter set | Yes | Subset only |
+| Data Designer `jsonpath` filter | Yes | Yes |
+| `import`, `macro`, `set`, `extends`, `block` support | Yes | No |
+| Nested or recursive `for` loops | Yes | No |
+| Unbounded AST complexity | Yes | No |
+| Template context sanitized to JSON-compatible types before render | No | Yes |
+| Empty, oversized, or built-in-like rendered output is permitted | Yes | No |
+
+## What `SECURE` Adds on Top of the Standard Jinja Sandbox
+
+The `SECURE` renderer uses a hardened environment implemented in the [renderer source file on GitHub](https://github.com/NVIDIA-NeMo/DataDesigner/blob/v0.5.6/packages/data-designer-engine/src/data_designer/engine/processing/ginja/environment.py). Compared with the standard Jinja sandbox, it adds several controls.
+
+### Record Sanitization Before Render
+
+Before rendering, `SECURE` forces template context through a JSON-compatible serialization step. That means remote templates operate on plain data, not arbitrary Python objects.
+
+```python
+# Intended shape for remote template context
+record = {
+    "user": {
+        "name": "alice",
+        "roles": ["admin", "reviewer"],
+    }
+}
+```
+
+```python
+# Not the kind of server-side object SECURE wants to expose directly
+record = {
+    "user": SomePythonObject(...),
+}
+```
+
+In a remote execution setting, exposing rich Python objects increases the risk of attribute- and method-based sandbox escapes. Jinja's [sandbox security considerations](https://jinja.palletsprojects.com/en/stable/sandbox/) note that the sandbox is not a complete security boundary, and past escapes have included [`str.format` (CVE-2016-10745)](https://nvd.nist.gov/vuln/detail/CVE-2016-10745), [`str.format_map` (CVE-2019-10906)](https://github.com/advisories/GHSA-462w-v97r-4m45), [indirect `str.format` references (CVE-2024-56326)](https://nvd.nist.gov/vuln/detail/CVE-2024-56326), and [`|attr`-based access to `format` (CVE-2025-27516)](https://nvd.nist.gov/vuln/detail/CVE-2025-27516); PortSwigger's [server-side template injection research](https://portswigger.net/research/server-side-template-injection) covers the broader object-traversal pattern.
+
+### Filter Allowlist
+
+`SECURE` keeps only a small approved subset of Jinja filters plus the Data Designer `jsonpath` filter. If a filter is not on that allowlist, the template is rejected. Common excluded filters are:
+
+| Disallowed filters | Why they are excluded in `SECURE` |
+| --- | --- |
+| `attr`, `xmlattr` | These add dynamic attribute lookup or attribute-name construction, which widens the object-traversal surface in untrusted templates. |
+| `map`, `select`, `reject`, `selectattr`, `rejectattr`, `groupby`, `batch`, `slice`, `sum` | These make templates behave more like a data-processing language and can multiply compute across large inputs. |
+| `join`, `format`, `indent`, `wordwrap`, `center`, `filesizeformat` | These expand presentation and composition logic inside the template. `SECURE` keeps formatting logic narrow so templates stay close to interpolation. |
+| `default`, `d`, `dictsort`, `count`, `wordcount`, `pprint`, `tojson` | These encourage fallback logic, secondary data shaping, or debug-style output inside the template rather than in the engine or config layer. |
+| `safe`, `striptags`, `urlize` | These are primarily HTML-oriented output transforms and are unnecessary for server-side dataset rendering. |
+
+Some omitted convenience filters, such as the `e` alias for `escape`, are excluded because `SECURE` uses a small explicit allowlist. The current implementation does not assign each omitted filter its own separate security rationale.
+
+Use `NATIVE` when full Jinja filter compatibility matters more than the additional restrictions used for untrusted template execution.
+
+### Template Features Removed
+
+`SECURE` rejects `import`, `macro`, `set`, `extends`, and `block`.
+
+```jinja
+{% macro render_name(name) %}{{ name }}{% endmacro %}
+{{ render_name(customer_name) }}
+```
+
+```jinja
+{% set temp = user_id %}
+{{ temp }}
+```
+
+Those features are useful in trusted authoring environments, but they also make user templates more expressive and stateful. In a remote execution model, `SECURE` intentionally narrows the language so templates stay closer to data interpolation than to a reusable programming layer.
+
+### Loop Restrictions
+
+`SECURE` rejects recursive loops and nested `for` loops.
+
+```jinja
+{% for row in rows %}
+  {% for item in row %}
+    {{ item }}
+  {% endfor %}
+{% endfor %}
+```
+
+Nested and recursive loops are especially risky in shared execution because they can amplify compute cost and output size in ways that are hard to reason about from the outside.
+
+### AST Complexity Limits
+
+`SECURE` statically analyzes the parsed Jinja AST and rejects templates that exceed the current limits of 600 nodes or a depth of 10.
+
+```jinja
+{% if a %}
+  {% if b %}
+    {% if c %}
+      {{ value }}
+    {% endif %}
+  {% endif %}
+{% endif %}
+```
+
+This is not about any one feature being unsafe by itself. It is about limiting how much control flow and composition untrusted templates can pack into a single server-side render operation, which helps prevent compute bombs in shared execution.
+
+### `self` References Blocked
+
+`SECURE` rejects references to `self`.
+
+```jinja
+{{ self }}
+```
+
+The point is to avoid exposing template internals back to the submitter. In a remote setting, even accidental access to those internals is unnecessary surface area.
+
+### Rendered Output Guards
+
+`SECURE` validates rendered output after template execution. It rejects empty output, very large output, and strings that look like Python built-in or function representations.
+
+```jinja
+{{ "" }}
+```
+
+```text
+<built-in method ...>
+<function ...>
+```
+
+These checks matter because not all bad outcomes come from parse-time behavior. Some templates are syntactically valid but still produce output that is clearly broken, oversized, or revealing internal implementation details.
+
+### Sanitized User-Facing Errors
+
+At the engine boundary, `SECURE` normalizes most template failures into a generic invalid-template message.
+
+```text
+User provided prompt generation template is invalid.
+```
+
+That matters in remote execution because exception details can leak information about server-side implementation, supported objects, or internal execution paths that untrusted users do not need to see.
+
+These controls exist because the standard sandbox, while a good baseline, is not enough on its own: shared-service deployments need a narrower and more defensive execution model.
+
+## Why This Matters in Multi-User Deployments
+
+The security posture changes as soon as config submission and execution are separated.
+
+Examples:
+
+- A centralized Data Designer service accepts configs from many users.
+- An internal platform lets users upload or edit configs that are executed by a background worker.
+- A REST API accepts Jinja-containing configs and runs them on server-side infrastructure.
+
+In those environments, templates are no longer just local convenience syntax. They are untrusted input being evaluated by infrastructure the submitter does not control. In practice, that makes Jinja rendering a remote code execution concern, which is why `SECURE` exists and why it remains the default.
+
+If you are deciding between local library usage and a shared service model, read [Deployment Options](deployment-options.md). The library patterns are often still "trusted" deployments. The shared microservice pattern is not.
+
+## When To Use `NATIVE`
+
+Use `NATIVE` when all of the following are true:
+
+- The person submitting the config is also the person running the engine, or they are in the same trusted operational boundary.
+- You want broader standard Jinja behavior than `SECURE` allows.
+- You understand that this is a flexibility tradeoff, not the safer default.
+
+For example, this is often reasonable in a notebook, local script, or other single-user library workflow.
+
+## Related Reading
+
+- [Deployment Options](deployment-options.md)
+- [Run Config Reference](../code_reference/run_config.md)
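Two of the `SECURE` controls described in the new `security.md`, record sanitization before render and rendered output guards, can be illustrated with a minimal standard-library sketch. This is not Data Designer's implementation: the names `sanitize_record` and `check_rendered_output`, the `MAX_OUTPUT_CHARS` value, and the regular expression are all hypothetical, and the actual renderer may enforce these checks differently.

```python
from __future__ import annotations

import json
import re
from typing import Any

# Hypothetical limit; the docs only say that "very large output" is rejected.
MAX_OUTPUT_CHARS = 100_000

# Hypothetical pattern for reprs such as "<built-in method ...>" or "<function ...>".
_INTERNAL_REPR = re.compile(r"<(?:built-in|function)\b[^>]*>")


def sanitize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Round-trip the record through JSON so a template only sees plain data."""
    try:
        return json.loads(json.dumps(record))
    except (TypeError, ValueError) as exc:
        # json.dumps raises TypeError for arbitrary Python objects and
        # ValueError for circular references.
        raise ValueError("record is not JSON-compatible") from exc


def check_rendered_output(text: str) -> str:
    """Reject empty, oversized, or internal-looking rendered output."""
    if not text.strip():
        raise ValueError("rendered output is empty")
    if len(text) > MAX_OUTPUT_CHARS:
        raise ValueError("rendered output is too large")
    if _INTERNAL_REPR.search(text):
        raise ValueError("rendered output looks like a Python internal repr")
    return text
```

In this sketch, a record holding an arbitrary Python object fails before any template is evaluated, while a record of strings, numbers, lists, and dicts passes through as plain data (tuples come back as lists).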
@@ -31,6 +31,7 @@ nav:
           - Safety & Limits: concepts/mcp/safety-and-limits.md
       - Architecture & Performance: concepts/architecture-and-performance.md
       - Deployment Options: concepts/deployment-options.md
+      - Security: concepts/security.md
   - Tutorials:
       - Overview: notebooks/README.md
       - The Basics: notebooks/1-the-basics.ipynb
 
@@ -58,7 +58,7 @@
         ProcessorType,
         SchemaTransformProcessorConfig,
     )
-    from data_designer.config.run_config import RunConfig, ThrottleConfig  # noqa: F401
+    from data_designer.config.run_config import JinjaRenderingEngine, RunConfig, ThrottleConfig  # noqa: F401
     from data_designer.config.sampler_constraints import (  # noqa: F401
         ColumnInequalityConstraint,
         ConstraintType,
@@ -175,6 +175,7 @@
     "ProcessorType": (_MOD_PROCESSORS, "ProcessorType"),
     "SchemaTransformProcessorConfig": (_MOD_PROCESSORS, "SchemaTransformProcessorConfig"),
     # run_config
+    "JinjaRenderingEngine": (f"{_MOD_BASE}.run_config", "JinjaRenderingEngine"),
     "RunConfig": (f"{_MOD_BASE}.run_config", "RunConfig"),
     "ThrottleConfig": (f"{_MOD_BASE}.run_config", "ThrottleConfig"),
     # sampler_constraints
 
@@ -9,6 +9,14 @@
 from typing_extensions import Self
 
 from data_designer.config.base import ConfigBase
+from data_designer.config.utils.type_helpers import StrEnum
+
+
+class JinjaRenderingEngine(StrEnum):
+    """Template renderer used by the engine for user-supplied Jinja templates."""
+
+    NATIVE = "native"
+    SECURE = "secure"
 
 
 class ThrottleConfig(ConfigBase):
@@ -99,6 +107,11 @@ class RunConfig(ConfigBase):
             Default is False.
         progress_interval: How often (in seconds) the async progress reporter emits a
             consolidated log block. Must be > 0. Default is 5.0.
+        jinja_rendering_engine: Template renderer used for engine-side Jinja evaluation.
+            ``native`` uses Jinja2's built-in sandbox with the standard filter set and
+            fewer Data Designer-specific restrictions. ``secure`` uses Data Designer's
+            hardened sandbox with additional AST, filter, and output guards.
+            Default is ``secure``.
         throttle: AIMD throttle tuning parameters.  See ``ThrottleConfig`` for details.
     """
 
@@ -112,6 +125,13 @@ class RunConfig(ConfigBase):
     async_trace: bool = False
     progress_bar: bool = False
     progress_interval: float = Field(default=5.0, gt=0.0)
+    jinja_rendering_engine: JinjaRenderingEngine = Field(
+        default=JinjaRenderingEngine.SECURE,
+        description=(
+            "Template renderer used for engine-side Jinja evaluation. "
+            "`native` uses Jinja2's built-in sandbox; `secure` uses Data Designer's hardened sandbox."
+        ),
+    )
     throttle: ThrottleConfig = Field(default_factory=ThrottleConfig)
 
     @model_validator(mode="after")
 
@@ -0,0 +1,15 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+from data_designer.config.run_config import JinjaRenderingEngine, RunConfig
+
+
+def test_run_config_defaults_to_secure_jinja_renderer() -> None:
+    assert JinjaRenderingEngine(RunConfig().jinja_rendering_engine) == JinjaRenderingEngine.SECURE
+
+
+def test_run_config_accepts_native_renderer() -> None:
+    run_config = RunConfig(jinja_rendering_engine=JinjaRenderingEngine.NATIVE)
+    assert JinjaRenderingEngine(run_config.jinja_rendering_engine) == JinjaRenderingEngine.NATIVE
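Both tests above wrap the stored field in `JinjaRenderingEngine(...)` before comparing. One plausible reason, which the diff does not confirm, is that the config model may hand back the raw string value rather than the enum member. The sketch below shows why that wrapping works for either form. It uses a stand-in enum built on the standard library's `enum.Enum` with a `str` mixin, because the `StrEnum` helper in `data_designer.config.utils.type_helpers` is not shown here. `normalize_engine` is a hypothetical helper.

```python
from __future__ import annotations

from enum import Enum


class JinjaRenderingEngine(str, Enum):
    """Stand-in for the enum added in run_config.py (assumed str-based)."""

    NATIVE = "native"
    SECURE = "secure"


def normalize_engine(value: str | JinjaRenderingEngine) -> JinjaRenderingEngine:
    """Coerce a raw string or an enum member to the enum member."""
    # Looking up by value accepts both "secure" and JinjaRenderingEngine.SECURE;
    # an unknown value raises ValueError.
    return JinjaRenderingEngine(value)
```

With a string-valued enum like this stand-in, the comparison in the tests holds whether the field contains `"secure"` or `JinjaRenderingEngine.SECURE`.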
@@ -57,6 +57,7 @@ def prompt_renderer(self) -> RecordBasedPromptRenderer:
                 "column_type": self.config.column_type,
                 "model_alias": self.config.model_alias,
             },
+            jinja_rendering_engine=self.resource_provider.run_config.jinja_rendering_engine,
         )
 
     def generate(self, data: dict) -> dict:
 
@@ -56,6 +56,7 @@ def _create_sampling_dataset_generator(self) -> SamplingDatasetGenerator:
         return SamplingDatasetGenerator(
             sampler_columns=self.config,
             person_generator_loader=(self._person_generator_loader if self._needs_person_generator else None),
+            jinja_rendering_engine=self.resource_provider.run_config.jinja_rendering_engine,
         )
 
     def _log_person_generation_if_needed(self) -> None:
 
@@ -9,6 +9,7 @@
 from data_designer.config.base import SingleColumnConfig
 from data_designer.config.column_types import DataDesignerColumnType
 from data_designer.config.models import ModelConfig
+from data_designer.config.run_config import JinjaRenderingEngine
 from data_designer.config.utils.code_lang import CodeLang
 from data_designer.config.utils.misc import extract_keywords_from_jinja2_template
 from data_designer.config.utils.type_helpers import StrEnum
@@ -36,9 +37,16 @@ class PromptType(StrEnum):
 
 
 class RecordBasedPromptRenderer(WithJinja2UserTemplateRendering):
-    def __init__(self, response_recipe: ResponseRecipe, *, error_message_context: dict[str, str] | None = None):
+    def __init__(
+        self,
+        response_recipe: ResponseRecipe,
+        *,
+        error_message_context: dict[str, str] | None = None,
+        jinja_rendering_engine: JinjaRenderingEngine = JinjaRenderingEngine.SECURE,
+    ):
         self.response_recipe = response_recipe
         self._error_message_context = error_message_context
+        self._jinja_rendering_engine = jinja_rendering_engine
 
     def render(self, *, prompt_template: str | None, record: dict, prompt_type: PromptType) -> str | None:
         self._prepare_environment(prompt_template=prompt_template, record=record, prompt_type=prompt_type)