| 1 | +# Security |
| 2 | + |
| 3 | +Data Designer can run in two very different trust models: |
| 4 | + |
| 5 | +- **Trusted / monolithic**: The same user or team writes the config and runs the engine. |
| 6 | +- **Untrusted / shared execution**: One user submits a config and a different process, service, or team executes it. |
| 7 | + |
| 8 | +That distinction matters for features that evaluate user-supplied configuration at runtime, such as Jinja template rendering. In a trusted local workflow, broader template flexibility may be acceptable. In a shared-service deployment, user-supplied Jinja becomes part of the engine's remote code execution surface: a template that escapes the sandbox would run code inside the process running Data Designer. |
| 9 | + |
| 10 | +See [Deployment Options](deployment-options.md) for the architectures where that trust boundary changes. |
| 11 | + |
| 12 | +## Jinja Rendering Modes |
| 13 | + |
| 14 | +Data Designer exposes the renderer choice through `RunConfig`: |
| 15 | + |
| 16 | +```python |
| 17 | +import data_designer.config as dd |
| 18 | + |
| 19 | +run_config = dd.RunConfig( |
| 20 | +    jinja_rendering_engine=dd.JinjaRenderingEngine.SECURE, |
| 21 | +) |
| 22 | +``` |
| 23 | + |
| 24 | +`SECURE` is the default. Opt into `NATIVE` only when you are comfortable treating the config author and the engine operator as the same trust domain. |
| 25 | + |
| 26 | +| Mode | What it uses | Best fit | |
| 27 | +|------|---------------|----------| |
| 28 | +| `SECURE` | Data Designer's hardened renderer built on top of Jinja2's sandbox | Shared services, microservices, internal platforms, or any deployment where config submission is separated from execution | |
| 29 | +| `NATIVE` | Jinja2's built-in sandbox with Data Designer's variable allowlist | Local library usage and other trusted, monolithic workflows that want broader Jinja behavior | |
| 30 | + |
| 31 | +!!! warning "Treat untrusted Jinja as a security boundary" |
| 32 | +    If many users can submit configs to one engine, or if configs are accepted over an API and executed elsewhere, keep `JinjaRenderingEngine.SECURE`. In that model, Jinja templates are no longer just prompt-formatting helpers. They are untrusted user programs being evaluated by your engine. |
| 33 | + |
| 34 | +## Compatibility Matrix |
| 35 | + |
| 36 | +`NATIVE` is not an unrestricted Python template engine. The matrix below shows what each mode permits, restricts, or adds on top of Jinja2's standard sandbox behavior. |
| 37 | + |
| 38 | +| Capability | `NATIVE` | `SECURE` | |
| 39 | +|------|------|----------| |
| 40 | +| Jinja2 `ImmutableSandboxedEnvironment` baseline | Yes | Yes | |
| 41 | +| References to explicitly provided dataset variables only | Yes | Yes | |
| 42 | +| Standard Jinja built-in filter set | Yes | Subset only | |
| 43 | +| Data Designer `jsonpath` filter | Yes | Yes | |
| 44 | +| `import`, `macro`, `set`, `extends`, `block` support | Yes | No | |
| 45 | +| Nested or recursive `for` loops | Yes | No | |
| 46 | +| Unbounded AST complexity | Yes | No | |
| 47 | +| Template context sanitized to JSON-compatible types before render | No | Yes | |
| 48 | +| Empty, oversized, or built-in-like rendered output is permitted | Yes | No | |
| 49 | + |
| 50 | +## What `SECURE` Adds on Top of the Standard Jinja Sandbox |
| 51 | + |
| 52 | +The `SECURE` renderer uses a hardened environment implemented in the [renderer source file on GitHub](https://github.com/NVIDIA-NeMo/DataDesigner/blob/v0.5.6/packages/data-designer-engine/src/data_designer/engine/processing/ginja/environment.py). Compared with the standard Jinja sandbox, it adds the controls described below. |
| 53 | + |
| 54 | +### Record Sanitization Before Render |
| 55 | + |
| 56 | +Before rendering, `SECURE` forces template context through a JSON-compatible serialization step. That means remote templates operate on plain data, not arbitrary Python objects. |
| 57 | + |
| 58 | +```python |
| 59 | +# Intended shape for remote template context |
| 60 | +record = { |
| 61 | +    "user": { |
| 62 | +        "name": "alice", |
| 63 | +        "roles": ["admin", "reviewer"], |
| 64 | +    } |
| 65 | +} |
| 66 | +``` |
| 67 | + |
| 68 | +```python |
| 69 | +# Not the kind of server-side object SECURE wants to expose directly (SomePythonObject is a placeholder for any rich Python object) |
| 70 | +record = { |
| 71 | +    "user": SomePythonObject(...), |
| 72 | +} |
| 73 | +``` |
| 74 | + |
| 75 | +In a remote execution setting, exposing rich Python objects increases the risk of attribute- and method-based sandbox escapes. Jinja's [sandbox security considerations](https://jinja.palletsprojects.com/en/stable/sandbox/) note that the sandbox is not a complete security boundary, and past escapes have included [`str.format` (CVE-2016-10745)](https://nvd.nist.gov/vuln/detail/CVE-2016-10745), [`str.format_map` (CVE-2019-10906)](https://github.com/advisories/GHSA-462w-v97r-4m45), [indirect `str.format` references (CVE-2024-56326)](https://nvd.nist.gov/vuln/detail/CVE-2024-56326), and [`|attr`-based access to `format` (CVE-2025-27516)](https://nvd.nist.gov/vuln/detail/CVE-2025-27516); PortSwigger's [server-side template injection research](https://portswigger.net/research/server-side-template-injection) covers the broader object-traversal pattern. |
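As a rough illustration of the sanitization idea, and not Data Designer's actual implementation, a JSON round trip reduces a record to plain dicts, lists, strings, numbers, booleans, and `None`. The helper name `sanitize_record`, the `SomePythonObject` class, and the choice to stringify unknown values with `default=str` are all assumptions made for this sketch; the real renderer may reject such values instead.

```python
import json


class SomePythonObject:
    """Hypothetical rich object with a method a template should never reach."""

    def __init__(self, name):
        self.name = name

    def delete_everything(self):
        raise RuntimeError("should never be reachable from a template")

    def __str__(self):
        return f"SomePythonObject(name={self.name!r})"


def sanitize_record(record):
    """Round-trip through JSON so only JSON-compatible types survive.

    Assumption for this sketch: values json cannot serialize are converted
    with str(); a stricter policy would raise instead.
    """
    return json.loads(json.dumps(record, default=str))


record = {"user": SomePythonObject("alice"), "roles": ("admin", "reviewer")}
clean = sanitize_record(record)

# The object is now an inert string and the tuple became a plain list,
# so there are no attributes or methods left for a template to traverse.
print(type(clean["user"]).__name__, type(clean["roles"]).__name__)
```

After the round trip, a template that tries `{{ user.delete_everything() }}` has nothing to call, because `user` is only a string.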
| 76 | + |
| 77 | +### Filter Allowlist |
| 78 | + |
| 79 | +`SECURE` keeps only a small approved subset of Jinja filters plus the Data Designer `jsonpath` filter. If a filter is not on that allowlist, the template is rejected. Common excluded filters are: |
| 80 | + |
| 81 | +| Disallowed filters | Why they are excluded in `SECURE` | |
| 82 | +| --- | --- | |
| 83 | +| `attr`, `xmlattr` | These add dynamic attribute lookup or attribute-name construction, which widens the object-traversal surface in untrusted templates. | |
| 84 | +| `map`, `select`, `reject`, `selectattr`, `rejectattr`, `groupby`, `batch`, `slice`, `sum` | These make templates behave more like a data-processing language and can multiply compute across large inputs. | |
| 85 | +| `join`, `format`, `indent`, `wordwrap`, `center`, `filesizeformat` | These expand presentation and composition logic inside the template. `SECURE` keeps formatting logic narrow so templates stay close to interpolation. | |
| 86 | +| `default`, `d`, `dictsort`, `count`, `wordcount`, `pprint`, `tojson` | These encourage fallback logic, secondary data shaping, or debug-style output inside the template rather than in the engine or config layer. | |
| 87 | +| `safe`, `striptags`, `urlize` | These are primarily HTML-oriented output transforms and are unnecessary for server-side dataset rendering. | |
| 88 | + |
| 89 | +Some omitted convenience filters, such as the `e` alias for `escape`, are excluded because `SECURE` uses a small explicit allowlist. The current implementation does not assign each omitted filter its own separate security rationale. |
| 90 | + |
| 91 | +Use `NATIVE` when full Jinja filter compatibility matters more than the additional restrictions used for untrusted template execution. |
| 92 | + |
| 93 | +### Template Features Removed |
| 94 | + |
| 95 | +`SECURE` rejects `import`, `macro`, `set`, `extends`, and `block`. |
| 96 | + |
| 97 | +```jinja |
| 98 | +{% macro render_name(name) %}{{ name }}{% endmacro %} |
| 99 | +{{ render_name(customer_name) }} |
| 100 | +``` |
| 101 | + |
| 102 | +```jinja |
| 103 | +{% set temp = user_id %} |
| 104 | +{{ temp }} |
| 105 | +``` |
| 106 | + |
| 107 | +Those features are useful in trusted authoring environments, but they also make user templates more expressive and stateful. In a remote execution model, `SECURE` intentionally narrows the language so templates stay closer to data interpolation than to a reusable programming layer. |
| 108 | + |
| 109 | +### Loop Restrictions |
| 110 | + |
| 111 | +`SECURE` rejects recursive loops and nested `for` loops. |
| 112 | + |
| 113 | +```jinja |
| 114 | +{% for row in rows %} |
| 115 | +    {% for item in row %} |
| 116 | +        {{ item }} |
| 117 | +    {% endfor %} |
| 118 | +{% endfor %} |
| 119 | +``` |
| 120 | + |
| 121 | +Nested and recursive loops are especially risky in shared execution because they can amplify compute cost and output size in ways that are hard to reason about from the outside. |
| 122 | + |
| 123 | +### AST Complexity Limits |
| 124 | + |
| 125 | +`SECURE` statically analyzes the parsed Jinja AST and rejects any template that exceeds either of the current limits: 600 nodes or a depth of 10. |
| 126 | + |
| 127 | +```jinja |
| 128 | +{% if a %} |
| 129 | +    {% if b %} |
| 130 | +        {% if c %} |
| 131 | +            {{ value }} |
| 132 | +        {% endif %} |
| 133 | +    {% endif %} |
| 134 | +{% endif %} |
| 135 | +``` |
| 136 | + |
| 137 | +This is not about any one feature being unsafe by itself. It is about limiting how much control flow and composition untrusted templates can pack into a single server-side render operation, which helps prevent compute bombs in shared execution. |
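The static check can be approximated with Jinja2's public parsing API. The sketch below parses a template without rendering it, walks the AST, and compares node count and depth against the limits quoted above. The helper names `ast_stats` and `check_complexity` are hypothetical, and how Data Designer actually counts nodes or measures depth is an assumption here; only `Environment.parse` and `Node.iter_child_nodes` are Jinja2 calls.

```python
from jinja2 import nodes
from jinja2.sandbox import ImmutableSandboxedEnvironment

MAX_NODES = 600  # limits quoted in the text above
MAX_DEPTH = 10


def ast_stats(node: nodes.Node, depth: int = 1) -> tuple[int, int]:
    """Return (node_count, max_depth) for a parsed Jinja AST."""
    count, deepest = 1, depth
    for child in node.iter_child_nodes():
        child_count, child_depth = ast_stats(child, depth + 1)
        count += child_count
        deepest = max(deepest, child_depth)
    return count, deepest


def check_complexity(source: str) -> tuple[int, int]:
    """Parse without rendering and reject templates over either limit."""
    tree = ImmutableSandboxedEnvironment().parse(source)
    count, deepest = ast_stats(tree)
    if count > MAX_NODES or deepest > MAX_DEPTH:
        raise ValueError("template exceeds AST complexity limits")
    return count, deepest
```

Because the check runs on the parsed tree, an over-complex template is rejected before any of it is rendered.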
| 138 | + |
| 139 | +### `self` References Blocked |
| 140 | + |
| 141 | +`SECURE` rejects references to `self`. |
| 142 | + |
| 143 | +```jinja |
| 144 | +{{ self }} |
| 145 | +``` |
| 146 | + |
| 147 | +The point is to avoid exposing template internals back to the submitter. In a remote setting, even accidental access to those internals is unnecessary surface area. |
| 148 | + |
| 149 | +### Rendered Output Guards |
| 150 | + |
| 151 | +`SECURE` validates rendered output after template execution. It rejects empty output, very large output, and strings that look like Python built-in or function representations. |
| 152 | + |
| 153 | +```jinja |
| 154 | +{{ "" }} |
| 155 | +``` |
| 156 | + |
| 157 | +```text |
| 158 | +<built-in method ...> |
| 159 | +<function ...> |
| 160 | +``` |
| 161 | + |
| 162 | +These checks matter because not all bad outcomes come from parse-time behavior. Some templates are syntactically valid but still produce output that is clearly broken, is oversized, or reveals internal implementation details. |
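A post-render guard of this kind might look like the following minimal sketch. The function name `check_rendered_output`, the `MAX_OUTPUT_CHARS` threshold, the regular expression, and the decision to treat whitespace-only output as empty are all assumptions for illustration; the actual limits and patterns used by `SECURE` are not stated here.

```python
import re

# Hypothetical limit; the real threshold is not documented in this section.
MAX_OUTPUT_CHARS = 10_000

# Matches reprs such as "<built-in function len>",
# "<built-in method upper of str object at 0x...>", or "<function f at 0x...>".
_INTERNAL_REPR = re.compile(r"<(built-in (method|function)|function) [^>]*>")


def check_rendered_output(text: str) -> str:
    """Validate rendered text after template execution; return it if acceptable."""
    # Assumption: whitespace-only output counts as empty.
    if not text.strip():
        raise ValueError("rendered output is empty")
    if len(text) > MAX_OUTPUT_CHARS:
        raise ValueError("rendered output is too large")
    if _INTERNAL_REPR.search(text):
        raise ValueError("rendered output looks like a Python internal repr")
    return text
```

A check like this runs on the final string, so it catches problems that a parse-time analysis cannot see.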
| 163 | + |
| 164 | +### Sanitized User-Facing Errors |
| 165 | + |
| 166 | +At the engine boundary, `SECURE` normalizes most template failures into a generic invalid-template message. |
| 167 | + |
| 168 | +```text |
| 169 | +User provided prompt generation template is invalid. |
| 170 | +``` |
| 171 | + |
| 172 | +That matters in remote execution because exception details can leak information about server-side implementation, supported objects, or internal execution paths that untrusted users do not need to see. |
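One way to picture this normalization is a thin wrapper at the engine boundary that logs the real failure on the server and returns only the generic message to the submitter. This is a sketch under stated assumptions: `UserTemplateError`, `render_at_boundary`, and the logger name are hypothetical, and the sketch catches every exception, whereas the text above says only that most template failures are normalized.

```python
import logging

logger = logging.getLogger("template_rendering")  # hypothetical logger name

GENERIC_MESSAGE = "User provided prompt generation template is invalid."


class UserTemplateError(Exception):
    """Hypothetical error type that is safe to show to the submitter."""


def render_at_boundary(render, template: str, context: dict) -> str:
    """Call `render(template, context)` and hide internal failure details."""
    try:
        return render(template, context)
    except Exception:
        # Keep the full traceback in server-side logs only.
        logger.exception("template rendering failed")
        # `from None` suppresses the chained cause so it is not echoed back.
        raise UserTemplateError(GENERIC_MESSAGE) from None
```

Operators can still debug from the server-side log, while the submitter sees the same message regardless of which internal check failed.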
| 173 | + |
| 174 | +These controls exist because the standard sandbox is a good baseline, but shared-service deployments need a narrower and more defensive execution model. |
| 175 | + |
| 176 | +## Why This Matters in Multi-User Deployments |
| 177 | + |
| 178 | +The security posture changes as soon as config submission and execution are separated. |
| 179 | + |
| 180 | +Examples: |
| 181 | + |
| 182 | +- A centralized Data Designer service accepts configs from many users. |
| 183 | +- An internal platform lets users upload or edit configs that are executed by a background worker. |
| 184 | +- A REST API accepts Jinja-containing configs and runs them on server-side infrastructure. |
| 185 | + |
| 186 | +In those environments, templates are no longer just local convenience syntax. They are untrusted input being evaluated by infrastructure the submitter does not control. In practice, that makes Jinja rendering a remote code execution concern, which is why `SECURE` exists and why it remains the default. |
| 187 | + |
| 188 | +If you are deciding between local library usage and a shared service model, read [Deployment Options](deployment-options.md). The library patterns are often still "trusted" deployments. The shared microservice pattern is not. |
| 189 | + |
| 190 | +## When To Use `NATIVE` |
| 191 | + |
| 192 | +Use `NATIVE` when all of the following are true: |
| 193 | + |
| 194 | +- The person submitting the config is also the person running the engine, or they are in the same trusted operational boundary. |
| 195 | +- You want broader standard Jinja behavior than `SECURE` allows. |
| 196 | +- You understand that this is a flexibility tradeoff, not the safer default. |
| 197 | + |
| 198 | +For example, this is often reasonable in a notebook, local script, or other single-user library workflow. |
| 199 | + |
| 200 | +## Related Reading |
| 201 | + |
| 202 | +- [Deployment Options](deployment-options.md) |
| 203 | +- [Run Config Reference](../code_reference/run_config.md) |