Commit 81033e6
feat(config): add deterministic fingerprint for workflow configs (#587)
* feat(config): add deterministic fingerprint for workflow configs (#584)
Provides DataDesignerConfig.fingerprint() and a freestanding
fingerprint_config() helper that produce a content-addressable
sha256 hash of the data-relevant portion of a workflow config.
Identical configs hash identically across processes and Python
versions; fields that don't affect generated rows (tool_configs,
profilers, skip_health_check, max_parallel_requests, timeout,
HuggingFace seed token/endpoint) are excluded.
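A minimal sketch of the hashing idea, not the code added by this commit. The function name `sketch_fingerprint`, the plain-dict config shape, the `EXCLUDED_KEYS` set (a subset of the fields listed above), and the value given to `CONFIG_HASH_VERSION` are all assumptions made for illustration. It drops runtime-only keys, serializes the rest canonically, and hashes the bytes with sha256.

```python
import hashlib
import json

# Hypothetical values for illustration only; the real constants live in the commit.
CONFIG_HASH_VERSION = 1
EXCLUDED_KEYS = {"profilers", "skip_health_check", "max_parallel_requests", "timeout"}


def strip_excluded(value):
    """Recursively drop keys that do not affect generated rows."""
    if isinstance(value, dict):
        return {k: strip_excluded(v) for k, v in value.items() if k not in EXCLUDED_KEYS}
    if isinstance(value, list):
        return [strip_excluded(v) for v in value]
    return value


def sketch_fingerprint(config: dict) -> str:
    """Return a sha256 hex digest of the data-relevant part of a config dict."""
    payload = {"version": CONFIG_HASH_VERSION, "config": strip_excluded(config)}
    # sort_keys and fixed separators make the serialized text independent of
    # dict insertion order and of json's default whitespace.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two dicts that differ only in key order or in an excluded key such as `timeout` hash to the same digest under this sketch, while a change to a column definition changes it.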
Custom column generators contribute their registered name and
generator_params (L1) by default; opt-in custom_column_source=True
also hashes inspect.getsource() of each generator (L2) and
degrades gracefully with a warning when the source can't be
retrieved. The normalization scheme is versioned via
CONFIG_HASH_VERSION so future changes can be detected as
"unknown identity" rather than mismatch.
* test(config): cover constraints, processors, extra_body, provider, and seed strategies in fingerprint tests
Also document L1 __name__-collision and L2 whitespace-sensitivity limitations in fingerprint_config(), and drop the json.dumps default=str fallback so non-JSON-native values fail loudly instead of silently degrading determinism.
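To illustrate why the `default=str` fallback can hurt determinism, here is a small standalone sketch. The `Opaque` class is a hypothetical stand-in for any non-JSON-native value, and the behavior described for its string form is that of CPython's default `repr`.

```python
import json


class Opaque:
    """Hypothetical stand-in for a value json cannot serialize natively."""


payload = {"param": Opaque()}

# With default=str the value is silently stringified. In CPython the default
# string form embeds a memory address, so the text can differ between runs.
lenient = json.dumps(payload, default=str)

# Without a fallback, serialization fails loudly with TypeError.
try:
    json.dumps(payload)
    strict_error = None
except TypeError as exc:
    strict_error = exc
```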
* feat(config): include tool_configs in fingerprint identity
The set of MCP tools an LLM column can call (providers, allow_tools,
max_tool_call_turns, tool_alias) shapes what the model produces, so
tool_configs is identity-relevant. Only timeout_sec is excluded,
mirroring how inference_parameters.timeout is treated as a runtime
knob rather than a data-identity field.
Updates the fingerprint_config docstring's Include/Exclude lists,
flips the existing tool_configs exclusion test, and adds coverage
for tool_alias / providers / allow_tools / max_tool_call_turns
inclusion plus timeout_sec exclusion.
Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>
Made-with: Cursor
* no need to export config hash stuff to config
* refactor(config): tighten fingerprint identity, drop L2 source hashing
Drops the opt-in `custom_column_source` (L2) source-hashing path and
addresses the canonicalization gaps the reviewers found.
L2 had several silent footguns: closures with different captured state
collapsed to the same hash, the empty `custom_column_sources: []` payload
key made L1 and L2 disagree even on configs with no custom columns,
`inspect.unwrap()` could raise `ValueError` on `__wrapped__` cycles
(uncaught), and same-`__name__` collisions silently came back when
`getsource()` failed. Removing it shrinks the public surface, deletes
~50 lines of helper code, and resolves seven review comments at once.
Strengthens L1 identity for custom columns: the payload now includes
`__qualname__`, `__module__`, and the `@custom_column_generator()`
decorator metadata (`required_columns`, `side_effect_columns`,
`model_aliases`) in addition to `__name__` + `generator_params`. This
disambiguates same-`__name__`-different-scope collisions and prevents
silently dropping DAG-affecting metadata.
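The effect of adding `__qualname__` can be sketched as follows. This is an illustration only: `identity_payload` is a hypothetical helper, and passing the decorator metadata in as a plain dict is an assumption, since this message does not say how the decorator stores it.

```python
def outer_a():
    def generate(row):
        return row
    return generate


def outer_b():
    def generate(row):
        return row
    return generate


def identity_payload(fn, generator_params=None, metadata=None):
    """Hypothetical identity payload for a custom column generator."""
    return {
        "name": fn.__name__,
        "qualname": fn.__qualname__,  # includes the enclosing scope
        "module": fn.__module__,
        "generator_params": generator_params or {},
        # Assumed shape: required_columns, side_effect_columns, model_aliases.
        "metadata": metadata or {},
    }


gen_a = outer_a()
gen_b = outer_b()
payload_a = identity_payload(gen_a, metadata={"required_columns": ["x"]})
payload_b = identity_payload(gen_b, metadata={"required_columns": ["x"]})
```

Both functions share the `__name__` `generate`, but their qualified names differ because they are defined in different enclosing functions, so the two payloads are no longer equal.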
Canonicalizes alias-keyed lookup tables and optional collections so
builder-API and YAML-loaded configs producing identical datasets
fingerprint identically:
* `model_configs` and `tool_configs` are sorted by alias before
hashing (column order remains identity, since columns are DAG nodes).
* `None` and `[]` collapse to "absent" for top-level optional
collections (`model_configs`, `tool_configs`, `constraints`,
`processors`) and for `tool_configs[*].allow_tools`.
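The two canonicalization rules above can be sketched on plain dicts. This is not the commit's implementation: `canonicalize` is a hypothetical name, and the alias field names in `ALIAS_FIELD` (`alias` for model configs, `tool_alias` for tool configs) are assumptions.

```python
OPTIONAL_COLLECTIONS = ("model_configs", "tool_configs", "constraints", "processors")
# Assumed alias field names for the two alias-keyed tables.
ALIAS_FIELD = {"model_configs": "alias", "tool_configs": "tool_alias"}


def canonicalize(config: dict) -> dict:
    """Return a normalized copy of config; the input is not mutated."""
    out = {}
    for key, value in config.items():
        # None and [] both mean "absent" for optional top-level collections.
        if key in OPTIONAL_COLLECTIONS and not value:
            continue
        out[key] = value
    if "tool_configs" in out:
        # Apply the same collapse to each tool's allow_tools.
        out["tool_configs"] = [
            {k: v for k, v in tool.items() if not (k == "allow_tools" and not v)}
            for tool in out["tool_configs"]
        ]
    for key, field in ALIAS_FIELD.items():
        if key in out:
            # Alias-keyed tables are lookups, so their order carries no meaning.
            out[key] = sorted(out[key], key=lambda item: item[field])
    # Column order is left untouched because it is part of the identity.
    return out
```

Hashing the output of a function like this, rather than the raw config, is what lets a builder-API config with `constraints=None` and a YAML-loaded config with `constraints: []` agree.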
Consolidates the excluded-fields constants behind a single canonical
table comment and drops the Sphinx `:func:`/`:class:` roles in the
docstrings to match the rest of the codebase.
Test coverage adds order-independence tests for `model_configs` and
`tool_configs`, parametrized `None`-vs-`[]` equivalence tests for
all four optional top-level collections plus `allow_tools`,
qualname-disambiguation, and decorator-metadata change detection.
Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>
Made-with: Cursor
* test(fingerprint): rename _hash helper to _compute_hash
Function names should be action words; `_hash` is a noun. Rename the
test-only helper to `_compute_hash` to match its verb-form behavior
(it computes a hash from a config). No behavioral change.
Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>
Made-with: Cursor
* test(fingerprint): pin closure-capture limitation; restore test names
The previous _hash -> _compute_hash blanket rename also caught the test
names that happen to end in "_hash()" (e.g. test_changing_X_changes_hash).
"hash" is a noun there — it describes what the test is about, not the
helper being called. Restore the original names; only the helper itself
stays renamed.
Add `test_closure_captured_state_is_a_known_limitation` per @johnnygreco's
approval follow-up: factory-built closures with different captured state
share __name__/__qualname__/__module__/source and so fingerprint
identically. Pin that behavior so a future change either keeps the
limitation or has to delete the matching docstring paragraph in lockstep.
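The limitation can be reproduced without the library. In this sketch `make_generator` and `name_identity` are hypothetical names; the point is only that two closures built by the same factory are indistinguishable by name-based identity even though they behave differently.

```python
def make_generator(suffix):
    def generate(row):
        # suffix is captured state that no name-based attribute reveals.
        return f"{row}{suffix}"
    return generate


def name_identity(fn):
    """Name-based identity: the attributes the fingerprint can see."""
    return (fn.__name__, fn.__qualname__, fn.__module__)


gen_a = make_generator("-a")
gen_b = make_generator("-b")

same_identity = name_identity(gen_a) == name_identity(gen_b)
same_output = gen_a("row") == gen_b("row")
```

Here `same_identity` is true while `same_output` is false, which is the behavior the new test pins.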
Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>
Made-with: Cursor
---------
Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>
1 parent 47c72b3 commit 81033e6
3 files changed
Lines changed: 777 additions & 0 deletions
File tree
- packages/data-designer-config
- src/data_designer/config
- tests/config
Lines changed: 14 additions & 0 deletions
Lines changed: 215 additions & 0 deletions