|
1 | 1 | # PHENIX AI Agent - Changelog |
2 | 2 |
|
| 3 | +## Version 112.48 (S2k — _inject_user_params Empty Program Guard) |
| 4 | + |
| 5 | +Fixes a subtle Python truth-value bug introduced in v112.47 where the |
| 6 | +program-scope filter silently passed all dotted keys through when |
| 7 | +`prog_base` was an empty string. |
| 8 | + |
| 9 | +### Problem |
| 10 | + |
| 11 | +```python |
| 12 | +# Python: "refinement".startswith("") is True — every string starts with "" |
| 13 | +scope_matches = leading_scope.startswith(prog_base) or prog_base.startswith(leading_scope) |
| 14 | +``` |
| 15 | + |
| 16 | +When `program_name` was not passed to `_inject_user_params`, `prog_base` |
| 17 | +defaulted to `""`, so `leading_scope.startswith("")` was always `True` and |
| 18 | +every dotted key bypassed the filter — exactly the bug the fix was meant to prevent. |
| 19 | + |
| 20 | +### Fix |
| 21 | + |
| 22 | +Added three guards to the scope-matching expression: |
| 23 | + |
| 24 | +```python |
| 25 | +scope_matches = ( |
| 26 | + bool(prog_base) and len(prog_base) >= 4 and |
| 27 | + (leading_scope == prog_base or |
| 28 | + (leading_scope.startswith(prog_base) and len(prog_base) >= 4) or |
| 29 | + (prog_base.startswith(leading_scope) and len(leading_scope) >= 4)) |
| 30 | +) |
| 31 | +``` |
| 32 | + |
| 33 | +- `bool(prog_base)` — empty string immediately fails; no keys injected |
| 34 | +- `len(prog_base) >= 4` — prevents one- or two-character accidental prefix matches |
| 35 | +- Prefix matching handles `refine` ↔ `refinement` in both directions |
| 36 | + |
| 37 | +**Decision table:** |
| 38 | + |
| 39 | +| Program | Key | Result | |
| 40 | +|---|---|---| |
| 41 | +| `phenix.refine` | `refinement.main.number_of_macro_cycles` | **INJECT** (`refine` ⊂ `refinement`) | |
| 42 | +| `phenix.ligandfit` | `refinement.main.number_of_macro_cycles` | **SKIP** (no overlap) | |
| 43 | +| `phenix.autosol` | `autosol.atom_type` | **INJECT** (exact match) | |
| 44 | +| any | `general.nproc` | **INJECT** (universal scope) | |
| 45 | +| (empty string) | `refinement.*` | **SKIP** (`bool("")` fails immediately) | |
| 46 | + |
| 47 | +### Files changed |
| 48 | + |
| 49 | +| File | Change | |
| 50 | +|------|--------| |
| 51 | +| `programs/ai_agent.py` | `_inject_user_params`: three-part guard on scope-matching expression | |
| 52 | +| `tests/tst_audit_fixes.py` | 1 new S2k guard test (123 total) | |
| 53 | + |
| 54 | +--- |
| 55 | + |
| 56 | +## Version 112.47 (S2k — _inject_user_params Program-Scoped Filtering) |
| 57 | + |
| 58 | +Extends `_inject_user_params` to accept a `program_name` parameter and filter |
| 59 | +dotted PHIL keys to only inject parameters that belong to the target program's |
| 60 | +scope. |
| 61 | + |
| 62 | +### Problem |
| 63 | + |
| 64 | +After the server correctly built `phenix.ligandfit ... general.nproc=4`, the |
| 65 | +client's `_inject_user_params` scanned the guidelines string, found |
| 66 | +`refinement.main.number_of_macro_cycles=2` (from an earlier user directive), |
| 67 | +and appended it unconditionally: |
| 68 | + |
| 69 | +``` |
| 70 | +[inject_user_params] appended: refinement.main.number_of_macro_cycles=2 |
| 71 | +Final command: phenix.ligandfit ... general.nproc=4 refinement.main.number_of_macro_cycles=2 |
| 72 | +``` |
| 73 | + |
| 74 | +The command was clean when it left the server; the client contaminated it. |
| 75 | + |
| 76 | +### Fix |
| 77 | + |
| 78 | +`_inject_user_params(self, command, guidelines, program_name='')` now |
| 79 | +classifies each extracted dotted key before injecting: |
| 80 | + |
| 81 | +```python |
| 82 | +_UNIVERSAL_SCOPES = {'general', 'output', 'job', 'data_manager', 'nproc'} |
| 83 | +prog_base = program_name.replace('phenix.', '').lower() |
| 84 | + |
| 85 | +for key in extracted_keys: |
| 86 | + if '.' in key: |
| 87 | + leading_scope = key.split('.')[0].lower() |
| 88 | + if leading_scope not in _UNIVERSAL_SCOPES and not scope_matches(leading_scope, prog_base): |
| 89 | + skipped.append(key) |
| 90 | + continue |
| 91 | + inject(key) |
| 92 | +``` |
| 93 | + |
| 94 | +Universal scopes (`general`, `output`, etc.) are always injected. All other |
| 95 | +dotted keys are only injected when their leading scope matches the program name. |
| 96 | + |
| 97 | +**Note:** v112.47 contained the empty-`prog_base` bug fixed in v112.48. |
| 98 | + |
| 99 | +### Files changed |
| 100 | + |
| 101 | +| File | Change | |
| 102 | +|------|--------| |
| 103 | +| `programs/ai_agent.py` | `_inject_user_params` rewritten with `program_name` parameter and scope filter; call site updated to pass `program_name` | |
| 104 | +| `tests/tst_audit_fixes.py` | 1 new S2k test (122 total) | |
| 105 | + |
| 106 | +--- |
| 107 | + |
| 108 | +## Version 112.46 (S2k — _inject_user_params STOP Guard) |
| 109 | + |
| 110 | +Prevents `_inject_user_params` from running at all when the command is `STOP`. |
| 111 | + |
| 112 | +### Problem |
| 113 | + |
| 114 | +The `_inject_user_params` call site had no guard for STOP commands. A user |
| 115 | +directive containing `refinement.main.number_of_macro_cycles=2` would cause |
| 116 | +the function to emit `STOP refinement.main.number_of_macro_cycles=2`, which |
| 117 | +then failed command validation. |
| 118 | + |
| 119 | +### Fix |
| 120 | + |
| 121 | +```python |
| 122 | +if command and command != 'No command generated.' and \ |
| 123 | + command.strip().split()[0] != 'STOP': |
| 124 | + command = self._inject_user_params(command, guidelines, program_name) |
| 125 | +``` |
| 126 | + |
| 127 | +STOP commands bypass `_inject_user_params` entirely. |
| 128 | + |
| 129 | +### Files changed |
| 130 | + |
| 131 | +| File | Change | |
| 132 | +|------|--------| |
| 133 | +| `programs/ai_agent.py` | Guard added at `_inject_user_params` call site | |
| 134 | + |
| 135 | +--- |
| 136 | + |
| 137 | +## Version 112.45 (S2j — program_registry Passthrough Removal) |
| 138 | + |
| 139 | +Removes the dotted-key passthrough in `program_registry.py` that allowed |
| 140 | +LLM strategy parameters to leak across program boundaries. |
| 141 | + |
| 142 | +### Problem |
| 143 | + |
| 144 | +`_build_command_from_registry` had a catchall passthrough: any key in the |
| 145 | +LLM strategy dict that contained a dot was appended to the command verbatim. |
| 146 | +This meant `refinement.main.number_of_macro_cycles=2` from the LLM's strategy |
| 147 | +for one program was silently included in the command for any program. |
| 148 | + |
| 149 | +### Fix |
| 150 | + |
| 151 | +Replaced the unconditional passthrough with an allowlist. A strategy key |
| 152 | +is now accepted only when it: |
| 153 | + |
| 154 | +1. Appears in `strategy_flags` for this specific program, or |
| 155 | +2. Is in `KNOWN_PHIL_SHORT_NAMES` (`nproc`, `twin_law`, `unit_cell`, etc.), or |
| 156 | +3. Contains a literal `=` sign (already a complete PHIL assignment) |
| 157 | + |
| 158 | +Dotted-path keys that pass none of these tests are logged as `DROPPED` and |
| 159 | +discarded. User-supplied dotted overrides reach the command through |
| 160 | +`_inject_user_params` instead (subject to the program-scope filter added in |
| 161 | +v112.47/v112.48). |
| 162 | + |
| 163 | +### Files changed |
| 164 | + |
| 165 | +| File | Change | |
| 166 | +|------|--------| |
| 167 | +| `agent/program_registry.py` | Passthrough logic replaced with allowlist; dropped keys logged | |
| 168 | +| `tests/tst_audit_fixes.py` | 1 new S2j code test (121 total) | |
| 169 | + |
| 170 | +--- |
| 171 | + |
| 172 | +## Version 112.44 (S2j — Cross-Program Strategy Contamination: Prompt Fix) |
| 173 | + |
| 174 | +Fixes the LLM prompt so the model only emits strategy parameters that apply to |
| 175 | +the program it is currently selecting. |
| 176 | + |
| 177 | +### Root cause |
| 178 | + |
| 179 | +Three places in the prompt instructed the LLM to extract user parameters into |
| 180 | +the `strategy` field with no program-specificity qualifier. The LLM obediently |
| 181 | +included refinement parameters (`refinement.main.number_of_macro_cycles=2`) in |
| 182 | +the strategy for every program it selected — including `phenix.ligandfit` and |
| 183 | +even `STOP`. |
| 184 | + |
| 185 | +### Fixes (knowledge/prompts_hybrid.py) |
| 186 | + |
| 187 | +**User advice extraction block** (was "Extract any specific parameters from the |
| 188 | +user advice and include them in your strategy field"): |
| 189 | + |
| 190 | +> Extract parameters **ONLY when they apply to the program you are currently |
| 191 | +> selecting.** Do NOT include parameters for a different program. For example, |
| 192 | +> if selecting `phenix.ligandfit` and the user mentioned |
| 193 | +> `refinement.main.number_of_macro_cycles`, do NOT include that parameter — |
| 194 | +> it applies to `phenix.refine`, not `phenix.ligandfit`. |
| 195 | +
|
| 196 | +**OUTPUT FORMAT strategy field** — Added CRITICAL STRATEGY RULE block: |
| 197 | + |
| 198 | +> Strategy keys must ONLY contain parameters valid for the selected program. |
| 199 | +> NEVER include parameters from a different program. When stop=true, strategy |
| 200 | +> must be empty {}. |
| 201 | +
|
| 202 | +**IMPORTANT RULES** — Added rule 6: |
| 203 | + |
| 204 | +> Strategy is program-specific: never put parameters for program X in a |
| 205 | +> strategy for program Y. |
| 206 | +
|
| 207 | +### Files changed |
| 208 | + |
| 209 | +| File | Change | |
| 210 | +|------|--------| |
| 211 | +| `knowledge/prompts_hybrid.py` | User advice extraction, OUTPUT FORMAT strategy, IMPORTANT RULES rule 6 | |
| 212 | +| `tests/tst_audit_fixes.py` | 1 new S2j prompt test (120 total) | |
| 213 | + |
| 214 | +--- |
| 215 | + |
| 216 | +## Version 112.43 (S2i — STOP Command Normalization, Root Cause Fix) |
| 217 | + |
| 218 | +Fixes the root cause of `STOP refinement.main.number_of_macro_cycles=2` being |
| 219 | +generated as a command. |
| 220 | + |
| 221 | +### Root cause analysis |
| 222 | + |
| 223 | +The LLM's OUTPUT FORMAT schema has no `command` field — only `program`, |
| 224 | +`strategy`, `files`, `stop`. When the LLM returned: |
| 225 | + |
| 226 | +```json |
| 227 | +{"program": "STOP", "strategy": {"refinement.main.number_of_macro_cycles": 2}, "stop": false} |
| 228 | +``` |
| 229 | + |
| 230 | +The PLAN node normalized `program` to `"STOP"` but never set |
| 231 | +`intent["stop"] = True`. The BUILD node then saw `stop=False`, fell through the |
| 232 | +STOP guard, and assembled the strategy flags onto the command as normal |
| 233 | +arguments, producing `STOP refinement.main.number_of_macro_cycles=2`, which |
| 234 | +then failed validation as "not a recognized phenix program." |
| 235 | + |
| 236 | +### Fix 1 — PLAN node (agent/graph_nodes.py) |
| 237 | + |
| 238 | +When `chosen_program == "STOP"`, explicitly set `intent["stop"] = True`: |
| 239 | + |
| 240 | +```python |
| 241 | +if chosen_program == "STOP" or (intent.get("stop") and |
| 242 | + chosen_program not in valid_programs): |
| 243 | + chosen_program = "STOP" |
| 244 | + intent["program"] = "STOP" |
| 245 | + intent["stop"] = True # NEW: always set stop=True when program is STOP |
| 246 | +``` |
| 247 | + |
| 248 | +### Fix 2 — BUILD node (defence in depth) |
| 249 | + |
| 250 | +Both `_build_with_new_builder` and the legacy `build()` function now short-circuit |
| 251 | +when either `intent["stop"]` is True or `intent["program"]` is `"STOP"`: |
| 252 | + |
| 253 | +```python |
| 254 | +if intent.get("stop") or intent.get("program") == "STOP": |
| 255 | + state = _log(state, "BUILD: Stop requested, no command needed") |
| 256 | + return {**state, "command": "STOP"} |
| 257 | +``` |
| 258 | + |
| 259 | +This means the BUILD node never sees strategy flags for STOP, regardless of |
| 260 | +how the LLM filled the `stop` boolean. |
| 261 | + |
| 262 | +### Files changed |
| 263 | + |
| 264 | +| File | Change | |
| 265 | +|------|--------| |
| 266 | +| `agent/graph_nodes.py` | PLAN: `intent["stop"] = True` when program is STOP; BUILD: early-exit guard in both build paths | |
| 267 | +| `tests/tst_audit_fixes.py` | 4 new S2i tests (119 total) | |
| 268 | + |
| 269 | +--- |
| 270 | + |
| 271 | +## Version 112.42 (S2h — validation_cryoem DataManager PHIL Fix) |
| 272 | + |
| 273 | +Resolves a PHIL argument error in `phenix.validation_cryoem` when run with |
| 274 | +DataManager-style file arguments. |
| 275 | + |
| 276 | +### Problem |
| 277 | + |
| 278 | +`phenix.validation_cryoem` expected PHIL-scoped arguments |
| 279 | +(`input.pdb.file_name=model.pdb`) but the command builder was emitting |
| 280 | +positional DataManager style (`model.pdb map.ccp4`), causing a PHIL parse |
| 281 | +error on launch. |
| 282 | + |
| 283 | +### Fix |
| 284 | + |
| 285 | +Added the correct PHIL argument template to `programs.yaml` for |
| 286 | +`phenix.validation_cryoem`, matching the argument style the program actually |
| 287 | +accepts. The command builder picks this up automatically through the standard |
| 288 | +invariants pipeline. |
| 289 | + |
| 290 | +### Files changed |
| 291 | + |
| 292 | +| File | Change | |
| 293 | +|------|--------| |
| 294 | +| `knowledge/programs.yaml` | `phenix.validation_cryoem`: correct PHIL argument template | |
| 295 | +| `tests/tst_audit_fixes.py` | 2 new S2h tests (115 total) | |
| 296 | + |
| 297 | +--- |
| 298 | + |
| 299 | +## Version 112.41 (S2g — map_correlations Conditional Resolution) |
| 300 | + |
| 301 | +Fixes `phenix.map_correlations` being handed a `resolution=` argument it does |
| 302 | +not accept when running in map-vs-map mode. |
| 303 | + |
| 304 | +### Problem |
| 305 | + |
| 306 | +The command builder always injected `resolution=` from the session context |
| 307 | +into `map_correlations` commands. In model-vs-map mode this is harmless; in |
| 308 | +map-vs-map mode (used by the placement probe) the program does not accept a |
| 309 | +resolution argument and exits with an error. |
| 310 | + |
| 311 | +### Fix |
| 312 | + |
| 313 | +Added a conditional resolution invariant to `map_correlations` in |
| 314 | +`programs.yaml`: resolution is only included when a model file is present |
| 315 | +in the command's file list. |
| 316 | + |
| 317 | +### Files changed |
| 318 | + |
| 319 | +| File | Change | |
| 320 | +|------|--------| |
| 321 | +| `knowledge/programs.yaml` | `phenix.map_correlations`: conditional resolution invariant | |
| 322 | +| `tests/tst_audit_fixes.py` | 2 new S2g tests (113 total) | |
| 323 | + |
| 324 | +--- |
| 325 | + |
| 326 | +## Version 112.40 (S2f — validation_cryoem Auto-Resolution) |
| 327 | + |
| 328 | +Adds automatic resolution filling for `phenix.validation_cryoem`, which |
| 329 | +requires an explicit `resolution=` argument and previously had to rely on the |
| 330 | +user supplying it. |
| 331 | + |
| 332 | +### Problem |
| 333 | + |
| 334 | +`phenix.validation_cryoem` exited with "resolution not provided" when the |
| 335 | +agent did not include a resolution argument. The session always has the |
| 336 | +resolution from a prior `mtriage` run, but no invariant was wiring it through. |
| 337 | + |
| 338 | +### Fix |
| 339 | + |
| 340 | +Added a `requires_resolution` invariant to `phenix.validation_cryoem` in |
| 341 | +`programs.yaml`. The command builder already handles this invariant for other |
| 342 | +programs; the fix was purely a YAML addition. |
| 343 | + |
| 344 | +### Files changed |
| 345 | + |
| 346 | +| File | Change | |
| 347 | +|------|--------| |
| 348 | +| `knowledge/programs.yaml` | `phenix.validation_cryoem`: `requires_resolution: true` invariant | |
| 349 | +| `tests/tst_audit_fixes.py` | 2 new S2f tests (111 total) | |
| 350 | + |
| 351 | +--- |
| 352 | + |
3 | 353 | ## Version 112.36 (S2e — after_program Directive Correctly Suppresses Placement Probe) |
4 | 354 |
|
5 | 355 | Fixes a regression introduced by S2b (v112.33). The S2b fix made |
|
0 commit comments