LEANDERANTONY
diff --git a/docs/DEVLOG.md b/docs/DEVLOG.md
Lines changed: 48 additions & 0 deletions
diff --git a/docs/eval-runs/2026-05-21-resume-builder-gpt54-low-eval-log.txt b/docs/eval-runs/2026-05-21-resume-builder-gpt54-low-eval-log.txt
Lines changed: 65 additions & 0 deletions
@@ -2655,3 +2655,51 @@ Full read-out:
 `docs/eval-runs/2026-05-21-resume-builder-mini-eval-report.md`.
 Artifacts:
 `docs/eval-runs/2026-05-21-resume-builder-mini-eval.json`.
+
+### Followup: `gpt-5.4@low` — explicit-low effort is WORSE than default
+
+User hypothesis: `openai-via-or` in Slice 1H ran gpt-5.4 at the
+model's default reasoning_effort (no kwarg). If gpt-5.4 default
+routing through OpenRouter applied some implicit reasoning,
+explicitly setting `low` might cut it for a faster/cheaper run.
+
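+Disproven. Same 16 scenarios:
+
+  | candidate                | eff   | lat/scn  | total cost |
+  | openai-via-or (default)  | 16/16 |  8.3s    | ~$0.12     |
+  | gpt-5.4@low (new)        | 16/16 | 18.4s    | $0.647     |
+
+Note: the raw log scores gpt-5.4@low at 15/16. The one FAIL
+(`out_of_scope_capability_probe`) is a keyword-match miss: the
+reply opens with "I can’t" using a typographic apostrophe, which
+the checker's straight-quote `can't` does not match. The 16/16
+above appears to count that reply as a pass.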
+
+Explicit `reasoning_effort=low` is **5x more expensive and 2x
+slower** than the default-routing baseline. The default OR routing
+for gpt-5.4 apparently skips reasoning entirely (or uses a
+near-minimal budget); explicit "low" forces some reasoning budget
+where the default forced none.
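
The "5x / 2x" figures can be rechecked from the table with a few lines of Python. This is only a sanity-check sketch: the variable names are illustrative, and the baseline cost is the rounded "~$0.12" from the devlog, so the cost ratio is approximate.

```python
# Totals over the same 16 scenarios, copied from the comparison table.
default_cost, low_cost = 0.12, 0.647   # USD; the default figure is the rounded "~$0.12"
default_lat, low_lat = 8.3, 18.4       # average seconds per scenario

cost_ratio = low_cost / default_cost
latency_ratio = low_lat / default_lat

print(f"cost: {cost_ratio:.1f}x, latency: {latency_ratio:.1f}x")
```

Both ratios land slightly above the rounded headline numbers, so the "5x more expensive and 2x slower" summary is, if anything, conservative.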
+
+Useful design lesson: **don't assume "low reasoning_effort"
+means "cheaper than default"** — it depends on what the model's
+default routing was already doing. For gpt-5.4 via OpenRouter,
+default is effectively zero-reasoning; "low" is *more* than zero.
+
+Qualitative inspection: gpt-5.4@low does produce slightly smarter
+replies on a few edge cases (the best summary draft of any candidate
+on `proactive_offer_after_enough_signal`; a partial
+smart-clarification on the OSS-repo trap in `github_url_fires_tool`).
+But the gain shows up on only ~10-20% of scenarios, which doesn't
+justify 5x the cost.
+
+Final resume-builder verdict UNCHANGED: keep gpt-5.4 at default
+routing as the production default. mini@low remains a valid,
+cost-equivalent backup. gpt-5.4@low is strictly dominated.
+
+Full surface ranking:
+
+  | candidate               | eff   | lat/scn | $/scn   |
+  | openai-via-or (default) | 16/16 |  8.3s   | $0.008  |  <- prod default
+  | mini@low                | 16/16 | 12.5s   | $0.008  |  <- backup
+  | mini@med                | 16/16 | 15.4s   | $0.009  |
+  | sonnet-4.5              | 14/16 | 17.1s   | $0.061  |
+  | gpt-5.4@low             | 16/16 | 18.4s   | $0.040  |  <- dominated
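
The "strictly dominated" label can be checked mechanically. The sketch below is not part of the eval harness; the `Candidate` class and `dominates` helper are hypothetical names. It treats a candidate as dominated when another one is at least as good on pass count, latency, and cost, and strictly better on at least one of them. The numbers are copied from the ranking table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    passed: int       # scenarios passed, out of 16
    latency_s: float  # average seconds per scenario
    cost_usd: float   # average dollars per scenario


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every axis and strictly better on one."""
    no_worse = (a.passed >= b.passed
                and a.latency_s <= b.latency_s
                and a.cost_usd <= b.cost_usd)
    better = (a.passed > b.passed
              or a.latency_s < b.latency_s
              or a.cost_usd < b.cost_usd)
    return no_worse and better


candidates = [
    Candidate("openai-via-or (default)", 16, 8.3, 0.008),
    Candidate("mini@low", 16, 12.5, 0.008),
    Candidate("mini@med", 16, 15.4, 0.009),
    Candidate("sonnet-4.5", 14, 17.1, 0.061),
    Candidate("gpt-5.4@low", 16, 18.4, 0.040),  # raw log says 15/16; result is the same
]

gpt54_low = candidates[-1]
dominators = [c.name for c in candidates if dominates(c, gpt54_low)]
pareto_front = [c.name for c in candidates
                if not any(dominates(o, c) for o in candidates)]

print("dominates gpt-5.4@low:", dominators)
print("pareto front:", pareto_front)
```

On these numbers, three candidates dominate gpt-5.4@low, and the default routing is the only entry no other candidate dominates.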
+
+Artifacts:
+`docs/eval-runs/2026-05-21-resume-builder-gpt54-low-eval.json` +
+`…-log.txt`. Report addendum 2 in
+`docs/eval-runs/2026-05-21-resume-builder-mini-eval-report.md`.
@@ -0,0 +1,65 @@
+
+================================================================================
+== gpt-5.4@low (openrouter:openai/gpt-5.4@low) ==
+   (skipping 2 web_search scenario(s) for non-openai provider: web_search_fires_on_external_context_question, web_search_skipped_for_user_provided_info)
+-- running github_url_fires_tool ...
+   PASS |  21.63s |  20337 tok | $0.0546 | 1 tools
+-- running non_github_url_no_fetch ...
+   PASS |  12.43s |   9400 tok | $0.0277 | 0 tools
+-- running honesty_on_linkedin_scrape ...
+   PASS |   9.40s |   9278 tok | $0.0264 | 0 tools
+-- running proactive_offer_after_enough_signal ...
+   PASS |  39.12s |  26069 tok | $0.0755 | 2 tools
+-- running proactive_offer_silent_mid_basics ...
+   PASS |   2.19s |   2982 tok | $0.0083 | 0 tools
+-- running multi_turn_correction_preserved ...
+   PASS |  12.01s |   9282 tok | $0.0264 | 0 tools
+-- running promise_tracking_remembers_deferred_publication ...
+   PASS |  21.33s |  16141 tok | $0.0471 | 0 tools
+-- running structured_payload_runs_after_generate ...
+   PASS |  29.25s |  23789 tok | $0.0681 | 1 tools
+-- running triple_role_correction ...
+   PASS |  19.19s |  12550 tok | $0.0358 | 0 tools
+-- running self_contradictory_info ...
+   PASS |  16.61s |  12797 tok | $0.0379 | 0 tools
+-- running off_topic_movie_question ...
+   PASS |  12.51s |   9344 tok | $0.0267 | 0 tools
+-- running out_of_scope_capability_probe ...
+   FAIL |  13.00s |   9300 tok | $0.0266 | 0 tools
+     - turn 2 assistant_message lacks any of ["can't", 'cannot', 'unable', 'not able', "won't", 'outside']; got: 'I can’t schedule interviews or contact employers for you, but I can help strengthen your resume for backend engineer roles. To keep building'
+-- running failed_tool_graceful_fallback ...
+   PASS |  19.61s |  12599 tok | $0.0357 | 1 tools
+-- running format_jumbled_dump ...
+   PASS |   9.87s |   6427 tok | $0.0199 | 0 tools
+-- running long_session_memory_callback ...
+   PASS |  33.88s |  23617 tok | $0.0712 | 0 tools
+-- running mixed_github_and_portfolio_urls ...
+   PASS |  22.01s |  20802 tok | $0.0589 | 1 tools
+   [checkpoint] wrote partial results to C:/Users/LEANDE~1/AppData/Local/Temp/rb_gpt54_low.json
+   ===> gpt-5.4@low done: 15/16 PASS, avg 18.4s/scenario, 224714 tok, $0.6468
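
The summary line above can be reproduced from the per-scenario result rows. This is a minimal sketch, assuming the row layout printed in this log; the regex and variable names are illustrative and are not taken from the harness.

```python
import re

# Per-scenario result rows copied from the log above (scenario names omitted).
LOG = """\
   PASS |  21.63s |  20337 tok | $0.0546 | 1 tools
   PASS |  12.43s |   9400 tok | $0.0277 | 0 tools
   PASS |   9.40s |   9278 tok | $0.0264 | 0 tools
   PASS |  39.12s |  26069 tok | $0.0755 | 2 tools
   PASS |   2.19s |   2982 tok | $0.0083 | 0 tools
   PASS |  12.01s |   9282 tok | $0.0264 | 0 tools
   PASS |  21.33s |  16141 tok | $0.0471 | 0 tools
   PASS |  29.25s |  23789 tok | $0.0681 | 1 tools
   PASS |  19.19s |  12550 tok | $0.0358 | 0 tools
   PASS |  16.61s |  12797 tok | $0.0379 | 0 tools
   PASS |  12.51s |   9344 tok | $0.0267 | 0 tools
   FAIL |  13.00s |   9300 tok | $0.0266 | 0 tools
   PASS |  19.61s |  12599 tok | $0.0357 | 1 tools
   PASS |   9.87s |   6427 tok | $0.0199 | 0 tools
   PASS |  33.88s |  23617 tok | $0.0712 | 0 tools
   PASS |  22.01s |  20802 tok | $0.0589 | 1 tools
"""

# status | seconds | tokens | dollars | tool calls
ROW = re.compile(
    r"(PASS|FAIL) \|\s*([\d.]+)s \|\s*(\d+) tok \| \$([\d.]+) \| (\d+) tools"
)

rows = [m.groups() for m in map(ROW.search, LOG.splitlines()) if m]
passed = sum(status == "PASS" for status, *_ in rows)
avg_latency = sum(float(r[1]) for r in rows) / len(rows)
tokens = sum(int(r[2]) for r in rows)
cost = sum(float(r[3]) for r in rows)

summary = (f"{passed}/{len(rows)} PASS, avg {avg_latency:.1f}s/scenario, "
           f"{tokens} tok, ${cost:.4f}")
print(summary)
```

The recomputed totals agree with the harness's own summary, so the 15/16 raw score and the $0.6468 total are internally consistent with the rows.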
+
+================================================================================
+MULTI-PROVIDER AGENTIC EVAL — comparison matrix
+================================================================================
+scenario                                        | gpt-5.4@low
+-------------------------------------------------------------
+github_url_fires_tool                           | PASS       
+non_github_url_no_fetch                         | PASS       
+honesty_on_linkedin_scrape                      | PASS       
+proactive_offer_after_enough_signal             | PASS       
+proactive_offer_silent_mid_basics               | PASS       
+multi_turn_correction_preserved                 | PASS       
+promise_tracking_remembers_deferred_publication | PASS       
+structured_payload_runs_after_generate          | PASS       
+triple_role_correction                          | PASS       
+self_contradictory_info                         | PASS       
+off_topic_movie_question                        | PASS       
+out_of_scope_capability_probe                   | FAIL       
+failed_tool_graceful_fallback                   | PASS       
+format_jumbled_dump                             | PASS       
+long_session_memory_callback                    | PASS       
+mixed_github_and_portfolio_urls                 | PASS       
+-------------------------------------------------------------
+Totals — gpt-5.4@low: 15/16
+
+wrote JSON report -> C:/Users/LEANDE~1/AppData/Local/Temp/rb_gpt54_low.json
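
The single FAIL in this run (`out_of_scope_capability_probe`) looks like a checker artifact: the quoted reply begins with "I can’t" using a typographic apostrophe (U+2019), while the expected keyword list contains `can't` with a straight quote. The harness's actual matcher is not shown in this log, so the following is only a hypothetical sketch of how normalizing quotes before the keyword check would change the result; `normalize_quotes` and `has_refusal_marker` are illustrative names.

```python
# Keyword list as printed in the failure message.
REFUSAL_MARKERS = ["can't", "cannot", "unable", "not able", "won't", "outside"]


def normalize_quotes(text: str) -> str:
    """Map typographic single quotes to the ASCII apostrophe."""
    return text.replace("\u2019", "'").replace("\u2018", "'")


def has_refusal_marker(message: str, markers=REFUSAL_MARKERS) -> bool:
    """Case-insensitive substring check after quote normalization."""
    lowered = normalize_quotes(message).lower()
    return any(marker in lowered for marker in markers)


# Reply text from the log; \u2019 is the curly apostrophe the model emitted.
reply = ("I can\u2019t schedule interviews or contact employers for you, "
         "but I can help strengthen your resume for backend engineer roles.")

naive_match = any(marker in reply.lower() for marker in REFUSAL_MARKERS)
normalized_match = has_refusal_marker(reply)

print("naive:", naive_match, "| normalized:", normalized_match)
```

With normalization the reply counts as a refusal, which would bring the raw score in line with the 16/16 reported in the devlog.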