Naive 1.0 prompts + calibration #1469

Merged: bkorycki merged 3 commits into main from naive-1.0 on Jan 27, 2026

@bkorycki
Contributor

This PR switches the "naive" test, hazard, and benchmark to the new 1.0 naive prompts.

The naive prompts are a 10% subset of the general holdback set. That subset isn't used anywhere else, so we created a new file for it, which I added to SECURITY_NAIVE_PROMPT_SETS.

Important note: security benchmark results are meaningless for now because we don't have 1.0 attack prompts yet. Running a security benchmark today would combine 1.0 naive prompts with 0.5 jailbreaks. I changed the benchmark version to "0.0" in case someone runs it anyway (via baas, or one of us by accident), but maybe we should disable the security benchmark CLI entirely?
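
As a possible middle ground between relabeling the version and disabling the CLI, a run could be refused whenever its prompt sets disagree on version. The following is only a sketch of that idea: `PromptSet`, `MixedPromptVersionsError`, and `check_prompt_versions` are hypothetical names invented for illustration, not part of this repository.

```python
# Hypothetical sketch: none of these names come from the actual codebase.
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptSet:
    """A named prompt set together with its version string."""
    name: str
    version: str


class MixedPromptVersionsError(RuntimeError):
    """Raised when a benchmark would combine prompt sets of different versions."""


def check_prompt_versions(prompt_sets):
    """Return the single version shared by all prompt sets, or raise if they differ."""
    versions = {ps.version for ps in prompt_sets}
    if len(versions) != 1:
        details = ", ".join(f"{ps.name}={ps.version}" for ps in prompt_sets)
        raise MixedPromptVersionsError(f"prompt set versions differ: {details}")
    return versions.pop()
```

Called with a 1.0 naive set and a 0.5 jailbreak set, this check would raise before any prompts are sent, whereas the "0.0" version label only marks the results after the fact.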

@bkorycki bkorycki requested a review from a team as a code owner January 26, 2026 22:48
@bkorycki bkorycki temporarily deployed to Scheduled Testing January 26, 2026 22:48 — with GitHub Actions Inactive
@github-actions

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Contributor

@wpietri wpietri left a comment

Seems like a reasonable response to a weird situation.

@bkorycki bkorycki merged commit ce52d5d into main Jan 27, 2026
2 checks passed
@bkorycki bkorycki deleted the naive-1.0 branch January 27, 2026 00:16
@github-actions github-actions Bot locked and limited conversation to collaborators Jan 27, 2026

3 participants