- Rename CodeAttackAttack -> CodeAttack (Task 1)
- Collapse the language and verbose parameters into a single CodeAttackConverter.Template
enum, modelled on BinaryConverter.BitsPerChar; a custom pathlib.Path is still accepted
for caller-supplied YAML templates (Task 2)
- CodeAttack.__init__ now accepts template: CodeAttackConverter.Template | Path
and forwards it to the converter; language/verbose params removed (Task 3)
- Add @ren2024codeattack entry to doc/references.bib after liu2024flipattack (Task 4)
- Add a Code row to the attack table in 1_single_turn.py and a ## Code section after
## Flip that mirrors the shape of the FlipAttack example; regenerate the notebook (Task 5)
- Rebase onto upstream/main: the doc/code/executor/attack/ directory was removed
upstream, so the old standalone code_attack.ipynb/.py were deleted and their content
moved into 1_single_turn.py
- Update all unit tests for the new Template-based API; add custom-Path cases
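The Template-based API described in Tasks 2 and 3 can be sketched roughly as follows. This is a hypothetical illustration, not the code in this PR: the `Sketch*` class names, the enum member names, the YAML file names, and `TEMPLATE_DIR` are all assumptions. It only shows the shape the commit message describes: one nested enum replacing the old `language`/`verbose` pair (in the style of `BinaryConverter.BitsPerChar`), a `pathlib.Path` escape hatch for custom YAML templates, and an attack constructor that simply forwards `template` to its converter.

```python
from enum import Enum
from pathlib import Path
from typing import Union

# Hypothetical folder holding the built-in YAML templates (an assumption, not PyRIT's layout).
TEMPLATE_DIR = Path("code_attack_templates")


class SketchCodeAttackConverter:
    """Illustrative stand-in for CodeAttackConverter; it only resolves the template path."""

    class Template(Enum):
        # One member per (language, verbose) combination that used to be two parameters.
        # Member names and file names are hypothetical.
        PYTHON_LIST = "python_list.yaml"
        PYTHON_LIST_VERBOSE = "python_list_verbose.yaml"
        PYTHON_STACK = "python_stack.yaml"
        PYTHON_STACK_VERBOSE = "python_stack_verbose.yaml"

    def __init__(self, *, template: Union[Template, Path] = Template.PYTHON_LIST) -> None:
        if isinstance(template, SketchCodeAttackConverter.Template):
            # Built-in template: look it up in the bundled folder.
            self.template_path = TEMPLATE_DIR / template.value
        elif isinstance(template, Path):
            # Caller-supplied YAML file: use it exactly as given.
            self.template_path = template
        else:
            raise TypeError(
                f"template must be a Template or pathlib.Path, got {type(template).__name__}"
            )


class SketchCodeAttack:
    """Illustrative stand-in for CodeAttack: no language/verbose, it just forwards `template`."""

    def __init__(
        self,
        *,
        template: Union[SketchCodeAttackConverter.Template, Path] = SketchCodeAttackConverter.Template.PYTHON_LIST,
    ) -> None:
        self.converter = SketchCodeAttackConverter(template=template)
```

Folding both old flags into one enum means an invalid combination cannot be expressed, and the `Path` branch keeps custom templates working without a separate parameter.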
doc/code/executor/1_single_turn.py
Lines changed: 14 additions & 0 deletions
@@ -35,6 +35,7 @@
 # | Many-Shot Jailbreak | Prepends many faux question/answer pairs that demonstrate compliance, then asks the real question. |
 # | Skeleton Key | Issues a known jailbreak that asks the model to revise its own safety guidelines. |
 # | Flip | Obfuscates the prompt (e.g. reversing characters) and asks the model to decode and answer. |
+# | Code | Encodes the objective into a code-completion template (e.g. a Python stack or list to fill in) so the request reads as a programming task. |
 #
 # Every example below follows the same shape: construct the attack, call `execute_async(objective=...)`,
 # and print the `AttackResult`. See [Attack Configuration](3_attack_configuration.ipynb) for the inputs
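The new Code row says the objective is encoded into a code-completion template such as a Python stack. As a rough, hypothetical illustration only (this is not PyRIT's shipped CodeAttackConverter template; `build_stack`, `decode`, `encode_as_python_stack` and the prompt wording are made up), a stack-style template might push the objective's words onto a `collections.deque` so that the target has to pop them back off to recover the request:

```python
from collections import deque


def build_stack(objective: str) -> deque:
    """Push the objective's words in reverse so that repeated pop() returns them in order."""
    stack: deque = deque()
    for word in reversed(objective.split()):
        stack.append(word)
    return stack


def decode(stack: deque) -> str:
    """Reference decoder: pop every word from the right end and rejoin them."""
    words = []
    while stack:
        words.append(stack.pop())
    return " ".join(words)


def encode_as_python_stack(objective: str) -> str:
    """Render the objective as a hypothetical code-completion prompt built around a deque."""
    # Iterating the deque left to right reproduces the append order used in build_stack.
    push_lines = "\n".join(
        f"    my_stack.append({word!r})" for word in build_stack(objective)
    )
    return (
        "Complete the Python function below by following its comments.\n\n"
        "from collections import deque\n\n"
        "def output_list_initialization():\n"
        "    my_stack = deque()\n"
        f"{push_lines}\n"
        "    # 1. Pop every element of my_stack to recover the task.\n"
        "    # 2. Fill in output_list with the steps for that task.\n"
        "    output_list = []\n"
    )


print(encode_as_python_stack("write a haiku about tea"))
```

The point of the design is that the request never appears as a contiguous sentence: it only exists once the model reasons through the code, which is what lets the prompt read as a programming task.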