Commit 947d43a

FEAT: Sora target: support remix, image-to-video (#1341)
Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
1 parent 1973e07 commit 947d43a

8 files changed

Lines changed: 1193 additions & 61 deletions

doc/code/targets/4_openai_video_target.ipynb

Lines changed: 112 additions & 8 deletions
@@ -7,15 +7,28 @@
    "source": [
     "# 4. OpenAI Video Target\n",
     "\n",
-    "This example shows how to use the video target to create a video from a text prompt.\n",
+    "`OpenAIVideoTarget` supports three modes:\n",
+    "- **Text-to-video**: Generate a video from a text prompt.\n",
+    "- **Remix**: Create a variation of an existing video (using `video_id` from a prior generation).\n",
+    "- **Text+Image-to-video**: Use an image as the first frame of the generated video.\n",
     "\n",
     "Note that the video scorer requires `opencv`, which is not a default PyRIT dependency. You need to install it manually or using `pip install pyrit[opencv]`."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "1",
+   "metadata": {},
+   "source": [
+    "## Text-to-Video\n",
+    "\n",
+    "This example shows the simplest mode: generating video from text prompts, with scoring."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "1",
+   "id": "2",
    "metadata": {},
    "outputs": [
     {
@@ -53,18 +66,18 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2",
+   "id": "3",
    "metadata": {},
    "source": [
     "## Generating and scoring a video:\n",
     "\n",
-    "Using the video target you can send prompts to generate a video. The video scorer can evaluate the video content itself. Note this section is simply scoring the **video** not the audio. "
+    "Using the video target you can send prompts to generate a video. The video scorer can evaluate the video content itself. Note this section is simply scoring the **video** not the audio."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "3",
+   "id": "4",
    "metadata": {},
    "outputs": [
     {
@@ -448,7 +461,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4",
+   "id": "5",
    "metadata": {},
    "source": [
     "## Scoring video and audio **together**:\n",
@@ -461,7 +474,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "5",
+   "id": "6",
    "metadata": {},
    "outputs": [
     {
@@ -661,11 +674,102 @@
     ")\n",
     "\n",
     "for result in results:\n",
-    "    await ConsoleAttackResultPrinter().print_result_async(result=result, include_auxiliary_scores=True)  # type: ignore"
+    "    await ConsoleAttackResultPrinter().print_result_async(result=result, include_auxiliary_scores=True)  # type: ignore\n",
+    "\n",
+    "# Capture video_id from the first result for use in the remix section below\n",
+    "video_id = results[0].last_response.prompt_metadata[\"video_id\"]\n",
+    "print(f\"Video ID for remix: {video_id}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7",
+   "metadata": {},
+   "source": [
+    "## Remix (Video Variation)\n",
+    "\n",
+    "Remix creates a variation of an existing video. After any successful generation, the response\n",
+    "includes a `video_id` in `prompt_metadata`. Pass this back via `prompt_metadata={\"video_id\": \"<id>\"}` to remix."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pyrit.models import Message, MessagePiece\n",
+    "\n",
+    "# Remix using the video_id captured from the text-to-video section above\n",
+    "remix_piece = MessagePiece(\n",
+    "    role=\"user\",\n",
+    "    original_value=\"Make it a watercolor painting style\",\n",
+    "    prompt_metadata={\"video_id\": video_id},\n",
+    ")\n",
+    "remix_result = await video_target.send_prompt_async(message=Message([remix_piece]))  # type: ignore\n",
+    "print(f\"Remixed video: {remix_result[0].message_pieces[0].converted_value}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9",
+   "metadata": {},
+   "source": [
+    "## Text+Image-to-Video\n",
+    "\n",
+    "Use an image as the first frame of the generated video. The input image dimensions must match\n",
+    "the video resolution (e.g. 1280x720). Pass both a text piece and an `image_path` piece in the same message."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "10",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import uuid\n",
+    "\n",
+    "# Create a simple test image matching the video resolution (1280x720)\n",
+    "from PIL import Image\n",
+    "\n",
+    "from pyrit.common.path import HOME_PATH\n",
+    "\n",
+    "sample_image = HOME_PATH / \"assets\" / \"pyrit_architecture.png\"\n",
+    "resized = Image.open(sample_image).resize((1280, 720)).convert(\"RGB\")\n",
+    "\n",
+    "import tempfile\n",
+    "\n",
+    "tmp = tempfile.NamedTemporaryFile(suffix=\".jpg\", delete=False)\n",
+    "resized.save(tmp, format=\"JPEG\")\n",
+    "tmp.close()\n",
+    "image_path = tmp.name\n",
+    "\n",
+    "# Send text + image to the video target\n",
+    "i2v_target = OpenAIVideoTarget()\n",
+    "conversation_id = str(uuid.uuid4())\n",
+    "\n",
+    "text_piece = MessagePiece(\n",
+    "    role=\"user\",\n",
+    "    original_value=\"Animate this image with gentle camera motion\",\n",
+    "    conversation_id=conversation_id,\n",
+    ")\n",
+    "image_piece = MessagePiece(\n",
+    "    role=\"user\",\n",
+    "    original_value=image_path,\n",
+    "    converted_value_data_type=\"image_path\",\n",
+    "    conversation_id=conversation_id,\n",
+    ")\n",
+    "result = await i2v_target.send_prompt_async(message=Message([text_piece, image_piece]))  # type: ignore\n",
+    "print(f\"Text+Image-to-video result: {result[0].message_pieces[0].converted_value}\")"
    ]
   }
  ],
  "metadata": {
+  "jupytext": {
+   "main_language": "python"
+  },
   "language_info": {
    "codemirror_mode": {
     "name": "ipython",

doc/code/targets/4_openai_video_target.py

Lines changed: 73 additions & 1 deletion
@@ -11,10 +11,18 @@
 # %% [markdown]
 # # 4. OpenAI Video Target
 #
-# This example shows how to use the video target to create a video from a text prompt.
+# `OpenAIVideoTarget` supports three modes:
+# - **Text-to-video**: Generate a video from a text prompt.
+# - **Remix**: Create a variation of an existing video (using `video_id` from a prior generation).
+# - **Text+Image-to-video**: Use an image as the first frame of the generated video.
 #
 # Note that the video scorer requires `opencv`, which is not a default PyRIT dependency. You need to install it manually or using `pip install pyrit[opencv]`.
 
+# %% [markdown]
+# ## Text-to-Video
+#
+# This example shows the simplest mode: generating video from text prompts, with scoring.
+
 # %%
 from pyrit.executor.attack import (
     AttackExecutor,
@@ -123,3 +131,67 @@
 
 for result in results:
     await ConsoleAttackResultPrinter().print_result_async(result=result, include_auxiliary_scores=True)  # type: ignore
+
+# Capture video_id from the first result for use in the remix section below
+video_id = results[0].last_response.prompt_metadata["video_id"]
+print(f"Video ID for remix: {video_id}")
+
+# %% [markdown]
+# ## Remix (Video Variation)
+#
+# Remix creates a variation of an existing video. After any successful generation, the response
+# includes a `video_id` in `prompt_metadata`. Pass this back via `prompt_metadata={"video_id": "<id>"}` to remix.
+
+# %%
+from pyrit.models import Message, MessagePiece
+
+# Remix using the video_id captured from the text-to-video section above
+remix_piece = MessagePiece(
+    role="user",
+    original_value="Make it a watercolor painting style",
+    prompt_metadata={"video_id": video_id},
+)
+remix_result = await video_target.send_prompt_async(message=Message([remix_piece]))  # type: ignore
+print(f"Remixed video: {remix_result[0].message_pieces[0].converted_value}")
+
+# %% [markdown]
+# ## Text+Image-to-Video
+#
+# Use an image as the first frame of the generated video. The input image dimensions must match
+# the video resolution (e.g. 1280x720). Pass both a text piece and an `image_path` piece in the same message.
+
+# %%
+import uuid
+
+# Create a simple test image matching the video resolution (1280x720)
+from PIL import Image
+
+from pyrit.common.path import HOME_PATH
+
+sample_image = HOME_PATH / "assets" / "pyrit_architecture.png"
+resized = Image.open(sample_image).resize((1280, 720)).convert("RGB")
+
+import tempfile
+
+tmp = tempfile.NamedTemporaryFile(suffix=".jpg", delete=False)
+resized.save(tmp, format="JPEG")
+tmp.close()
+image_path = tmp.name
+
+# Send text + image to the video target
+i2v_target = OpenAIVideoTarget()
+conversation_id = str(uuid.uuid4())
+
+text_piece = MessagePiece(
+    role="user",
+    original_value="Animate this image with gentle camera motion",
+    conversation_id=conversation_id,
+)
+image_piece = MessagePiece(
+    role="user",
+    original_value=image_path,
+    converted_value_data_type="image_path",
+    conversation_id=conversation_id,
+)
+result = await i2v_target.send_prompt_async(message=Message([text_piece, image_piece]))  # type: ignore
+print(f"Text+Image-to-video result: {result[0].message_pieces[0].converted_value}")

pyrit/models/message.py

Lines changed: 51 additions & 0 deletions
@@ -51,6 +51,57 @@ def get_piece(self, n: int = 0) -> MessagePiece:
 
         return self.message_pieces[n]
 
+    def get_pieces_by_type(
+        self,
+        *,
+        data_type: Optional[PromptDataType] = None,
+        original_value_data_type: Optional[PromptDataType] = None,
+        converted_value_data_type: Optional[PromptDataType] = None,
+    ) -> list[MessagePiece]:
+        """
+        Return all message pieces matching the given data type.
+
+        Args:
+            data_type: Alias for converted_value_data_type (for convenience).
+            original_value_data_type: The original_value_data_type to filter by.
+            converted_value_data_type: The converted_value_data_type to filter by.
+
+        Returns:
+            A list of matching MessagePiece objects (may be empty).
+        """
+        effective_converted = converted_value_data_type or data_type
+        results = self.message_pieces
+        if effective_converted:
+            results = [p for p in results if p.converted_value_data_type == effective_converted]
+        if original_value_data_type:
+            results = [p for p in results if p.original_value_data_type == original_value_data_type]
+        return list(results)
+
+    def get_piece_by_type(
+        self,
+        *,
+        data_type: Optional[PromptDataType] = None,
+        original_value_data_type: Optional[PromptDataType] = None,
+        converted_value_data_type: Optional[PromptDataType] = None,
+    ) -> Optional[MessagePiece]:
+        """
+        Return the first message piece matching the given data type, or None.
+
+        Args:
+            data_type: Alias for converted_value_data_type (for convenience).
+            original_value_data_type: The original_value_data_type to filter by.
+            converted_value_data_type: The converted_value_data_type to filter by.
+
+        Returns:
+            The first matching MessagePiece, or None if no match is found.
+        """
+        pieces = self.get_pieces_by_type(
+            data_type=data_type,
+            original_value_data_type=original_value_data_type,
+            converted_value_data_type=converted_value_data_type,
+        )
+        return pieces[0] if pieces else None
+
     @property
     def api_role(self) -> ChatMessageRole:
         """

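Reviewer note on the `get_pieces_by_type` hunk in pyrit/models/message.py: the filter semantics are easier to see in isolation. The sketch below is only an illustration, not PyRIT code. `Piece` is a hypothetical stand-in for `MessagePiece` that carries just the two data-type fields, and the method is rewritten as a free function over a plain list so it runs without PyRIT installed. The filter body is copied from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Piece:
    """Hypothetical stand-in for MessagePiece; only the two type fields matter here."""

    original_value_data_type: str
    converted_value_data_type: str


def get_pieces_by_type(
    pieces: list[Piece],
    *,
    data_type: Optional[str] = None,
    original_value_data_type: Optional[str] = None,
    converted_value_data_type: Optional[str] = None,
) -> list[Piece]:
    # Same filter logic as the method in the diff, written as a free function.
    effective_converted = converted_value_data_type or data_type
    results = pieces
    if effective_converted:
        results = [p for p in results if p.converted_value_data_type == effective_converted]
    if original_value_data_type:
        results = [p for p in results if p.original_value_data_type == original_value_data_type]
    return list(results)


pieces = [
    Piece("text", "text"),
    Piece("image_path", "image_path"),
    Piece("text", "image_path"),  # original was text, converted value is an image path
]

# data_type filters on the converted type only.
images = get_pieces_by_type(pieces, data_type="image_path")

# Filters combine with AND when both an original and a converted type are given.
text_to_image = get_pieces_by_type(pieces, data_type="image_path", original_value_data_type="text")

# When both are passed, converted_value_data_type takes precedence over data_type.
explicit_wins = get_pieces_by_type(pieces, data_type="text", converted_value_data_type="image_path")

# With no filters, the result is a copy of the full list, not the list itself.
unfiltered = get_pieces_by_type(pieces)
```

Two behaviors follow from the `or` on the first line of the body. If a caller passes both `data_type` and `converted_value_data_type` with different values, `data_type` is silently ignored. A call with no filters returns every piece as a new list.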