|
7 | 7 | "source": [ |
8 | 8 | "# 4. OpenAI Video Target\n", |
9 | 9 | "\n", |
10 | | - "This example shows how to use the video target to create a video from a text prompt.\n", |
| 10 | + "`OpenAIVideoTarget` supports three modes:\n", |
| 11 | + "- **Text-to-video**: Generate a video from a text prompt.\n", |
| 12 | + "- **Remix**: Create a variation of an existing video (using `video_id` from a prior generation).\n", |
| 13 | + "- **Text+Image-to-video**: Use an image as the first frame of the generated video.\n", |
11 | 14 | "\n", |
12 | 15 | "Note that the video scorer requires `opencv`, which is not a default PyRIT dependency. You need to install it manually or using `pip install pyrit[opencv]`." |
13 | 16 | ] |
14 | 17 | }, |
| 18 | + { |
| 19 | + "cell_type": "markdown", |
| 20 | + "id": "1", |
| 21 | + "metadata": {}, |
| 22 | + "source": [ |
| 23 | + "## Text-to-Video\n", |
| 24 | + "\n", |
| 25 | + "This example shows the simplest mode: generating video from text prompts, with scoring." |
| 26 | + ] |
| 27 | + }, |
15 | 28 | { |
16 | 29 | "cell_type": "code", |
17 | 30 | "execution_count": null, |
18 | | - "id": "1", |
| 31 | + "id": "2", |
19 | 32 | "metadata": {}, |
20 | 33 | "outputs": [ |
21 | 34 | { |
|
53 | 66 | }, |
54 | 67 | { |
55 | 68 | "cell_type": "markdown", |
56 | | - "id": "2", |
| 69 | + "id": "3", |
57 | 70 | "metadata": {}, |
58 | 71 | "source": [ |
59 | 72 | "## Generating and scoring a video:\n", |
60 | 73 | "\n", |
61 | | - "Using the video target you can send prompts to generate a video. The video scorer can evaluate the video content itself. Note this section is simply scoring the **video** not the audio. " |
| 74 | + "Using the video target, you can send prompts to generate a video. The video scorer can evaluate the video content itself. Note that this section scores only the **video**, not the audio." |
62 | 75 | ] |
63 | 76 | }, |
64 | 77 | { |
65 | 78 | "cell_type": "code", |
66 | 79 | "execution_count": null, |
67 | | - "id": "3", |
| 80 | + "id": "4", |
68 | 81 | "metadata": {}, |
69 | 82 | "outputs": [ |
70 | 83 | { |
|
448 | 461 | }, |
449 | 462 | { |
450 | 463 | "cell_type": "markdown", |
451 | | - "id": "4", |
| 464 | + "id": "5", |
452 | 465 | "metadata": {}, |
453 | 466 | "source": [ |
454 | 467 | "## Scoring video and audio **together**:\n", |
|
461 | 474 | { |
462 | 475 | "cell_type": "code", |
463 | 476 | "execution_count": null, |
464 | | - "id": "5", |
| 477 | + "id": "6", |
465 | 478 | "metadata": {}, |
466 | 479 | "outputs": [ |
467 | 480 | { |
|
661 | 674 | ")\n", |
662 | 675 | "\n", |
663 | 676 | "for result in results:\n", |
664 | | - " await ConsoleAttackResultPrinter().print_result_async(result=result, include_auxiliary_scores=True) # type: ignore" |
| 677 | + " await ConsoleAttackResultPrinter().print_result_async(result=result, include_auxiliary_scores=True) # type: ignore\n", |
| 678 | + "\n", |
| 679 | + "# Capture video_id from the first result for use in the remix section below\n", |
| 680 | + "video_id = results[0].last_response.prompt_metadata[\"video_id\"]\n", |
| 681 | + "print(f\"Video ID for remix: {video_id}\")" |
| 682 | + ] |
| 683 | + }, |
| 684 | + { |
| 685 | + "cell_type": "markdown", |
| 686 | + "id": "7", |
| 687 | + "metadata": {}, |
| 688 | + "source": [ |
| 689 | + "## Remix (Video Variation)\n", |
| 690 | + "\n", |
| 691 | + "Remix creates a variation of an existing video. After any successful generation, the response\n", |
| 692 | + "includes a `video_id` in `prompt_metadata`. Pass this back via `prompt_metadata={\"video_id\": \"<id>\"}` to remix." |
| 693 | + ] |
| 694 | + }, |
| 695 | + { |
| 696 | + "cell_type": "code", |
| 697 | + "execution_count": null, |
| 698 | + "id": "8", |
| 699 | + "metadata": {}, |
| 700 | + "outputs": [], |
| 701 | + "source": [ |
| 702 | + "from pyrit.models import Message, MessagePiece\n", |
| 703 | + "\n", |
| 704 | + "# Remix using the video_id captured from the text-to-video section above\n", |
| 705 | + "remix_piece = MessagePiece(\n", |
| 706 | + " role=\"user\",\n", |
| 707 | + " original_value=\"Make it a watercolor painting style\",\n", |
| 708 | + " prompt_metadata={\"video_id\": video_id},\n", |
| 709 | + ")\n", |
| 710 | + "remix_result = await video_target.send_prompt_async(message=Message([remix_piece])) # type: ignore\n", |
| 711 | + "print(f\"Remixed video: {remix_result[0].message_pieces[0].converted_value}\")" |
| 712 | + ] |
| 713 | + }, |
| 714 | + { |
| 715 | + "cell_type": "markdown", |
| 716 | + "id": "9", |
| 717 | + "metadata": {}, |
| 718 | + "source": [ |
| 719 | + "## Text+Image-to-Video\n", |
| 720 | + "\n", |
| 721 | + "Use an image as the first frame of the generated video. The input image dimensions must match\n", |
| 722 | + "the video resolution (e.g. 1280x720). Pass both a text piece and an `image_path` piece in the same message." |
| 723 | + ] |
| 724 | + }, |
| 725 | + { |
| 726 | + "cell_type": "code", |
| 727 | + "execution_count": null, |
| 728 | + "id": "10", |
| 729 | + "metadata": {}, |
| 730 | + "outputs": [], |
| 731 | + "source": [ |
| 732 | + "import tempfile\n", |
| 733 | + "import uuid\n", |
| 734 | + "\n", |
| 735 | + "from PIL import Image\n", |
| 736 | + "\n", |
| 737 | + "from pyrit.common.path import HOME_PATH\n", |
| 738 | + "\n", |
| 739 | + "# Resize a sample image to match the video resolution (1280x720)\n", |
| 740 | + "sample_image = HOME_PATH / \"assets\" / \"pyrit_architecture.png\"\n", |
| 741 | + "resized = Image.open(sample_image).resize((1280, 720)).convert(\"RGB\")\n", |
| 742 | + "\n", |
| 743 | + "# Save the resized image to a temporary JPEG file (not deleted automatically)\n", |
| 744 | + "tmp = tempfile.NamedTemporaryFile(suffix=\".jpg\", delete=False)\n", |
| 745 | + "resized.save(tmp, format=\"JPEG\")\n", |
| 746 | + "tmp.close()\n", |
| 747 | + "image_path = tmp.name\n", |
| 748 | + "\n", |
| 749 | + "# Send text + image to the video target\n", |
| 750 | + "i2v_target = OpenAIVideoTarget()\n", |
| 751 | + "conversation_id = str(uuid.uuid4())\n", |
| 752 | + "\n", |
| 753 | + "text_piece = MessagePiece(\n", |
| 754 | + " role=\"user\",\n", |
| 755 | + " original_value=\"Animate this image with gentle camera motion\",\n", |
| 756 | + " conversation_id=conversation_id,\n", |
| 757 | + ")\n", |
| 758 | + "image_piece = MessagePiece(\n", |
| 759 | + " role=\"user\",\n", |
| 760 | + " original_value=image_path,\n", |
| 761 | + " converted_value_data_type=\"image_path\",\n", |
| 762 | + " conversation_id=conversation_id,\n", |
| 763 | + ")\n", |
| 764 | + "result = await i2v_target.send_prompt_async(message=Message([text_piece, image_piece])) # type: ignore\n", |
| 765 | + "print(f\"Text+Image-to-video result: {result[0].message_pieces[0].converted_value}\")" |
665 | 766 | ] |
666 | 767 | } |
667 | 768 | ], |
668 | 769 | "metadata": { |
| 770 | + "jupytext": { |
| 771 | + "main_language": "python" |
| 772 | + }, |
669 | 773 | "language_info": { |
670 | 774 | "codemirror_mode": { |
671 | 775 | "name": "ipython", |
|