AIDC-AI/Awesome-Multi-Image-Generation

Awesome Multi-Image Generation

Timeline

Figure 1: Timeline of Multi-Image Generation Methods. The timeline presents the chronological development of methods organized by their respective release years.

👉 What is This Repo for?

This repository provides a comprehensive collection of resources related to multi-image generation, featuring:

  • A curated list of methods organized by consistency dimensions
  • Categorized datasets for multi-view, character, temporal, and semantic consistency research
  • Benchmarks for evaluating multi-image generation quality across different consistency types

It is designed to help researchers and practitioners explore, compare, and build state-of-the-art multi-image generation systems.

What is Multi-Image Generation?

Multi-Image Generation refers to the task of generating multiple images with inherent correlations and consistency constraints. Unlike traditional single-image generation, multi-image generation requires maintaining coherence across multiple outputs along one or more dimensions, such as geometric structure, identity attributes, temporal continuity, or semantic relationships. This repository collects methods organized by consistency dimensions, reflecting the primary type of coherence each approach aims to achieve.

Figure 2: Example of multi-view consistency. SyncDreamer generates multi-view consistent images from a single input.

Figure 3: Example of character consistency using StoryMaker. The first three rows show a day in the life of an office worker, and the last two rows are based on Before Sunrise.


Original Input → Step forward → Look up to sky → Zoom out

Figure 4: Example of temporal consistency. iMontage generates a sequence of images and maintains temporal consistency across the generated transitions.

Figure 5: Example of semantic consistency. Wan-2.7-Image transforms a single reference image into nine cohesive comic panels.

Methods

Multi-View Consistency

Multi-View Consistency methods generate multiple images of the same 3D object or scene from different viewpoints while maintaining geometric coherence. This is inherently a multi-image task as it requires producing a set of views that correspond to the same underlying 3D structure, with cross-view constraints ensuring consistency across all generated perspectives.

| 🏷️ Name | 📄 Title | 🏛️ Venue | 📅 Date | 💻 Code | 🌐 Demo |
| --- | --- | --- | --- | --- | --- |
| Geometry-Aware RoPE | Geometry-Aware Rotary Position Embedding for Consistent Video World Model | arXiv | 2026-02 | - | - |
| AnchoredDream | AnchoredDream: Zero-Shot 360° Indoor Scene Generation from a Single View via Geometric Grounding | arXiv | 2026-01 | - | - |
| MVRoom | MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models | arXiv | 2025-12 | - | - |
| CAMEO | CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models | arXiv | 2025-12 | GitHub | Demo |
| DT-NVS | DT-NVS: Diffusion Transformers for Novel View Synthesis | arXiv | 2025-11 | - | - |
| GeoMVD | GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction | arXiv | 2025-11 | GitHub | Demo |
| JCDM | Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis | arXiv | 2025-11 | GitHub | - |
| MVCustom | MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion | ICLR | 2025-10 | GitHub | Demo |
| LoomNet | LoomNet: Enhancing Multi-View Image Generation via Latent Space Weaving | arXiv | 2025-07 | - | - |
| MV-AR | Auto-Regressively Generating Multi-View Consistent Images | ICCV | 2025-06 | GitHub | - |
| Generative GS | Generative Gaussian Splatting: Generating 3D Scenes with Video Diffusion Priors | ICCV | 2025-03 | - | Demo |
| MVGD | Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion | CVPR | 2025-01 | - | Demo |
| MEt3R | MEt3R: Measuring Multi-View Consistency in Generated Images | arXiv | 2025-01 | GitHub | Demo |
| Sharp-It | Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation | CVPR | 2024-12 | GitHub | Demo |
| SeMv-3D | SeMv-3D: Towards Concurrency of Semantic and Multi-view Consistency in General Text-to-3D Generation | arXiv | 2024-10 | - | - |
| SV4D | SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency | arXiv | 2024-07 | GitHub | Demo |
| MVG-Splatting | MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification | arXiv | 2024-07 | - | Demo |
| NVS-Solver | NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer | ICLR | 2024-06 | GitHub | - |
| Era3D | Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention | NeurIPS | 2024-05 | GitHub | Demo |
| V3D | V3D: Video Diffusion Models are Effective 3D Generators | arXiv | 2024-03 | GitHub | Demo |
| SV3D | SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion | ECCV | 2024-03 | - | Demo |
| SPAD | SPAD: Spatially Aware Multiview Diffusers | CVPR | 2024-02 | GitHub | Demo |
| Direct2.5 | Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion | CVPR | 2023-11 | GitHub | Demo |
| Zero123++ | Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model | arXiv | 2023-10 | GitHub | Demo |
| ConsistNet | ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion | CVPR | 2023-10 | GitHub | Demo |
| SyncDreamer | SyncDreamer: Generating Multiview-consistent Images from a Single-view Image | ICLR | 2023-09 | GitHub | Demo |
| MVDream | MVDream: Multi-view Diffusion for 3D Generation | arXiv | 2023-08 | GitHub | Demo |
| MVDiffusion | MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion | NeurIPS | 2023-07 | GitHub | Demo |
| Zero-1-to-3 | Zero-1-to-3: Zero-shot One Image to 3D Object | ICCV | 2023-03 | GitHub | Demo |
| Text2Room | Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models | ICCV | 2023-03 | GitHub | Demo |
| DreamBooth3D | DreamBooth3D: Subject-Driven Text-to-3D Generation | arXiv | 2023-03 | - | Demo |
| Texture | TEXTure: Text-Guided Texturing of 3D Shapes | SIGGRAPH | 2023-02 | GitHub | Demo |
| RealFusion | RealFusion: 360° Reconstruction of Any Object from a Single Image | CVPR | 2023-02 | GitHub | Demo |
| DiffDreamer | DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models | ICCV | 2022-11 | - | Demo |
| DreamFusion | DreamFusion: Text-to-3D using 2D Diffusion | ICLR | 2022-09 | - | Demo |
| Infinite Nature | Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image | ICCV | 2020-12 | GitHub | Demo |

⬆️ Back to Top

Character Consistency

Character consistency methods aim to generate images of one or more subjects while preserving their identity and distinguishing features, such as facial attributes. This is inherently a multi-image problem, because the same subject must remain recognizable across different scenes or contexts, and it is widely studied in applications like storyboards and narratives.

| 🏷️ Name | 📄 Title | 🏛️ Venue | 📅 Date | 💻 Code | 🌐 Demo |
| --- | --- | --- | --- | --- | --- |
| Solaris | Solaris: Building a Multiplayer Video World Model in Minecraft | arXiv | 2026-02 | GitHub | Demo |
| DreamingComics | DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models | arXiv | 2025-12 | - | Demo |
| CharCom | CharCom: Composable Identity Control for Multi-Character Story Illustration | ACM MM | 2025-10 | - | - |
| ContextGen | ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation | ICLR | 2025-10 | GitHub | Demo |
| WithAnyone | WithAnyone: Towards Controllable and ID Consistent Image Generation | ICLR | 2025-10 | GitHub | Demo |
| OmniGen2 | OmniGen2: Exploration to Advanced Multimodal Generation | arXiv | 2025-06 | GitHub | Demo |
| Audit & Repair | Audit & Repair: An Agentic Framework for Consistent Story Visualization in Text-to-Image Diffusion Models | arXiv | 2025-06 | - | Demo |
| RefIPFR | Reference-Guided Identity Preserving Face Restoration | arXiv | 2025-05 | GitHub | - |
| UNO | Less-to-More Generalization: Unlocking More Controllability by In-Context Generation | arXiv | 2025-04 | GitHub | Demo |
| InfiniteYou | InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity | ICCV | 2025-03 | GitHub | Demo |
| StoryWeaver | StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization | AAAI | 2024-12 | GitHub | - |
| IR-Diffusion | Improving Multi-Subject Consistency in Open-Domain Image Generation with Isolation and Reposition Attention | arXiv | 2024-11 | - | - |
| ID-Patch | ID-Patch: Robust ID Association for Group Photo Personalization | CVPR | 2024-11 | GitHub | Demo |
| StoryAgent | StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration | arXiv | 2024-11 | - | - |
| StoryMaker | StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation | arXiv | 2024-09 | GitHub | - |
| DreamStory | DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion | TPAMI | 2024-07 | GitHub | Demo |
| SIGMA-Gen | SIGMA-GEN: Structure and Identity Guided Multi-subject Assembly for Image Generation | arXiv | 2024-06 | GitHub | Demo |
| StoryDiffusion | StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation | NeurIPS | 2024-05 | GitHub | Demo |
| ConsiStory | Training-Free Consistent Text-to-Image Generation | SIGGRAPH | 2024-02 | GitHub | Demo |
| Elite | ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation | ICCV | 2023-02 | GitHub | - |
| Make-A-Story | Make-A-Story: Visual Memory Conditioned Consistent Story Generation | CVPR | 2022-11 | GitHub | - |
| VP-CSV | Character-centric Story Visualization via Visual Planning and Token Alignment | EMNLP | 2022-10 | GitHub | - |
| DuCo-StoryGAN | Improving Generation and Evaluation of Visual Stories via Semantic Consistency | NAACL | 2021-05 | GitHub | - |
| StoryGAN | StoryGAN: A Sequential Conditional GAN for Story Visualization | CVPR | 2018-12 | GitHub | - |

⬆️ Back to Top

Temporal Consistency

Temporal consistency methods can be seen as multi-image generation tasks, as they involve producing a sequence of images or video frames over time. Each image or frame is generated conditioned on preceding ones, requiring smooth transitions and coherent motion, so that the temporal and physical dynamics of the sequence are preserved.
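The idea of generating each frame conditioned on the preceding ones can be sketched as a plain autoregressive loop. This is only a toy illustration, not the method of any paper listed here: `rollout`, `toy_predictor`, and the `context` window size are hypothetical names, and a frame is stood in for by a short list of numbers rather than an image tensor.

```python
from typing import Callable, List, Sequence

Frame = List[int]  # stand-in for an image; a real frame would be a tensor


def rollout(
    first: Frame,
    steps: int,
    predict_next: Callable[[Sequence[Frame]], Frame],
    context: int = 2,
) -> List[Frame]:
    """Generate `steps` new frames, each conditioned on the last `context` frames."""
    frames = [first]
    for _ in range(steps):
        history = frames[-context:]  # sliding window of preceding frames
        frames.append(predict_next(history))
    return frames


def toy_predictor(history: Sequence[Frame]) -> Frame:
    """Hypothetical predictor: continue the motion seen in the last two frames."""
    last = history[-1]
    if len(history) < 2:
        # No motion estimate yet, so shift every value by one.
        return [v + 1 for v in last]
    prev = history[-2]
    # Linear extrapolation keeps the change between frames constant.
    return [l + (l - p) for p, l in zip(prev, last)]


frames = rollout([0, 10], steps=3, predict_next=toy_predictor)
print(frames)  # [[0, 10], [1, 11], [2, 12], [3, 13]]
```

Because every new frame depends on the window of earlier frames, an error in one step propagates to later ones, which is why long sequences are harder to keep consistent than short ones.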

| 🏷️ Name | 📄 Title | 🏛️ Venue | 📅 Date | 💻 Code | 🌐 Demo |
| --- | --- | --- | --- | --- | --- |
| MMM | Mode Seeking meets Mean Seeking for Fast Long Video Generation | arXiv | 2026-02 | - | Demo |
| VideoGPA | VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation | arXiv | 2026-01 | GitHub | Demo |
| VideoAR | VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction | arXiv | 2026-01 | GitHub | Demo |
| RELIC | RELIC: Interactive Video World Model with Long-Horizon Memory | arXiv | 2025-12 | - | - |
| Infinity-RoPE | Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout | CVPR | 2025-11 | GitHub | Demo |
| iMontage | iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation | arXiv | 2025-11 | GitHub | Demo |
| STCDiT | STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution | CVPR | 2025-11 | GitHub | Demo |
| ChronoEdit | ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation | ICLR | 2025-10 | GitHub | Demo |
| Self-Forcing++ | Self-Forcing++: Towards Minute-Scale High-Quality Video Generation | arXiv | 2025-10 | GitHub | Demo |
| LongLive | LongLive: Real-time Interactive Long Video Generation | ICLR | 2025-09 | GitHub | Demo |
| Mixture of Contexts (MoC) | Mixture of Contexts for Long Video Generation | ICLR | 2025-08 | - | Demo |
| Matrix-Game 2.0 | Matrix-Game 2.0: An Open-Source Real-Time and Streaming Interactive World Model | arXiv | 2025-08 | GitHub | Demo |
| 4D Video Generation | Geometry-aware 4D Video Generation for Robot Manipulation | ICLR | 2025-07 | GitHub | Demo |
| Self Forcing | Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion | NeurIPS | 2025-06 | GitHub | Demo |
| FlowMo | FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation | arXiv | 2025-06 | GitHub | Demo |
| FramePack | Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models | arXiv | 2025-04 | GitHub | Demo |
| EquiVDM | On Equivariance and Fast Sampling in Video Diffusion Models Trained with Warped Noise | arXiv | 2025-04 | - | - |
| RealGeneral | RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models | ICCV | 2025-03 | GitHub | Demo |
| SimulateMotion | Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss | arXiv | 2025-01 | - | Demo |
| Ouroboros-Diffusion | Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion | AAAI | 2025-01 | GitHub | - |
| CausVid | CausVid: From Slow Bidirectional to Fast Autoregressive Video Diffusion Models | CVPR | 2024-12 | GitHub | Demo |
| TiARA | Enhancing Long Video Generation Consistency without Tuning | ICML Workshop | 2024-12 | - | - |
| Pathways | Pathways on the Image Manifold: Image Editing via Video Generation | CVPR | 2024-11 | GitHub | Demo |
| JVID | JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation | arXiv | 2024-09 | - | - |
| RCDMs | Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models | AAAI | 2024-07 | GitHub | - |
| ConsistI2V | ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation | arXiv | 2024-02 | GitHub | Demo |

⬆️ Back to Top

Semantic Consistency

Semantic consistency is essential for multi-image generation. These methods ensure that multiple generated images maintain coherent layouts, logical semantic relationships, and overall scene structure. In tasks like controllable generation, iterative image editing, and multi-region editing, semantic consistency provides the structural constraints needed to prevent conflicting content across different outputs.

| 🏷️ Name | 📄 Title | 🏛️ Venue | 📅 Date | 💻 Code | 🌐 Demo |
| --- | --- | --- | --- | --- | --- |
| ConsistCompose | ConsistCompose: Unified Multimodal Layout Control for Image Composition | CVPR | 2025-11 | - | - |
| Griffin | Griffin: Generative Reference and Layout Guided Image Composition | arXiv | 2025-09 | - | - |
| UniVid | UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models | WACV | 2025-09 | GitHub | - |
| SemLayoutDiff | SemLayoutDiff: Semantic Layout Generation with Diffusion Model for Indoor Scene Synthesis | arXiv | 2025-08 | GitHub | Demo |
| LAMIC | LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer | AAAI | 2025-08 | GitHub | - |
| IMAGHarmony | IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout | arXiv | 2025-06 | GitHub | Demo |
| PSDiffusion | PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment | arXiv | 2025-05 | GitHub | - |
| In-Context Edit | In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer | NeurIPS | 2025-04 | GitHub | Demo |
| Step1X-Edit | Step1X-Edit: A Practical Framework for General Image Editing | arXiv | 2025-04 | GitHub | - |
| VisAgent | VisAgent: Narrative-Preserving Story Visualization Framework | ICASSP | 2025-03 | - | - |
| DreamLayer | DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model | arXiv | 2025-03 | - | Demo |
| UniReal | UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics | arXiv | 2024-12 | GitHub | Demo |
| MSD | Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation | arXiv | 2024-10 | - | - |
| SpotActor | SpotActor: Training-Free Layout-Controlled Consistent Image Generation | arXiv | 2024-09 | - | - |
| PCDMs | Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models | ICLR | 2023-10 | GitHub | - |
| LayoutGPT | LayoutGPT: Compositional Visual Planning and Generation with Large Language Models | NeurIPS | 2023-05 | GitHub | Demo |
| Layout-Guidance | Training-Free Layout Control with Cross-Attention Guidance | WACV | 2023-04 | GitHub | Demo |
| P+ | P+: Extended Textual Conditioning in Text-to-Image Generation | arXiv | 2023-03 | - | Demo |
| GLIGEN | GLIGEN: Open-Set Grounded Text-to-Image Generation | CVPR | 2023-01 | GitHub | Demo |
| Attend-and-Excite | Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models | SIGGRAPH | 2023-01 | GitHub | Demo |
| Custom Diffusion | Multi-Concept Customization of Text-to-Image Diffusion | CVPR | 2022-12 | GitHub | Demo |
| Structure Diffusion | Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis | ICLR | 2022-12 | GitHub | Demo |
| BLT | BLT: Bidirectional Layout Transformer for Controllable Layout Generation | ECCV | 2021-12 | GitHub | Demo |
| VLCStoryGAN | Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization | EMNLP | 2021-10 | - | - |
| ATISS | ATISS: Autoregressive Transformers for Indoor Scene Synthesis | NeurIPS | 2021-10 | GitHub | Demo |
| SceneFormer | SceneFormer: Indoor Scene Generation with Transformers | 3DV | 2020-12 | GitHub | Demo |
| Layout Transformer | LayoutTransformer: Layout Generation and Completion with Self-Attention | ICCV | 2020-06 | GitHub | Demo |
| LayoutVAE | LayoutVAE: Stochastic Scene Layout Generation From a Label Set | ICCV | 2019-07 | - | - |
| LayoutGAN | LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators | ICLR | 2019-01 | GitHub | - |

⬆️ Back to Top

Datasets

Multi-View Datasets

| 🗄️ Dataset | 📊 Samples | 📄 Paper | 🏛️ Venue | 📅 Date |
| --- | --- | --- | --- | --- |
| Griffin | ~30,000 frames, 270,000 images | Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark | AAAI | 2025-03 |
| MVImgNet2.0 | 520k | MVImgNet2.0: A Larger-scale Dataset of Multi-view Images | SIGGRAPH Asia | 2024-12 |
| OpenMaterial | 1,001 | OpenMaterial: A Large-scale Dataset of Complex Materials for 3D Reconstruction | arXiv | 2024-06 |
| Objaverse-XL | 10M+ | Objaverse-XL: A Universe of 10M+ 3D Objects | arXiv | 2023-07 |
| Objaverse | 800k+ | Objaverse: A Universe of Annotated 3D Objects | CVPR | 2022-11 |
| CO3D | 19K | Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction | ICCV | 2021-09 |

⬆️ Back to Top

Character Datasets

| 🗄️ Dataset | 📊 Samples | 📄 Paper | 🏛️ Venue | 📅 Date |
| --- | --- | --- | --- | --- |
| WildActor | 1.6M videos, 18M frames | WildActor: Unconstrained Identity-Preserving Video Generation | arXiv | 2026-03 |
| Solaris Dataset | 12.64M frames | Solaris: Building a Multiplayer Video World Model in Minecraft | arXiv | 2026-02 |
| 2K-Characters-10K-Stories | 2K chars, 10K stories | 2K-Characters-10K-Stories: A Quality-Gated Stylized Narrative Dataset with Disentangled Control and Sequence Consistency | arXiv | 2025-12 |
| OmniPerson | 2000k | OmniPerson: Unified Identity-Preserving Pedestrian Generation | arXiv | 2025-12 |
| WithAnyone | 2M | WithAnyone: Towards Controllable and ID Consistent Image Generation | ICLR | 2025-10 |
| MultiHuman-Testbench | 1,800 samples, 5,550 faces | MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans | NeurIPS | 2025-06 |
| Openstory++ | 796k | Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling | arXiv | 2024-08 |
| MIS | 12M | Many-to-many Image Generation with Auto-regressive Diffusion Models | arXiv | 2024-03 |

⬆️ Back to Top

Temporal Datasets

| 🗄️ Dataset | 📊 Samples | 📄 Paper | 🏛️ Venue | 📅 Date |
| --- | --- | --- | --- | --- |
| GenVID | 80k | Artifact-Aware Evaluation for High-Quality Video Generation | arXiv | 2026-01 |
| SeqBench | 320 prompts, 2,560 videos | SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models | arXiv | 2025-10 |
| GeneVA | 5,452 prompts, 16,356 videos | GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts | arXiv | 2025-09 |
| BrokenVideos | 3,254 | BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos | arXiv | 2025-06 |
| FlintstonesSV | 20k | FlintstonesSV++: Improving Story Narration using Visual Scene Graph | ECIR | 2025-04 |
| MovieBench | from 160 movies | MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation | CVPR | 2024-06 |
| PororoSV | 14k+ | StoryGAN: A Sequential Conditional GAN for Story Visualization | arXiv | 2018-12 |

⬆️ Back to Top

Semantic Datasets

| 🗄️ Dataset | 📊 Samples | 📄 Paper | 🏛️ Venue | 📅 Date |
| --- | --- | --- | --- | --- |
| MICo-150K | 150k | MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition | CVPR | 2025-12 |
| Echo-4o | 180k | Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation | arXiv | 2025-08 |
| SynCD | 90k | Generating Multi-Image Synthetic Data for Text-to-Image Customization | ICCV | 2025-02 |
| LAION-SG | 482k | LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations | arXiv | 2024-12 |

⬆️ Back to Top

Benchmarks

Multi-View Benchmarks

| 🏷️ Name | 📄 Paper | 🏛️ Venue | 📅 Date | 💻 Code |
| --- | --- | --- | --- | --- |
| Charge | Charge: A Comprehensive Novel View Synthesis Benchmark and Dataset to Bind Them All | arXiv | 2025-12 | - |
| MVGBench | MVGBench: A Comprehensive Benchmark for Multi-view Generation Models | ICCV | 2025-07 | GitHub |
| Griffin | Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark | AAAI | 2025-03 | GitHub |
| MEt3R | MEt3R: Measuring Multi-View Consistency in Generated Images | CVPR | 2025-01 | GitHub |
| Robust Multi-View Depth | A Benchmark and a Baseline for Robust Multi-view Depth Estimation | 3DV | 2022-09 | GitHub |

⬆️ Back to Top

Character Benchmarks

| 🏷️ Name | 📄 Paper | 🏛️ Venue | 📅 Date | 💻 Code |
| --- | --- | --- | --- | --- |
| Solaris Benchmark | Solaris: Building a Multiplayer Video World Model in Minecraft | arXiv | 2026-02 | GitHub |
| IdentityStory | IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation | AAAI | 2025-12 | GitHub |
| Envision | Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights | arXiv | 2025-12 | GitHub |
| OmniContext | OmniGen2: Exploration to Advanced Multimodal Generation | arXiv | 2025-06 | GitHub |
| MultiHuman-Testbench | MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans | NeurIPS | 2025-06 | GitHub |
| ViStoryBench | ViStoryBench: Comprehensive Benchmark Suite for Story Visualization | CVPR | 2025-05 | GitHub |
| TBC-Bench | StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization | AAAI | 2024-12 | GitHub |
| DS-500 | DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion | TPAMI | 2024-07 | GitHub |
| NewEpisode | Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models | arXiv | 2024-05 | - |

⬆️ Back to Top

Temporal Benchmarks

| 🏷️ Name | 📄 Paper | 🏛️ Venue | 📅 Date | 💻 Code |
| --- | --- | --- | --- | --- |
| SeqBench | SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models | arXiv | 2025-10 | GitHub |
| World Consistency Score | World Consistency Score: A Unified Metric for Video Generation Quality | ICCV | 2025-08 | GitHub |
| VBench++ | VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | TPAMI | 2024-11 | GitHub |
| TC-Bench | TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation | arXiv | 2024-06 | GitHub |
| MovieBench | MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation | CVPR | 2024-06 | GitHub |
| VBench | VBench: Comprehensive Benchmark Suite for Video Generative Models | CVPR | 2023-11 | GitHub |

⬆️ Back to Top

Semantic Benchmarks

| 🏷️ Name | 📄 Paper | 🏛️ Venue | 📅 Date | 💻 Code |
| --- | --- | --- | --- | --- |
| 3SGen-Bench | 3SGen: Unified Subject, Style, and Structure-Driven Image Generation with Adaptive Task-specific Memory | arXiv | 2025-12 | - |
| MICo-150K | MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition | CVPR | 2025-12 | GitHub |
| M³T2IBench | M³T2IBench: A Large-Scale Multi-Category, Multi-Instance, Multi-Relation Text-to-Image Benchmark | arXiv | 2025-10 | - |
| LAMIC | LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer | AAAI | 2025-08 | GitHub |
| MMIG-Bench | MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models | arXiv | 2025-05 | GitHub |
| ImgEdit Benchmark | ImgEdit: A Unified Image Editing Dataset and Benchmark | NeurIPS | 2025-05 | GitHub |
| GEdit-Bench | Step1X-Edit: A Practical Framework for General Image Editing | arXiv | 2025-04 | GitHub |
| MRAMG-Bench | MRAMG-Bench: A Comprehensive Benchmark for Advancing Multimodal Retrieval-Augmented Multimodal Generation | SIGIR | 2025-02 | GitHub |
| T2I-CompBench++ | T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation | TPAMI | 2023-07 | GitHub |

⬆️ Back to Top

Applications

| 🏷️ Name | 🌐 Demo |
| --- | --- |
| World Labs | Demo |
| Skybox AI | Demo |
| Luma | Demo |
| Canva AI | Demo |
| Kling AI | Demo |
| Runway | Demo |
| Multiverse | Demo |
| Rodin | Demo |

⬆️ Back to Top

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. When adding new papers, please follow these rules:

  1. Ensure the paper is relevant to multi-image generation.
  2. Insert the new entry in reverse chronological order (newest first).
  3. Add links to the paper and code (if available).
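
The ordering rule can be checked mechanically. The sketch below is only an illustration and not a script shipped with this repository: `Entry`, `insert_newest_first`, and `is_newest_first` are hypothetical names, and `ExampleMethod` is a made-up entry. It assumes dates use the `YYYY-MM` format from the tables, so plain string comparison orders them correctly.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    name: str
    date: str  # "YYYY-MM", as used in the tables


def is_newest_first(entries: List[Entry]) -> bool:
    """True if dates never increase from one row to the next."""
    return all(a.date >= b.date for a, b in zip(entries, entries[1:]))


def insert_newest_first(entries: List[Entry], new: Entry) -> List[Entry]:
    """Return a new list with `new` placed so the newest-first order is kept."""
    for i, existing in enumerate(entries):
        # Zero-padded "YYYY-MM" strings compare the same way as the dates.
        if new.date >= existing.date:
            return entries[:i] + [new] + entries[i:]
    return entries + [new]  # older than everything already listed


table = [
    Entry("MVRoom", "2025-12"),
    Entry("DT-NVS", "2025-11"),
    Entry("LoomNet", "2025-07"),
]
updated = insert_newest_first(table, Entry("ExampleMethod", "2025-11"))
print([e.name for e in updated])
# ['MVRoom', 'ExampleMethod', 'DT-NVS', 'LoomNet']
```

When two entries share a month, this sketch places the new one first; the contribution rules do not specify a tie-break, so that choice is an assumption.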

License

This project is released under the Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0, SPDX-License-Identifier: Apache-2.0).

Citation

If you find this repo helpful for your research, please cite our paper:

📄 A Survey on Multi-Image Generation: Advances, Challenges, and Future Directions

@article{chen2026survey,
  title={A Survey on Multi-Image Generation: Advances, Challenges, and Future Directions},
  author={Chen, Qirui and Wang, Guo-Hua and Chen, Jinyuan and Chen, Qing-Guo and Zhang, Jun and Luo, Weihua},
  year={2026}
}
