Awesome Multi-Image Generation

Figure 1: Timeline of Multi-Image Generation Methods. The timeline presents the chronological development of methods organized by their respective release years.

👉 What is This Repo for?

This repository provides a comprehensive collection of resources related to multi-image generation, featuring:

A curated list of methods organized by consistency dimensions
Categorized datasets for multi-view, character, temporal, and semantic consistency research
Benchmarks for evaluating multi-image generation quality across different consistency types

It is designed to help researchers and practitioners explore, compare, and build state-of-the-art multi-image generation systems.

What is Multi-Image Generation?

Multi-Image Generation refers to the task of generating multiple images with inherent correlations and consistency constraints. Unlike traditional single-image generation, multi-image generation requires maintaining coherence across multiple outputs along one or more dimensions, such as geometric structure, identity attributes, temporal continuity, or semantic relationships. This repository collects methods organized by consistency dimensions, reflecting the primary type of coherence each approach aims to achieve.

Figure 2: Example of multi-view consistency. SyncDreamer generates multi-view consistent images from a single input.

Figure 3: Example of character consistency using StoryMaker. The first three rows show a day in the life of an office worker, and the last two rows are based on Before Sunrise.

Original Input → Step forward → Look up to sky → Zoom out

Figure 4: Example of temporal consistency. iMontage generates sequential images and maintains temporal consistency across the generated transitions.

Figure 5: Example of semantic consistency. Wan-2.7-Image transforms a single reference image into nine cohesive comic panels.

Methods

Multi-View Consistency

Multi-view consistency methods generate multiple images of the same 3D object or scene from different viewpoints while maintaining geometric coherence. This is inherently a multi-image task: it requires producing a set of views that correspond to the same underlying 3D structure, with cross-view constraints ensuring consistency across all generated perspectives.
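
As a rough illustration of what "different viewpoints" of one object can mean, the sketch below places cameras evenly on an orbit around the origin. The fixed-elevation orbit, the function name `orbit_camera_positions`, and its parameters are assumptions made for this example and are not taken from any method listed here.

```python
import math
from typing import List, Tuple


def orbit_camera_positions(
    num_views: int, radius: float = 2.0, elevation_deg: float = 30.0
) -> List[Tuple[float, float, float]]:
    """Return camera positions evenly spaced in azimuth around the origin.

    Hypothetical helper: every camera sits at the same distance and elevation,
    so all views look at the same underlying object from different sides.
    """
    if num_views < 1:
        raise ValueError("num_views must be at least 1")
    elevation = math.radians(elevation_deg)
    positions = []
    for i in range(num_views):
        azimuth = 2.0 * math.pi * i / num_views
        x = radius * math.cos(elevation) * math.cos(azimuth)
        y = radius * math.cos(elevation) * math.sin(azimuth)
        z = radius * math.sin(elevation)
        positions.append((x, y, z))
    return positions


if __name__ == "__main__":
    for position in orbit_camera_positions(4):
        print(position)
```

A multi-view generator would be asked for one image per position, and the cross-view constraints described above tie those images to a single 3D structure.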

🏷️ Name	📄 Title	🏛️ Venue	📅 Date	💻 Code	🌐 Demo
Geometry-Aware RoPE	Geometry-Aware Rotary Position Embedding for Consistent Video World Model	arXiv	2026-02	-	-
AnchoredDream	AnchoredDream: Zero-Shot 360° Indoor Scene Generation from a Single View via Geometric Grounding	arXiv	2026-01	-	-
MVRoom	MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models	arXiv	2025-12	-	-
CAMEO	CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models	arXiv	2025-12	GitHub	Demo
DT-NVS	DT-NVS: Diffusion Transformers for Novel View Synthesis	arXiv	2025-11	-	-
GeoMVD	GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction	arXiv	2025-11	GitHub	Demo
JCDM	Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis	arXiv	2025-11	GitHub	-
MVCustom	MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion	ICLR	2025-10	GitHub	Demo
LoomNet	LoomNet: Enhancing Multi-View Image Generation via Latent Space Weaving	arXiv	2025-07	-	-
MV-AR	Auto-Regressively Generating Multi-View Consistent Images	ICCV	2025-06	GitHub	-
Generative GS	Generative Gaussian Splatting: Generating 3D Scenes with Video Diffusion Priors	ICCV	2025-03	-	Demo
MVGD	Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion	CVPR	2025-01	-	Demo
MEt3R	MEt3R: Measuring Multi-View Consistency in Generated Images	CVPR	2025-01	GitHub	Demo
Sharp-It	Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation	CVPR	2024-12	GitHub	Demo
SeMv-3D	SeMv-3D: Towards Concurrency of Semantic and Multi-view Consistency in General Text-to-3D Generation	arXiv	2024-10	-	-
SV4D	SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency	arXiv	2024-07	GitHub	Demo
MVG-Splatting	MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification	arXiv	2024-07	-	Demo
NVS-Solver	NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer	ICLR	2024-06	GitHub	-
Era3D	Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention	NeurIPS	2024-05	GitHub	Demo
V3D	V3D: Video Diffusion Models are Effective 3D Generators	arXiv	2024-03	GitHub	Demo
SV3D	SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion	ECCV	2024-03	-	Demo
SPAD	SPAD: Spatially Aware Multiview Diffusers	CVPR	2024-02	GitHub	Demo
Direct2.5	Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion	CVPR	2023-11	GitHub	Demo
Zero123++	Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model	arXiv	2023-10	GitHub	Demo
ConsistNet	ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion	CVPR	2023-10	GitHub	Demo
SyncDreamer	SyncDreamer: Generating Multiview-consistent Images from a Single-view Image	ICLR	2023-09	GitHub	Demo
MVDream	MVDream: Multi-view Diffusion for 3D Generation	arXiv	2023-08	GitHub	Demo
MVDiffusion	MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion	NeurIPS	2023-07	GitHub	Demo
Zero-1-to-3	Zero-1-to-3: Zero-shot One Image to 3D Object	ICCV	2023-03	GitHub	Demo
Text2Room	Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models	ICCV	2023-03	GitHub	Demo
DreamBooth3D	DreamBooth3D: Subject-Driven Text-to-3D Generation	arXiv	2023-03	-	Demo
TEXTure	TEXTure: Text-Guided Texturing of 3D Shapes	SIGGRAPH	2023-02	GitHub	Demo
RealFusion	RealFusion: 360° Reconstruction of Any Object from a Single Image	CVPR	2023-02	GitHub	Demo
DiffDreamer	DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models	ICCV	2022-11	-	Demo
DreamFusion	DreamFusion: Text-to-3D using 2D Diffusion	ICLR	2022-09	-	Demo
Infinite Nature	Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image	ICCV	2020-12	GitHub	Demo

⬆️ Back to Top

Character Consistency

Character consistency methods generate images of one or more subjects while preserving each subject's identity, such as facial attributes and other distinguishing characteristics. This is inherently a multi-image problem: the same subject must remain recognizable across different scenes or contexts. It is widely studied in applications such as storyboards and narratives.
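
One simple way to think about "the same subject remains recognizable" is to compare identity embeddings of the subject across the generated images. The sketch below assumes such embeddings have already been computed by some encoder (not specified here); the function names and the choice of mean pairwise cosine similarity are illustrative assumptions, not the metric of any listed benchmark.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identity_consistency(embeddings: Sequence[Sequence[float]]) -> float:
    """Mean pairwise cosine similarity over all images of one subject.

    Hypothetical score: values near 1.0 suggest the subject looks alike
    in every image; lower values suggest identity drift.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings to compare")
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy embeddings standing in for the output of a real identity encoder.
    print(identity_consistency([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]))
```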

🏷️ Name	📄 Title	🏛️ Venue	📅 Date	💻 Code	🌐 Demo
Solaris	Solaris: Building a Multiplayer Video World Model in Minecraft	arXiv	2026-02	GitHub	Demo
DreamingComics	DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models	arXiv	2025-12	-	Demo
CharCom	CharCom: Composable Identity Control for Multi-Character Story Illustration	ACM MM	2025-10	-	-
ContextGen	ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation	ICLR	2025-10	GitHub	Demo
WithAnyone	WithAnyone: Towards Controllable and ID Consistent Image Generation	ICLR	2025-10	GitHub	Demo
OmniGen2	OmniGen2: Exploration to Advanced Multimodal Generation	arXiv	2025-06	GitHub	Demo
Audit & Repair	Audit & Repair: An Agentic Framework for Consistent Story Visualization in Text-to-Image Diffusion Models	arXiv	2025-06	-	Demo
RefIPFR	Reference-Guided Identity Preserving Face Restoration	arXiv	2025-05	GitHub	-
UNO	Less-to-More Generalization: Unlocking More Controllability by In-Context Generation	arXiv	2025-04	GitHub	Demo
InfiniteYou	InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity	ICCV	2025-03	GitHub	Demo
StoryWeaver	StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization	AAAI	2024-12	GitHub	-
IR-Diffusion	Improving Multi-Subject Consistency in Open-Domain Image Generation with Isolation and Reposition Attention	arXiv	2024-11	-	-
ID-Patch	ID-Patch: Robust ID Association for Group Photo Personalization	CVPR	2024-11	GitHub	Demo
StoryAgent	StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration	arXiv	2024-11	-	-
StoryMaker	StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation	arXiv	2024-09	GitHub	-
DreamStory	DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion	TPAMI	2024-07	GitHub	Demo
SIGMA-Gen	SIGMA-GEN: Structure and Identity Guided Multi-subject Assembly for Image Generation	arXiv	2024-06	GitHub	Demo
StoryDiffusion	StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation	NeurIPS	2024-05	GitHub	Demo
ConsiStory	Training-Free Consistent Text-to-Image Generation	SIGGRAPH	2024-02	GitHub	Demo
ELITE	ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation	ICCV	2023-02	GitHub	-
Make-A-Story	Make-A-Story: Visual Memory Conditioned Consistent Story Generation	CVPR	2022-11	GitHub	-
VP-CSV	Character-centric Story Visualization via Visual Planning and Token Alignment	EMNLP	2022-10	GitHub	-
DuCo-StoryGAN	Improving Generation and Evaluation of Visual Stories via Semantic Consistency	NAACL	2021-05	GitHub	-
StoryGAN	StoryGAN: A Sequential Conditional GAN for Story Visualization	CVPR	2018-12	GitHub	-

⬆️ Back to Top

Temporal Consistency

Temporal consistency methods treat sequence generation as a multi-image task: they produce a series of images or video frames over time. Each image or frame is generated conditioned on the preceding ones, which requires smooth transitions and coherent motion so that the temporal and physical dynamics of the sequence are preserved.
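
The conditioning on preceding frames can be sketched as a rollout loop. In the example below, `toy_next_frame` is a hypothetical stand-in for a learned model, frames are plain lists of numbers rather than images, and the sliding context window and the smoothness measure are assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # stand-in for an image; a real frame would hold pixels


def rollout(
    first_frame: Frame,
    next_frame: Callable[[Sequence[Frame]], Frame],
    num_frames: int,
    context_size: int = 2,
) -> List[Frame]:
    """Generate frames one at a time, each conditioned on the latest ones."""
    frames = [first_frame]
    while len(frames) < num_frames:
        context = frames[-context_size:]  # only the most recent frames are visible
        frames.append(next_frame(context))
    return frames


def toy_next_frame(context: Sequence[Frame]) -> Frame:
    """Hypothetical model: drift slightly away from the latest frame."""
    return [value + 0.1 for value in context[-1]]


def mean_abs_step(frames: Sequence[Frame]) -> float:
    """Average absolute change between consecutive frames (lower is smoother)."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    steps = [
        sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        for prev, cur in zip(frames, frames[1:])
    ]
    return sum(steps) / len(steps)


if __name__ == "__main__":
    sequence = rollout([0.0, 0.0], toy_next_frame, num_frames=4)
    print(len(sequence), mean_abs_step(sequence))
```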

🏷️ Name	📄 Title	🏛️ Venue	📅 Date	💻 Code	🌐 Demo
MMM	Mode Seeking meets Mean Seeking for Fast Long Video Generation	arXiv	2026-02	-	Demo
VideoGPA	VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation	arXiv	2026-01	GitHub	Demo
VideoAR	VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction	arXiv	2026-01	GitHub	Demo
RELIC	RELIC: Interactive Video World Model with Long-Horizon Memory	arXiv	2025-12	-	-
Infinity-RoPE	Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout	CVPR	2025-11	GitHub	Demo
iMontage	iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation	arXiv	2025-11	GitHub	Demo
STCDiT	STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution	CVPR	2025-11	GitHub	Demo
ChronoEdit	ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation	ICLR	2025-10	GitHub	Demo
Self-Forcing++	Self-Forcing++: Towards Minute-Scale High-Quality Video Generation	arXiv	2025-10	GitHub	Demo
LongLive	LongLive: Real-time Interactive Long Video Generation	ICLR	2025-09	GitHub	Demo
Mixture of Contexts (MoC)	Mixture of Contexts for Long Video Generation	ICLR	2025-08	-	Demo
Matrix-Game 2.0	Matrix-Game 2.0: An Open-Source Real-Time and Streaming Interactive World Model	arXiv	2025-08	GitHub	Demo
4D Video Generation	Geometry-aware 4D Video Generation for Robot Manipulation	ICLR	2025-07	GitHub	Demo
Self Forcing	Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion	NeurIPS	2025-06	GitHub	Demo
FlowMo	FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation	arXiv	2025-06	GitHub	Demo
FramePack	Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models	arXiv	2025-04	GitHub	Demo
EquiVDM	On Equivariance and Fast Sampling in Video Diffusion Models Trained with Warped Noise	arXiv	2025-04	-	-
RealGeneral	RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models	ICCV	2025-03	GitHub	Demo
SimulateMotion	Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss	arXiv	2025-01	-	Demo
Ouroboros-Diffusion	Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion	AAAI	2025-01	GitHub	-
CausVid	CausVid: From Slow Bidirectional to Fast Autoregressive Video Diffusion Models	CVPR	2024-12	GitHub	Demo
TiARA	Enhancing Long Video Generation Consistency without Tuning	ICML Workshop	2024-12	-	-
Pathways	Pathways on the Image Manifold: Image Editing via Video Generation	CVPR	2024-11	GitHub	Demo
JVID	JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation	arXiv	2024-09	-	-
RCDMs	Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models	AAAI	2024-07	GitHub	-
ConsistI2V	ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation	arXiv	2024-02	GitHub	Demo

⬆️ Back to Top

Semantic Consistency

Semantic consistency is essential for multi-image generation. These methods ensure that multiple generated images maintain coherent layouts, logical semantic relationships, and overall scene structure. In tasks like controllable generation, iterative image editing, and multi-region editing, semantic consistency provides the structural constraints needed to prevent conflicting content across different outputs.

🏷️ Name	📄 Title	🏛️ Venue	📅 Date	💻 Code	🌐 Demo
ConsistCompose	ConsistCompose: Unified Multimodal Layout Control for Image Composition	CVPR	2025-11	-	-
Griffin	Griffin: Generative Reference and Layout Guided Image Composition	arXiv	2025-09	-	-
UniVid	UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models	WACV	2025-09	GitHub	-
SemLayoutDiff	SemLayoutDiff: Semantic Layout Generation with Diffusion Model for Indoor Scene Synthesis	arXiv	2025-08	GitHub	Demo
LAMIC	LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer	AAAI	2025-08	GitHub	-
IMAGHarmony	IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout	arXiv	2025-06	GitHub	Demo
PSDiffusion	PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment	arXiv	2025-05	GitHub	-
In-Context Edit	In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer	NeurIPS	2025-04	GitHub	Demo
Step1X-Edit	Step1X-Edit: A Practical Framework for General Image Editing	arXiv	2025-04	GitHub	-
VisAgent	VisAgent: Narrative-Preserving Story Visualization Framework	ICASSP	2025-03	-	-
DreamLayer	DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model	arXiv	2025-03	-	Demo
UniReal	UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics	arXiv	2024-12	GitHub	Demo
MSD	Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation	arXiv	2024-10	-	-
SpotActor	SpotActor: Training-Free Layout-Controlled Consistent Image Generation	arXiv	2024-09	-	-
PCDMs	Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models	ICLR	2023-10	GitHub	-
LayoutGPT	LayoutGPT: Compositional Visual Planning and Generation with Large Language Models	NeurIPS	2023-05	GitHub	Demo
Layout-Guidance	Training-Free Layout Control with Cross-Attention Guidance	WACV	2023-04	GitHub	Demo
P+	P+: Extended Textual Conditioning in Text-to-Image Generation	arXiv	2023-03	-	Demo
GLIGEN	GLIGEN: Open-Set Grounded Text-to-Image Generation	CVPR	2023-01	GitHub	Demo
Attend-and-Excite	Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models	SIGGRAPH	2023-01	GitHub	Demo
Custom Diffusion	Multi-Concept Customization of Text-to-Image Diffusion	CVPR	2022-12	GitHub	Demo
Structure Diffusion	Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis	ICLR	2022-12	GitHub	Demo
BLT	BLT: Bidirectional Layout Transformer for Controllable Layout Generation	ECCV	2021-12	GitHub	Demo
VLCStoryGAN	Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization	EMNLP	2021-10	-	-
ATISS	ATISS: Autoregressive Transformers for Indoor Scene Synthesis	NeurIPS	2021-10	GitHub	Demo
SceneFormer	SceneFormer: Indoor Scene Generation with Transformers	3DV	2020-12	GitHub	Demo
Layout Transformer	LayoutTransformer: Layout Generation and Completion with Self-Attention	ICCV	2020-06	GitHub	Demo
LayoutVAE	LayoutVAE: Stochastic Scene Layout Generation From a Label Set	ICCV	2019-07	-	-
LayoutGAN	LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators	ICLR	2019-01	GitHub	-

⬆️ Back to Top

Datasets

Multi-View Datasets

🗄️ Dataset	📊 Samples	📄 Paper	🏛️ Venue	📅 Date
Griffin	~30,000 frames, 270,000 images	Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark	AAAI	2025-03
MVImgNet2.0	520k	MVImgNet2.0: A Larger-scale Dataset of Multi-view Images	SIGGRAPH Asia	2024-12
OpenMaterial	1,001	OpenMaterial: A Large-scale Dataset of Complex Materials for 3D Reconstruction	arXiv	2024-06
Objaverse-XL	10M+	Objaverse-XL: A Universe of 10M+ 3D Objects	arXiv	2023-07
Objaverse	800k+	Objaverse: A Universe of Annotated 3D Objects	CVPR	2022-11
CO3D	19K	Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction	ICCV	2021-09

⬆️ Back to Top

Character Datasets

🗄️ Dataset	📊 Samples	📄 Paper	🏛️ Venue	📅 Date
WildActor	1.6M videos, 18M frames	WildActor: Unconstrained Identity-Preserving Video Generation	arXiv	2026-03
Solaris Dataset	12.64M frames	Solaris: Building a Multiplayer Video World Model in Minecraft	arXiv	2026-02
2K-Characters-10K-Stories	2K chars, 10K stories	2K-Characters-10K-Stories: A Quality-Gated Stylized Narrative Dataset with Disentangled Control and Sequence Consistency	arXiv	2025-12
OmniPerson	2000k	OmniPerson: Unified Identity-Preserving Pedestrian Generation	arXiv	2025-12
WithAnyone	2M	WithAnyone: Towards Controllable and ID Consistent Image Generation	ICLR	2025-10
MultiHuman-Testbench	1,800 samples, 5,550 faces	MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans	NeurIPS	2025-06
Openstory++	796k	Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling	arXiv	2024-08
MIS	12M	Many-to-many Image Generation with Auto-regressive Diffusion Models	arXiv	2024-03

⬆️ Back to Top

Temporal Datasets

🗄️ Dataset	📊 Samples	📄 Paper	🏛️ Venue	📅 Date
GenVID	80k	Artifact-Aware Evaluation for High-Quality Video Generation	arXiv	2026-01
SeqBench	320 prompts, 2,560 videos	SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models	arXiv	2025-10
GeneVA	5,452 prompts, 16,356 videos	GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts	arXiv	2025-09
BrokenVideos	3,254	BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos	arXiv	2025-06
FlintstonesSV	20k	FlintstonesSV++: Improving Story Narration using Visual Scene Graph	ECIR	2025-04
MovieBench	from 160 movies	MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation	CVPR	2024-06
PororoSV	14k+	StoryGAN: A Sequential Conditional GAN for Story Visualization	arXiv	2018-12

⬆️ Back to Top

Semantic Datasets

🗄️ Dataset	📊 Samples	📄 Paper	🏛️ Venue	📅 Date
MICo-150K	150k	MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition	CVPR	2025-12
Echo-4o	180k	Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation	arXiv	2025-08
SynCD	90k	Generating Multi-Image Synthetic Data for Text-to-Image Customization	ICCV	2025-02
LAION-SG	482k	LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations	arXiv	2024-12

⬆️ Back to Top

Benchmarks

Multi-View Benchmarks

🏷️ Name	📄 Paper	🏛️ Venue	📅 Date	💻 Code
Charge	Charge: A Comprehensive Novel View Synthesis Benchmark and Dataset to Bind Them All	arXiv	2025-12	-
MVGBench	MVGBench: A Comprehensive Benchmark for Multi-view Generation Models	ICCV	2025-07	GitHub
Griffin	Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark	AAAI	2025-03	GitHub
MEt3R	MEt3R: Measuring Multi-View Consistency in Generated Images	CVPR	2025-01	GitHub
Robust Multi-View Depth	A Benchmark and a Baseline for Robust Multi-view Depth Estimation	3DV	2022-09	GitHub

⬆️ Back to Top

Character Benchmarks

🏷️ Name	📄 Paper	🏛️ Venue	📅 Date	💻 Code
Solaris Benchmark	Solaris: Building a Multiplayer Video World Model in Minecraft	arXiv	2026-02	GitHub
IdentityStory	IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation	AAAI	2025-12	GitHub
Envision	Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights	arXiv	2025-12	GitHub
OmniContext	OmniGen2: Exploration to Advanced Multimodal Generation	arXiv	2025-06	GitHub
MultiHuman-Testbench	MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans	NeurIPS	2025-06	GitHub
ViStoryBench	ViStoryBench: Comprehensive Benchmark Suite for Story Visualization	CVPR	2025-05	GitHub
TBC-Bench	StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization	AAAI	2024-12	GitHub
DS-500	DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion	TPAMI	2024-07	GitHub
NewEpisode	Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models	arXiv	2024-05	-

⬆️ Back to Top

Temporal Benchmarks

🏷️ Name	📄 Paper	🏛️ Venue	📅 Date	💻 Code
SeqBench	SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models	arXiv	2025-10	GitHub
World Consistency Score	World Consistency Score: A Unified Metric for Video Generation Quality	ICCV	2025-08	GitHub
VBench++	VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models	TPAMI	2024-11	GitHub
TC-Bench	TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation	arXiv	2024-06	GitHub
MovieBench	MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation	CVPR	2024-06	GitHub
VBench	VBench: Comprehensive Benchmark Suite for Video Generative Models	CVPR	2023-11	GitHub

⬆️ Back to Top

Semantic Benchmarks

🏷️ Name	📄 Paper	🏛️ Venue	📅 Date	💻 Code
3SGen-Bench	3SGen: Unified Subject, Style, and Structure-Driven Image Generation with Adaptive Task-specific Memory	arXiv	2025-12	-
MICo-150K	MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition	CVPR	2025-12	GitHub
M³T2IBench	M³T2IBench: A Large-Scale Multi-Category, Multi-Instance, Multi-Relation Text-to-Image Benchmark	arXiv	2025-10	-
LAMIC	LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer	AAAI	2025-08	GitHub
MMIG-Bench	MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models	arXiv	2025-05	GitHub
ImgEdit Benchmark	ImgEdit: A Unified Image Editing Dataset and Benchmark	NeurIPS	2025-05	GitHub
GEdit-Bench	Step1X-Edit: A Practical Framework for General Image Editing	arXiv	2025-04	GitHub
MRAMG-Bench	MRAMG-Bench: A Comprehensive Benchmark for Advancing Multimodal Retrieval-Augmented Multimodal Generation	SIGIR	2025-02	GitHub
T2I-CompBench++	T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation	TPAMI	2023-07	GitHub

⬆️ Back to Top

Applications

🏷️ Name	🌐 Demo
World Labs	Demo
Skybox AI	Demo
Luma	Demo
Canva AI	Demo
Kling AI	Demo
Runway	Demo
Multiverse	Demo
Rodin	Demo

⬆️ Back to Top

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. When adding new papers, please follow these rules:

Ensure the paper is relevant to multi-image generation.
Insert the new entry in reverse chronological order (newest first).
Add links to the paper and code (if available).
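
The ordering rule can be checked mechanically before opening a Pull Request. The following is a minimal sketch, assuming each table row has a `YYYY-MM` date in its own cell and that cells are separated by `|` or tabs; the helper names are hypothetical and this script is not part of the repository.

```python
import re
from typing import List, Optional, Sequence, Tuple

DATE_CELL = re.compile(r"(\d{4})-(\d{2})")


def row_date(row: str) -> Optional[Tuple[int, int]]:
    """Return (year, month) from the first cell that is exactly YYYY-MM."""
    for cell in re.split(r"[|\t]", row):
        match = DATE_CELL.fullmatch(cell.strip())
        if match:
            return int(match.group(1)), int(match.group(2))
    return None  # header or separator rows carry no date


def out_of_order_rows(rows: Sequence[str]) -> List[int]:
    """Indices of rows dated newer than the dated row above them."""
    problems = []
    previous = None
    for index, row in enumerate(rows):
        current = row_date(row)
        if current is None:
            continue
        if previous is not None and current > previous:
            problems.append(index)
        previous = current
    return problems


if __name__ == "__main__":
    # Made-up rows for illustration only.
    table = [
        "| A | Title A | arXiv | 2025-08 | - |",
        "| B | Title B | arXiv | 2024-12 | - |",
        "| C | Title C | ICCV | 2025-02 | - |",
    ]
    print(out_of_order_rows(table))
```

An empty result means the table is already sorted newest first.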

License

This project is released under the Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0, SPDX-License-Identifier: Apache-2.0).

Citation

If you find this repo helpful for your research, please cite our paper:

📄 A Survey on Multi-Image Generation: Advances, Challenges, and Future Directions

@article{chen2026survey,
  title={A Survey on Multi-Image Generation: Advances, Challenges, and Future Directions},
  author={Chen, Qirui and Wang, Guo-Hua and Chen, Jinyuan and Chen, Qing-Guo and Zhang, Jun and Luo, Weihua},
  year={2026}
}
