DataEvolver

Autonomous Synthetic Data Construction via VLM-Guided Iterative Rendering

DataEvolver is a goal-driven data synthesis pipeline that generates high-quality training datasets through an automated loop of 3D rendering, VLM (Vision-Language Model) quality review, and intelligent parameter adjustment. Unlike traditional pipelines with rigid scoring rules, DataEvolver uses free-form VLM feedback to perceive, diagnose, and fix rendering issues — producing photorealistic, scene-aware training data without human intervention.

中文 · Website · Paper · Dataset: DataEvolver-Rotate

Key Features

Goal-Driven Loop Agents — VLM reviewer provides semantic feedback ("flat lighting", "floating object") → AI agent selects targeted actions → re-render → repeat until quality goals are met
24 Atomic Actions — Structured action space across 5 groups: lighting, object placement, scene environment, and material properties — with anti-oscillation control and step-scale scheduling
Scene-Aware Rendering — Objects placed in real Blender scenes with HDRI environments, raycast ground detection, and preserved scene lighting
Multi-Modal Output — RGB, mask, depth, normal maps, and geometry metadata
End-to-End Automation — From natural language seed concept to training-ready dataset, zero human intervention

Pipeline Overview

Stage	What it does	Model / Tool
1. Text Expansion	LLM expands seed concept into detailed T2I prompt	Claude API (Anthropic)
2. T2I Generation	Generate 1024×1024 object image	Qwen-Image-2512
2.5. Segmentation	Extract RGBA foreground, remove background	SAM3
3. 3D Reconstruction	Reconstruct textured mesh from single image	Hunyuan3D-2.1
4. Scene Rendering	Blender Cycles 512spp scene-aware insertion	Blender 4.24
5. VLM Review Loop	Free-form review → agent action → re-render until keep	Qwen3.5-35B-A3B

The VLM Review Loop (Stage 5)

The core innovation: a goal-driven loop agent that iteratively improves rendering quality.

Anti-oscillation control prevents parameter thrashing:

Sign-flip tracking: freeze a parameter after 3 direction reversals
Step-scale scheduling: Round 0 → 100%, Round 1 → 70%, Round 2 → 50%, Round 3+ → 40%
Score-adaptive boost: ×1.2 when hybrid_score < 0.65

Prerequisites

OS: Linux (tested on Ubuntu 20.04+)
GPU: NVIDIA GPU with ≥24 GB VRAM for rendering; ≥80 GB for VLM inference
Python: 3.10+
Blender: 4.24
CUDA: Compatible with your PyTorch version

Required Models

Model	Purpose	Approx. Size
Qwen-Image-2512	T2I generation	~56 GB
SAM3	Foreground segmentation	~2 GB
Hunyuan3D-2.1	Image-to-3D reconstruction	~20 GB
Qwen3.5-35B-A3B	VLM quality reviewer	~35 GB
Blender 4.24	3D rendering engine	~300 MB

Quick Start

# Clone the repo
git clone https://github.com/PRIS-CV/DataEvolver.git
cd DataEvolver

# Configure model paths in each pipeline stage:
#   pipeline/stage1_text_expansion.py  → Anthropic API key
#   pipeline/stage2_t2i_generate.py    → MODEL_PATH (Qwen-Image-2512)
#   pipeline/stage2_5_sam2_segment.py  → SAM3_CKPT
#   pipeline/stage3_image_to_3d.py     → HUNYUAN3D_REPO, MODEL_HUB
#   pipeline/stage5_5_vlm_review.py    → Qwen3.5-35B model path
#   configs/scene_template.json        → blend_path, blender_binary

# Place your .blend scene file in assets/scene/
# Place HDRI environment maps in assets/hdri/

# Run the full pipeline
bash pipeline/run_all.sh

Project Structure

DataEvolver/
├── pipeline/                          # Core pipeline stages
│   ├── stage1_text_expansion.py             # LLM prompt generation
│   ├── stage2_t2i_generate.py               # Text-to-image (Qwen-Image-2512)
│   ├── stage2_5_sam2_segment.py             # SAM3 foreground extraction
│   ├── stage3_image_to_3d.py                # 3D mesh reconstruction (Hunyuan3D-2.1)
│   ├── stage4_scene_render.py               # Blender scene-aware rendering
│   ├── stage5_5_vlm_review.py               # VLM quality review (Qwen3.5-35B-A3B)
│   ├── stage5_6_feedback_apply.py           # Action selection & anti-oscillation
│   ├── asset_lifecycle.py                   # Asset lifecycle management
│   └── rotation_geomodal_dataset.py         # Training dataset loader
├── configs/
│   ├── scene_action_space.json              # 24 atomic actions definition
│   ├── scene_template.json                  # Blender scene template config
│   ├── vlm_review_schema.json               # VLM review output schema
│   ├── dataset_profiles/                    # Dataset configuration profiles
│   └── seed_concepts/                       # Seed object definitions (20/50 objects)
├── scripts/                           # Utility & build scripts
│   ├── run_scene_agent_monitor.py           # VLM loop agent monitor
│   ├── run_scene_agent_step.py              # Single-step agent execution
│   ├── export_rotation8_from_best_object_state.py  # Consistent rotation export
│   ├── build_rotation8_trainready_dataset.py       # Build training pairs
│   ├── build_object_split_for_rotation_dataset.py  # Object-disjoint split
│   ├── run_full_pipeline.py                 # Full pipeline orchestrator
│   ├── run_vlm_quality_gate_loop.py         # VLM quality gate loop
│   ├── feedback_loop/                       # Feedback loop utilities
│   └── ...                                  # Additional build & eval scripts
├── assets/
│   ├── hdri/                                # HDRI environment maps
│   └── scene/                               # Blender scene files (.blend)
├── paper/                             # Technical report (LaTeX source)
└── web/                               # Project website (GitHub Pages)

Action Space

The AI agent selects from 24 structured atomic actions organized in 5 groups:

Group	Actions	Parameters
Lighting (4)	Key light intensity ↑↓, key light yaw ±15°	Multiplicative ×1.2/×0.8 or additive, bounded
Object (6)	Elevation ±0.02, yaw ±15°, scale ×1.1/×0.9	Bounded within safe ranges
Scene (5)	Env rotation ±30°, env intensity ↑↓, contact shadow	HDRI and environment controls
Material (9)	Saturation, value/brightness, hue offset, roughness, specular/sheen	Fine-grained material tuning
Camera (0)	Reserved for future use	—

Full action definitions: configs/scene_action_space.json

DataEvolver-Rotate

The first benchmark dataset produced by DataEvolver — for rotation-conditioned image editing.

Metric	Value
Unique Objects	50
Rotation Angles	8 (0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°)
Training Pairs	350 (front → 7 target views)
Train / Val / Test	245 / 49 / 56 pairs (object-disjoint, seed=42)
Modalities	RGB, mask, depth, normal

Methodology

Each object uses a single canonical yaw-0° best state as the base. The object is then rotated while the scene, camera, lighting, and material remain fixed — ensuring cross-angle consistency.

Loading the Dataset

import json
from pathlib import Path
from PIL import Image

root = Path("path/to/dataset_split")
rows = []
with (root / "pairs" / "train_pairs.jsonl").open("r") as f:
    for line in f:
        rows.append(json.loads(line))

row = rows[0]
source = Image.open(root / row["source_image"]).convert("RGB")
target = Image.open(root / row["target_image"]).convert("RGB")
instruction = row["instruction"]  # e.g., "Rotate the object 45 degrees clockwise"

Using with Claude Code

DataEvolver is designed to work with Claude Code as an AI-powered development and operations assistant. Claude Code reads a project-level CLAUDE.md file to understand your environment, then helps you run pipelines, analyze results, and build datasets through natural language.

Step 1: Install Claude Code

npm install -g @anthropic-ai/claude-code

Step 2: Create your `CLAUDE.md`

Create a CLAUDE.md in the project root (it's gitignored — each user maintains their own). This file tells Claude Code about your specific environment:

# CLAUDE.md

## Remote Server
- SSH alias: `my-server`
- GPU: 3x A800 80GB (or your setup)
- Python: `/path/to/python3` (3.10+, with PyTorch)
- Blender: `/path/to/blender` (4.24)
- Code directory: `/path/to/DataEvolver`

## Model Paths
- Qwen-Image-2512: `/path/to/Qwen-Image-2512`
- SAM3 checkpoint: `/path/to/sam3/sam3.pt`
- Hunyuan3D-2.1 repo: `/path/to/Hunyuan3D-2.1`
- Hunyuan3D-2.1 weights: `/path/to/model_hub/Hunyuan3D-2.1`
- Qwen3.5-35B-A3B: `/path/to/Qwen3.5-35B-A3B`

## Scene Config
- Blender scene file: `/path/to/scene.blend`
- HDRI directory: `/path/to/hdri/`
- Render engine: CYCLES, 512 samples, 1024x1024

## Key Configs
- Action space: `configs/scene_action_space.json` (24 atomic actions)
- Scene template: `configs/scene_template.json`
- Dataset profiles: `configs/dataset_profiles/`

## Pipeline
Stage 1 (Text Expansion) → Stage 2 (T2I) → Stage 2.5 (SAM3)
→ Stage 3 (3D Reconstruction) → Stage 4 (Blender Render) → Stage 5 (VLM Loop)

## Working Rules
- Always test on the remote server, not locally
- Use tmux for long-running tasks (screen not available on all servers)
- Read trace.json free-form text for VLM results — not just agg.json scores
- Only stop VLM loop when reviewer explicitly says "keep"
- Check GPU usage before launching new jobs

Step 3: Create Claude Code Skills (Optional)

You can create reusable skills in a skills/ directory for common workflows:

mkdir -p skills/scene-agent-loop

# skills/scene-agent-loop/SKILL.md
---
name: scene-agent-loop
description: Manage the VLM review loop for scene rendering
---
# Scene Agent Loop
Monitor and continue the VLM review → render → agent decision loop.
Read trace.json to understand current state, then decide next action.

Step 4: Launch Claude Code

cd DataEvolver
claude

Claude Code will read your CLAUDE.md and understand the full project context. Example commands:

> Check GPU usage on the server
> Run the full pipeline for 10 new furniture objects
> Export rotation8 dataset from the latest best states
> Build train-ready dataset with object-disjoint split

Citation

@misc{zhang2026dataevolverletdatabuild,
  title={DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents},
  author={Qisong Zhang and Wenzhuo Wu and Zhuangzhuang Jia and Yunhao Yang and Huayu Zhang and Xianghao Zang and Zhixiang He and Zhongjiang He and Kongming Liang and Zhanyu Ma},
  year={2026},
  eprint={2605.01789},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.01789},
}

Roadmap

Completed

WebSearch-integrated AI Agent — Stage0 WebSearch supports both single-paper reproduction and multi-paper universal dataset modes. It records paper evidence, exports handoff documents, and has been validated on the remote rendering environment.
Multi-paper universal 3D dataset synthesis — the workflow starts from an anchor paper, expands to related 3D-control papers, and extracts shared dataset requirements for target images, masks, OSCR/structure views, camera metadata, object layout, occlusion relations, scene graphs, and validation traces.
Universal 3D layout dataset contract — DataEvolver now tracks a reusable schema for Blender-backed target renders, per-object masks, structure views, depth-order proxies, orientation proxies, camera pose/intrinsics/viewpoint tokens, mesh metadata, 3D boxes, scene graphs, spatial relations, and promotion decisions.
Blender-backed dual-object scenes — the current universal 3D workflow can generate real-scene, dual-object records with DataEvolver scene and mesh assets. General N-object compositional scenes remain a separate extension.
VLM/CV self-evolution loop — selected universal 3D records can be reviewed with calibrated hybrid VLM and CV geometry scores, bounded repair actions, and explicit accept/reject promotion traces.
Paper-specific and universal export modes — the WebSearch and universal layout scripts support both single-paper dataset reproduction and multi-paper universal schema export.

In Progress

Structured VLM knowledge base — current scoring already uses structured VLM/CV criteria, mask coverage checks, depth-order proxy checks, geometry-review metadata, and a VGGT-Omega proxy hook. A persistent prior-knowledge store for reusable VLM assessment criteria is still under development.
Real-world dataset ingestion — WebSearch can discover public datasets and paper resources, and public artifacts such as SeeThrough3D are used as references. A full crawl-clean-process-ingest pipeline for arbitrary real-world datasets is still pending.
Auto-generated reasoning-evaluation datasets — current records include validation logs, hybrid scores, and promotion traces. Fully automated benchmark generation after downstream model training remains a future release target.

Pending

Video object datasets — extend image-level rendering to temporally consistent object removal, translation, rotation, insertion, and attribute editing across frames.
Temporal review and filtering — add sequence-level checks for flicker, trajectory smoothness, mask stability, depth continuity, and action consistency.
General N-object compositional scenes — beyond the current dual-object workflow, this requires relation-aware sampling, collision handling, occlusion planning, and stronger multi-object VLM/CV review.
Large-scale public dataset ingestion — scale WebSearch-discovered dataset ingestion into repeatable crawling, cleaning, licensing, provenance, and structured export workflows.

License

Apache-2.0

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
.github/workflows		.github/workflows
assets		assets
configs		configs
pipeline		pipeline
scripts		scripts
web		web
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README_zh.md		README_zh.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DataEvolver

Key Features

Pipeline Overview

The VLM Review Loop (Stage 5)

Prerequisites

Required Models

Quick Start

Project Structure

Action Space

DataEvolver-Rotate

Methodology

Loading the Dataset

Using with Claude Code

Step 1: Install Claude Code

Step 2: Create your `CLAUDE.md`

Step 3: Create Claude Code Skills (Optional)

Step 4: Launch Claude Code

Citation

Roadmap

Completed

In Progress

Pending

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

DataEvolver

Key Features

Pipeline Overview

The VLM Review Loop (Stage 5)

Prerequisites

Required Models

Quick Start

Project Structure

Action Space

DataEvolver-Rotate

Methodology

Loading the Dataset

Using with Claude Code

Step 1: Install Claude Code

Step 2: Create your CLAUDE.md

Step 3: Create Claude Code Skills (Optional)

Step 4: Launch Claude Code

Citation

Roadmap

Completed

In Progress

Pending

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Step 2: Create your `CLAUDE.md`

Packages