Datasets for Memory in Robotics

This document catalogs datasets relevant to memory research in robotics, organized by application domain and memory-related characteristics.

Overview

Memory systems for robots are trained and evaluated on datasets. This collection covers manipulation, navigation, perception, and lifelong learning, with emphasis on datasets whose tasks cannot be solved without memory: long-horizon planning, partial observability, and temporal reasoning.


Large-Scale Robot Learning Datasets

Open X-Embodiment Dataset

The largest open-source real-robot dataset at the time of its 2023 release, designed for cross-embodiment generalization.

Attribute Details
Size 1M+ real robot trajectories
Embodiments 22 robot types
Institutions 21 collaborating institutions
Skills 527 skills (160,266 tasks)
Modalities RGB, depth, proprioception
Memory Challenge Cross-embodiment transfer, skill generalization
Links [Paper] [Project] [GitHub] [HuggingFace]

OXE-AugE

Large-scale augmentation of the Open X-Embodiment dataset with additional robot embodiments.

Attribute Details
Embodiments 9 additional robot types
Datasets 16 augmented datasets
Memory Challenge Enhanced cross-embodiment learning
Links [Paper]

DROID Dataset

Large-scale in-the-wild robot manipulation dataset with diverse environments.

Attribute Details
Size 76,000 demonstration trajectories
Duration 350 hours of interaction data
Scenes 564 unique scenes
Tasks 86 manipulation tasks
Collectors 50 data collectors
Modalities RGB-D, proprioception
Memory Challenge Scene diversity, task generalization
Links [Project]
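
The DROID figures above imply some useful per-trajectory averages. The following is a minimal sketch that uses only the numbers listed in the table; the variable names are illustrative, and the results are dataset-wide means, not statistics reported by the DROID authors.

```python
# Figures taken from the DROID table above.
TRAJECTORIES = 76_000
HOURS = 350
SCENES = 564

# Mean trajectory length in seconds (total time / number of trajectories).
mean_seconds = HOURS * 3600 / TRAJECTORIES

# Mean number of trajectories recorded per scene.
trajectories_per_scene = TRAJECTORIES / SCENES

print(f"mean trajectory length: {mean_seconds:.1f} s")         # 16.6 s
print(f"trajectories per scene: {trajectories_per_scene:.0f}")  # 135
```

An average of roughly 17 seconds per trajectory suggests that most episodes are short, so the memory demand listed here comes from scene and task diversity more than from long horizons within a single episode.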

BridgeData V2

Large and diverse dataset of robotic manipulation behaviors for scalable robot learning.

Attribute Details
Size 60,096 trajectories
Skills 13 foundational manipulation skills
Tasks Pick-and-place, pushing, sweeping, etc.
Modalities RGB, proprioception
Memory Challenge Multi-skill learning, compositional generalization
Links [Paper] [Project]

RoboMIND

Multi-embodiment intelligence normative data for robot manipulation.

Attribute Details
Focus Multi-embodiment robot manipulation
Year 2024
Memory Challenge Cross-embodiment transfer
Links [Paper]

AgiBot World Colosseo

Full-stack large-scale robot learning platform for bimanual manipulation.

Attribute Details
Focus Bimanual manipulation
Award IROS 2025 Award Finalist
Memory Challenge Coordinated bimanual control, long-horizon tasks
Links [GitHub]

LeRobot

Open-source models, datasets, and tools for real-world robotics in PyTorch.

Attribute Details
Focus Democratizing robot learning
Platforms Various robot platforms
Links [HuggingFace]

Memory-oriented Manipulation Datasets

Few datasets target memory in manipulation specifically. The ones listed here are a starting point for open discussion; critiques and additions are welcome.

MIKASA-Robo Datasets

32 vision-based datasets specifically designed for memory-intensive robotic manipulation tasks.

Attribute Details
Tasks 32 memory-intensive tasks in 12 groups
Episodes per Task 250 episodes
Memory Types Tested Object, Spatial, Sequential, Capacity
Task Categories ShellGame, Intercept, Rotate, RememberColor/Shape, ChainOfColors
Modalities RGB images, proprioception
Memory Challenge Object permanence, spatial reasoning, sequence recall
Links [Paper] [GitHub] [HuggingFace]
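
To make the object-permanence challenge concrete, here is a toy sketch of a shell-game-style task. It is not MIKASA-Robo code and does not use its API: the environment, the `HIDDEN` marker, and both policy classes are hypothetical and exist only to show why a policy that sees just the current observation cannot solve such a task.

```python
N_CUPS = 3
HIDDEN = -1  # observation emitted once the cups cover the ball


def rollout(ball_cup, policy, horizon=5):
    """Reveal the ball at step 0, hide it afterwards, then ask for a guess."""
    policy.reset()
    for step in range(horizon):
        obs = ball_cup if step == 0 else HIDDEN
        policy.observe(obs)
    return policy.act() == ball_cup


class MemorylessPolicy:
    """Acts on the most recent observation only."""

    def reset(self):
        self.last_obs = HIDDEN

    def observe(self, obs):
        self.last_obs = obs

    def act(self):
        # The last observation is always HIDDEN, so fall back to cup 0.
        return self.last_obs if self.last_obs != HIDDEN else 0


class MemoryPolicy:
    """Stores the ball position the one time it is visible."""

    def reset(self):
        self.remembered = None

    def observe(self, obs):
        if obs != HIDDEN:
            self.remembered = obs

    def act(self):
        return self.remembered if self.remembered is not None else 0


def success_rate(policy):
    """Fraction of ball positions for which the policy picks the right cup."""
    return sum(rollout(cup, policy) for cup in range(N_CUPS)) / N_CUPS


print(success_rate(MemoryPolicy()))      # 1.0
print(success_rate(MemorylessPolicy()))  # 0.333... (chance level)
```

The memoryless policy performs at chance because the final observation is identical for every ball position, which is the kind of partial observability the tasks in this section are built around.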

RoboTwin 2.0

Scalable framework for data generation and benchmarking in bimanual robotic manipulation.

Attribute Details
Focus Bimanual manipulation
Features Procedural data generation
Memory Challenge Coordinated long-horizon bimanual tasks
Links [Project]

RoboMME

Large-scale manipulation dataset and benchmark for memory-augmented robotic generalist policies.

Attribute Details
Tasks 16 manipulation tasks
Memory Types Temporal, spatial, object, and procedural memory
Memory Challenge History-dependent manipulation, counting, object permanence, reference, and imitation
Links [Paper] [Project] [GitHub]

RoboMemArena

Large-scale benchmark data for robotic memory with generated trajectories and memory annotations.

Attribute Details
Tasks 26 long-horizon memory tasks
Annotations Subtask instructions and native keyframe annotations
Memory Challenge Memory formation, long-horizon partial observability, real-world memory evaluation
Links [Paper] [Project]

RMBench

Memory-dependent manipulation benchmark built on RoboTwin.

Attribute Details
Tasks 9 manipulation tasks
Platform RoboTwin-based simulation
Memory Challenge Multiple levels of memory complexity for policy design analysis
Links [Paper] [Project] [GitHub]

LIBERO-Mem

Object-centric benchmark suite for non-Markovian robotic manipulation.

Attribute Details
Focus Object tracking and sequenced subgoals
Memory Challenge Object identity, object-level partial observability, persistent interaction history
Links [Paper]

MemMimic

Non-Markovian imitation benchmark introduced with Gated Memory Policy.

Attribute Details
Focus Memory-dependent visuomotor imitation
Memory Regimes In-trial working memory and cross-trial reference memory
Links [Paper] [Project]

Camo-Dataset

Real-robot UR5e dataset introduced with Chameleon for episodic recall and memory-dependent manipulation.

Attribute Details
Platform UR5e real robot
Tasks Episodic recall, spatial tracking, sequential manipulation
Memory Challenge Perceptual aliasing, geometry-grounded recall, long-horizon control
Links [Paper]

MoMani

Automated benchmark and trajectory-generation setup introduced with EchoVLA for long-horizon mobile manipulation.

Attribute Details
Focus Mobile manipulation with navigation and manipulation
Data Generation MLLM-guided planning and feedback-driven refinement, supplemented with real-robot demonstrations
Memory Challenge Scene memory, episodic memory, changing spatial contexts
Links [Paper]

Navigation Datasets

Habitat-Matterport 3D (HM3D)

Large-scale 3D environments for embodied AI navigation research.

Attribute Details
Scenes 1,000+ large-scale 3D environments
Source Real-world 3D scans
Metrics Navigable area, navigation complexity
Robot Model Cylindrical robot (0.1m radius, 1.5m height)
Memory Challenge Large-scale spatial memory, exploration
Links [Paper] [Project] [GitHub]

HM3D Semantics (HM3DSEM)

Largest dataset of 3D real-world spaces with dense semantic annotations.

Attribute Details
Annotations Dense semantic labels
Tasks Object Goal Navigation
Memory Challenge Semantic scene understanding
Links [Paper]

HM3D-OVON

Open Vocabulary Object Goal Navigation benchmark.

Attribute Details
Focus Open-vocabulary navigation
Year 2024
Memory Challenge Semantic generalization, novel object recognition
Links [Paper]

NaVQA Dataset

Long-horizon robot navigation videos for question answering.

Attribute Details
Focus Spatio-temporal memory for navigation
Tasks Perceptual question-answering
Memory Challenge Long-horizon reasoning, semantic memory
Links [Project]

VLNGo2-Matterport

Vision-Language Navigation dataset for quadruped robots.

Attribute Details
Focus Quadruped robot navigation
Year 2025
Memory Challenge Continuous navigation, language grounding
Links [Paper]

Embodied AI Benchmarks with Datasets

BEHAVIOR-1K

Human-centered embodied AI benchmark with 1,000 everyday activities.

Attribute Details
Activities 1,000 everyday household tasks
Demonstrations 10,000 human trajectories (200 per task, which implies 50 demonstrated tasks rather than all 1,000 activities)
Duration 1,200+ hours of demonstration data
Knowledge Base Crowdsourced activity definitions
Memory Challenge Long-horizon planning, state tracking
Links [Paper] [Project]

Mini-BEHAVIOR

Procedurally generated benchmark for long-horizon decision-making.

Attribute Details
Tasks 20 long-horizon, human-centered tasks
Features Procedural generation
Memory Challenge Multi-step planning, state persistence
Links [Paper]

EmbodiedBench

Comprehensive benchmark for evaluating MLLMs as embodied agents.

Attribute Details
Tasks 1,128 testing tasks
Environments 4 (ALFRED, Habitat, Navigation, Manipulation)
Capabilities Tested 6 critical agent capabilities
Memory Challenge Long-horizon planning, spatial awareness
Links [Paper] [Project] [HuggingFace]

CookBench

Long-horizon embodied planning benchmark for complex cooking scenarios.

Attribute Details
Focus Cooking tasks
Year 2025
Memory Challenge Multi-step procedural planning
Links [Paper]

Lifelong Learning Datasets

Humanoid Everyday

Comprehensive robotic dataset for humanoid robots in everyday activities.

Attribute Details
Focus Full-body humanoid capabilities
Tasks Locomotion to dexterous manipulation
Year 2025
Memory Challenge Continual skill acquisition
Links [Paper]

RealSource Dataset

Open-source robot dataset from RealMan Robotics.

Attribute Details
Environments 10 real-world simulated environments
Source RealMan Beijing Humanoid Robot Data Training Center
Year 2025
Links [News]

Dataset Comparison

| Dataset | Year | Size | Embodiments | Memory Focus |
| --- | --- | --- | --- | --- |
| Open X-Embodiment | 2023 | 1M+ trajectories | 22 | Cross-embodiment |
| DROID | 2024 | 76K trajectories | 1 | Scene diversity |
| BridgeData V2 | 2023 | 60K trajectories | 1 | Multi-skill |
| MIKASA-Robo | 2025 | 32 datasets | 1 | Memory-intensive |
| HM3D | 2021 | 1000 scenes | N/A | Navigation |
| BEHAVIOR-1K | 2024 | 10K demos | N/A | Long-horizon |
| EmbodiedBench | 2025 | 1128 tasks | N/A | MLLM evaluation |
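
The Size column mixes units (trajectories, datasets, scenes, demos, tasks), so the rows are only directly comparable within a unit. The sketch below parses the size strings from the table and groups them by unit; the helper names `parse_size` and `group_by_unit` are illustrative, and a count such as "1M+" is treated as a lower bound.

```python
import re

# Size strings taken from the comparison table; the unit differs per row.
SIZES = {
    "Open X-Embodiment": "1M+ trajectories",
    "DROID": "76K trajectories",
    "BridgeData V2": "60K trajectories",
    "MIKASA-Robo": "32 datasets",
    "HM3D": "1000 scenes",
    "BEHAVIOR-1K": "10K demos",
    "EmbodiedBench": "1128 tasks",
}

_MULTIPLIER = {"": 1, "K": 1_000, "M": 1_000_000}
_SIZE_RE = re.compile(r"^([\d,]+)([KM]?)\+?\s+(\w+)$")


def parse_size(text):
    """Split a size string into (lower-bound count, unit)."""
    match = _SIZE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized size string: {text!r}")
    digits, suffix, unit = match.groups()
    return int(digits.replace(",", "")) * _MULTIPLIER[suffix], unit


def group_by_unit(sizes):
    """Group datasets by unit, largest first within each unit."""
    groups = {}
    for name, text in sizes.items():
        count, unit = parse_size(text)
        groups.setdefault(unit, []).append((name, count))
    return {
        unit: sorted(rows, key=lambda row: row[1], reverse=True)
        for unit, rows in groups.items()
    }


for unit, rows in group_by_unit(SIZES).items():
    print(unit, rows)
```

Only the three trajectory-based datasets end up in the same group, which is why the table should be read row by row and not as a ranking.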

Contributing Datasets

If you know of a relevant dataset that should be included, please:

  1. Open an issue or submit a pull request
  2. Include all relevant information about the dataset:
    • Name and description
    • Size (trajectories, episodes, hours)
    • Modalities (RGB, depth, proprioception, etc.)
    • Memory-related challenges it addresses
    • Links to paper, download, and project page
  3. Ensure the dataset is publicly available or has clear access instructions
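
The checklist above can be expressed as a small self-check to run before opening a pull request. This is a sketch under assumptions: the repository does not define any submission schema, so the `DatasetEntry` fields, the `missing_items` helper, and the example URLs are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical record mirroring the contribution checklist."""

    name: str = ""
    description: str = ""
    size: str = ""  # trajectories, episodes, or hours
    modalities: list = field(default_factory=list)
    memory_challenges: list = field(default_factory=list)
    links: dict = field(default_factory=dict)  # keys: paper, download, project
    publicly_available: bool = False
    access_instructions: str = ""


def missing_items(entry):
    """Return the checklist items that the entry has not filled in."""
    problems = []
    for attr in ("name", "description", "size", "modalities", "memory_challenges"):
        if not getattr(entry, attr):
            problems.append(attr)
    for kind in ("paper", "download", "project"):
        if not entry.links.get(kind):
            problems.append(f"links.{kind}")
    # Either public availability or explicit access instructions is required.
    if not (entry.publicly_available or entry.access_instructions):
        problems.append("access")
    return problems


# Placeholder entry; the dataset name and URLs are made up for illustration.
draft = DatasetEntry(
    name="ExampleMem",
    description="Toy entry used to demonstrate the check.",
    size="100 episodes",
    modalities=["RGB", "proprioception"],
    memory_challenges=["object permanence"],
    links={"paper": "https://example.org/paper", "project": "https://example.org"},
)
print(missing_items(draft))  # ['links.download', 'access']
```

An empty result means every checklist item has at least some content; it says nothing about whether that content is accurate.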