OpenMOSS

OpenMOSS presents a collection of our research on Large Language Models and Multimodal Foundation Models, supported by Shanghai Innovation Institute (SII), Fudan University, and MOSI.AI.

📌 This page is a curated overview. For the complete and most recent list of repositories, visit the OpenMOSS organization.

Last updated: 2026-05-27

Projects

MOSS-LLM

Foundation language models and training infrastructure.

Project	Description
MOSS	An open-source tool-augmented conversational language model from Fudan University — the founding project of the OpenMOSS series.
CoLLiE	A library for efficient collaborative training of large language models.

MOSS-VL

Multimodal models for visual and video understanding.

Project	Description
MOSS-VL	Core multimodal model series within the OpenMOSS ecosystem, dedicated to visual understanding. Includes the XRoPE architecture and a fully open training stack.
MOSS-Video-Preview	A real-time video understanding foundation model built on Llama-3.2-Vision, with comprehensively extended video processing and multimodal reasoning capabilities.

MOSS-Audio

End-to-end models for audio understanding and generation — speech, sound, music.

Project	Description
MOSS-TTS-Nano	A 0.1B-parameter open-source multilingual TTS model that runs in real time on CPU alone, designed for local demos, web serving, and lightweight product integration.
MOSS-TTS	Open-source TTS family covering stable long-form speech, multi-speaker dialogue, voice/character design, environmental sound effects, and real-time streaming TTS.
MOSS-TTSD · 🤗 HF	Spoken dialogue generation model with expressive multi-speaker synthesis, long-context modeling, flexible speaker control, multilingual support, and zero-shot voice cloning.
MOSS-Audio	Open-source foundation model for unified audio understanding — speech, sound, music, captioning, QA, and reasoning in real-world scenarios.
MOSS-Audio-Tokenizer	Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of audio, it supports streaming and variable bitrates and delivers state-of-the-art (SOTA) reconstruction.
MOSS-Speech	A true speech-to-speech large language model without text guidance.
MOSS-Music	Music understanding model for captioning, lyrics ASR, structural analysis, chord/key/tempo reasoning, and long-form musical QA.
SpeechGPT-2.0-preview	GPT-4o-level, real-time spoken dialogue system.

MOSS-Omni

Unified multimodal generation across modalities.

Project	Description
AnyGPT	Unified multimodal LLM with discrete sequence modeling.
MOVA	Towards scalable and synchronized video–audio generation.

MOSS-Robot

Embodied AI: humanoid control, robotic manipulation, and embodied planning.

Project	Description
FRoM-W1	Towards general humanoid whole-body control with language instructions (arXiv 2026). Supports Unitree H1/G1 and FFTAI humanoid robots.
RoboOmni	Proactive robot manipulation in omni-modal context.
Embodied-Planner-R1 · arXiv	A reinforcement learning framework that enables LLMs to acquire embodied planning capabilities through autonomous exploration with sparse rewards.
RoboJuDo	Deployment framework for the FRoM-W1 humanoid project.
VehicleWorld	First comprehensive multi-device environment for intelligent vehicle interaction, modeling complex interconnected systems in modern cockpits.

MOSS-Aiology

Mechanistic interpretability of large language models.

Project	Description
Llamascopium · 🤗 HF · Neuronpedia	(formerly Language-Model-SAEs) A performant, fully-distributed framework for training, analyzing, and visualizing Sparse Autoencoders (SAEs) and frontier variants, empowering scalable and systematic mechanistic interpretability research.
Lorsa	Low-rank sparse attention for interpretability.

Research

Embodied-AI

The Embodied AI Team empowers large models to execute real-world tasks, aiming to automate tedious chores and unlock superhuman intelligence through environmental interaction. We believe true AI emerges from engaging with the physical world.

Project	Venue	Description
VLABench · arXiv · GitHub	ICCV 2025	The first large-scale robot manipulation benchmark designed to fairly evaluate the multi-dimensional abilities of general-purpose Vision-Language-Action models.
D2PO · arXiv · GitHub	ACL 2025	A unified learning framework that empowers embodied agents with stronger world modeling and embodied planning ability via dual preference optimization.
World-Aware-Planning · arXiv · GitHub	—	World-aware narrative enhancement bridging high-level task instructions and nuanced real-world environment details.
Embodied-Planner-R1 · arXiv · GitHub	—	RL framework enabling LLMs to acquire embodied planning capabilities through autonomous exploration with sparse rewards.
Awesome-WAM · GitHub	—	A curated, continuously updated reading list, paper blogs, and resources for World Action Models in embodied AI.

NewArch

The SII-OpenMOSS New Architecture Team explores new architectures and paradigms of LLMs, particularly from the perspective of long-context capability and efficiency.

Project	Venue	Description
ReAttention · arXiv · GitHub	ICLR 2025	Training-free approach that enables LLMs to support infinite context length extrapolation with finite attention scope.
LongLLaDA · arXiv · GitHub	AAAI 2026	First systematic investigation comparing long-context performance of diffusion LLMs and traditional auto-regressive LLMs.
RoPE++ · GitHub	ICLR 2026	Beyond Real: imaginary extension of Rotary Position Embeddings for long-context LLMs.
Sparse-dLLM · GitHub	—	Sparse diffusion-based large language models.
FourierAttention · arXiv	—	Training-free framework that exploits the heterogeneous roles of transformer head dimensions.
Thus Spake Long-Context LLM · arXiv · GitHub	—	A survey on the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation.

Multimodal Evaluation

Project	Venue	Description
GAOKAO-MM · GitHub	ACL 2024 Findings	A Chinese human-level benchmark for multimodal model evaluation.

Alignment & Safety

Project	Venue	Description
HalluQA · GitHub	—	Dataset and evaluation script for evaluating hallucinations in Chinese large language models.
Say-I-Don't-Know · GitHub	ICML 2024	Can AI assistants know what they don't know?
LongSafety · GitHub	—	Safety evaluation for long-context LLMs.

Tool Use & Agents

Project	Description
UnifiedToolHub · GitHub	A comprehensive project supporting LLM-based tool use: it unifies dataset formats and provides training, annotation, and evaluation functionality.
ABC-Bench · GitHub	A benchmark for agentic backend coding — evaluates whether code agents can explore repos, edit code, configure environments, deploy services, and pass external end-to-end API tests.
OurClaw · GitHub	Institutional OpenClaw Solution. Share One Claw with Others.

Contact

For collaborations, internships, or general inquiries: openmoss@sii.edu.cn
