sii-research/OpenMOSS

OpenMOSS

OpenMOSS presents a collection of our research on Large Language Models and Multimodal Foundation Models, supported by Shanghai Innovation Institute (SII), Fudan University, and MOSI.AI.

📌 This page is a curated overview. For the complete and most recent list of repositories, visit the OpenMOSS organization.

Last updated: 2026-05-27


Projects

MOSS-LLM

Foundation language models and training infrastructure.

Project Description
MOSS An open-source tool-augmented conversational language model from Fudan University — the founding project of the OpenMOSS series.
CoLLiE A library for efficient collaborative training of large language models.

MOSS-VL

Multimodal models for visual and video understanding.

Project Description
MOSS-VL Core multimodal model series within the OpenMOSS ecosystem, dedicated to visual understanding. Includes the XRoPE architecture and a fully open training stack.
MOSS-Video-Preview A real-time video understanding foundation model built on Llama-3.2-Vision, with comprehensively extended video processing and multimodal reasoning capabilities.

MOSS-Audio

End-to-end models for audio understanding and generation — speech, sound, music.

Project Description
MOSS-TTS-Nano A 0.1B-parameter open-source multilingual TTS model that runs in real time on CPU without a GPU, designed for local demos, web serving, and lightweight product integration.
MOSS-TTS Open-source TTS family covering stable long-form speech, multi-speaker dialogue, voice/character design, environmental sound effects, and real-time streaming TTS.
MOSS-TTSD · 🤗 HF Spoken dialogue generation model with expressive multi-speaker synthesis, long-context modeling, flexible speaker control, multilingual support, and zero-shot voice cloning.
MOSS-Audio Open-source foundation model for unified audio understanding — speech, sound, music, captioning, QA, and reasoning in real-world scenarios.
MOSS-Audio-Tokenizer Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of audio, it supports streaming and variable bitrates and delivers SOTA reconstruction.
MOSS-Speech A true speech-to-speech large language model without text guidance.
MOSS-Music Music understanding model for captioning, lyrics ASR, structural analysis, chord/key/tempo reasoning, and long-form musical QA.
SpeechGPT-2.0-preview GPT-4o-level, real-time spoken dialogue system.

MOSS-Omni

Unified multimodal generation across modalities.

Project Description
AnyGPT Unified multimodal LLM with discrete sequence modeling.
MOVA Towards scalable and synchronized video–audio generation.

MOSS-Robot

Embodied AI: humanoid control, robotic manipulation, and embodied planning.

Project Description
FRoM-W1 Towards general humanoid whole-body control with language instructions (arXiv 2026). Supports Unitree H1/G1 and FFTAI humanoid robots.
RoboOmni Proactive robot manipulation in omni-modal context.
Embodied-Planner-R1 · arXiv A reinforcement learning framework that enables LLMs to acquire embodied planning capabilities through autonomous exploration with sparse rewards.
RoboJuDo Deployment framework for the FRoM-W1 humanoid project.
VehicleWorld First comprehensive multi-device environment for intelligent vehicle interaction, modeling complex interconnected systems in modern cockpits.

MOSS-Aiology

Mechanistic interpretability of large language models.

Project Description
Llamascopium · 🤗 HF · Neuronpedia (formerly Language-Model-SAEs) A performant, fully-distributed framework for training, analyzing, and visualizing Sparse Autoencoders (SAEs) and frontier variants, empowering scalable and systematic mechanistic interpretability research.
Lorsa Low-rank sparse attention for interpretability.

Research

Embodied-AI

The Embodied AI Team empowers large models to execute real-world tasks, aiming to automate tedious chores and unlock superhuman intelligence through environmental interaction. We believe true AI emerges from engaging with the physical world.

Project Venue Description
VLABench · arXiv · GitHub ICCV 2025 The first large-scale robot manipulation benchmark designed to fairly evaluate the multi-dimensional ability of general-purpose Vision-Language-Action models.
D2PO · arXiv · GitHub ACL 2025 A unified learning framework that empowers embodied agents with stronger world modeling and embodied planning ability via dual preference optimization.
World-Aware-Planning · arXiv · GitHub World-aware narrative enhancement bridging high-level task instructions and nuanced real-world environment details.
Embodied-Planner-R1 · arXiv · GitHub RL framework enabling LLMs to acquire embodied planning capabilities through autonomous exploration with sparse rewards.
Awesome-WAM · GitHub A curated, continuously updated reading list, paper blogs, and resources for World Action Models in embodied AI.

NewArch

The SII-OpenMOSS New Architecture Team explores new architectures and paradigms for LLMs, with a particular focus on long-context capability and efficiency.

Project Venue Description
ReAttention · arXiv · GitHub ICLR 2025 Training-free approach that enables LLMs to support infinite context length extrapolation with finite attention scope.
LongLLaDA · arXiv · GitHub AAAI 2026 First systematic investigation comparing long-context performance of diffusion LLMs and traditional auto-regressive LLMs.
RoPE++ · GitHub ICLR 2026 Beyond Real: imaginary extension of Rotary Position Embeddings for long-context LLMs.
Sparse-dLLM · GitHub Sparse diffusion-based large language models.
FourierAttention · arXiv Training-free framework that exploits the heterogeneous roles of transformer head dimensions.
Thus Spake Long-Context LLM · arXiv · GitHub A survey on the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation.

Multimodal Evaluation

Project Venue Description
GAOKAO-MM · GitHub ACL 2024 Findings A Chinese human-level benchmark for multimodal model evaluation.

Alignment & Safety

Project Venue Description
HalluQA · GitHub Dataset and evaluation script for measuring hallucinations in Chinese large language models.
Say-I-Don't-Know · GitHub ICML 2024 Can AI assistants know what they don't know?
LongSafety · GitHub Safety evaluation for long-context LLMs.

Tool Use & Agents

Project Description
UnifiedToolHub · GitHub A comprehensive project supporting LLM-based tool use — unifies dataset formats and provides training, annotation, and evaluation functionalities.
ABC-Bench · GitHub A benchmark for agentic backend coding — evaluates whether code agents can explore repos, edit code, configure environments, deploy services, and pass external end-to-end API tests.
OurClaw · GitHub Institutional OpenClaw Solution. Share One Claw with Others.

Contact

For collaborations, internships, or general inquiries: openmoss@sii.edu.cn
