[Model] CogACT #19

@skt0725

ID (slug)

cogact

Name

CogACT

Organization

THUDM / Microsoft Research Asia

Year

2024

Description (English)

CogACT is a componentized vision-language-action (VLA) architecture that decouples cognition from action. A vision-language model extracts cognitive features, which then condition a specialized Diffusion Transformer-based action module that predicts continuous, temporally correlated robot action sequences.
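
The cognition/action split described above can be illustrated with a toy sketch. This is not the CogACT code or its API. Every name here (`cognition_feature`, `predict_noise`, `sample_actions`, and the constants) is hypothetical, the "VLM" is a seeded random vector, and the "action module" is a closed-form noise predictor for a made-up target rather than a trained Diffusion Transformer. It only shows the data flow: a feature is computed once, then conditions an iterative denoising loop that turns Gaussian noise into a multi-step action chunk.

```python
import math
import random

HORIZON = 4      # hypothetical: number of future action steps predicted at once
ACTION_DIM = 3   # hypothetical: size of one action (e.g. dx, dy, gripper)
NUM_STEPS = 10   # hypothetical: number of denoising steps


def cognition_feature(instruction):
    """Stand-in for the VLM: map an instruction to a small feature vector."""
    rng = random.Random(instruction)  # seeded by the text, so deterministic
    return [rng.uniform(-1.0, 1.0) for _ in range(ACTION_DIM)]


def target_actions(feature):
    """Toy 'clean' action chunk implied by the feature (made up for this sketch)."""
    return [[f * (t + 1) / HORIZON for f in feature] for t in range(HORIZON)]


def alpha_bar(step):
    """Toy linear signal level: 1.0 at step 0 (clean), 0.01 at NUM_STEPS (mostly noise)."""
    return 1.0 - 0.99 * step / NUM_STEPS


def predict_noise(noisy, step, feature):
    """Stand-in for the action module: a noise estimate conditioned on the feature.

    A real model would be a learned network; this uses the closed-form answer
    for the toy target so the example stays self-contained.
    """
    a = alpha_bar(step)
    clean = target_actions(feature)
    return [[(x - math.sqrt(a) * c) / math.sqrt(1.0 - a) for x, c in zip(xr, cr)]
            for xr, cr in zip(noisy, clean)]


def sample_actions(feature, seed=0):
    """Denoise Gaussian noise into an action chunk, conditioned on the feature."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)] for _ in range(HORIZON)]
    for step in range(NUM_STEPS, 0, -1):
        a_now, a_prev = alpha_bar(step), alpha_bar(step - 1)
        eps = predict_noise(x, step, feature)
        # Deterministic DDIM-style update: estimate the clean sample,
        # then move it to the noise level of the previous step.
        x = [[math.sqrt(a_prev) * (xi - math.sqrt(1.0 - a_now) * ei) / math.sqrt(a_now)
              + math.sqrt(1.0 - a_prev) * ei
              for xi, ei in zip(xr, er)]
             for xr, er in zip(x, eps)]
    return x


feature = cognition_feature("pick up the cup")  # cognition: computed once
actions = sample_actions(feature)               # action: iterative denoising
print(len(actions), "steps x", len(actions[0]), "dims")
```

Keeping the feature computation outside the denoising loop mirrors the decoupling in the description: the expensive perception step runs once per observation, while the action module can run several cheap refinement steps on the action sequence alone.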

Description (Korean)

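Unlike earlier single-network models, CogACT is a componentized vision-language-action model that clearly separates cognition from action. A powerful vision-language model extracts cognitive features, and a specialized Diffusion Transformer-based action model, conditioned on those features, predicts complex, continuous physical control trajectories for the robot.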

GitHub URL

https://github.com/microsoft/CogACT

Paper URL (arXiv)

https://cogact.github.io/CogACT_paper.pdf

HuggingFace URL

https://huggingface.co/CogACT

Project Page URL

https://cogact.github.io/

Categories

  • manipulation
  • locomotion
  • navigation
  • dexterous
  • whole-body
  • aerial

Hardware Targets

  • manipulator
  • humanoid
  • quadruped
  • biped
  • mobile
  • drone
  • hand

Learning Methods

  • VLA
  • IL
  • RL
  • diffusion
  • world_model
  • sim2real

Framework

  • pytorch
  • jax
  • tensorflow
  • other

Communication

  • ros2
  • grpc
  • lcm
  • zenoh

Tags (optional)

VLA, diffusion, foundation-model

Checklist

  • The model is open-source (code or weights publicly available)
  • At least one URL (GitHub, paper, or HuggingFace) is provided
  • I have read the contribution guidelines
