## Summary
The README/paper/docs describe MIRIX as supporting multimodal inputs (text, images, audio, video). In the current `main` branch (v0.1.6), the code path for the memory layer appears to natively support text + images, but not audio/video as first-class inputs.
## What I expected
- Ability to send audio/video inputs (e.g., as message content types) and have MIRIX:
  - ingest them,
  - optionally transcribe them / extract representations,
  - pass them to the LLM (or convert them to text/frames),
  - and then store the results in memory.
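Concretely, this is the kind of request I expected to be able to send. Purely illustrative: the `audio`/`video` content types and their field names below are my invention and do not exist in the current schema.

```python
# Hypothetical payload: the "audio" and "video" content types (and their
# field names) are invented here to illustrate the expectation; the current
# schema has no such variants.
expected_payload = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize this meeting."},
        {"type": "audio", "audio_uri": "file:///tmp/meeting.wav"},  # not supported today
        {"type": "video", "video_uri": "file:///tmp/meeting.mp4"},  # not supported today
    ],
}
```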
## What actually happens
- Message content types include `text`, `image_url`, `file_uri`, and `google_cloud_file_uri` (no audio/video).
  - Code: `mirix/schemas/mirix_message_content.py`
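
  For reference, the set of content types as I read the schema, paraphrased as an enum (the file defines its own classes; only the four type tags are confirmed):

  ```python
  # Paraphrase of the content types in mirix/schemas/mirix_message_content.py.
  # The real file uses its own class definitions; only the four type tags
  # themselves are confirmed. Note the absence of any audio/video variant.
  from enum import Enum

  class MessageContentType(str, Enum):
      text = "text"
      image_url = "image_url"
      file_uri = "file_uri"
      google_cloud_file_uri = "google_cloud_file_uri"
      # no "audio", no "video"
  ```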
- Images are handled end-to-end: they become `ImageContent(image_id=...)` and are expanded into real image bytes/URLs for provider requests.
  - Code: `mirix/utils.py` (`convert_message_to_mirix_message`)
  - Code: `mirix/llm_api/openai_client.py` and `mirix/llm_api/anthropic_client.py` (`fill_image_content_in_messages`)
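
  My reading of the image path, paraphrased (everything except `fill_image_content_in_messages` and the `image_url`/`image_id` fields is a stand-in name of mine, not the codebase's API):

  ```python
  # Paraphrased pseudologic of fill_image_content_in_messages as I read it:
  # stored image_ids get resolved into real bytes/URLs before the provider
  # request is built. `image_store` is a stand-in, not the actual API.
  def fill_image_content_in_messages(messages, image_store):
      for message in messages:
          for part in message["content"]:
              if part.get("type") == "image_url" and "image_id" in part:
                  part["image_url"] = image_store.get_url(part["image_id"])
      return messages
  ```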
- Audio/video are not handled as native content types and are not automatically transcribed/processed in the main `/memory/add` pipeline.
- There is a transcription helper (`mirix/voice_utils.py`), but it does not appear to be wired into the REST/queue processing path.
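
  For illustration, a rough sketch of the wiring I expected in the `/memory/add` path; `transcribe_audio` is a placeholder, and I have not confirmed the actual interface `voice_utils.py` exposes:

  ```python
  # Sketch of the expected wiring only. transcribe_audio is a placeholder
  # for whatever mirix/voice_utils.py actually exposes, and the "audio"
  # content type is hypothetical (see the expected payload above).
  def transcribe_audio(uri: str) -> str:
      raise NotImplementedError  # would delegate to mirix/voice_utils.py

  def preprocess_content_part(part: dict) -> dict:
      if part.get("type") == "audio":  # hypothetical content type
          return {"type": "text", "text": transcribe_audio(part["audio_uri"])}
      return part
  ```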
- Additionally, the OpenAI provider code explicitly does not support `file_uri` as LLM input.
  - Code: `mirix/llm_api/openai_client.py` (raises `NotImplementedError` for `file_uri`)
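
For completeness, a paraphrase (not a verbatim copy) of the guard I am hitting in `openai_client.py`:

```python
# Paraphrased behavior of mirix/llm_api/openai_client.py: file_uri content
# parts are rejected before any provider request is built. The exact message
# and surrounding code differ in the real file.
def convert_part_for_openai(part: dict) -> dict:
    if part["type"] == "file_uri":
        raise NotImplementedError("file_uri is not supported as OpenAI LLM input")
    return part
```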