## Summary
The README/paper/docs describe MIRIX as supporting multimodal inputs (text, images, audio, video). In the current `main` branch (v0.1.6), the code path for the memory layer appears to natively support text + images, but not audio/video as first-class inputs.
## What I expected
- Ability to send audio/video inputs (e.g., as message content types) and have MIRIX:
  - ingest them,
  - optionally transcribe them / extract representations,
  - pass them to the LLM (or convert them to text/frames),
  - and then store the results in memory.
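Concretely, this is the kind of request I expected to be able to send. Purely illustrative: the `audio`/`video` content types and their field names below are my invention and do not exist in the current schema.

```python
# Hypothetical payload: the "audio" and "video" content types (and their
# field names) are invented here to illustrate the expectation; the current
# schema has no such variants.
expected_payload = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize this meeting."},
        {"type": "audio", "audio_uri": "file:///tmp/meeting.wav"},  # not supported today
        {"type": "video", "video_uri": "file:///tmp/meeting.mp4"},  # not supported today
    ],
}
```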
## What actually happens
- Message content types include `text`, `image_url`, `file_uri`, and `google_cloud_file_uri` (no audio/video).
  - Code: `mirix/schemas/mirix_message_content.py`
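
  For reference, the set of content types as I read the schema, paraphrased as an enum (the file defines its own classes; only the four type tags are confirmed):

  ```python
  # Paraphrase of the content types in mirix/schemas/mirix_message_content.py.
  # The real file uses its own class definitions; only the four type tags
  # themselves are confirmed. Note the absence of any audio/video variant.
  from enum import Enum

  class MessageContentType(str, Enum):
      text = "text"
      image_url = "image_url"
      file_uri = "file_uri"
      google_cloud_file_uri = "google_cloud_file_uri"
      # no "audio", no "video"
  ```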
- Images are handled end-to-end: they become `ImageContent(image_id=...)` and are expanded into real image bytes/URLs for provider requests.
  - Code: `mirix/utils.py` (`convert_message_to_mirix_message`)
  - Code: `mirix/llm_api/openai_client.py` and `mirix/llm_api/anthropic_client.py` (`fill_image_content_in_messages`)
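
  My reading of the image path, paraphrased (everything except `fill_image_content_in_messages` and the `image_url`/`image_id` fields is a stand-in name of mine, not the codebase's API):

  ```python
  # Paraphrased pseudologic of fill_image_content_in_messages as I read it:
  # stored image_ids get resolved into real bytes/URLs before the provider
  # request is built. `image_store` is a stand-in, not the actual API.
  def fill_image_content_in_messages(messages, image_store):
      for message in messages:
          for part in message["content"]:
              if part.get("type") == "image_url" and "image_id" in part:
                  part["image_url"] = image_store.get_url(part["image_id"])
      return messages
  ```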
- Audio/video are not handled as native content types and are not automatically transcribed/processed in the main `/memory/add` pipeline.
- There is a transcription helper (`mirix/voice_utils.py`), but it does not appear to be wired into the REST/queue processing path.
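
  For illustration, a rough sketch of the wiring I expected in the `/memory/add` path; `transcribe_audio` is a placeholder, and I have not confirmed the actual interface `voice_utils.py` exposes:

  ```python
  # Sketch of the expected wiring only. transcribe_audio is a placeholder
  # for whatever mirix/voice_utils.py actually exposes, and the "audio"
  # content type is hypothetical (see the expected payload above).
  def transcribe_audio(uri: str) -> str:
      raise NotImplementedError  # would delegate to mirix/voice_utils.py

  def preprocess_content_part(part: dict) -> dict:
      if part.get("type") == "audio":  # hypothetical content type
          return {"type": "text", "text": transcribe_audio(part["audio_uri"])}
      return part
  ```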
- Additionally, the OpenAI provider code explicitly does not support `file_uri` as LLM input.
  - Code: `mirix/llm_api/openai_client.py` (raises `NotImplementedError` for `file_uri`)
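
For completeness, a paraphrase (not a verbatim copy) of the guard I am hitting in `openai_client.py`:

```python
# Paraphrased behavior of mirix/llm_api/openai_client.py: file_uri content
# parts are rejected before any provider request is built. The exact message
# and surrounding code differ in the real file.
def convert_part_for_openai(part: dict) -> dict:
    if part["type"] == "file_uri":
        raise NotImplementedError("file_uri is not supported as OpenAI LLM input")
    return part
```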