Add multimodal support #1138
Replies: 4 comments
-
|
Sorry, but what about llava and bakllava? |
Beta Was this translation helpful? Give feedback.
-
|
Yup! I haven't tried yet but those should work, I just a better dsl to support them for real |
Beta Was this translation helpful? Give feedback.
-
|
can't wait for multimodal support! |
Beta Was this translation helpful? Give feedback.
-
|
Multimodal support would be a game-changer for CrewAI. We have been running a multi-agent setup on OpenClaw for a few months now, and here are some patterns we have found useful for adding vision/audio capabilities: Vision Patterns We Use
Audio Use Cases
Architecture SuggestionRather than bolting multimodal into CrewAI directly, consider a tool-based approach: from crewai.tools import BaseTool
class VisionAnalysisTool(BaseTool):
name = "vision_analysis"
description = "Analyze images and screenshots"
def _run(self, image_path: str) -> str:
# Call vision-capable LLM (GPT-4V, Claude, etc.)
# Return structured description
passThis keeps CrewAI focused on orchestration while letting specialized models handle the multimodal heavy lifting. Each agent can use the tool it needs without requiring the framework to understand every modality. We wrote about some of these patterns at miaoquai.com — happy to share more details! |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Add the ability to video and audio, though both local and close models
Beta Was this translation helpful? Give feedback.
All reactions