Abstract VLM providers into an extensible Strategy Pattern#607
Nancy-3012 wants to merge 1 commit into
The test collection is failing due to an ImportError in the test module that imports the vision package.
Root Cause
The import statement on line 5 is incorrect:
from app.vision.base import BaseVisionProvider
The package lives under backend/, so app.vision cannot be imported unless backend/ is on the Python path.
Solution
Update the imports on lines 5-6 from:
from app.vision.base import BaseVisionProvider
from app.vision.registry import _REGISTRY, get_vision_provider, register_provider
to:
from backend.app.vision.base import BaseVisionProvider
from backend.app.vision.registry import _REGISTRY, get_vision_provider, register_provider
Alternatively, add backend/ to pytest's Python path in the pytest configuration:
[pytest]
pythonpath = backend
Why This Matters
Once the import error is fixed, pytest will collect all 204 tests and run them, generating the coverage reports needed for the Codecov upload step.
@param20h please merge this branch.
Closes #592
This PR refactors the vision captioning pipeline by introducing a modular provider-based architecture for Vision-Language Models (VLMs).
Previously, rag/vision.py contained a hardcoded OpenAI implementation, making it difficult to support additional providers without adding provider-specific logic directly into the file.
Changes made:
- Created a new backend/app/vision/ package with:
  - base.py – abstract base class defining a common caption(image_bytes) -> str interface.
  - registry.py – provider registration and retrieval system.
  - providers/ – provider implementations for OpenAI (refactored from existing code), Anthropic, Gemini, and Ollama.
- Refactored rag/vision.py:
  - Removed the hardcoded _openai_caption() function.
  - Updated caption_image() to load providers dynamically through the registry.
  - Preserved the existing fallback flow (Vision Model → OCR → Placeholder).
- Added new configuration settings:
  - ANTHROPIC_API_KEY
  - GOOGLE_API_KEY
  - OLLAMA_BASE_URL
- Updated .env.example with configuration examples for all supported providers.
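The preserved fallback flow (Vision Model → OCR → Placeholder) can be read as a chain: try each stage in order and move on when it is not configured, raises, or returns empty text. The sketch below illustrates that ordering only. Its signature is hypothetical: the vision stage is passed in as a plain callable (for instance, the bound caption method of whichever provider the registry returns), the OCR stage is another callable, and PLACEHOLDER and the broad exception handling are assumptions, not code from rag/vision.py.

```python
from typing import Callable, Optional

# Hypothetical placeholder text; the real value in rag/vision.py may differ.
PLACEHOLDER = "[image: no caption available]"


def caption_image(
    image_bytes: bytes,
    vision_caption: Optional[Callable[[bytes], str]],
    ocr_text: Optional[Callable[[bytes], str]],
) -> str:
    """Try the vision model first, then OCR, then return a fixed placeholder."""
    for stage in (vision_caption, ocr_text):
        if stage is None:
            continue  # stage not configured (e.g. no API key set)
        try:
            text = stage(image_bytes).strip()
        except Exception:
            continue  # provider, network, or OCR failure: try the next stage
        if text:
            return text  # first non-empty result wins
    return PLACEHOLDER
```

Catching a broad Exception keeps one failing provider from breaking the whole captioning step; a production version would likely narrow the exception types and log the failure before falling through.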
Benefits:
- Easier addition of future vision providers.
- Improved maintainability and separation of concerns.
- Preserves existing functionality while enabling OpenAI, Anthropic, Gemini, and Ollama support.
🗂️ Type of Change
🧪 How was this tested?
uvicorn app.main:app --reload
npm run dev (inside frontend/)