Abstract VLM providers into an extensible Strategy Pattern #607

Open
Nancy-3012 wants to merge 1 commit into param20h:dev from Nancy-3012:abstract-vlm-providers-strategy-pattern

Nancy-3012 commented Jun 15, 2026

Closes #592

This PR refactors the vision captioning pipeline by introducing a modular provider-based architecture for Vision-Language Models (VLMs).

Previously, rag/vision.py contained a hardcoded OpenAI implementation, making it difficult to support additional providers without adding provider-specific logic directly into the file.

Changes made:

  • Created a new backend/app/vision/ package with:
      • base.py – abstract base class defining a common caption(image_bytes) -> str interface.
      • registry.py – provider registration and retrieval system.
      • providers/ – provider implementations for OpenAI (refactored from existing code), Anthropic, Gemini, and Ollama.
  • Refactored rag/vision.py:
      • Removed the hardcoded _openai_caption() function.
      • Updated caption_image() to dynamically load providers using the registry.
      • Preserved the existing fallback flow (Vision Model → OCR → Placeholder).
  • Added new configuration settings:
      • ANTHROPIC_API_KEY
      • GOOGLE_API_KEY
      • OLLAMA_BASE_URL
  • Updated .env.example with configuration examples for all supported providers.

Benefits:

  • Easier addition of future vision providers.
  • Improved maintainability and separation of concerns.
  • Preserves existing functionality while enabling OpenAI, Anthropic, Gemini, and Ollama support.
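
The preserved fallback flow (Vision Model → OCR → Placeholder) can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in rag/vision.py: the function name caption_with_fallback, the callable parameters, and the placeholder text are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical placeholder text; the real value in rag/vision.py may differ.
PLACEHOLDER = "[image: no caption available]"

CaptionStep = Callable[[bytes], str]


def caption_with_fallback(
    image_bytes: bytes,
    vlm_caption: Optional[CaptionStep],
    ocr_text: Optional[CaptionStep],
    placeholder: str = PLACEHOLDER,
) -> str:
    """Try the vision model first, then OCR, then fall back to a placeholder."""
    for step in (vlm_caption, ocr_text):
        if step is None:
            continue  # this step is not configured
        try:
            result = step(image_bytes)
        except Exception:
            continue  # a failing step must not abort the chain
        if result and result.strip():
            return result.strip()
    return placeholder
```

Catching a broad Exception is deliberate in this sketch so that a provider outage degrades to OCR instead of failing ingestion; a real implementation would likely log the error as well.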

🗂️ Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 🔧 Refactor / code cleanup
  • 📝 Documentation update
  • 🎨 UI / styling change
  • ⚙️ CI / tooling / config change
  • 🧪 Tests

🧪 How was this tested?

  • Ran the backend locally (uvicorn app.main:app --reload)
  • Ran the frontend locally (npm run dev inside frontend/)
  • Tested the affected API endpoints manually
  • Added / updated tests

@Nancy-3012 Nancy-3012 requested a review from param20h as a code owner June 15, 2026 03:40
@param20h (Owner)

The test collection is failing due to an ImportError in backend/tests/test_vision_providers.py. The error occurs when pytest tries to import the test module.

Root Cause

The import statement on line 5 is incorrect:

from app.vision.base import BaseVisionProvider

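Python cannot resolve this import path: either the module does not exist or the package root is not where the import assumes. The test file is located in backend/tests/, so app is importable as a top-level package only when backend/ is on the Python path; otherwise the imports need the full path from the repository root.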

Solution

Update the import statements in backend/tests/test_vision_providers.py to use the correct module path. Based on the file location, change:

# Lines 5-6: update these imports
from app.vision.base import BaseVisionProvider
from app.vision.registry import _REGISTRY, get_vision_provider, register_provider

To:

from backend.app.vision.base import BaseVisionProvider
from backend.app.vision.registry import _REGISTRY, get_vision_provider, register_provider

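Alternatively, if app is meant to be importable as a top-level package, keep the original imports and make sure pytest adds backend to the import path. The pythonpath option requires pytest 7.0 or later. In pytest.ini it looks like the snippet below; in pyproject.toml the same option goes under the [tool.pytest.ini_options] table as pythonpath = ["backend"].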

[pytest]
pythonpath = backend
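
If the pytest version in CI predates the pythonpath option, a conftest.py can achieve the same effect. The following is a sketch that assumes the file lives at backend/tests/conftest.py; the helper name add_to_sys_path is hypothetical.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory: Path) -> None:
    """Prepend `directory` to sys.path unless it is already present."""
    resolved = str(directory.resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)


# In backend/tests/conftest.py, backend/ is the parent of the tests/ directory,
# so the call would be (assumed layout):
#   add_to_sys_path(Path(__file__).resolve().parent.parent)
```

pytest imports conftest.py before collecting the test modules in that directory, so `from app.vision.base import ...` would then resolve during collection.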

Why This Matters

Once the import error is fixed, pytest can collect and run all 204 tests and generate the coverage report that the Codecov upload step needs.

@Nancy-3012 (Author)

@param20h please merge this branch

Development

Successfully merging this pull request may close these issues.

[FEAT] Abstract VLM providers into an extensible Strategy Pattern

2 participants