[FEAT] Abstract VLM providers into an extensible Strategy Pattern #592

@suhaniiz

Is your feature request related to a problem? Please describe.

Yes. The current caption_image function hardcodes client initialization, API payload mapping, and error handling specifically for OpenAI. If we expand the RAG pipeline to support other Vision-Language Models (VLMs) such as Anthropic Claude, Google Gemini, or local models via Ollama/LLaVA, this function will grow into a deeply nested, brittle block of if/elif statements. That violates the Open-Closed Principle: every new provider means editing caption_image itself instead of adding a new, isolated unit, which makes maintenance and scaling difficult.

Describe the solution you'd like

We need to decouple the core image captioning pipeline from individual vendor SDK implementations by introducing a Strategy Pattern:

1. Create a dedicated directory structure (e.g., app/vision/providers/).

2. Define a standard abstract base class or interface (e.g., BaseVisionProvider) requiring a .caption(image_bytes: bytes) -> str method.

3. Refactor the existing OpenAI code into its own class (OpenAIVisionProvider) conforming to this interface.

4. Implement a simple factory or registry lookup inside caption_image that instantiates the correct provider dynamically based on the VISION_PROVIDER string setting.
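
A rough sketch of how the steps above could fit together, to make the proposal concrete. BaseVisionProvider, caption_image, and the VISION_PROVIDER setting come from this issue; everything else (register_provider, get_provider, the _PROVIDERS dict, and DummyVisionProvider) is a hypothetical name chosen for illustration. DummyVisionProvider stands in for the real OpenAIVisionProvider so the example runs without any vendor SDK or network call, and the provider name is passed as an argument because how the project reads its settings is not shown here.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class BaseVisionProvider(ABC):
    """Interface that every vision provider would implement."""

    @abstractmethod
    def caption(self, image_bytes: bytes) -> str:
        """Return a text caption for the given image bytes."""


# Hypothetical registry: maps a VISION_PROVIDER value to a provider class.
_PROVIDERS: Dict[str, Type[BaseVisionProvider]] = {}


def register_provider(
    name: str,
) -> Callable[[Type[BaseVisionProvider]], Type[BaseVisionProvider]]:
    """Class decorator that registers a provider under a lowercase name."""

    def decorator(cls: Type[BaseVisionProvider]) -> Type[BaseVisionProvider]:
        _PROVIDERS[name.lower()] = cls
        return cls

    return decorator


@register_provider("dummy")
class DummyVisionProvider(BaseVisionProvider):
    """Illustrative stand-in for OpenAIVisionProvider; makes no API calls."""

    def caption(self, image_bytes: bytes) -> str:
        return f"image of {len(image_bytes)} bytes"


def get_provider(name: str) -> BaseVisionProvider:
    """Instantiate the provider registered under `name` (case-insensitive)."""
    try:
        provider_cls = _PROVIDERS[name.lower()]
    except KeyError:
        known = ", ".join(sorted(_PROVIDERS)) or "none"
        raise ValueError(
            f"Unknown VISION_PROVIDER {name!r}; registered: {known}"
        ) from None
    return provider_cls()


def caption_image(image_bytes: bytes, provider_name: str = "dummy") -> str:
    """Core pipeline entry point; contains no vendor-specific logic."""
    return get_provider(provider_name).caption(image_bytes)
```

With this shape, caption_image(b"abc") returns "image of 3 bytes", and adding a new vendor would mean adding one decorated class in its own module under app/vision/providers/ without touching caption_image.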

Describe alternatives you've considered

Keeping it inline: Continuing to append elif provider == "anthropic": blocks directly inside caption_image. This was rejected because vendor-specific error handling and dependencies will clutter the core pipeline file.

Function-mapping dict: Mapping string keys to simple standalone helper functions within the same file. While cleaner than nested conditions, it still leaves the file bloated with vendor-specific configuration code.
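
For comparison, a minimal sketch of the function-mapping alternative. All names here (_caption_with_openai, _caption_with_anthropic, _CAPTIONERS, caption_image_via_dict) are hypothetical, and the helper bodies are placeholders that return fixed strings instead of calling any vendor SDK.

```python
from typing import Callable, Dict


def _caption_with_openai(image_bytes: bytes) -> str:
    # Placeholder: real client setup, payload mapping, and error handling
    # for this vendor would all live here, in the same file.
    return "openai caption placeholder"


def _caption_with_anthropic(image_bytes: bytes) -> str:
    # Placeholder: a second vendor adds another block of setup code here.
    return "anthropic caption placeholder"


# String keys mapped to standalone helper functions.
_CAPTIONERS: Dict[str, Callable[[bytes], str]] = {
    "openai": _caption_with_openai,
    "anthropic": _caption_with_anthropic,
}


def caption_image_via_dict(image_bytes: bytes, provider: str = "openai") -> str:
    """Dispatch to a helper function by provider name."""
    try:
        captioner = _CAPTIONERS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider {provider!r}") from None
    return captioner(image_bytes)
```

The dispatch itself is tidy, but each helper still carries its vendor's configuration and error handling in the shared file, which is the bloat described above.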

Additional Context

GSSoC '26

  • Yes, I am participating in GirlScript Summer of Code and would like to build this.

Metadata

Assignees

Labels

enhancement (New feature or improvement), gssoc (GirlScript Summer of Code 2026 issue/PR)
