The openai_compatible provider type has been successfully added to the Document-Analyzer-Operator Platform.
-
backend/app/models/llm_provider.py- Added
OPENAI_COMPATIBLE = "openai_compatible"to theProviderTypeenum - Updated enum documentation to clarify support for custom OpenAI-compatible APIs
- Added
-
backend/app/services/llm_client.py- Added
OpenAICompatibleProviderclass extendingBaseProvider - Implemented all required methods:
chat_completion()- Chat completions with streaming supporttext_completion()- Text completionsembeddings()- Embedding generationlist_models()- List available modelsestimate_cost()- Cost estimation with configurable pricing
- Added provider to
LLMClient.PROVIDER_CLASSESregistry - Proper error handling for authentication, rate limits, and server errors
- Added
-
frontend/src/types/index.ts- Added
'openai_compatible'toProviderTypetype union
- Added
-
frontend/src/app/dashboard/settings/llm-providers/create-dialog.tsx- Added "OpenAI-Compatible" option to provider type selector
- Added description: "Custom OpenAI-compatible APIs (LocalAI, FastChat, Together AI, Anyscale, etc.)"
- Made API key optional (displayed as optional for all compatible providers)
- Added comprehensive preset dropdown with 8 options:
- LocalAI
- FastChat
- Together AI
- Anyscale Endpoints
- Groq
- DeepInfra
- Lepton AI
- Custom
- Added contextual help text for OpenAI-compatible providers
backend/docs/LLM_PROVIDERS.md- Added comprehensive "OpenAI-Compatible Providers" section
- Listed 12+ popular OpenAI-compatible services (local and cloud)
- Provided configuration examples for each major service
- Added pricing comparison table
- Included setup guides for LocalAI, Together AI, Groq, and Anyscale
- Added troubleshooting section for OpenAI-compatible providers
| Preset | Base URL | API Key Required | Description |
|---|---|---|---|
| LocalAI | http://localhost:8080/v1 |
No | Self-hosted LocalAI instance |
| FastChat | http://localhost:8000/v1 |
No | FastChat local server |
| Preset | Base URL | API Key Required | Description |
|---|---|---|---|
| Together AI | https://api.together.xyz/v1 |
Yes | Cloud hosted open-source models |
| Anyscale Endpoints | https://api.endpoints.anyscale.com/v1 |
Yes | Ray-powered model serving |
| Groq | https://api.groq.com/openai/v1 |
Yes | Ultra-fast LPU inference |
| DeepInfra | https://api.deepinfra.com/v1 |
Yes | Serverless model inference |
| Lepton AI | https://<workspace>.lepton.run/api/v1 |
Yes | Lepton AI cloud platform |
| Custom | (user enters) | Optional | Custom URL entry |
{
"name": "LocalAI",
"provider_type": "openai_compatible",
"base_url": "http://localhost:8080/v1",
"model_name": "llama-2-7b",
"is_active": true,
"is_default": false,
"config": {
"temperature": 0.7,
"max_tokens": 4096
}
}{
"name": "Together AI",
"provider_type": "openai_compatible",
"base_url": "https://api.together.xyz/v1",
"api_key": "your-together-api-key",
"model_name": "togethercomputer/llama-2-70b-chat",
"is_active": true,
"is_default": false,
"config": {
"temperature": 0.7,
"max_tokens": 4096,
"pricing": {
"input": 0.0009,
"output": 0.0009
}
}
}{
"name": "Groq",
"provider_type": "openai_compatible",
"base_url": "https://api.groq.com/openai/v1",
"api_key": "your-groq-api-key",
"model_name": "llama3-70b-8192",
"is_active": true,
"is_default": false,
"config": {
"temperature": 0.7,
"max_tokens": 4096
}
}{
"name": "Anyscale Endpoints",
"provider_type": "openai_compatible",
"base_url": "https://api.endpoints.anyscale.com/v1",
"api_key": "your-anyscale-api-key",
"model_name": "meta-llama/Llama-2-7b-chat-hf",
"is_active": true,
"is_default": false,
"config": {
"temperature": 0.7,
"max_tokens": 4096
}
}For cloud services, you can add pricing information to the config to enable cost estimation:
{
"config": {
"pricing": {
"input": 0.0009,
"output": 0.0009
}
}
}Pricing is specified in USD per 1,000 tokens.
The default timeout is 60 seconds with 3 retry attempts. These can be configured in the LLMClient initialization if needed.
Model names are service-specific. Check your provider's documentation for available models:
- Together AI:
togethercomputer/llama-2-70b-chat,togethercomputer/CodeLlama-34b-Instruct, etc. - Anyscale:
meta-llama/Llama-2-7b-chat-hf,mistralai/Mixtral-8x7B-Instruct-v0.1, etc. - Groq:
llama3-70b-8192,mixtral-8x7b-32768,gemma-7b-it, etc. - LocalAI: Depends on models you've installed locally
- ✅ LocalAI
- ✅ FastChat
- ✅ vLLM API Server
- ✅ Text Generation Inference
- ✅ Together AI
- ✅ Anyscale Endpoints
- ✅ Groq
- ✅ DeepInfra
- ✅ Lepton AI
- ✅ Replicate (OpenAI-compatible mode)
To test the implementation:
-
Backend Validation
python -m py_compile backend/app/models/llm_provider.py python -m py_compile backend/app/services/llm_client.py
-
Frontend Build
cd frontend npm run build -
API Test
curl -X POST "http://localhost:8000/api/v1/llm-providers" \ -H "Authorization: Bearer YOUR_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "name": "Test LocalAI", "provider_type": "openai_compatible", "base_url": "http://localhost:8080/v1", "model_name": "llama-2-7b", "is_active": true }'
-
Database Migration: If using SQLAlchemy migrations, create a migration to add the new provider type to any database constraints.
-
Update API Documentation: The OpenAPI/Swagger docs at
/docswill automatically reflect the new provider type. -
User Documentation: Consider adding a blog post or announcement about the new provider support.
-
Testing: Test with actual services (LocalAI, Together AI, Groq, etc.) to verify end-to-end functionality.
✅ Cost-Effective: Access to models at lower prices than official APIs
✅ Model Variety: Hundreds of open-source models available
✅ Privacy: Self-hosted options keep data on-premises
✅ Performance: Specialized hardware options (Groq LPU)
✅ Flexibility: Easy to switch between providers
✅ No Vendor Lock-in: Standard API format across providers
Implementation Date: 2026-03-13
Status: ✅ Complete and Validated