@@ -73,6 +73,7 @@ The service includes comprehensive user data collection capabilities for various
7373 * [OpenAPI specification](#openapi-specification)
7474 * [Readiness Endpoint](#readiness-endpoint)
7575 * [Liveness Endpoint](#liveness-endpoint)
76+ * [Models endpoint](#models-endpoint)
7677* [Database structure](#database-structure)
7778* [Publish the service as Python package on PyPI](#publish-the-service-as-python-package-on-pypi)
7879 * [Generate distribution archives to be uploaded into Python registry](#generate-distribution-archives-to-be-uploaded-into-python-registry)
@@ -129,14 +130,14 @@ Lightspeed Core Stack is based on the FastAPI framework (Uvicorn). The service i
129130
130131 Lightspeed Stack supports multiple LLM providers.
131132
132- | Provider | Setup Documentation |
133- | ----------------| -----------------------------------------------------------------------|
134- | OpenAI | https://platform.openai.com |
135- | Azure OpenAI | https://azure.microsoft.com/en-us/products/ai-services/openai-service |
136- | Google VertexAI| https://cloud.google.com/vertex-ai |
137- | IBM WatsonX | https://www.ibm.com/products/watsonx |
138- | RHOAI (vLLM) | See tests/e2e-prow/rhoai/configs/run.yaml |
139- | RHEL AI (vLLM) | See tests/e2e/configs/run-rhelai.yaml |
133+ | Provider | Setup Documentation |
134+ | ----------------- | -----------------------------------------------------------------------|
135+ | OpenAI | https://platform.openai.com |
136+ | Azure OpenAI | https://azure.microsoft.com/en-us/products/ai-services/openai-service |
137+ | Google VertexAI | https://cloud.google.com/vertex-ai |
138+ | IBM WatsonX | https://www.ibm.com/products/watsonx |
139+ | RHOAI (vLLM) | See tests/e2e-prow/rhoai/configs/run.yaml |
140+ | RHEL AI (vLLM) | See tests/e2e/configs/run-rhelai.yaml |
140141
141142 See `docs/providers.md` for configuration details.
142143
@@ -199,17 +200,17 @@ To quickly get hands on LCS, we can run it using the default configurations prov
199200Lightspeed Core Stack (LCS) provides support for Large Language Model providers. The models listed in the table below represent specific examples that have been tested within LCS.
200201__Note__: Support for individual models is dependent on the specific inference provider's implementation within the currently supported version of Llama Stack.
201202
202- | Provider | Model | Tool Calling | provider_type | Example |
203- | -------- | ---------------------------------------------- | ------------ | -------------- | -------------------------------------------------------------------------- |
204- | OpenAI | gpt-5, gpt-4o, gpt4- turbo, gpt-4.1, o1, o3, o4 | Yes | remote::openai | [1](examples/openai-faiss-run.yaml) [2](examples/openai-pgvector-run.yaml) |
205- | OpenAI | gpt-3.5-turbo, gpt-4 | No | remote::openai | |
206- | RHOAI (vLLM)| meta-llama/Llama-3.2-1B-Instruct | Yes | remote::vllm | [1](tests/e2e-prow/rhoai/configs/run.yaml) |
207- | RHAIIS (vLLM)| meta-llama/Llama-3.1-8B-Instruct | Yes | remote::vllm | [1](tests/e2e/configs/run-rhaiis.yaml) |
208- | RHEL AI (vLLM)| meta-llama/Llama-3.1-8B-Instruct | Yes | remote::vllm | [1](tests/e2e/configs/run-rhelai.yaml) |
209- | Azure | gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o-mini, o3-mini, o4-mini, o1| Yes | remote::azure | [1](examples/azure-run.yaml) |
210- | Azure | gpt-5-chat, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o1-mini | No or limited | remote::azure | |
211- | VertexAI | google/gemini-2.0-flash, google/gemini-2.5-flash, google/gemini-2.5-pro [^1] | Yes | remote::vertexai | [1](examples/vertexai-run.yaml) |
212- | WatsonX | meta-llama/llama-3-3-70b-instruct | Yes | remote::watsonx | [1](examples/watsonx-run.yaml) |
203+ | Provider | Model | Tool Calling | provider_type | Example |
204+ | ---------------- | ------------------------------------------------------------------------------ | --------------- | ------------------ | ---------------------------------------------------------------------------- |
205+ | OpenAI | gpt-5, gpt-4o, gpt-4-turbo, gpt-4.1, o1, o3, o4 | Yes | remote::openai | [1](examples/openai-faiss-run.yaml) [2](examples/openai-pgvector-run.yaml) |
206+ | OpenAI | gpt-3.5-turbo, gpt-4 | No | remote::openai | |
207+ | RHOAI (vLLM) | meta-llama/Llama-3.2-1B-Instruct | Yes | remote::vllm | [1](tests/e2e-prow/rhoai/configs/run.yaml) |
208+ | RHAIIS (vLLM) | meta-llama/Llama-3.1-8B-Instruct | Yes | remote::vllm | [1](tests/e2e/configs/run-rhaiis.yaml) |
209+ | RHEL AI (vLLM) | meta-llama/Llama-3.1-8B-Instruct | Yes | remote::vllm | [1](tests/e2e/configs/run-rhelai.yaml) |
210+ | Azure | gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o-mini, o3-mini, o4-mini, o1 | Yes | remote::azure | [1](examples/azure-run.yaml) |
211+ | Azure | gpt-5-chat, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o1-mini | No or limited | remote::azure | |
212+ | VertexAI | google/gemini-2.0-flash, google/gemini-2.5-flash, google/gemini-2.5-pro [^1] | Yes | remote::vertexai | [1](examples/vertexai-run.yaml) |
213+ | WatsonX | meta-llama/llama-3-3-70b-instruct | Yes | remote::watsonx | [1](examples/watsonx-run.yaml) |
213214
214215[^1]: The list of models is limited by design in llama-stack; future versions will probably allow more models to be used (see [here](https://github.com/llamastack/llama-stack/blob/release-0.3.x/llama_stack/providers/remote/inference/vertexai/vertexai.py#L54))
215216
@@ -491,12 +492,13 @@ mcp_servers:
491492
492493##### Authentication Method Comparison
493494
494- | Method | Use Case | Configuration | Token Scope | Example |
495- | --------| ----------| ---------------| -------------| ---------|
496- | **Static File** | Service tokens, API keys | File path in config | Global (all users) | `"/var/secrets/token"` |
497- | **Kubernetes** | K8s service accounts | `"kubernetes"` keyword | Per-user (from auth) | `"kubernetes"` |
498- | **Client** | User-specific tokens | `"client"` keyword + HTTP header | Per-request | `"client"` |
499- | **OAuth** | OAuth-protected MCP servers | `"oauth"` keyword + HTTP header | Per-request (from OAuth flow) | `"oauth"` |
495+ | Method | Use Case | Configuration | Token Scope | Example |
496+ | -----------------| -----------------------------| ----------------------------------| -------------------------------| ------------------------|
497+ | **Static File** | Service tokens, API keys | File path in config | Global (all users) | `"/var/secrets/token"` |
498+ | **Kubernetes** | K8s service accounts | `"kubernetes"` keyword | Per-user (from auth) | `"kubernetes"` |
499+ | **Client** | User-specific tokens | `"client"` keyword + HTTP header | Per-request | `"client"` |
500+ | **OAuth** | OAuth-protected MCP servers | `"oauth"` keyword + HTTP header | Per-request (from OAuth flow) | `"oauth"` |
501+
500502
501503##### Important: Automatic Server Skipping
502504
@@ -803,7 +805,7 @@ verify Run all linters
803805distribution-archives Generate distribution archives to be uploaded into Python registry
804806upload-distribution-archives Upload distribution archives into Python registry
805807konflux-requirements generate hermetic requirements.*.txt file for konflux build
806- konflux-rpm-lock generate rpm.lock.yaml file for konflux build
808+ konflux-rpm-lock generate rpm.lock.yaml file for konflux build
807809` ` `
808810
809811## Running Linux container image
@@ -1045,6 +1047,62 @@ The liveness endpoint performs a basic health check to verify the service is ali
10451047}
10461048` ` `
10471049
1050+ ## Models endpoint
1051+
1052+ **Endpoint:** `GET /v1/models`
1053+
1054+ Handles GET requests and returns a list of available models from the Llama
1055+ Stack service. A "model_type" query parameter can be specified to filter the
1056+ results. For example, if the model type is set to "llm", only LLM models
1057+ are returned:
1058+
1059+ ```bash
1060+ curl http://localhost:8080/v1/models?model_type=llm
1061+ ```
1062+
1063+ The "model_type" query parameter is optional. When it is not specified, all
1064+ models are returned.
1065+
1066+ **Response Body:**
1067+ ```json
1068+ {
1069+     "models": [
1070+         {
1071+             "identifier": "sentence-transformers/.llama",
1072+             "metadata": {
1073+                 "embedding_dimension": 384
1074+             },
1075+             "api_model_type": "embedding",
1076+             "provider_id": "sentence-transformers",
1077+             "type": "model",
1078+             "provider_resource_id": ".llama",
1079+             "model_type": "embedding"
1080+         },
1081+         {
1082+             "identifier": "openai/gpt-4o-mini",
1083+             "metadata": {},
1084+             "api_model_type": "llm",
1085+             "provider_id": "openai",
1086+             "type": "model",
1087+             "provider_resource_id": "gpt-4o-mini",
1088+             "model_type": "llm"
1089+         },
1090+         {
1091+             "identifier": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
1092+             "metadata": {
1093+                 "embedding_dimension": 768
1094+             },
1095+             "api_model_type": "embedding",
1096+             "provider_id": "sentence-transformers",
1097+             "type": "model",
1098+             "provider_resource_id": "nomic-ai/nomic-embed-text-v1.5",
1099+             "model_type": "embedding"
1100+         }
1101+     ]
1102+ }
1103+ ```
1104+
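The request described in the Models endpoint section could also be issued from Python. The following is a minimal sketch using only the standard library, not part of the service itself. The helper names (`build_models_url`, `model_identifiers`, `fetch_models`) are hypothetical, and the sketch assumes the service is reachable without authentication and that the response has the shape shown in the Response Body example.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def build_models_url(base_url: str, model_type: Optional[str] = None) -> str:
    """Return the /v1/models URL, adding the optional model_type filter."""
    url = base_url.rstrip("/") + "/v1/models"
    if model_type is not None:
        # urlencode escapes the value so it is safe to place in the query string
        url += "?" + urlencode({"model_type": model_type})
    return url


def model_identifiers(payload: dict) -> list:
    """Collect the "identifier" field of every entry under "models"."""
    return [model["identifier"] for model in payload.get("models", [])]


def fetch_models(base_url: str, model_type: Optional[str] = None) -> dict:
    """Send the GET request and decode the JSON response body.

    Assumption: no authentication header is required by the deployment.
    """
    with urlopen(build_models_url(base_url, model_type), timeout=10) as response:
        return json.load(response)
```

With a service listening on `localhost:8080`, as in the curl example, `model_identifiers(fetch_models("http://localhost:8080", "llm"))` would list only the identifiers of LLM models. Omitting the second argument requests all models.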
1105+
10481106# Database structure
10491107
10501108Database structure is described on [this page](https://lightspeed-core.github.io/lightspeed-stack/DB/index.html)