Commit b48223f

Merge branch 'lightspeed-core:main' into restart_prow_e2e_pod
2 parents 67b55d1 + 051c06c commit b48223f

75 files changed

Lines changed: 4048 additions & 2181 deletions

README.md

Lines changed: 84 additions & 26 deletions
@@ -73,6 +73,7 @@ The service includes comprehensive user data collection capabilities for various
 * [OpenAPI specification](#openapi-specification)
 * [Readiness Endpoint](#readiness-endpoint)
 * [Liveness Endpoint](#liveness-endpoint)
+* [Models endpoint](#models-endpoint)
 * [Database structure](#database-structure)
 * [Publish the service as Python package on PyPI](#publish-the-service-as-python-package-on-pypi)
 * [Generate distribution archives to be uploaded into Python registry](#generate-distribution-archives-to-be-uploaded-into-python-registry)
@@ -129,14 +130,14 @@ Lightspeed Core Stack is based on the FastAPI framework (Uvicorn). The service i

 Lightspeed Stack supports multiple LLM providers.

-| Provider | Setup Documentation |
-|----------------|-----------------------------------------------------------------------|
-| OpenAI | https://platform.openai.com |
-| Azure OpenAI | https://azure.microsoft.com/en-us/products/ai-services/openai-service |
-| Google VertexAI| https://cloud.google.com/vertex-ai |
-| IBM WatsonX | https://www.ibm.com/products/watsonx |
-| RHOAI (vLLM) | See tests/e2e-prow/rhoai/configs/run.yaml |
-| RHEL AI (vLLM) | See tests/e2e/configs/run-rhelai.yaml |
+| Provider | Setup Documentation |
+|-----------------|-----------------------------------------------------------------------|
+| OpenAI | https://platform.openai.com |
+| Azure OpenAI | https://azure.microsoft.com/en-us/products/ai-services/openai-service |
+| Google VertexAI | https://cloud.google.com/vertex-ai |
+| IBM WatsonX | https://www.ibm.com/products/watsonx |
+| RHOAI (vLLM) | See tests/e2e-prow/rhoai/configs/run.yaml |
+| RHEL AI (vLLM) | See tests/e2e/configs/run-rhelai.yaml |

 See `docs/providers.md` for configuration details.

@@ -199,17 +200,17 @@ To quickly get hands on LCS, we can run it using the default configurations prov
 Lightspeed Core Stack (LCS) provides support for Large Language Model providers. The models listed in the table below represent specific examples that have been tested within LCS.
 __Note__: Support for individual models is dependent on the specific inference provider's implementation within the currently supported version of Llama Stack.

-| Provider | Model | Tool Calling | provider_type | Example |
-| -------- | ---------------------------------------------- | ------------ | -------------- | -------------------------------------------------------------------------- |
-| OpenAI | gpt-5, gpt-4o, gpt4-turbo, gpt-4.1, o1, o3, o4 | Yes | remote::openai | [1](examples/openai-faiss-run.yaml) [2](examples/openai-pgvector-run.yaml) |
-| OpenAI | gpt-3.5-turbo, gpt-4 | No | remote::openai | |
-| RHOAI (vLLM)| meta-llama/Llama-3.2-1B-Instruct | Yes | remote::vllm | [1](tests/e2e-prow/rhoai/configs/run.yaml) |
-| RHAIIS (vLLM)| meta-llama/Llama-3.1-8B-Instruct | Yes | remote::vllm | [1](tests/e2e/configs/run-rhaiis.yaml) |
-| RHEL AI (vLLM)| meta-llama/Llama-3.1-8B-Instruct | Yes | remote::vllm | [1](tests/e2e/configs/run-rhelai.yaml) |
-| Azure | gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o-mini, o3-mini, o4-mini, o1| Yes | remote::azure | [1](examples/azure-run.yaml) |
-| Azure | gpt-5-chat, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o1-mini | No or limited | remote::azure | |
-| VertexAI | google/gemini-2.0-flash, google/gemini-2.5-flash, google/gemini-2.5-pro [^1] | Yes | remote::vertexai | [1](examples/vertexai-run.yaml) |
-| WatsonX | meta-llama/llama-3-3-70b-instruct | Yes | remote::watsonx | [1](examples/watsonx-run.yaml) |
+| Provider | Model | Tool Calling | provider_type | Example |
+|----------------|------------------------------------------------------------------------------|---------------|------------------|----------------------------------------------------------------------------|
+| OpenAI | gpt-5, gpt-4o, gpt-4-turbo, gpt-4.1, o1, o3, o4 | Yes | remote::openai | [1](examples/openai-faiss-run.yaml) [2](examples/openai-pgvector-run.yaml) |
+| OpenAI | gpt-3.5-turbo, gpt-4 | No | remote::openai | |
+| RHOAI (vLLM) | meta-llama/Llama-3.2-1B-Instruct | Yes | remote::vllm | [1](tests/e2e-prow/rhoai/configs/run.yaml) |
+| RHAIIS (vLLM) | meta-llama/Llama-3.1-8B-Instruct | Yes | remote::vllm | [1](tests/e2e/configs/run-rhaiis.yaml) |
+| RHEL AI (vLLM) | meta-llama/Llama-3.1-8B-Instruct | Yes | remote::vllm | [1](tests/e2e/configs/run-rhelai.yaml) |
+| Azure | gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o-mini, o3-mini, o4-mini, o1 | Yes | remote::azure | [1](examples/azure-run.yaml) |
+| Azure | gpt-5-chat, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o1-mini | No or limited | remote::azure | |
+| VertexAI | google/gemini-2.0-flash, google/gemini-2.5-flash, google/gemini-2.5-pro [^1] | Yes | remote::vertexai | [1](examples/vertexai-run.yaml) |
+| WatsonX | meta-llama/llama-3-3-70b-instruct | Yes | remote::watsonx | [1](examples/watsonx-run.yaml) |

 [^1]: List of models is limited by design in llama-stack, future versions will probably allow to use more models (see [here](https://github.com/llamastack/llama-stack/blob/release-0.3.x/llama_stack/providers/remote/inference/vertexai/vertexai.py#L54))

@@ -491,12 +492,13 @@ mcp_servers:

 ##### Authentication Method Comparison

-| Method | Use Case | Configuration | Token Scope | Example |
-|--------|----------|---------------|-------------|---------|
-| **Static File** | Service tokens, API keys | File path in config | Global (all users) | `"/var/secrets/token"` |
-| **Kubernetes** | K8s service accounts | `"kubernetes"` keyword | Per-user (from auth) | `"kubernetes"` |
-| **Client** | User-specific tokens | `"client"` keyword + HTTP header | Per-request | `"client"` |
-| **OAuth** | OAuth-protected MCP servers | `"oauth"` keyword + HTTP header | Per-request (from OAuth flow) | `"oauth"` |
+| Method | Use Case | Configuration | Token Scope | Example |
+|-----------------|-----------------------------|----------------------------------|-------------------------------|------------------------|
+| **Static File** | Service tokens, API keys | File path in config | Global (all users) | `"/var/secrets/token"` |
+| **Kubernetes** | K8s service accounts | `"kubernetes"` keyword | Per-user (from auth) | `"kubernetes"` |
+| **Client** | User-specific tokens | `"client"` keyword + HTTP header | Per-request | `"client"` |
+| **OAuth** | OAuth-protected MCP servers | `"oauth"` keyword + HTTP header | Per-request (from OAuth flow) | `"oauth"` |
+

 ##### Important: Automatic Server Skipping

@@ -803,7 +805,7 @@ verify Run all linters
 distribution-archives Generate distribution archives to be uploaded into Python registry
 upload-distribution-archives Upload distribution archives into Python registry
 konflux-requirements generate hermetic requirements.*.txt file for konflux build
-konflux-rpm-lock generate rpm.lock.yaml file for konflux build
+konflux-rpm-lock generate rpm.lock.yaml file for konflux build
 ```

 ## Running Linux container image
@@ -1045,6 +1047,62 @@ The liveness endpoint performs a basic health check to verify the service is ali
 }
 ```

+## Models endpoint
+
+**Endpoint:** `GET /v1/models`
+
+Processes GET requests and returns a list of available models from the Llama
+Stack service. An optional "model_type" query parameter can be specified to
+filter the results. For example, if "model_type" is set to "llm", only LLM models
+are returned:
+
+```bash
+curl "http://localhost:8080/v1/models?model_type=llm"
+```
+
+The "model_type" query parameter is optional. When it is not specified, all
+models are returned.
+
+**Response Body:**
+```json
+{
+    "models": [
+        {
+            "identifier": "sentence-transformers/.llama",
+            "metadata": {
+                "embedding_dimension": 384
+            },
+            "api_model_type": "embedding",
+            "provider_id": "sentence-transformers",
+            "type": "model",
+            "provider_resource_id": ".llama",
+            "model_type": "embedding"
+        },
+        {
+            "identifier": "openai/gpt-4o-mini",
+            "metadata": {},
+            "api_model_type": "llm",
+            "provider_id": "openai",
+            "type": "model",
+            "provider_resource_id": "gpt-4o-mini",
+            "model_type": "llm"
+        },
+        {
+            "identifier": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
+            "metadata": {
+                "embedding_dimension": 768
+            },
+            "api_model_type": "embedding",
+            "provider_id": "sentence-transformers",
+            "type": "model",
+            "provider_resource_id": "nomic-ai/nomic-embed-text-v1.5",
+            "model_type": "embedding"
+        }
+    ]
+}
+```
+
+
 # Database structure

 Database structure is described on [this page](https://lightspeed-core.github.io/lightspeed-stack/DB/index.html)
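
The `GET /v1/models` endpoint documented in this diff can also be called from Python. The following is a minimal sketch and not part of the commit. The helper names (`build_models_url`, `model_identifiers`, `fetch_models`) are hypothetical, the base URL `http://localhost:8080` is taken from the curl example, and the code assumes the endpoint can be reached without authentication headers, which may not hold for a given deployment.

```python
import json
import urllib.parse
import urllib.request


def build_models_url(base_url, model_type=None):
    """Return the /v1/models URL, appending the optional model_type filter."""
    url = base_url.rstrip("/") + "/v1/models"
    if model_type is not None:
        url += "?" + urllib.parse.urlencode({"model_type": model_type})
    return url


def model_identifiers(payload, model_type=None):
    """List identifiers from a response body, optionally filtered client-side."""
    return [
        model["identifier"]
        for model in payload.get("models", [])
        if model_type is None or model.get("model_type") == model_type
    ]


def fetch_models(base_url="http://localhost:8080", model_type=None, timeout=10):
    """Call the endpoint and decode the JSON body (requires a running service)."""
    url = build_models_url(base_url, model_type)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Trimmed copy of the sample response from the README; no network call is made here.
    sample = {
        "models": [
            {"identifier": "sentence-transformers/.llama", "model_type": "embedding"},
            {"identifier": "openai/gpt-4o-mini", "model_type": "llm"},
        ]
    }
    print(build_models_url("http://localhost:8080", "llm"))
    print(model_identifiers(sample, "llm"))
```

With a service running, `fetch_models(model_type="llm")` would issue the same request as the curl example. Filtering on the server through `model_type` keeps the response small, while `model_identifiers` shows how the same filter could be applied to an unfiltered response.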
