
Commit c706062

Merge pull request #1180 from tisnik/lcore-1237-updated-documentation-for-models-endpoint
LCORE-1237: updated documentation for models endpoint
2 parents a263db1 + 463a0d6 commit c706062

4 files changed

Lines changed: 161 additions & 45 deletions

README.md

Lines changed: 54 additions & 0 deletions
@@ -1045,6 +1045,60 @@ The liveness endpoint performs a basic health check to verify the service is ali
     }
     ```

    +## Models endpoint
    +
    +**Endpoint:** `GET /v1/models`
    +
    +Process GET requests and returns a list of available models from the Llama
    +Stack service. It is possible to specify "model_type" query parameter that is
    +used as a filter. For example, if model type is set to "llm", only LLM models
    +will be returned:
    +
    +    curl http://localhost:8080/v1/models?model_type=llm
    +
    +The "model_type" query parameter is optional. When not specified, all models
    +will be returned.
    +
    +**Response Body:**
    +```json
    +{
    +    "models": [
    +        {
    +            "identifier": "sentence-transformers/.llama",
    +            "metadata": {
    +                "embedding_dimension": 384
    +            },
    +            "api_model_type": "embedding",
    +            "provider_id": "sentence-transformers",
    +            "type": "model",
    +            "provider_resource_id": ".llama",
    +            "model_type": "embedding"
    +        },
    +        {
    +            "identifier": "openai/gpt-4o-mini",
    +            "metadata": {},
    +            "api_model_type": "llm",
    +            "provider_id": "openai",
    +            "type": "model",
    +            "provider_resource_id": "gpt-4o-mini",
    +            "model_type": "llm"
    +        },
    +        {
    +            "identifier": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
    +            "metadata": {
    +                "embedding_dimension": 768
    +            },
    +            "api_model_type": "embedding",
    +            "provider_id": "sentence-transformers",
    +            "type": "model",
    +            "provider_resource_id": "nomic-ai/nomic-embed-text-v1.5",
    +            "model_type": "embedding"
    +        }
    +    ]
    +}
    +```
    +
    +
     # Database structure

     Database structure is described on [this page](https://lightspeed-core.github.io/lightspeed-stack/DB/index.html)
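As a companion to the README text added in this hunk, the following is a minimal Python sketch of a client that builds the request URL with and without the optional `model_type` filter. The helper names `models_url` and `fetch_models` are hypothetical and not part of the repository. The base URL is taken from the `curl` example. The sketch sends no credentials, although the response table in this commit lists 401 and 403, so a real deployment may require authentication headers.

```python
import json
from typing import Optional
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def models_url(base_url: str, model_type: Optional[str] = None) -> str:
    """Build the URL for GET /v1/models (hypothetical helper).

    The model_type query parameter is appended only when a value is given,
    mirroring the documented "optional filter" behaviour.
    """
    url = urljoin(base_url.rstrip("/") + "/", "v1/models")
    if model_type is not None:
        url += "?" + urlencode({"model_type": model_type})
    return url


def fetch_models(
    base_url: str, model_type: Optional[str] = None, timeout: float = 10.0
) -> list:
    """Call the endpoint and return the "models" list from the JSON body.

    Assumes an unauthenticated service; add headers if yours requires them.
    """
    with urlopen(models_url(base_url, model_type), timeout=timeout) as response:
        payload = json.load(response)
    return payload["models"]
```

Calling `models_url("http://localhost:8080", "llm")` produces the same URL as the `curl` example, and omitting the second argument yields the unfiltered form.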

docs/openapi.json

Lines changed: 49 additions & 2 deletions
@@ -245,7 +245,7 @@
       "models"
     ],
     "summary": "Models Endpoint Handler",
    -"description": "Handle requests to the /models endpoint.\n\nProcess GET requests to the /models endpoint, returning a list of available\nmodels from the Llama Stack service.\n\nParameters:\n request: The incoming HTTP request.\n auth: Authentication tuple from the auth dependency.\n model_type: Optional filter to return only models matching this type.\n\nRaises:\n HTTPException: If unable to connect to the Llama Stack server or if\n model retrieval fails for any reason.\n\nReturns:\n ModelsResponse: An object containing the list of available models.",
    +"description": "Handle requests to the /models endpoint.\n\nProcess GET requests to the /models endpoint, returning a list of available\nmodels from the Llama Stack service. It is possible to specify \"model_type\"\nquery parameter that is used as a filter. For example, if model type is set\nto \"llm\", only LLM models will be returned:\n\n curl http://localhost:8080/v1/models?model_type=llm\n\nThe \"model_type\" query parameter is optional. When not specified, all models\nwill be returned.\n\n## Parameters:\n request: The incoming HTTP request.\n auth: Authentication tuple from the auth dependency.\n model_type: Optional filter to return only models matching this type.\n\n## Raises:\n HTTPException: If unable to connect to the Llama Stack server or if\n model retrieval fails for any reason.\n\n## Returns:\n ModelsResponse: An object containing the list of available models.",
     "operationId": "models_endpoint_handler_v1_models_get",
     "parameters": [
       {
@@ -3763,6 +3763,26 @@
         }
       }
     },
    +"413": {
    +  "description": "Prompt is too long",
    +  "content": {
    +    "application/json": {
    +      "schema": {
    +        "$ref": "#/components/schemas/PromptTooLongResponse"
    +      },
    +      "examples": {
    +        "prompt too long": {
    +          "value": {
    +            "detail": {
    +              "cause": "The prompt exceeds the maximum allowed length.",
    +              "response": "Prompt is too long"
    +            }
    +          }
    +        }
    +      }
    +    }
    +  }
    +},
     "422": {
       "description": "Request validation failed",
       "content": {
@@ -7201,7 +7221,7 @@
       },
       "type": "object",
       "title": "Authorization headers",
    -  "description": "Headers to send to the MCP server. The map contains the header name and the path to a file containing the header value (secret). There are 2 special cases: 1. Usage of the kubernetes token in the header. To specify this use a string 'kubernetes' instead of the file path. 2. Usage of the client provided token in the header. To specify this use a string 'client' instead of the file path."
    +  "description": "Headers to send to the MCP server. The map contains the header name and the path to a file containing the header value (secret). There are 3 special cases: 1. Usage of the kubernetes token in the header. To specify this use a string 'kubernetes' instead of the file path. 2. Usage of the client-provided token in the header. To specify this use a string 'client' instead of the file path. 3. Usage of the oauth token in the header. To specify this use a string 'oauth' instead of the file path. "
     },
     "timeout": {
       "anyOf": [
@@ -7565,6 +7585,33 @@
       "title": "PostgreSQLDatabaseConfiguration",
       "description": "PostgreSQL database configuration.\n\nPostgreSQL database is used by Lightspeed Core Stack service for storing\ninformation about conversation IDs. It can also be leveraged to store\nconversation history and information about quota usage.\n\nUseful resources:\n\n- [Psycopg: connection classes](https://www.psycopg.org/psycopg3/docs/api/connections.html)\n- [PostgreSQL connection strings](https://www.connectionstrings.com/postgresql/)\n- [How to Use PostgreSQL in Python](https://www.freecodecamp.org/news/postgresql-in-python/)"
     },
    +"PromptTooLongResponse": {
    +  "properties": {
    +    "status_code": {
    +      "type": "integer",
    +      "title": "Status Code"
    +    },
    +    "detail": {
    +      "$ref": "#/components/schemas/DetailModel"
    +    }
    +  },
    +  "type": "object",
    +  "required": [
    +    "status_code",
    +    "detail"
    +  ],
    +  "title": "PromptTooLongResponse",
    +  "description": "413 Payload Too Large - Prompt is too long.",
    +  "examples": [
    +    {
    +      "detail": {
    +        "cause": "The prompt exceeds the maximum allowed length.",
    +        "response": "Prompt is too long"
    +      },
    +      "label": "prompt too long"
    +    }
    +  ]
    +},
     "ProviderHealthStatus": {
       "properties": {
         "provider_id": {
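To illustrate how a client might consume the new 413 response, here is a small Python sketch that recognises the error body shown in the `prompt too long` example. The names `ErrorDetail`, `parse_error_detail` and `is_prompt_too_long` are hypothetical. Although the schema marks `status_code` as required, the example payload in this diff carries only `detail`, so the sketch relies on the HTTP status plus the `detail` object and does not assume `status_code` is present in the body.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ErrorDetail:
    """The two fields shown under "detail" in the example payload."""

    response: str
    cause: str


def parse_error_detail(body: Any) -> Optional[ErrorDetail]:
    """Return the detail object if the body has the documented shape, else None."""
    if not isinstance(body, Mapping):
        return None
    detail = body.get("detail")
    if not isinstance(detail, Mapping):
        return None
    response = detail.get("response")
    cause = detail.get("cause")
    if not isinstance(response, str) or not isinstance(cause, str):
        return None
    return ErrorDetail(response=response, cause=cause)


def is_prompt_too_long(status_code: int, body: Any) -> bool:
    """True when the HTTP status is 413 and the body parses as an error detail."""
    return status_code == 413 and parse_error_detail(body) is not None
```

A caller could use `is_prompt_too_long` to decide whether to shorten the prompt and retry, rather than treating the failure as a generic client error.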

docs/openapi.md

Lines changed: 47 additions & 39 deletions
@@ -247,18 +247,25 @@ Examples
     Handle requests to the /models endpoint.

     Process GET requests to the /models endpoint, returning a list of available
    -models from the Llama Stack service.
    +models from the Llama Stack service. It is possible to specify "model_type"
    +query parameter that is used as a filter. For example, if model type is set
    +to "llm", only LLM models will be returned:

    -Parameters:
    +    curl http://localhost:8080/v1/models?model_type=llm
    +
    +The "model_type" query parameter is optional. When not specified, all models
    +will be returned.
    +
    +## Parameters:
         request: The incoming HTTP request.
         auth: Authentication tuple from the auth dependency.
         model_type: Optional filter to return only models matching this type.

    -Raises:
    +## Raises:
         HTTPException: If unable to connect to the Llama Stack server or if
             model retrieval fails for any reason.

    -Returns:
    +## Returns:
         ModelsResponse: An object containing the list of available models.


@@ -275,14 +282,14 @@ Returns:
     | Status Code | Description | Component |
     |-------------|-------------|-----------|
     | 200 | Successful response | [ModelsResponse](#modelsresponse) |
    -| 401 | Unauthorized | [UnauthorizedResponse](#unauthorizedresponse)
    +| 401 | Unauthorized | [UnauthorizedResponse](#unauthorizedresponse) |
    +| 403 | Permission denied | [ForbiddenResponse](#forbiddenresponse) |
    +| 500 | Internal server error | [InternalServerErrorResponse](#internalservererrorresponse) |
    +| 503 | Service unavailable | [ServiceUnavailableResponse](#serviceunavailableresponse) |
    +| 422 | Validation Error | [HTTPValidationError](#httpvalidationerror) |

     Examples

    -
    -
    -
    -
     ```json
     {
         "detail": {
@@ -292,9 +299,6 @@ Examples
     }
     ```

    -
    -
    -
     ```json
     {
         "detail": {
@@ -303,14 +307,6 @@ Examples
         }
     }
     ```
    -|
    -| 403 | Permission denied | [ForbiddenResponse](#forbiddenresponse)
    -
    -Examples
    -
    -
    -
    -

     ```json
     {
@@ -320,14 +316,6 @@ Examples
         }
     }
     ```
    -|
    -| 500 | Internal server error | [InternalServerErrorResponse](#internalservererrorresponse)
    -
    -Examples
    -
    -
    -
    -

     ```json
     {
@@ -337,14 +325,6 @@ Examples
         }
     }
     ```
    -|
    -| 503 | Service unavailable | [ServiceUnavailableResponse](#serviceunavailableresponse)
    -
    -Examples
    -
    -
    -
    -

     ```json
     {
@@ -354,8 +334,7 @@ Examples
         }
     }
     ```
    -|
    -| 422 | Validation Error | [HTTPValidationError](#httpvalidationerror) |
    +
     ## GET `/v1/tools`

     > **Tools Endpoint Handler**
@@ -3275,6 +3254,23 @@ Examples
             "response": "User does not have permission to access this endpoint"
         }
     }
    +```
    +|
    +| 413 | Prompt is too long | [PromptTooLongResponse](#prompttoolongresponse)
    +
    +Examples
    +
    +
    +
    +
    +
    +```json
    +{
    +    "detail": {
    +        "cause": "The prompt exceeds the maximum allowed length.",
    +        "response": "Prompt is too long"
    +    }
    +}
     ```
     |
     | 422 | Request validation failed | [UnprocessableEntityResponse](#unprocessableentityresponse)
@@ -4945,7 +4941,7 @@ Useful resources:
     | name | string | MCP server name that must be unique |
     | provider_id | string | MCP provider identification |
     | url | string | URL of the MCP server |
    -| authorization_headers | object | Headers to send to the MCP server. The map contains the header name and the path to a file containing the header value (secret). There are 2 special cases: 1. Usage of the kubernetes token in the header. To specify this use a string 'kubernetes' instead of the file path. 2. Usage of the client provided token in the header. To specify this use a string 'client' instead of the file path. |
    +| authorization_headers | object | Headers to send to the MCP server. The map contains the header name and the path to a file containing the header value (secret). There are 3 special cases: 1. Usage of the kubernetes token in the header. To specify this use a string 'kubernetes' instead of the file path. 2. Usage of the client-provided token in the header. To specify this use a string 'client' instead of the file path. 3. Usage of the oauth token in the header. To specify this use a string 'oauth' instead of the file path. |
     | timeout | | Timeout in seconds for requests to the MCP server. If not specified, the default timeout from Llama Stack will be used. Note: This field is reserved for future use when Llama Stack adds timeout support. |


@@ -5067,6 +5063,18 @@ Useful resources:
     | ca_cert_path | | Path to CA certificate |


    +## PromptTooLongResponse
    +
    +
    +413 Payload Too Large - Prompt is too long.
    +
    +
    +| Field | Type | Description |
    +|-------|------|-------------|
    +| status_code | integer | |
    +| detail | | |
    +
    +
     ## ProviderHealthStatus

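The updated `authorization_headers` description in this file now lists three special values (`kubernetes`, `client`, `oauth`) alongside the default of a file path. The Python sketch below shows one way to read that rule. It is only an illustration: `HeaderSource`, `classify_header_value` and `classify_headers` are hypothetical names, and exact, case-sensitive matching is an assumption that the description does not confirm.

```python
from enum import Enum
from typing import Dict


class HeaderSource(Enum):
    """Where a header value comes from, per the documented special cases."""

    KUBERNETES = "kubernetes"  # use the kubernetes token
    CLIENT = "client"          # use the client-provided token
    OAUTH = "oauth"            # use the oauth token
    FILE = "file"              # hypothetical label: value is a path to a secret file


# Assumption: the keywords are matched exactly as written in the description.
_SPECIAL_VALUES = {
    "kubernetes": HeaderSource.KUBERNETES,
    "client": HeaderSource.CLIENT,
    "oauth": HeaderSource.OAUTH,
}


def classify_header_value(value: str) -> HeaderSource:
    """Map one configured value to its source; anything else is a file path."""
    return _SPECIAL_VALUES.get(value, HeaderSource.FILE)


def classify_headers(headers: Dict[str, str]) -> Dict[str, HeaderSource]:
    """Classify every entry of an authorization_headers map."""
    return {name: classify_header_value(value) for name, value in headers.items()}
```

For example, a map with `"Authorization": "oauth"` and `"X-Api-Key": "/var/secrets/api-key"` (both made-up entries) would classify the first as the oauth token and the second as a file path.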

src/app/endpoints/models.py

Lines changed: 11 additions & 4 deletions
@@ -83,18 +83,25 @@ async def models_endpoint_handler(
         Handle requests to the /models endpoint.

         Process GET requests to the /models endpoint, returning a list of available
    -    models from the Llama Stack service.
    +    models from the Llama Stack service. It is possible to specify "model_type"
    +    query parameter that is used as a filter. For example, if model type is set
    +    to "llm", only LLM models will be returned:

    -    Parameters:
    +        curl http://localhost:8080/v1/models?model_type=llm
    +
    +    The "model_type" query parameter is optional. When not specified, all models
    +    will be returned.
    +
    +    ## Parameters:
             request: The incoming HTTP request.
             auth: Authentication tuple from the auth dependency.
             model_type: Optional filter to return only models matching this type.

    -    Raises:
    +    ## Raises:
             HTTPException: If unable to connect to the Llama Stack server or if
                 model retrieval fails for any reason.

    -    Returns:
    +    ## Returns:
             ModelsResponse: An object containing the list of available models.
         """
         # Used only by the middleware
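The handler body is outside this hunk, so the filtering logic itself is not shown. As a rough illustration of the semantics the docstring describes (an optional filter, with all models returned when it is absent), here is a standalone Python sketch. The function `filter_models` is hypothetical, and comparing against the `model_type` field rather than `api_model_type` is an assumption based only on the sample response in the README.

```python
from typing import Optional


def filter_models(models: list, model_type: Optional[str] = None) -> list:
    """Return models whose "model_type" equals the filter, or all if no filter.

    Hypothetical helper: the real handler may filter differently.
    """
    if model_type is None:
        return list(models)
    return [model for model in models if model.get("model_type") == model_type]


# Trimmed copies of the three entries from the README sample response.
SAMPLE_MODELS = [
    {"identifier": "sentence-transformers/.llama", "model_type": "embedding"},
    {"identifier": "openai/gpt-4o-mini", "model_type": "llm"},
    {
        "identifier": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
        "model_type": "embedding",
    },
]
```

With this data, filtering on `"llm"` keeps only `openai/gpt-4o-mini`, while passing no filter returns all three entries.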
