Skip to content

Commit 6dbece1

Browse files
committed
OpenAPI documentation
1 parent 9742c74 commit 6dbece1

1 file changed

Lines changed: 216 additions & 3 deletions

File tree

docs/openapi.md

Lines changed: 216 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4969,14 +4969,16 @@ activates the OKP provider; all other IDs refer to entries in ``byok_rag``.
49694969

49704970
Backward compatibility:
49714971
- ``inline`` defaults to ``[]`` (no inline RAG).
4972-
- ``tool`` defaults to ``None`` which means all registered vector stores
4973-
are used (identical to the previous ``tool.byok.enabled = True`` default).
4972+
- ``tool`` defaults to ``[]`` (no tool RAG).
4973+
4974+
If no RAG strategy is defined (inline and tool are empty),
4975+
the RAG tool will register all stores available to llama-stack.
49744976

49754977

49764978
| Field | Type | Description |
49774979
|-------|------|-------------|
49784980
| inline | array | RAG IDs whose sources are injected as context before the LLM call. Use 'okp' to enable OKP inline RAG. Empty by default (no inline RAG). |
4979-
| tool | | RAG IDs made available to the LLM as a file_search tool. Use 'okp' to include the OKP vector store. When omitted, all registered BYOK vector stores are used (backward compatibility). |
4981+
| tool | array | RAG IDs made available to the LLM as a file_search tool. Use 'okp' to include the OKP vector store. When omitted, all registered BYOK vector stores are used (backward compatibility). |
49804982

49814983

49824984
## ReadinessResponse
@@ -5029,6 +5031,161 @@ Attributes:
50295031
| source | | Index name identifying the knowledge source from configuration |
50305032

50315033

5034+
## ResponseInput
5035+
5036+
5037+
5038+
5039+
5040+
## ResponseItem
5041+
5042+
5043+
5044+
5045+
5046+
## ResponsesRequest
5047+
5048+
5049+
Model representing a request for the Responses API following LCORE specification.
5050+
5051+
Attributes:
5052+
input: Input text or structured input items containing the query.
5053+
model: Model identifier in format "provider/model". Auto-selected if not provided.
5054+
conversation: Conversation ID linking to an existing conversation. Accepts both
5055+
OpenAI and LCORE formats. Mutually exclusive with previous_response_id.
5056+
include: Explicitly specify output item types that are excluded by default but
5057+
should be included in the response.
5058+
instructions: System instructions or guidelines provided to the model (acts as
5059+
the system prompt).
5060+
max_infer_iters: Maximum number of inference iterations the model can perform.
5061+
max_output_tokens: Maximum number of tokens allowed in the response.
5062+
max_tool_calls: Maximum number of tool calls allowed in a single response.
5063+
metadata: Custom metadata dictionary with key-value pairs for tracking or logging.
5064+
parallel_tool_calls: Whether the model can make multiple tool calls in parallel.
5065+
previous_response_id: Identifier of the previous response in a multi-turn
5066+
conversation. Mutually exclusive with conversation.
5067+
prompt: Prompt object containing a template with variables for dynamic
5068+
substitution.
5069+
reasoning: Reasoning configuration for the response.
5070+
safety_identifier: Safety identifier for the response.
5071+
store: Whether to store the response in conversation history. Defaults to True.
5072+
stream: Whether to stream the response as it is generated. Defaults to False.
5073+
temperature: Sampling temperature controlling randomness (typically 0.0–2.0).
5074+
text: Text response configuration specifying output format constraints (JSON
5075+
schema, JSON object, or plain text).
5076+
tool_choice: Tool selection strategy ("auto", "required", "none", or specific
5077+
tool configuration).
5078+
tools: List of tools available to the model (file search, web search, function
5079+
calls, MCP tools). Defaults to all tools available to the model.
5080+
generate_topic_summary: LCORE-specific flag indicating whether to generate a
5081+
topic summary for new conversations. Defaults to True.
5082+
shield_ids: LCORE-specific list of safety shield IDs to apply. If None, all
5083+
configured shields are used.
5084+
solr: LCORE-specific Solr vector_io provider query parameters (e.g. filter
5085+
queries). Optional.
5086+
5087+
5088+
| Field | Type | Description |
5089+
|-------|------|-------------|
5090+
| input | | |
5091+
| model | | |
5092+
| conversation | | |
5093+
| include | | |
5094+
| instructions | | |
5095+
| max_infer_iters | | |
5096+
| max_output_tokens | | |
5097+
| max_tool_calls | | |
5098+
| metadata | | |
5099+
| parallel_tool_calls | | |
5100+
| previous_response_id | | |
5101+
| prompt | | |
5102+
| reasoning | | |
5103+
| safety_identifier | | |
5104+
| store | boolean | |
5105+
| stream | boolean | |
5106+
| temperature | | |
5107+
| text | | |
5108+
| tool_choice | | |
5109+
| tools | | |
5110+
| generate_topic_summary | | |
5111+
| shield_ids | | |
5112+
| solr | | |
5113+
5114+
5115+
## ResponsesResponse
5116+
5117+
5118+
Model representing a response from the Responses API following LCORE specification.
5119+
5120+
Attributes:
5121+
created_at: Unix timestamp when the response was created.
5122+
completed_at: Unix timestamp when the response was completed, if applicable.
5123+
error: Error details if the response failed or was blocked.
5124+
id: Unique identifier for this response.
5125+
model: Model identifier in "provider/model" format used for generation.
5126+
object: Object type identifier, always "response".
5127+
output: List of structured output items containing messages, tool calls, and
5128+
other content. This is the primary response content.
5129+
parallel_tool_calls: Whether the model can make multiple tool calls in parallel.
5130+
previous_response_id: Identifier of the previous response in a multi-turn
5131+
conversation.
5132+
prompt: The input prompt object that was sent to the model.
5133+
status: Current status of the response (e.g., "completed", "blocked",
5134+
"in_progress").
5135+
temperature: Temperature parameter used for generation (controls randomness).
5136+
text: Text response configuration object used for OpenAI responses.
5137+
top_p: Top-p sampling parameter used for generation.
5138+
tools: List of tools available to the model during generation.
5139+
tool_choice: Tool selection strategy used (e.g., "auto", "required", "none").
5140+
truncation: Strategy used for handling content that exceeds context limits.
5141+
usage: Token usage statistics including input_tokens, output_tokens, and
5142+
total_tokens.
5143+
instructions: System instructions or guidelines provided to the model.
5144+
max_tool_calls: Maximum number of tool calls allowed in a single response.
5145+
reasoning: Reasoning configuration (effort level) used for the response.
5146+
max_output_tokens: Upper bound for tokens generated in the response.
5147+
safety_identifier: Safety/guardrail identifier applied to the request.
5148+
metadata: Additional metadata dictionary with custom key-value pairs.
5149+
store: Whether the response was stored.
5150+
conversation: Conversation ID linking this response to a conversation thread
5151+
(LCORE-specific).
5152+
available_quotas: Remaining token quotas for the user (LCORE-specific).
5153+
output_text: Aggregated text output from all output_text items in the
5154+
output array.
5155+
5156+
5157+
| Field | Type | Description |
5158+
|-------|------|-------------|
5159+
| created_at | integer | |
5160+
| completed_at | | |
5161+
| error | | |
5162+
| id | string | |
5163+
| model | string | |
5164+
| object | string | |
5165+
| output | array | |
5166+
| parallel_tool_calls | boolean | |
5167+
| previous_response_id | | |
5168+
| prompt | | |
5169+
| status | string | |
5170+
| temperature | | |
5171+
| text | | |
5172+
| top_p | | |
5173+
| tools | | |
5174+
| tool_choice | | |
5175+
| truncation | | |
5176+
| usage | | |
5177+
| instructions | | |
5178+
| max_tool_calls | | |
5179+
| reasoning | | |
5180+
| max_output_tokens | | |
5181+
| safety_identifier | | |
5182+
| metadata | | |
5183+
| store | | |
5184+
| conversation | | |
5185+
| available_quotas | object | |
5186+
| output_text | string | |
5187+
5188+
50325189
## RlsapiV1Attachment
50335190

50345191

@@ -5187,6 +5344,62 @@ SQLite database configuration.
51875344
| db_path | string | Path to file where SQLite database is stored |
51885345

51895346

5347+
## SearchRankingOptions
5348+
5349+
5350+
Options for ranking and filtering search results.
5351+
5352+
This class configures how search results are ranked and filtered. You can use algorithm-based
5353+
rerankers (weighted, RRF) or neural rerankers. Defaults from VectorStoresConfig are
5354+
used when parameters are not provided.
5355+
5356+
Examples:
5357+
# Weighted ranker with custom alpha
5358+
SearchRankingOptions(ranker="weighted", alpha=0.7)
5359+
5360+
# RRF ranker with custom impact factor
5361+
SearchRankingOptions(ranker="rrf", impact_factor=50.0)
5362+
5363+
# Use config defaults (just specify ranker type)
5364+
SearchRankingOptions(ranker="weighted") # Uses alpha from VectorStoresConfig
5365+
5366+
# Score threshold filtering
5367+
SearchRankingOptions(ranker="weighted", score_threshold=0.5)
5368+
5369+
:param ranker: (Optional) Name of the ranking algorithm to use. Supported values:
5370+
- "weighted": Weighted combination of vector and keyword scores
5371+
- "rrf": Reciprocal Rank Fusion algorithm
5372+
- "neural": Neural reranking model (requires model parameter, Part II)
5373+
Note: For OpenAI API compatibility, any string value is accepted, but only the above values are supported.
5374+
:param score_threshold: (Optional) Minimum relevance score threshold for results. Default: 0.0
5375+
:param alpha: (Optional) Weight factor for weighted ranker (0-1).
5376+
- 0.0 = keyword only
5377+
- 0.5 = equal weight (default)
5378+
- 1.0 = vector only
5379+
Only used when ranker="weighted" and weights is not provided.
5380+
Falls back to VectorStoresConfig.chunk_retrieval_params.weighted_search_alpha if not provided.
5381+
:param impact_factor: (Optional) Impact factor (k) for RRF algorithm.
5382+
Lower values emphasize higher-ranked results. Default: 60.0 (optimal from research).
5383+
Only used when ranker="rrf".
5384+
Falls back to VectorStoresConfig.chunk_retrieval_params.rrf_impact_factor if not provided.
5385+
:param weights: (Optional) Dictionary of weights for combining different signal types.
5386+
Keys can be "vector", "keyword", "neural". Values should sum to 1.0.
5387+
Used when combining algorithm-based reranking with neural reranking (Part II).
5388+
Example: {"vector": 0.3, "keyword": 0.3, "neural": 0.4}
5389+
:param model: (Optional) Model identifier for neural reranker (e.g., "vllm/Qwen3-Reranker-0.6B").
5390+
Required when ranker="neural" or when weights contains "neural" (Part II).
5391+
5392+
5393+
| Field | Type | Description |
5394+
|-------|------|-------------|
5395+
| ranker | | |
5396+
| score_threshold | | |
5397+
| alpha | | Weight factor for weighted ranker |
5398+
| impact_factor | | Impact factor for RRF algorithm |
5399+
| weights | | Weights for combining vector, keyword, and neural scores. Keys: 'vector', 'keyword', 'neural' |
5400+
| model | | Model identifier for neural reranker |
5401+
5402+
51905403
## SecurityScheme
51915404

51925405

0 commit comments

Comments
 (0)