
Commit fa8e7e9

LCORE-86: Prioritize BYOK content over built-in content (#1208)
* Add chunk prioritization and inline RAG support
  - Add configurable RAG strategies: inline RAG, performed on every query (OKP + BYOK), and tool RAG; the two can be used independently or together
  - BYOK and OKP stores can be listed for each strategy
  - Add the placeholder ID "okp" for the OKP store
  - Add chunk prioritization with per-vector-store score multipliers for inline RAG
  - Add config knobs to select the RAG strategy
  - Keep tool RAG enabled by default for backward compatibility
  - Update the lightspeed-stack configuration enrichment script to build the OKP section in the Llama Stack config, and fix bugs in building the vector stores
  - Update BYOK and RAG documentation
  - Update unit tests
1 parent e00ec21 commit fa8e7e9

28 files changed

Lines changed: 2362 additions & 495 deletions

docs/byok_guide.md

Lines changed: 94 additions & 17 deletions
```diff
@@ -16,7 +16,7 @@ The BYOK (Bring Your Own Knowledge) feature in Lightspeed Core enables users to
 * [Step 2: Create Vector Database](#step-2-create-vector-database)
 * [Step 3: Configure Embedding Model](#step-3-configure-embedding-model)
 * [Step 4: Configure Llama Stack](#step-4-configure-llama-stack)
-* [Step 5: Enable RAG Tools](#step-5-enable-rag-tools)
+* [Step 5: Configure RAG Strategy](#step-5-configure-rag-strategy)
 * [Supported Vector Database Types](#supported-vector-database-types)
 * [Configuration Examples](#configuration-examples)
 * [Conclusion](#conclusion)
```
````diff
@@ -34,27 +34,58 @@ BYOK (Bring Your Own Knowledge) is Lightspeed Core's implementation of Retrieval
 
 ## How BYOK Works
 
-The BYOK system operates through a sophisticated chain of components:
+BYOK knowledge sources can be queried in two complementary modes, configured independently:
 
-1. **Agent Orchestrator**: The AI agent acts as the central coordinator, using the LLM as its reasoning engine
-2. **RAG Tool**: When the agent needs external information, it queries your custom vector database
-3. **Vector Database**: Your indexed knowledge sources, stored as vector embeddings for semantic search
-4. **Embedding Model**: Converts queries and documents into vector representations for similarity matching
-5. **Context Integration**: Retrieved knowledge is integrated into the AI's response generation process
+### Inline RAG
+
+Context is fetched from your BYOK vector stores and/or OKP and injected before the LLM request. No tool calls are required.
+
+```mermaid
+graph TD
+A[User Query] --> B[Fetch Context]
+B --> C[BYOK Vector Stores]
+B --> D[OKP Vector Stores]
+C --> E[Retrieved Chunks]
+D --> E
+E --> F[Inject Context into Prompt Context]
+F --> G[LLM Generates Response]
+G --> H[Response to User]
+```
+
+### Tool RAG (on-demand retrieval)
+
+The LLM can call the `file_search` tool during generation when it decides external knowledge is needed. Both BYOK vector stores and OKP are supported in Tool RAG mode.
 
 ```mermaid
 graph TD
-A[User Query] --> B[AI Agent]
+A[User Query] --> P{Inline RAG enabled?}
+P -->|Yes| Q[Fetch Context]
+Q --> R[BYOK / OKP Vector Stores]
+R --> S[Inject Context into Prompt Context]
+S --> B[LLM]
+P -->|No| B
 B --> C{Need External Knowledge?}
-C -->|Yes| D[RAG Tool]
+C -->|Yes| D[file_search Tool]
 C -->|No| E[Generate Response]
-D --> F[Vector Database]
+D --> F[BYOK / OKP Vector Stores]
 F --> G[Retrieve Relevant Context]
-G --> H[Integrate Context]
-H --> E
-E --> I[Response to User]
+G --> B
+E --> H[Response to User]
 ```
 
+Both modes rely on:
+- **Vector Database**: Your indexed knowledge sources stored as vector embeddings
+- **Embedding Model**: Converts queries and documents into vector representations for similarity matching
+
+Inline RAG additionally supports:
+- **Score Multiplier**: Optional weight applied per BYOK vector store when mixing multiple sources. Allows custom prioritization of content.
+
+> [!NOTE]
+> OKP and BYOK scores are not directly comparable (different scoring systems), so
+> `score_multiplier` does not apply to OKP results. To control the amount of retrieved
+> context, set the `BYOK_RAG_MAX_CHUNKS` and `OKP_RAG_MAX_CHUNKS` constants in `src/constants.py`
+> (defaults: 10 and 5 respectively). For Tool RAG, use `TOOL_RAG_MAX_CHUNKS` (default: 10).
+
 ---
 
 ## Prerequisites
````
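The Inline RAG rules in the hunk above (a per-store `score_multiplier` on BYOK results, no multiplier on OKP results, and separate `BYOK_RAG_MAX_CHUNKS` / `OKP_RAG_MAX_CHUNKS` caps) can be illustrated with a small sketch. It is not the code from this commit: the `Chunk` type and `prioritize_chunks` function are hypothetical, and placing BYOK chunks ahead of OKP chunks is an assumption taken from the commit title rather than from the documentation.

```python
from dataclasses import dataclass

# Defaults documented above; Lightspeed Core keeps the real values in
# src/constants.py. Redefined here only so this sketch runs on its own.
BYOK_RAG_MAX_CHUNKS = 10
OKP_RAG_MAX_CHUNKS = 5


@dataclass
class Chunk:
    """Hypothetical retrieved chunk: source ID ("okp" or a BYOK rag_id), text, score."""
    source_id: str
    text: str
    score: float


def prioritize_chunks(
    byok_results: list[Chunk],
    okp_results: list[Chunk],
    score_multipliers: dict[str, float],
) -> list[Chunk]:
    """Merge Inline RAG results from BYOK stores and OKP (illustrative only)."""
    # BYOK scores share one scoring system, so weight each chunk by its store's
    # score_multiplier (default 1.0) and rank all BYOK chunks together.
    weighted = [
        Chunk(c.source_id, c.text, c.score * score_multipliers.get(c.source_id, 1.0))
        for c in byok_results
    ]
    top_byok = sorted(weighted, key=lambda c: c.score, reverse=True)[:BYOK_RAG_MAX_CHUNKS]

    # OKP scores are not comparable with BYOK scores: rank by raw score, no multiplier.
    top_okp = sorted(okp_results, key=lambda c: c.score, reverse=True)[:OKP_RAG_MAX_CHUNKS]

    # Assumption: BYOK content goes first, following the commit title.
    return top_byok + top_okp


if __name__ == "__main__":
    byok = [Chunk("my-docs", "A", 0.50), Chunk("team-wiki", "B", 0.60)]
    okp = [Chunk("okp", "C", 12.3)]
    merged = prioritize_chunks(byok, okp, {"my-docs": 1.5})
    print([c.text for c in merged])  # ['A', 'B', 'C']
```

With a multiplier of 1.5, the hypothetical `my-docs` chunk (0.50, weighted to 0.75) outranks the `team-wiki` chunk (0.60) despite its lower raw score, while the OKP chunk keeps its own scale and is only capped, never reweighted.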
````diff
@@ -244,12 +275,58 @@ registered_resources:
 
 **⚠️ Important**: The `vector_store_id` value must exactly match the ID you provided when creating the vector database using the rag-content tool. This identifier links your Llama Stack configuration to the specific vector database index you created.
 
-### Step 5: Enable RAG Tools
+> [!TIP]
+> Instead of manually editing `run.yaml`, you can declare your knowledge sources in the `byok_rag`
+> section of `lightspeed-stack.yaml`. The lightspeed-stack service automatically generates the required configuration
+> at startup.
+>
+> ```yaml
+> byok_rag:
+>   - rag_id: my-docs # Unique identifier for this knowledge source
+>     rag_type: inline::faiss
+>     embedding_model: sentence-transformers/all-mpnet-base-v2
+>     embedding_dimension: 768
+>     vector_db_id: your-index-id # Llama Stack vector store ID (from index generation)
+>     db_path: /path/to/vector_db/faiss_store.db
+>     score_multiplier: 1.0 # Optional: weight results when mixing multiple sources
+> ```
+>
+> When multiple BYOK sources are configured, `score_multiplier` adjusts the relative importance of
+> each store's results during Inline RAG retrieval. Values above 1.0 boost a store; below 1.0 reduce it.
+
+### Step 5: Configure RAG Strategy
+
+Add a `rag` section to your `lightspeed-stack.yaml` to choose how BYOK knowledge is used.
+Each list entry is a `rag_id` from `byok_rag`, or the special value `okp` for OKP.
+
+```yaml
+rag:
+  # Inline RAG: inject context before the LLM request (no tool calls needed)
+  inline:
+    - my-docs # rag_id from byok_rag
+    - okp # include OKP context inline
+
+  # Tool RAG: the LLM can call file_search to retrieve context on demand
+  # Omit to use all registered BYOK stores (backward compatibility)
+  tool:
+    - my-docs # expose this BYOK store as the file_search tool
+    - okp # expose OKP as the file_search tool
+
+# OKP provider settings (only relevant when okp is listed above)
+okp:
+  offline: true # true = use parent_id for source URLs, false = use reference_url
+```
+
+Both modes can be enabled simultaneously. Choose based on your latency and control preferences:
 
-The configuration above automatically enables the RAG tools. The system will:
+| Mode | When context is fetched | Tool call needed | score_multiplier |
+|------|------------------------|------------------|-----------------|
+| Inline RAG | With every query | No | Yes (BYOK only) |
+| Tool RAG | On LLM demand | Yes | No |
 
-1. **Detect RAG availability**: Automatically identify when RAG is available
-2. **Enhance prompts**: Encourage the AI to use RAG tools
+> [!TIP]
+> A ready-to-use example combining BYOK and OKP is available at
+> [`examples/lightspeed-stack-byok-okp-rag.yaml`](../examples/lightspeed-stack-byok-okp-rag.yaml).
 
 ---
 
````
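A minimal sketch of how the `rag.inline` and `rag.tool` lists from Step 5 might be resolved to vector store IDs, assuming each `byok_rag` entry maps its `rag_id` to a `vector_db_id`. The helper name `resolve_store_ids` is hypothetical, returning the literal `okp` for OKP is a placeholder, and raising on unknown IDs is an assumed policy; only the `None` case (omitted `tool` means every registered BYOK store) follows documented behaviour.

```python
from typing import Optional

OKP_RAG_ID = "okp"  # the special ID documented for rag.inline / rag.tool


def resolve_store_ids(rag_ids: Optional[list[str]], byok_rag: list[dict]) -> list[str]:
    """Hypothetical helper: map configured RAG IDs to vector store IDs."""
    by_rag_id = {entry["rag_id"]: entry["vector_db_id"] for entry in byok_rag}

    # rag.tool omitted (None): fall back to every registered BYOK store.
    if rag_ids is None:
        return list(by_rag_id.values())

    resolved: list[str] = []
    for rag_id in rag_ids:
        if rag_id == OKP_RAG_ID:
            # OKP has no byok_rag entry; keep the special ID as a placeholder.
            resolved.append(OKP_RAG_ID)
        elif rag_id in by_rag_id:
            resolved.append(by_rag_id[rag_id])
        else:
            # Assumed policy: an ID missing from byok_rag is a configuration error.
            raise ValueError(f"unknown rag_id {rag_id!r}: no matching byok_rag entry")
    return resolved


if __name__ == "__main__":
    byok_rag = [{"rag_id": "my-docs", "vector_db_id": "your-index-id"}]
    rag = {"inline": ["my-docs", "okp"]}  # "tool" omitted
    print(resolve_store_ids(rag.get("inline", []), byok_rag))  # ['your-index-id', 'okp']
    print(resolve_store_ids(rag.get("tool"), byok_rag))  # ['your-index-id']
```

Keeping `None` distinct from an empty list is what preserves backward compatibility: `tool: []` would expose no stores, while omitting `tool` exposes them all.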
docs/config.md

Lines changed: 66 additions & 8 deletions
````diff
@@ -110,15 +110,32 @@ Microsoft Entra ID authentication attributes for Azure.
 
 BYOK (Bring Your Own Knowledge) RAG configuration.
 
+Each entry registers a local vector store. The `rag_id` is the
+identifier used in `rag.inline` and `rag.tool` to select which stores to use.
+
+Example:
+
+```yaml
+byok_rag:
+  - rag_id: my-docs # referenced in rag.inline / rag.tool
+    rag_type: inline::faiss
+    embedding_model: sentence-transformers/all-MiniLM-L6-v2
+    embedding_dimension: 384
+    vector_db_id: vs_abc123
+    db_path: /path/to/faiss_store.db
+    score_multiplier: 1.0
+```
+
 
 | Field | Type | Description |
 |-------|------|-------------|
 | rag_id | string | Unique RAG ID |
-| rag_type | string | Type of RAG database. |
+| rag_type | string | Type of RAG database (e.g. `inline::faiss`). |
 | embedding_model | string | Embedding model identification |
 | embedding_dimension | integer | Dimensionality of embedding vectors. |
 | vector_db_id | string | Vector database identification. |
 | db_path | string | Path to RAG database. |
+| score_multiplier | number | Multiplier applied to relevance scores from this vector store when querying multiple sources. Values > 1 boost results; values < 1 reduce them. Default: 1.0. |
 
 
 ## CORSConfiguration
````
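To make the `byok_rag` field table concrete, here is a stdlib-only sketch of one entry with the documented default (`score_multiplier: 1.0`) and the `exclusiveMinimum: 0.0` constraint that the OpenAPI schema in this commit declares. `ByokRagEntry` is a hypothetical stand-in, not the service's own configuration model; rejecting unknown keys mirrors `additionalProperties: false`, and field types are not enforced.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ByokRagEntry:
    """Hypothetical stdlib stand-in for one byok_rag entry."""
    rag_id: str
    rag_type: str
    embedding_model: str
    embedding_dimension: int
    vector_db_id: str
    db_path: str
    score_multiplier: float = 1.0  # documented default

    def __post_init__(self) -> None:
        # The OpenAPI schema declares exclusiveMinimum: 0.0 for this field.
        if not self.score_multiplier > 0:
            raise ValueError("score_multiplier must be greater than 0")

    @classmethod
    def from_dict(cls, data: dict) -> "ByokRagEntry":
        # Mirror additionalProperties: false by rejecting unknown keys;
        # missing required keys raise TypeError from the generated __init__.
        unknown = set(data) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown byok_rag fields: {sorted(unknown)}")
        return cls(**data)


if __name__ == "__main__":
    entry = ByokRagEntry.from_dict({
        "rag_id": "my-docs",
        "rag_type": "inline::faiss",
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
        "embedding_dimension": 384,
        "vector_db_id": "vs_abc123",
        "db_path": "/path/to/faiss_store.db",
    })
    print(entry.score_multiplier)  # 1.0
```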
```diff
@@ -170,7 +187,7 @@ Global service configuration.
 | azure_entra_id | | |
 | splunk | | Splunk HEC configuration for sending telemetry events. |
 | deployment_environment | string | Deployment environment name (e.g., 'development', 'staging', 'production'). Used in telemetry events. |
-| solr | | Configuration for Solr vector search operations. |
+| rag | | RAG strategy configuration (OKP and BYOK). Controls pre-query (Inline RAG) and tool-based (Tool RAG) retrieval. |
 
 
 ## ConversationHistoryConfiguration
```
````diff
@@ -520,19 +537,60 @@ the service can handle requests concurrently.
 | cors | | Cross-Origin Resource Sharing configuration for cross-domain requests |
 
 
-## SolrConfiguration
+## RagConfiguration
+
+
+Top-level RAG strategy configuration. Controls two complementary retrieval modes:
+
+- **Inline RAG**: context is fetched from the listed sources and injected before the
+LLM request.
+- **Tool RAG**: the LLM can call the `file_search` tool during generation to retrieve
+context on demand from the listed vector stores. Supports both BYOK and OKP.
+
+Each strategy is configured as a list of RAG IDs referencing entries in `byok_rag`.
+The special ID `okp` activates the OKP provider (no `byok_rag` entry needed).
+
+**Backward compatibility**: omitting `tool` uses all registered BYOK vector stores
+(equivalent to the old `tool.byok.enabled = True`). Omitting `inline` means no
+context is injected before the LLM request.
+
+Example:
+
+```yaml
+rag:
+  inline:
+    - my-docs # inject context from my-docs before the LLM request
+  tool:
+    - okp # LLM can search OKP as a tool
+    - my-docs # LLM can also search my-docs as a tool
+
+okp:
+  offline: true # use parent_id for OKP URL construction
+```
+
+
+| Field | Type | Description |
+|-------|------|-------------|
+| inline | list[string] | RAG IDs whose content is injected before the LLM request. Use `okp` for OKP. Empty by default (no inline RAG). |
+| tool | list[string] or null | RAG IDs exposed as a `file_search` tool the LLM can invoke. Use `okp` to include OKP. When omitted, all registered BYOK vector stores are used (backward compatibility). |
+
 
+## OkpConfiguration
 
-Solr configuration for vector search queries.
+OKP (Offline Knowledge Portal) provider settings. Only used when `okp` is listed in `rag.inline` or `rag.tool`.
 
-Controls whether to use offline or online mode when building document URLs
-from vector search results, and enables/disables Solr vector IO functionality.
+Example:
 
+```yaml
+okp:
+  offline: true # use parent_id for OKP URL construction
+  chunk_filter_query: "is_chunk:true"
+```
 
 | Field | Type | Description |
 |-------|------|-------------|
-| enabled | boolean | When True, enables Solr vector IO functionality for vector search queries. When False, disables Solr vector search processing. |
-| offline | boolean | When True, use parent_id for chunk source URLs. When False, use reference_url for chunk source URLs. |
+| offline | boolean | When `true` (default), use `parent_id` for OKP chunk source URLs. When `false`, use `reference_url`. |
+| chunk_filter_query | string | OKP filter query (`fq`) applied to every OKP search request. Defaults to `"is_chunk:true"`. Extend with `AND` for extra constraints. |
 
 
 ## SplunkConfiguration
````
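The two `okp` fields can be illustrated as follows. This sketch assumes OKP chunk metadata arrives as a dict with `parent_id` and `reference_url` keys; `OkpSettings`, `with_extra_constraint`, and `chunk_source_field` are hypothetical names. It only selects the value a source URL would be built from, since how that URL is assembled is not documented here, and the metadata values in the usage are illustrative.

```python
from dataclasses import dataclass

DEFAULT_CHUNK_FILTER_QUERY = "is_chunk:true"  # documented default


@dataclass(frozen=True)
class OkpSettings:
    """Hypothetical stand-in mirroring the documented okp fields and defaults."""
    offline: bool = True
    chunk_filter_query: str = DEFAULT_CHUNK_FILTER_QUERY


def with_extra_constraint(settings: OkpSettings, clause: str) -> OkpSettings:
    """Extend the filter query with an AND clause, as the field description suggests."""
    return OkpSettings(settings.offline, f"{settings.chunk_filter_query} AND {clause}")


def chunk_source_field(settings: OkpSettings, chunk_metadata: dict) -> str:
    """Return the metadata value an OKP chunk's source URL would be built from."""
    # offline=True -> parent_id; offline=False -> reference_url.
    key = "parent_id" if settings.offline else "reference_url"
    return chunk_metadata[key]


if __name__ == "__main__":
    okp = with_extra_constraint(OkpSettings(), "product:*openshift*")
    print(okp.chunk_filter_query)  # is_chunk:true AND product:*openshift*

    # Illustrative metadata; real OKP chunk metadata may be shaped differently.
    meta = {"parent_id": "doc-123", "reference_url": "https://example.com/doc-123"}
    print(chunk_source_field(okp, meta))  # doc-123
    print(chunk_source_field(OkpSettings(offline=False), meta))  # https://example.com/doc-123
```

Building on the default rather than replacing it keeps `is_chunk:true` in place, so extra clauses narrow the results without reintroducing non-chunk documents.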

docs/openapi.json

Lines changed: 67 additions & 31 deletions
```diff
@@ -5503,6 +5503,13 @@
 "format": "file-path",
 "title": "DB path",
 "description": "Path to RAG database."
+},
+"score_multiplier": {
+"type": "number",
+"exclusiveMinimum": 0.0,
+"title": "Score multiplier",
+"description": "Multiplier applied to relevance scores from this vector store. Used to weight results when querying multiple knowledge sources. Values > 1 boost this store's results; values < 1 reduce them.",
+"default": 1.0
 }
 },
 "additionalProperties": false,
```
```diff
@@ -5714,17 +5721,15 @@
 "description": "Deployment environment name (e.g., 'development', 'staging', 'production'). Used in telemetry events.",
 "default": "development"
 },
-"solr": {
-"anyOf": [
-{
-"$ref": "#/components/schemas/SolrConfiguration"
-},
-{
-"type": "null"
-}
-],
-"title": "Solr configuration",
-"description": "Configuration for Solr vector search operations."
+"rag": {
+"$ref": "#/components/schemas/RagConfiguration",
+"title": "RAG configuration",
+"description": "Configuration for all RAG strategies (inline and tool-based)."
+},
+"okp": {
+"$ref": "#/components/schemas/OkpConfiguration",
+"title": "OKP configuration",
+"description": "OKP provider settings. Only used when 'okp' is listed in rag.inline or rag.tool."
 }
 },
 "additionalProperties": false,
```
```diff
@@ -7575,6 +7580,26 @@
 "title": "OAuthFlows",
 "description": "Defines the configuration for the supported OAuth 2.0 flows."
 },
+"OkpConfiguration": {
+"properties": {
+"offline": {
+"type": "boolean",
+"title": "OKP offline mode",
+"description": "When True, use parent_id for OKP chunk source URLs. When False, use reference_url for chunk source URLs.",
+"default": true
+},
+"chunk_filter_query": {
+"type": "string",
+"title": "OKP chunk filter query",
+"description": "OKP filter query applied to every OKP search request. Defaults to 'is_chunk:true' to restrict results to chunk documents. To add extra constraints, extend the expression using boolean syntax, e.g. 'is_chunk:true AND product:*openshift*'.",
+"default": "is_chunk:true"
+}
+},
+"additionalProperties": false,
+"type": "object",
+"title": "OkpConfiguration",
+"description": "OKP (Offline Knowledge Portal) provider configuration.\n\nControls provider-specific behaviour for the OKP vector store.\nOnly relevant when ``\"okp\"`` is listed in ``rag.inline`` or ``rag.tool``."
+},
 "OpenIdConnectSecurityScheme": {
 "properties": {
 "description": {
```
```diff
@@ -8749,6 +8774,37 @@
 "title": "RHIdentityConfiguration",
 "description": "Red Hat Identity authentication configuration."
 },
+"RagConfiguration": {
+"properties": {
+"inline": {
+"items": {
+"type": "string"
+},
+"type": "array",
+"title": "Inline RAG IDs",
+"description": "RAG IDs whose sources are injected as context before the LLM call. Use 'okp' to enable OKP inline RAG. Empty by default (no inline RAG)."
+},
+"tool": {
+"anyOf": [
+{
+"items": {
+"type": "string"
+},
+"type": "array"
+},
+{
+"type": "null"
+}
+],
+"title": "Tool RAG IDs",
+"description": "RAG IDs made available to the LLM as a file_search tool. Use 'okp' to include the OKP vector store. When omitted, all registered BYOK vector stores are used (backward compatibility)."
+}
+},
+"additionalProperties": false,
+"type": "object",
+"title": "RagConfiguration",
+"description": "RAG strategy configuration.\n\nControls which RAG sources are used for inline and tool-based retrieval.\n\nEach strategy lists RAG IDs to include. The special ID ``\"okp\"`` defined in constants,\nactivates the OKP provider; all other IDs refer to entries in ``byok_rag``.\n\nBackward compatibility:\n - ``inline`` defaults to ``[]`` (no inline RAG).\n - ``tool`` defaults to ``None`` which means all registered vector stores\n are used (identical to the previous ``tool.byok.enabled = True`` default)."
+},
 "ReadinessResponse": {
 "properties": {
 "ready": {
```
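The `RagConfiguration` schema (both fields optional, `inline` a string array, `tool` a string array or `null`, no additional properties) can be checked with a short stdlib function like the one below. `parse_rag_config` is a hypothetical helper, not part of Lightspeed Core. Because of `additionalProperties: false`, it rejects an `okp` key nested under `rag`, which is consistent with `okp` being its own top-level property in the `Configuration` schema.

```python
from typing import Any, Optional

RAG_KEYS = {"inline", "tool"}  # the only properties RagConfiguration allows


def _is_str_list(value: Any) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def parse_rag_config(data: dict) -> tuple[list[str], Optional[list[str]]]:
    """Hypothetical check of a `rag` mapping against the RagConfiguration schema."""
    # additionalProperties: false -> reject anything other than inline/tool.
    extra = set(data) - RAG_KEYS
    if extra:
        raise ValueError(f"unexpected keys under rag: {sorted(extra)}")

    inline = data.get("inline", [])  # default [] means no inline RAG
    if not _is_str_list(inline):
        raise ValueError("rag.inline must be an array of strings")

    tool = data.get("tool")  # default None means all registered BYOK stores
    if tool is not None and not _is_str_list(tool):
        raise ValueError("rag.tool must be an array of strings or null")

    return inline, tool


if __name__ == "__main__":
    print(parse_rag_config({"inline": ["my-docs"]}))  # (['my-docs'], None)
    print(parse_rag_config({}))  # ([], None)
    try:
        parse_rag_config({"inline": [], "okp": {"offline": True}})
    except ValueError as err:
        print(err)  # unexpected keys under rag: ['okp']
```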
```diff
@@ -9260,26 +9316,6 @@
 }
 ]
 },
-"SolrConfiguration": {
-"properties": {
-"enabled": {
-"type": "boolean",
-"title": "Solr enabled",
-"description": "When True, enables Solr vector IO functionality for vector search queries. When False, disables Solr vector search processing.",
-"default": false
-},
-"offline": {
-"type": "boolean",
-"title": "Offline mode",
-"description": "When True, use parent_id for chunk source URLs. When False, use reference_url for chunk source URLs.",
-"default": true
-}
-},
-"additionalProperties": false,
-"type": "object",
-"title": "SolrConfiguration",
-"description": "Solr configuration for vector search queries.\n\nControls whether to use offline or online mode when building document URLs\nfrom vector search results, and enables/disables Solr vector IO functionality."
-},
 "SplunkConfiguration": {
 "properties": {
 "enabled": {
```
