Commit 7a4d6d1

Merge pull request #1178 from Anxhela21/anx/solr-filter
[LCORE-1331] Add Solr filter and update doc
2 parents 31eadea + c087345 commit 7a4d6d1

2 files changed

Lines changed: 43 additions & 1 deletion

docs/rag_guide.md

Lines changed: 29 additions & 0 deletions
@@ -282,6 +282,14 @@ providers:
 content_field: chunk
 embedding_dimension: 384
 embedding_model: ${env.EMBEDDING_MODEL_DIR}
+chunk_window_config:
+chunk_parent_id_field: "parent_id"
+chunk_content_field: "chunk_field"
+chunk_index_field: "chunk_index"
+chunk_token_count_field: "num_tokens"
+parent_total_chunks_field: "total_chunks"
+parent_total_tokens_field: "total_tokens"
+chunk_filter_query: "is_chunk:true"
 persistence:
 namespace: portal-rag
 backend: kv_default
@@ -294,6 +302,19 @@ registered_resources:
 embedding_dimension: 384
 ```
 
+Note: if the vector database (portal-rag) is not in the persistent data store within the vector_io provider
+(e.g. after deleting the llama stack cache) you will need to register the vector database under registered resources:
+
+
+```yaml
+vector_stores:
+- embedding_dimension: 384
+embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
+provider_id: solr-vector
+vector_store_id: portal-rag
+```
+
+
 **2. Configure Lightspeed Stack (`lightspeed-stack.yaml`):**
 
 ```yaml
@@ -324,6 +345,14 @@ Note: Solr does not currently work with RAG tools. You will need to specify "no_
 - **Offline mode**: Uses `parent_id` with Mimir base URL
 - **Online mode**: Uses `reference_url` from document metadata
 
+**Query Filtering:**
+
+To filter the Solr context edit the *chunk_filter_query* field in the
+Solr **vector_io** provider in the `run.yaml`. Filters should follow the key:value format:
+ex. `"product:*openshift*"`
+
+Note: This static filter is a temporary work-around.
+
 **Prerequisites:**
 
 - Solr must be running and accessible at the configured URL
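
The key:value format required by the new Query Filtering note can be checked before a value is written into `chunk_filter_query`. The following is a minimal sketch, not part of this commit. It assumes the provider forwards the filter to Solr as a standard `fq` parameter, which the diff does not confirm. The helper names `is_key_value_filter` and `select_query_string` are hypothetical, and the regular expression is deliberately narrower than full Solr filter syntax.

```python
import re
from urllib.parse import parse_qs, urlencode

# Deliberately narrow: one field name, a colon, then a non-empty value.
# Real Solr filter syntax is richer (ranges, boolean operators, local params).
FILTER_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*:\S.*$")


def is_key_value_filter(filter_query: str) -> bool:
    """Return True if the string looks like a simple field:value filter."""
    return FILTER_PATTERN.match(filter_query) is not None


def select_query_string(user_query: str, filter_query: str) -> str:
    """URL-encode a query plus one filter.

    Assumption: the filter travels as Solr's `fq` parameter.
    """
    if not is_key_value_filter(filter_query):
        raise ValueError(f"expected key:value filter, got {filter_query!r}")
    return urlencode({"q": user_query, "fq": filter_query})


# Values taken from the diff: the default filter and the documented example.
default_filter = "is_chunk:true"
product_filter = "product:*openshift*"

query_string = select_query_string("how do I scale pods", product_filter)
print(query_string)
```

Because the filter is static, a check like this would only need to run once, when `run.yaml` is edited, rather than on every request.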

run.yaml

Lines changed: 14 additions & 1 deletion
@@ -67,6 +67,14 @@ providers:
 content_field: chunk
 embedding_dimension: 384
 embedding_model: ${env.EMBEDDING_MODEL_DIR}
+chunk_window_config:
+chunk_parent_id_field: "parent_id"
+chunk_content_field: "chunk_field"
+chunk_index_field: "chunk_index"
+chunk_token_count_field: "num_tokens"
+parent_total_chunks_field: "total_chunks"
+parent_total_tokens_field: "total_tokens"
+chunk_filter_query: "is_chunk:true"
 persistence:
 namespace: portal-rag
 backend: kv_default
@@ -152,7 +160,11 @@ registered_resources:
 - shield_id: llama-guard
 provider_id: llama-guard
 provider_shield_id: openai/gpt-4o-mini
-vector_stores: []
+vector_stores:
+- embedding_dimension: 384
+embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
+provider_id: solr-vector
+vector_store_id: portal-rag
 datasets: []
 scoring_fns: []
 benchmarks: []
@@ -166,3 +178,4 @@ vector_stores:
 model_id: nomic-ai/nomic-embed-text-v1.5
 safety:
 default_shield_id: llama-guard
+
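
The newly registered vector store repeats two values that also appear elsewhere in the configuration: `embedding_dimension` and the provider id. The following is a minimal sketch of a consistency check over already-parsed configuration, not part of this commit. The function name `check_vector_store` and the dictionary layout are hypothetical and mirror only the fields visible in this diff. Loading `run.yaml` itself is left out, since that would need a YAML parser outside the standard library.

```python
def check_vector_store(provider_config: dict, store: dict, provider_id: str) -> list[str]:
    """Return human-readable mismatches between a vector_io provider's
    config and a registered vector store entry (empty list means consistent)."""
    problems = []
    if store.get("provider_id") != provider_id:
        problems.append(
            f"store points at provider {store.get('provider_id')!r}, expected {provider_id!r}"
        )
    if store.get("embedding_dimension") != provider_config.get("embedding_dimension"):
        problems.append(
            "embedding_dimension differs: "
            f"store={store.get('embedding_dimension')!r}, "
            f"provider={provider_config.get('embedding_dimension')!r}"
        )
    return problems


# Values copied from the diff; the dict shapes are an illustration only.
provider_config = {"content_field": "chunk", "embedding_dimension": 384}
store = {
    "embedding_dimension": 384,
    "embedding_model": "sentence-transformers/${env.EMBEDDING_MODEL_DIR}",
    "provider_id": "solr-vector",
    "vector_store_id": "portal-rag",
}

problems = check_vector_store(provider_config, store, "solr-vector")
print(problems or "vector store entry is consistent with the provider config")
```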
