
Commit 2a3c8c7

add docs for RAG Quickstart pattern
1 parent 7c04403 commit 2a3c8c7

24 files changed

Lines changed: 759 additions & 0 deletions
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
---
2+
title: RAG AI Quickstart
3+
date: 2026-05-13
4+
tier: sandbox
5+
summary: This pattern deploys the RAG AI Quickstart with test pipelines on CPU or GPU.
6+
rh_products:
7+
- Red Hat OpenShift Container Platform
8+
- Red Hat OpenShift GitOps
9+
- Red Hat OpenShift AI
10+
industries:
11+
- General
12+
aliases: /rag-quickstart/
13+
links:
14+
  github: https://github.com/validatedpatterns-sandbox/ai-quickstart-rag
15+
  install: getting-started
16+
  bugs: https://github.com/validatedpatterns-sandbox/ai-quickstart-rag/issues
17+
  feedback: https://docs.google.com/forms/d/e/1FAIpQLScI76b6tD1WyPu2-d_9CCVDr3Fu5jYERthqLKJDUGwqBg7Vcg/viewform
18+
---
19+
:toc:
20+
:imagesdir: /images
21+
:_content-type: ASSEMBLY
22+
include::modules/comm-attributes.adoc[]
23+
24+
include::modules/rag-quickstart-about.adoc[leveloffset=+1]
25+
26+
include::modules/rag-quickstart-architecture.adoc[leveloffset=+1]
27+
28+
[id="next-steps-rag-quickstart"]
29+
== Next steps
30+
31+
* link:rag-quickstart-getting-started[Install this pattern.]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
---
2+
title: Cluster sizing
3+
weight: 30
4+
aliases: /rag-quickstart/cluster-sizing/
5+
---
6+
7+
:toc:
8+
:imagesdir: /images
9+
:_content-type: ASSEMBLY
10+
include::modules/comm-attributes.adoc[]
11+
include::modules/ai-quickstart-rag/metadata-ai-quickstart-rag.adoc[]
12+
13+
include::modules/cluster-sizing-template.adoc[]
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
1+
---
2+
title: Customizing this pattern
3+
weight: 20
4+
aliases: /rag-quickstart/customizing/
5+
---
6+
7+
:toc:
8+
:imagesdir: /images
9+
:_content-type: ASSEMBLY
10+
include::modules/comm-attributes.adoc[]
11+
12+
[id="customizing-rag-quickstart"]
13+
== Customizing the RAG AI Quickstart pattern
14+
15+
By default, this pattern serves the LLM on CPU and does not require a GPU. CPU inference limits both the models you can run and the response speed, so you might want to use a GPU instead.
16+
17+
[id="enabling-gpu"]
18+
=== Enabling GPU support
19+
20+
To enable GPU support, set `global.device` to `gpu` in `values-global.yaml` and push your changes to GitHub. This adds the Node Feature Discovery (NFD) Operator and the NVIDIA GPU Operator to the pattern installation and enables the models to run on an NVIDIA accelerator.
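
As a minimal sketch, the corresponding fragment of `values-global.yaml` might look like the following. Only the `global.device` key and its `gpu` value come from the text above; any other keys already present under `global` in your file are assumed to stay unchanged.

[source,yaml]
----
global:
  # cpu is the behavior without changes; gpu adds NFD and the NVIDIA GPU Operator
  device: gpu
----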
21+
22+
[NOTE]
23+
====
24+
If you are running this pattern on an OpenShift cluster on AWS, setting `global.device` to `gpu` automatically creates a GPU (`g6.2xlarge`) machine and adds it to your cluster as a worker node.
25+
====
26+
27+
[id="changing-models"]
28+
=== Changing models
29+
30+
To update the models, edit `overrides/values-cpu.yaml` (if `global.device` is set to `cpu`) or `overrides/values-gpu.yaml` (if set to `gpu`).
31+
32+
The default CPU-based model is defined as follows:
33+
34+
[source,yaml]
35+
----
global:
  models:
    llama-3-2-3b-instruct-cpu:
      id: meta-llama/Llama-3.2-3B-Instruct
      enabled: true
      resources:
        limits:
          cpu: "6"
          memory: 48Gi
        requests:
          cpu: "2"
          memory: 24Gi
      args:
        - --enable-auto-tool-choice
        - --chat-template
        - /chat-templates/tool_chat_template_llama3.2_json.jinja
        - --tool-call-parser
        - llama3_json
        - --dtype
        - auto
        - --max-model-len
        - "16384"
        - --max-num-seqs
        - "1"
60+
----
61+
62+
You can replace this with any vLLM-compatible model, provided that the HuggingFace account that owns your API token has accepted the terms and conditions for that model. You can also adjust the resource parameters as needed for your environment.
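
As a rough sketch of swapping the model, the entry below keeps the structure of the default definition and replaces only the model-specific parts. The key `my-model`, the `<org>/<model-name>` ID, and the resource figures are placeholders for illustration, not tested values. The `--chat-template` and `--tool-call-parser` arguments in the default entry are specific to Llama 3.2, so they are omitted here; if you need tool calling, add the equivalents that vLLM documents for your model.

[source,yaml]
----
global:
  models:
    my-model:                    # hypothetical key; choose your own
      id: <org>/<model-name>     # placeholder HuggingFace model ID
      enabled: true
      resources:                 # illustrative values; size these for your model
        limits:
          cpu: "6"
          memory: 48Gi
        requests:
          cpu: "2"
          memory: 24Gi
      args:
        - --dtype
        - auto
        - --max-model-len
        - "16384"                # must not exceed the context length the model supports
----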
63+
64+
The runtime defaults to `vllm/vllm-openai:v0.11.1`. If you need a later version, you can override the image:
65+
66+
[source,yaml]
67+
----
llm-service:
  deviceConfigs:
    gpu:
      image: vllm/vllm-openai:nightly
72+
----
73+
74+
[NOTE]
75+
====
76+
The example above sets a GPU-specific container image. To override the CPU-based image instead, use the key `llm-service.deviceConfigs.cpu.image`.
77+
====
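
For comparison, a CPU image override might be sketched as follows. The image value is a placeholder, not a real image reference; substitute a vLLM image built for CPU inference.

[source,yaml]
----
llm-service:
  deviceConfigs:
    cpu:
      # Placeholder: replace with the CPU-capable vLLM image and tag you want to run
      image: <registry>/<cpu-image>:<tag>
----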
78+
79+
[id="multiple-models"]
80+
=== Defining multiple models
81+
82+
You can define multiple LLM models to be served simultaneously. For example:
83+
84+
[source,yaml]
85+
----
global:
  models:
    deepseek-r1:
      id: Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ
      enabled: true
      resources:
        limits:
          cpu: "32"
          memory: 200Gi
        requests:
          cpu: "24"
          memory: 150Gi
      args:
        - --reasoning-parser
        - deepseek_r1
        - --tool-call-parser
        - llama3_json
        - --enable-auto-tool-choice
        - --quantization
        - awq_marlin
        - --dtype
        - float16
        - --max-model-len
        - "65536"
    gpt-oss-120b:
      id: openai/gpt-oss-120b
      enabled: true
      resources:
        limits:
          cpu: "32"
          memory: 200Gi
        requests:
          cpu: "24"
          memory: 150Gi
      args:
        - --tool-call-parser
        - openai
        - --enable-auto-tool-choice
124+
----
125+
126+
For a complete list of customizable values, see the link:https://github.com/rh-ai-quickstart/ai-architecture-charts[AI Architecture charts] repository.
Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
1+
---
2+
title: Getting started
3+
weight: 10
4+
aliases: /rag-quickstart/getting-started/
5+
---
6+
7+
:toc:
8+
:imagesdir: /images
9+
:_content-type: ASSEMBLY
10+
include::modules/comm-attributes.adoc[]
11+
12+
[id="deploying-rag-quickstart-pattern"]
13+
== Deploying the RAG AI Quickstart pattern
14+
15+
.Prerequisites
16+
17+
* An OpenShift cluster (version 4.18 or later)
18+
** To create an OpenShift cluster, go to the https://console.redhat.com/[Red Hat Hybrid Cloud console].
19+
** Select *OpenShift \-> Red Hat OpenShift Container Platform \-> Create cluster*.
20+
* A https://huggingface.co/[HuggingFace] account with an API token that has read permissions.
21+
** You must accept the terms and conditions for the https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct[meta-llama/Llama-3.2-3B-Instruct] model with the account that the API token belongs to.
22+
* The Helm binary. For instructions, see link:https://helm.sh/docs/intro/install/[Installing Helm].
23+
* Additional installation tool dependencies. For details, see link:https://validatedpatterns.io/learn/quickstart/[Patterns quick start].
24+
25+
[id="preparing-for-deployment"]
26+
== Preparing for deployment
27+
.Procedure
28+
29+
. Fork the link:https://github.com/validatedpatterns-sandbox/ai-quickstart-rag[ai-quickstart-rag] repository on GitHub. You must fork the repository to customize this pattern.
30+
31+
. Clone the forked copy of this repository.
32+
+
33+
[source,terminal]
34+
----
35+
$ git clone git@github.com:your-username/ai-quickstart-rag.git
36+
----
37+
38+
. Go to the root directory of your Git repository:
39+
+
40+
[source,terminal]
41+
----
42+
$ cd ai-quickstart-rag
43+
----
44+
45+
. Run the following command to set the upstream repository:
46+
+
47+
[source,terminal]
48+
----
49+
$ git remote add -f upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-rag.git
50+
----
51+
52+
. Verify the setup of your remote repositories by running the following command:
53+
+
54+
[source,terminal]
55+
----
56+
$ git remote -v
57+
----
58+
+
59+
.Example output
60+
+
61+
[source,terminal]
62+
----
63+
origin git@github.com:your-username/ai-quickstart-rag.git (fetch)
64+
origin git@github.com:your-username/ai-quickstart-rag.git (push)
65+
upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-rag.git (fetch)
66+
upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-rag.git (push)
67+
----
68+
69+
. Make a local copy of the secrets template outside of your repository to hold credentials for the pattern.
70+
+
71+
[WARNING]
72+
====
73+
Do not add, commit, or push this file to your repository. Doing so may expose personal credentials to GitHub.
74+
====
75+
+
76+
Run the following command:
77+
+
78+
[source,terminal]
79+
----
80+
$ cp values-secret.yaml.template ~/values-secret-ai-quickstart-rag.yaml
81+
----
82+
83+
. Populate this file with the secrets (credentials) that are needed to deploy the pattern successfully:
84+
+
85+
[source,terminal]
86+
----
87+
$ vim ~/values-secret-ai-quickstart-rag.yaml
88+
----
89+
90+
.. Edit the `llm-service` section to use your HuggingFace API token:
91+
+
92+
[source,yaml]
93+
----
- name: llm-service
  fields:
  - name: hf_token
    value: hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
98+
----
99+
100+
. Optional: To customize the deployment, create and switch to a new branch by running the following command:
101+
+
102+
[source,terminal]
103+
----
104+
$ git checkout -b my-branch
105+
----
106+
+
107+
Make your changes, then stage and commit them:
108+
+
109+
[source,terminal]
110+
----
111+
$ git add <changed-files>
112+
$ git commit -m "Customize deployment"
113+
----
114+
+
115+
Push the changes to your forked repository:
116+
+
117+
[source,terminal]
118+
----
119+
$ git push origin my-branch
120+
----
121+
122+
[id="deploying-cluster-using-patternsh-file"]
123+
== Deploying the pattern by using the pattern.sh file
124+
125+
To deploy the pattern by using the `pattern.sh` file, complete the following steps:
126+
127+
. Log in to your cluster by following this procedure:
128+
129+
.. Obtain an API token by visiting link:https://oauth-openshift.apps.<your_cluster>.<domain>/oauth/token/request[https://oauth-openshift.apps.<your_cluster>.<domain>/oauth/token/request].
130+
131+
.. Log in to the cluster by running the following command:
132+
+
133+
[source,terminal]
134+
----
135+
$ oc login --token=<retrieved-token> --server=https://api.<your_cluster>.<domain>:6443
136+
----
137+
+
138+
Alternatively, if you have a kubeconfig file for the cluster, point `oc` at it by running the following command:
139+
+
140+
[source,terminal]
141+
----
142+
$ export KUBECONFIG=~/<path_to_kubeconfig>
143+
----
144+
145+
. Deploy the pattern to your cluster. Run the following command:
146+
+
147+
[source,terminal]
148+
----
149+
$ ./pattern.sh make install
150+
----
151+
152+
.Verification
153+
154+
To verify a successful installation, check the health of the ArgoCD applications:
155+
156+
. Run the following command:
157+
+
158+
[source,terminal]
159+
----
160+
$ ./pattern.sh make argo-healthcheck
161+
----
162+
+
163+
It might take several minutes for all applications to synchronize and reach a healthy state. This includes downloading the LLM models and populating the vector database.
164+
165+
. Verify that the Operators are installed by navigating to *Operators -> Installed Operators* in the {ocp} web console.
166+
167+
. After all applications are healthy, open the RAG chatbot UI by clicking the route link in the *Networking -> Routes* page of the `ai-quickstart-rag-prod` namespace.
