
Commit a57bdb5

gaurav-nelson authored and dminnear-rh committed
fixed minor nits, added troubleshooting section and additional links
1 parent 830c807 commit a57bdb5

7 files changed

Lines changed: 297 additions & 28 deletions


content/patterns/maas-quickstart/_index.adoc

Lines changed: 4 additions & 1 deletion
@@ -33,4 +33,7 @@ include::modules/maas-quickstart-architecture.adoc[leveloffset=+1]
3333
[id="next-steps-maas-quickstart"]
3434
== Next steps
3535

36-
* link:getting-started[Install this pattern.]
36+
* link:getting-started[Install this pattern]
37+
* link:cluster-sizing[Cluster sizing]
38+
* link:customizing-this-pattern[Customizing this pattern]
39+
* link:troubleshooting[Troubleshooting]

content/patterns/maas-quickstart/cluster-sizing.adoc

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ In addition to the worker nodes listed above, this pattern requires at least 2 G
2020
.GPU node minimum requirements
2121
[cols="<,^,<,<"]
2222
|===
23-
| Cloud Provider | Node Type | Number of nodes | Instance Type
23+
| Cloud provider | Node type | Number of nodes | Instance type
2424

2525
| Amazon Web Services
2626
| GPU Worker

content/patterns/maas-quickstart/customizing-this-pattern.adoc

Lines changed: 7 additions & 7 deletions
@@ -22,7 +22,7 @@ The pattern serves two models by default:
2222
* `nemotron-3-nano-30b-a3b-fp8` -- Available to premium and enterprise tier users.
2323
* `gpt-oss-20b` -- Available to all user tiers.
2424

25-
To change or add models, edit the `models` list in `overrides/maas-quickstart.yaml`. Models are pulled from OCI registries and do not require a HuggingFace API token.
25+
To change or add models, edit the `models` list in `overrides/maas-quickstart.yaml`. The pattern pulls models from OCI registries and does not require a HuggingFace API token.
2626

2727
The model definitions specify the model URI, resource requirements, GPU tolerations, and vLLM arguments. For example:
2828

@@ -61,7 +61,7 @@ The pattern uses Kuadrant (Red Hat Connectivity Link) to enforce per-tier rate l
6161

6262
[cols="1,1,2",options="header"]
6363
|===
64-
| Tier | Rate Limit | Description
64+
| Tier | Rate limit | Description
6565

6666
| Free
6767
| 5 requests per 2 minutes
@@ -76,7 +76,7 @@ The pattern uses Kuadrant (Red Hat Connectivity Link) to enforce per-tier rate l
7676
| High-throughput workloads
7777
|===
7878

79-
To adjust rate limits, modify the `tiers` section in `overrides/maas-quickstart.yaml`. For example, to increase the premium tier request limit to 40 and the token limit to 20000:
79+
To adjust rate limits, modify the `tiers` section in `overrides/maas-quickstart.yaml`. The following example increases the premium tier request limit to 40 and the token limit to 20000:
8080

8181
[source,yaml]
8282
----
@@ -97,14 +97,14 @@ Push your changes to your forked repository so the GitOps framework applies the
9797
[id="managing-users-maas"]
9898
=== Managing users
9999

100-
User authentication is handled by htpasswd with OpenShift OAuth. The default users are:
100+
OpenShift OAuth handles user authentication through an htpasswd identity provider. The default users are:
101101

102102
* `admin` -- Full administrative access (enterprise tier)
103103
* `free-user` -- Free tier access
104104
* `premium-user` -- Premium tier access
105105
* `enterprise-user` -- Enterprise tier access
106106

107-
User passwords are stored in the `values-secret.yaml` file and managed through HashiCorp Vault and the External Secrets Operator (ESO). To change a user password after initial deployment, update the secret value in your `values-secret.yaml` file and redeploy the pattern.
107+
You define user passwords in the `values-secret.yaml` file, and {hashicorp-vault} and the {eso-op} manage them. To change a user password after initial deployment, update the secret value in your `values-secret.yaml` file and redeploy the pattern.
108108

109109
To assign users to different tiers, modify the `tiers` section in `overrides/maas-quickstart.yaml`:
110110

@@ -136,8 +136,8 @@ To customize the DevSpaces configuration, you can adjust:
136136
* The inference endpoint URL used by the Continue extension
137137

138138
[id="gpu-node-provisioning-maas"]
139-
=== GPU node provisioning
139+
=== Provisioning GPU nodes
140140

141141
This pattern requires at least 2 NVIDIA GPU nodes with 48 GB or more of VRAM each. On AWS, the pattern automatically provisions `g6e.2xlarge` GPU machine sets with NVIDIA L40S GPUs.
142142

143-
If your cluster does not have GPU nodes, you must add them before deploying the pattern. The pattern installs all required operators, including the NVIDIA GPU Operator, automatically during deployment.
143+
If your cluster does not have GPU nodes, you must add them before you deploy the pattern. The pattern installs all required operators, including the NVIDIA GPU Operator, automatically during deployment.

content/patterns/maas-quickstart/getting-started.adoc

Lines changed: 10 additions & 3 deletions
@@ -15,8 +15,8 @@ include::modules/comm-attributes.adoc[]
1515
.Prerequisites
1616

1717
* An OpenShift cluster (version 4.20 or later). This pattern requires at least 2 NVIDIA GPU nodes with 48 GB or more of VRAM each.
18-
** *AWS*: The pattern automatically provisions 2 `g6e.2xlarge` GPU worker nodes (NVIDIA L40S) during installation. No GPU nodes need to be present before deploying.
19-
** *Other providers and bare metal*: GPU nodes must already be part of the OpenShift cluster before deploying this pattern. The pattern installs all required operators automatically.
18+
** *AWS*: The pattern automatically provisions 2 `g6e.2xlarge` GPU worker nodes (NVIDIA L40S) during installation. No GPU nodes need to be present before you deploy.
19+
** *Other providers and bare metal*: GPU nodes must already be part of the OpenShift cluster before you deploy this pattern. The pattern installs all required operators automatically.
2020
** To create an OpenShift cluster, go to the https://console.redhat.com/[Red Hat Hybrid Cloud console].
2121
** Select *OpenShift \-> Red Hat OpenShift Container Platform \-> Create cluster*.
2222
* The Helm binary. For instructions, see link:https://helm.sh/docs/intro/install/[Installing Helm].
@@ -71,7 +71,7 @@ upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-maas-code-assist
7171
+
7272
[WARNING]
7373
====
74-
Do not add, commit, or push this file to your repository. Doing so may expose personal credentials to GitHub.
74+
Do not add, commit, or push this file to your repository. Doing so might expose personal credentials to GitHub.
7575
====
7676
+
7777
Run the following command:
@@ -184,3 +184,10 @@ $ oc get inferenceservice -A
184184
----
185185

186186
. Access the OpenShift DevSpaces dashboard to confirm the IDE environment is available. Navigate to *Networking -> Routes* in the DevSpaces namespace and open the route URL.
187+
188+
[id="next-steps-getting-started-maas"]
189+
== Next steps
190+
191+
* link:customizing-this-pattern[Customizing this pattern]
192+
* link:cluster-sizing[Cluster sizing]
193+
* link:troubleshooting[Troubleshooting]
Lines changed: 264 additions & 0 deletions
@@ -0,0 +1,264 @@
1+
---
2+
title: Troubleshooting
3+
weight: 40
4+
aliases: /maas-quickstart/troubleshooting/
5+
---
6+
7+
:toc:
8+
:imagesdir: /images
9+
:_content-type: ASSEMBLY
10+
include::modules/comm-attributes.adoc[]
11+
12+
[id="troubleshooting-maas-quickstart"]
13+
== Troubleshooting the MaaS Code Assistant AI Quickstart pattern
14+
15+
Use this page to diagnose and resolve common issues when deploying or operating this pattern.
16+
17+
[id="troubleshooting-prereqs-maas"]
18+
== Prerequisite and tooling issues
19+
20+
[id="troubleshooting-podman-version"]
21+
=== Podman version not supported
22+
23+
The `pattern.sh` script requires Podman 4.3.0 or later. Earlier versions do not support the `--userns=keep-id` flag required for correct UID/GID mapping inside the container.
24+
25+
.Symptom
26+
27+
The script exits with an error referencing the Podman version or `keep-id`.
28+
29+
.Resolution
30+
31+
. Check your Podman version:
32+
+
33+
[source,terminal]
34+
----
35+
$ podman --version
36+
----
37+
38+
. If the version is earlier than 4.3.0, upgrade Podman. For instructions, see the link:https://podman.io/docs/installation[Podman installation documentation].
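The following script automates the version comparison from the steps above. It is a minimal sketch and not the check that `pattern.sh` itself performs. It assumes a `sort` that supports `-V` (version sort), as GNU coreutils does. The helper name `version_ge` is hypothetical.

```shell
#!/bin/sh
# Sketch only: not the version check built into pattern.sh.

# version_ge succeeds (exit status 0) when version $1 >= version $2.
# It relies on `sort -V`, which orders 4.10.0 after 4.3.0.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

required="4.3.0"
if command -v podman >/dev/null 2>&1; then
  # `podman --version` prints a line such as "podman version 4.9.4";
  # keep only the last field, the version number.
  installed="$(podman --version | awk '{print $NF}')"
  if version_ge "$installed" "$required"; then
    echo "Podman $installed meets the $required minimum."
  else
    echo "Podman $installed is older than $required. Upgrade Podman." >&2
  fi
else
  echo "Podman is not installed or not on PATH." >&2
fi
```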
39+
40+
[id="troubleshooting-kubeconfig"]
41+
=== KUBECONFIG path is outside the HOME directory
42+
43+
The `pattern.sh` script runs its commands inside a container that mounts your `$HOME` directory. If your `KUBECONFIG` file is located outside `$HOME`, the container cannot access it.
44+
45+
.Symptom
46+
47+
The script fails to connect to the cluster or reports that the kubeconfig file cannot be found.
48+
49+
.Resolution
50+
51+
Copy your kubeconfig file to a path inside your home directory and export the updated path:
52+
53+
[source,terminal]
54+
----
55+
$ cp <current-kubeconfig-path> ~/kubeconfig
56+
$ export KUBECONFIG=~/kubeconfig
57+
----
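The following script checks whether this issue applies before you run `pattern.sh`. It is a minimal sketch that assumes `KUBECONFIG` is either unset or a colon-separated list of paths, which is how `oc` and `kubectl` interpret it. If the variable is unset, the script falls back to the default `~/.kube/config`. The helper name `path_in_home` is hypothetical.

```shell
#!/bin/sh
# Sketch only: flag KUBECONFIG entries outside $HOME, which the container
# started by pattern.sh cannot see.

# path_in_home succeeds when the path string starts with "$HOME/".
# It compares strings only; it does not resolve symlinks or ".." segments.
path_in_home() {
  case "$1" in
    "$HOME"/*) return 0 ;;
    *) return 1 ;;
  esac
}

old_ifs=$IFS
IFS=':'   # split the KUBECONFIG list on colons only
set -f    # disable globbing while expanding the unquoted list
for kc in ${KUBECONFIG:-$HOME/.kube/config}; do
  if path_in_home "$kc"; then
    echo "OK: $kc is inside $HOME"
  else
    echo "Outside HOME, copy it under $HOME: $kc" >&2
  fi
done
set +f
IFS=$old_ifs
```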
58+
59+
[id="troubleshooting-deployment-maas"]
60+
== Deployment issues
61+
62+
[id="troubleshooting-argocd-sync"]
63+
=== Argo CD applications are not syncing or are unhealthy
64+
65+
After you run `./pattern.sh make install`, Argo CD applications can take 15–30 minutes to reach a healthy state. Model downloads and GPU operator initialization can add to this time.
66+
67+
.Symptom
68+
69+
Running `./pattern.sh make argo-healthcheck` reports applications in `Progressing` or `Degraded` state.
70+
71+
.Resolution
72+
73+
. Check which applications are not healthy:
74+
+
75+
[source,terminal]
76+
----
77+
$ oc get applications -n openshift-gitops
78+
----
79+
80+
. Inspect the failing application for error details:
81+
+
82+
[source,terminal]
83+
----
84+
$ oc describe application <application-name> -n openshift-gitops
85+
----
86+
87+
. Check the logs of the Argo CD application controller:
88+
+
89+
[source,terminal]
90+
----
91+
$ oc logs -n openshift-gitops statefulset/openshift-gitops-application-controller
92+
----
93+
94+
. If applications are stuck in `Progressing`, wait an additional 10 minutes and re-run the health check. Model downloads from OCI registries can take significant time depending on network conditions.
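To narrow the health check to the applications that need attention, you can filter the application list. The following is a minimal sketch that reads the Argo CD `Application` fields `.status.sync.status` and `.status.health.status`. It assumes the same `openshift-gitops` namespace as the steps above. The function name `unhealthy_apps` is hypothetical.

```shell
#!/bin/sh
# Sketch only. unhealthy_apps reads three whitespace-separated columns
# (NAME SYNC HEALTH) with a header row on stdin and prints the name of every
# application that is not both Synced and Healthy.
unhealthy_apps() {
  awk 'NR > 1 && ($2 != "Synced" || $3 != "Healthy") { print $1 }'
}

# Usage against a live cluster (requires an authenticated `oc` session):
#   oc get applications -n openshift-gitops \
#     -o custom-columns='NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status' \
#     | unhealthy_apps
```

Emitting explicit custom columns keeps the parsing independent of the default printer columns, which contain spaces in their headers.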
95+
96+
[id="troubleshooting-schema-validation"]
97+
=== Values file schema validation fails
98+
99+
The pattern validates `values-*.yaml` files against a schema before deployment.
100+
101+
.Symptom
102+
103+
Running `./pattern.sh make install` fails with a schema validation error.
104+
105+
.Resolution
106+
107+
. Run the validation step independently to see the full error output:
108+
+
109+
[source,terminal]
110+
----
111+
$ ./pattern.sh make validate-schema
112+
----
113+
114+
. Review the error message to identify the malformed field and correct the value in your `values-secret.yaml` or `overrides/maas-quickstart.yaml` file.
115+
116+
[id="troubleshooting-gpu-maas"]
117+
== GPU and inference issues
118+
119+
[id="troubleshooting-gpu-nodes"]
120+
=== GPU nodes are not ready
121+
122+
The NVIDIA GPU Operator must successfully initialize on each GPU node before model serving can start.
123+
124+
.Symptom
125+
126+
Inference service pods remain in the `Pending` state, or `oc get inferenceservice -A` shows services that are not ready.
127+
128+
.Resolution
129+
130+
. Check the status of GPU nodes:
131+
+
132+
[source,terminal]
133+
----
134+
$ oc get nodes -l nvidia.com/gpu.present=true
135+
----
136+
137+
. Check the NVIDIA GPU Operator pods:
138+
+
139+
[source,terminal]
140+
----
141+
$ oc get pods -n nvidia-gpu-operator
142+
----
143+
144+
. Check for driver initialization errors:
145+
+
146+
[source,terminal]
147+
----
148+
$ oc logs -n nvidia-gpu-operator -l app=nvidia-driver-daemonset
149+
----
150+
151+
. If you are using a provider other than AWS, confirm that GPU nodes were present in the cluster before you deployed the pattern. The pattern does not provision GPU nodes on providers other than AWS.
152+
153+
[id="troubleshooting-inference-endpoints"]
154+
=== Inference endpoints are not serving
155+
156+
.Symptom
157+
158+
`oc get inferenceservice -A` shows inference services in a non-ready state, or the Continue AI extension in DevSpaces returns connection errors.
159+
160+
.Resolution
161+
162+
. Check the status of inference services:
163+
+
164+
[source,terminal]
165+
----
166+
$ oc get inferenceservice -A
167+
----
168+
169+
. Check the vLLM model server pod logs for a specific model:
170+
+
171+
[source,terminal]
172+
----
173+
$ oc logs -n redhat-ods-applications -l serving.kserve.io/inferenceservice=<model-name>
174+
----
175+
176+
. Confirm that the GPU nodes have sufficient available VRAM. Each model requires a GPU with at least 48 GB of VRAM. To serve both models, you need either two separate GPU nodes with at least 48 GB of VRAM each, or a single node with at least 96 GB of VRAM if both models are scheduled on it.
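To list only the inference services that are not ready, you can read the `Ready` condition of each KServe `InferenceService`. The following is a minimal sketch. It assumes that custom columns report a missing condition as `<none>`, and it treats that value as not ready. The function name `not_ready_isvcs` is hypothetical.

```shell
#!/bin/sh
# Sketch only. not_ready_isvcs expects NAMESPACE NAME READY columns plus a
# header row on stdin and prints namespace/name for every service whose
# Ready condition is anything other than "True".
not_ready_isvcs() {
  awk 'NR > 1 && $3 != "True" { print $1 "/" $2 }'
}

# Usage against a live cluster:
#   oc get inferenceservice -A \
#     -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status' \
#     | not_ready_isvcs
```

The namespace in each result line is the one to pass to `oc logs -n` when you inspect the vLLM pods.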
177+
178+
[id="troubleshooting-rate-limiting-maas"]
179+
== Rate limiting and authentication issues
180+
181+
[id="troubleshooting-rate-limits"]
182+
=== Rate limiting is not enforced
183+
184+
.Symptom
185+
186+
Requests from all users succeed regardless of the configured rate limits, or requests are blocked for all users.
187+
188+
.Resolution
189+
190+
. Check the status of the Kuadrant operator and Limitador pods:
191+
+
192+
[source,terminal]
193+
----
194+
$ oc get pods -n kuadrant-system
195+
----
196+
197+
. Check the Limitador logs for policy errors:
198+
+
199+
[source,terminal]
200+
----
201+
$ oc logs -n kuadrant-system deployment/limitador-limitador
202+
----
203+
204+
. Confirm that rate limit policies are applied correctly:
205+
+
206+
[source,terminal]
207+
----
208+
$ oc get ratelimitpolicy -A
209+
----
210+
211+
[id="troubleshooting-auth-maas"]
212+
=== Users cannot authenticate
213+
214+
.Symptom
215+
216+
Users receive authentication errors when accessing the inference API or DevSpaces.
217+
218+
.Resolution
219+
220+
. Confirm that the External Secrets Operator provisioned the htpasswd secret correctly:
221+
+
222+
[source,terminal]
223+
----
224+
$ oc get externalsecret -A
225+
$ oc get secret htpasswd-secret -n openshift-config
226+
----
227+
228+
. If the secret is missing or incorrect, verify that your `values-secret.yaml` file contains the correct passwords for all four users (`admin`, `free-user`, `premium-user`, `enterprise-user`) and redeploy the pattern.
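To check which of the four default users are missing from the provisioned secret, you can compare its entries against the expected names. The following is a minimal sketch. It assumes the secret stores the htpasswd file under the `htpasswd` key, which is the key the OpenShift htpasswd identity provider reads, and that each line starts with `username:`. The function name `missing_users` is hypothetical.

```shell
#!/bin/sh
# Sketch only. missing_users reads htpasswd-format lines ("user:hash") on stdin
# and prints each default pattern user that has no entry.
missing_users() {
  entries="$(cut -d: -f1)"   # keep only the user names
  for user in admin free-user premium-user enterprise-user; do
    printf '%s\n' "$entries" | grep -qxF "$user" || echo "$user"
  done
}

# Usage against a live cluster:
#   oc extract secret/htpasswd-secret -n openshift-config --keys=htpasswd --to=- 2>/dev/null \
#     | missing_users
```

An empty result means that all four users are present. If the result is not empty, update `values-secret.yaml` as described in the previous step.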
229+
230+
[id="troubleshooting-devspaces-maas"]
231+
== OpenShift DevSpaces issues
232+
233+
[id="troubleshooting-devspaces-connection"]
234+
=== Continue AI extension cannot connect to inference endpoints
235+
236+
.Symptom
237+
238+
Code suggestions are not returned in DevSpaces, or the Continue extension reports a connection error.
239+
240+
.Resolution
241+
242+
. Confirm that the inference services are healthy:
243+
+
244+
[source,terminal]
245+
----
246+
$ oc get inferenceservice -A
247+
----
248+
249+
. Navigate to *Networking -> Routes* in the namespace where the inference services are running and confirm that the routes are accessible.
250+
251+
. In DevSpaces, open the Continue extension settings and verify that the endpoint URL matches the route URL for the vLLM service.
252+
253+
[id="troubleshooting-get-help-maas"]
254+
== Getting help
255+
256+
If you cannot resolve an issue using this guide:
257+
258+
* Check the link:https://github.com/validatedpatterns-sandbox/ai-quickstart-maas-code-assistant/issues[GitHub issues] for known problems and workarounds.
259+
* Open a new issue with the output of the following command to help diagnose the problem:
260+
+
261+
[source,terminal]
262+
----
263+
$ oc get pods -A | grep -v Running | grep -v Completed
264+
----
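The `grep -v Running` filter above misses pods that report `Running` but are not fully ready, for example `0/1`. A slightly stricter filter parses the `READY` and `STATUS` columns instead. The following is a minimal sketch that assumes the default `oc get pods -A` column order (NAMESPACE, NAME, READY, STATUS, ...). The function name `problem_pods` is hypothetical.

```shell
#!/bin/sh
# Sketch only. problem_pods reads `oc get pods -A --no-headers` output on stdin
# and prints namespace/name, READY, and STATUS for pods that are not Completed
# and are either not Running or not fully ready.
problem_pods() {
  awk '{
    split($3, ready, "/")              # READY column, for example "1/2"
    if ($4 == "Completed") next        # finished Job pods are expected
    if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2, $3, $4
  }'
}

# Usage:
#   oc get pods -A --no-headers | problem_pods
```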

modules/maas-quickstart-about.adoc

Lines changed: 3 additions & 8 deletions
@@ -12,17 +12,12 @@ Use case::
1212
* Deploy an AI-powered code assistant that provides intelligent code suggestions through an integrated development environment.
1313
* Implement Model-as-a-Service (MaaS) governance with tiered user access, rate limiting, and chargeback capabilities.
1414
* Use a GitOps approach to provision AI inference infrastructure including GPU-accelerated model serving, identity management, and API rate limiting.
15-
+
16-
[NOTE]
17-
====
18-
Based on the requirements of a specific implementation, certain details might differ. However, all Validated Patterns that are based on a portfolio architecture, generalize one or more successful deployments of a use case.
19-
====
2015

2116
Background::
2217

23-
This pattern is scaffolding around the link:https://github.com/rh-ai-quickstart/maas-code-assistant[MaaS Code Assistant AI Quickstart]. It provisions the OpenShift cluster with link:https://www.redhat.com/en/products/ai/openshift-ai[{rhoai}] configured for GPU-accelerated inference using vLLM and llm-d. It deploys the NVIDIA GPU Operator for model serving on GPU nodes and manages secrets through the {solution-name-upstream} framework using HashiCorp Vault and the External Secrets Operator.
18+
This pattern builds on the link:https://github.com/rh-ai-quickstart/maas-code-assistant[MaaS Code Assistant AI Quickstart]. It provisions the OpenShift cluster with link:https://www.redhat.com/en/products/ai/openshift-ai[{rhoai}] configured for GPU-accelerated inference using vLLM and llm-d. It deploys the NVIDIA GPU Operator for model serving on GPU nodes and manages secrets through the {solution-name-upstream} framework using HashiCorp Vault and the External Secrets Operator. This pattern generalizes one or more successful deployments of this use case. Implementation details might vary depending on your specific environment and requirements.
2419

25-
The MaaS Code Assistant enables organizations to offer AI code assistance as an internal service with differentiated access tiers. It demonstrates a production-ready approach to:
20+
Organizations can use the MaaS Code Assistant to offer AI code assistance as an internal service with differentiated access tiers. It demonstrates a production-ready approach to:
2621

2722
- Serving multiple NVIDIA Nemotron language models optimized for code completion and generation
2823
- Enforcing per-user rate limits through Kuadrant (Red Hat Connectivity Link) to manage capacity and enable chargeback
@@ -40,7 +35,7 @@ The solution uses vLLM with llm-d for high-performance inference of NVIDIA Nemot
4035
[id="about-maas-quickstart-technology"]
4136
== About the technology
4237

43-
The following technologies are used in this solution:
38+
This solution uses the following technologies:
4439

4540
https://www.redhat.com/en/technologies/cloud-computing/openshift/try-it[{rh-ocp}]::
4641
An enterprise-ready Kubernetes container platform built for an open hybrid cloud strategy. It provides a consistent application platform to manage hybrid cloud, public cloud, and edge deployments.
