
Commit 18c6743

OSDOCS-18945: adds troubleshooting MCP gateway
1 parent f19c445 commit 18c6743

14 files changed

Lines changed: 1153 additions & 1 deletion
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
// Module included in the following assemblies:
//
// *mcp_gateway_config/mcp-gateway-troubleshooting.adoc

:_mod-docs-content-type: CONCEPT
[id="con-mcp-gateway-ts-gateway-routing_{context}"]
= Gateway and routing troubleshooting

[role="_abstract"]
When traffic is not flowing after you install {mcpg}, you can check the health of each component in the gateway routing path. Depending on the errors that you receive, you can troubleshoot at several layers. Failures can occur at the gateway, route, or policy level.

If your client gets a `Connection refused` or timeout error and cannot reach the gateway IP address, the cause might be the listener. In this case, one of the following situations likely applies:

* The port is not open.
* The load balancer has not assigned an IP address.
* The TLS handshake is failing.

When you have this type of error, check the listener first.
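
The listener checks above can be partly scripted. The following is a minimal sketch and not a {mcpg} tool. It assumes that the `oc` CLI is logged in to the cluster. It also assumes that the `Gateway` CR reports its assigned address in the standard Gateway API `status.addresses` field and its ports in `spec.listeners`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for listener-level checks. Assumes a logged-in oc CLI.

# Print the addresses assigned to the Gateway (empty if none are assigned yet).
gateway_address() {
  oc get gateway "$1" -n "$2" -o jsonpath='{.status.addresses[*].value}'
}

# Print the ports declared by the Gateway listeners, separated by spaces.
gateway_listener_ports() {
  oc get gateway "$1" -n "$2" -o jsonpath='{.spec.listeners[*].port}'
}

# Usage: check_listener <gateway_name> <namespace> <port>
check_listener() {
  local gw="$1" ns="$2" port="$3" addr
  addr="$(gateway_address "$gw" "$ns")"
  if [ -z "$addr" ]; then
    echo "no address assigned to $ns/$gw"
    return 1
  fi
  # Split the space-separated port list into lines and look for an exact match.
  if ! gateway_listener_ports "$gw" "$ns" | tr ' ' '\n' | grep -qx -- "$port"; then
    echo "no listener on port $port for $ns/$gw"
    return 1
  fi
  echo "address $addr has a listener on port $port"
}
```

For example, source the functions and run `check_listener _<gateway_name>_ _<namespace>_ 443`. If the address and the listener both exist, you can test the TLS handshake on its own by running `openssl s_client -connect _<gateway_address>_:443 -servername _<mcp_hostname>_`.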

If you can connect to the `Gateway` object but get an HTTP error, such as `404`, the cause can be a problem with the `HTTPRoute` custom resource (CR). The route exists, but the `Gateway` object rejected it or the connection failed. When you get these types of errors, check the `HTTPRoute` CR first.
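
To see why a `Gateway` object rejected a route, you can read the conditions that the gateway controller writes to the `HTTPRoute` status. The following sketch assumes the standard Gateway API `Accepted` and `ResolvedRefs` conditions under `status.parents`. It also assumes a route with a single parent gateway and a logged-in `oc` CLI. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for HTTPRoute checks. Assumes a logged-in oc CLI and a
# route that is attached to a single parent Gateway.

# Print the status ("True", "False", or "Unknown") of one route condition type.
route_condition() {
  local route="$1" ns="$2" type="$3"
  oc get httproute "$route" -n "$ns" \
    -o jsonpath="{.status.parents[*].conditions[?(@.type==\"$type\")].status}"
}

# Usage: check_route <route_name> <namespace>
# Prints each condition and returns 1 if any of them is not "True".
check_route() {
  local route="$1" ns="$2" t s rc=0
  for t in Accepted ResolvedRefs; do
    s="$(route_condition "$route" "$ns" "$t")"
    echo "$t: ${s:-missing}"
    [ "$s" = "True" ] || rc=1
  done
  return "$rc"
}
```

`Accepted: False` usually means that the `Gateway` object rejected the route, for example because no listener allows it or the hostnames do not match. `ResolvedRefs: False` usually means that a backend reference cannot be resolved. The `oc describe httproute` command shows the reason and message for each condition.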

If requests either fail with a `503` error or bypass the router, the route is recognized, but the connection to the backend failed or was not properly authorized. In this case, start with API-level checks and narrow your investigation to Envoy filters as needed.
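
Each of the symptoms described so far points to a layer to check first. The following hypothetical helper encodes that mapping for the status code that curl reports through its `http_code` write-out variable. curl reports `000` when it receives no HTTP response at all, for example when the connection is refused, times out, or fails during the TLS handshake.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an HTTP status code, as printed by
# curl -s -o /dev/null -w '%{http_code}', to the layer to check first.
triage_layer() {
  case "$1" in
    000)     echo "listener" ;;           # no HTTP response: refused, timed out, or TLS failure
    404)     echo "httproute" ;;          # gateway reached, but the route did not match
    503)     echo "backend-or-policy" ;;  # route matched, backend unreachable or not authorized
    2??|3??) echo "ok" ;;
    *)       echo "unknown" ;;
  esac
}

# Example with a placeholder hostname:
#   code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://<mcp_hostname>/mcp)"
#   triage_layer "$code"
```

The mapping only reflects the guidance in this section. Other codes fall through to `unknown`.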

If the `EnvoyFilter` object is not present, one of the following situations has usually occurred:

* The `Gateway` CR status is not `Programmed`.
* There is a `labels` mismatch, so the MCP controller logic does not apply the filter to any pods.
* The MCP controller is crashing or stuck.
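
You can check the first of these situations, and whether any `EnvoyFilter` object exists at all, from the command line. The following sketch assumes a logged-in `oc` CLI and the standard Gateway API `Programmed` condition in the `Gateway` status. It also assumes that the Istio `EnvoyFilter` CRD (`networking.istio.io`) is installed. The function names are hypothetical, and the sketch does not check pod labels or the MCP controller for you.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumes a logged-in oc CLI and an installed EnvoyFilter CRD.

# Print the status of the Gateway "Programmed" condition ("True" when ready).
gateway_programmed() {
  oc get gateway "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}'
}

# Usage: check_envoyfilter_prereqs <gateway_name> <namespace>
check_envoyfilter_prereqs() {
  local gw="$1" ns="$2" status
  status="$(gateway_programmed "$gw" "$ns")"
  if [ "$status" != "True" ]; then
    echo "Gateway $ns/$gw is not Programmed (status: ${status:-none})"
    return 1
  fi
  # An empty list across all namespaces suggests that no EnvoyFilter was created.
  if [ -z "$(oc get envoyfilter -A -o name 2>/dev/null)" ]; then
    echo "No EnvoyFilter found: check pod labels and the MCP controller"
    return 1
  fi
  echo "Gateway $ns/$gw is Programmed and an EnvoyFilter exists"
}
```

For example, run `check_envoyfilter_prereqs _<gateway_name>_ _<namespace>_`. If both checks pass but traffic still skips the filter, compare the labels on the gateway pods with the selector in the `EnvoyFilter` object.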

modules/proc-mcp-gateway-register-mcp-server.adoc

Lines changed: 5 additions & 0 deletions
@@ -93,6 +93,11 @@ spec:
* Replace the `spec.targetRef.namespace:` field value with the namespace where your `HTTPRoute` CR is applied. In this example, `_<mcp_test>_` is used.
* Replace the `credentialRef.name:` field value with the name of your `Secret` CR. In this example, `_<mcp_server_one_secret>_` is used. You can omit this parameter if your MCP server does not require authentication or authorization.
* For more information about these parameters, see "Understanding the `MCPServerRegistration` custom resource."
+
[IMPORTANT]
====
A `toolPrefix` value cannot include spaces or special characters.
====

. Apply the CR by running the following command:
+
Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
// Module included in the following assemblies:
//
// *mcp_gateway_config/mcp-gateway-troubleshooting.adoc

:_mod-docs-content-type: PROCEDURE
[id="proc-mcp-gateway-ts-authn-issues_{context}"]
= Troubleshooting {mcpg} authentication issues

[role="_abstract"]
Authentication errors can appear in several ways, including silent failures, broken sessions, and denied tool access, depending on your backend MCP server setup. You can troubleshoot common problems by checking your connections and custom resource (CR) configurations.

.Prerequisites

* You installed {mcpg}.
* You installed {prodname}.
* You installed the {oc-first}.
* You configured a `Gateway` object.
* You configured an `HTTPRoute` object for the gateway.
* You registered an MCP server.
* You created a `Secret` CR for authentication.

.Procedure

. If your clients cannot discover the OAuth configuration, OAuth discovery is not working. Use the following steps to troubleshoot this situation:

.. Retrieve the JSON object that lists the security requirements for your backend MCP server by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ curl http://_<mcp_hostname>_/_<.well-known/oauth-protected-resource>_
----
+
* Replace `_<mcp_hostname>_` with the name of the host for your MCP server.
* Replace `_<.well-known/oauth-protected-resource>_` with the well-known path of the JSON document that describes the OAuth 2.0 security requirements for the MCP server.
+
.Example output
[source,text]
----
{
  "resource": "https://mcp.example.com",
  "authorization_servers": [
    "https://auth.provider.com/oauth2/default"
  ],
  "scopes_supported": ["mcp:tools", "mcp:resources"],
  "bearer_methods_supported": ["header"],
  "resource_documentation": "https://docs.example.com/mcp-help"
}
----

.. Check that your associated `HTTPRoute` CR includes a path match for `/.well-known/oauth-protected-resource` by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc get httproute _<route_name>_ -n _<namespace>_ -o jsonpath='{.spec.rules[*].matches[*].path.value}'
----
+
* Replace `_<route_name>_` with the name of the associated `HTTPRoute` CR.
* Replace `_<namespace>_` with the namespace where your `HTTPRoute` CR is applied.
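
.. Check the configuration of the relevant `AuthPolicy` CR by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc describe authpolicy _<policy_name>_ -n _<namespace>_
----
+
* Replace `_<policy_name>_` with the name of your `AuthPolicy` CR.
* Replace `_<namespace>_` with the namespace where your `AuthPolicy` CR is applied.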

.. Check that you excluded all `/.well-known` paths from your `AuthPolicy` CR by requesting the endpoint without any credentials. Run the following command:
+
[source,terminal,subs="+quotes"]
----
$ curl -o /dev/null -s -w "%{http_code}\n" http://_<mcp_hostname>_/_<.well-known/oauth-protected-resource>_
----
+
* Replace `_<mcp_hostname>_` with the name of the host for your MCP server.
* Replace `_<.well-known/oauth-protected-resource>_` with the well-known path of the JSON document that describes the OAuth 2.0 security requirements for the MCP server.
+
[NOTE]
====
The following status codes are examples of possible outputs:

* `200`: The exclusion exists and matches the path.
* `401`: Your `AuthPolicy` CR still requires a token for this path. The exclusion is either missing or not working.
* `404`: The exclusion might be present and working, but the `HTTPRoute` CR does not route that path to a valid backend.
====

.. Optional. Check the environment variables of the MCP broker component by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc get deployment _<mcp_gateway>_ -n _<mcp_system>_ -o yaml | grep -A 10 env
----
+
* Replace `_<mcp_gateway>_` with the name of your {mcpg} deployment.
* Replace `_<mcp_system>_` with the namespace where your {mcpg} deployment is applied.
//Q: do we actually make Deployment or DeploymentConfig objects for MCP gateway? If yes, when?

.. Check that the `OAUTH_*` environment variables are set on your MCP broker component by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc set env deployment/_<mcp_gateway>_ -n _<mcp_system>_ --list
----
+
* Replace `_<mcp_gateway>_` with the name of your {mcpg} deployment.
* Replace `_<mcp_system>_` with the namespace where your {mcpg} deployment is applied.

.. Verify that the MCP broker component pod restarted after any environment variable changes.

. If valid tokens are rejected with `401` errors, JWT validation is failing. Use the following steps to troubleshoot this situation:

.. List the `AuthPolicy` CRs in all namespaces by running the following command:
+
[source,terminal]
----
$ oc get authpolicy -A
----

.. Check the configuration of the relevant `AuthPolicy` CR by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc describe authpolicy _<policy_name>_ -n _<namespace>_
----
+
* Replace `_<policy_name>_` with the name of your `AuthPolicy` CR.
* Replace `_<namespace>_` with the namespace where your `AuthPolicy` CR is applied.

.. Verify that the `issuerUrl` value in the `AuthPolicy` CR matches the issuer URL of your identity provider `realm`. The `iss` claim in your tokens must contain the same value.

.. Check the Authorino logs by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc logs -n kuadrant-system -l authorino-resource=authorino
----
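
.. Decode the JWT payload to verify its claims by running the following command. JWT segments are base64url-encoded without padding, so the command converts the URL-safe characters and restores the padding before decoding:
+
[source,terminal,subs="+quotes"]
----
$ echo "_<example_token>_" | cut -d. -f2 | tr '_-' '/+' | awk '{ while (length($0) % 4) $0 = $0 "="; print }' | base64 -d | jq
----
+
Replace `_<example_token>_` with your token.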

.. If your identity provider runs in the cluster, ensure that your issuer URL is reachable from the cluster by using the cluster-local service name.

.. Check the token expiration time by examining the `exp` claim. The value is a Unix timestamp in seconds.

.. If your policy requires a specific audience, verify that the `aud` claim of the token contains it.

.. Ensure that the token includes all claims that your policy requires, such as `groups` or `email`.

. If your `401 Unauthorized` responses do not include OAuth discovery information, the `WWW-Authenticate` header is missing. This usually means that your `AuthPolicy` CR is not configured properly. Use the following steps to troubleshoot this situation:

.. Isolate the failure point by running the following command. The verbose output shows the connection details, including the TLS handshake for HTTPS URLs, the HTTP headers, and the response status code:
+
[source,terminal,subs="+quotes"]
----
$ curl -v http://_<mcp_hostname>_/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'
----
+
Replace `_<mcp_hostname>_` with the hostname for your backend MCP server.

.. Verify that the `spec.response.unauthenticated.headers` list in your `AuthPolicy` CR includes a `WWW-Authenticate` header.

.. Check that the `response.unauthenticated.headers.WWW-Authenticate.value` field in your `AuthPolicy` CR includes the correct metadata, such as a `resource_metadata` parameter that points to your `/.well-known/oauth-protected-resource` endpoint.

.. Ensure that your `AuthPolicy` CR is applied to the correct `Gateway` object and listener.
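
The JWT checks in this procedure can be combined into a small script. The following sketch is a hypothetical helper, not part of {mcpg} or Authorino. It decodes the token payload with standard shell tools and compares the `exp` claim with the current time. JWT segments are base64url-encoded without `=` padding, so the script converts the URL-safe characters and restores the padding before it calls `base64 -d`. The script does not verify the token signature. Its `grep`-based extraction also handles only a numeric `exp` claim, so use `jq` to inspect other claims.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a JWT. They do NOT verify the signature.

# Print the decoded JSON payload (the second dot-separated segment) of a JWT.
jwt_payload() {
  local seg
  # base64url uses '-' and '_' where standard base64 uses '+' and '/'.
  seg="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url also drops '=' padding; restore it so base64 -d accepts the input.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}

# Print whether the token is expired.
# Returns 1 if it is expired and 2 if the payload has no numeric exp claim.
jwt_expired() {
  local exp now
  # The exp claim is a NumericDate: seconds since the Unix epoch.
  exp="$(jwt_payload "$1" | grep -o '"exp"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | grep -o '[0-9][0-9]*$' || true)"
  if [ -z "$exp" ]; then
    echo "no exp claim" >&2
    return 2
  fi
  now="$(date +%s)"
  if [ "$now" -ge "$exp" ]; then
    echo "expired"
    return 1
  fi
  echo "valid for $(( exp - now )) more seconds"
}
```

For example, `jwt_expired "_<example_token>_"` prints either `expired` or the number of seconds until the token expires.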
Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
// Module included in the following assemblies:
//
// *mcp_gateway_config/mcp-gateway-troubleshooting.adoc

:_mod-docs-content-type: PROCEDURE
[id="proc-mcp-gateway-ts-authz-issues_{context}"]
= Troubleshooting {mcpg} authorization issues

[role="_abstract"]
Authorization errors can appear in several ways. For example, an authenticated user might get `403` errors for all tool calls, authorization checks might not be enforced, or authorization might fail with Common Expression Language (CEL) evaluation errors. You can troubleshoot these problems by checking your configurations and your CEL expressions.

.Prerequisites

* You installed {mcpg}.
* You installed {prodname}.
* You installed the {oc-first}.
* You configured a `Gateway` object.
* You configured an `HTTPRoute` object for the gateway.
* You registered an MCP server.
* You created a `Secret` CR for authentication.

.Procedure

. If an authenticated user gets `403` errors for all tool calls, check the authorization rules in your `AuthPolicy` custom resource (CR) by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc get authpolicy _<policy_name>_ -n _<namespace>_ -o yaml | grep -A 20 authorization
----
+
* Replace `_<policy_name>_` with the name of your `AuthPolicy` CR.
* Replace `_<namespace>_` with the namespace where your `AuthPolicy` CR is applied.

. Check the Authorino logs for CEL evaluation results by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc logs -n kuadrant-system -l authorino-resource=authorino | grep -i authz
----

. Ensure that Authorino can communicate with your identity provider.

. Verify that your JWT includes the `resource_access[server-name].roles` claim.

. If your authorization checks are not enforced, first check the status of your `AuthPolicy` CR by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc describe authpolicy _<policy_name>_ -n _<namespace>_
----
+
* Replace `_<policy_name>_` with the name of your `AuthPolicy` CR.
* Replace `_<namespace>_` with the namespace where your `AuthPolicy` CR is applied.

. Next, verify that your `AuthPolicy` CR targets the correct resource by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc get authpolicy _<policy_name>_ -n _<namespace>_ -o yaml | grep -A 5 targetRef
----
+
* Replace `_<policy_name>_` with the name of your `AuthPolicy` CR.
* Replace `_<namespace>_` with the namespace where your `AuthPolicy` CR is applied.

. Ensure that the `targetRef` of your `AuthPolicy` CR matches the name and namespace of your `Gateway` object by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ echo "--- AuthPolicy Targets ---" && \
  oc get authpolicy -n _<mcp_system>_ -o jsonpath='{range .items[*]}{.metadata.name}{"\t targets -> \t"}{.spec.targetRef.kind}{"/"}{.spec.targetRef.name}{"\n"}{end}' && \
  echo "--- Actual Gateways ---" && \
  oc get gateway -n _<mcp_system>_ -o custom-columns=NAME:.metadata.name
----
+
Replace `_<mcp_system>_` with the namespace where your `AuthPolicy` CR and `Gateway` object are applied.

. If your `AuthPolicy` CR and your `Gateway` object are in different namespaces, you must either move the `AuthPolicy` CR into the same namespace as the `Gateway` object, or target the `HTTPRoute` CR instead.

. Check that the `sectionName` value in your `AuthPolicy` CR matches the listener name of your `Gateway` object by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc describe authpolicy _<auth_policy_name>_ -n _<mcp_system>_
----
+
* Replace `_<auth_policy_name>_` with the name of your `AuthPolicy` CR.
* Replace `_<mcp_system>_` with the namespace where your `AuthPolicy` CR is applied.

. Examine the `Status` block for an entry about your listener. If the `sectionName` value is wrong, the policy still shows `Accepted`, but it does not affect the intended traffic path.

. Check that the Kuadrant Operator and its components are healthy by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc get pods -n kuadrant-system
----
+
.Example output
[source,text]
----
NAME                                                  READY   STATUS    RESTARTS   AGE
authorino-78c5679f94-abc12                            1/1     Running   0          5d
dns-operator-controller-manager-5d4789f6-x1y2z        1/1     Running   0          5d
kuadrant-operator-controller-manager-8495bc4d-98765   1/1     Running   0          5d
limitador-67f89bc5d4-z9w8v                            1/1     Running   0          5d
----
+
* If the `authorino-*` pod shows `CrashLoopBackOff`, it usually cannot reach your OIDC issuer or has an invalid configuration.
* The `kuadrant-operator-controller-manager-*` pod reconciles your `AuthPolicy` CR. If this pod is down, changes that you make to your `AuthPolicy` CR are not applied to the `Gateway` object.

. Resolve any pod issues as required.

. If authorization fails with `CEL` evaluation errors, check the Authorino logs for `CEL` errors by running the following command:
+
[source,terminal,subs="+quotes"]
----
$ oc logs -n kuadrant-system -l authorino-resource=authorino | grep -i cel
----

. Verify the `CEL` syntax in your authorization `rules`.

. Check that the referenced fields exist, such as `auth.identity.groups`.

. Ensure that the `metadata` `source` is accessible and returns the expected structure.

. Test your `CEL` expression syntax by using an online validator.

. Add persistent logging to understand the `CEL` evaluation context.
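
The `targetRef` and pod checks in this procedure can also be scripted. The following sketch assumes a logged-in `oc` CLI, and its function names are hypothetical. The `authpolicy_orphans` function lists the `AuthPolicy` CRs in one namespace whose `spec.targetRef` names a `Gateway` object that does not exist in that namespace. The `kuadrant_unhealthy_pods` function lists pods in the `kuadrant-system` namespace that are not `Running` or not fully ready, such as an `authorino-*` pod in `CrashLoopBackOff`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumes a logged-in oc CLI.

# Usage: authpolicy_orphans <namespace>
# Prints each AuthPolicy whose Gateway target does not exist in <namespace>.
authpolicy_orphans() {
  local ns="$1" gateways
  gateways="$(oc get gateway -n "$ns" -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')"
  oc get authpolicy -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.targetRef.kind}{" "}{.spec.targetRef.name}{"\n"}{end}' |
    while read -r name kind target; do
      # Policies that target an HTTPRoute CR are skipped in this sketch.
      [ "$kind" = "Gateway" ] || continue
      if ! printf '%s\n' "$gateways" | grep -qx -- "$target"; then
        echo "$name targets missing Gateway $target"
      fi
    done
}

# Prints "<pod> <status>" for kuadrant-system pods that are not Running or not ready.
kuadrant_unhealthy_pods() {
  oc get pods -n kuadrant-system --no-headers |
    awk '{ split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) print $1, $3 }'
}
```

For example, run `authpolicy_orphans _<mcp_system>_`. An empty result means that every `Gateway` target in that namespace exists.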
