Commit 10e6223

OSDOCS-18945: adds troubleshooting MCP gateway
1 parent cab138f commit 10e6223

8 files changed

Lines changed: 594 additions & 1 deletion
Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
// Module included in the following assemblies:
2+
//
3+
// *mcp_gateway_config/mcp-gateway-troubleshooting.adoc
4+
5+
:_mod-docs-content-type: CONCEPT
6+
[id="con-mcp-gateway-ts-gateway-routing_{context}"]
7+
= Gateway and routing troubleshooting
8+
9+
[role="_abstract"]
10+
When traffic does not flow after you install {mcpg}, you can check each component of the gateway routing path to find the break. Breaks can occur at the gateway, route, or policy level, and the error that you receive indicates which layer to troubleshoot first.
11+
12+
If you have a `Connection Refused/Timeout` error and your client cannot reach the IP address, the cause might be the
13+
listener. In this case, one of the following situations likely applies:
14+
15+
* The port is not open.
16+
* The load balancer has not assigned an IP address.
17+
* The TLS handshake is failing.
18+
19+
When you have this type of error, check the listener first.
20+
21+
If you can connect to the `Gateway` object, but you get an HTTP error, such as `404`, the cause can be a problem with the `HTTPRoute` custom resource (CR). The route exists, but the `Gateway` object has rejected it or the connection has failed. When you get these types of codes, check the `HTTPRoute` CR first.
22+
23+
If requests either fail with a `503` error or bypass the router, this means that the route is recognized, but the connection to the backend failed or was not authorized properly. In this case, start with API-level checks and narrow your investigation to Envoy filters as needed.
24+
25+
If the `EnvoyFilter` is not present, it usually means one of the following situations has occurred:
26+
27+
* The `Gateway` CR status is not `Programmed`.
28+
* There is a `labels` mismatch, and the MCP controller logic does not send any pods through the filter.
29+
* The MCP controller is crashing or stuck.
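
The triage order described in this module can be sketched as a small shell helper. This is only an illustration of the decision logic above: the function name `first_check` and the labels it prints are hypothetical and are not part of {mcpg} or the `oc` CLI.

```shell
#!/bin/sh
# Hypothetical helper: map an observed symptom to the layer to check first.
# Argument: "refused", "timeout", or an HTTP status code such as 404 or 503.
first_check() {
  case "$1" in
    refused|timeout) echo "listener" ;;             # client cannot reach the IP address
    404)             echo "httproute" ;;            # route rejected or not attached
    503)             echo "api-then-envoyfilter" ;; # backend connection or authorization
    *)               echo "unknown" ;;
  esac
}

# Example usage (not run here): classify the HTTP status returned by the gateway.
#   code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://<gateway_host>/mcp")
#   first_check "$code"
```

The helper only encodes where to start looking. It does not replace the checks in the troubleshooting procedures.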
Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
// Module included in the following assemblies:
2+
//
3+
// *mcp_gateway_config/mcp-gateway-troubleshooting.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="proc-mcp-gateway-ts-extension-not-ready_{context}"]
7+
= Troubleshooting an MCPGatewayExtension status of not ready
8+
9+
[role="_abstract"]
10+
When your `MCPGatewayExtension` custom resource (CR) shows a `Ready: False` state, you can troubleshoot the cause by running a few commands.
11+
12+
Common causes include the following errors, each of which indicates an associated action:
13+
14+
* `InvalidMCPGatewayExtension`: This often means that the `targetRef` points to a `Gateway` object that does not exist, or that the `kind` or `group` value contains a typo.
15+
* `ReferenceGrantRequired`: This occurs if your extension is in one namespace but is trying to target a `Gateway` object in another. To fix this, you must apply a `ReferenceGrant` in the `Gateway` object namespace.
16+
* `Conflict`: Only one `MCPGatewayExtension` can target a specific `Gateway` object. If another extension is already pointing to the `Gateway` object you configured with a new extension, the new one fails.
17+
18+
.Prerequisites
19+
20+
* You installed {mcpg}.
21+
* You installed {prodname}.
22+
* You configured a `Gateway` object.
23+
* You configured an `HTTPRoute` object for the gateway.
24+
* You installed the {oc-first}.
25+
* You registered an MCP server.
26+
27+
.Procedure
28+
29+
. Check the status of the specific `MCPGatewayExtension` CR by running the following command:
30+
+
31+
[source,terminal,subs="+quotes"]
32+
----
33+
$ oc get mcpgatewayextension -n _<namespace>_
34+
----
35+
+
36+
Replace `_<namespace>_` with the namespace where your `MCPGatewayExtension` CR is applied.
37+
38+
. Check for conflicting `MCPGatewayExtension` CRs by running the following command:
39+
+
40+
[source,terminal,subs="+quotes"]
41+
----
42+
$ oc get mcpgatewayextension -A
43+
----
44+
+
45+
There must be only one extension per namespace.
46+
47+
. Verify that the target `Gateway` object exists by running the following command:
48+
+
49+
[source,terminal,subs="+quotes"]
50+
----
51+
$ oc get gateway -n _<gateway-namespace>_
52+
----
53+
+
54+
Replace `_<gateway-namespace>_` with the namespace where your `Gateway` object is applied.
55+
56+
. Check the `MCPServerRegistration` CR namespace and route status by running the following command:
57+
+
58+
[source,terminal,subs="+quotes"]
59+
----
60+
$ oc describe mcpsr _<mcpsr_name>_ -n _<mcpsr_namespace>_
61+
----
62+
+
63+
* Replace `_<mcpsr_name>_` with the name of your `MCPServerRegistration` CR.
64+
* Replace `_<mcpsr_namespace>_` with the namespace where your `MCPServerRegistration` CR is applied.
65+
66+
. Verify that an `MCPGatewayExtension` CR exists in the same namespace as the `MCPServerRegistration` CR. If the two do not share the same namespace, you must create a `ReferenceGrant` CR.
67+
68+
. Ensure that the `MCPGatewayExtension` CR targets the `Gateway` object that the `HTTPRoute` CR is attached to.
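
To spot the `Conflict` case quickly, you can reduce the list of extensions to the targets that appear more than once. The following is a minimal sketch, not a supported tool: the function name `find_conflicting_targets` is hypothetical, and it assumes that you feed it one extension per line in the form `<namespace> <name> <target_gateway>`.

```shell
#!/bin/sh
# Hypothetical helper: report Gateway names targeted by more than one extension.
# Input (stdin): one extension per line as "<namespace> <name> <target_gateway>".
# Output: each target gateway name that occurs on two or more lines.
find_conflicting_targets() {
  awk 'NF >= 3 { count[$3]++ }
       END { for (t in count) if (count[t] > 1) print t }' | sort
}

# Example usage (not run here). The field path .spec.targetRef.name is an
# assumption about the MCPGatewayExtension schema; confirm it with
# `oc get mcpgatewayextension -o yaml` before relying on it.
#   oc get mcpgatewayextension -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,TARGET:.spec.targetRef.name \
#     | find_conflicting_targets
```

The sketch compares gateway names only, so two `Gateway` objects with the same name in different namespaces would be reported together. Treat any output as a prompt to inspect those extensions, not as proof of a conflict.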
Lines changed: 84 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,84 @@
1+
2+
// Module included in the following assemblies:
3+
//
4+
// *mcp_gateway_config/mcp-gateway-troubleshooting.adoc
5+
6+
:_mod-docs-content-type: PROCEDURE
7+
[id="proc-mcp-gateway-ts-gateway-listener-not-working_{context}"]
8+
= Troubleshooting the gateway listener not working
9+
10+
[role="_abstract"]
11+
If your {mcpg} cannot reach an MCP endpoint at the configured hostname, the cause might be that the `Listener` custom resource (CR) you configured is not working. You can troubleshoot this situation by running the commands in this procedure.
12+
13+
Use the following checks together with the commands in the procedure to fix a non-functioning `Listener` CR:
14+
15+
* Ensure that your `Gateway` object has `Accepted` and `Programmed` conditions set to `True`.
16+
* Verify that the `hostname` in the `Listener` CR matches your DNS or hosts configuration.
17+
18+
.Prerequisites
19+
20+
* You installed {mcpg}.
21+
* You installed {prodname}.
22+
* You configured a `Gateway` object.
23+
* You configured an `HTTPRoute` object for the gateway.
24+
* You installed the {oc-first}.
25+
* You registered an MCP server.
26+
27+
.Procedure
28+
29+
. Check the general `Gateway` object configuration by running the following command:
30+
+
31+
[source,terminal]
32+
----
33+
$ oc get gateway -A
34+
----
35+
+
36+
This command returns general information about all `Gateway` objects in the cluster. If the `Gateway` object you are troubleshooting exists, the command returns the `gatewayClassName` it is using, whether it has an IP address or hostname assigned, and a `status`, such as `Ready`, `Programmed`, or `Pending`.
37+
38+
. Check the full metadata and status history for one specific `Gateway` object by running the following command:
39+
+
40+
[source,terminal,subs="+quotes"]
41+
----
42+
$ oc describe gateway _<gateway_name>_ -n _<namespace>_
43+
----
44+
+
45+
* Replace `_<gateway_name>_` with the name of the `Gateway` object.
46+
* Replace `_<namespace>_` with the namespace where the `Gateway` object is applied.
47+
* This command can help you figure out why a `Gateway` object is stuck in `Pending` by checking for port conflicts and verifying that `SSL/TLS` certificates are correctly attached to `Listener` CRs.
48+
49+
. Verify the `Listener` CR configuration by running the following command:
50+
+
51+
[source,terminal,subs="+quotes"]
52+
----
53+
$ oc get gateway _<gateway_name>_ -n _<namespace>_ -o yaml | grep -A 10 listeners
54+
----
55+
+
56+
* Replace `_<gateway_name>_` with the name of the `Gateway` object.
57+
* Replace `_<namespace>_` with the namespace where the `Gateway` object is applied.
58+
59+
. Check all of your `Listener` CR configurations at the same time by running the following command:
60+
+
61+
[source,terminal,subs="+quotes"]
62+
----
63+
$ oc get gateway _<gateway_name>_ -n _<namespace>_ -o jsonpath='{range .spec.listeners[*]}{.name}{"\t"}{.hostname}{"\t"}{.port}{"\n"}{end}'
64+
----
65+
* Replace `_<gateway_name>_` with the name of the `Gateway` object.
66+
* Replace `_<namespace>_` with the namespace where your `Gateway` object is applied.
67+
* This command prints the name, hostname, and port of each listener as tab-separated plain text, one listener per line. Because the output is not JSON, do not pipe it to `jq`.
68+
69+
. Check that the Istio gateway pod is running by using the following command:
70+
+
71+
[source,terminal,subs="+quotes"]
72+
----
73+
$ oc get pods -n _<gateway_system>_ -l istio=ingressgateway
74+
----
75+
+
76+
* Replace `_<gateway_system>_` with the namespace where your gateway pods are deployed.
77+
* This command returns the status of the Envoy proxy pods that serve the gateway.
78+
79+
. Verify that the port you are trying to use is not already in use by running the following command:
80+
+
81+
[source,terminal]
82+
----
83+
$ oc get gateway -A -o yaml | grep "port:"
84+
----
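
Scanning the `grep "port:"` output by eye gets tedious when a cluster has many `Gateway` objects. The following sketch narrows the output to port numbers that appear more than once. The function name `repeated_ports` is hypothetical, and the helper reads YAML text from standard input so that you can try it on a saved file first.

```shell
#!/bin/sh
# Hypothetical helper: list port numbers that occur on more than one "port:" line.
# Input (stdin): YAML text, for example the output of `oc get gateway -A -o yaml`.
# Output: one repeated port number per line, sorted numerically.
repeated_ports() {
  grep -E '^[[:space:]-]*port:[[:space:]]*[0-9]+' |
    sed -E 's/.*port:[[:space:]]*([0-9]+).*/\1/' |
    sort -n | uniq -d
}

# Example usage (not run here):
#   oc get gateway -A -o yaml | repeated_ports
```

A repeated port is not necessarily a conflict, because separate `Gateway` objects can each listen on the same port number. Use the result to decide which `Gateway` objects to inspect with `oc describe`.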
Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,120 @@
1+
// Module included in the following assemblies:
2+
//
3+
// *mcp_gateway_config/mcp-gateway-troubleshooting.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="proc-mcp-gateway-ts-on-prem-mcp-server_{context}"]
7+
= Troubleshooting on-premise MCP server registration issues
8+
9+
[role="_abstract"]
10+
When your on-premise MCP server is not discovered by your {mcpg} after you registered it, or you are having trouble with your tools, you can troubleshoot by checking for common problems.
11+
12+
Basic steps include making sure that your backend server is available, that routing is applied correctly, and that tool prefix headers are labeled correctly.
13+
14+
.Prerequisites
15+
16+
* You installed {mcpg}.
17+
* You installed {prodname}.
18+
* You configured a `Gateway` object.
19+
* You configured an `HTTPRoute` object for the gateway.
20+
* You installed the {oc-first}.
21+
* You registered an MCP server.
22+
23+
.Procedure
24+
25+
. If tools from your on-premise MCP server do not appear in `tools/list`, check that the {mcpg} components discovered the MCP server.
26+
27+
. Check the `MCPServerRegistration` CR status by running the following commands:
28+
+
29+
[source,terminal]
30+
----
31+
$ oc get mcpsr -A
32+
----
33+
34+
+
35+
[source,terminal,subs="+quotes"]
36+
----
37+
$ oc describe mcpserverregistration _<server_name>_ -n _<namespace>_
38+
----
39+
40+
. Verify that the `MCPServerRegistration` CR `targetRef` points to the correct `HTTPRoute` name and namespace.
41+
42+
. Check that the backend MCP server is running by entering the following command:
43+
+
44+
[source,terminal,subs="+quotes"]
45+
----
46+
$ oc get pods -n _<mcp_server_namespace>_
47+
----
48+
49+
. Verify that the backend service exists by running the following command:
50+
+
51+
[source,terminal,subs="+quotes"]
52+
----
53+
$ oc get svc -n _<namespace>_ _<service_name>_
54+
----
55+
56+
. Check that the attached `HTTPRoute` CR has a valid backend reference by running the following command:
57+
+
58+
[source,terminal,subs="+quotes"]
59+
----
60+
$ oc describe httproute _<route_name>_
61+
----
62+
63+
. If your `MCPServerRegistration` CR is discovered, but tools are missing, test the backend server directly by running the following command:
64+
+
65+
----
66+
$ oc run -it --rm debug --image=nicolaka/netshoot --restart=Never -- \
67+
curl -X POST http://<service-name>.<namespace>.svc.cluster.local:<port>/mcp \
68+
-H "mcp-session-id: SESSION_ID" \
69+
-H "Content-Type: application/json" \
70+
-d '{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'
71+
----
72+
+
73+
[NOTE]
74+
====
75+
You might need a valid `mcp-session-id` header set.
76+
====
77+
78+
. Check the broker router logs for errors by running the following command:
79+
+
80+
[source,terminal,subs="+quotes"]
81+
----
82+
$ oc logs -n mcp-system -l app.kubernetes.io/name=mcp-gateway
83+
----
84+
85+
**Solutions**:
86+
* Verify that the backend MCP server implements the `tools/list` method correctly.
87+
* Check the backend server logs for errors.
88+
* Ensure that the backend server returns valid MCP protocol responses.
89+
* Verify that the `toolPrefix` value in the `MCPServerRegistration` CR spec is valid and contains no spaces or special characters.
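
The `toolPrefix` check in the last item can be sketched as a small shell function. The exact character set that {mcpg} accepts is not stated here, so this sketch assumes that only ASCII letters, digits, and underscores are safe and that an empty prefix counts as invalid. The function name `check_tool_prefix` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "valid" or "invalid" for a candidate toolPrefix.
# Assumption: only ASCII letters, digits, and underscores are allowed,
# and an empty value is treated as invalid.
check_tool_prefix() {
  case "$1" in
    ""|*[!A-Za-z0-9_]*) echo "invalid" ;;
    *)                  echo "valid" ;;
  esac
}

# Example usage (not run here):
#   check_tool_prefix "weather_"
```

If the product accepts a wider character set, adjust the bracket expression in the pattern accordingly.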
90+
91+
== Tool prefix not applied
92+
93+
**Symptom**: Tools appear without the configured prefix.
94+
95+
. Check the `MCPServerRegistration` CR configuration by running the following command:
96+
+
97+
[source,terminal,subs="+quotes"]
98+
----
99+
$ oc get mcpsr _<server_name>_ -n _<namespace>_ -o yaml | grep toolPrefix
100+
----
101+
102+
**Solutions**:
103+
* Ensure that `toolPrefix` is set in the `MCPServerRegistration` CR spec.
104+
105+
* Verify that the `toolPrefix` field name has no typos.
106+
107+
. Check the controller logs for prefix-related messages by running the following command:
108+
+
109+
[source,terminal,subs="+quotes"]
110+
----
111+
$ oc logs -n mcp-system deployment/mcp-gateway-controller | grep prefix
112+
----
113+
114+
. Restart the MCP gateway broker component after `MCPServerRegistration` CR changes by running the following command:
115+
+
116+
[source,terminal,subs="+quotes"]
117+
----
118+
$ oc rollout restart deployment/mcp-gateway -n mcp-system
119+
----
120+
Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
// Module included in the following assemblies:
2+
//
3+
// *mcp_gateway_config/mcp-gateway-troubleshooting.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="proc-mcp-gateway-ts-pods-not-starting_{context}"]
7+
= Troubleshooting {mcpg} pods not starting
8+
9+
[role="_abstract"]
10+
After installation, if your {mcpg} pods are stuck in one of several states that indicate that they are not starting as expected, you can take several steps to diagnose the problem.
11+
12+
Common causes include the following states, each of which indicates an associated action:
13+
14+
* `ImagePullBackOff`: Check image repository access and credentials.
15+
* `CrashLoopBackOff`: Check the logs for application errors.
16+
* `Pending`: Check resource availability and node capacity.
17+
* `Init Container Failure`: Check RBAC permissions.
18+
19+
.Prerequisites
20+
21+
* You installed {mcpg}.
22+
* You installed {prodname}.
23+
* You configured a `Gateway` object.
24+
* You configured an `HTTPRoute` object for the gateway.
25+
* You installed the {oc-first}.
26+
* You registered an MCP server.
27+
28+
.Procedure
29+
30+
. Check the pod status by running the following command:
31+
+
32+
[source,terminal,subs="+quotes"]
33+
----
34+
$ oc get pods -n _<mcp_system>_
35+
----
36+
+
37+
Replace `_<mcp_system>_` with the namespace where the {mcpg} deployment that you are checking runs.
38+
39+
. Describe problem pods by running the following command:
40+
+
41+
[source,terminal,subs="+quotes"]
42+
----
43+
$ oc describe pod -n _<mcp_system>_ _<pod_name>_
44+
----
45+
+
46+
* Replace `_<mcp_system>_` with the namespace where the {mcpg} deployment that you are checking runs.
47+
* Replace `_<pod_name>_` with the name of the pod that you are checking.
48+
49+
. Check the pod logs by running the following command:
50+
+
51+
[source,terminal,subs="+quotes"]
52+
----
53+
$ oc logs -n _<mcp_system>_ _<pod_name>_
54+
----
55+
+
56+
* Replace `_<mcp_system>_` with the namespace where the {mcpg} deployment that you are checking runs.
57+
* Replace `_<pod_name>_` with the name of the pod that you are checking.
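
When a namespace contains many pods, it can help to filter the pod list down to the ones that need attention before you describe them or read their logs. The following sketch assumes the default `oc get pods` table layout, where the third column is `STATUS`. The function name `pods_needing_attention` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list pods whose STATUS is neither Running nor Completed.
# Input (stdin): the default table printed by `oc get pods -n <namespace>`,
# with the columns NAME, READY, STATUS, RESTARTS, and AGE.
# Output: "<pod_name> <status>" for each pod that needs attention.
pods_needing_attention() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Example usage (not run here):
#   oc get pods -n <mcp_system> | pods_needing_attention
```

Each reported status, such as `ImagePullBackOff` or `CrashLoopBackOff`, maps to one of the causes listed at the start of this module.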
