OSDOCS-18945: adds troubleshooting MCP gateway #110076
base: mcp-gateway-docs-tp
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,29 @@ | ||||||
| // Module included in the following assemblies: | ||||||
| // | ||||||
| // *mcp_gateway_config/mcp-gateway-troubleshooting.adoc | ||||||
|
|
||||||
| :_mod-docs-content-type: CONCEPT | ||||||
| [id="con-mcp-gateway-ts-gateway-routing_{context}"] | ||||||
| = Gateway and routing troubleshooting | ||||||
|
|
||||||
| [role="_abstract"] | ||||||
| When traffic is not flowing after you install {mcpg}, you can investigate each component of the gateway routing path to check system health. Depending on the errors that you receive, you can troubleshoot at several layers. Breaks can occur at the gateway, route, or policy level. | ||||||
|
|
||||||
| If you have a `Connection Refused/Timeout` error and your client cannot reach the IP address, the cause might be the | ||||||
| listener. In this case, one of the following situations likely applies: | ||||||
|
|
||||||
| * The port is not open. | ||||||
| * The load balancer has not assigned an IP address. | ||||||
| * The TLS handshake is failing. | ||||||
|
|
||||||
| When you have this type of error, check the listener first. | ||||||
|
|
||||||
| If you can connect to the `Gateway` object, but you get an `HTTP` error, such as `404`, the cause can be a problem with the `HTTPRoute` custom resource (CR). The route exists, but the `Gateway` object has rejected it or the connection has failed. When you get these types of codes, check the `HTTPRoute` CR first. | ||||||
|
|
||||||
| If requests either fail with a `503` error or bypass the router, this means that the route is recognized, but the connection to the backend failed or was not authorized properly. In this case, start with API-level checks and narrow your investigation to Envoy filters as needed. | ||||||
|
|
||||||
| If the `EnvoyFilter` is not present, it usually means one of the following situations has occurred: | ||||||
|
|
||||||
| * The `Gateway` CR status is not `Programmed`. | ||||||
| * There is a `labels` mismatch, and the `EnvoyFilter` is not injected into the pods. | ||||||
|
|
||||||
| * The MCP controller is crashing or stuck. | ||||||
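The symptom-to-layer mapping described in this module can be sketched as a small shell helper. This is only an illustrative sketch: the function name `first_check` and the symptom keywords are hypothetical and are not part of {mcpg} or the `oc` CLI.

```shell
# Hypothetical helper: maps a client-side symptom to the layer to check first,
# following the mapping described in this module.
first_check() {
  case "$1" in
    refused|timeout) echo "listener" ;;
    404)             echo "HTTPRoute" ;;
    503|bypass)      echo "API-level checks, then Envoy filters" ;;
    *)               echo "unknown symptom" ;;
  esac
}
```

For example, `first_check 404` prints `HTTPRoute`, which matches the guidance to check the `HTTPRoute` CR first for that class of error.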
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,81 @@ | ||
|
|
||
| // Module included in the following assemblies: | ||
| // | ||
| // *mcp_gateway_config/mcp-gateway-troubleshooting.adoc | ||
|
|
||
| :_mod-docs-content-type: PROCEDURE | ||
| [id="proc-mcp-gateway-ts-gateway-listener-not-working_{context}"] | ||
| = Troubleshooting the gateway listener not working | ||
|
ShaunaDiaz marked this conversation as resolved.
|
||
|
|
||
| [role="_abstract"] | ||
| If your {mcpg} cannot reach an MCP endpoint at the configured hostname, the cause might be that the `Listener` custom resource (CR) you configured is not working. You can troubleshoot this situation by running a few diagnostic commands. | ||
|
ShaunaDiaz marked this conversation as resolved.
|
||
|
|
||
| Use the following concepts in conjunction with the commands that follow to solve a non-functioning `Listener` CR: | ||
|
ShaunaDiaz marked this conversation as resolved.
|
||
|
|
||
| * Ensure that your `Gateway` object has `Accepted` and `Programmed` conditions set to `True`. | ||
| * Verify that the `hostname` in the `Listener` CR matches your DNS or hosts configuration. | ||
|
ShaunaDiaz marked this conversation as resolved.
|
||
|
|
||
| .Prerequisites | ||
|
|
||
| * You installed {mcpg}. | ||
| * You installed the {oc-first}. | ||
| * You configured a `Gateway` object. | ||
| * You configured an `HTTPRoute` object for the gateway. | ||
|
|
||
| .Procedure | ||
|
|
||
| . Check the general `Gateway` object configuration by running the following command: | ||
| + | ||
| [source,terminal] | ||
| ---- | ||
| $ oc get gateway -A | ||
| ---- | ||
| + | ||
| This command returns general information about all `Gateway` objects in the cluster. If the `Gateway` object you are troubleshooting exists, the command returns the `gatewayClassName` it is using, whether it has an IP address or hostname assigned, and a `status`, such as `Ready`, `Programmed`, or `Pending`. | ||
|
ShaunaDiaz marked this conversation as resolved.
|
||
|
|
||
| . Check the full metadata and status history for one specific `Gateway` object by running the following command: | ||
| + | ||
| [source,terminal,subs="+quotes"] | ||
| ---- | ||
| $ oc describe gateway _<gateway_system>_ -n _<namespace>_ | ||
| ---- | ||
| + | ||
| * Replace `_<gateway_system>_` with the name of the `Gateway` object. | ||
| * Replace `_<namespace>_` with the namespace where the `Gateway` object is applied. | ||
| * This command can help you figure out why a `Gateway` object is stuck in `Pending` by checking for port conflicts and verifying that `SSL/TLS` certificates are correctly attached to `Listener` CRs. | ||
|
|
||
| . Verify the `Listener` CR configuration by running the following command: | ||
| + | ||
| [source,terminal,subs="+quotes"] | ||
| ---- | ||
| $ oc get gateway _<gateway_system>_ -n _<namespace>_ -o yaml | grep -A 10 listeners | ||
| ---- | ||
| + | ||
| * Replace `_<gateway_system>_` with the name of the `Gateway` object. | ||
| * Replace `_<namespace>_` with the namespace where the `Gateway` object is applied. | ||
|
|
||
| . Check all of your `Listener` CR configurations at the same time by running the following command: | ||
| + | ||
| [source,terminal,subs="+quotes"] | ||
| ---- | ||
| $ oc get gateway _<gateway_system>_ -n _<namespace>_ -o jsonpath='{range .spec.listeners[*]}{.name}{"\t"}{.hostname}{"\t"}{.port}{"\n"}{end}' | ||
| ---- | ||
| + | ||
| * Replace `_<gateway_system>_` with the name of the `Gateway` object. | ||
| * Replace `_<namespace>_` with the namespace where your `Gateway` object is applied. | ||
|
|
||
| . Check that the Istio gateway pod is running by using the following command: | ||
| + | ||
| [source,terminal,subs="+quotes"] | ||
| ---- | ||
| $ oc get pods -n _<gateway_system>_ -l gateway.istio.io/managed=istio.io-gateway-controller | ||
| ---- | ||
| + | ||
| * Replace `_<gateway_system>_` with the namespace where the `Gateway` object is deployed. | ||
| * This command lists the Istio gateway pods, which run the Envoy proxy, and shows whether each pod is running. | ||
|
|
||
| . Verify that the port you are trying to use is not already in use by running the following command: | ||
| + | ||
| [source,terminal] | ||
| ---- | ||
| $ oc get gateway -A -o yaml | grep "port:" | ||
| ---- | ||
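The output of the port check in the last step can be long on a busy cluster. As a rough sketch, the following shell function reads that output on standard input and prints only the port numbers that appear more than once. The function name `duplicate_ports` is hypothetical. A repeated port is only a candidate for a conflict, because the same port on unrelated `Gateway` objects can be valid.

```shell
# Hypothetical helper: reads lines such as "    port: 8080" on stdin and
# prints each port number that appears more than once.
duplicate_ports() {
  grep -Eo 'port:[[:space:]]*[0-9]+' | grep -Eo '[0-9]+' | sort -n | uniq -d
}

# Usage against a live cluster (requires cluster access):
#   oc get gateway -A -o yaml | duplicate_ports
```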
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| // Module included in the following assemblies: | ||
| // | ||
| // *mcp_gateway_config/mcp-gateway-troubleshooting.adoc | ||
|
|
||
| :_mod-docs-content-type: PROCEDURE | ||
| [id="proc-mcp-gateway-ts-pods-not-starting_{context}"] | ||
| = {mcpg} pods not starting | ||
|
|
||
| [role="_abstract"] | ||
| After installation, if your {mcpg} pods are stuck in one of several states that indicate that they are not starting as expected, you can take several steps to diagnose the problem. | ||
|
|
||
| The following pod states are common, and each one points to an associated action: | ||
|
|
||
| * `ImagePullBackOff`: Check image repository access and credentials. | ||
| * `CrashLoopBackOff`: Check the logs for application errors. | ||
| * `Pending`: Check resource availability and node capacity. | ||
| * `Init Container Failure`: Check RBAC permissions. | ||
|
|
||
| .Prerequisites | ||
|
|
||
| * You installed {mcpg}. | ||
| * You installed the {oc-first}. | ||
| * You configured a `Gateway` object. | ||
| * You configured an `HTTPRoute` object for the gateway. | ||
|
|
||
| .Procedure | ||
|
|
||
| . Check the pod status by running the following command: | ||
| + | ||
| [source,terminal,subs="+quotes"] | ||
| ---- | ||
| $ oc get pods -n _<mcp_system>_ | ||
| ---- | ||
| + | ||
| Replace `_<mcp_system>_` with the namespace where the {mcpg} deployment that you are checking is installed. | ||
|
|
||
| . Describe problem pods by running the following command: | ||
| + | ||
| [source,terminal,subs="+quotes"] | ||
| ---- | ||
| $ oc describe pod -n _<mcp_system>_ _<pod_name>_ | ||
| ---- | ||
| + | ||
| * Replace `_<mcp_system>_` with the namespace where the {mcpg} deployment that you are checking is installed. | ||
| * Replace `_<pod_name>_` with the name of the pod that you are checking. | ||
|
|
||
| . Check the pod logs by running the following command: | ||
| + | ||
| [source,terminal,subs="+quotes"] | ||
| ---- | ||
| $ oc logs -n _<mcp_system>_ _<pod_name>_ | ||
| ---- | ||
| + | ||
| * Replace `_<mcp_system>_` with the namespace where the {mcpg} deployment that you are checking is installed. | ||
| * Replace `_<pod_name>_` with the name of the pod that you are checking. |
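The state-to-action list in this module could be combined with the pod status check in a short shell sketch. The function names `pod_state_hint` and `problem_pods` are hypothetical, and the sketch assumes the default `oc get pods` column layout, where `STATUS` is the third column and init container failures are reported with an `Init:` prefix.

```shell
# Hypothetical helper: maps a pod STATUS value to the action listed in this module.
pod_state_hint() {
  case "$1" in
    ImagePullBackOff) echo "check image repository access and credentials" ;;
    CrashLoopBackOff) echo "check the logs for application errors" ;;
    Pending)          echo "check resource availability and node capacity" ;;
    Init:*)           echo "check RBAC permissions" ;;
    *)                echo "no hint" ;;
  esac
}

# Hypothetical helper: reads `oc get pods` output on stdin and prints
# name, status, and hint for every pod that is not Running or Completed.
problem_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }' |
  while read -r name status; do
    printf '%s\t%s\t%s\n' "$name" "$status" "$(pod_state_hint "$status")"
  done
}

# Usage against a live cluster (requires cluster access):
#   oc get pods -n <mcp_system> | problem_pods
```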
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,145 @@ | ||||||
| // Module included in the following assemblies: | ||||||
| // | ||||||
| // *mcp_gateway_config/mcp-gateway-troubleshooting.adoc | ||||||
|
|
||||||
| :_mod-docs-content-type: PROCEDURE | ||||||
| [id="proc-mcp-gateway-ts-requests-fail-or-bypass-router_{context}"] | ||||||
| = Troubleshooting requests failing or bypassing the router | ||||||
|
|
||||||
| [role="_abstract"] | ||||||
| When you are certain that an `MCPGatewayExtension` custom resource (CR) exists for your MCP server, but requests either fail or bypass the router, it might mean that the `EnvoyFilter` CR is not applied properly. You can take several steps to troubleshoot the problem. | ||||||
|
|
||||||
| The `EnvoyFilter` CR is automatically created in the `Gateway` CR's namespace by the MCP gateway controller component when an `MCPGatewayExtension` CR is `Ready`. | ||||||
|
Contributor
Is there an attribute for "MCP gateway controller"? |
||||||
|
|
||||||
| [IMPORTANT] | ||||||
| ==== | ||||||
| You can look closely at the `EnvoyFilter` during deep troubleshooting, but do not manually edit or delete the CR. | ||||||
|
|
||||||
| ==== | ||||||
|
|
||||||
| .Prerequisites | ||||||
|
|
||||||
| * You installed {mcpg}. | ||||||
| * You installed the {oc-first}. | ||||||
| * You configured a `Gateway` object. | ||||||
| * You configured an `HTTPRoute` object for the gateway. | ||||||
| * You registered an MCP server. | ||||||
|
|
||||||
| .Procedure | ||||||
|
|
||||||
| . Ensure that the `Gateway` object exists and is in the expected namespace by running the following command: | ||||||
|
Contributor
Looks like you typically use "by running the following command". |
||||||
| + | ||||||
| [source,terminal] | ||||||
| ---- | ||||||
| $ oc get gateway -A | ||||||
| ---- | ||||||
| + | ||||||
| This command returns general information about all `Gateway` objects in the cluster. If the `Gateway` object you are troubleshooting exists, the command returns the `gatewayClassName` it is using, whether it has an IP address or hostname assigned, and a `status`, such as `Ready`, `Programmed`, or `Pending`. | ||||||
|
|
||||||
|
|
||||||
| . Verify that the `MCPGatewayExtension` CR is `Ready` by running the following command: | ||||||
| + | ||||||
| [source,terminal] | ||||||
| ---- | ||||||
| $ oc get mcpgatewayextension -A | ||||||
| ---- | ||||||
|
|
||||||
| . Verify that a `ReferenceGrant` CR exists if the `MCPGatewayExtension` CR is in a different namespace than the `Gateway` object by running the following command: | ||||||
| + | ||||||
| [source,terminal,subs="+quotes"] | ||||||
| ---- | ||||||
| $ oc get referencegrant _<referencegrant_name>_ -n _<gateway_system>_ -o yaml | ||||||
| ---- | ||||||
| + | ||||||
| * Replace `_<referencegrant_name>_` with the name of the `ReferenceGrant` CR. | ||||||
| * Replace `_<gateway_system>_` with the namespace where the `Gateway` object is applied. | ||||||
|
|
||||||
| . Check the `HTTPRoute` CR by running the following command: | ||||||
| + | ||||||
| [source,terminal,subs="+quotes"] | ||||||
| ---- | ||||||
| $ oc describe httproute _<httproute_name>_ -n _<httproute_namespace>_ | ||||||
| ---- | ||||||
| + | ||||||
| * Replace `_<httproute_namespace>_` with the namespace where the `HTTPRoute` CR is applied. | ||||||
| * Replace `_<httproute_name>_` with the name of the `HTTPRoute` CR. | ||||||
| * If the `HTTPRoute` CR is not `Accepted` by the `Gateway` object, the route is not programmed into Envoy, causing a `404`. | ||||||
| * The conditions `Status.Parents.Conditions: Accepted: True` and `Programmed: True` show that the route is correct. | ||||||
|
|
||||||
| . If any of the CRs you just checked are not `Ready`, check the controller logs for `EnvoyFilter` creation errors by running the following command: | ||||||
| + | ||||||
| [source,terminal,subs="+quotes"] | ||||||
| ---- | ||||||
| $ oc logs -n _<mcp_system>_ deployment/mcp-gateway-controller | ||||||
| ---- | ||||||
| + | ||||||
| * Replace `_<mcp_system>_` with the namespace where the MCP gateway controller is deployed. | ||||||
| * This step verifies that the MCP controller is successfully generating the underlying Istio configurations. | ||||||
|
|
||||||
| . Check that the `EnvoyFilter` exists in the `Gateway` namespace by running the following command: | ||||||
| + | ||||||
| [source,terminal,subs="+quotes"] | ||||||
| ---- | ||||||
| $ oc get envoyfilter -n _<gateway_namespace>_ -l app.kubernetes.io/managed-by=mcp-gateway-controller | ||||||
| ---- | ||||||
| + | ||||||
| * Replace `_<gateway_namespace>_` with the namespace where the `Gateway` object is applied. | ||||||
|
|
||||||
| . Verify the `EnvoyFilter` configuration by running the following command: | ||||||
| + | ||||||
| [source,terminal,subs="+quotes"] | ||||||
| ---- | ||||||
| $ oc describe envoyfilter -n _<gateway_namespace>_ -l app.kubernetes.io/managed-by=mcp-gateway-controller | ||||||
| ---- | ||||||
| + | ||||||
| * Replace `_<gateway_namespace>_` with the namespace where the `Gateway` object is applied. | ||||||
| * The `workloadSelector` labels must match your `Gateway` pods, or your policies are bypassed. | ||||||
|
|
||||||
| . Compare the `EnvoyFilter` labels against your pod labels by running the following command: | ||||||
| + | ||||||
| [source,terminal,subs="+quotes"] | ||||||
| ---- | ||||||
| $ oc get pods -n _<gateway_namespace>_ --show-labels | ||||||
| ---- | ||||||
| + | ||||||
| * Replace `_<gateway_namespace>_` with the namespace where the `Gateway` object is applied. | ||||||
|
|
||||||
| . Identify the port that the `Gateway` object is configured to use by running the following command: | ||||||
| + | ||||||
| [source,terminal,subs="+quotes"] | ||||||
| ---- | ||||||
| $ oc get gateway _<gateway_name>_ -n _<gateway_namespace>_ -o jsonpath='{range .spec.listeners[*]}{.name}{": "}{.port}{"\n"}{end}' | ||||||
|
|
||||||
| ---- | ||||||
| + | ||||||
| * Replace `_<gateway_name>_` with the name of the `Gateway` object. | ||||||
| * Replace `_<gateway_namespace>_` with the namespace where the `Gateway` object is applied. | ||||||
|
|
||||||
| . Verify the `EnvoyFilter` chain binding by running the following command: | ||||||
| + | ||||||
| [source,terminal,subs="+quotes"] | ||||||
| ---- | ||||||
| $ oc exec -n _<gateway_namespace>_ deploy/_<gateway_name>_-istio -- curl -s localhost:15000/config_dump | jq '.configs[] | select(.["@type"] | contains("ListenersConfigDump")) | .dynamic_listeners[] | select(.name | contains("_<gateway_port>_"))' | ||||||
| ---- | ||||||
| + | ||||||
| * Replace `_<gateway_name>_` with the name of the `Gateway` object. | ||||||
| * Replace `_<gateway_namespace>_` with the namespace where the `Gateway` object is applied. | ||||||
| * Replace `_<gateway_port>_` with the port your Gateway object is configured to use. | ||||||
| * Check for `envoy.filters.http.ext_proc`. | ||||||
|
|
||||||
| . Check the Istio gateway pod configuration by running the following command: | ||||||
| + | ||||||
| [source,terminal,subs="+quotes"] | ||||||
| ---- | ||||||
| $ oc exec -n _<gateway_namespace>_ deploy/_<gateway_name>_-istio -- curl localhost:15000/config_dump | grep ext_proc | ||||||
| ---- | ||||||
| + | ||||||
| * Replace `_<gateway_namespace>_` with the namespace where the `Gateway` object is applied. | ||||||
| * Replace `_<gateway_name>_` with the name of the `Gateway` object. | ||||||
| * If this command returns empty, the `EnvoyFilter` CR is not active on this pod. Traffic is bypassing your policies. | ||||||
|
|
||||||
| . Restart the Istio gateway to force a configuration reload by running the following command: | ||||||
| + | ||||||
| [source,terminal,subs="+quotes"] | ||||||
| ---- | ||||||
| $ oc rollout restart deployment/_<gateway_name>_-istio -n _<gateway_namespace>_ | ||||||
| ---- | ||||||
| + | ||||||
| * Replace `_<gateway_namespace>_` with the namespace where the `Gateway` object is applied. | ||||||
| * Replace `_<gateway_name>_` with the name of the `Gateway` object. | ||||||
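The label comparison in this procedure is a manual step. As a minimal sketch, the following shell function checks whether every `key=value` pair in a selector also appears in a pod's label list, using the comma-separated form that `oc get pods --show-labels` prints. The function name `labels_match` and the example labels are hypothetical, and the selector string must be assembled by hand from the `workloadSelector` labels of the `EnvoyFilter`.

```shell
# Hypothetical helper: returns 0 if every comma-separated key=value pair in
# the selector ($1) is present in the pod label list ($2), and 1 otherwise.
labels_match() {
  selector=$1
  pod_labels=",$2,"
  old_ifs=$IFS
  IFS=,
  for pair in $selector; do
    case "$pod_labels" in
      *",$pair,"*) ;;                       # this pair is present on the pod
      *) IFS=$old_ifs; return 1 ;;          # missing pair: selector does not match
    esac
  done
  IFS=$old_ifs
  return 0
}

# Example with made-up labels:
#   labels_match 'app=my-gateway' 'app=my-gateway,pod-template-hash=abc123' && echo match
```

If the function returns a non-zero status for the gateway pods, that is consistent with the `labels` mismatch case, where the `EnvoyFilter` is not applied to the pods.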
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| // Module included in the following assemblies: | ||
| // | ||
| // *mcp_gateway_config/mcp-gateway-troubleshooting.adoc | ||
|
|
||
| :_mod-docs-content-type: PROCEDURE | ||
| [id="proc-mcp-gateway-ts-traffic-not-reaching-backend-server_{context}"] | ||
| = Troubleshooting traffic not reaching the backend MCP server | ||
|
|
||
| [role="_abstract"] | ||
| When you are certain that an `HTTPRoute` custom resource (CR) exists for your application, but traffic is not reaching your backend MCP servers, you can take several steps to troubleshoot the problem. | ||
|
|
||
| On the client side, errors such as `401`, `403`, and `404` can indicate this situation. | ||
|
|
||
| .Prerequisites | ||
|
|
||
| * You installed {mcpg}. | ||
| * You installed the {oc-first}. | ||
| * You configured a `Gateway` object. | ||
| * You configured an `HTTPRoute` object for the gateway. | ||
| * You registered an MCP server. | ||
|
|
||
| .Procedure | ||
|
|
||
| . Check the `HTTPRoute` general custom resource (CR) status by running the following command: | ||
| + | ||
| [source,terminal] | ||
| ---- | ||
| $ oc get httproute -A | ||
| ---- | ||
| + | ||
| * This command returns general information about all `HTTPRoute` objects in the cluster. | ||
| * Check for the `Accepted` condition in the `HTTPRoute` CR `status` fields. | ||
|
Contributor
Should line 32 be a separate step? |
||
|
|
||
| . Check the full metadata and status history for one specific `HTTPRoute` object by running the following command: | ||
| + | ||
| [source,terminal,subs="+quotes"] | ||
| ---- | ||
| $ oc describe httproute _<route_name>_ -n _<namespace>_ | ||
| ---- | ||
| + | ||
| * Replace `_<route_name>_` with the name of the `HTTPRoute` object. | ||
| * Replace `_<namespace>_` with the namespace where the `HTTPRoute` object is applied. | ||
| * Verify that the `hostnames` value in the `HTTPRoute` CR matches the gateway `Listener` CR `hostname`. | ||
|
Contributor
Should lines 43 - 45 be steps in the procedure instead of part of the list below step 2? |
||
| * If the `HTTPRoute` status shows `Accepted: False`, then the `Gateway` object is not using the route. | ||
| * If the condition is `ResolvedRefs: False`, the route is accepted by the `Gateway` object, but it cannot find the backend MCP service. There might be either a mismatch in the CR `metadata.name` field, or the MCP service is in a namespace the `Gateway` object cannot access. | ||
|
|
||
| . Verify the parent reference by running the following command: | ||
| + | ||
| [source,terminal,subs="+quotes"] | ||
| ---- | ||
| $ oc get httproute _<route_name>_ -n _<namespace>_ -o yaml | grep -A 5 parentRefs | ||
| ---- | ||
| + | ||
| * Replace `_<route_name>_` with the name of the `HTTPRoute` object. | ||
| * Replace `_<namespace>_` with the namespace where the `HTTPRoute` object is applied. | ||
| * Ensure that the retrieved `parentRefs` value matches your `Gateway` CR name and namespace exactly. | ||
|
Contributor
Should line 56 be a separate step? |
||
|
|
||
| . Check that the `allowedRoutes.namespaces` value in the `Gateway` CR allows the `HTTPRoute` namespace by running the following command: | ||
| + | ||
| [source,terminal,subs="+quotes"] | ||
| ---- | ||
| $ oc get gateway _<gateway_name>_ -n _<gateway_namespace>_ -o jsonpath='{range .spec.listeners[*]}{.name}{": "}{.allowedRoutes.namespaces.from}{"\n"}{end}' | ||
| ---- | ||
| + | ||
| * Replace `_<gateway_name>_` with the name of the `Gateway` object. | ||
| * Replace `_<gateway_namespace>_` with the namespace where the `Gateway` object is applied. | ||