content/en/network_monitoring/cloud_network_monitoring/guide/detecting_a_network_outage.md
Lines changed: 2 additions & 2 deletions
Original file line number
Diff line number
Diff line change
@@ -20,13 +20,13 @@ Use CNM metrics to see whether your source endpoint may be sending an enormous a
20
20
21
21
## CPU overconsumption of the underlying infrastructure
22
22
23
-
On the other hand, resource overconsumption of either the client or server endpoint could be the culprit of poor communication between the two. In the side panel **Processes** tab, scope your view to processes running on either the source or destination endpoints to spot any heavy software that may be degrading the performance of their underlying hosts or containers, thus reducing their ability to respond to network calls. In this case, in addition to knowing _whether_ an underlying host is running hot and causing application latency, you will want to know _why_ it is running hot. Grouping your process metrics by command gives you this granularity, since you can identify the particular workload that is consuming your CPU and memory resources.
23
+
On the other hand, resource overconsumption of either the client or server endpoint could be the culprit of poor communication between the two. In the side panel {{< ui >}}Processes{{< /ui >}} tab, scope your view to processes running on either the source or destination endpoints to spot any heavy software that may be degrading the performance of their underlying hosts or containers, thus reducing their ability to respond to network calls. In this case, in addition to knowing _whether_ an underlying host is running hot and causing application latency, you will want to know _why_ it is running hot. Grouping your process metrics by command gives you this granularity, since you can identify the particular workload that is consuming your CPU and memory resources.
24
24
25
25
{{< img src="network_performance_monitoring/guide/detecting_a_network_outage/cnm_processes_tab.png" alt="CPU overconsumption of the underlying infrastructure">}}
26
26
27
27
## Application errors in code
28
28
29
-
Network errors and latency can also be caused by client-side application errors. For instance, if your application is generating connections on loop unnecessarily, it could be overwhelming the endpoints that rely on it, leading to downstream application and network issues. To determine whether this is the case, look for application request errors in the **Traces**tab of a specific service in [CNM > DNS][1], or the **Network** tab of a specific trace in APM Traces.
29
+
Network errors and latency can also be caused by client-side application errors. For instance, if your application is generating connections on loop unnecessarily, it could be overwhelming the endpoints that rely on it, leading to downstream application and network issues. To determine whether this is the case, look for application request errors in the {{< ui >}}Traces{{< /ui >}} tab of a specific service in [CNM > DNS][1], or the {{< ui >}}Network{{< /ui >}} tab of a specific trace in APM Traces.
30
30
31
31
{{< img src="network_performance_monitoring/guide/detecting_a_network_outage/traces_2.png" alt="Application errors in code">}}
content/en/network_monitoring/cloud_network_monitoring/guide/detecting_application_availability.md
Lines changed: 10 additions & 10 deletions
@@ -23,7 +23,7 @@ CNM is designed to track traffic between entities, determine which resources are
23
23
24
24
To examine the a basic traffic flow between entities, take the following steps:
25
25
26
-
1. On the [Network Analytics page][1], set your **View clients as**and **View servers as** dropdown filters to group by `service` tags to examine a service-to-service flow. Here you can observe the basic traffic unit: a source IP communicating over a port to a destination IP on a port.
26
+
1. On the [Network Analytics page][1], set your {{< ui >}}View clients as{{< /ui >}} and {{< ui >}}View servers as{{< /ui >}} dropdown filters to group by `service` tags to examine a service-to-service flow. Here you can observe the basic traffic unit: a source IP communicating over a port to a destination IP on a port.
27
27
28
28
{{< img src="network_performance_monitoring/guide/detecting_network_insights/cnm_service_service.png" alt="CNM analytics page, grouping by service to service with Client and Server IP highlighted">}}
29
29
@@ -51,37 +51,37 @@ To analyze the cause of service latency, take the following steps:
51
51
52
52
3. Click one of the traffic paths on this page to open the side panel. The side panel provides more detailed telemetry to help you further debug your network dependencies.
53
53
54
-
4. While on the side panel view, check the **Flows** tab to determine if the communication protocol is TCP or UDP, and review metrics like RTT, Jitter, and packets sent and received. If you're investigating a high retransmit count, this information can help you identify the cause.
54
+
4. While on the side panel view, check the {{< ui >}}Flows{{< /ui >}} tab to determine if the communication protocol is TCP or UDP, and review metrics like RTT, Jitter, and packets sent and received. If you're investigating a high retransmit count, this information can help you identify the cause.
55
55
56
56
{{< img src="network_performance_monitoring/guide/detecting_network_insights/cnm_sidepanel_flows.png" alt="Side panel of a traffic flow, highlighting the Flows tab">}}
57
57
58
58
## Insight into network traffic
59
59
60
60
Datadog CNM consolidates relevant distributed traces, logs, and infrastructure data into a single view, allowing you to identify and trace issues back to the originating request from an application.
61
61
62
-
In the example below, check the **Traces** tab under Network Analytics to view distributed traces of requests between source and destination endpoints, which can help you pinpoint where application-level errors occur.
62
+
In the example below, check the {{< ui >}}Traces{{< /ui >}} tab under Network Analytics to view distributed traces of requests between source and destination endpoints, which can help you pinpoint where application-level errors occur.
63
63
64
64
To identify if an issue is an application or network issue, take can use the following steps:
65
65
66
-
1. Navigate to [**Infrastructure** > **Cloud Network** > **Analytics**][1].
67
-
2. In the **Summary** graphs, click a line of communication that has a lot of volume and high RTT time:
67
+
2. In the {{< ui >}}Summary{{< /ui >}} graphs, click a line of communication that has a lot of volume and high RTT time:
68
68
69
69
{{< img src="network_performance_monitoring/guide/detecting_network_insights/cnm_isolate_series.png" alt="CNM analytics page, clicking on a path with high RTT Time">}}
70
70
71
-
3. Click **Isolate this series**. This opens a page that allows you to observe the network traffic only on this line of communication.
72
-
4. While on this page, click into one of the network communications paths, then click the **Flows** tab to observe RTT time:
71
+
3. Click {{< ui >}}Isolate this series{{< /ui >}}. This opens a page that allows you to observe the network traffic only on this line of communication.
72
+
4. While on this page, click into one of the network communications paths, then click the {{< ui >}}Flows{{< /ui >}} tab to observe RTT time:
73
73
74
74
{{< img src="network_performance_monitoring/guide/detecting_network_insights/cnm_sidepanel_rtt.png" alt="CNM sidepanel, highlighting the RTT time column">}}
75
75
76
76
On this page, CNM correlates network metric round-trip time (RTT) with application request latency, to help identify if the issue is a network or application issue. In this particular example, observe that the RTT time is slightly high but has come down over time and needs to be investigated further.
77
77
78
-
5. On this same page, click the **Traces**tab and investigate the **Duration** column:
78
+
5. On this same page, click the {{< ui >}}Traces{{< /ui >}} tab and investigate the {{< ui >}}Duration{{< /ui >}} column:
79
79
80
80
{{< img src="network_performance_monitoring/guide/detecting_network_insights/cnm_traces_duration.png" alt="CNM sidepanel, highlighting the Traces tab and duration column">}}
81
81
82
82
Observe that although network latency (RTT) is high, the application request latency (Duration) is normal, so in this case, the issue is likely network-related, and there's no need to investigate the app code.
83
83
84
-
Conversely, *if network latency is stable but application latency (Duration) is high*, the problem likely stems from the app, and you can explore code-level traces by clicking on one of the service paths in the **Traces** tab to find the root cause, which takes you to the APM flame graph relative to this service:
84
+
Conversely, *if network latency is stable but application latency (Duration) is high*, the problem likely stems from the app, and you can explore code-level traces by clicking on one of the service paths in the {{< ui >}}Traces{{< /ui >}} tab to find the root cause, which takes you to the APM flame graph relative to this service:
85
85
86
86
{{< img src="network_performance_monitoring/guide/detecting_network_insights/cnm_apm_traces.png" alt="APM flame graph screenshot after clicking on a service from the CNM sidepanel traces tab">}}
87
87
@@ -93,7 +93,7 @@ For complex networks in large containerized environments, Datadog's Network Map
93
93
94
94
To identify if there might be a communication problem with your Kubernetes pods and their underlying services, perform the following steps:
95
95
96
-
1. On the [Network Map][2], set the **View**dropdown to `pod_name`, the **By**dropdown to "Client Availability Zone", and set the **Metric**dropdown to "Volume Sent" (this is the [metric][6] you want your edges to represent):
96
+
1. On the [Network Map][2], set the {{< ui >}}View{{< /ui >}} dropdown to `pod_name`, the {{< ui >}}By{{< /ui >}} dropdown to {{< ui >}}Client Availability Zone{{< /ui >}}, and set the {{< ui >}}Metric{{< /ui >}} dropdown to {{< ui >}}Volume Sent{{< /ui >}} (this is the [metric][6] you want your edges to represent):
content/en/network_monitoring/cloud_network_monitoring/guide/manage_traffic_costs_with_cnm.md
Lines changed: 3 additions & 3 deletions
@@ -36,7 +36,7 @@ When Datadog migrated to Kubernetes, the process of moving stateless services wa
36
36
37
37
{{< img src="network_performance_monitoring/guide/manage_traffic_costs_with_cnm/team_region_2.png" alt="Use the team tag to isolate traffic.">}}
38
38
39
-
4. To monitor costs from external traffic, scope your destination endpoints to public IPs using the **IP Type** facet.
39
+
4. To monitor costs from external traffic, scope your destination endpoints to public IPs using the {{< ui >}}IP Type{{< /ui >}} facet.
40
40
{{< img src="network_performance_monitoring/guide/manage_traffic_costs_with_cnm/scope_destination_points_2.png" alt="Use the IP type facet." style="width: 40%;">}}
41
41
42
42
Then group your destination by `domain` to break down external traffic volume by where it is going. Although you cannot install a Datadog Agent on public servers, Datadog can resolve IPs representing external and cloud endpoints to human-readable domain names.
54
-
You can edit your preferences using the **Filter traffic** button. In larger environments, Datadog recommends scoping to just the most significant traffic sources by moving the sliders to include only the highest-volume dependencies.
54
+
You can edit your preferences using the {{< ui >}}Filter traffic{{< /ui >}} button. In larger environments, Datadog recommends scoping to just the most significant traffic sources by moving the sliders to include only the highest-volume dependencies.
55
55
56
56
{{< img src="network_performance_monitoring/guide/manage_traffic_costs_with_cnm/filter-traffic_2.png" alt="Filter your traffic" style="width: 50%;">}}
57
57
58
58
## Graphing traffic costs
59
59
60
-
Datadog recommends tracking traffic volume metrics over time in dashboards and notebooks. You can graph traffic between any two endpoints using the same queries you would make on the [Cloud Network][3] page. To do this, create a **Timeseries Widget**and select the **Network** source from the dropdown menu.
60
+
Datadog recommends tracking traffic volume metrics over time in dashboards and notebooks. You can graph traffic between any two endpoints using the same queries you would make on the [Cloud Network][3] page. To do this, create a {{< ui >}}Timeseries Widget{{< /ui >}} and select the {{< ui >}}Network{{< /ui >}} source from the dropdown menu.
61
61
62
62
{{< img src="network_performance_monitoring/guide/manage_traffic_costs_with_cnm/timeseries_2.png" alt="Create a Timeseries Widget with Network metrics">}}