Commit ffa57b9

[Hugging Face TGI] Add monitors (DataDog#21355)
* Add monitors
* Fix integration tag
1 parent 7b59ac0 commit ffa57b9

4 files changed

Lines changed: 107 additions & 1 deletion

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+{
+  "version": 2,
+  "created_at": "2025-09-16",
+  "last_updated_at": "2025-09-16",
+  "title": "High queue size",
+  "description": "This monitor tracks the number of requests waiting in the TGI queue. A high queue size indicates that requests are arriving faster than they can be processed, which can lead to increased latency and potential timeouts.",
+  "definition": {
+    "name": "[Hugging Face TGI] High queue size",
+    "type": "query alert",
+    "query": "avg(last_5m):avg:hugging_face_tgi.queue.size{*} > 50",
+    "message": "Hugging Face TGI queue size is high (>50 requests).\n\nThis indicates:\n* Requests arriving faster than processing capacity\n* Potential for increased latency and timeouts\n* Need for scaling or optimization\n\n{{#is_alert}}\nCurrent queue size: {{value}} requests\n{{/is_alert}}\n\nActions:\n1. Consider scaling TGI instances\n2. Review batch size configuration\n3. Check for inefficient request patterns\n4. Monitor resource utilization",
+    "tags": [
+      "integration:hugging-face-tgi"
+    ],
+    "options": {
+      "thresholds": {
+        "critical": 50,
+        "warning": 30
+      },
+      "notify_audit": false,
+      "require_full_window": false,
+      "renotify_interval": 60,
+      "include_tags": true,
+      "evaluation_delay": 60,
+      "escalation_message": "",
+      "on_missing_data": "default",
+      "new_host_delay": 300
+    },
+    "priority": 2
+  },
+  "tags": [
+    "integration:hugging-face-tgi"
+  ]
+}
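
Each monitor in this commit states its threshold twice: once at the end of `query` and once in `options.thresholds.critical`. The following is a minimal sketch of a consistency check for those two values, assuming the query always ends in a simple numeric comparison. The `check_thresholds` helper is hypothetical and is not part of the integration or of any Datadog tooling.

```python
import re

# Trimmed copy of the "High queue size" definition from the diff above.
monitor_definition = {
    "query": "avg(last_5m):avg:hugging_face_tgi.queue.size{*} > 50",
    "options": {"thresholds": {"critical": 50, "warning": 30}},
}


def check_thresholds(definition):
    """Hypothetical helper: return a list of problems (empty means consistent)."""
    # Assumption: the query ends with "<comparator> <number>".
    match = re.search(r"(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$", definition["query"])
    if match is None:
        return ["query does not end with a comparison against a number"]

    comparator = match.group(1)
    query_threshold = float(match.group(2))
    thresholds = definition["options"]["thresholds"]
    critical = thresholds["critical"]
    warning = thresholds.get("warning")

    problems = []
    if float(critical) != query_threshold:
        problems.append(
            f"critical threshold {critical} differs from query threshold {query_threshold}"
        )
    if warning is not None:
        # For "above" alerts the warning should sit below critical, and vice versa.
        if comparator.startswith(">") and not warning < critical:
            problems.append("warning should be lower than critical for a '>' query")
        if comparator.startswith("<") and not warning > critical:
            problems.append("warning should be higher than critical for a '<' query")
    return problems


print(check_thresholds(monitor_definition))
```

Running the same check against the other two definitions in this commit only requires swapping in their `query` and `thresholds` values.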
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+{
+  "version": 2,
+  "created_at": "2025-09-16",
+  "last_updated_at": "2025-09-16",
+  "title": "High request latency",
+  "description": "This monitor tracks the average request duration for Hugging Face TGI. High latency can indicate performance bottlenecks such as model inference issues, resource contention, or inefficient batch processing.",
+  "definition": {
+    "name": "[Hugging Face TGI] High request latency",
+    "type": "query alert",
+    "query": "sum(last_5m):avg:hugging_face_tgi.request.duration.sum{*}.as_count() / avg:hugging_face_tgi.request.duration.count{*}.as_count() > 10",
+    "message": "Hugging Face TGI request latency is high (>10s average).\n\nThis could indicate:\n* Model inference bottlenecks\n* Resource contention (CPU/GPU/memory)\n* Inefficient batch processing\n* Queue buildup\n\n{{#is_alert}}\nAverage latency: {{value}}s\n{{/is_alert}}\n\nCheck:\n1. TGI server resource utilization\n2. Queue size and batch processing metrics\n3. Model performance and configuration",
+    "tags": [
+      "integration:hugging-face-tgi"
+    ],
+    "options": {
+      "thresholds": {
+        "critical": 10,
+        "warning": 5
+      },
+      "notify_audit": false,
+      "require_full_window": false,
+      "renotify_interval": 60,
+      "include_tags": true,
+      "evaluation_delay": 300,
+      "escalation_message": "",
+      "on_missing_data": "show_and_notify_no_data",
+      "new_host_delay": 300
+    },
+    "priority": 2
+  },
+  "tags": [
+    "integration:hugging-face-tgi"
+  ]
+}
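
The latency query divides the `request.duration.sum` series by the `request.duration.count` series over the last 5 minutes, which yields a mean duration per request for that window. The sketch below works through that arithmetic with made-up numbers. The function names and sample values are illustrative assumptions, and the `classify` step only approximates how the warning and critical thresholds from the diff would be applied.

```python
def mean_over_window(sum_increase, count_increase):
    """Mean duration = total seconds spent / number of requests in the window."""
    if count_increase == 0:
        # No requests completed in the window, so the ratio is undefined.
        return None
    return sum_increase / count_increase


def classify(value, warning, critical):
    """Illustrative classification for a '>' style monitor (an assumption)."""
    if value is None:
        return "no data"
    if value > critical:
        return "alert"
    if value > warning:
        return "warn"
    return "ok"


# Hypothetical 5-minute window: 120 requests took 1500 seconds in total.
mean_latency = mean_over_window(sum_increase=1500.0, count_increase=120)
print(mean_latency, classify(mean_latency, warning=5, critical=10))
```

The slow token generation monitor later in this commit uses the same sum-over-count shape, with a critical threshold of 0.2 seconds (200 ms) per token.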
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+{
+  "version": 2,
+  "created_at": "2025-09-16",
+  "last_updated_at": "2025-09-16",
+  "title": "Slow token generation",
+  "description": "This monitor tracks the mean time per token generation for Hugging Face TGI. Slow token generation indicates model inference performance issues, which directly impacts user experience and throughput.",
+  "definition": {
+    "name": "[Hugging Face TGI] Slow token generation",
+    "type": "query alert",
+    "query": "sum(last_5m):sum:hugging_face_tgi.request.mean_time_per_token.duration.sum{*}.as_count() / sum:hugging_face_tgi.request.mean_time_per_token.duration.count{*}.as_count() > 0.2",
+    "message": "Hugging Face TGI token generation is slow (>200ms per token).\n\nThis indicates:\n* Model inference performance degradation\n* Resource constraints (GPU memory/compute)\n* Inefficient model configuration or parameters\n\n{{#is_alert}}\nMean time per token: {{value}}s\n{{/is_alert}}\n\nInvestigate:\n1. GPU utilization and memory usage\n2. Model configuration and quantization settings\n3. Batch size optimization\n4. Temperature and sampling parameters",
+    "tags": [
+      "integration:hugging-face-tgi"
+    ],
+    "options": {
+      "thresholds": {
+        "critical": 0.2,
+        "warning": 0.1
+      },
+      "notify_audit": false,
+      "require_full_window": false,
+      "renotify_interval": 60,
+      "include_tags": true,
+      "evaluation_delay": 300,
+      "escalation_message": "",
+      "on_missing_data": "show_and_notify_no_data",
+      "new_host_delay": 300
+    },
+    "priority": 2
+  },
+  "tags": [
+    "integration:hugging-face-tgi"
+  ]
+}

hugging_face_tgi/manifest.json

Lines changed: 5 additions & 1 deletion
@@ -41,7 +41,11 @@
         "text-generation-router"
       ]
     },
-    "monitors": {},
+    "monitors": {
+      "High request latency": "assets/monitors/request_latency_high.json",
+      "High queue size": "assets/monitors/queue_size_high.json",
+      "Slow token generation": "assets/monitors/token_generation_slow.json"
+    },
     "saved_views": {}
   },
   "author": {

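In this commit, each key in the manifest's `monitors` map is identical to the `title` field of a monitor asset. A rough sketch of a check that keeps the two aligned is shown below. The `find_title_mismatches` helper and the in-memory `assets` stand-in are hypothetical; a real check would read the JSON files from disk, and the pairing of titles to paths here is taken from the manifest keys.

```python
# The "monitors" map as added to hugging_face_tgi/manifest.json in this commit.
manifest_monitors = {
    "High request latency": "assets/monitors/request_latency_high.json",
    "High queue size": "assets/monitors/queue_size_high.json",
    "Slow token generation": "assets/monitors/token_generation_slow.json",
}

# Stand-in for the asset files (assumption: only the "title" field matters here).
assets = {
    "assets/monitors/request_latency_high.json": {"title": "High request latency"},
    "assets/monitors/queue_size_high.json": {"title": "High queue size"},
    "assets/monitors/token_generation_slow.json": {"title": "Slow token generation"},
}


def find_title_mismatches(monitors, load_asset):
    """Return (manifest key, path, asset title) for every entry that disagrees."""
    mismatches = []
    for key, path in monitors.items():
        title = load_asset(path).get("title")
        if title != key:
            mismatches.append((key, path, title))
    return mismatches


print(find_title_mismatches(manifest_monitors, assets.__getitem__))
```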