Commit 270ac3e

Enhance LiteLLM dashboard with LLM Observability integration (DataDog#20953)
* Enhance LiteLLM dashboard with LLM Observability integration

- Add comprehensive LLM Observability section with key performance metrics:
  - Total LLM requests counter from llm_observability data source
  - Response time metrics (p95 and p50 percentiles)
  - Model usage breakdown with interactive sunburst visualization
- Update dashboard content with LLM Observability references and setup links
- Add documentation links for auto-instrumentation and quickstart guides
- Improve widget layout and positioning for better user experience
- Clean up redundant configuration elements

This enhancement bridges LiteLLM infrastructure monitoring with Datadog's LLM Observability platform, providing comprehensive visibility into both infrastructure metrics and application-level LLM performance.

* Update LiteLLM dashboard content with native integration clarification

- Add 'In addition to the native integration' to clarify the relationship between LiteLLM's built-in monitoring and Datadog's LLM Observability features
- Improve user understanding of how both monitoring approaches complement each other

* Finalize LiteLLM dashboard title

- Remove '- draft' suffix from dashboard title
- Dashboard is now ready for production use

* Refine LiteLLM dashboard content and layout

- Make 'Datadog LLM Observability' bold with direct link for better visibility
- Capitalize and punctuate 'Get started today.' for consistency
- Streamline LLM Observability section content for improved readability
- Optimize widget positioning with refined y-coordinates
- Enhance overall user experience and content clarity

* Update litellm/assets/dashboards/litellm_overview.json

Co-authored-by: domalessi <111786334+domalessi@users.noreply.github.com>

---------

Co-authored-by: domalessi <111786334+domalessi@users.noreply.github.com>
1 parent c7b0903 commit 270ac3e
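
The commit message lists widgets backed by two different data sources. As a rough way to confirm which widget reads from which source, the sketch below walks a dashboard definition and reports each querying widget's title, type, and data sources. It assumes the JSON is shaped as in this commit's diff (group widgets nest children under `definition.widgets`, queries sit under `definition.requests[].queries[]`); `iter_widgets`, `summarize_sources`, and the trimmed `SAMPLE` document are hypothetical names and data used only for illustration.

```python
import json


def iter_widgets(widgets):
    """Yield each widget, descending into group widgets' nested lists."""
    for widget in widgets:
        yield widget
        nested = widget.get("definition", {}).get("widgets", [])
        yield from iter_widgets(nested)


def summarize_sources(dashboard):
    """Return (title, widget type, data sources) for every widget that runs queries."""
    rows = []
    for widget in iter_widgets(dashboard.get("widgets", [])):
        definition = widget.get("definition", {})
        requests = definition.get("requests", [])
        if not isinstance(requests, list):
            continue  # assumption: only list-shaped "requests" are handled here
        sources = sorted({
            query["data_source"]
            for request in requests
            for query in request.get("queries", [])
            if "data_source" in query
        })
        if sources:
            rows.append((definition.get("title", ""), definition.get("type", ""), sources))
    return rows


# Trimmed, hypothetical stand-in for litellm_overview.json.
SAMPLE = json.loads("""
{
  "widgets": [
    {
      "definition": {
        "title": "LLM Observability ",
        "type": "group",
        "widgets": [
          {"definition": {"type": "note"}},
          {"definition": {"title": "Total LLM Requests", "type": "query_value",
                          "requests": [{"queries": [{"data_source": "llm_observability",
                                                     "name": "query1"}]}]}},
          {"definition": {"title": "LLM Call Response Time (p95)", "type": "query_value",
                          "requests": [{"queries": [{"data_source": "metrics",
                                                     "name": "query1"}]}]}}
        ]
      }
    }
  ]
}
""")

for title, kind, sources in summarize_sources(SAMPLE):
    print(f"{title} ({kind}): {', '.join(sources)}")
```

To run it against the real file, replace `SAMPLE` with `json.load(open(path))` for a local checkout of `litellm/assets/dashboards/litellm_overview.json`.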

1 file changed

Lines changed: 262 additions & 7 deletions


litellm/assets/dashboards/litellm_overview.json

@@ -1,6 +1,6 @@
 {
   "author_name": "Datadog",
-  "description": "[LiteLLM](https://www.litellm.ai/)\u00a0is a lightweight, open-source proxy and analytics layer for large language model (LLM) APIs. It enables unified access, observability, and cost control across multiple LLM providers.\n\nThis dashboard provides a comprehensive, real-time view of all LLM API activity flowing through the LiteLLM proxy layer. It is designed to help teams monitor performance, track key usage metrics, detect anomalies, and manage costs across multiple large language model providers.\n\n**Further reading:**\n\n- [Datadog LiteLLM Integration Documentation](https://docs.datadoghq.com/integrations/litellm/)\n- [LiteLLM Docs](https://docs.litellm.ai/docs/)\n- [LiteLLM Monitoring Docs](https://docs.litellm.ai/docs/proxy/prometheus)",
+  "description": "[LiteLLM](https://www.litellm.ai/) is a lightweight, open-source proxy and analytics layer for large language model (LLM) APIs. It enables unified access, observability, and cost control across multiple LLM providers.\n\nThis dashboard provides a comprehensive, real-time view of all LLM API activity flowing through the LiteLLM proxy layer. It is designed to help teams monitor performance, track key usage metrics, detect anomalies, and manage costs across multiple large language model providers.\n\n**Further reading:**\n\n- [Datadog LiteLLM Integration Documentation](https://docs.datadoghq.com/integrations/litellm/)\n- [LiteLLM Docs](https://docs.litellm.ai/docs/)\n- [LiteLLM Monitoring Docs](https://docs.litellm.ai/docs/proxy/prometheus)",
   "layout_type": "ordered",
   "template_variables": [
     {
@@ -53,7 +53,7 @@
 {
   "definition": {
     "background_color": "white",
-    "content": "[LiteLLM](https://www.litellm.ai/) is a lightweight, open-source proxy and analytics layer for large language model (LLM) APIs. It enables unified access, observability, and cost control across multiple LLM providers.\n\nThis dashboard provides a comprehensive, real-time view of all LLM API activity flowing through the LiteLLM proxy layer. It is designed to help teams monitor performance, track key usage metrics, detect anomalies, and manage costs across multiple large language model providers.",
+    "content": "[LiteLLM](https://www.litellm.ai/) is a lightweight, open-source proxy and analytics layer for large language model (LLM) APIs. It enables unified access, observability, and cost control across multiple LLM providers.\n\nThis dashboard provides a comprehensive, real-time view of all LLM API activity flowing through the LiteLLM proxy layer. It is designed to help teams monitor performance, track key usage metrics, detect anomalies, and manage costs across multiple large language model providers.\n\nIn addition to the native integration, use Datadog's LLM Observability to evaluate, troubleshoot, and iterate on your LLM agents or applications using LiteLLM. LLM Observability SDK can\u00a0__[automatically trace](https://docs.datadoghq.com/llm_observability/instrumentation/auto_instrumentation?tab=python)__\u00a0your LLM operations. __[Get started today.](https://docs.datadoghq.com/llm_observability/quickstart/?tab=python)__",
     "font_size": "14",
     "has_padding": true,
     "show_tick": false,
@@ -74,7 +74,7 @@
 {
   "definition": {
     "background_color": "white",
-    "content": "**Further reading:**\n\n- [Datadog LiteLLM Integration Documentation](https://docs.datadoghq.com/integrations/litellm/)\n- [LiteLLM Docs](https://docs.litellm.ai/docs/)\n- [LiteLLM Monitoring Docs](https://docs.litellm.ai/docs/proxy/prometheus)",
+    "content": "**Further reading:**\n\n- [Datadog LiteLLM Integration Documentation](https://docs.datadoghq.com/integrations/litellm/)\n- [LiteLLM Docs](https://docs.litellm.ai/docs/)\n- [LiteLLM Monitoring Docs](https://docs.litellm.ai/docs/proxy/prometheus)\n- [Get started with LiteLLM and Datadog LLM Observability](https://docs.datadoghq.com/llm_observability/instrumentation/auto_instrumentation/?tab=python#litellm)",
     "font_size": "16",
     "has_padding": true,
     "show_tick": false,
@@ -906,7 +906,6 @@
   },
   "palette": "dog_classic"
 },
-"time": {},
 "title": "Remaining API Key Budget",
 "title_align": "left",
 "title_size": "16",
@@ -2501,9 +2500,10 @@
   "id": 7250476941150852,
   "layout": {
     "height": 11,
+    "is_column_break": true,
     "width": 12,
     "x": 0,
-    "y": 51
+    "y": 0
   }
 },
 {
@@ -2787,7 +2787,7 @@
     "height": 11,
     "width": 12,
     "x": 0,
-    "y": 62
+    "y": 11
   }
 },
 {
@@ -2883,7 +2883,262 @@
       "height": 6,
       "width": 12,
       "x": 0,
-      "y": 73
+      "y": 22
+    }
+  },
+  {
+    "definition": {
+      "background_color": "vivid_blue",
+      "layout_type": "ordered",
+      "show_title": true,
+      "title": "LLM Observability ",
+      "type": "group",
+      "widgets": [
+        {
+          "definition": {
+            "background_color": "blue",
+            "content": "[Datadog LLM Observability](https://docs.datadoghq.com/llm_observability/quickstart/?tab=python) enables you to experiment, troubleshoot, monitor, and evaluate LLM agents or applications. Get real-time visibility into inputs and outputs, errors, latency, token usage, and more, along with in-depth quality and security checks at every stage, including data retrieval, tool calls, and agent interactions.\n\nLLM Observability SDK can [automatically trace](https://docs.datadoghq.com/llm_observability/instrumentation/auto_instrumentation?tab=python) your LLM operations. Datadog's LLM Observability views integrate with LiteLLM metrics to provide visibility into your workflows.\n\nFor setup instructions and auto-instrumentation, check out the [LLM Observability Quickstart Guide](https://docs.datadoghq.com/llm_observability/quickstart/?tab=python).\n\nTo explore model-level usage, token attribution, cost analysis, and request tracing across teams and providers, visit the [LiteLLM Observability dashboard](https://app.datadoghq.com/dash/integration/llm_operational_insights).",
+            "font_size": "14",
+            "has_padding": true,
+            "show_tick": true,
+            "text_align": "center",
+            "tick_edge": "left",
+            "tick_pos": "50%",
+            "type": "note",
+            "vertical_align": "center"
+          },
+          "id": 1692340165130004,
+          "layout": {
+            "height": 2,
+            "width": 12,
+            "x": 0,
+            "y": 0
+          }
+        },
+        {
+          "definition": {
+            "autoscale": true,
+            "precision": 2,
+            "requests": [
+              {
+                "formulas": [
+                  {
+                    "formula": "query1"
+                  }
+                ],
+                "queries": [
+                  {
+                    "compute": {
+                      "aggregation": "count"
+                    },
+                    "data_source": "llm_observability",
+                    "group_by": [],
+                    "indexes": [
+                      "*"
+                    ],
+                    "name": "query1",
+                    "search": {
+                      "query": "@event_type:span @meta.model_provider:* @meta.span.kind:llm"
+                    }
+                  }
+                ],
+                "response_format": "scalar"
+              }
+            ],
+            "timeseries_background": {
+              "type": "bars",
+              "yaxis": {}
+            },
+            "title": "Total LLM Requests",
+            "title_align": "left",
+            "title_size": "16",
+            "type": "query_value"
+          },
+          "id": 3902525824901928,
+          "layout": {
+            "height": 3,
+            "width": 4,
+            "x": 0,
+            "y": 2
+          }
+        },
+        {
+          "definition": {
+            "autoscale": true,
+            "precision": 2,
+            "requests": [
+              {
+                "formulas": [
+                  {
+                    "formula": "query1",
+                    "number_format": {
+                      "unit": {
+                        "type": "canonical_unit",
+                        "unit_name": "second"
+                      }
+                    }
+                  }
+                ],
+                "queries": [
+                  {
+                    "aggregator": "percentile",
+                    "data_source": "metrics",
+                    "name": "query1",
+                    "query": "p95:ml_obs.span.duration{span_kind:llm}"
+                  }
+                ],
+                "response_format": "scalar"
+              }
+            ],
+            "timeseries_background": {
+              "type": "area"
+            },
+            "title": "LLM Call Response Time (p95)",
+            "title_align": "left",
+            "title_size": "16",
+            "type": "query_value"
+          },
+          "id": 2848364602362541,
+          "layout": {
+            "height": 3,
+            "width": 4,
+            "x": 4,
+            "y": 2
+          }
+        },
+        {
+          "definition": {
+            "autoscale": true,
+            "precision": 2,
+            "requests": [
+              {
+                "formulas": [
+                  {
+                    "formula": "query1",
+                    "number_format": {
+                      "unit": {
+                        "type": "canonical_unit",
+                        "unit_name": "second"
+                      }
+                    }
+                  }
+                ],
+                "queries": [
+                  {
+                    "aggregator": "percentile",
+                    "data_source": "metrics",
+                    "name": "query1",
+                    "query": "p50:ml_obs.span.duration{span_kind:llm}"
+                  }
+                ],
+                "response_format": "scalar"
+              }
+            ],
+            "timeseries_background": {
+              "type": "area"
+            },
+            "title": "LLM Call Response Time (p50)",
+            "title_align": "left",
+            "title_size": "16",
+            "type": "query_value"
+          },
+          "id": 8997641233217794,
+          "layout": {
+            "height": 3,
+            "width": 4,
+            "x": 8,
+            "y": 2
+          }
+        },
+        {
+          "definition": {
+            "custom_links": [
+              {
+                "label": "View related spans in LLM Observability",
+                "link": "/llm/traces?query=@meta.model_name%3A{{@meta.model_name.value}}%20@event_type%3Aspan%20@parent_id%3A*%20@{{$ml_app}}%20{{$version}}&start={{timestamp_widget_start}}&end={{timestamp_widget_end}}&paused=false"
+              }
+            ],
+            "hide_total": false,
+            "legend": {
+              "type": "table"
+            },
+            "requests": [
+              {
+                "formulas": [
+                  {
+                    "formula": "query2"
+                  }
+                ],
+                "queries": [
+                  {
+                    "compute": {
+                      "aggregation": "count"
+                    },
+                    "data_source": "llm_observability",
+                    "group_by": [
+                      {
+                        "facet": "@meta.model_provider",
+                        "limit": 10,
+                        "sort": {
+                          "aggregation": "count",
+                          "order": "desc"
+                        }
+                      },
+                      {
+                        "facet": "@meta.model_name",
+                        "limit": 10,
+                        "sort": {
+                          "aggregation": "count",
+                          "order": "desc"
+                        }
+                      }
+                    ],
+                    "indexes": [
+                      "*"
+                    ],
+                    "name": "query2",
+                    "search": {
+                      "query": "@event_type:span @meta.model_provider:* @meta.span.kind:llm"
+                    }
+                  }
+                ],
+                "response_format": "scalar",
+                "sort": {
+                  "count": 500,
+                  "order_by": [
+                    {
+                      "index": 0,
+                      "order": "desc",
+                      "type": "formula"
+                    }
+                  ]
+                },
+                "style": {
+                  "palette": "datadog16"
+                }
+              }
+            ],
+            "title": "Model Usage",
+            "title_align": "left",
+            "title_size": "16",
+            "type": "sunburst"
+          },
+          "id": 2995771392271954,
+          "layout": {
+            "height": 5,
+            "width": 12,
+            "x": 0,
+            "y": 5
+          }
+        }
+      ]
+    },
+    "id": 6876435950632807,
+    "layout": {
+      "height": 11,
+      "width": 12,
+      "x": 0,
+      "y": 28
     }
   }
 ]
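
The `layout` values in this diff can be sanity-checked mechanically. The sketch below is a minimal, hypothetical checker, not part of the commit: it assumes a 12-column grid (no widget in this diff is wider than 12) and that the four groups whose `y` values appear in the hunks above stack in a single column. `check_layouts` and the two dictionaries are illustrative names; the coordinates are copied from the diff.

```python
from itertools import combinations

GRID_COLUMNS = 12  # assumption: widths in this diff never exceed 12


def overlaps(a, b):
    """True if two layout rectangles share at least one grid cell."""
    return (
        a["x"] < b["x"] + b["width"]
        and b["x"] < a["x"] + a["width"]
        and a["y"] < b["y"] + b["height"]
        and b["y"] < a["y"] + a["height"]
    )


def check_layouts(layouts):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    for name, box in layouts.items():
        if box["x"] < 0 or box["x"] + box["width"] > GRID_COLUMNS:
            problems.append(f"{name} exceeds the {GRID_COLUMNS}-column grid")
    for (name_a, a), (name_b, b) in combinations(layouts.items(), 2):
        if overlaps(a, b):
            problems.append(f"{name_a} overlaps {name_b}")
    return problems


# Children of the new "LLM Observability" group (values from the diff).
LLM_OBS_GROUP = {
    "note": {"x": 0, "y": 0, "width": 12, "height": 2},
    "Total LLM Requests": {"x": 0, "y": 2, "width": 4, "height": 3},
    "LLM Call Response Time (p95)": {"x": 4, "y": 2, "width": 4, "height": 3},
    "LLM Call Response Time (p50)": {"x": 8, "y": 2, "width": 4, "height": 3},
    "Model Usage": {"x": 0, "y": 5, "width": 12, "height": 5},
}

# The four groups whose y values this commit sets, labelled by their new y.
COLUMN = {
    "group at y=0": {"x": 0, "y": 0, "width": 12, "height": 11},
    "group at y=11": {"x": 0, "y": 11, "width": 12, "height": 11},
    "group at y=22": {"x": 0, "y": 22, "width": 12, "height": 6},
    "group at y=28": {"x": 0, "y": 28, "width": 12, "height": 11},
}

print(check_layouts(LLM_OBS_GROUP))
print(check_layouts(COLUMN))
```

Under these assumptions the new `y` values (0, 11, 22, 28) each equal the previous group's `y` plus its `height`, and the children of the new group end at row 10, inside the group's declared height of 11.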
