Metric templates define reusable time-series database (TSDB) queries with variable substitution. Define a query pattern once, then reuse it across multiple services and environments.

metric_templates:
  api_latency:
    query: "stats.timers.api.$environment.$service.mean"
    op: "gt"
    threshold: 500

| Property | Type | Required | Description |
|---|---|---|---|
| `query` | string | Yes | TSDB query with variable placeholders |
| `op` | string | Yes | Comparison operator: `lt`, `gt`, `eq` |
| `threshold` | number | Yes | Default threshold value |
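
The property table above can be read as a small schema. As a rough illustration only, the Python sketch below checks one parsed template against it. The function name `validate_template` and the plain-dict input are assumptions made for this example; the tool's own validation may behave differently.

```python
# Hypothetical helper: checks one parsed template (a plain dict) against the
# property table above. This is not the tool's own validation logic.
VALID_OPS = ("lt", "gt", "eq")


def validate_template(name, template):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for key in ("query", "op", "threshold"):
        if key not in template:
            problems.append(f"{name}: missing required property '{key}'")
    if "query" in template and not isinstance(template["query"], str):
        problems.append(f"{name}: 'query' must be a string")
    if "op" in template and template["op"] not in VALID_OPS:
        problems.append(f"{name}: 'op' must be one of {list(VALID_OPS)}")
    threshold = template.get("threshold")
    # bool is a subclass of int in Python, so reject it explicitly.
    if "threshold" in template and (
        isinstance(threshold, bool) or not isinstance(threshold, (int, float))
    ):
        problems.append(f"{name}: 'threshold' must be a number")
    return problems


api_latency = {
    "query": "stats.timers.api.$environment.$service.mean",
    "op": "gt",
    "threshold": 500,
}
print(validate_template("api_latency", api_latency))  # []
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a large template file at once.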
Templates support variable substitution using the `$variable` syntax. Variables are replaced with actual values when the template is used by a flag metric.

| Variable | Description | Source |
|---|---|---|
| `$service` | Service name | From `flag_metrics[].service` |
| `$environment` | Environment name | From `flag_metrics[].environments[].name` |

Template definition:

metric_templates:
  api_success_rate:
    query: "asPercent(sumSeries(stats.counters.api.$environment.$service.2xx), sumSeries(stats.counters.api.$environment.$service.total))"
    op: "lt"
    threshold: 95

Flag metric using the template:

flag_metrics:
  - name: "success_rate_low"
    service: "user-service"
    template:
      name: "api_success_rate"
    environments:
      - name: "production"

Resulting query for production:

asPercent(sumSeries(stats.counters.api.production.user-service.2xx), sumSeries(stats.counters.api.production.user-service.total))
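
The substitution step above can be approximated in a few lines of Python. This is only a sketch: it assumes plain `$name` replacement and uses the standard-library `string.Template`, which may differ from how the tool substitutes variables internally (for instance, `string.Template` also accepts `${name}`). The function name `render_query` is hypothetical.

```python
from string import Template


def render_query(query, service, environment):
    """Fill $service and $environment placeholders in a template query.

    safe_substitute leaves unknown placeholders untouched instead of raising.
    """
    return Template(query).safe_substitute(service=service, environment=environment)


query = (
    "asPercent(sumSeries(stats.counters.api.$environment.$service.2xx), "
    "sumSeries(stats.counters.api.$environment.$service.total))"
)
print(render_query(query, service="user-service", environment="production"))
```

Running it on the `api_success_rate` query with `user-service` and `production` prints the same resulting query shown above.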
The op field defines how the query result is compared to the threshold:

| Operator | Meaning | Flag Raised When | Use Case |
|---|---|---|---|
| `lt` | Less than | value < threshold | Success rates, availability |
| `gt` | Greater than | value > threshold | Latency, error counts |
| `eq` | Equal to | value == threshold | Exact match conditions |
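
The operator table can be expressed as a small lookup. The Python sketch below is an illustration under assumptions: `flag_raised` is a hypothetical name, and how the tool treats missing data or rounds values is not covered here.

```python
import operator

# Maps each documented op value to the comparison it stands for.
COMPARATORS = {"lt": operator.lt, "gt": operator.gt, "eq": operator.eq}


def flag_raised(value, op, threshold):
    """Return True when `value <op> threshold` holds, per the operator table."""
    try:
        compare = COMPARATORS[op]
    except KeyError:
        raise ValueError(f"unsupported op: {op!r}") from None
    return compare(value, threshold)


print(flag_raised(87.5, "lt", 90))    # True: success rate below 90
print(flag_raised(420, "gt", 500))    # False: latency within 500
print(flag_raised(100.0, "eq", 100))  # True: exact match
```

In this sketch `eq` is an exact comparison, so a computed percentage such as 99.999 would not match a threshold of 100.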

metric_templates:
  # Flag raised when success rate drops below 90%
  low_success_rate:
    query: "stats.gauges.api.$environment.$service.success_rate"
    op: "lt"
    threshold: 90

  # Flag raised when latency exceeds 500ms
  high_latency:
    query: "stats.timers.api.$environment.$service.mean"
    op: "gt"
    threshold: 500

  # Flag raised when error rate hits 100% (complete outage)
  total_failure:
    query: "stats.gauges.api.$environment.$service.error_rate"
    op: "eq"
    threshold: 100

metric_templates:
  api_success_rate_low:
    query: "asPercent(sumSeries(stats.counters.openstack.api.$environment.*.$service.*.*.{2*,3*,404}.count), sumSeries(stats.counters.openstack.api.$environment.*.$service.*.*.attempted.count))"
    op: "lt"
    threshold: 90

metric_templates:
  api_slow:
    query: "consolidateBy(aggregate(stats.timers.openstack.api.$environment.*.$service.*.*.*.mean, 'average'), 'average')"
    op: "gt"
    threshold: 300

metric_templates:
  api_down:
    query: "asPercent(sumSeries(stats.counters.openstack.api.$environment.*.$service.*.*.failed.count), sumSeries(stats.counters.openstack.api.$environment.*.$service.*.*.attempted.count))"
    op: "eq"
    threshold: 100

metric_templates:
  error_rate_high:
    query: "asPercent(sumSeries(stats.counters.api.$environment.$service.5xx), sumSeries(stats.counters.api.$environment.$service.total))"
    op: "gt"
    threshold: 5

Thresholds defined in templates serve as defaults. Individual flag metrics can override the threshold per environment:

metric_templates:
  api_latency:
    query: "stats.timers.api.$environment.$service.mean"
    op: "gt"
    threshold: 500  # Default: 500ms

flag_metrics:
  - name: "slow_response"
    service: "api-gateway"
    template:
      name: "api_latency"
    environments:
      - name: "production"
        # Uses default 500ms threshold
      - name: "staging"
        threshold: 1000  # Override: 1000ms for staging

# Good: Describes what the flag indicates
metric_templates:
  api_success_rate_below_sla:
    query: "..."

# Avoid: Vague naming
metric_templates:
  metric1:
    query: "..."

metric_templates:
  # API Performance
  api_latency_p99:
    query: "stats.timers.api.$environment.$service.p99"
    op: "gt"
    threshold: 1000
  api_latency_mean:
    query: "stats.timers.api.$environment.$service.mean"
    op: "gt"
    threshold: 300

  # API Reliability
  api_success_rate:
    query: "..."
    op: "lt"
    threshold: 99
  api_error_rate:
    query: "..."
    op: "gt"
    threshold: 1

For large deployments, keep templates in a dedicated file:

conf.d/
└── 01-templates.yaml

# conf.d/01-templates.yaml
metric_templates:
  api_latency:
    query: "stats.timers.api.$environment.$service.mean"
    op: "gt"
    threshold: 500
  # ... more templates

If you see `$service` or `$environment` in your logs instead of actual values:

- Verify that the variable syntax uses the `$` prefix (not `${...}`)
- Check that the flag metric properly references the template
- Test the query directly in the Graphite UI
- Verify that the metric path exists for the service/environment combination
- Check the time range in the API request
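
To make the first check concrete, a rendered query can be scanned for placeholders that survived substitution. This is a minimal Python sketch: the function name `leftover_placeholders` is hypothetical, and the assumption that the `${...}` form is never substituted comes from the checklist above, not from the tool's source.

```python
import re

# Bare form: $name. The braced form ${name} is reported separately because
# the checklist above says only the bare $ prefix is supported.
BARE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")
BRACED = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def leftover_placeholders(rendered_query):
    """Return names still written as $name or ${name} in a rendered query."""
    return {
        "bare": BARE.findall(rendered_query),
        "braced": BRACED.findall(rendered_query),
    }


print(leftover_placeholders("stats.timers.api.production.${service}.mean"))
# {'bare': [], 'braced': ['service']}
print(leftover_placeholders("stats.timers.api.$environment.user-service.mean"))
# {'bare': ['environment'], 'braced': []}
```

A non-empty `braced` list points at the syntax problem, while a non-empty `bare` list suggests the flag metric did not supply that value or does not reference the template.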
- Flag Metrics - Using templates in flag metrics
- Schema Reference - Complete property reference
- Examples - Working configuration samples