Metric Templates

Metric templates define reusable TSDB queries with variable substitution. Templates allow you to define a query pattern once and reuse it across multiple services and environments.

Configuration

metric_templates:
  api_latency:
    query: "stats.timers.api.$environment.$service.mean"
    op: "gt"
    threshold: 500

Properties

Property	Type	Required	Description
`query`	string	Yes	TSDB query with variable placeholders
`op`	string	Yes	Comparison operator: `lt`, `gt`, `eq`
`threshold`	number	Yes	Default threshold value

Variable Substitution

Templates support variable substitution using the $variable syntax. Variables are replaced with actual values when the template is used by a flag metric.

Built-in Variables

Variable	Description	Source
`$service`	Service name	From `flag_metrics[].service`
`$environment`	Environment name	From `flag_metrics[].environments[].name`

Example

Template definition:

metric_templates:
  api_success_rate:
    query: "asPercent(sumSeries(stats.counters.api.$environment.$service.2xx), sumSeries(stats.counters.api.$environment.$service.total))"
    op: "lt"
    threshold: 95

Flag metric using the template:

flag_metrics:
  - name: "success_rate_low"
    service: "user-service"
    template:
      name: "api_success_rate"
    environments:
      - name: "production"

Resulting query for production:

asPercent(sumSeries(stats.counters.api.production.user-service.2xx), sumSeries(stats.counters.api.production.user-service.total))

Comparison Operators

The op field defines how the query result is compared to the threshold:

Operator	Meaning	Flag Raised When	Use Case
`lt`	Less than	value < threshold	Success rates, availability
`gt`	Greater than	value > threshold	Latency, error counts
`eq`	Equal to	value == threshold	Exact match conditions

Examples

metric_templates:
  # Flag raised when success rate drops below 90%
  low_success_rate:
    query: "stats.gauges.api.$environment.$service.success_rate"
    op: "lt"
    threshold: 90

  # Flag raised when latency exceeds 500ms
  high_latency:
    query: "stats.timers.api.$environment.$service.mean"
    op: "gt"
    threshold: 500

  # Flag raised when error rate hits 100% (complete outage)
  total_failure:
    query: "stats.gauges.api.$environment.$service.error_rate"
    op: "eq"
    threshold: 100

Real-World Template Examples

API Success Rate

metric_templates:
  api_success_rate_low:
    query: "asPercent(sumSeries(stats.counters.openstack.api.$environment.*.$service.*.*.{2*,3*,404}.count), sumSeries(stats.counters.openstack.api.$environment.*.$service.*.*.attempted.count))"
    op: "lt"
    threshold: 90

API Latency

metric_templates:
  api_slow:
    query: "consolidateBy(aggregate(stats.timers.openstack.api.$environment.*.$service.*.*.*.mean, 'average'), 'average')"
    op: "gt"
    threshold: 300

Complete Service Down

metric_templates:
  api_down:
    query: "asPercent(sumSeries(stats.counters.openstack.api.$environment.*.$service.*.*.failed.count), sumSeries(stats.counters.openstack.api.$environment.*.$service.*.*.attempted.count))"
    op: "eq"
    threshold: 100

Error Rate

metric_templates:
  error_rate_high:
    query: "asPercent(sumSeries(stats.counters.api.$environment.$service.5xx), sumSeries(stats.counters.api.$environment.$service.total))"
    op: "gt"
    threshold: 5

Threshold Behavior

Thresholds defined in templates serve as defaults. Individual flag metrics can override the threshold per environment:

metric_templates:
  api_latency:
    query: "stats.timers.api.$environment.$service.mean"
    op: "gt"
    threshold: 500  # Default: 500ms

flag_metrics:
  - name: "slow_response"
    service: "api-gateway"
    template:
      name: "api_latency"
    environments:
      - name: "production"
        # Uses default 500ms threshold
      - name: "staging"
        threshold: 1000  # Override: 1000ms for staging

Best Practices

1. Name Templates Descriptively

# Good: Describes what the flag indicates
metric_templates:
  api_success_rate_below_sla:
    query: "..."

# Avoid: Vague naming
metric_templates:
  metric1:
    query: "..."

2. Group Related Templates

metric_templates:
  # API Performance
  api_latency_p99:
    query: "stats.timers.api.$environment.$service.p99"
    op: "gt"
    threshold: 1000

  api_latency_mean:
    query: "stats.timers.api.$environment.$service.mean"
    op: "gt"
    threshold: 300

  # API Reliability
  api_success_rate:
    query: "..."
    op: "lt"
    threshold: 99

  api_error_rate:
    query: "..."
    op: "gt"
    threshold: 1

3. Use Separate conf.d File

For large deployments, keep templates in a dedicated file:

conf.d/
└── 01-templates.yaml

# conf.d/01-templates.yaml
metric_templates:
  api_latency:
    query: "stats.timers.api.$environment.$service.mean"
    op: "gt"
    threshold: 500
  # ... more templates

Troubleshooting

Variable Not Substituted

If you see $service or $environment in your logs instead of actual values:

Verify variable syntax uses $ prefix (not ${...})
Check that the flag metric properly references the template

Query Returns No Data

Test the query directly in Graphite UI
Verify the metric path exists for the service/environment combination
Check time range in API request

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Metric Templates

Configuration

Properties

Variable Substitution

Built-in Variables

Example

Comparison Operators

Examples

Real-World Template Examples

API Success Rate

API Latency

Complete Service Down

Error Rate

Threshold Behavior

Best Practices

1. Name Templates Descriptively

2. Group Related Templates

3. Use Separate conf.d File

Troubleshooting

Variable Not Substituted

Query Returns No Data

Related Documentation

FilesExpand file tree

metric-templates.md

Latest commit

History

metric-templates.md

File metadata and controls

Metric Templates

Configuration

Properties

Variable Substitution

Built-in Variables

Example

Comparison Operators

Examples

Real-World Template Examples

API Success Rate

API Latency

Complete Service Down

Error Rate

Threshold Behavior

Best Practices

1. Name Templates Descriptively

2. Group Related Templates

3. Use Separate conf.d File

Troubleshooting

Variable Not Substituted

Query Returns No Data

Related Documentation