|
| 1 | +# PromQL Queries for Event Processing Statistics |
| 2 | + |
| 3 | +The following queries can be used to analyse the performance of LocalStack's event processing capabilities. |
| 4 | + |
| 5 | +## Average Propagation Delay from Event Source to Poller |
| 6 | + |
| 7 | +The average amount of time a record has to wait before being processed during the last 5 minutes. A high propagation delay indicates that our event pollers are taking too long to ingest new events from an event source. |
| 8 | + |
| 9 | +``` |
| 10 | +rate(localstack_event_propagation_delay_seconds_sum[5m]) / rate(localstack_event_propagation_delay_seconds_count[5m]) |
| 11 | +``` |
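|  | + |
|  | +The query above returns one average per label combination. As a minimal sketch, assuming a single overall figure across all series is wanted, both rates can be summed before dividing: |
|  | + |
|  | +``` |
|  | +sum(rate(localstack_event_propagation_delay_seconds_sum[5m])) / sum(rate(localstack_event_propagation_delay_seconds_count[5m])) |
|  | +``` |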
| 12 | + |
| 13 | +Example: |
| 14 | + |
| 15 | + |
| 16 | +## Batch Efficiency |
| 17 | + |
| 18 | +A ratio showing how efficiently our pollers retrieve records from an event source relative to their maximum batch size. A higher number indicates that batch sizes could be increased. |
| 19 | + |
| 20 | +``` |
| 21 | +rate(localstack_batch_size_efficiency_ratio_sum[1m]) / rate(localstack_batch_size_efficiency_ratio_count[1m]) |
| 22 | +``` |
| 23 | + |
| 24 | +Example: |
| 25 | + |
| 26 | + |
| 27 | +## Records Per Poll |
| 28 | + |
| 29 | +The average number of records pulled in by an event poller per poll, computed over the last minute. Used together with batch efficiency, it helps you interpret how well your batching configuration performs. |
| 30 | + |
| 31 | +``` |
| 32 | +rate(localstack_records_per_poll_sum[1m]) / rate(localstack_records_per_poll_count[1m]) |
| 33 | +``` |
| 34 | + |
| 35 | +Example: |
| 36 | + |
| 37 | + |
| 38 | + |
| 39 | +## In-Flight Events |
| 40 | + |
| 41 | +Gauges how many events are currently being processed by a target at a given point in time. If event processing is taking a long time, this is a good way of measuring back-pressure on the system. |
| 42 | + |
| 43 | +``` |
| 44 | +localstack_in_flight_events |
| 45 | +``` |
| 46 | + |
| 47 | +Example: |
| 48 | + |
| 49 | + |
| 50 | +## Event Processing Duration |
| 51 | + |
| 52 | +The average time targets spend processing an event, computed over the last minute. |
| 53 | + |
| 54 | +``` |
| 55 | +rate(localstack_process_event_duration_seconds_sum[1m]) / rate(localstack_process_event_duration_seconds_count[1m]) |
| 56 | +``` |
| 57 | + |
| 58 | +Example: |
| 59 | + |
| 60 | + |
| 61 | + |
| 62 | +## High Latency Event Processing |
| 63 | + |
| 64 | +Retrieves the 95th percentile of processing times over a 5-minute window, grouped by LocalStack service and operation. Useful for analysing the tail latency of event processing, since this is likely where performance bottlenecks start to show. |
| 65 | + |
| 66 | +``` |
| 67 | +histogram_quantile(0.95, sum by(service, operation, le) (rate(localstack_request_processing_duration_seconds_bucket[5m]))) |
| 68 | +``` |
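|  | + |
|  | +To narrow the percentile to one service, a label matcher can be added to the bucket selector. The sketch below assumes the `service` label on this metric takes the value `sqs`; adjust it to whatever values your instance actually exposes: |
|  | + |
|  | +``` |
|  | +histogram_quantile(0.95, sum by(operation, le) (rate(localstack_request_processing_duration_seconds_bucket{service="sqs"}[5m]))) |
|  | +``` |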
| 69 | + |
| 70 | +Example: |
| 71 | + |
| 72 | + |
| 73 | +## Empty Poll Responses |
| 74 | + |
| 75 | +The approximate number of empty poll responses per minute, averaged over the last 5 minutes. |
| 76 | + |
| 77 | +``` |
| 78 | +rate(localstack_poll_miss_total[5m]) * 60 |
| 79 | +``` |
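|  | + |
|  | +The query above yields a per-minute figure averaged over the window. If the total over the whole 5-minute window is wanted instead, PromQL's `increase()` function is one option. This is a sketch; the result is extrapolated, so it may be fractional: |
|  | + |
|  | +``` |
|  | +increase(localstack_poll_miss_total[5m]) |
|  | +``` |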
| 80 | + |
| 81 | +Example: |
| 82 | + |
| 83 | + |
| 84 | +## Number of LocalStack Requests Processed |
| 85 | + |
| 86 | +The average number of requests processed by the LocalStack gateway per minute. This is grouped by service type (e.g. SQS) and operation type (e.g. ReceiveMessage). |
| 87 | + |
| 88 | +``` |
| 89 | +sum by(service, operation) (rate(localstack_request_processing_duration_seconds_count[1m]) * 60) |
| 90 | +``` |
| 91 | + |
| 92 | +Example: |
| 93 | + |
| 94 | + |
| 95 | +## In-Flight Requests Against LocalStack Gateway |
| 96 | + |
| 97 | +Sums the sampled number of in-flight requests for the Kinesis, SQS, DynamoDB, and Lambda services over the last minute. Useful for seeing how hard a given service is currently being hit, and by which operation type. |
| 98 | + |
| 99 | +``` |
| 100 | +sum_over_time(localstack_in_flight_requests{service=~"dynamodb|kinesis|sqs|lambda"}[1m]) |
| 101 | +``` |
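|  | + |
|  | +Because `sum_over_time` adds up every sample in the window, its result grows with the scrape frequency. As an alternative sketch, `avg_over_time` reports the average number of in-flight requests over the same window (`max_over_time` would give the peak instead): |
|  | + |
|  | +``` |
|  | +avg_over_time(localstack_in_flight_requests{service=~"dynamodb|kinesis|sqs|lambda"}[1m]) |
|  | +``` |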
| 102 | + |
| 103 | +Example: |
| 104 | + |