
[docs] Recommended approach for metrics? #494

@matt-winfield

We already have good documentation on setting up server timing (https://github.com/epicweb-dev/epic-stack/blob/main/docs/server-timing.md), but it only gives me information about requests I make manually and inspect in the browser devtools. For production applications, we need a way to monitor whether everything is working smoothly for end users and to identify potential bottlenecks.

Some info I think is essential to have:

  • HTTP response time across all routes (mean, P90, P99)
  • HTTP status code counts across all routes
  • HTTP response time per route (mean, P90, P99)
  • HTTP status code counts per route
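As a rough illustration of what these four numbers involve, here is a minimal in-memory TypeScript sketch. It assumes the request handler knows the matched route pattern (e.g. `/users/:username`) rather than the raw URL, so that requests aggregate per route; `recordRequest` and `summarize` are hypothetical helper names, not part of the Epic Stack. Percentiles use the nearest-rank method.

```typescript
// Hypothetical in-memory recorder for per-route and overall HTTP metrics.
// Not production code: it keeps every raw sample in memory.

type RouteStats = { durationsMs: number[]; statusCounts: Map<number, number> };

type RouteSummary = {
  count: number;
  meanMs: number;
  p90Ms: number;
  p99Ms: number;
  statusCounts: Record<string, number>;
};

const ALL_ROUTES = "*"; // aggregate bucket covering every route
const stats = new Map<string, RouteStats>();

function bucket(route: string): RouteStats {
  let s = stats.get(route);
  if (!s) {
    s = { durationsMs: [], statusCounts: new Map<number, number>() };
    stats.set(route, s);
  }
  return s;
}

// Record one finished request against its route and against the overall bucket.
function recordRequest(route: string, status: number, durationMs: number): void {
  for (const key of [route, ALL_ROUTES]) {
    const s = bucket(key);
    s.durationsMs.push(durationMs);
    s.statusCounts.set(status, (s.statusCounts.get(status) ?? 0) + 1);
  }
}

// Nearest-rank percentile: smallest sample with at least p% of samples <= it.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)]!;
}

function summarize(route: string): RouteSummary | undefined {
  const s = stats.get(route);
  if (!s || s.durationsMs.length === 0) return undefined;
  const sorted = [...s.durationsMs].sort((a, b) => a - b);
  const total = sorted.reduce((sum, d) => sum + d, 0);
  return {
    count: sorted.length,
    meanMs: total / sorted.length,
    p90Ms: percentile(sorted, 90),
    p99Ms: percentile(sorted, 99),
    statusCounts: Object.fromEntries(s.statusCounts),
  };
}
```

Keeping raw samples like this won't scale across several Fly instances, because per-instance P99s can't be combined into an overall P99. With Prometheus, the usual approach is to record durations in a histogram and compute quantiles at query time with PromQL's `histogram_quantile`.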

It would also be useful to have:

  • The most frequent SQL queries
  • The SQL queries with the longest execution time
  • The ability to define custom metrics (e.g. recording the "server timing" data mentioned above somewhere it can be turned into a dashboard)
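For the SQL items, one possible shape is to aggregate query events by their SQL text. This sketch assumes events of the form `{ query, duration }` with the duration in milliseconds, which, as far as I understand Prisma's logging API, matches what `prisma.$on('query', e => ...)` provides when the client is created with `log: [{ emit: 'event', level: 'query' }]`. The Prisma wiring is only shown as a comment; `recordQuery` and `topQueries` are hypothetical names.

```typescript
// Hypothetical aggregator for SQL query frequency and duration.
// Assumed wiring (not run here):
//   const prisma = new PrismaClient({ log: [{ emit: "event", level: "query" }] });
//   prisma.$on("query", (e) => recordQuery({ query: e.query, duration: e.duration }));

type QueryEvent = { query: string; duration: number }; // duration in ms
type QueryStats = { count: number; totalMs: number; maxMs: number };

const queryStats = new Map<string, QueryStats>();

// Accumulate count, total time and worst-case time per distinct SQL string.
function recordQuery(e: QueryEvent): void {
  const s = queryStats.get(e.query) ?? { count: 0, totalMs: 0, maxMs: 0 };
  s.count += 1;
  s.totalMs += e.duration;
  s.maxMs = Math.max(s.maxMs, e.duration);
  queryStats.set(e.query, s);
}

// Return the top `limit` queries, ranked by the chosen statistic (descending).
function topQueries(by: keyof QueryStats, limit = 5): Array<[string, QueryStats]> {
  return [...queryStats.entries()]
    .sort(([, a], [, b]) => b[by] - a[by])
    .slice(0, limit);
}
```

Prisma reports query parameters separately from the SQL text, so the same query with different arguments should land in one bucket. The same count/total/max pattern could also hold custom timings such as the server-timing data.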

Fly.io provides a Prometheus instance and a Grafana dashboard (currently in preview):
https://fly.io/docs/reference/metrics/#managed-grafana-preview

This covers some basics, such as average HTTP response times and status codes, but it doesn't yet break them down by route, which would help pinpoint what's causing issues. There is also a way to expose your own data to Prometheus via an HTTP endpoint, so perhaps that's how extra metrics could be added?
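To make the "expose data via an endpoint" idea concrete, here is a sketch that renders per-route request counts in the Prometheus text exposition format. The metric name `http_requests_total` and the helper functions are illustrative assumptions; in practice the `prom-client` npm package handles this (including histograms). The resulting string could be returned from a resource route such as a hypothetical `/metrics`, and Fly's metrics docs linked above describe a `[metrics]` section in `fly.toml` (with `port` and `path`) for telling their Prometheus where to scrape.

```typescript
// Hypothetical counter rendered in Prometheus text exposition format.
// A real app would more likely use prom-client's Counter/Histogram.

type Labels = Record<string, string>;

const requestCounts = new Map<string, { labels: Labels; value: number }>();

// Increment the series for this label set; key is independent of label order.
function incRequests(labels: Labels): void {
  const key = JSON.stringify(Object.entries(labels).sort());
  const series = requestCounts.get(key) ?? { labels, value: 0 };
  series.value += 1;
  requestCounts.set(key, series);
}

// Label values must escape backslash, double quote and newline.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Produce the body a /metrics endpoint would return.
function renderMetrics(): string {
  const lines = [
    "# HELP http_requests_total Total HTTP requests by route and status.",
    "# TYPE http_requests_total counter",
  ];
  for (const { labels, value } of requestCounts.values()) {
    const labelText = Object.entries(labels)
      .map(([k, v]) => `${k}="${escapeLabelValue(v)}"`)
      .join(",");
    lines.push(`http_requests_total{${labelText}} ${value}`);
  }
  return lines.join("\n") + "\n";
}
```

The body would typically be served with a `Content-Type` of `text/plain; version=0.0.4`. Labelling by route pattern rather than raw URL keeps the number of series bounded, which matters for Prometheus storage.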

It would be great to add documentation on 1) using Grafana and 2) adding metrics that aren't included by default (namely HTTP response time/status by route, frequency/duration of SQL queries, and custom metrics).
