This application exposes Prometheus-compatible metrics on a separate port from the main API server.
The metrics server's port is configured via the METRICS_PORT environment variable:
# Default: 9090
METRICS_PORT=9090
Add this to your .env file. See .env.sample for reference.
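A minimal sketch of how such a setting might be read, assuming the application falls back to 9090 when METRICS_PORT is unset or invalid. The helper name `resolveMetricsPort` and the fallback behaviour for invalid values are assumptions for illustration, not taken from the codebase.

```typescript
// Hypothetical helper; the real lookup in the codebase may differ.
const DEFAULT_METRICS_PORT = 9090;

function resolveMetricsPort(env: Record<string, string | undefined>): number {
  const raw = env.METRICS_PORT;
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_METRICS_PORT;
  }
  const port = Number(raw);
  // Assumption: anything that is not a valid TCP port falls back to the default.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    return DEFAULT_METRICS_PORT;
  }
  return port;
}
```

In a Node.js process this would be called as `resolveMetricsPort(process.env)`.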
The metrics are served at:
http://localhost:9090/metrics
(Replace 9090 with your configured METRICS_PORT if it differs.)
The following default Node.js metrics are automatically collected:
- nodejs_version_info - Node.js version information
- process_cpu_user_seconds_total - Total user CPU time spent in seconds
- process_cpu_system_seconds_total - Total system CPU time spent in seconds
- nodejs_heap_size_total_bytes - Total heap size in bytes
- nodejs_heap_size_used_bytes - Used heap size in bytes
- nodejs_external_memory_bytes - External memory in bytes
- nodejs_heap_space_size_total_bytes - Total heap space size in bytes
- nodejs_heap_space_size_used_bytes - Used heap space size in bytes
- nodejs_eventloop_lag_seconds - Event loop lag in seconds
- nodejs_eventloop_lag_min_seconds - Minimum event loop lag
- nodejs_eventloop_lag_max_seconds - Maximum event loop lag
- nodejs_eventloop_lag_mean_seconds - Mean event loop lag
- nodejs_eventloop_lag_stddev_seconds - Standard deviation of event loop lag
- nodejs_eventloop_lag_p50_seconds - 50th percentile event loop lag
- nodejs_eventloop_lag_p90_seconds - 90th percentile event loop lag
- nodejs_eventloop_lag_p99_seconds - 99th percentile event loop lag
Duration of HTTP requests in seconds, labeled by:
- method - HTTP method (GET, POST, etc.)
- route - Request route/path
- status_code - HTTP status code
Buckets: 0.01, 0.05, 0.1, 0.5, 1, 5, 10 seconds
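Prometheus histogram buckets are cumulative: each observation increments every bucket whose upper bound is greater than or equal to the observed value, plus the implicit +Inf bucket. The sketch below illustrates this with the bucket boundaries listed above. It is a standalone illustration rather than the prom-client implementation, and `bucketCounts` is a hypothetical name.

```typescript
// Bucket upper bounds in seconds, as listed for the HTTP duration histogram.
const HTTP_BUCKETS = [0.01, 0.05, 0.1, 0.5, 1, 5, 10];

// Returns cumulative counts keyed by upper bound ("le"), including "+Inf".
function bucketCounts(durations: number[], bounds: number[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const le of bounds) {
    counts.set(String(le), 0);
  }
  counts.set("+Inf", 0);

  for (const d of durations) {
    for (const le of bounds) {
      // An observation counts toward every bucket whose bound it does not exceed.
      if (d <= le) {
        counts.set(String(le), (counts.get(String(le)) ?? 0) + 1);
      }
    }
    // The +Inf bucket always equals the total number of observations.
    counts.set("+Inf", (counts.get("+Inf") ?? 0) + 1);
  }
  return counts;
}
```

For example, durations of 0.003, 0.2, 0.7 and 12 seconds yield a count of 1 for le=0.01, 2 for le=0.5, 3 for le=1, and 4 for +Inf. A request slower than 10 seconds is visible only in +Inf, so its exact duration cannot be recovered from the buckets.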
Total number of HTTP requests, labeled by:
- method - HTTP method (GET, POST, etc.)
- route - Request route/path
- status_code - HTTP status code
Histogram of total GraphQL operation duration by operation name and type.
Labels:
- operation_name - Name of the GraphQL operation
- operation_type - Type of operation (query, mutation, subscription)
Buckets: 0.01, 0.05, 0.1, 0.5, 1, 5, 10 seconds
Purpose: Identify slow API operations (P95/P99 latency).
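P95/P99 values are not stored directly. They are estimated from the cumulative bucket counts, typically with PromQL's `histogram_quantile`, which interpolates linearly inside the bucket that contains the target rank. The following is a simplified sketch of that estimation, assuming 0 < q ≤ 1 and non-negative bounds; the names are hypothetical and the counts in the usage note are made up for illustration.

```typescript
// le = bucket upper bound in seconds (Infinity for +Inf); count is cumulative.
interface Bucket {
  le: number;
  count: number;
}

function estimateQuantile(q: number, buckets: Bucket[]): number {
  if (buckets.length < 2) return NaN;
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total;
  // First bucket whose cumulative count reaches the target rank.
  const idx = sorted.findIndex((b) => b.count >= rank);

  // If the rank lands in +Inf, the best available answer is the highest finite bound.
  if (sorted[idx].le === Infinity) return sorted[sorted.length - 2].le;

  const lower = idx === 0 ? 0 : sorted[idx - 1].le;
  const below = idx === 0 ? 0 : sorted[idx - 1].count;
  const inBucket = sorted[idx].count - below;
  // Linear interpolation between the bucket's lower and upper bound.
  return lower + (sorted[idx].le - lower) * ((rank - below) / inBucket);
}
```

With 100 observations and cumulative counts of 0, 10, 50, 90, 98, 100, 100 for the bounds 0.01 through 10, the P95 estimate falls in the (0.5, 1] bucket at roughly 0.8125 seconds. The estimate is only as precise as the bucket it lands in, which is why wide buckets such as (1, 5] give coarse P99 values.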
Counter of failed GraphQL operations grouped by operation name and error class.
Labels:
- operation_name - Name of the GraphQL operation
- error_type - Type/class of the error
Purpose: Detect increased error rates and failing operations.
Histogram of resolver execution time per type, field, and operation.
Labels:
- type_name - GraphQL type name
- field_name - Field name being resolved
- operation_name - Name of the GraphQL operation
Buckets: 0.01, 0.05, 0.1, 0.5, 1, 5 seconds
Purpose: Find slow or CPU-intensive resolvers that degrade overall performance.
Histogram of MongoDB command duration by command, collection family, and database.
Labels:
- command - MongoDB command name (find, insert, update, etc.)
- collection_family - Collection family name (extracted from dynamic collection names to reduce cardinality)
- db - Database name
Buckets: 0.01, 0.05, 0.1, 0.5, 1, 5, 10 seconds
Purpose: Detect slow queries and high-latency collections.
Note on Collection Families: To reduce metric cardinality, dynamic collection names are grouped into families. For example:
- events:projectId → events
- dailyEvents:projectId → dailyEvents
- repetitions:projectId → repetitions
- membership:userId → membership
- team:workspaceId → team
This prevents metric explosion when dealing with thousands of projects, users, or workspaces, while still providing meaningful insights into collection performance patterns.
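The mapping above can be sketched as a small function. This assumes the family is simply the part of the collection name before the first ":" separator; `collectionFamily` is a hypothetical name, and the actual extraction logic in the codebase may differ.

```typescript
// Hypothetical helper illustrating the grouping; not the project's actual code.
function collectionFamily(collectionName: string): string {
  const separator = collectionName.indexOf(":");
  // Assumption: the family is everything before the first ":". Names without
  // a separator are treated as static collections and returned unchanged.
  return separator === -1 ? collectionName : collectionName.slice(0, separator);
}
```

Under this assumption, every per-project collection such as events:projectId collapses into the single label value events, so the number of time series grows with the number of families rather than the number of projects.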
Counter of failed MongoDB commands grouped by command and error code.
Labels:
- command - MongoDB command name
- error_code - MongoDB error code
Purpose: Track transient or persistent database errors.
You can test the metrics endpoint using curl:
curl http://localhost:9090/metrics
Or run the provided test script:
./test-metrics.sh
Integration tests for metrics are located in test/integration/cases/metrics.test.ts.
Run them with:
npm run test:integration
The metrics implementation uses the prom-client library and consists of:
- Metrics Module (src/metrics/index.ts):
  - Initializes a Prometheus registry
  - Configures default Node.js metrics collection
  - Defines custom HTTP metrics (duration histogram and request counter)
  - Registers GraphQL and MongoDB metrics
  - Provides middleware for tracking HTTP requests
  - Creates a separate Express app for serving metrics
- GraphQL Metrics (src/metrics/graphql.ts):
  - Implements Apollo Server plugin for tracking GraphQL operations
  - Tracks operation duration, errors, and resolver execution time
  - Automatically captures operation name, type, and field information
- MongoDB Metrics (src/metrics/mongodb.ts):
  - Implements MongoDB command monitoring
  - Tracks command duration and errors
  - Uses MongoDB's command monitoring events
  - Extracts collection families from dynamic collection names to reduce cardinality
- Integration (src/index.ts, src/mongo.ts):
  - Adds GraphQL metrics plugin to Apollo Server
  - Adds metrics middleware to the main Express app
  - Enables MongoDB command monitoring on database clients
  - Starts metrics server on a separate port
  - Keeps metrics server isolated from main API traffic
To scrape these metrics with Prometheus, add the following to your prometheus.yml:
scrape_configs:
  - job_name: 'hawk-api'
    static_configs:
      - targets: ['localhost:9090']
Adjust the target host and port according to your deployment.