Merged
89 changes: 87 additions & 2 deletions docs/METRICS.md
@@ -55,7 +55,7 @@ Duration of HTTP requests in seconds, labeled by:
- `route` - Request route/path
- `status_code` - HTTP status code

-Buckets: 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10 seconds
+Buckets: 0.01, 0.05, 0.1, 0.5, 1, 5, 10 seconds
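Dropping the 0.001 and 0.005 bounds changes how fast requests are bucketed. The dependency-free sketch below shows how a Prometheus-style histogram fills cumulative buckets; it illustrates the idea and is not `prom-client` internals, and `cumulativeBucketCounts` plus the sample observations are hypothetical.

```typescript
/**
 * Sketch of cumulative histogram buckets (illustration, not prom-client code).
 * Each bucket counts every observation less than or equal to its upper bound `le`;
 * the implicit +Inf bucket counts all observations.
 */
function cumulativeBucketCounts(observations: number[], bounds: number[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const le of bounds) {
    counts.set(String(le), observations.filter((v) => v <= le).length);
  }
  counts.set('+Inf', observations.length);
  return counts;
}

// Hypothetical request durations in seconds, bucketed with the new HTTP bounds.
const counts = cumulativeBucketCounts([0.002, 0.004, 0.03, 0.2, 7], [0.01, 0.05, 0.1, 0.5, 1, 5, 10]);
console.log(counts.get('0.01')); // 2: the 2 ms and 4 ms requests share the first bucket
```

With the new bounds, every request faster than 10 ms lands in the `le="0.01"` bucket, so sub-10 ms latencies can no longer be told apart.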

#### http_requests_total (Counter)

@@ -64,6 +64,77 @@ Total number of HTTP requests, labeled by:
- `route` - Request route/path
- `status_code` - HTTP status code

### GraphQL Metrics

#### hawk_gql_operation_duration_seconds (Histogram)

Histogram of total GraphQL operation duration by operation name and type.

Labels:
- `operation_name` - Name of the GraphQL operation
- `operation_type` - Type of operation (query, mutation, subscription)

Buckets: 0.01, 0.05, 0.1, 0.5, 1, 5, 10 seconds

**Purpose**: Identify slow API operations (P95/P99 latency).
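P95/P99 values are not stored directly; they are estimated from the cumulative bucket counts. The sketch below shows a simplified version of the linear interpolation behind PromQL's `histogram_quantile` (it takes one aggregated snapshot of counts and ignores rate windows); `estimateQuantile` and the sample counts are hypothetical.

```typescript
// One cumulative bucket: `count` observations were <= `le` seconds.
interface Bucket {
  le: number; // upper bound in seconds; Infinity for the +Inf bucket
  count: number;
}

/**
 * Simplified sketch of histogram_quantile-style estimation.
 * Assumes cumulative counts and that the lowest bucket starts at 0.
 */
function estimateQuantile(q: number, buckets: Bucket[]): number {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  if (sorted.length === 0) return NaN;
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total; // number of observations at or below the quantile
  let lowerBound = 0;
  let countBelow = 0;
  for (const b of sorted) {
    if (b.count >= rank) {
      // Quantile falls in the +Inf bucket: return the highest finite bound.
      if (!Number.isFinite(b.le)) return lowerBound;
      const inBucket = b.count - countBelow;
      const fraction = inBucket === 0 ? 0 : (rank - countBelow) / inBucket;
      // Linear interpolation between the bucket's lower and upper bounds.
      return lowerBound + (b.le - lowerBound) * fraction;
    }
    lowerBound = b.le;
    countBelow = b.count;
  }
  return lowerBound;
}

// Hypothetical snapshot of hawk_gql_operation_duration_seconds bucket counts.
const snapshot: Bucket[] = [
  { le: 0.01, count: 40 },
  { le: 0.05, count: 70 },
  { le: 0.1, count: 85 },
  { le: 0.5, count: 95 },
  { le: 1, count: 98 },
  { le: 5, count: 100 },
  { le: 10, count: 100 },
  { le: Infinity, count: 100 },
];
console.log(estimateQuantile(0.5, snapshot)); // ~0.0233
```

Because the estimate interpolates inside a bucket, its precision is bounded by bucket width: a P95 that falls in the 0.1 to 0.5 s bucket is only located somewhere within that 0.4 s range.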

#### hawk_gql_operation_errors_total (Counter)

Counter of failed GraphQL operations grouped by operation name and error class.

Labels:
- `operation_name` - Name of the GraphQL operation
- `error_type` - Type/class of the error

**Purpose**: Detect increased error rates and failing operations.
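In `graphqlMetricsPlugin`, the `error_type` label comes from `error.extensions.code`, falling back to `error.name` and then `'unknown'`. A standalone sketch of that fallback order follows; unlike the plugin, which casts the value with `as string`, it only accepts string codes, and `ErrorLike` / `errorTypeLabel` are hypothetical stand-ins rather than graphql-js or Apollo types.

```typescript
// Simplified stand-in for the parts of GraphQLError the label needs.
interface ErrorLike {
  name?: string;
  extensions?: { code?: unknown };
}

/** Mirrors the plugin's fallback: extensions.code, then name, then 'unknown'. */
function errorTypeLabel(error: ErrorLike): string {
  const code = error.extensions?.code;
  if (typeof code === 'string' && code.length > 0) return code;
  return error.name || 'unknown';
}

console.log(errorTypeLabel({ name: 'GraphQLError', extensions: { code: 'BAD_USER_INPUT' } })); // BAD_USER_INPUT
console.log(errorTypeLabel({ name: 'GraphQLError' })); // GraphQLError
```

Apollo Server typically sets `extensions.code` (for example `BAD_USER_INPUT`), so most errors end up grouped by that code rather than by the generic class name.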

#### hawk_gql_resolver_duration_seconds (Histogram)

Histogram of resolver execution time per type, field, and operation.

Labels:
- `type_name` - GraphQL type name
- `field_name` - Field name being resolved
- `operation_name` - Name of the GraphQL operation

Buckets: 0.01, 0.05, 0.1, 0.5, 1, 5 seconds

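**Purpose**: Find slow resolvers that degrade overall operation latency. Durations are wall-clock time, so they include time spent waiting on I/O (such as database calls), not only CPU work.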
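The plugin's `willResolveField` hook records a start time and returns a callback that Apollo invokes once the field has resolved. The same closure pattern, decoupled from Apollo so it runs on its own, looks roughly like this; `makeFieldTimer`, the injectable `now` clock, the `observe` callback, and the `Query.projects` / `GetProjects` names are hypothetical.

```typescript
// Label set used by hawk_gql_resolver_duration_seconds.
interface ResolverLabels {
  type_name: string;
  field_name: string;
  operation_name: string;
}

// Minimal subset of graphql-js GraphQLResolveInfo used by this sketch.
interface FieldInfo {
  parentType: { name: string };
  fieldName: string;
}

/**
 * Captures a start time when a field begins resolving and returns a callback
 * that reports elapsed seconds. `observe` stands in for
 * histogram.labels(...).observe(...); `now` is injectable for testing.
 */
function makeFieldTimer(
  info: FieldInfo,
  operationName: string,
  observe: (labels: ResolverLabels, seconds: number) => void,
  now: () => number = Date.now, // milliseconds
): () => void {
  const start = now();
  return (): void => {
    observe(
      { type_name: info.parentType.name, field_name: info.fieldName, operation_name: operationName },
      (now() - start) / 1000,
    );
  };
}

// Usage with a fake clock: the field "takes" 250 ms.
let fakeMs = 1000;
const observed: Array<{ labels: ResolverLabels; seconds: number }> = [];
const done = makeFieldTimer(
  { parentType: { name: 'Query' }, fieldName: 'projects' },
  'GetProjects',
  (labels, seconds) => observed.push({ labels, seconds }),
  () => fakeMs,
);
fakeMs += 250;
done();
console.log(observed[0].seconds); // 0.25
```

`Date.now()` has millisecond resolution, so resolvers that finish in under a millisecond record 0 and land in the lowest bucket; `process.hrtime.bigint()` would give nanosecond resolution if finer timing were ever needed.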

### MongoDB Metrics

#### hawk_mongo_command_duration_seconds (Histogram)

Histogram of MongoDB command duration by command, collection family, and database.

Labels:
- `command` - MongoDB command name (find, insert, update, etc.)
- `collection_family` - Collection family name (extracted from dynamic collection names to reduce cardinality)
- `db` - Database name

Buckets: 0.01, 0.05, 0.1, 0.5, 1, 5, 10 seconds

**Purpose**: Detect slow queries and high-latency collections.

**Note on Collection Families**: To reduce metric cardinality, dynamic collection names are grouped into families. For example:
- `events:projectId` → `events`
- `dailyEvents:projectId` → `dailyEvents`
- `repetitions:projectId` → `repetitions`
- `membership:userId` → `membership`
- `team:workspaceId` → `team`

This prevents a cardinality explosion when there are thousands of projects, users, or workspaces, while still showing how each collection family performs.
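`src/metrics/mongodb.ts` is not part of this excerpt, so the extraction rule below is an assumption inferred from the examples above (keep everything before the first `:`); `collectionFamily` is a hypothetical name.

```typescript
/**
 * Hypothetical collection-family extractor: keeps the part of a collection
 * name before the first ':' (e.g. 'events:<projectId>' -> 'events').
 * Names without ':' are already a family and are returned unchanged.
 */
function collectionFamily(collection: string | undefined): string {
  if (!collection) return 'unknown';
  const separator = collection.indexOf(':');
  return separator === -1 ? collection : collection.slice(0, separator);
}

console.log(collectionFamily('dailyEvents:5e4ff518628a6c714515f844')); // dailyEvents
console.log(collectionFamily('users')); // users
```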

#### hawk_mongo_command_errors_total (Counter)

Counter of failed MongoDB commands grouped by command and error code.

Labels:
- `command` - MongoDB command name
- `error_code` - MongoDB error code

**Purpose**: Track transient or persistent database errors.
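How `error_code` is derived is not shown in this excerpt. A minimal sketch, assuming the value is read from the `code` property that MongoDB server errors carry (for example `11000` for a duplicate key) and that errors without a code are reported as `'unknown'`; `mongoErrorCodeLabel` is a hypothetical name.

```typescript
/**
 * Hypothetical helper: turns a failed command's error into an `error_code`
 * label value. Server errors expose a numeric `code`; some other errors carry
 * a string code; anything else becomes 'unknown'.
 */
function mongoErrorCodeLabel(failure: unknown): string {
  if (typeof failure === 'object' && failure !== null && 'code' in failure) {
    const code = (failure as { code?: unknown }).code;
    if (typeof code === 'number' || typeof code === 'string') {
      return String(code);
    }
  }
  return 'unknown';
}

console.log(mongoErrorCodeLabel({ code: 11000, message: 'E11000 duplicate key error' })); // 11000
console.log(mongoErrorCodeLabel(new Error('socket closed'))); // unknown
```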

## Testing

### Manual Testing
@@ -98,11 +169,25 @@ The metrics implementation uses the `prom-client` library and consists of:
- Initializes a Prometheus registry
- Configures default Node.js metrics collection
- Defines custom HTTP metrics (duration histogram and request counter)
- Registers GraphQL and MongoDB metrics
- Provides middleware for tracking HTTP requests
- Creates a separate Express app for serving metrics

-2. **Integration** (`src/index.ts`):
+2. **GraphQL Metrics** (`src/metrics/graphql.ts`):
- Implements Apollo Server plugin for tracking GraphQL operations
- Tracks operation duration, errors, and resolver execution time
- Automatically captures operation name, type, and field information

3. **MongoDB Metrics** (`src/metrics/mongodb.ts`):
- Implements MongoDB command monitoring
- Tracks command duration and errors
- Uses MongoDB's command monitoring events
- Extracts collection families from dynamic collection names to reduce cardinality

4. **Integration** (`src/index.ts`, `src/mongo.ts`):
- Adds GraphQL metrics plugin to Apollo Server
- Adds metrics middleware to the main Express app
- Enables MongoDB command monitoring on database clients
- Starts metrics server on a separate port
- Keeps metrics server isolated from main API traffic
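The command-monitoring approach in item 3 can be sketched without the driver as follows. The event interfaces are assumed, trimmed-down subsets of the Node.js driver's `CommandStartedEvent`, `CommandSucceededEvent`, and `CommandFailedEvent`; `CommandMetricsTracker`, its callbacks, and the prefix-before-`:` family rule are hypothetical and not necessarily how `src/metrics/mongodb.ts` works. The command document (and therefore the collection name) is only available on the start event, which is why the sketch keeps labels per `requestId` until the matching completion event arrives.

```typescript
// Assumed, trimmed-down subsets of the driver's command monitoring events.
interface StartedEvent {
  requestId: number;
  commandName: string;
  databaseName: string;
  command: Record<string, unknown>;
}

interface SucceededEvent {
  requestId: number;
  commandName: string;
  duration: number; // milliseconds
}

interface FailedEvent extends SucceededEvent {
  failure: unknown;
}

interface CommandLabels {
  command: string;
  collection_family: string;
  db: string;
}

type ObserveDuration = (labels: CommandLabels, seconds: number) => void;
type CountError = (command: string, failure: unknown) => void;

/** Hypothetical tracker that correlates start and finish events by requestId. */
class CommandMetricsTracker {
  // Labels captured on start, held until the matching succeeded/failed event.
  private readonly pending = new Map<number, CommandLabels>();
  private readonly observeDuration: ObserveDuration;
  private readonly countError: CountError;

  constructor(observeDuration: ObserveDuration, countError: CountError) {
    this.observeDuration = observeDuration;
    this.countError = countError;
  }

  onStarted(event: StartedEvent): void {
    // CRUD commands carry the collection name under the command key, e.g. { find: 'events:abc' };
    // others (such as getMore) do not, and are labeled 'unknown'.
    const target = event.command[event.commandName];
    const collection = typeof target === 'string' ? target : 'unknown';
    this.pending.set(event.requestId, {
      command: event.commandName,
      collection_family: collection.split(':')[0], // assumed family rule
      db: event.databaseName,
    });
  }

  onSucceeded(event: SucceededEvent): void {
    this.finish(event);
  }

  onFailed(event: FailedEvent): void {
    this.finish(event);
    this.countError(event.commandName, event.failure);
  }

  private finish(event: SucceededEvent): void {
    const labels = this.pending.get(event.requestId);
    this.pending.delete(event.requestId);
    if (labels) {
      this.observeDuration(labels, event.duration / 1000); // ms -> seconds
    }
  }
}

// Usage with fake events; 'hawk' is a placeholder database name.
const durations: Array<{ labels: CommandLabels; seconds: number }> = [];
const failedCommands: string[] = [];
const tracker = new CommandMetricsTracker(
  (labels, seconds) => { durations.push({ labels, seconds }); },
  (command) => { failedCommands.push(command); },
);
tracker.onStarted({ requestId: 1, commandName: 'find', databaseName: 'hawk', command: { find: 'events:abc' } });
tracker.onSucceeded({ requestId: 1, commandName: 'find', duration: 250 });
tracker.onStarted({ requestId: 2, commandName: 'insert', databaseName: 'hawk', command: { insert: 'users' } });
tracker.onFailed({ requestId: 2, commandName: 'insert', duration: 5, failure: new Error('duplicate key') });
```

With the real driver, such a tracker would be attached to a `MongoClient` created with `monitorCommands: true`, for example `client.on('commandStarted', (e) => tracker.onStarted(e))`, and likewise for `commandSucceeded` and `commandFailed`, with the callbacks forwarding to the duration histogram and error counter.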

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "hawk.api",
-"version": "1.1.42",
+"version": "1.2.0",
"main": "index.ts",
"license": "BUSL-1.1",
"scripts": {
3 changes: 2 additions & 1 deletion src/index.ts
Original file line number Diff line number Diff line change
@@ -27,7 +27,7 @@ import BusinessOperationsFactory from './models/businessOperationsFactory';
import schema from './schema';
import { graphqlUploadExpress } from 'graphql-upload';
import morgan from 'morgan';
-import { metricsMiddleware, createMetricsServer } from './metrics';
+import { metricsMiddleware, createMetricsServer, graphqlMetricsPlugin } from './metrics';

/**
* Option to enable playground
@@ -122,6 +122,7 @@ class HawkAPI {
process.env.NODE_ENV === 'production'
? ApolloServerPluginLandingPageDisabled()
: ApolloServerPluginLandingPageGraphQLPlayground(),
graphqlMetricsPlugin,
],
context: ({ req }): ResolverContextBase => req.context,
formatError: (error): GraphQLError => {
93 changes: 93 additions & 0 deletions src/metrics/graphql.ts
@@ -0,0 +1,93 @@
import client from 'prom-client';
import { ApolloServerPlugin, GraphQLRequestContext, GraphQLRequestListener } from 'apollo-server-plugin-base';
import { GraphQLError } from 'graphql';

/**
 * GraphQL operation duration histogram
 * Tracks GraphQL operation duration by operation name and type
 */
export const gqlOperationDuration = new client.Histogram({
  name: 'hawk_gql_operation_duration_seconds',
  help: 'Histogram of total GraphQL operation duration by operation name and type',
  labelNames: ['operation_name', 'operation_type'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 5, 10],
});

/**
 * GraphQL operation errors counter
 * Tracks failed GraphQL operations grouped by operation name and error class
 */
export const gqlOperationErrors = new client.Counter({
  name: 'hawk_gql_operation_errors_total',
  help: 'Counter of failed GraphQL operations grouped by operation name and error class',
  labelNames: ['operation_name', 'error_type'],
});

/**
 * GraphQL resolver duration histogram
 * Tracks resolver execution time per type, field, and operation
 */
export const gqlResolverDuration = new client.Histogram({
  name: 'hawk_gql_resolver_duration_seconds',
  help: 'Histogram of resolver execution time per type, field, and operation',
  labelNames: ['type_name', 'field_name', 'operation_name'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 5],
});

/**
 * Apollo Server plugin to track GraphQL metrics
 */
export const graphqlMetricsPlugin: ApolloServerPlugin = {
  async requestDidStart(_requestContext: GraphQLRequestContext): Promise<GraphQLRequestListener> {
    const startTime = Date.now();
    let operationName = 'unknown';
    let operationType = 'unknown';

    return {
      async didResolveOperation(ctx: GraphQLRequestContext): Promise<void> {
        operationName = ctx.operationName || 'anonymous';
        operationType = ctx.operation?.operation || 'unknown';
      },

      async executionDidStart(): Promise<GraphQLRequestListener> {
        return {
          // eslint-disable-next-line @typescript-eslint/no-explicit-any
          willResolveField({ info }: any): () => void {
            const fieldStartTime = Date.now();

            return (): void => {
              const duration = (Date.now() - fieldStartTime) / 1000;

              gqlResolverDuration
                .labels(
                  info.parentType.name,
                  info.fieldName,
                  operationName
                )
                .observe(duration);
            };
          },
        };
      },

      async willSendResponse(ctx: GraphQLRequestContext): Promise<void> {
        const duration = (Date.now() - startTime) / 1000;

        gqlOperationDuration
          .labels(operationName, operationType)
          .observe(duration);

        // Track errors if any
        if (ctx.errors && ctx.errors.length > 0) {
          ctx.errors.forEach((error: GraphQLError) => {
            const errorType = error.extensions?.code || error.name || 'unknown';

            gqlOperationErrors
              .labels(operationName, errorType as string)
              .inc();
          });
        }
      },
    };
  },
};
26 changes: 25 additions & 1 deletion src/metrics/index.ts
@@ -1,5 +1,7 @@
import client from 'prom-client';
import express from 'express';
import { gqlOperationDuration, gqlOperationErrors, gqlResolverDuration } from './graphql';
import { mongoCommandDuration, mongoCommandErrors } from './mongodb';

/**
* Create a Registry to register the metrics
@@ -19,7 +21,7 @@ const httpRequestDuration = new client.Histogram({
name: 'http_request_duration_seconds',
help: 'Duration of HTTP requests in seconds',
labelNames: ['method', 'route', 'status_code'],
-buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10],
+buckets: [0.01, 0.05, 0.1, 0.5, 1, 5, 10],
registers: [ register ],
});

@@ -34,8 +36,24 @@ const httpRequestCounter = new client.Counter({
registers: [ register ],
});

/**
* Register GraphQL metrics
*/
register.registerMetric(gqlOperationDuration);
register.registerMetric(gqlOperationErrors);
register.registerMetric(gqlResolverDuration);

/**
* Register MongoDB metrics
*/
register.registerMetric(mongoCommandDuration);
register.registerMetric(mongoCommandErrors);

/**
* Express middleware to track HTTP metrics
* @param req - Express request object
* @param res - Express response object
* @param next - Express next function
*/
export function metricsMiddleware(req: express.Request, res: express.Response, next: express.NextFunction): void {
const start = Date.now();
@@ -71,3 +89,9 @@ export function createMetricsServer(): express.Application {

return metricsApp;
}

/**
* Export GraphQL metrics plugin and MongoDB metrics setup
*/
export { graphqlMetricsPlugin } from './graphql';
export { setupMongoMetrics, withMongoMetrics } from './mongodb';