Observability

This document covers logging, error tracking, and monitoring in the cloud portal.


Overview

The observability stack consists of:

| Component | Purpose | Local | Production |
|---|---|---|---|
| Sentry | Error tracking | Optional | Required |
| OpenTelemetry | Distributed tracing | Jaeger | Grafana Tempo |
| Prometheus | Metrics collection | Local | Cloud |
| Grafana | Dashboards | Local | Cloud |

Logging

Console Logging

Use structured logging in development:

// Simple logging
console.log('User logged in', { userId, orgId });

// Error logging
console.error('Failed to fetch zones', { error, params });

Log Levels

  • console.log - General information
  • console.warn - Warnings, non-critical issues
  • console.error - Errors, exceptions

Server-Side Logging

Server logs are captured by the Hono server and forwarded to OTEL:

// In loaders/actions
import type { LoaderFunctionArgs } from 'react-router'; // assumption: React Router v7 types

export async function loader({ request }: LoaderFunctionArgs) {
  console.log('Loading zones', { url: request.url });
  // ...
}

Error Tracking (Sentry)

Configuration

Set up in .env:

SENTRY_DSN=https://xxx@sentry.io/xxx
SENTRY_ORG=datum
SENTRY_PROJECT=cloud-portal

Client-Side Errors

Sentry automatically captures:

  • Unhandled exceptions
  • Promise rejections
  • React error boundaries
  • Network errors
  • API errors (via axios interceptors; see the sketch below)
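
Automatic capture of API errors is wired through axios response interceptors. A minimal sketch of how such an interceptor might look, assuming the captureApiError helper described below (the portal's actual wiring may differ):

import axios, { AxiosError } from 'axios';
import { captureApiError } from '@/modules/sentry';

// Hypothetical interceptor: report failed responses to Sentry, then re-throw.
axios.interceptors.response.use(
  (response) => response,
  (error: AxiosError) => {
    captureApiError({
      error,
      method: error.config?.method?.toUpperCase() ?? 'UNKNOWN',
      url: error.config?.url ?? 'unknown',
      status: error.response?.status ?? 0,
      message: error.message,
    });
    return Promise.reject(error);
  },
);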

Sentry Module

The @/modules/sentry module provides centralized Sentry integration:

import {
  // Context - hierarchical enrichment
  setSentryUser,
  setSentryOrgContext,
  setSentryProjectContext,
  setSentryResourceContext,
  // Breadcrumbs - user journey tracking
  trackFormSubmit,
  trackFormSuccess,
  trackFormError,
  // Capture - error reporting
  captureError,
  captureApiError,
  captureMessage,
} from '@/modules/sentry';

Hierarchical Context

Context is set automatically at different levels:

// User context (set on login)
setSentryUser({ id: 'user-123', email: 'user@example.com' });

// Organization context (set in org layout)
setSentryOrgContext({ name: 'acme-corp', uid: 'org-abc' });

// Project context (set in project layout)
setSentryProjectContext({ name: 'my-project', uid: 'proj-xyz' });

// Resource context (set automatically from API responses)
setSentryResourceContext({
  kind: 'DNSZone',
  apiVersion: 'dns.networking.miloapis.com/v1alpha1',
  metadata: { name: 'example.com', namespace: 'default' },
});

Tags for Filtering

Filter issues in the Sentry dashboard using these tags:

| Tag | Description | Example |
|---|---|---|
| user.id | User identifier | user-123 |
| org.id | Organization name | acme-corp |
| project.id | Project name | my-project |
| resource.kind | K8s resource kind | DNSZone |
| resource.apiGroup | API group | dns.networking.miloapis.com |
| resource.type | Resource type (from URL) | dnszones |
| resource.name | Resource name | example.com |

API Error Capture

API errors are automatically captured with resource context:

// Automatic capture via axios interceptors
// Errors include: fingerprint, resource context, method, URL, status

// Manual capture
captureApiError({
  error: axiosError,
  method: 'GET',
  url: '/apis/dns.networking.miloapis.com/v1alpha1/dnszones/my-zone',
  status: 404,
  message: 'Not Found',
});

Error Grouping: Errors are grouped by resource type + API group + status code (see the fingerprint sketch after these examples):

  • API 404: GET dnszones (instead of generic "AxiosError")
  • API 401: POST projects
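
How the grouping key might be assembled is sketched below; the exact fields and order used internally by captureApiError are an assumption:

// Hypothetical helper; the real grouping logic lives inside captureApiError.
function apiErrorFingerprint(resourceType: string, apiGroup: string, status: number): string[] {
  // Groups e.g. all GET dnszones 404s together instead of one generic "AxiosError" bucket.
  return ['api-error', resourceType, apiGroup, String(status)];
}

// apiErrorFingerprint('dnszones', 'dns.networking.miloapis.com', 404)
// => ['api-error', 'dnszones', 'dns.networking.miloapis.com', '404']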

Form Tracking

Forms automatically track user interactions as breadcrumbs:

// Add name prop to forms for better tracking
<Form.Root name="dns-zone-create" schema={schema} onSubmit={handleSubmit}>
  ...
</Form.Root>

Tracked events (see the manual-tracking sketch after this list):

  • Form submit attempts
  • Validation errors (field names only, not values)
  • Submission success/failure
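
The same breadcrumbs can also be recorded manually with the track* helpers. A minimal sketch, assuming the helpers take a form name and (for errors) field names; createZone is a placeholder mutation:

import { trackFormSubmit, trackFormSuccess, trackFormError } from '@/modules/sentry';

declare function createZone(values: Record<string, unknown>): Promise<void>; // placeholder

async function handleSubmit(values: Record<string, unknown>) {
  trackFormSubmit('dns-zone-create'); // breadcrumb: submit attempt
  try {
    await createZone(values);
    trackFormSuccess('dns-zone-create'); // breadcrumb: success
  } catch (error) {
    // Track field names only, never values.
    trackFormError('dns-zone-create', Object.keys(values));
    throw error;
  }
}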

Performance Monitoring

Sentry tracks the following (see the init sketch after this list):

  • Page load times
  • Route transitions
  • API call durations
  • React component renders
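
Performance data is only collected when tracing is enabled at init time. A sketch assuming @sentry/react v8; the DSN variable name and sample rate are placeholders:

import * as Sentry from '@sentry/react';

Sentry.init({
  dsn: import.meta.env.VITE_SENTRY_DSN, // placeholder: however the DSN reaches the client
  integrations: [Sentry.browserTracingIntegration()],
  tracesSampleRate: 0.1, // placeholder: sample 10% of transactions
});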

Distributed Tracing (OpenTelemetry)

Architecture

Browser → Hono Server → Control Plane APIs
   │          │              │
   └──────────┴──────────────┘
              │
         Trace Context
              │
              ▼
     Jaeger (local) / Tempo (prod)

Configuration

# .env
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_SERVICE_NAME=cloud-portal
OTEL_ENABLED=true

Automatic Instrumentation

The following are automatically traced (see the bootstrap sketch after this list):

  • HTTP requests (incoming and outgoing)
  • Route handlers (loaders, actions)
  • Database queries
  • External API calls
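
A minimal sketch of the server-side bootstrap that enables this, assuming @opentelemetry/sdk-node with the Node auto-instrumentations (the portal's actual setup may differ):

import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  serviceName: process.env.OTEL_SERVICE_NAME, // "cloud-portal"
  traceExporter: new OTLPTraceExporter({
    // OTLP/HTTP traces endpoint, e.g. http://localhost:4318/v1/traces
    url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces`,
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();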

Manual Spans

For custom tracing:

import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('cloud-portal');

async function complexOperation() {
  return tracer.startActiveSpan('complex-operation', async (span) => {
    try {
      span.setAttribute('custom.attribute', 'value');

      // Nested span; ended even if the sub-operation throws
      const result = await tracer.startActiveSpan('sub-operation', async (subSpan) => {
        try {
          return await doSomething();
        } finally {
          subSpan.end();
        }
      });

      return result;
    } finally {
      span.end();
    }
  });
}

Viewing Traces

Local (Jaeger):

  1. Start observability stack: bun run dev:otel
  2. Open http://localhost:16686
  3. Select "cloud-portal" service
  4. Search for traces

Finding a Trace:

  • By trace ID from logs
  • By operation name
  • By tags (user ID, route, etc.)

Metrics (Prometheus)

Available Metrics

| Metric | Type | Description |
|---|---|---|
| http_requests_total | Counter | Total HTTP requests |
| http_request_duration_seconds | Histogram | Request latency |
| http_requests_in_flight | Gauge | Concurrent requests |
| nodejs_heap_size_bytes | Gauge | Memory usage |

Custom Metrics

import { Counter, Histogram } from 'prom-client';

// Counter
const zoneCreations = new Counter({
  name: 'dns_zone_creations_total',
  help: 'Total DNS zones created',
  labelNames: ['org_id'],
});

zoneCreations.inc({ org_id: orgId });

// Histogram
const queryDuration = new Histogram({
  name: 'dns_query_duration_seconds',
  help: 'DNS query duration',
  buckets: [0.1, 0.5, 1, 2, 5],
});

const timer = queryDuration.startTimer();
await performQuery();
timer();

Metrics Endpoint

Metrics are exposed at /metrics:

curl http://localhost:3000/metrics
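
A sketch of how the endpoint might be wired in the Hono server, assuming prom-client's default registry:

import { Hono } from 'hono';
import { register } from 'prom-client';

const app = new Hono();

// Expose everything registered in the default prom-client registry.
app.get('/metrics', async (c) => {
  c.header('Content-Type', register.contentType);
  return c.body(await register.metrics());
});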

Local Observability Stack

Starting the Stack

# Start all observability services
bun run dev:otel

# Or with docker-compose
docker-compose -f docker-compose.otel.yml up -d

Services

| Service | Port | URL / Notes |
|---|---|---|
| Jaeger UI | 16686 | http://localhost:16686 |
| Prometheus | 9090 | http://localhost:9090 |
| Grafana | 3001 | http://localhost:3001 |
| OTEL Collector | 4318 | HTTP receiver |
| OTEL Collector | 4317 | gRPC receiver |

Grafana Dashboards

Pre-configured dashboards:

  1. Application Overview - Request rate, error rate, latency
  2. Node.js Runtime - Memory, CPU, event loop
  3. API Performance - Per-endpoint metrics

Default credentials: admin/admin

Stopping the Stack

docker-compose -f docker-compose.otel.yml down

Production Observability

Sentry Setup

  1. Create project in Sentry
  2. Configure DSN in deployment
  3. Set up release tracking
  4. Configure alerts

Grafana Cloud

  1. Configure OTEL exporter endpoint (see the env sketch after this list)
  2. Set up Tempo for traces
  3. Configure Prometheus remote write
  4. Import dashboards
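
The exporter is typically pointed at Grafana Cloud through the standard OTLP environment variables. A sketch with placeholder values; the gateway URL and credentials are assumptions, so use the values from your Grafana Cloud stack:

# .env (placeholders)
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-<region>.grafana.net/otlp
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64 instance-id:token>
OTEL_SERVICE_NAME=cloud-portal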

Alert Rules

Configure alerts for the following conditions (a sample rule sketch follows the list):

  • Error rate > threshold
  • P99 latency > threshold
  • Memory usage > threshold
  • Failed health checks
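
A sample Prometheus alerting rule for the first condition, using the http_requests_total metric listed above (the status label and thresholds are placeholders):

groups:
  - name: cloud-portal-alerts
    rules:
      - alert: HighErrorRate
        # Assumes http_requests_total carries a `status` label; adjust to the real labels.
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "HTTP error rate above 5% for 5 minutes"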

Debugging with Observability

Tracing a Request

  1. Get trace ID from logs or network tab
  2. Search in Jaeger/Tempo
  3. Examine span timeline
  4. Check span attributes and logs

Correlating Errors

  1. Find error in Sentry
  2. Get trace ID from error context
  3. View full trace
  4. Identify root cause

Performance Analysis

  1. Open Grafana dashboard
  2. Identify slow endpoints
  3. View traces for slow requests
  4. Check span breakdown

Filtering Sentry Issues

Use tags to filter issues in the Sentry dashboard:

# Find all errors for a specific organization
org.id:acme-corp

# Find errors in a specific project
project.id:my-project

# Find all DNS Zone errors
resource.type:dnszones

# Find HTTP Proxy errors with 404 status
resource.type:httpproxies status:404

# Find all errors for a resource API group
resource.apiGroup:dns.networking.miloapis.com

# Combine filters for specific customer issues
org.id:acme-corp project.id:production resource.kind:HTTPProxy

Debugging Customer Issues

  1. Get customer org ID from support ticket
  2. Filter in Sentry: org.id:<customer-org>
  3. Check breadcrumbs for user journey (form submissions, API calls)
  4. View resource context to see what resource they were working on
  5. Correlate with trace ID for full request flow

Best Practices

DO

  • Add context to errors (user ID, org ID, resource ID)
  • Use structured logging
  • Add custom spans for complex operations
  • Set meaningful span names
  • Use captureApiError() for API errors (automatic fingerprinting)
  • Add name prop to forms for better tracking
  • Filter errors by resource tags in Sentry dashboard

DON'T

  • Log sensitive data (tokens, passwords)
  • Create too many custom metrics
  • Ignore high-cardinality labels
  • Skip error context
  • Use Sentry.captureException() directly for API errors (use captureApiError())
  • Track form field values (only track field names)

Related Documentation