Skip to content

Add notification delivery interval config and server timeouts#283

Merged
biglyan merged 3 commits into
mainfrom
claude/sharp-ramanujan-bQhTu
May 23, 2026
Merged

Add notification delivery interval config and server timeouts#283
biglyan merged 3 commits into
mainfrom
claude/sharp-ramanujan-bQhTu

Conversation

@biglyan
Copy link
Copy Markdown
Contributor

@biglyan biglyan commented May 23, 2026

Summary

This PR introduces configuration for notification delivery loop timing and adds HTTP server-level timeout and body size limits to improve operational control and security.

Key changes:

  • Add NotificationDeliveryInterval config to decouple notification delivery timing from alert evaluation
  • Add WORKERFLEET_NOTIFICATION_DELIVERY_INTERVAL environment variable support
  • Add HandlerTimeout and MaxBodyBytes to ServerConfig with environment variable overrides
  • Wire body limit and timeout middleware into the HTTP server
  • Refactor test helpers to use newRuntime() for cleaner test setup
  • Remove WorkerDetail type alias in favor of direct WorkerView usage

Type of change

  • Feature
  • Refactor (no behavior change)

Scope and boundaries

Touched (directories/packages):

  • internal/app/config.go — Runtime and server configuration
  • internal/app/server_config.go — Server configuration loading
  • internal/app/app.go — Middleware wiring
  • internal/app/types.go — Type alias removal
  • internal/app/service.go — Return type consistency
  • internal/app/*_test.go — Test refactoring to use newRuntime()
  • internal/handler/ — Test stub updates
  • env.example — Configuration documentation

Out of scope:

  • Core business logic
  • Storage layer
  • Worker registration/heartbeat handlers

Safe refactor zone declaration

  • Zone A — Free Refactor (internal/)

Plan

Objective:

  • Allow independent configuration of notification delivery loop timing
  • Add HTTP server-level protections (timeouts, body size limits)
  • Improve test maintainability by centralizing runtime setup

Approach:

  1. Add NotificationDeliveryInterval field to RuntimeConfig with default 30s
  2. Update notificationDeliveryLoopSettings() to use the new field instead of AlertEvaluationInterval
  3. Add HandlerTimeout and MaxBodyBytes to ServerConfig with sensible defaults
  4. Wire bodylimit and timeout middleware from plumego
  5. Refactor alert and loop tests to use newRuntime() helper instead of full Bootstrap()
  6. Remove unused WorkerDetail type alias

Public API impact:

  • Yes — ServerConfig struct gains two new fields (backward compatible with defaults)
  • RuntimeConfig.NotificationDeliveryInterval added (new field, backward compatible)
  • WorkerDetail type alias removed (internal type, no public impact)

Risks / invariants:

  • Notification delivery loop must continue to work correctly with independent timing
  • HTTP server must respect new timeout and body size limits without breaking legitimate requests
  • Existing tests must pass with refactored setup

Verification

Required checks

  • go test ./... — All existing tests pass with refactored test setup
  • Configuration loading — New environment variables parsed correctly in config_test.go
  • Loop settings — TestRuntimeLoopSettingsRemainIndependent verifies notification delivery uses correct interval
  • Server config — LoadServerConfig tests verify timeout and body bytes parsing

Evidence:

  • TestLoadConfigParsesRuntimeSettings validates WORKERFLEET_NOTIFICATION_DELIVERY_INTERVAL parsing
  • TestRuntimeLoopSettingsRemainIndependent confirms notification delivery loop uses NotificationDeliveryInterval (14s) not AlertEvaluationInterval (13s)
  • TestNewRuntime validates refactored runtime initialization
  • TestSweepWorkerStatusesMarksExpiredHeartbeatOffline and alert tests use simplified newRuntime() setup
  • Server config tests cover new timeout and body bytes fields with validation

https://claude.ai/code/session_01PKdanD7JDusUQdTVCgF3Cv

claude added 3 commits May 23, 2026 15:07
… add bodylimit+timeout middleware

- newRuntime: wire into tests that seed store directly (TestSweepWorkerStatuses,
  TestEvaluateAndNotifyAlerts, TestNotificationOutboxRetries); add TestNewRuntime
  to cover the constructor path explicitly

- Bug: notificationDeliveryLoopSettings was reusing AlertEvaluationInterval,
  making WORKERFLEET_ALERT_EVALUATION_INTERVAL silently control both loops;
  add NotificationDeliveryInterval field with independent env var
  WORKERFLEET_NOTIFICATION_DELIVERY_INTERVAL (default 30s)

- Security: add MaxBodyBytes (default 2 MiB) and HandlerTimeout (default 30s)
  to ServerConfig; wire bodylimit.Middleware + timeout.Middleware in newCoreApp,
  matching the standard-service middleware chain

- Remove WorkerDetail = WorkerView no-op type alias; replace all references with
  WorkerView directly

https://claude.ai/code/session_01PKdanD7JDusUQdTVCgF3Cv
…Rebalance

The atomic "drop if running" pattern in triggerRebalance caused dropped
rebalance requests when multiple consumers joined quickly. A rebalance
triggered by c3 could be silently discarded while the rebalance for c1/c2
was running, leaving c3 with zero partitions assigned.

Replace rebalancing atomic.Bool with a mutex-protected running+pending pair.
When a rebalance is already in progress the incoming request sets a pending
flag; the running goroutine re-runs the rebalance loop before releasing
ownership, guaranteeing every join/leave is eventually reflected.

https://claude.ai/code/session_01PKdanD7JDusUQdTVCgF3Cv
@biglyan biglyan merged commit c1b46f0 into main May 23, 2026
9 checks passed
@biglyan biglyan deleted the claude/sharp-ramanujan-bQhTu branch May 23, 2026 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants