Feature RBAC maintenance by bakhterets · Pull Request #152 · stackmon/status-dashboard-v3

bakhterets · 2026-02-05T09:25:42Z

Implements Role-Based Access Control (RBAC) for maintenance event management as specified in specs/001-maintenance-rbac/spec.md.

Changes:

RBAC Implementation:

Three roles with hierarchy: Admin > Operator > Creator
Role resolution from JWT groups claim via environment variables (SD_RBAC_GROUP_ADMINS, SD_RBAC_GROUP_OPERATORS, SD_RBAC_GROUP_CREATORS)
Middleware for authentication and role-based authorization

Maintenance Workflow:

Operator: Create maintenance → pending_review status, can modify/cancel own events while pending
Operator: Approve events (pending_review → reviewed), cancel any pending event, create with planned status
Admin: Unrestricted access, bypass all status restrictions
Checker auto-transitions reviewed → planned

Field Visibility:

creator, contact_email, version fields visible only to authenticated users
Events with pending_review or reviewed status hidden from unauthenticated users

Optimistic Locking:

Version-based conflict detection for concurrent updates
Returns 409 Conflict on version mismatch

Documentation:

New docs/auth/rbac.md with roles, permissions, and workflow
Updated openapi.yaml with security schemes, RBAC description, 401/403/409 responses
Updated spec FR-022-1 to include reviewed status visibility

Testing:

Unit tests for RBAC functions, middleware, validation
Integration tests with real HMAC JWT authentication
Coverage for role permissions, ownership, version conflicts, status transitions

Updates:

Go version updated to 1.25
golangci-lint updated to v2.8.0

Testing:

# Unit tests:
go test ./internal/... -count=1

# Integration tests (requires Docker):
go test ./tests/... -count=1
Related
- Spec: specs/001-maintenance-rbac/spec.md

hierarcy;

403 resp;

v2 created_by added for events and statuses;

TestValidateStatusesPatches relocated to v2_validation_test.go; 06 migration changed; new errores for maintenance validation; Incident model changed; TestValidateEventCreationTimes added; TestValidateEventCreationImpact added;

implemented SetJWTClaims.

…ame HasGroup to HasAuthorizedGroup

…gration RBAC tests

go version updated

… from unauthenticated users

bakhterets · 2026-02-18T12:31:26Z

make version optionally for incidents and info

role operator has been raised to allow to perform all CRUD actions; spec.md revised.

rbac tests redesigned, main tests fixed.

bakhterets · 2026-03-02T18:11:35Z

Multi-IdP Authentication & RBAC Security Hardening (refactoring)

Summary

Implements a "secure by default" architecture by removing all authentication/RBAC bypass toggles and adding dual-IdP support (Keycloak RSA + Local HMAC). Introduces structured audit logging, Keycloak resilience with retry/fallback, JWT audience validation, and comprehensive test coverage.

Breaking Changes

Removed env var	Replacement
`SD_AUTH_DISABLED`	Removed — auth is always enforced
`SD_RBAC_DISABLED`	Removed — RBAC is always enforced
`SD_PROVIDER_DISABLED`	Removed — provider is always active

SD_RBAC_GROUP_ADMINS is now mandatory. Application fails to start without it.
SD_SECRET_KEY must be ≥ 32 characters if HMAC provider is used.

Changes by Category

Security

Removed bypass toggles: AuthenticationDisabled, RBAC.Disabled, Provider.Disabled — no path exists to skip auth/RBAC
Dual-IdP dispatch: parseToken() routes by JWT alg header — RS256 → Keycloak public key, HS* → HMAC secret
Minimum secret key length: SD_SECRET_KEY requires ≥ 32 characters, enforced in validateProviders()
Config validation: Validate() ensures at least one provider (Keycloak or HMAC) is configured; Admins group is mandatory

Observability

Structured audit logging: authAudit() emits SIEM-ready events with fields: event=auth_audit, action, result, idp_type (keycloak | local_hmac), username, reason
IdP detection: idpTypeFromMethod() maps JWT signing method to human-readable provider name
Auth handler audit: All 5 OAuth handlers (login, callback, token_retrieve, logout, refresh) log structured events with action/result

Middleware Refactoring

DRY helper: validateAndSetClaims() extracts shared token parsing, claims extraction, and context population — used by both AuthenticationMW (write) and SetJWTClaims (read)
Conditional routes: Keycloak OAuth routes (/auth/*) only registered when Keycloak is configured

Tests (new files)

internal/api/auth/auth_test.go — 24 tests: handler coverage, Keycloak JWKS fetch, retry/cache behavior
internal/api/auth/storage_test.go — 6 tests: CRUD, overwrite, concurrent access
tests/rbac_admin_only_test.go — 10 admin-only RBAC integration subtests

Documentation

docs/auth/authentication.md — Dual-IdP architecture, middleware chain, audit logging
docs/auth/rbac.md — Removed SD_RBAC_DISABLED references, added resilience/audit sections
docs/testing/integration-tests.md — Admin-only tests, unit test coverage section
openapi.yaml — Updated security scheme, dual-IdP description, mandatory ADMINS group
specs/001-maintenance-rbac/spec.md — Removed RBAC_DISABLED, updated FR-002a and FR-026

Test Coverage

Package	Before	After
`internal/conf`	16.7%	74.5%
`internal/api`	~40%	53.5%
`internal/api/auth`	0%	78.4%
`internal/api/rbac`	100%	100%

Public key caching removed.

version field assert for incidents (tests) removed

…valuation

bakhterets · 2026-05-20T08:02:09Z

fix(checker): handle optimistic locking race condition in maintenance processing

Problem

The background checker goroutine (runs every 2 minutes) reads maintenance incidents from the database and updates their status based on time progression (planned → in_progress → completed).

When ModifyIncident uses optimistic locking (WHERE version = ?), a race condition occurs:

Checker reads incident with version=N
A concurrent API request (user action) updates the same incident, incrementing version to N+1
Checker attempts UPDATE ... WHERE version=N → 0 rows affected → ErrVersionConflict
Previously: the entire checker cycle aborted, leaving all remaining maintenances unprocessed

Solution

Introduced retry-with-refetch pattern in a new processMaintenance() method:

On ErrVersionConflict, re-reads the incident from DB (fresh version) and retries (up to 3 attempts)
Errors on individual maintenances no longer abort the entire cycle — logged and skipped
No changes to ModifyIncident, RBAC, or the optimistic locking mechanism itself

Behavior Change

Scenario	Before	After
Version conflict	Entire checker cycle fails	Retry up to 3x with fresh data
One maintenance fails	All remaining skipped	Others still processed
No conflict	1 SELECT + N UPDATEs	Same (no overhead)
Conflict occurs	1 SELECT + 1 failed UPDATE	+1 SELECT per retry attempt

Code Changes

internal/checker/maintenance.go

Extracted processMaintenance() — retry loop with re-fetch on ErrVersionConflict
Extracted evaluateAndFixMntStatus() — calculates actual status and backfills missing status entries
Extracted trackActiveMaintenance() — appends to active maintenance slice for watermark tracking
Fixed linter issues: variable shadowing (govet), integer range loop (intrange), cognitive complexity (gocognit)

Execution Flow

CheckMaintenance()
 ├── db.GetMaintenances(lastMntID)              [SQL SELECT + Preload]
 └── for each maintenance:
      └── processMaintenance(mn)
           ├── attempt=0:
           │    ├── evaluateAndFixMntStatus(mn)  [in-memory]
           │    └── db.ModifyIncident(mn)        [SQL TX: UPDATE WHERE version=N]
           │         └── ErrVersionConflict? → continue
           └── attempt=1:
                ├── db.GetIncident(mn.ID)        [SQL SELECT — fresh version]
                ├── evaluateAndFixMntStatus(mn)  [recalculate]
                └── db.ModifyIncident(mn)        [SQL TX: UPDATE WHERE version=N+1] ✓

sgmv · 2026-05-22T07:42:27Z

fix(checker): handle optimistic locking race condition in maintenance processing

You can do it simpler. Because we will change this table, I won't use any db schemas changes. And you can just get the maintenance right before the patch and compare.

  The diff (against the pre-fix file bf9e9c7^:internal/checker/maintenance.go)

   --- a/internal/checker/maintenance.go
   +++ b/internal/checker/maintenance.go
   @@
          var activeMaintenances []uint
          for _, mn := range maintenances {
                  // Draft maintenances are not processed by the checker — they await
                  // manual approval (reviewed) or rejection (cancelled) via the API.
                  if mn.Status == event.MaintenancePendingReview {
                          continue
                  }

   +              // Refetch immediately before the read-modify-write. The bulk
   +              // GetMaintenances above is N items old by the time we reach item N;
   +              // using its preloaded state for the version check races concurrent
   +              // API edits. A single fresh read shrinks the race window from
   +              // "duration of the whole tick" to "one DB round-trip", which makes
   +              // ErrVersionConflict effectively unreachable without a retry loop.
   +              mn, err = ch.db.GetIncident(int(mn.ID))
   +              if err != nil {
   +                      ch.log.Error("failed to refetch maintenance, skipping",
   +                              zap.Uint("mntID", mn.ID), zap.Error(err))
   +                      continue
   +              }
   +
                     if mn.Status != actualStatus {
                          mn.Status = actualStatus
                          err = ch.db.ModifyIncident(mn)
                          if err != nil {
   -                              return err
   +                              ch.log.Error("failed to update maintenance, skipping",
   +                                      zap.Uint("mntID", mn.ID), zap.Error(err))
   +                              continue
                          }
                  }

bakhterets added 16 commits February 2, 2026 15:54

rbac resolver;

1a4b5b5

hierarcy;

RequireRole mw;

3ea42be

403 resp;

comments removed;

d6a460a

rbac: add maintenance status workflow with role-based permissions

ea3e607

checker: transition reviewed -> planned

17b1a8a

mw and v2 func refactored, golangci changed

d3aa143

database migration;

fd17157

v2 created_by added for events and statuses;

GET filter for unauthorized

ae010d8

validateMaintenanceCreation added;

da5feb6

TestValidateStatusesPatches relocated to v2_validation_test.go; 06 migration changed; new errores for maintenance validation; Incident model changed; TestValidateEventCreationTimes added; TestValidateEventCreationImpact added;

filter for pending review, contact email, created_by.

7727a78

comments removed from v2_auth_test

aafa6aa

versioning for PATCH (implementation)

ec08d92

negative test 409 added

21e1512

middleware auth refactoring

4f1a116

optimistic lock bug fixed

737d915

tests refactored

9b7d3e0

bakhterets self-assigned this Feb 10, 2026

sgmv and others added 7 commits February 10, 2026 14:31

fix events handler

bfc19ef

middleware, v2.go, db.go changed.

3a43b21

implemented SetJWTClaims.

update

3e7e147

refactor: inject RBAC service into handlers, simplify toAPIEvent, ren…

05956fa

…ame HasGroup to HasAuthorizedGroup

TestCalculateCurrentMntStatus added

2e37c6b

fix FR-003/FR-029 bugs, refactored auth middleware, add unit and inte…

357ebce

…gration RBAC tests

golangci-lint version updated

133b697

go version updated

bakhterets force-pushed the feature/rbac branch from 7b950bc to 133b697 Compare February 13, 2026 19:56

bakhterets added 2 commits February 16, 2026 18:05

AuthenticationMW fixed; rbac tests added;

e3ebc31

add RBAC documentation, update OpenAPI security, hide reviewed status…

5cbc720

… from unauthenticated users

bakhterets requested a review from sgmv February 17, 2026 16:23

bakhterets marked this pull request as ready for review February 17, 2026 16:24

bakhterets added 2 commits February 25, 2026 15:40

SD_RBAC_ADMINS -> SD_RBAC_GROUP_ADMINS renamed

88146d5

in_progress; pending_review renaming

453babb

bakhterets requested a review from sgmv February 25, 2026 15:42

bakhterets added 5 commits February 25, 2026 17:53

RBAC_ENABLED: true -> RBAC_DISABLED: false

e432224

docs updated

f22a8d6

sd_rolename renamed to rolename;

c039cff

role operator has been raised to allow to perform all CRUD actions; spec.md revised.

leaking data (v1, rss) fixed;

ed3ec39

rbac tests redesigned, main tests fixed.

openapi.yaml fixed; typo fixed

de10618

sgmv requested changes Mar 2, 2026

View reviewed changes

Comment thread internal/api/v2/v2.go

Comment thread internal/api/v2/v2.go Outdated

Comment thread internal/conf/conf.go Outdated

Comment thread internal/api/api.go Outdated

Comment thread internal/api/middleware.go

Comment thread internal/checker/maintenance.go Outdated

bakhterets added 3 commits March 2, 2026 14:25

v2.go, conf.go review's fixes

3d4620f

RBAC and auth refactored

0920162

RBAC and auth refactored

8e52ab9

bakhterets added 3 commits March 2, 2026 19:25

pending_review checker fixed

a1ab8e3

keycloak audience mapper (doc)

b117722

Audience validation removed;

4cc8272

Public key caching removed.

bakhterets requested a review from sgmv March 3, 2026 20:38

small fix

ac85590

sgmv marked this pull request as draft March 5, 2026 13:19

bakhterets and others added 4 commits March 16, 2026 16:35

Update docker-build-push-on-draft.yaml env section

6648626

wrong env fixed (#157)

65e6101

version field assert for incidents (tests) removed

inconsistencies between doc and code have been fixed

cf213ba

fix(rbac): support comma-separated groups in SD_RBAC_GROUPS_* vars

d9261fe

bakhterets force-pushed the feature/rbac branch from f5ca666 to d9261fe Compare May 6, 2026 07:39

refactor(maintenance): streamline maintenance processing and status e…

bf9e9c7

…valuation

bakhterets and others added 3 commits May 20, 2026 10:06

Merge branch 'main' into feature/rbac

9ff63dc

returned access control constants for database connections

7432739

fix: correct formatting of access control constants

19ee673

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature RBAC maintenance#152

Feature RBAC maintenance#152
bakhterets wants to merge 50 commits into
mainfrom
feature/rbac

bakhterets commented Feb 5, 2026 •

edited

Loading

Uh oh!

bakhterets commented Feb 18, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bakhterets commented Mar 2, 2026 •

edited

Loading

Uh oh!

bakhterets commented May 20, 2026

Uh oh!

sgmv commented May 22, 2026 •

edited

Loading

fix(checker): handle optimistic locking race condition in maintenance processing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bakhterets commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes:

RBAC Implementation:

Maintenance Workflow:

Field Visibility:

Optimistic Locking:

Documentation:

Testing:

Updates:

Testing:

Uh oh!

bakhterets commented Feb 18, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bakhterets commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Breaking Changes

Changes by Category

Security

Observability

Middleware Refactoring

Tests (new files)

Documentation

Test Coverage

Uh oh!

bakhterets commented May 20, 2026

fix(checker): handle optimistic locking race condition in maintenance processing

Problem

Solution

Behavior Change

Code Changes

Execution Flow

Uh oh!

sgmv commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

fix(checker): handle optimistic locking race condition in maintenance processing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bakhterets commented Feb 5, 2026 •

edited

Loading

bakhterets commented Mar 2, 2026 •

edited

Loading

sgmv commented May 22, 2026 •

edited

Loading