
feat: Support version history and rollback for traffic rules#1477

Open
mochengqian wants to merge 12 commits into apache:develop from mochengqian:feat/Support-version-history-and-rollback-for-traffic-rules

mochengqian commented May 22, 2026

Closes #1473.
Please view the detailed test report at https://notion-next-iota-amber-43.vercel.app/article/dubbo-admin-reviewer-report

1. What this PR delivers

Dubbo Admin currently saves traffic-rule edits with a destructive overwrite. There is no history, no audit trail for upstream-registry pushes, and no way to recover from a bad edit. This PR adds an immutable release ledger plus one-click rollback for the three governor-managed rule kinds — Condition Route, Tag Route, and Dynamic Config (Configurator).

The approach was picked after evaluating five candidate solutions during design:

| | Solution | This PR? |
|---|---|---|
| A | Immutable release ledger in the admin's local DB. Every publish, upstream push and rollback is appended as a row; rollback re-publishes an old snapshot, never overwriting. | Yes: what this PR ships. |
| B | Two-phase publish with a reviewer / approver workflow before the change reaches the registry. | Deferred: needs concrete user demand before designing the state machine. |
| C | Structural JSON diff in the admin UI. | Folded into A, implemented via Monaco diff. |
| D | GitOps: rules persisted in Git, the registry pulls from Git. | Deferred: a major infrastructure shift, not v1-scale. |
| E | Pure event sourcing: derive current rule state by replaying an event log. | Deferred: orthogonal to the current store model. |

User-visible capabilities

  • Each rule-detail page gets a History drawer with source (ADMIN / UPSTREAM / ROLLBACK / BOOTSTRAP), operator, timestamp and reason.
  • Diff against current uses Monaco diff to compare any past version with the live one.
  • Rollback takes a required reason; the old snapshot is re-published through the normal governor path and recorded as a new ROLLBACK row. The old row is never mutated.
  • Concurrent edits no longer silently overwrite each other: a stale write is rejected with HTTP 409 and surfaced as a sticky notification with a Reload action.
  • Default retention is 5 versions per rule (configurable via versioning.maxVersionsPerRule).
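The retention cap, combined with the monotonic version_no described in the schema section, can be illustrated with a small in-memory ledger. This is a minimal sketch, not the pkg/core/versioning store: memLedger, ruleKey, versionRow and this InsertVersion signature are hypothetical, and storing a row ID in CurrentVersion is an assumption.

```go
package main

import "sync"

// Hypothetical sketch of trim-on-insert; not the actual pkg/core/versioning API.

// ruleKey identifies a rule, mirroring (rule_kind, resource_key).
type ruleKey struct{ Kind, ResourceKey string }

// versionRow is a trimmed-down rule_version row.
type versionRow struct {
	ID        int64
	VersionNo int64
	Source    string // ADMIN | UPSTREAM | ROLLBACK | BOOTSTRAP
	Operation string // CREATE | UPDATE | DELETE
	Hash      string
}

// versionMeta mirrors rule_version_meta. CurrentVersion holds a row ID
// (assumption) and is nil once the rule has been deleted.
type versionMeta struct {
	CurrentVersion *int64
	LastVersionNo  int64
}

type memLedger struct {
	mu     sync.Mutex
	maxPer int                      // plays the role of versioning.maxVersionsPerRule
	nextID int64                    // global row ID sequence
	rows   map[ruleKey][]versionRow // oldest first
	meta   map[ruleKey]*versionMeta
}

func newMemLedger(maxPer int) *memLedger {
	return &memLedger{
		maxPer: maxPer,
		rows:   make(map[ruleKey][]versionRow),
		meta:   make(map[ruleKey]*versionMeta),
	}
}

// InsertVersion appends a row, bumps the per-rule counter and hard-deletes
// the oldest rows beyond maxPer.
func (l *memLedger) InsertVersion(k ruleKey, source, op, hash string) versionRow {
	l.mu.Lock()
	defer l.mu.Unlock()

	m, ok := l.meta[k]
	if !ok {
		m = &versionMeta{}
		l.meta[k] = m
	}
	l.nextID++
	m.LastVersionNo++ // never decremented, so trimmed numbers are not reused
	row := versionRow{ID: l.nextID, VersionNo: m.LastVersionNo, Source: source, Operation: op, Hash: hash}

	rows := append(l.rows[k], row)
	if over := len(rows) - l.maxPer; over > 0 {
		// Copy the survivors so trimmed rows are not kept alive by the old backing array.
		rows = append([]versionRow(nil), rows[over:]...)
	}
	l.rows[k] = rows

	if op == "DELETE" {
		m.CurrentVersion = nil
	} else {
		id := row.ID
		m.CurrentVersion = &id
	}
	return row
}
```

Keeping the counter in the meta row rather than deriving it from MAX(version_no) is what lets labels keep increasing after the oldest rows are hard-deleted.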

2. How it works

UI publish ┐
           ├─► ResourceManager → governor → registry write ──┐
ZK push   ┘                                                  │
                                                             ▼
                                              EventBus.ResourceChangedEvent
                                                             │
                                                             ▼
                                            RuleVersionSubscriber (one per kind)
                                                             │
                                  ┌── admin hint hit ────────┤
                                  │       (TTL 30s)          │
                                  ▼                          ▼
                          source = ADMIN/ROLLBACK    source = UPSTREAM
                                  │                          │  author read from
                                  │                          │  event.Context()["source-registry"]
                                  └─────► InsertVersion ◄────┘
                                              │
                                              ▼
                          rule_version (immutable) + rule_version_meta (current)
                                              │
                                              ▼
                                  trim-on-insert by maxVersionsPerRule
  • Single recording site = the event-bus subscriber. UI publishes and upstream pushes both converge here, so there is no double-write risk.
  • ADMIN vs UPSTREAM is distinguished by a TTL-bounded, in-process hint registry keyed by (kind, resourceKey, contentHash).
  • Rollback = re-publish the old snapshot. It flows through governor → registry → echo event, producing a new row with source=ROLLBACK and rolled_back_from_id pointing at the source version. The original row is never modified.
  • version_no is monotonic and not reused after trim — users always see strictly increasing version labels.
  • A 2 s coalesce window collapses upstream-push bursts so the ledger stays readable.
  • On graceful shutdown the component flushes all pending coalesce buckets so the last window is never lost.

3. Scope

In this PR

  • Backend

    • New pkg/core/versioning package: types, memory + GORM stores, service, hint registry, subscriber, component.
    • New pkg/config/versioning config block with sane defaults and validation.
    • Bootstrap scan that emits a BOOTSTRAP baseline row for every existing rule (idempotent).
    • 4 new REST endpoints × 3 rule kinds.
    • Optional expectedVersionId precondition on existing PUT / POST / DELETE.
    • New events.SourceRegistryContextKey constant; ZKConfigEventSubscriber now attaches the source registry to the event context.
  • Frontend

    • ui-vue3/src/views/traffic/_shared/: RuleHistoryDrawer.vue, RuleHistoryPanel.vue, RuleDiffEditor.vue, ruleVersion.ts.
    • History panel + Monaco diff + rollback modal wired into all three rule pages.
    • 409 conflicts are surfaced as a notification with duration: 0 (it stays until dismissed) and a Reload action.
  • Tests

    • 23 new unit tests across normalize / hint TTL / memory & GORM store / subscriber sources / rollback paths / config validation / ZK delete nil-guard.
    • One end-to-end rollback drill (e2e_rollback_drill_test.go) covering bootstrap → admin edit → upstream push → rollback → overflow trim → audit-chain assertions.
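The config block's defaults and validation could look roughly like this. Only versioning.enabled and versioning.maxVersionsPerRule are named in this PR; the Config type, the yaml tags, Enabled defaulting to true and the ">= 1" bound are assumptions for illustration.

```go
package main

import "fmt"

// Config is a hypothetical shape for the versioning config block; only the
// two keys named in this PR are modeled.
type Config struct {
	Enabled            bool `yaml:"enabled"`
	MaxVersionsPerRule int  `yaml:"maxVersionsPerRule"`
}

// DefaultConfig returns the documented retention default of 5.
// Enabled defaulting to true is an assumption.
func DefaultConfig() Config {
	return Config{Enabled: true, MaxVersionsPerRule: 5}
}

// Validate rejects a retention cap that would leave no history at all.
func (c Config) Validate() error {
	if c.MaxVersionsPerRule < 1 {
		return fmt.Errorf("versioning.maxVersionsPerRule must be >= 1, got %d", c.MaxVersionsPerRule)
	}
	return nil
}
```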

Out of scope for this PR

| Item | Status | Plan |
|---|---|---|
| AffinityRoute integration | Tracked | Will be done by @mochengqian as an immediate follow-up. AffinityRoute is not currently in governor.RuleResourceKinds, so its write path bypasses the governor that this ledger hooks into. Once it is brought onto the governor path, all four kinds will share the same versioning code. |

4. Locked design decisions

| # | Choice | Rationale |
|---|---|---|
| 1 | Rollback reason is required | Most important audit-trail field; enforced in both the frontend and the backend. |
| 2 | 2 s upstream coalesce window | Collapses ZK push bursts; imperceptible for UI publishes. |
| 3 | AffinityRoute deferred | Not in governor.RuleResourceKinds; out of scope. |
| 4 | Monaco diff | Reuses an editor dependency that is already shipped. |
| 5 | Hard delete on trim | Matches the "keep last 5" product semantics; keeps tables small. |
| 6 | expectedVersionId is best-effort | Documented limitation: weak CAS, not strict. The lock manager prevents store-level corruption, but the ledger meta is updated asynchronously by the subscriber, so two writes that race within one coalesce window can both pass the precondition. Sufficient for v1; can be upgraded later by moving meta updates onto the admin path. |

5. API surface

New endpoints (one set per rule kind: condition-rule, tag-rule, configurator)

GET    /api/v1/{kind}/:ruleName/versions
GET    /api/v1/{kind}/:ruleName/versions/:versionId
GET    /api/v1/{kind}/:ruleName/versions/:versionId/diff?against=current|<id>
POST   /api/v1/{kind}/:ruleName/versions/:versionId/rollback
       body: { "reason": "<required>", "expectedVersionId": <optional int64> }

Existing endpoints (backward-compatible)

PUT / POST / DELETE /api/v1/{kind}/:ruleName accept an optional ?expectedVersionId=<id>.
If it is omitted, behavior is unchanged. If it is provided and does not match the rule's current version, the request fails with:

HTTP/1.1 409 Conflict
Content-Type: application/json

{"code":"VERSION_CONFLICT","currentVersionId":5,"message":"rule version conflict"}
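The precondition reduces to comparing the query parameter against the ledger's current version ID. In this sketch evalExpectedVersion and withExpectedVersion are hypothetical names, and answering an unparseable value with 400 InvalidArgument is an assumption. The 409 body is built so that it matches the example above byte for byte, since encoding/json emits struct fields in declaration order. Because the current version is read from asynchronously updated meta, this remains the best-effort check described in decision 6.

```go
package main

import (
	"encoding/json"
	"net/http"
	"strconv"
)

// conflictBody matches the documented 409 payload.
type conflictBody struct {
	Code             string `json:"code"`
	CurrentVersionID int64  `json:"currentVersionId"`
	Message          string `json:"message"`
}

// evalExpectedVersion returns the status and body to send; http.StatusOK
// means "precondition satisfied, continue with the write".
func evalExpectedVersion(raw string, current int64) (int, []byte) {
	if raw == "" {
		return http.StatusOK, nil // omitted: behavior unchanged
	}
	expected, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		// Assumption: malformed values are treated as a bad request.
		return http.StatusBadRequest, []byte(`{"code":"InvalidArgument","message":"invalid expectedVersionId"}`)
	}
	if expected != current {
		body, _ := json.Marshal(conflictBody{
			Code:             "VERSION_CONFLICT",
			CurrentVersionID: current,
			Message:          "rule version conflict",
		})
		return http.StatusConflict, body
	}
	return http.StatusOK, nil
}

// withExpectedVersion writes the error response and returns false when the
// precondition fails; a handler proceeds only when it returns true.
func withExpectedVersion(w http.ResponseWriter, r *http.Request, current int64) bool {
	status, body := evalExpectedVersion(r.URL.Query().Get("expectedVersionId"), current)
	if status == http.StatusOK {
		return true
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_, _ = w.Write(body)
	return false
}
```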

Feature flag

With versioning.enabled=false, all new endpoints return HTTP 503 with body {"code":"FEATURE_DISABLED"}. Existing CRUD is completely untouched.


6. Database

Two new tables, created via AutoMigrate on the existing GORM connection (MySQL / Postgres). With store.type=memory a pure-Go in-memory implementation is used — no config changes needed.

CREATE TABLE rule_version (
    id, rule_kind, mesh, resource_key, rule_name,
    version_no,            -- monotonic; not reused after trim
    content_hash,          -- sha256(canonical spec json)
    spec_json,
    source,                -- ADMIN | UPSTREAM | ROLLBACK | BOOTSTRAP
    operation,             -- CREATE | UPDATE | DELETE
    author, reason,
    rolled_back_from_id,
    created_at,
    UNIQUE (rule_kind, resource_key, version_no),
    INDEX  (rule_kind, resource_key, created_at DESC),
    INDEX  (rule_kind, content_hash)
);

CREATE TABLE rule_version_meta (
    rule_kind, resource_key,    -- PK
    current_version,            -- nullable; NULL when the rule is deleted
    last_version_no,            -- monotonic
    updated_at
);
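content_hash is described as sha256 over canonical spec JSON. One common way to canonicalize in Go is to decode into generic values and re-encode, because encoding/json writes map keys in sorted order and emits no insignificant whitespace. This is an assumption about the normalization step, not the PR's code; contentHash is a hypothetical name, and UseNumber keeps large integers from being rounded through float64.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// contentHash returns the hex sha256 of a canonical re-encoding of specJSON,
// so key order and whitespace differences do not change the hash.
func contentHash(specJSON []byte) (string, error) {
	dec := json.NewDecoder(bytes.NewReader(specJSON))
	dec.UseNumber() // keep numeric literals exact instead of converting to float64
	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	canonical, err := json.Marshal(v) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:]), nil
}
```

Array order is preserved, so two specs whose lists differ only in element order still hash differently, which is the safe choice for routing rules.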

Upgrade path: on first start, scan every existing rule and write one source=BOOTSTRAP baseline row. Idempotent — safe to re-run.
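The idempotency of the bootstrap scan follows from seeding only rules the ledger has never seen. A hypothetical sketch: liveRule, bootstrapBaselines and its two callbacks are illustrative, and using "has any history" (for example an existing rule_version_meta row) as the skip condition is an assumption.

```go
package main

// liveRule is a hypothetical view of a rule found by the startup scan.
type liveRule struct {
	Kind, ResourceKey, SpecHash string
}

// bootstrapBaselines inserts a BOOTSTRAP row for every rule without history
// and returns how many rows it wrote. Re-running it after a restart writes
// nothing, because every rule now has history.
func bootstrapBaselines(
	rules []liveRule,
	hasHistory func(kind, resourceKey string) bool,
	insert func(r liveRule, source string),
) int {
	n := 0
	for _, r := range rules {
		if hasHistory(r.Kind, r.ResourceKey) {
			continue
		}
		insert(r, "BOOTSTRAP")
		n++
	}
	return n
}
```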


7. Verification

Automated tests

go test ./pkg/core/versioning/... \
        ./pkg/config/versioning/... \
        ./pkg/console/handler/... \
        ./pkg/console/service/... \
        ./pkg/store/... \
        ./pkg/core/discovery/subscriber/... \
        ./pkg/core/manager/...

All green. go vet ./pkg/... reports no new warnings.

Frontend build

cd ui-vue3 && npm install --legacy-peer-deps && npm run build

Passes. npm run type-check still reports the pre-existing repo-wide TypeScript debt (home/index.vue, AuthUtil.ts, GrafanaPage, etc.) — that count does not increase under this branch.

Manual smoke

A full bootstrap → admin edit → diff → rollback → optimistic-lock 409 → retention cap → upstream ZK push → cross-rule rollback sweep was run end-to-end against MySQL + ZooKeeper. Evidence (HTTP transcripts, JSON ledger dumps, UI screenshots) is in the PR comments.


8. Upgrade and rollback

  • Upgrade — deploy as usual. AutoMigrate creates the two tables; the bootstrap scan inserts a baseline row per existing rule. Clients and registries need no changes.
  • Disable — set versioning.enabled=false and restart. CRUD paths are unaffected; new endpoints return 503. The tables are left in place in case the feature is re-enabled later.
  • Full revert — reverting the two feat(versioning) commits is sufficient; the two preparatory commits at the bottom of the ladder are harmless to keep.

9. Test plan checklist

  • CI green
  • go test ./pkg/... passes locally
  • cd ui-vue3 && npm run build passes
  • Bootstrap → admin edit → rollback verified manually
  • Optimistic-lock 409 verified manually
  • Retention cap respected after exceeding maxVersionsPerRule

Running the §9.4 smoke drill end-to-end uncovered three real defects
that the unit suite did not catch:

- RuleVersionSubscriber recorded a duplicate UPSTREAM row whenever the
  registry echoed back a no-op change identical to the latest ledger
  row (typically right after BOOTSTRAP). Now dedupes upstream events
  whose content hash already matches the current head, with an explicit
  test in versioning_test.go.
- writeVersioningResp mapped every bizerror to HTTP 200/UnknownError;
  bizerror.InvalidArgument (e.g. an empty rollback reason) now returns
  HTTP 400 with its original code so the frontend can act on it.
  Covered by a new handler/rule_version_test.go.
- The 409 VERSION_CONFLICT toast auto-dismissed after the default
  duration; users could miss the Reload button entirely. Pinned with
  duration: 0 so the notification stays until acknowledged.
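The upstream dedupe fix amounts to a head-hash comparison before an UPSTREAM row is written. shouldRecordUpstream and headLookup are hypothetical names; the sketch shows only the decision, not how the head hash is fetched from the store.

```go
package main

// headLookup returns the content hash of the latest ledger row for a rule, if any.
type headLookup func(kind, resourceKey string) (hash string, ok bool)

// shouldRecordUpstream drops registry echoes that carry no change relative
// to the current ledger head (typically right after a BOOTSTRAP row).
func shouldRecordUpstream(head headLookup, kind, resourceKey, incomingHash string) bool {
	if h, ok := head(kind, resourceKey); ok && h == incomingHash {
		return false // no-op echo: identical to the head row
	}
	return true // new rule or a real content change
}
```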

The §9.4 smoke drill expectation "after rollback, the rule should look
the same on UI refresh" exposed a pre-existing edit-form regression:
rollback was correct at the ledger and ZK levels, but the edit form
silently dropped `priority`, `force`, and (for condition routes)
`configVersion` because they were neither rendered in the GET response
nor re-sent on save.

This is not caused by versioning, but a true round-trip is the first
flow that forces every field through the loop. Adds the missing fields
to ConditionRuleResp / TagRuleResp on the backend, and reads/writes
them in updateByFormView.vue on the frontend so a "save → rollback →
reload" cycle is now lossless.

Add /task_plan.md /findings.md /progress.md to .gitignore so the
planning-with-files workflow does not leak per-developer working
memory into the repo.
mochengqian force-pushed the feat/Support-version-history-and-rollback-for-traffic-rules branch from 6deb25e to 9596f7e on May 22, 2026 05:19
mochengqian changed the title from "Support version history and rollback for traffic rules" to "feat: Support version history and rollback for traffic rules" on May 22, 2026

Development

Successfully merging this pull request may close these issues.

[Feature] Support version history and rollback for traffic rules
