Skip to content

license: implement vCPU audit background writer for self-hosted clusters#170859

Merged
trunk-io[bot] merged 1 commit into
cockroachdb:masterfrom
sadaf-crl:CC-35136
Jun 2, 2026
Merged

license: implement vCPU audit background writer for self-hosted clusters#170859
trunk-io[bot] merged 1 commit into
cockroachdb:masterfrom
sadaf-crl:CC-35136

Conversation

@sadaf-crl

Copy link
Copy Markdown
Contributor

Add background ticker infrastructure for tracking vCPU consumption in self-hosted clusters.
A background goroutine fires hourly to collect vCPU data from status.GetVCPUs() and prepare it for writing to system.vcpu_hours_audit (table creation in follow-up).

The writer detects license rotation and triggers an immediate write to prevent gaps in consumption tracking. It handles graceful shutdown with a 5-second timeout for a final audit record write.

The actual SQL INSERT logic is stubbed with log.Dev.Infof() until the system table is created in a follow-up commit. (#170328)

Epic: CC-35515
Release note: None

@trunk-io

trunk-io Bot commented May 24, 2026

Copy link
Copy Markdown
Contributor

😎 Merged successfully - details.

@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@sadaf-crl sadaf-crl force-pushed the CC-35136 branch 5 times, most recently from f01bd1e to 77a504e Compare May 26, 2026 12:28
@blathers-crl

blathers-crl Bot commented May 26, 2026

Copy link
Copy Markdown

Detected infrastructure failure (matched: self-hosted runner lost communication with the server). Automatically rerunning failed jobs. (run link)

@sadaf-crl sadaf-crl marked this pull request as ready for review May 27, 2026 05:06
@sadaf-crl sadaf-crl requested a review from a team as a code owner May 27, 2026 05:06
@sadaf-crl sadaf-crl requested review from a team, bghal, rahulcrl and vishalv1994 and removed request for a team May 27, 2026 05:06
Comment thread pkg/server/license/vcpu_audit.go Outdated
@github-actions

Copy link
Copy Markdown
Contributor

AI Review: Potential Issue(s) Detected

An inline comment has been added to the relevant line with details and a suggested fix.

Summary: In maybeWriteOnLicenseRotation (vcpu_audit.go:157), bytes.Equal(prevLicenseID, newLicenseID) compares the stored normalized value []byte("no-license") against the raw nil parameter. Since bytes.Equal([]byte("no-license"), nil) is always false, every RefreshForLicenseChange call with a nil licenseID (i.e., on unlicensed clusters) will spuriously trigger a "license rotation detected" immediate vCPU audit write. Once the TODO SQL INSERT is implemented, this will produce incorrect audit rows on every license callback for unlicensed clusters.

View full analysis

If helpful: add O-AI-Review-Real-Issue-Found label.
If not helpful: add O-AI-Review-Not-Helpful label.

@github-actions github-actions Bot added the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label May 27, 2026
Comment thread pkg/server/license/vcpu_audit.go Outdated
Comment thread pkg/server/license/vcpu_audit.go Outdated
Comment thread pkg/server/license/enforcer.go Outdated
Comment thread pkg/server/license/vcpu_audit.go Outdated
Comment thread pkg/server/license/vcpu_audit_test.go

@rahulcrl rahulcrl left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment thread pkg/server/server_sql.go
Comment thread pkg/server/license/enforcer.go
Comment thread pkg/server/license/vcpu_audit.go Outdated
// vcpuAuditData contains the data needed to write a vCPU audit record.
type vcpuAuditData struct {
// licenseID is the unique identifier for the license.
// Nil if no license is installed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is set to no-license (and not Nil as mentioned in the comment)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the comment as it was not meaningful here


for {
select {
case <-ticker.C:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the first record will be written only after an hour after start - is that intended, since we would like to avoid gaps in consumption tracking?

@sadaf-crl sadaf-crl Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, it should write as soon as the node starts to avoid losing that hour's consumption. Updated it.

Comment thread pkg/server/license/vcpu_audit.go Outdated
// TODO(sadaf-crl): Replace log.Infof with actual SQL INSERT when
// system.vcpu_hours_audit table is available.
func (e *Enforcer) writeVCPUAuditRecord(
ctx context.Context, nodeID roachpb.NodeID, hourTimestamp time.Time,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove nodeID as a param here since Enforcer already has it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

hourTimestamp = hourTimestamp.Truncate(time.Hour)

// TODO(sadaf-crl): Replace this log statement with SQL INSERT:
// INSERT INTO system.vcpu_hours_audit (node_id, license_id, hour_timestamp, num_vcpu)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an immediate concern but when we implement the INSERT logic, we might need to guard against duplicate rows since (node_id, license_id, period_start) is the primary key and we round off against the hour. For example, we might have the license change logic and shutdown logic kicking off writeVCPUAuditRecord around the same time and trying to write the same value.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. Will handle this in upcoming PR where INSERT logic is implemented

Comment thread pkg/server/license/vcpu_audit.go Outdated
// maybeWriteOnLicenseRotation checks if the license ID changed and triggers
// an immediate vCPU audit write to prevent gaps in consumption tracking.
// This is called from RefreshForLicenseChange when the license is updated.
func (e *Enforcer) maybeWriteOnLicenseRotation(ctx context.Context, newLicenseID []byte) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function name is slightly misleading because it unconditionally mutates currentLicenseID, even if it doesn't always write the audit record. Can we rename it to updateLicenseIDAndMaybeWrite (preferable) or add a comment on the store line stating that caching happens on every call?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to updateLicenseIDAndMaybeWrite

Comment thread pkg/server/license/vcpu_audit_test.go Outdated

// Wait for at least 2 ticks and verify writes actually fired.
require.Eventually(t, func() bool {
return writeCount.Load() >= 2

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be >= 3 now that we have the write on startup?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

 self-hosted clusters

Add background ticker infrastructure for tracking
vCPU consumption in self-hosted clusters.
A background goroutine fires hourly to collect
vCPU data from status.GetVCPUs() and prepare it for
writing to system.vcpu_hours_audit (table creation in
follow-up).

The writer detects license rotation and triggers an
immediate write to prevent gaps in consumption tracking.
It handles graceful shutdown with
a 5-second timeout for a final audit record write.

The actual SQL INSERT logic is stubbed with
log.Dev.Infof() until the
system table is created in a follow-up commit.

Epic: CC-35515

Release note: None
@sadaf-crl

Copy link
Copy Markdown
Contributor Author

TFTR @rahulcrl @suj-krishnan

@sadaf-crl

Copy link
Copy Markdown
Contributor Author

/trunk merge

@trunk-io trunk-io Bot merged commit b08a491 into cockroachdb:master Jun 2, 2026
38 of 39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. target-release-26.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants