Skip to content

admission: collapse from two WorkQueues to one#170621

Merged
trunk-io[bot] merged 1 commit into
cockroachdb:masterfrom
stevendanna:ssd/one-queue-step-3
May 28, 2026
Merged

admission: collapse from two WorkQueues to one#170621
trunk-io[bot] merged 1 commit into
cockroachdb:masterfrom
stevendanna:ssd/one-queue-step-3

Conversation

@stevendanna

@stevendanna stevendanna commented May 20, 2026

Copy link
Copy Markdown
Collaborator

Delete resourceTier, cpuTimeTokenChildGranter, and the per-tier
arrays that organized the system into two WorkQueues, two requesters,
and four token buckets. The granter now holds two buckets
(canBurst, noBurst) and implements the granter interface directly.
The coordinator owns a single WorkQueue, and GetKVWorkQueue always
returns it when CTT is enabled (ignoring isSystemTenant).

Settings are consolidated: KVCPUTimeUtilGoal becomes the canonical
target under admission.cpu_time_tokens.target_util.system_tenant is retired.
ConfigSnapshot collapses from per-tier fractions to a single NoBurstFrac.

Epic: none
Release note: None

Co-Authored-By: roachdev-claude roachdev-claude-bot@cockroachlabs.com

@stevendanna stevendanna requested a review from a team as a code owner May 20, 2026 07:44
@trunk-io

trunk-io Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

😎 Merged directly without going through the merge queue, as the queue was empty and the PR was up to date with the target branch - details.

@blathers-crl

blathers-crl Bot commented May 20, 2026

Copy link
Copy Markdown

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@stevendanna stevendanna force-pushed the ssd/one-queue-step-3 branch from c8909d9 to 65a68de Compare May 22, 2026 13:10
@stevendanna stevendanna requested a review from a team as a code owner May 22, 2026 13:10
@stevendanna stevendanna requested review from shailendra-patel, srosenberg, tbg and wenyihu6 and removed request for a team, shailendra-patel and srosenberg May 22, 2026 13:10
@stevendanna stevendanna force-pushed the ssd/one-queue-step-3 branch from 65a68de to 762f06f Compare May 22, 2026 13:57
@stevendanna

Copy link
Copy Markdown
Collaborator Author

Apologies for the large diff.

@blathers-crl

blathers-crl Bot commented May 22, 2026

Copy link
Copy Markdown

Detected infrastructure failure (matched: self-hosted runner lost communication with the server). Automatically rerunning failed jobs. (run link)

@stevendanna stevendanna force-pushed the ssd/one-queue-step-3 branch from 762f06f to aabee23 Compare May 22, 2026 17:39
@tbg tbg removed their request for review May 26, 2026 15:16

@wenyihu6 wenyihu6 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: nice! glad we are ripping this out in this release. re: commit 2: if the only purpose of the .app_tenant aliases is to keep existing dashboards rendering, is the right move to update the dashboard as part of the upgrade or create a new dashboard instead?

@wenyihu6 made 1 comment.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on joshimhoff and sumeerbhola).

@blathers-crl

blathers-crl Bot commented May 26, 2026

Copy link
Copy Markdown

Metrics change detected

This PR adds or updates one or more CRDB metrics. If you want these metrics to be exported by CRDB Cloud clusters to Internal CRL Datadog and/or included in the customer metric export integration (Essential metrics for standard deployment, and Essential metrics for advanced deployment), refer to this Installation and Usage guide of a CLI tool that syncs the metric mappings in managed-service. Run this CLI tool after your CRDB PR is merged.

  • The CLI opens a PR in managed-service with the required config changes.
  • Please track that PR and ensure it merges so your metrics become available to CRDB Cloud clusters.

Note: Your metric will appear in Internal CRL Datadog only after the managed-service PR merges and the new OTel configuration rolls out to at least one cluster running a CRDB build that includes this metric.

Docs: cockroach-metric-sync

Questions: reach out to @obs-india-prs

@wenyihu6

wenyihu6 commented May 26, 2026

Copy link
Copy Markdown
Contributor

@stevendanna

Copy link
Copy Markdown
Collaborator Author

is the right move to update the dashboard as part of the upgrade or create a new dashboard instead?

I'm OK with this as long as @joshimhoff is

@wenyihu6

Copy link
Copy Markdown
Contributor

Lets drop the second commit and get the first commit merged first? Don't want this PR to rot. I will stamp if you open a new PR with the second commit, and we can dabate more on it.

@stevendanna stevendanna force-pushed the ssd/one-queue-step-3 branch from aabee23 to 17b3806 Compare May 28, 2026 09:37
The CPU Time Token system previously used two work queues, one for
system tenant traffic and one for application tenant traffic. While
this does allow us to set different top-level utilization targets for
the system tenant vs the application tenant, we believe we can get
most of the benefit by mapping all serverless-mode system-tenant
requests into a group that is almost surely going to be prioritized
over any other tenant.

This collapse allows us to delete resourceTier,
cpuTimeTokenChildGranter, and the per-tier arrays that organized the
system into two WorkQueues, two requesters, and four token
buckets. The granter now holds two buckets
(canBurst, noBurst): and implements the granter interface directly.
The coordinator owns a single WorkQueue, and GetKVWorkQueue always
returns it when CTT is enabled.

Settings are consolidated: KVCPUTimeUtilGoal becomes the canonical
target under admission.cpu_time_tokens.target_util, with the old
system_tenant key retired.

Also delete the activeMode/lastMode indirection that was previously
used to track mode transitions across the filler. GetKVWorkQueue
and resetInterval read the cpuTimeTokenACMode cluster setting
directly.

The per-tier CPU time token metrics (suffixed .system_tenant and
.app_tenant) are replaced by single, untiered metrics.

Epic: none
Release note: none

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@stevendanna stevendanna force-pushed the ssd/one-queue-step-3 branch from 17b3806 to bd4651f Compare May 28, 2026 11:16
@stevendanna

Copy link
Copy Markdown
Collaborator Author

/trunk merge

@trunk-io trunk-io Bot merged commit ece07fa into cockroachdb:master May 28, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants