Add required ci check for keynote-2 benchmark #5078
Merged
keynote-2 benchmark
9b8e2ab to ad299df
9588d67 to e814040
e814040 to f83797b
pull Bot pushed a commit to age-rs/SpacetimeDB that referenced this pull request May 28, 2026

…bs#5095)

# Description of Changes

Before this change, we used a single async-enabled wasm runtime for all requests, even though procedures are the only operation that can yield. Now each module gets two separate runtimes. We continue to use the same async runtime for procedures, but now reducers are executed against a synchronous wasm runtime, backed by a single OS-thread instead of a Tokio runtime.

The purpose of this change is to remove from the critical path the overhead associated with async calls that really aren't async at all.

Also includes the following fix from clockworklabs#5135:

> After clockworklabs#4973, WASM procedures can execute concurrently with later operations on the same WebSocket.
>
> Before this change, the C# regression testsuite queued several procedures, then immediately queued `UnsubscribeThen`. After clockworklabs#4973, the unsubscribe could be applied before the `SubscriptionEventOffset` procedure callback ran, clearing `MyTable` from the local subscribed cache. The callback then failed while asserting that the `offset-test:` row was present.
>
> This change treats unsubscribe as a separate phase. It is scheduled after the main work is queued, but only starts once `waiting == 0`, so all callbacks that inspect subscribed state run before the cache is cleared.

# API and ABI breaking changes

None

# Expected complexity level and risk

2.5

# Testing

Pure refactor. Relies on current test coverage. clockworklabs#5078 will ensure the performance is on par with V8.
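The "separate phase" idea in the quoted fix can be illustrated with a small sketch. This is not the C# testsuite code: the `Harness` type and its methods are hypothetical names invented for illustration, and only the `waiting == 0` condition comes from the commit message. It shows one way to defer an unsubscribe until every outstanding callback has run.

```rust
/// Hypothetical test-harness state (names are illustrative, not from the
/// actual testsuite). `waiting` counts callbacks that have not yet run.
struct Harness {
    waiting: usize,
    unsubscribe_queued: bool,
    log: Vec<String>,
}

impl Harness {
    fn new() -> Self {
        Harness { waiting: 0, unsubscribe_queued: false, log: Vec::new() }
    }

    /// Queue a procedure whose callback will inspect subscribed state.
    fn queue_procedure(&mut self) {
        self.waiting += 1;
    }

    /// Schedule the unsubscribe phase; it starts only once `waiting == 0`.
    fn queue_unsubscribe(&mut self) {
        self.unsubscribe_queued = true;
        self.maybe_unsubscribe();
    }

    /// Called when a procedure callback has finished running.
    fn on_callback(&mut self, name: &str) {
        self.log.push(format!("callback:{name}"));
        self.waiting = self.waiting.saturating_sub(1);
        self.maybe_unsubscribe();
    }

    /// Start the unsubscribe phase if it is scheduled and nothing is pending.
    fn maybe_unsubscribe(&mut self) {
        if self.unsubscribe_queued && self.waiting == 0 {
            self.unsubscribe_queued = false;
            self.log.push("unsubscribe".to_string());
        }
    }
}
```

With two procedures queued before `queue_unsubscribe`, the log records both callbacks before the unsubscribe entry, regardless of when the unsubscribe was scheduled.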
29ac856 to 098bd98
bfops approved these changes May 28, 2026
bfops (Collaborator) left a comment:
This LGTM in terms of CI. I can't speak to the correctness of how the benchmark is being run, but it seems to be passing.
Description of Changes
Adds a new required CI check for keynote-2 benchmark regressions. The test runs for 60s and fails if throughput is below 300K TPS.
Note: this check will be flaky as long as it runs concurrently with other CI jobs. It may need a dedicated runner or host machine, although scheduling only one runner/VM per host machine at a time may be sufficient. I'll need to sync with @jdetter to determine the best way forward here.
UPDATE: We're using a dedicated runner. See the Testing section.
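The pass/fail rule described above (a fixed-length run that fails below 300K TPS) can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: `Verdict`, `throughput`, and `check` are hypothetical names, and only the 300K TPS threshold and the 60s duration come from this description.

```rust
use std::time::Duration;

/// Minimum acceptable throughput, from the PR description (300K TPS).
const MIN_TPS: f64 = 300_000.0;

/// Outcome of one benchmark run (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass { tps: f64 },
    Fail { tps: f64 },
}

/// Transactions per second; a zero-length run is reported as 0.0 TPS.
fn throughput(committed: u64, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs == 0.0 {
        0.0
    } else {
        committed as f64 / secs
    }
}

/// Fails the run when measured throughput is strictly below the threshold.
fn check(committed: u64, elapsed: Duration) -> Verdict {
    let tps = throughput(committed, elapsed);
    if tps < MIN_TPS {
        Verdict::Fail { tps }
    } else {
        Verdict::Pass { tps }
    }
}
```

For a 60s run, `check(18_000_000, Duration::from_secs(60))` sits exactly on the threshold and passes, while any lower transaction count fails. A CI job would typically map `Verdict::Fail` to a non-zero exit code so the required check turns red.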
API and ABI breaking changes
N/A
Expected complexity level and risk
2
Mainly copy-paste from the other CI workflows.
Testing
This job now uses `spacetimedb-benchmark-runner`, which is entirely dedicated to this one CI job. I've tested this at different times of the day, both when the CI runners are under load and when they are not. The performance is consistent and the test isn't flaky. It has passed every time.