Skip to content

Adjust mod multiplier recalculation processes to minimise number of required database writes#374

Draft
bdach wants to merge 5 commits into
ppy:masterfrom
bdach:multiplier-recalculation-with-reduced-writes
Draft

Adjust mod multiplier recalculation processes to minimise number of required database writes#374
bdach wants to merge 5 commits into
ppy:masterfrom
bdach:multiplier-recalculation-with-reduced-writes

Conversation

@bdach
Copy link
Copy Markdown
Collaborator

@bdach bdach commented May 29, 2026

PRing early as draft primarily for visibility reasons.

The original plan for deploying the mod multiplier changes was to first ensure that all scores in the database have total_score_without_mods present, and then write out the updated total_score across the whole table as the new multipliers times that populated total_score_without_mods quantity.

Aside: Why is the migration not run in-place?

The primary reason for me approaching things in this way rather than running the recalculation in-place is because running the recalculation in-place would be an inherently lossy process.

  • Assuming everything goes right, because total score is rounded to full integers and mod multipliers are very much not integers, the possibility of rounding errors and floating-point inaccuracies is very real. Therefore, any in-place recalculation of multipliers would be a process that ideally would only ever occur once and never again, because the more times it is performed, the bigger the degradation incurred with every in-place conversion.

  • And that is assuming everything goes right. In case of any mistake or incorrectly-applied migration, the original data would be lost in an in-place migration, and would not be recoverable quickly or at all because of how complicated the scoring algorithm is at the explicit demand of the user base. The closest thing to a recovery procedure in that case would be replays - when there are replays - and even that would be slow.

    It is my former professional experience that tells me you never want to run huge data migrations in-place which incur chances of irretrievable or difficult-to-recover-from data loss.

In recent days, an attempt was made to run the backpopulation of total_score_without_mods on the producton scores table. Unfortunately at this time the original plan appears to be mostly unviable.

The number of rows in the production scores table that require this backfill is estimated to be around 2.38 billion rows. This would incur a storage overhead in the hundreds of gigabytes (not only from the actual rows themselves, but also from a transient spike in the size of binlog, which is relevant when replication is depended upon) and take roughly a month to execute.

This PR contains a set of changes that is designed to reduce the number of writes required to execute this migration. The cost here is the increased complexity of the migration process. While I am somewhat confident this is going to work, I will want to test this more thoroughly on the actual changes and maybe even do some test runs locally using data dumps to make sure I have not missed anything.

Additionally, comments from #269 (review) are addressed here.

As for the gory details:

Switch to new mod multiplier calculation API in backpopulation command

By itself this does nothing, the old API worked fine as far as the backpopulation was concerned. However doing this allows for an opportunity of automatically handling the change to mania key mod multipliers that was done live without backwards adjustments. More on this later.

Adjust query in backpopulation command to target the minimum necessary number of rows

As the inline commentary states, the number of rows that require writes of total_score_without_mods can be significantly reduced because of two facts:

  • Firstly, when it comes to scores set on stable, total_score_without_mods as well as total_score are quantities derived directly from legacy_total_score as well as beatmap attributes. Therefore, it is not actually necessary to write total_score_without_mods for those scores, and instead the score conversion algorithm can be used to recalculate the multipliers instead.
  • Secondly, when it comes to lazer scores specifically, this change assumes the invariant that if there are no mods on a score, then total_score == total_score_without_mods - so there is no point writing out values.

Adjust backpopulation command to automatically handle change to mania key mod multipliers

Migrating to the new multiplier API, which already has to handle multiplier versioning and the directly-applied change to mania key mod multipliers, means that it is now possible to leverage the versioned logic and automatically correct for the key mod multiplier change by filling out ScoreInfo.ClientVersion using osu_builds.

Adjust mod multiplier recalculation command to work with the minimum necessary number of rows with total_score_without_mods

Adds an alternate processing path for stable scores which leverages the algorithm of converting from legacy total score to implement the multiplier recalculation.

The command was already skipping scores with no mods at all, so no adjustment required there.

Add test coverage of commands

Written using real-world cases pulled from live production scores (via data.ppy.sh dumps plus some manual trawling for the mania key mod cases).

In the current state the most useful tests are the ones checking correctness of the mania key mod recalculation. Everything else will need updating once the new proposed values are out.

bdach added 5 commits May 29, 2026 13:53
By itself this does nothing, the old API worked *fine* as far as the
backpopulation was concerned. However doing this allows for an
opportunity of automatically handling the change to mania key mod
multipliers that was done live without backwards adjustments. More on
this later.
…y number of rows

As the inline commentary states, the number of rows that require writes
of `total_score_without_mods` can be significantly reduced because of
two facts:

- Firstly, when it comes to scores set on stable,
  `total_score_without_mods` as well as `total_score` are quantities
  derived directly from `legacy_total_score` as well as beatmap
  attributes. Therefore, it is not actually necessary to write
  `total_score_without_mods` for those scores, and instead the score
  conversion algorithm can be used to recalculate the multipliers
  instead.

- Secondly, when it comes to lazer scores specifically, this change
  assumes the invariant that if there are no mods on a score, then
  `total_score == total_score_without_mods` - so there is no point
  writing out values.
… key mod multipliers

Migrating to the new multiplier API, which already has to handle
multiplier versioning and the directly-applied change to mania key mod
multipliers, means that it is now possible to leverage the versioned
logic and automatically correct for the key mod multiplier change by
filling out `ScoreInfo.ClientVersion` using `osu_builds`.
…necessary number of rows with `total_score_without_mods`

Adds an alternate processing path for stable scores which leverages the
algorithm of converting from legacy total score to implement the
multiplier recalculation.

The command was already skipping scores with no mods at all, so no
adjustment required there.
Written using real-world cases pulled from live production scores (via
data.ppy.sh dumps plus some manual trawling for the mania key mod
cases).
@bdach bdach self-assigned this May 29, 2026
@bdach bdach moved this from Inbox to Pending Review in osu! team task tracker May 29, 2026
@peppy peppy self-requested a review May 29, 2026 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Pending Review

Development

Successfully merging this pull request may close these issues.

1 participant