Skip to content

Migrate allocator role to sudo key#161

Merged
functor-flow merged 1 commit into
mainfrom
agent/migrate-allocator
Jun 27, 2026
Merged

Migrate allocator role to sudo key#161
functor-flow merged 1 commit into
mainfrom
agent/migrate-allocator

Conversation

@renlabs-agent

@renlabs-agent renlabs-agent Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Adds spec 29 runtime migration that reads current sudo key from Sudo::Key and makes it the sole governance allocator.
  • Whitelists the sudo key, funds its registration burn from DAO treasury, and registers it through the normal torus0 registration path using the fixed allocator name new_allocator, reusing prior allocator URL/metadata.
  • Unstakes prior allocator outbound stake back to free balance before deregistration.
  • Moves existing non-allocator stake targeting prior allocator(s) to the sudo allocator through torus0::stake::transfer_stake before old allocator agents are deregistered.
  • Bumps runtime spec version 28 -> 29 and governance storage version 5 -> 6.
  • Updates changelog entries.

Live-state checks

  • Current mainnet has 1 allocator distinct from sudo.
  • Current mainnet has 447 inbound stake edges to the old allocator, and StakingTo / StakedBy keysets match for those edges.
  • Current mainnet has 1 old-allocator outbound edge: old allocator self-stake, amount 452734767. The migration unstakes this to the old allocator free balance before deregistration.
  • Current mainnet has no existing agent named new_allocator.
  • Current sudo key is not already registered as an agent.

Verification

  • cargo fmt
  • SKIP_WASM_BUILD=1 cargo check -p torus-runtime
  • cargo clippy -p torus-runtime -p pallet-governance -p pallet-torus0 --tests -- -D warnings
  • SKIP_WASM_BUILD=1 cargo test -p torus-runtime -p pallet-governance -p pallet-torus0
  • SKIP_WASM_BUILD=1 cargo test --workspace --exclude torus-client --exclude torus-mcp
  • just try-runtime-upgrade-mainnet

Note

Full just check/just test and local pre-push hook hit torus-client build.rs DNS resolution for api.testnet.torus.network. The runtime-focused checks and mainnet try-runtime upgrade pass.

@steinerkelvin

steinerkelvin commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

⚠️ Migration is broken on current mainnet state — requesting changes

I validated this migration against live mainnet state and it does not achieve its stated goals. Two of the three goals (register the sudo allocator, transfer prior allocator stake) silently no-op, and a large block of stake is unwound instead of migrated.

Evidence

Current mainnet state (queried via RPC at wss://api.torus.network):

Sudo::Key                         = 5GyLPVZr2vezWWwp4yLWEuKZ4uH5fYvU5GjuqmWV4PrAEVLy
Governance::Allocators            = [5DoVVgN7R6vHw4mvPX8s4EkkR8fgN1UJ5TDfKzab8eW9z89b]

sudo key is a registered agent?   false
the lone allocator is an agent?   true   (and != sudo key)
inbound stake on that allocator   42_751_919_073_108_961_966_126_552  (~4.28e25 planck)

This is exactly the failure scenario. Adding invariant pre_upgrade/post_upgrade hooks (not present in this PR) and running them against this live state (see "why the green run didn't reflect this" below for the exact command), post_upgrade aborts with:

MigrateAllocator: sudo allocator was not registered as an agent

Root cause

In migrate_allocator, registration happens before the old allocators are deregistered, and it reuses the old allocator's name:

register_allocator(&new_allocator, old_agent.as_ref()); // registers with old_agent.name
transfer_allocator_stake(&old_allocators, &new_allocator);
for old_allocator in old_allocators {
    let _ = pallet_torus0::agent::deregister::<Runtime>(old_allocator); // happens AFTER
}

pallet_torus0::agent::register rejects duplicate names:

ensure!(
    !exists::<T>(&agent_key) && Pallet::<T>::find_agent_by_name(&name).is_none(),
    Error::<T>::AgentAlreadyRegistered
);

Since the old allocator agent still holds that name at this point, register returns Err(AgentAlreadyRegistered), which is swallowed by let _ =. Cascade:

  1. The sudo allocator is never registered as a torus0 agent.
  2. transfer_allocator_stake early-returns on !Agents::contains_key(new_allocator)no stake transferred.
  3. deregister(old_allocator)stake::clear_key unwinds the ~4.28e25 stake back to the original stakers.

Net on-chain result: the sudo key is whitelisted + sole allocator but is not a registered agent, and the allocator-backing stake is dissolved rather than migrated. (Minor: fund_registration_burn also leaks one Burn from the DAO treasury to the sudo key, since funding runs before the failing register.)

This isn't a one-line reorder

The fix needs care because there's a genuine dependency cycle:

  • transfer_stakeadd_stake/remove_stake both ensure!(agent::exists(...)), so both old and new must be registered agents at transfer time.
  • But the new allocator can't register under the old name until the old allocator is deregistered, and deregister clears the old allocator's stake (returns it to stakers).

So "register new (same name) + transfer_stake" is unsatisfiable as written. A correct approach likely needs to capture (staker, amount) pairs, deregister the old allocator, register the new one, then re-stake from the stakers' now-freed balances (or avoid reusing the exact name).

Why the green try-runtime run didn't reflect this, and a recommendation

To be clear, this isn't a single CI-flag miss — it's two compounding gaps:

  1. The migration ships no pre_upgrade/post_upgrade hooks. So there was nothing for try-runtime to assert about the migration's goals in the first place. just try-runtime-upgrade-mainnet only validates idempotency + full state decode, and a broken-but-non-panicking migration passes both. A green run here says nothing about whether registration/stake-transfer actually happened.

  2. Even once hooks exist, the default just target wouldn't run them on the real transition. Because this runtime has the multi-block-migration framework enabled, try-runtime-cli forces the single-block migration to run with checks: None unless --disable-mbm-checks is passed:

    // try_runtime_core/commands/on_runtime_upgrade/mod.rs
    let sync_checks = if command.disable_mbm_checks { command.checks }
                      else { UpgradeCheckSelect::None };

I confirmed the bug by adding invariant hooks locally and running:

try-runtime --runtime <wasm> on-runtime-upgrade \
  --checks pre-and-post --disable-mbm-checks \
  --blocktime 8000 live --uri wss://api.torus.network

post_upgrade aborted with MigrateAllocator: sudo allocator was not registered as an agent.

Recommendation: (a) add pre_upgrade/post_upgrade invariant hooks to migrations with non-trivial goals, and (b) add --disable-mbm-checks --checks pre-and-post to the try-runtime-upgrade-{mainnet,testnet} just targets so those hooks actually fire in CI. Both are needed — neither alone catches this. (A firing assertion currently surfaces as a wasm trap because runtime/src/apis.rs .unwrap()s the TryRuntime result; the real message is the inner Other("...").)

Suggested test hooks

I wrote pre_upgrade/post_upgrade invariant hooks for MigrateAllocator that catch this (they produced the failure above). Happy to push them as a follow-up if useful — they assert: storage version bumped, sudo is the sole whitelisted allocator, prior allocator agents removed, and (when a prior identity existed) the sudo allocator is registered and received the prior stake.

@renlabs-agent renlabs-agent Bot force-pushed the agent/migrate-allocator branch 3 times, most recently from 2621f43 to 24aad93 Compare June 26, 2026 20:22
@renlabs-agent renlabs-agent Bot force-pushed the agent/migrate-allocator branch from 24aad93 to 07eff0a Compare June 26, 2026 20:31
@steinerkelvin

Copy link
Copy Markdown
Collaborator

✅ Fix verified against live mainnet state

The updated migration (07eff0a0) resolves the blocking issue. I confirmed it end-to-end by running invariant pre_upgrade/post_upgrade hooks against current mainnet state via try-runtime (--disable-mbm-checks --checks pre-and-post, so the single-block migration's checks actually fire):

PR head Hooks on live mainnet state
bed9db1 (original) MigrateAllocator: sudo allocator was not registered as an agent
07eff0a0 (this revision) ✅ passes — sudo allocator registered, external inbound stake transferred

Same test, opposite result. On live state the sudo allocator now ends up registered as an agent and holding the prior allocator's external inbound stake, plus the run is idempotent and the post-state decodes cleanly.

What fixed it:

  • Name collision resolved — registering as b"new_allocator" (valid path agent.new_allocator) instead of reusing the prior allocator's still-present name, so register no longer fails with AgentAlreadyRegistered.
  • External-stake transfer addedstaked ∈ old && staker ∉ old correctly moves delegators' stake to the new allocator while excluding intra-allocator stake.

Remaining notes (all non-blocking)

  1. Agent name is now the literal "new_allocator", not the prior allocator's recognizable name (url + metadata are still inherited). If the intent was to preserve the allocator's on-chain identity/name, this changes it — worth a deliberate confirm. (Reusing the old name is only possible if the old agent is deregistered before the new one registers, which would then require capturing + re-applying stake from freed balances.)
  2. The first removal loop is largely redundant. deregister(old_allocator)stake::clear_key already unwinds the old allocators' stake; explicitly remove_stake-ing their outbound stake beforehand just changes when those funds are unreserved. Harmless, but can likely be dropped.
  3. fund_registration_burn transfers exactly Burn, which register immediately sends back to the treasury, leaving the new allocator at ~0 free balance (can't pay fees afterward). Consider funding Burn + ED + a little headroom.
  4. on_runtime_upgrade returns Weight::zero() while iterating all of StakingTo + agents; a conservative real weight would be more accurate for a migration of this size.

Suggestion for the migration tooling

Worth adding pre_upgrade/post_upgrade hooks to migrations with non-trivial goals, and --disable-mbm-checks --checks pre-and-post to the try-runtime-upgrade-{mainnet,testnet} just targets — otherwise (as here) the MBM-enabled runtime runs the single-block migration with checks: None and these invariants never execute in CI. I have the hooks I used for the verification above and can open them as a follow-up if useful.

@functor-flow functor-flow merged commit d48447d into main Jun 27, 2026
0 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants