Release oasis storage prune and compact subcommands by martintomazic · Pull Request #6521 · oasisprotocol/oasis-core

martintomazic · 2026-05-06T13:40:32Z

Closes #6519.

Relatively trivial to review. LOC is big because I extracted commands to separate files and added missing tests (separate commits).

Motivation

Offline pruning/compaction was introduced because late pruning (1) may not reclaim disk space and (2) may cause the node to fall behind even after it has registered as ready (pending compaction load).
We could also use it to address "go/oasis-node: Enable snapshot creation with exact start version" (#6423), serving as an alternative to "go/oasis-node/cmd/storage: Add create and import checkpoint cmd" (#6454).

How to test locally

For testing I used the latest Sapphire snapshot (approx. 7 months of data, keep_n = 100_000). Pruning took 10 min and compaction 6 min. For consensus I used snapshot-old (approx. 3 months of data, keep_n = 100_000) due to limited disk space on my machine. Pruning took 2 min and compaction 3 min.

Node started syncing normally after prune/compact commands.

Possible follow-up

For #6454, we may want to add an additional flag to the pruning command, e.g. retain_height, that ignores the pruning config and calculates the corresponding round of every runtime at this height, implementing the keep_from proposal.

netlify · 2026-05-06T13:40:37Z

✅ Deploy Preview for oasisprotocol-oasis-core canceled.

Name	Link
🔨 Latest commit	`32bbd4e`
🔍 Latest deploy log	https://app.netlify.com/projects/oasisprotocol-oasis-core/deploys/69fd9b5370919a0008f45929

martintomazic · 2026-05-07T12:55:17Z

+	// By calculating retain round from the runtime state DB latest round,
+	// we ensure light history is never pruned past the latest synced runtime
+	// round.
+	retainRound := latest - numKept


In practice we should probably also respect the executor prune handler, which should prevent pruning past the last normal round.

// runtimeLastNormalRound returns the last normal round for the given runtime.
func runtimeLastNormalRound(ctx context.Context, ndb db.NodeDB, runtimeID common.Namespace) (uint64, error) {
	latest, ok := ndb.GetLatestVersion()
	if !ok {
		return 0, fmt.Errorf("consensus node DB is empty")
	}

	roots, err := ndb.GetRootsForVersion(latest)
	if err != nil {
		return 0, fmt.Errorf("failed to get roots for consensus version %d: %w", latest, err)
	}
	if len(roots) == 0 {
		return 0, fmt.Errorf("no roots found for consensus version %d", latest)
	}

	tree := mkvs.NewWithRoot(nil, ndb, roots[0], mkvs.WithoutWriteLog())
	defer tree.Close()

	s := roothashState.NewImmutableState(tree)
	rtState, err := s.RuntimeState(ctx, runtimeID)
	if err != nil {
		return 0, fmt.Errorf("failed to get runtime state: %w", err)
	}

	return rtState.LastNormalRound, nil
}

We have two options:

1. Pass the consensus node DB to pruneRuntimeDBs, or just open and close it there (simplest, with the fewest changes). This feels weird, as this function should probably accept retainRound, just like pruneConsensusDBs should accept retainHeight instead of runtimeIds (solution 2).

2. Add new consensusRetainHeight and runtimeRetainRound functions that precompute these limits, and possibly write unit tests for them. The annoying part is that those two functions consume runtime histories, node DBs and consensus node DBs, meaning we lose resource encapsulation and need to keep them open throughout the whole command. It feels like a better direction, but requires a thorough refactor and makes logging and orchestration incredibly messy.

In addition, adding this handler requires the consensus state to always be present. Without it, you can prune runtime state without having consensus state locally (e.g. when hacking on state imported from snapshots; I don't think this is needed though).
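To make the runtimeRetainRound idea concrete, here is a minimal sketch of the limit as a pure function. It is an illustration and not code from this PR: the signature (plain uint64 inputs instead of open databases) and the boolean "nothing to prune" result are assumptions. So is the reading that "not pruning past the last normal round" means the retain round must not exceed it.

```go
package main

import "fmt"

// runtimeRetainRound is a hypothetical helper (not part of this PR). It returns
// the earliest round to retain, and false when there is nothing to prune.
func runtimeRetainRound(latest, numKept, lastNormalRound uint64) (uint64, bool) {
	// Guard the subtraction: uint64 arithmetic wraps around on underflow.
	if latest < numKept {
		return 0, false
	}
	// Assumed semantics: the last normal round must always be retained.
	return min(latest-numKept, lastNormalRound), true
}

func main() {
	fmt.Println(runtimeRetainRound(1_000, 100, 950)) // 900 true
	fmt.Println(runtimeRetainRound(1_000, 100, 800)) // 800 true
	fmt.Println(runtimeRetainRound(50, 100, 40))     // 0 false
}
```

Because the helper takes only numbers, it can be unit tested without opening any database; the cost discussed above (keeping the DBs open while the inputs are collected) is not addressed by this sketch.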

martintomazic · 2026-05-07T13:24:32Z

+	ndb, close, err := openConsensusNodeDB(dataDir)
+	if err != nil {
+		return fmt.Errorf("failed to open NodeDB: %w", err)
+	}
+	defer close()
+
+	latest, ok := ndb.GetLatestVersion()
+	if !ok {
+		logger.Info("skipping consensus pruning as state db is empty")
+		return nil
+	}
+
+	if latest < numKept {
+		logger.Info("skipping consensus pruning as the latest version is smaller than the number of versions to keep")
+		return nil
+	}
+
+	// In case of configured runtimes, do not prune past the earliest reindexed
+	// consensus height, so that light history can be populated correctly.
+	minReindexed, err := minReindexedHeight(dataDir, runtimes)
+	if err != nil {
+		return fmt.Errorf("failed to fetch earliest reindexed consensus height: %w", err)
+	}
+
+	retainHeight := min(
+		latest-numKept, // underflow not possible due to if above.
+		uint64(minReindexed),
+	)


E.g. this could be func consensusRetainHeight(ndb db.NodeDB, histories []history.Histories) (uint64, bool, error), and this function would take retainHeight uint64 instead of the last two params, which would also allow us to unit test the business logic. As stated above, this complicates the orchestration a lot though :(
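As a rough illustration of how the business logic could be isolated for unit testing, the sketch below mirrors the checks from the diff above but operates on already-fetched values. The parameter list differs from the signature proposed in the comment and is an assumption, as are the int64 type of minReindexed and the explicit negative-height check.

```go
package main

import (
	"errors"
	"fmt"
)

var errNegativeHeight = errors.New("negative reindexed height")

// consensusRetainHeight is a hypothetical pure variant of the proposed helper.
// It returns the retain height, and false when pruning should be skipped.
func consensusRetainHeight(latest uint64, hasLatest bool, numKept uint64, minReindexed int64) (uint64, bool, error) {
	// Empty state DB, or fewer versions than we want to keep: skip pruning.
	if !hasLatest || latest < numKept {
		return 0, false, nil
	}
	// A negative height would wrap around when converted to uint64.
	if minReindexed < 0 {
		return 0, false, errNegativeHeight
	}
	// Do not prune past the earliest reindexed consensus height.
	return min(latest-numKept, uint64(minReindexed)), true, nil
}

func main() {
	fmt.Println(consensusRetainHeight(1_000, true, 100, 500))  // 500 true <nil>
	fmt.Println(consensusRetainHeight(1_000, true, 100, 950))  // 900 true <nil>
	fmt.Println(consensusRetainHeight(0, false, 100, 500))     // 0 false <nil>
}
```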


martintomazic · 2026-05-08T08:06:32Z

+			logger.Info("Starting databases pruning. This may take a while...")
+
+			dataDir := cmdCommon.DataDir()
+			ctx := cmd.Context()


NIT: We should probably wire

signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM).

The cobra context does not capture these signals by default, resulting in a non-graceful shutdown (never had an issue so far though). Still, it is annoying that the BadgerDB command does not expose a cancellation API, so even with this fixed it won't be complete.


martintomazic · 2026-05-08T08:58:31Z

Ready for a review.

Please check the PR context. The main question is how much we want to complicate our life with prune handlers (see my comment above).

Since the PR is already rather big, I am open to pushing one PR in front that:

- factors prune/compact commands into separate files,
- adds missing tests,
- makes minor fixes,
- prunes block and state store in small intervals.

Then this PR becomes much smaller and focused only on the new runtime logic and releasing the commands.

matevz · 2026-05-08T10:08:22Z

@@ -386,7 +386,7 @@ enabling it for the first time, or later changing it to retain less data. This
 way they guarantee the node is healthy when it starts.

 Following successful pruning, to release disk space, they are encouraged to run


So many words :)

Suggested change

Following successful pruning, to release disk space, they are encouraged to run

After the pruning operators should run [the compaction command](#compact) to release disk space.

martintomazic force-pushed the martin/feature/release-storage-prune-cmd branch 4 times, most recently from c39199f to ddd1106 Compare May 7, 2026 12:49


martintomazic added 4 commits May 7, 2026 22:51

go/oasis-test-runner: Add wait methods to controller

c28109f

go/oasis-test-runner/scenario/e2e: Add offline pruning scenario

79fed10

go/oasis-node/cmd/storage: Factor pruning command to separate file

7f0152a

All logic and style was preserved, except for following the new (better) storage inspect style of command definition.

go/oasis-node/cmd/storage: Generalize pruneConsensusNodeDB

12139cd

martintomazic force-pushed the martin/feature/release-storage-prune-cmd branch from ddd1106 to c768a9a Compare May 7, 2026 21:38


martintomazic added 8 commits May 8, 2026 10:13

go/oasis-node/cmd/storage: Prune block and state store in batches

90c1c8c

Best way to mitigate unlikely stale state scenario. Moreover, it enables introducing context cancelation.

go/runtime/history: Add new Prune method

f6b59e8

go/oasis-node/cmd/storage: Release offline pruning

a8f5e1a

go/oasis-node/cmd/storage: Factor compact command to separate file

551e0f5

Code was preserved as is, except for using the new new*Cmd() pattern introduced in the inspect command.

go/oasis-node/cmd/storage: Defer badger DB close

69ea9c5

go/runtime/history: Add new Compact command

c046d1f

go/oasis-node/cmd/storage: Release offline compaction command

e527568

Offline compaction now also works for the runtime history and state DB.

go/oasis-node/cmd/storage: Improve compact commands helpers

32bbd4e

Make it symmetric with the prune command helpers.

martintomazic force-pushed the martin/feature/release-storage-prune-cmd branch from c768a9a to 32bbd4e Compare May 8, 2026 08:14

martintomazic marked this pull request as ready for review May 8, 2026 08:58

martintomazic requested review from kostko, matevz, peternose and ptrus as code owners May 8, 2026 08:58

martintomazic mentioned this pull request May 8, 2026

Update pruning section oasisprotocol/docs#1729

Open

matevz reviewed May 8, 2026

martintomazic wants to merge 12 commits into master from martin/feature/release-storage-prune-cmd
