Skip to content

chore: sqlite v1 to v2 data migration#4692

Closed
NathanFlurry wants to merge 1 commit intographite-base/4692from
04-21-chore_sqlite_v1_to_v2_data_migration
Closed

chore: sqlite v1 to v2 data migration#4692
NathanFlurry wants to merge 1 commit intographite-base/4692from
04-21-chore_sqlite_v1_to_v2_data_migration

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app
Copy link
Copy Markdown

railway-app Bot commented Apr 22, 2026

🚅 Deployed to the rivet-pr-4692 environment in rivet-frontend

Service Status Web Updated (UTC)
kitchen-sink ❌ Build Failed (View Logs) Web Apr 24, 2026 at 12:32 pm
frontend-cloud ❌ Build Failed (View Logs) Web Apr 24, 2026 at 7:22 am
frontend-inspector ❌ Build Failed (View Logs) Web Apr 24, 2026 at 7:22 am
website 😴 Sleeping (View Logs) Web Apr 23, 2026 at 6:06 pm
mcp-hub ✅ Success (View Logs) Web Apr 22, 2026 at 9:03 am
ladle ❌ Build Failed (View Logs) Web Apr 22, 2026 at 9:03 am

@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-4692 April 22, 2026 09:02 Destroyed
Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 22, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry NathanFlurry changed the base branch from 04-19-chore_move_rivetkit_to_task_model to graphite-base/4692 April 22, 2026 09:04
@claude
Copy link
Copy Markdown

claude Bot commented Apr 22, 2026

SQLite v1-to-v2 Migration - Code Review

Overview

This PR implements the v1-to-v2 SQLite data migration path for actor-side storage, runs the migration eagerly at actor boot (on envoys that support v2), and consolidates the VFS layer so only v2 is available going forward. The migration reads the v1 KV snapshot, rolls back any dangling journal, validates integrity via rusqlite, and commits the recovered pages into the v2 staged-commit protocol.

Net change: -7725/+6125 lines, largely from retiring the legacy v2/vfs.rs (5077 deletions) and the EnvoyKv / KvVfs v1 path from database.rs (159 deletions).


Issues

Indentation bug in prepare_v1_migration (takeover.rs)

The body of the if let block is not indented - let existing_head and ensure! appear at the same indent level as the if let condition. cargo fmt would catch this. Worth fixing before merge so reviewers do not have to wonder whether ensure! runs unconditionally.

split_bytes(bytes, 0) returns one oversized chunk

If meta.max_delta_bytes is somehow 0 (misconfiguration, stale engine), this silently promotes the entire database into a single chunk and lets the commit path reject it with an opaque error. A guard before the split_bytes call (ensure!(max_chunk_bytes > 0)) would surface a cleaner failure.


Memory and Performance Concerns

Double-materialise for large databases

The migration holds two full copies of the database in memory at once: snapshot.main.bytes (v1 chunks assembled) and recovered.bytes (post-journal-recovery). For a 128 MiB database that is ~256 MiB peak. Dropping snapshot before reading the recovered file would halve that; page-streaming from the tempfile would eliminate it entirely.

SQLITE_MIGRATION_LOCKS map never shrinks

Entries are added once per actor_id and never removed. After a successful migration the early-return on Native/MigratedFromV1 means the lock is never acquired again, but the allocation lives for the process lifetime. For a long-running envoy that migrates many unique actors this leaks one Arc<Mutex<()>> per actor. Consider evicting the entry after maybe_migrate_v1_to_v2 returns Ok(true).


Metrics

SQLITE_MIGRATION_PAGES uses latency buckets for page counts

The shared BUCKETS are latency-shaped (0.001 to 100 s). Page counts for a 4 KiB-page 128 MiB database run up to ~32768 - nearly everything will land in the infinity bucket. Custom buckets like [1.0, 10.0, 100.0, 500.0, 1000.0, 5000.0, 10000.0, 32768.0] would make the metric useful.


Minor / Nit

test_db() leaks its temp directory. tempdir()?.keep() prevents auto-cleanup. tempdir()? without .keep() would be cleaner unless RocksDB requires the directory to outlive the handle.

Last assertion in reads_v1_files_beyond_the_default_kv_list_limit checks a coincidental value. seed_v1_sparse_chunks stores (chunk_idx as u8).wrapping_add(1). For chunk 16384 that is 1. chunk_count as u8 (= 16385 as u8) is also 1. Both happen to equal 1, so the assertion passes without distinguishing the last chunk from chunk 0. Spelling out 1_u8 literally or mirroring the seeder formula would make the intent clear.


What Is Working Well

  • Double-checked locking pattern (pre-lock existence check, acquire, re-check) is correct.
  • Lease-based retry for stale MigratingFromV1 heads avoids a permanent wedge on envoy crash.
  • decode_db_head fallback to LegacyDBHead correctly handles persisted heads that lack the origin field - BARE is positional so adding a trailing field makes old bytes undecodable as the new type; the two-attempt decode is the right approach.
  • tx_delete_value_precise handles both inline and chunked value layout, and the trailing clear_range is a good safety net for any orphaned subkeys.
  • origin_override on CommitFinalizeRequest lets the migration mark the head atomically at finalize time rather than requiring a separate write.
  • Test coverage is solid: happy-path, idempotency, stale-lease retry, fresh-in-progress rejection, WAL/SHM sidecar rejection, missing chunks, oversized file, zero-size db, and journal recovery are all covered.
  • Removing schema_version plumbing from SqliteRuntimeConfig / SqliteDb / NAPI is a clean simplification - v2 is now the only path at the TypeScript runtime boundary.

Open Question

After a successful migration the v1 KV chunks remain in the store - the migration tags the v2 head as MigratedFromV1 but does not delete the old data. Is a cleanup pass planned, or is v1 data expected to be tombstoned lazily? If out of scope for this PR, a follow-up ticket would help track the storage overhead during the transition window.

Generated with Claude Code (https://claude.com/claude-code)

@NathanFlurry NathanFlurry force-pushed the 04-21-chore_sqlite_v1_to_v2_data_migration branch from c2fbb00 to daf7aa8 Compare April 22, 2026 09:11
@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-4692 April 22, 2026 09:12 Destroyed
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4692 to 04-19-chore_move_rivetkit_to_task_model April 22, 2026 09:12
@NathanFlurry NathanFlurry force-pushed the 04-21-chore_sqlite_v1_to_v2_data_migration branch from daf7aa8 to f6c7523 Compare April 24, 2026 07:21
@NathanFlurry NathanFlurry force-pushed the 04-19-chore_move_rivetkit_to_task_model branch from 5b8e173 to 02b0b04 Compare April 24, 2026 07:21
@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-4692 April 24, 2026 07:21 Destroyed
@NathanFlurry NathanFlurry force-pushed the 04-19-chore_move_rivetkit_to_task_model branch from 02b0b04 to b2af041 Compare April 24, 2026 08:12
@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-4692 April 24, 2026 08:12 Destroyed
This was referenced Apr 24, 2026
@NathanFlurry NathanFlurry changed the base branch from 04-19-chore_move_rivetkit_to_task_model to graphite-base/4692 April 24, 2026 11:24
@NathanFlurry NathanFlurry force-pushed the 04-21-chore_sqlite_v1_to_v2_data_migration branch from 1418052 to b6d4f6f Compare April 24, 2026 11:48
@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-4692 April 24, 2026 11:49 Destroyed
@NathanFlurry NathanFlurry force-pushed the 04-21-chore_sqlite_v1_to_v2_data_migration branch from b6d4f6f to 59858bf Compare April 24, 2026 12:32
@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-4692 April 24, 2026 12:32 Destroyed
@NathanFlurry
Copy link
Copy Markdown
Member Author

Landed in main via stack-merge fast-forward push. Commits are in main; closing to match.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant