Add how-to guide for storage data migration (#2228) #2299


---
title: Migrate contract storage data when upgrading data structures
hide_table_of_contents: true
description: Use the version marker pattern to safely read and migrate stored data when a contract upgrade changes a data structure
---

When a contract is upgraded and a stored data structure gains new fields, the data already written to the ledger still uses the old layout. Reading those old entries with the new type causes the host to trap. This guide explains why that happens and introduces the version marker pattern as a reliable solution. It also presents the versioned enum pattern as an alternative, and covers lazy versus eager migration strategies and how to test them.

## Why intuitive approaches fail

Suppose a contract stores `DataV1` entries and is upgraded to use `DataV2`, which adds an optional field `c`:

```rust
use soroban_sdk::contracttype;

// Layout written by the old contract version.
#[contracttype]
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct DataV1 { pub a: i64, pub b: i64 }

// New layout: adds the optional field `c`.
#[contracttype]
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct DataV2 { pub a: i64, pub b: i64, pub c: Option<i64> }
```

### Approach 1: Read old entries directly with the new type

The most natural approach is to read the stored value directly as `DataV2` and expect `c` to default to `None`:

```rust
// Reading an entry that was written as DataV1, using the DataV2 type.
// A developer might expect c = None for old entries, but this traps with
// Error(Object, UnexpectedSize).
let data: DataV2 = env.storage().persistent().get(&key).unwrap();
```

This traps with `Error(Object, UnexpectedSize)`. The Soroban host validates the field count of the XDR-encoded value against the type definition before returning anything to the contract. Because `DataV1` has two fields and `DataV2` has three, the host rejects the entry before the SDK can handle it.

### Approach 2: Use `try_from_val` as a fallback

Another approach is to read the raw `Val` and call `try_from_val`, expecting to catch a deserialization error and fall back to the old type:

```rust
let raw: Val = env.storage().persistent().get(&key).unwrap();
let data = if let Ok(v2) = DataV2::try_from_val(&env, &raw) {
    v2
} else {
    // This branch is never reached: the host traps before returning Err.
    let v1 = DataV1::try_from_val(&env, &raw).unwrap();
    DataV2 { a: v1.a, b: v1.b, c: None }
};
```

This also traps at the host level. The field count validation happens in the host environment during deserialization, and it does not produce a Rust `Err` that the SDK can intercept. There is no way to catch or recover from the mismatch at the contract level.

The root issue is that a contract cannot determine which type an existing storage entry was written as just by reading it. That information must be stored explicitly.

Review comment (Member): On a first read, I missed that the first two approaches are what not to do and I got confused. So I think it might be helpful if this how-to guide leads with what someone should do, and then we could have some notes in those little boxes saying here's an approach that someone might think to use but don't do it, it won't work. I would put them further down in the document rather than leading with them.

## Version Marker Pattern

The solution is to store a version number alongside each data entry, keyed by the same identifier. The contract reads the version first, then branches on the result to decode the payload with the correct type.

### Key layout

Define two variants in your key enum, one for the version marker and one for the payload, both keyed by the same `id`:

```rust
#[contracttype]
pub enum DataKey {
    DataVersion(u32), // version marker keyed by id
    Data(u32),        // data payload keyed by id
}
```

Each logical record occupies two storage slots. Because the version is stored per record rather than globally, each entry is independently versioned, and there is no all-or-nothing upgrade requirement.

### Reading with version awareness

Before decoding a storage entry, read its version marker. Use `unwrap_or(1)` to handle entries that were written before versioning was introduced: the absence of a version key is itself a signal that the entry is version 1.

```rust
fn read_data(env: &Env, id: u32) -> DataV2 {
    let version: u32 = env
        .storage()
        .persistent()
        .get(&DataKey::DataVersion(id))
        .unwrap_or(1); // no marker: the entry predates versioning, so it is v1

    match version {
        // Decode with the old type, then up-convert in memory.
        1 => {
            let v1: DataV1 = env.storage().persistent().get(&DataKey::Data(id)).unwrap();
            DataV2 { a: v1.a, b: v1.b, c: None }
        }
        // Version 2 (the current layout) decodes directly.
        _ => env.storage().persistent().get(&DataKey::Data(id)).unwrap(),
    }
}
```

### Writing always uses the current version

Every write stamps the entry with the current version number. An entry that was originally written as `DataV1` will carry a `DataVersion` marker of `2` the next time it is written back:

```rust
fn write_data(env: &Env, id: u32, data: &DataV2) {
    // Write the marker and the payload in the same call so they stay in sync.
    env.storage().persistent().set(&DataKey::DataVersion(id), &2u32);
    env.storage().persistent().set(&DataKey::Data(id), data);
}
```

### Lazy vs eager migration

Once version-aware read/write logic is in place, there are two strategies for converting old entries.

#### Lazy migration (convert on read)

In lazy migration, old entries are left untouched on the ledger. When a record is read, its version is detected and it is up-converted in memory. When that record is later written back, it is stamped with the new version. No explicit migration step is needed: conversion happens as records are accessed in normal contract use.

Lazy migration is generally preferred on blockchains. Leaving old entries untouched has no upfront cost and no risk of hitting instruction or ledger-entry limits at upgrade time. Records that are never accessed again are never migrated, which is usually acceptable.

The `read_data` function shown above already implements lazy migration. Each time an old `DataV1` entry is read and then passed to `write_data`, the entry is upgraded in place.

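The lazy round-trip can be checked without the SDK using a small model. The sketch below is std-only Rust and is not the Soroban API. A hypothetical `Ledger` backed by a `HashMap` stands in for persistent storage. Each payload is stored as a bare list of fields, so the stored value alone does not say which struct wrote it. A field-count `assert_eq!` stands in for the host's `UnexpectedSize` trap, and `read_data`/`write_data` mirror the helpers above.

```rust
use std::collections::HashMap;

// Hypothetical std-only model of the version marker pattern (not Soroban API).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Key {
    DataVersion(u32),
    Data(u32),
}

// Payloads are bare field lists: the value does not record its struct type.
#[derive(Clone, Debug)]
enum Value {
    Version(u32),
    Fields(Vec<Option<i64>>),
}

#[derive(Clone, Debug, PartialEq)]
struct DataV2 {
    a: i64,
    b: i64,
    c: Option<i64>,
}

#[derive(Default)]
struct Ledger {
    entries: HashMap<Key, Value>,
}

impl Ledger {
    // What the old contract did: two fields, no version marker.
    fn write_v1(&mut self, id: u32, a: i64, b: i64) {
        self.entries.insert(Key::Data(id), Value::Fields(vec![Some(a), Some(b)]));
    }

    fn version(&self, id: u32) -> u32 {
        match self.entries.get(&Key::DataVersion(id)) {
            Some(Value::Version(v)) => *v,
            _ => 1, // no marker: the entry predates versioning
        }
    }

    fn fields(&self, id: u32) -> &[Option<i64>] {
        match self.entries.get(&Key::Data(id)) {
            Some(Value::Fields(f)) => f.as_slice(),
            _ => panic!("no payload stored for id {id}"),
        }
    }

    // Mirrors read_data: the marker selects the decoder; nothing is tried and caught.
    fn read_data(&self, id: u32) -> DataV2 {
        let f = self.fields(id);
        match self.version(id) {
            1 => {
                assert_eq!(f.len(), 2, "UnexpectedSize"); // models the host trap
                DataV2 { a: f[0].unwrap(), b: f[1].unwrap(), c: None }
            }
            _ => {
                assert_eq!(f.len(), 3, "UnexpectedSize");
                DataV2 { a: f[0].unwrap(), b: f[1].unwrap(), c: f[2] }
            }
        }
    }

    // Mirrors write_data: always stamp the current version with the payload.
    fn write_data(&mut self, id: u32, d: &DataV2) {
        self.entries.insert(Key::DataVersion(id), Value::Version(2));
        self.entries.insert(Key::Data(id), Value::Fields(vec![Some(d.a), Some(d.b), d.c]));
    }
}
```

Reading a record leaves its marker at the implicit version 1. Only the `write_data` call moves it to 2, which is also the moment the stored payload gains its third field.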
#### Eager migration (batch conversion)

In eager migration, an explicit admin function iterates over all known records and rewrites them in the new format immediately after the upgrade is deployed:

```rust
use soroban_sdk::{Env, Vec};

pub fn migrate_all(env: &Env, ids: Vec<u32>) {
    // Caller should be an authorized admin, for example by loading a stored
    // admin Address and calling require_auth() on it before migrating.
    for id in ids.iter() {
        let version: u32 = env
            .storage()
            .persistent()
            .get(&DataKey::DataVersion(id))
            .unwrap_or(1);

        if version < 2 {
            // read_data up-converts to DataV2 in memory.
            let migrated = read_data(env, id);
            // write_data stamps the entry as version 2.
            write_data(env, id, &migrated);
        }
    }
}
```

Eager migration is rarely practical for large datasets on Soroban. Each rewrite consumes fees and instructions, and a single transaction cannot migrate an unbounded number of records: the contract will hit instruction or ledger-entry limits. If the batch must span multiple transactions, the contract is in a mixed-version state throughout the window, so version-aware read logic is still required anyway.

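When an eager migration has to span several transactions, each call can process a bounded slice of the id list and report where the next call should resume. The sketch below shows only that bookkeeping, in std-only Rust. `next_batch`, `cursor`, and `max_per_call` are hypothetical names and are not part of the Soroban SDK. How you persist the cursor between calls (for example, in instance storage) is left to your contract.

```rust
/// Hypothetical helper: returns the ids to migrate in this call and the
/// cursor for the next call (`None` once every id has been visited).
fn next_batch(ids: &[u32], cursor: usize, max_per_call: usize) -> (&[u32], Option<usize>) {
    assert!(max_per_call > 0, "max_per_call must be positive");
    // Clamp so a stale cursor past the end yields an empty final batch.
    let start = cursor.min(ids.len());
    let end = start.saturating_add(max_per_call).min(ids.len());
    let next = if end < ids.len() { Some(end) } else { None };
    (&ids[start..end], next)
}
```

Each returned slice would be handed to a per-batch variant of `migrate_all`. Until the cursor comes back as `None`, some records are still version 1, so the version 1 branch of `read_data` has to stay in place for the whole window.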

Eager migration is occasionally appropriate when the total number of records is small and known in advance (for example, a fixed registry of a few dozen entries), or when you need to permanently drop old version branches from the read path.

:::caution

Never remove a version branch from `read_data` while old entries of that version can still exist on the ledger. Doing so will cause any remaining old entries to trap when accessed.

:::

### Testing migrations

Testing data migration requires simulating state written by an old contract version and verifying that the new contract reads it correctly.

The Soroban test environment allows you to set storage state directly. Use this to write `DataV1` entries (without a `DataVersion` key) and verify that `read_data` up-converts them correctly. Contract storage is only accessible from within a contract context, so both the direct storage writes and the calls to the `read_data`/`write_data` helpers are wrapped in `env.as_contract`:

```rust
#[cfg(test)]
mod test {
    use super::*;
    use soroban_sdk::Env;

    #[test]
    fn test_reads_v1_entry_as_v2() {
        let env = Env::default();
        let id: u32 = 42;
        let contract_id = env.register(Contract, ());

        // Simulate what the old contract wrote: a DataV1 payload and
        // no DataVersion entry (old contracts did not write one).
        let v1_data = DataV1 { a: 10, b: 20 };
        env.as_contract(&contract_id, || {
            env.storage().persistent().set(&DataKey::Data(id), &v1_data);
        });

        let result = env.as_contract(&contract_id, || read_data(&env, id));

        assert_eq!(result.a, 10);
        assert_eq!(result.b, 20);
        assert_eq!(result.c, None);
    }

    #[test]
    fn test_reads_v2_entry_correctly() {
        let env = Env::default();
        let id: u32 = 99;
        let contract_id = env.register(Contract, ());

        let v2_data = DataV2 { a: 1, b: 2, c: Some(3) };
        let result = env.as_contract(&contract_id, || {
            write_data(&env, id, &v2_data);
            read_data(&env, id)
        });

        assert_eq!(result.a, 1);
        assert_eq!(result.b, 2);
        assert_eq!(result.c, Some(3));
    }

    #[test]
    fn test_write_upgrades_v1_entry_to_v2() {
        let env = Env::default();
        let id: u32 = 7;
        let contract_id = env.register(Contract, ());

        env.as_contract(&contract_id, || {
            // Write a v1 entry directly, as the old contract would have.
            let v1_data = DataV1 { a: 5, b: 6 };
            env.storage().persistent().set(&DataKey::Data(id), &v1_data);

            // Read it: lazy migration produces a DataV2 in memory.
            let migrated = read_data(&env, id);
            assert_eq!(migrated.c, None);

            // Write it back: this stamps the entry as version 2.
            write_data(&env, id, &migrated);

            let stored_version: u32 = env
                .storage()
                .persistent()
                .get(&DataKey::DataVersion(id))
                .unwrap();
            assert_eq!(stored_version, 2);

            // Subsequent reads take the v2 branch.
            let result = read_data(&env, id);
            assert_eq!(result.a, 5);
            assert_eq!(result.b, 6);
            assert_eq!(result.c, None);
        });
    }
}
```

The three test cases cover the three states a record can be in after an upgrade:

- A `DataV1` entry with no version marker (a record written before versioning was introduced)
- A `DataV2` entry written by the new contract
- A `DataV1` entry that is read and then written back (the lazy migration round-trip)

## Versioned Enum Pattern

Review comment (Member): I'd lead with this approach, and then list the other approaches as alternatives, and then finally have a section of approaches you think will work but won't.

Another approach is to store a versioned enum that can hold either a `V1` or a `V2` data struct. Each stored value carries its enum variant, so the contract can tell which layout it was written with. This relies on the old contract having already stored values as `Data::V1(...)`. An entry written as a bare `DataV1` struct carries no variant and cannot be decoded as `Data`.

```rust
#[contracttype]
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum Data {
    V1(DataV1),
    V2(DataV2),
}

#[contracttype]
pub enum DataKey {
    Data(u32), // keyed by the same u32 id that read_data and write_data take
}
```

### Migration Logic

The migration logic matches on the two data formats: it converts `V1` data to the `V2` format and passes `V2` data through. For a `V1` value, it copies fields `a` and `b` and sets the new field `c` (added in `V2`) to `None`. A `V2` value is returned unchanged. This is a lazy migration: old data is converted on read, and persisted in the new format when written back, rather than in a bulk migration.

```rust
impl Data {
    /// Up-converts any stored variant to the current DataV2 layout.
    pub fn into_v2(self) -> DataV2 {
        match self {
            Data::V1(v1) => DataV2 { a: v1.a, b: v1.b, c: None },
            Data::V2(v2) => v2,
        }
    }
}
```

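Because `into_v2` makes no storage calls, the conversion itself can be unit-tested with plain Rust. The sketch below mirrors the types above with `#[contracttype]` removed, an assumption made only so it compiles with the standard library alone. In the contract, keep the attribute.

```rust
// Std-only mirror of the types above; #[contracttype] is omitted so this
// compiles without soroban-sdk.
#[derive(Clone, Debug, PartialEq)]
pub struct DataV1 { pub a: i64, pub b: i64 }

#[derive(Clone, Debug, PartialEq)]
pub struct DataV2 { pub a: i64, pub b: i64, pub c: Option<i64> }

#[derive(Clone, Debug, PartialEq)]
pub enum Data {
    V1(DataV1),
    V2(DataV2),
}

impl Data {
    /// Up-converts any stored variant to the current DataV2 layout.
    pub fn into_v2(self) -> DataV2 {
        match self {
            // Old layout: copy a and b, default the new field.
            Data::V1(v1) => DataV2 { a: v1.a, b: v1.b, c: None },
            // Current layout: pass through unchanged.
            Data::V2(v2) => v2,
        }
    }
}
```

Keeping the conversion in a pure function lets each old variant be covered by ordinary unit tests, separately from the storage round-trip test later in this guide.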
### Reading with version awareness

The value is read from storage as a `Data` enum, and `into_v2()` ensures that the returned value is in the `V2` format regardless of which variant was stored. If no entry exists for `id`, the function returns `None`.

```rust
// Defined inside the contract's #[contractimpl] impl block.
pub fn read_data(e: Env, id: u32) -> Option<DataV2> {
    // `?` returns None when no entry exists for this id.
    let data_enum: Data = e.storage().persistent().get(&DataKey::Data(id))?;
    Some(data_enum.into_v2())
}
```

### Writing always uses the current version

The write function `write_data()` takes a `DataV2` value and always stores it wrapped in the `Data::V2` variant.

```rust
// Also defined inside the contract's #[contractimpl] impl block.
pub fn write_data(e: Env, id: u32, data: DataV2) {
    e.storage().persistent().set(&DataKey::Data(id), &Data::V2(data));
}
```

### Testing migrations

Testing data migration requires simulating state written by an old contract version and verifying that the new contract reads it correctly.

In this test, data in the `V1` format is stored first. It is then read with the `read_data` function, which converts it to the `V2` format with `into_v2()` before returning the result. The result is checked with `assert_eq!()` and written back under the same `id`, so the `V1` entry is overwritten with the same data in the `V2` format.

Next, the raw entry is read from storage to verify that it is now stored as `Data::V2`. Finally, `read_data()` is called again to verify that it still returns the data in the `V2` format.

```rust
#[test]
fn test_write_upgrades_v1_entry_to_v2_1() {
    let env = Env::default();
    let id: u32 = 7;
    let contract_id = env.register(Contract, ());
    let client = ContractClient::new(&env, &contract_id);

    // Inject a V1 entry directly, simulating legacy on-chain state.
    env.as_contract(&contract_id, || {
        env.storage()
            .persistent()
            .set(&DataKey::Data(id), &Data::V1(DataV1 { a: 5, b: 6 }));
    });

    // Read it: into_v2() migrates lazily, so c must be None.
    let migrated = client.read_data(&id).unwrap();
    assert_eq!(migrated.a, 5);
    assert_eq!(migrated.b, 6);
    assert_eq!(migrated.c, None);

    // Write it back: write_data always stores Data::V2(...).
    client.write_data(&id, &migrated);

    // Confirm the stored enum variant is now V2, not V1.
    let stored: Data = env
        .as_contract(&contract_id, || {
            env.storage().persistent().get(&DataKey::Data(id))
        })
        .unwrap();

    match stored {
        Data::V2(v2) => {
            assert_eq!(v2.a, 5);
            assert_eq!(v2.b, 6);
            assert_eq!(v2.c, None);
        }
        Data::V1(_) => panic!("expected Data::V2 after write_data, found Data::V1"),
    }

    // Subsequent reads go through the V2 branch and return identical values.
    let result = client.read_data(&id).unwrap();
    assert_eq!(result.a, 5);
    assert_eq!(result.b, 6);
    assert_eq!(result.c, None);
}
```