
Commit c1447c3

Add how-to guide for storage data migration (#2228) (#2299)
* Adding migration guide to storage
* Updated test cases
* Added page to routes.txt
* re-arrange content on the page. add a warning before anti-patterns

---------

Co-authored-by: Elliot <elliot@stellar.org>
1 parent 0402727 commit c1447c3

2 files changed

Lines changed: 352 additions & 0 deletions

File tree

Lines changed: 351 additions & 0 deletions
@@ -0,0 +1,351 @@
1+
---
2+
title: Migrate contract storage data when upgrading data structures
3+
hide_table_of_contents: true
4+
description: Use the version marker pattern to safely read and migrate stored data when a contract upgrade changes a data structure
5+
---
6+
7+
When a contract is upgraded and a stored data structure gains new fields, the data already written to the ledger still uses the old layout. Naively reading those old entries with the new type causes the host to trap. This guide presents two solutions, the versioned enum pattern and the version marker pattern, compares lazy and eager migration strategies, shows how to test them, and explains why the "intuitive" approaches fail.
8+
9+
## Versioned Enum Pattern
10+
11+
Suppose a contract stores `DataV1` entries and is upgraded to use `DataV2`, which adds an optional field `c`:
12+
13+
```rust
14+
#[contracttype]
15+
pub struct DataV1 { a: i64, b: i64 }
16+
17+
#[contracttype]
18+
pub struct DataV2 { a: i64, b: i64, c: Option<i64> }
19+
```
20+
21+
The recommended approach in this circumstance is to implement a versioned enum that can hold either a `V1` or `V2` data struct.
22+
23+
```rust
24+
#[contracttype]
25+
pub enum Data {
26+
V1(DataV1),
27+
V2(DataV2),
28+
}
29+
30+
#[contracttype]
31+
pub enum DataKey {
32+
Data(u32),
33+
}
34+
```
35+
36+
### Migration Logic
37+
38+
The migration logic matches on the stored variant. A `V1` value is converted by copying fields `a` and `b` and setting `c`, the field added in `V2`, to `None`. A `V2` value passes through unchanged. This is a lazy migration: old data is upgraded on read, not in a bulk migration.
39+
40+
```rust
41+
impl Data {
42+
pub fn into_v2(self) -> DataV2 {
43+
match self {
44+
Data::V1(v1) => DataV2 { a: v1.a, b: v1.b, c: None },
45+
Data::V2(v2) => v2,
46+
}
47+
}
48+
}
49+
```
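
If a later upgrade adds a third layout, the same enum can grow another variant while the existing variants stay as they are, and conversions can be chained so that each old version only needs to know how to reach the next one. The following is a minimal sketch under stated assumptions: it is plain Rust with no Soroban SDK (in a contract each type would carry `#[contracttype]`), and `DataV3`, its field `d`, and `into_latest` are hypothetical names used only for illustration.

```rust
// Plain-Rust model of the versioned enum, not contract code.
// DataV3 and its field `d` are hypothetical additions for illustration.
#[derive(Clone, Debug, PartialEq)]
pub struct DataV1 { pub a: i64, pub b: i64 }

#[derive(Clone, Debug, PartialEq)]
pub struct DataV2 { pub a: i64, pub b: i64, pub c: Option<i64> }

#[derive(Clone, Debug, PartialEq)]
pub struct DataV3 { pub a: i64, pub b: i64, pub c: Option<i64>, pub d: bool }

#[derive(Clone, Debug, PartialEq)]
pub enum Data {
    V1(DataV1),
    V2(DataV2),
    V3(DataV3),
}

// Each step converts one version to the next and fills in the new field.
impl From<DataV1> for DataV2 {
    fn from(v1: DataV1) -> Self {
        DataV2 { a: v1.a, b: v1.b, c: None }
    }
}

impl From<DataV2> for DataV3 {
    fn from(v2: DataV2) -> Self {
        DataV3 { a: v2.a, b: v2.b, c: v2.c, d: false }
    }
}

impl Data {
    // Chains the single-step conversions so any stored variant
    // ends up in the newest layout.
    pub fn into_latest(self) -> DataV3 {
        match self {
            Data::V1(v1) => DataV3::from(DataV2::from(v1)),
            Data::V2(v2) => DataV3::from(v2),
            Data::V3(v3) => v3,
        }
    }
}
```

Chaining keeps each conversion small: adding a fourth layout would need one new `From` impl and one new match arm, and the older conversions would not change.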
50+
51+
### Reading with version awareness
52+
53+
The value is read from storage and then `into_v2()` ensures that the returned value is in the `V2` format.
54+
55+
```rust
56+
pub fn read_data(e: Env, id: u32) -> Option<DataV2> {
57+
let data_enum: Data = e.storage().persistent().get(&DataKey::Data(id))?;
58+
Some(data_enum.into_v2())
59+
}
60+
```
61+
62+
### Writing always uses the current version
63+
64+
The write function `write_data()` takes a data argument in the `DataV2` format.
65+
66+
```rust
67+
pub fn write_data(e: Env, id: u32, data: DataV2) {
68+
e.storage().persistent().set(&DataKey::Data(id), &Data::V2(data));
69+
}
70+
```
71+
72+
### Testing migrations
73+
74+
Testing data migration requires simulating state written by an old contract version and verifying that the new contract reads it correctly.
75+
76+
In this test, data in the `V1` format is stored first. It is then read with the `read_data` function, which converts it to the `V2` format with `into_v2()` before returning the result. The result is checked with `assert_eq!()` and written back under the same `id`, which overwrites the `V1` entry with the same data in the `V2` format.
77+
78+
Then the data is read from storage to verify it's stored in the `V2` format, and finally the data is read using the `read_data()` function to verify that the data is also returned in the `V2` format by the read function.
79+
80+
```rust
81+
#[test]
82+
fn test_write_upgrades_v1_entry_to_v2_1() {
83+
let env = Env::default();
84+
let id: u32 = 7;
85+
let contract_id = env.register(Contract, ());
86+
let client = ContractClient::new(&env, &contract_id);
87+
88+
// Inject a V1 entry directly, simulating legacy on-chain state.
89+
env.as_contract(&contract_id, || {
90+
env.storage()
91+
.persistent()
92+
.set(&DataKey::Data(id), &Data::V1(DataV1 { a: 5, b: 6 }));
93+
});
94+
95+
// Read it - into_v2() migrates lazily; c must be None.
96+
let migrated = client.read_data(&id).unwrap();
97+
assert_eq!(migrated.a, 5);
98+
assert_eq!(migrated.b, 6);
99+
assert_eq!(migrated.c, None);
100+
101+
// Write it back - write_data always stores Data::V2(...).
102+
client.write_data(&id, &migrated);
103+
104+
// Confirm the stored enum variant is now V2, not V1.
105+
let stored: Data = env.as_contract(&contract_id, || {
106+
env.storage().persistent().get(&DataKey::Data(id))
107+
})
108+
.unwrap();
109+
110+
match stored {
111+
Data::V2(v2) => {
112+
assert_eq!(v2.a, 5);
113+
assert_eq!(v2.b, 6);
114+
assert_eq!(v2.c, None);
115+
}
116+
Data::V1(_) => panic!("expected Data::V2 after write_data, found Data::V1"),
117+
}
118+
119+
// Subsequent reads go through the V2 branch and return identical values.
120+
let result = client.read_data(&id).unwrap();
121+
assert_eq!(result.a, 5);
122+
assert_eq!(result.b, 6);
123+
assert_eq!(result.c, None);
124+
}
125+
```
126+
127+
## Version Marker Pattern
128+
129+
An alternative solution is to store a version number alongside each data entry, keyed by the same identifier. The contract reads the version first, then branches on the result to decode the payload with the correct type.
130+
131+
### Key layout
132+
133+
Define two variants in your key enum - one for the version marker and one for the payload - both keyed by the same `id`:
134+
135+
```rust
136+
#[contracttype]
137+
pub enum DataKey {
138+
DataVersion(u32), // version marker, keyed by id
139+
Data(u32), // data, keyed by the same id
140+
}
141+
```
142+
143+
Each logical record occupies two storage slots. Because the version is stored per-record rather than globally, each entry is independently versioned. There is no all-or-nothing upgrade requirement.
144+
145+
### Reading with version awareness
146+
147+
Before decoding a storage entry, read its version marker. Use `unwrap_or(1)` to handle entries that were written before versioning was introduced. The absence of a version key is itself a signal that the entry is version 1:
148+
149+
```rust
150+
fn read_data(env: &Env, id: u32) -> DataV2 {
151+
let version: u32 = env.storage().persistent()
152+
.get(&DataKey::DataVersion(id))
153+
.unwrap_or(1); // default to v1 for entries without version marker
154+
155+
match version {
156+
1 => {
157+
let v1: DataV1 = env.storage().persistent().get(&DataKey::Data(id)).unwrap();
158+
DataV2 { a: v1.a, b: v1.b, c: None }
159+
}
160+
_ => env.storage().persistent().get(&DataKey::Data(id)).unwrap(),
161+
}
162+
}
163+
```
164+
165+
### Writing always uses the current version
166+
167+
Every write stamps the entry with the current version number. An entry that was originally `DataV1` will carry a `DataVersion` marker of `2` the next time it is written back:
168+
169+
```rust
170+
fn write_data(env: &Env, id: u32, data: &DataV2) {
171+
env.storage().persistent().set(&DataKey::DataVersion(id), &2u32);
172+
env.storage().persistent().set(&DataKey::Data(id), data);
173+
}
174+
```
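
The read and write rules above can be modelled outside a contract to see how a missing marker, a stamped marker, and an absent record behave. This is only a sketch in plain Rust: `Store` and `Payload` are hypothetical stand-ins built on `HashMap`, not Soroban SDK types, and the model tags payloads with an enum purely so one map can hold both layouts (on the ledger the payload carries no such tag, which is why the separate marker is needed).

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct DataV1 { pub a: i64, pub b: i64 }

#[derive(Clone, Debug, PartialEq)]
pub struct DataV2 { pub a: i64, pub b: i64, pub c: Option<i64> }

// Hypothetical: lets the model hold either layout in one map.
#[derive(Clone, Debug, PartialEq)]
pub enum Payload { Old(DataV1), New(DataV2) }

// Hypothetical stand-in for persistent storage.
#[derive(Default)]
pub struct Store {
    pub versions: HashMap<u32, u32>,     // models DataKey::DataVersion(id)
    pub payloads: HashMap<u32, Payload>, // models DataKey::Data(id)
}

// Reads the marker first (defaulting to 1), then decodes accordingly.
pub fn read_data(store: &Store, id: u32) -> Option<DataV2> {
    let version = store.versions.get(&id).copied().unwrap_or(1);
    match (version, store.payloads.get(&id)?) {
        (1, Payload::Old(v1)) => Some(DataV2 { a: v1.a, b: v1.b, c: None }),
        (2, Payload::New(v2)) => Some(v2.clone()),
        _ => None, // marker and payload disagree
    }
}

// Every write stamps the record with the current version.
pub fn write_data(store: &mut Store, id: u32, data: DataV2) {
    store.versions.insert(id, 2);
    store.payloads.insert(id, Payload::New(data));
}
```

Unlike the contract version, this model returns `None` for a missing record instead of panicking on `unwrap()`, which makes the three cases easy to assert on separately.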
175+
176+
### Lazy vs eager migration
177+
178+
Once version-aware read/write logic is in place, there are two strategies for converting old entries.
179+
180+
#### Lazy migration (convert on read)
181+
182+
In lazy migration, old entries are left untouched on the ledger. When a record is read, its version is detected and it is up-converted in memory. When that record is later written back, it is stamped with the new version. No explicit migration step is needed - conversion happens as records are accessed in normal contract use.
183+
184+
Lazy migration is generally preferred on blockchains. Leaving old entries untouched has no upfront cost and no risk of hitting instruction or ledger-entry limits at upgrade time. Records that are never accessed again are never migrated, which is usually acceptable.
185+
186+
The `read_data` function shown above already implements lazy migration. Each time an old `DataV1` entry is read and then passed to `write_data`, the entry is silently upgraded in place.
187+
188+
#### Eager migration (batch conversion)
189+
190+
In eager migration, an explicit admin function iterates all known records and rewrites them in the new format immediately after the upgrade is deployed:
191+
192+
```rust
193+
pub fn migrate_all(env: &Env, ids: Vec<u32>) {
194+
// Caller should be an authorized admin.
195+
for id in ids.iter() {
196+
let version: u32 = env.storage().persistent()
197+
.get(&DataKey::DataVersion(id))
198+
.unwrap_or(1);
199+
200+
if version < 2 {
201+
// read_data up-converts to DataV2 in memory.
202+
let migrated = read_data(env, id);
203+
// write_data stamps the entry as version 2.
204+
write_data(env, id, &migrated);
205+
}
206+
}
207+
}
208+
```
209+
210+
Eager migration is rarely practical for large datasets on Soroban. Each rewrite consumes fees and burns instructions, and a single transaction cannot migrate an unbounded number of records - the contract will hit instruction or ledger-entry limits. If the batch must span multiple transactions, the contract is in a mixed-version state throughout the window, which means version-aware read logic is still required anyway.
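
When a batch does have to span several transactions, one option is to process a bounded slice of the id list per call and return a cursor to resume from. The sketch below models only that bookkeeping in plain Rust, under stated assumptions: `Versions`, `migrate_batch`, `start`, and `limit` are hypothetical names, the map stands in for the `DataVersion` entries, and a real contract would call `read_data` and `write_data` where the comment indicates and would choose `limit` to stay within its resource budget.

```rust
use std::collections::HashMap;

// Hypothetical model: record id -> version marker.
// A missing entry stands for a pre-versioning (version 1) record.
pub type Versions = HashMap<u32, u32>;

/// Examines at most `limit` ids starting at index `start` and migrates
/// those still below version 2. Returns the index to resume from, or
/// `None` once every id in `ids` has been examined.
pub fn migrate_batch(
    versions: &mut Versions,
    ids: &[u32],
    start: usize,
    limit: usize,
) -> Option<usize> {
    let start = start.min(ids.len());
    let end = start.saturating_add(limit).min(ids.len());

    for &id in &ids[start..end] {
        if versions.get(&id).copied().unwrap_or(1) < 2 {
            // A real contract would up-convert and rewrite the payload here.
            versions.insert(id, 2);
        }
    }

    if end < ids.len() { Some(end) } else { None }
}
```

Between calls the data is in the mixed-version state described above, so the version-aware read path has to stay in place until the function returns `None`.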
211+
212+
Eager migration is occasionally appropriate when the total number of records is small and known in advance (for example, a fixed registry of a few dozen entries), or when you need to permanently drop old version branches from the read path.
213+
214+
:::caution
215+
216+
Never remove a version branch from `read_data` while old entries of that version can still exist on the ledger. Doing so will cause any remaining old entries to trap when accessed.
217+
218+
:::
219+
220+
### Testing migrations
221+
222+
Testing data migration requires simulating state written by an old contract version and verifying that the new contract reads it correctly.
223+
224+
The Soroban test environment allows you to set storage state directly. Use this to write `DataV1` entries (without a `DataVersion` key) and verify that `read_data` up-converts them correctly:
225+
226+
```rust
227+
#[cfg(test)]
228+
use super::*;
229+
use soroban_sdk::Env;
230+
231+
#[test]
232+
fn test_reads_v1_entry_as_v2() {
233+
let env = Env::default();
234+
let id: u32 = 42;
235+
let contract_id = env.register(Contract, ());
236+
let client = ContractClient::new(&env, &contract_id);
237+
238+
// Simulate what the old contract wrote: a DataV1 payload,
239+
// no DataVersion entry (old contracts did not write one).
240+
let v1_data = DataV1 { a: 10, b: 20 };
241+
env.as_contract(&contract_id, || {
242+
env.storage().persistent().set(&DataKey::Data(id), &v1_data);
243+
});
244+
245+
let result = env.as_contract(&contract_id, || read_data(&env, id));
246+
247+
assert_eq!(result.a, 10);
248+
assert_eq!(result.b, 20);
249+
assert_eq!(result.c, None);
250+
}
251+
252+
#[test]
253+
fn test_reads_v2_entry_correctly() {
254+
let env = Env::default();
255+
let id: u32 = 99;
256+
let contract_id = env.register(Contract, ());
257+
let client = ContractClient::new(&env, &contract_id);
258+
259+
let v2_data = DataV2 { a: 1, b: 2, c: Some(3) };
260+
env.as_contract(&contract_id, || write_data(&env, id, &v2_data));
261+
262+
let result = env.as_contract(&contract_id, || read_data(&env, id));
263+
264+
assert_eq!(result.a, 1);
265+
assert_eq!(result.b, 2);
266+
assert_eq!(result.c, Some(3));
267+
}
268+
269+
#[test]
270+
fn test_write_upgrades_v1_entry_to_v2() {
271+
let env = Env::default();
272+
let id: u32 = 7;
273+
let contract_id = env.register(Contract, ());
274+
let client = ContractClient::new(&env, &contract_id);
275+
276+
// Write a v1 entry directly, as the old contract would have.
277+
let v1_data = DataV1 { a: 5, b: 6 };
278+
env.as_contract(&contract_id, || {
279+
env.storage().persistent().set(&DataKey::Data(id), &v1_data);
280+
});
281+
282+
// Read it - lazy migration produces a DataV2 in memory.
283+
let migrated = env.as_contract(&contract_id, || read_data(&env, id));
284+
assert_eq!(migrated.c, None);
285+
286+
// Write it back - this stamps the entry as version 2.
287+
env.as_contract(&contract_id, || write_data(&env, id, &migrated));
288+
289+
let stored_version: u32 = env.as_contract(&contract_id, || {
290+
env.storage().persistent()
291+
.get(&DataKey::DataVersion(id))
292+
.unwrap()
293+
});
294+
assert_eq!(stored_version, 2);
295+
296+
// Subsequent reads should take the v2 branch.
297+
let result = env.as_contract(&contract_id, || read_data(&env, id));
298+
assert_eq!(result.a, 5);
299+
assert_eq!(result.b, 6);
300+
assert_eq!(result.c, None);
301+
}
302+
```
303+
304+
The three test cases cover the three states a record can be in after an upgrade:
305+
306+
- A `DataV1` entry with no version marker (pre-versioning era records)
307+
- A `DataV2` entry written by the new contract
308+
- A `DataV1` entry that is read and then written back (the lazy migration round-trip)
309+
310+
## Why intuitive approaches fail
311+
312+
The techniques presented here may not seem necessary at first. The seemingly obvious solution is to handle the type mismatch in code at read time, without changing the stored data structures or how storage entries are read and written.
313+
314+
:::warning
315+
316+
We've outlined a couple of the more "obvious" approaches to this problem to illustrate _why_ these anti-patterns fail. Do not use the following code snippets as examples to emulate. Instead, read them in context and learn why to avoid them.
317+
318+
:::
319+
320+
### Approach 1: Read old entries directly with the new type
321+
322+
You may think the most natural approach is to read the stored bytes directly as `DataV2` and expect `c` to default to `None`:
323+
324+
```rust
325+
let key = DataKey::Data(1u32);
326+
// Reading a DataV1 entry with the DataV2 type.
327+
// A developer might expect c = None for old entries - but this traps.
328+
let data: DataV2 = env.storage().persistent().get(&key).unwrap();
329+
// Error(Object, UnexpectedSize)
330+
```
331+
332+
This traps with `Error(Object, UnexpectedSize)`. The Soroban host validates the field count of the XDR-encoded value against the type definition before returning anything to the contract. Because `DataV1` has two fields and `DataV2` has three, the host rejects the entry before the SDK can handle it.
333+
334+
### Approach 2: Use `try_from_val` as a fallback
335+
336+
Another approach is to use `try_from_val`, expecting to catch a deserialization error and recover:
337+
338+
```rust
339+
let raw: Val = env.storage().persistent().get(&key).unwrap();
340+
if let Ok(v2) = DataV2::try_from_val(&env, &raw) {
341+
v2
342+
} else {
343+
// This branch is never reached - the host traps before returning Err.
344+
let v1 = DataV1::try_from_val(&env, &raw).unwrap();
345+
DataV2 { a: v1.a, b: v1.b, c: None }
346+
}
347+
```
348+
349+
This also traps at the host level. The field count validation happens in the host environment during deserialization - it does not produce a Rust `Err` that the SDK can intercept. There is no way to catch or recover from the mismatch at the contract level.
350+
351+
The root issue is that a contract cannot determine which type an existing storage entry was written as just by reading it. That information must be stored explicitly.

routes.txt

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@
 /docs/build/guides/rpc/retrieve-contract-code-python
 /docs/build/guides/storage
 /docs/build/guides/storage/choosing-the-right-storage
+/docs/build/guides/storage/migrate-contract-storage
 /docs/build/guides/storage/use-instance
 /docs/build/guides/storage/use-persistent
 /docs/build/guides/storage/use-temporary
