|
| 1 | +# Memory usage |
| 2 | + |
| 3 | +## Device memory size |
| 4 | + |
| 5 | +Judging by `.ld` files in [the Ledger Rust SDK](https://github.com/LedgerHQ/ledger-device-rust-sdk/tree/cad196841dbd72c037cfa01bec81a4a3ae57a04e/ledger_secure_sdk_sys/devices), |
| 6 | +the amount of SRAM each model has is: |
| 7 | +| Device | SRAM | |
| 8 | +| ----------------- | ---- | |
| 9 | +| apex_p, nanosplus | 40KB | |
| 10 | +| flex, stax | 36KB | |
| 11 | +| nanox | 28KB | |
| 12 | + |
| 13 | +The first part of the RAM will be occupied by the app's globals (one of which will be the heap used by the Rust code) and the rest is stack. |
| 14 | + |
| 15 | +The `HEAP_SIZE` variable in `.cargo/config.toml` specifies the size of the Rust heap (which is just [a static array under the hood](https://github.com/LedgerHQ/ledger-device-rust-sdk/blob/cad196841dbd72c037cfa01bec81a4a3ae57a04e/ledger_secure_sdk_sys/src/lib.rs#L64)). |
| 16 | + |
| 17 | +In other words, the bigger `HEAP_SIZE` is, the less memory is left for the stack. With a `HEAP_SIZE` of 16KB, we'll have less than 12KB of stack on nanox (28KB - 16KB = 12KB, minus the space taken by the other globals). |
| 18 | + |
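The arithmetic behind the estimate above can be sketched as follows. This is only an illustration: `NANOX_SRAM` comes from the table above, the heap size mirrors the 16KB example, and `stack_budget` with its `other_globals` parameter is a hypothetical helper, not something provided by the SDK. The 608-byte figure is likewise an assumed value, not a measurement.

```rust
/// Total SRAM on nanox, per the table above.
const NANOX_SRAM: usize = 28 * 1024;

/// Upper bound on the stack size: whatever is left after the heap and the
/// other globals. `other_globals` is a hypothetical input for illustration.
fn stack_budget(sram: usize, heap_size: usize, other_globals: usize) -> usize {
    sram.saturating_sub(heap_size).saturating_sub(other_globals)
}

fn main() {
    let heap_size = 16 * 1024;
    // With no other globals at all, the stack could be at most 12KB.
    println!("upper bound: {} bytes", stack_budget(NANOX_SRAM, heap_size, 0));
    // Assuming (hypothetically) 608 bytes of other globals.
    println!("with globals: {} bytes", stack_budget(NANOX_SRAM, heap_size, 608));
}
```

The real figure depends on what else the app and the SDK place in RAM, so it is better read from the linker symbols than computed this way.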
| 19 | +## Reducing app stack usage |
| 20 | + |
| 21 | +- A function's parameters and return value consume stack space (unless the object is small enough to fit into registers). |
| 22 | +- Moving an object around inside the function body may increase stack consumption as well. |
| 23 | + |
| 24 | +So, |
| 25 | +- Box large types if you need to pass/return them by value. |
| 26 | +- Avoid unboxing boxed large objects when passing them by value. E.g. even if a function only needs `LargeObj`, |
| 27 | + pass `Box<LargeObj>` to it anyway (which would be discouraged by the "normal" best practices), because passing it |
| 28 | + unboxed would increase the stack usage.\ |
| 29 | + This includes the case when a member function consumes `self` - declare it as `self: Box<Self>` instead. |
| 30 | +- A size of around 200 bytes is probably large enough to be worth boxing. E.g. in the past, boxing certain objects of roughly this size |
| 31 | + decreased stack usage by roughly 1.3KB (which is more than 10% of all stack space available on nanox). |
| 32 | + |
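The boxing advice above could look roughly like this. It is a minimal sketch under stated assumptions: `LargeObj`, `make_obj`, `first_byte` and `checksum` are hypothetical names invented for illustration, and `std` is used so that the example runs on a host machine (on a `no_std` target `Box` would come from `alloc::boxed::Box`).

```rust
use std::mem::size_of;

/// Hypothetical large type, used only for illustration.
struct LargeObj {
    data: [u8; 200],
}

impl LargeObj {
    /// Consumes `self` through a `Box`, so only a pointer is passed.
    fn checksum(self: Box<Self>) -> u32 {
        self.data.iter().map(|&b| u32::from(b)).sum()
    }
}

/// Returns the object boxed, so the caller receives a pointer
/// rather than a 200-byte value.
fn make_obj(fill: u8) -> Box<LargeObj> {
    Box::new(LargeObj { data: [fill; 200] })
}

/// Takes the `Box` even though it only reads the contents.
fn first_byte(obj: Box<LargeObj>) -> u8 {
    obj.data[0]
}

fn main() {
    println!("LargeObj: {} bytes", size_of::<LargeObj>());
    println!("Box<LargeObj>: {} bytes", size_of::<Box<LargeObj>>());
    println!("checksum: {}", make_obj(1).checksum());
    println!("first byte: {}", first_byte(make_obj(7)));
}
```

Whether this actually reduces stack usage in a given function still has to be checked with the measurements described below, since the compiler may or may not materialize the object on the stack before moving it into the `Box`.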
| 33 | +### Determining the current stack usage of the app |
| 34 | + |
| 35 | +Build the app with `emit-stack-sizes`: |
| 36 | +``` |
| 37 | +RUSTFLAGS="-Z emit-stack-sizes" cargo ledger build nanox |
| 38 | +``` |
| 39 | +After that you can use `llvm-readobj` to obtain the stack frame size of each function: |
| 40 | +``` |
| 41 | +llvm-readobj --stack-sizes --demangle target/nanox/release/mintlayer-app |
| 42 | +``` |
| 43 | + |
| 44 | +You can also make `llvm-readobj` emit JSON and use `jq` to sort the output by stack size. E.g. the following |
| 45 | +will print the 20 functions with the largest stack frames: |
| 46 | +``` |
| 47 | +llvm-readobj --stack-sizes --demangle --elf-output-style=JSON target/nanox/release/mintlayer-app | jq -r '.[].StackSizes | sort_by(.Entry.Size) | reverse | .[:20][] | .Entry | "\(.Size)\t\(.Functions | join(", "))"' |
| 48 | +``` |
| 49 | + |
| 50 | +### Determining the actual available stack |
| 51 | + |
| 52 | +At least in the current version of the SDK, the linker script emits symbols that |
| 53 | +can be used to determine the actual stack size, e.g. via `llvm-readelf`: |
| 54 | +``` |
| 55 | +llvm-readelf -s target/nanox/release/mintlayer-app | rg '_stack|_estack' |
| 56 | +``` |
| 57 | +Example output: |
| 58 | +``` |
| 59 | +1581: da7a425c 0 NOTYPE GLOBAL DEFAULT 6 app_stack_canary |
| 60 | +1624: da7a7000 0 NOTYPE GLOBAL DEFAULT 6 _estack |
| 61 | +1697: da7a4260 0 NOTYPE GLOBAL DEFAULT 6 _stack |
| 62 | +``` |
| 63 | +Here `_estack` is the end of the stack area, `_stack` is the beginning of it, and `app_stack_canary` is a 4-byte marker |
| 64 | +placed just below `_stack` and used to detect stack overflows. The difference between `_estack` and `_stack` is |
| 65 | +the stack size; in this case it's 0xda7a7000 - 0xda7a4260 = 0x2DA0 (11680 bytes in decimal). |
| 66 | + |
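The subtraction above can also be automated. The following is a sketch, not part of the SDK or the app: `symbol_addr` and `stack_size` are hypothetical helpers, and they assume the column layout of the example output above (address in the second whitespace-separated field, symbol name in the last one).

```rust
/// Finds the address of `symbol` in `llvm-readelf -s` output.
/// Assumes the address is the second whitespace-separated field
/// and the symbol name is the last one, as in the example output.
fn symbol_addr(readelf_output: &str, symbol: &str) -> Option<u32> {
    readelf_output.lines().find_map(|line| {
        let fields: Vec<&str> = line.split_whitespace().collect();
        if fields.last() != Some(&symbol) {
            return None;
        }
        u32::from_str_radix(fields.get(1)?, 16).ok()
    })
}

/// Stack size in bytes, computed as `_estack - _stack`.
fn stack_size(readelf_output: &str) -> Option<u32> {
    let end = symbol_addr(readelf_output, "_estack")?;
    let start = symbol_addr(readelf_output, "_stack")?;
    end.checked_sub(start)
}

fn main() {
    let output = "\
1581: da7a425c 0 NOTYPE GLOBAL DEFAULT 6 app_stack_canary
1624: da7a7000 0 NOTYPE GLOBAL DEFAULT 6 _estack
1697: da7a4260 0 NOTYPE GLOBAL DEFAULT 6 _stack";
    match stack_size(output) {
        Some(size) => println!("stack size: {size} bytes"),
        None => println!("stack symbols not found"),
    }
}
```

For the example output this prints `stack size: 11680 bytes`, matching the manual calculation.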
| 67 | +### Other notes |
| 68 | + |
| 69 | +This code: |
| 70 | +``` |
| 71 | +fn foo(x: &X) { |
| 72 | + match x { |
| 73 | + X::A => { /*do stuff*/ }, |
| 74 | + X::B => { /*do other stuff*/ }, |
| 75 | + } |
| 76 | +} |
| 77 | +``` |
| 78 | +may use more stack than: |
| 79 | +``` |
| 80 | +fn foo(x: &X) { |
| 81 | + match x { |
| 82 | + X::A => stuff(), |
| 83 | + X::B => other_stuff(), |
| 84 | + } |
| 85 | +} |
| 86 | + |
| 87 | +#[inline(never)] fn stuff() { /*do stuff*/ } |
| 88 | +#[inline(never)] fn other_stuff() { /*do other stuff*/ } |
| 89 | +``` |
| 90 | +That is, it seems that LLVM cannot always reuse stack slots between different branches of the `match`, and with bigger enums |
| 91 | +and bigger stack usage in each branch the overhead becomes bigger as well. So, splitting a large `match` into separate |
| 92 | +non-inlinable functions may be a way of reducing the app's stack usage. This should probably be a last resort, though: |
| 93 | +if all large objects are boxed, the stack usage in each branch should be relatively small, which will make |
| 94 | +the overhead relatively small as well. |