Commit 8fe47c7
authored
[Rhi] [bug] Fix the Unified Allocator to no longer return first two allocations as dupes (#8705)
Issue: #
### Brief Summary
The unified allocator always allocated the first two allocations as dupe
memory addresses, which always clobbered each other.
copilot:summary
# New walkthrough
Looking at the Unified allocator code, we can see that when we first
allocate a memory chunk, we do not add `size` to `head`, and thus the
next allocation will receive the exact same address too. Thus, there
will be two structs or similar, in memory, which clobber each other,
leading to plausibly a plethora of hard-to-debug crashes.
## High level overview of how allocator works
The allocator can work with two types of request:
- `exclusive`
- `not exclusive`
For exclusive requests:
- a buffer is allocated from the system:
-
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/rhi/common/unified_allocator.cpp#L74-L75
- the size of the buffer matches the requested bytes (to within
alignment bytes)
- a new chunk is created
- `chunk.data` is set to the start of this buffer
- `chunk.head` is too, but it's not really used for exclusive access
- `chunk.tail` is set to the end of the buffer, but again not really
used for exclusive access
Exclusive access requests are thus fairly straightforward
For non-exclusive, it is slightly more complex
- for the first request,we allocate a much larger buffer than the
request
- by default 1GB
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/rhi/common/unified_allocator.cpp#L9-L10
- we create a new chunk
- `chunk.data` is set to the start of this buffer
- `chunk.head` is set to the start of unused space in this buffer
- it should be set to `chunk.data + size`
- prior to this PR, it is incorrectly being set to point to `chunk.data`
though
- meaning that the next request will incorrectly return the start of
this chunk, again
- then we return `chunk.head`
- for subsequent requests, we look for a chunk that has available space
(head - tail <= requested size)
- when we find such a chunk:
- we add `size` to `head` (to within alignment)
- we return the old `head` (to within alignment)
## Proposed fix
The proposed fix is to set `head` to `data + size` for newly allocated
chunks
- thinking about it, an alternative fix is to split the function into
two parts:
- first part searches for an existing chunk, or makes a new one
- does not return the allocated address
- does not update head etc
- second part is always executed
- updates head
- returns old head
I don't really have a strong opinion on which fix we prefer. The second
approach seems mildly cleaner perhaps, since decouples 'finding/creating
a chunk' from 'updating the chunk and returning the requested memory
pointer'.
## Low level details
In more details, and assuming non exclusive mode:
- let's say client requests `size` bytes
- we allocate a chunk much larger than that, `default_allocator_size`
bytes
-
https://github.com/hughperkins/taichi/blob/0c41277f7b4a597247ea23760336dcba5c7f7efc/taichi/rhi/common/unified_allocator.cpp#L65-L75
- the address of this chunk is stored in `ptr`
- we create a `chunk` structure to store information about the chunk we
just allocated
-
https://github.com/hughperkins/taichi/blob/0c41277f7b4a597247ea23760336dcba5c7f7efc/taichi/rhi/common/unified_allocator.cpp#L63
-
https://github.com/hughperkins/taichi/blob/0c41277f7b4a597247ea23760336dcba5c7f7efc/taichi/rhi/common/unified_allocator.cpp#L76-L79
- ptr is stored in chunk.data
- head is set to ptr too, via chunk.data
- tail is set to ptr + allocation size, via chunk.data
- we return ptr
-
https://github.com/hughperkins/taichi/blob/0c41277f7b4a597247ea23760336dcba5c7f7efc/taichi/rhi/common/unified_allocator.cpp#L85
- we should have added allocation_size to chunk.head
- we can look at what happens when we re-use this chunk later, to
confirm this:
When we re-use a chunk:
- we loop over all allocated chunks, looking for non-exclusive chunks
-
https://github.com/hughperkins/taichi/blob/0c41277f7b4a597247ea23760336dcba5c7f7efc/taichi/rhi/common/unified_allocator.cpp#L39-L45
- we add allocation size to head, adjusting for alignment, store that in
ret, and check if ret is less than tail
-
https://github.com/hughperkins/taichi/blob/0c41277f7b4a597247ea23760336dcba5c7f7efc/taichi/rhi/common/unified_allocator.cpp#L49-L53
- if ret is less or equal to tail, then we:
- update head to be equal to ret (so, we've updated it to be old head +
allocation_size, adjusted for alignment)
- return ret
- (and break out of the loop, by virtue of the return)
- otherwise, we ignore, and keep looping over available chunks
- (if no suitable chunks found, then we will allocate a brand new chunk)
# Original Walkthrough
High level summary:
===============
- both the LLVMRuntime and the result_buffer are allocated to the same
memory address
- this results in the return error code from running a kernel
overwriting the address of snode tree 19
- this results in subsequent access to any field having snode tree 19
crashing Taichi
Reproducing the bug
===================
This bug was initially reproduced in
#8569 , but knowing what the
bug is, we can reproduce it using the following much simpler code:
```
import taichi as ti
ti.init(arch=ti.arm64, debug=True)
fields = []
for i in range(20):
fields.append(ti.field(float, shape=()))
ti.sync()
@ti.kernel
def foo():
fields[19][None] = 1
foo()
foo()
```
What this code does:
- allocates snode trees 0 through 19, by creating fields indexed 0
through 19, and immediately calling ti.sync, to materialize the snode
tree
- you can optionally print out the snode tree ids as long as you have a
version of master that includes the PR at
#8697, to verify this assertion
- following the creation of snode trees 0 through 19, we call a kernel
twice
- the first kernel runs without issue
- however, the address of snode tree 19 will be set to 0, following this
kernel call, because it is overwritten by the return code of this call
- when we run the second kernel call, using the address of snode tree 19
- which is now set to 0 - to access values from snode tree 19, causes a
segmentation fault:
[E 04/30/25 19:00:30.022 3136495] Received signal 11 (Segmentation
fault: 11)
Detailed walkthrough
====================
1. LLVMRuntime and result_buffer are allocated the same memory address
- When we first initialize the LLVMRuntime, we:
- allocate a result_buffer from the unified allocator, via the host
allocator
- result_buffer allocated here
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/runtime/llvm/llvm_runtime_executor.cpp#L699-L700
- call runtime_initialize
- here
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/runtime/llvm/llvm_runtime_executor.cpp#L706-L711
- passing in the result_buffer
- and the host allocator
- inside runtime_initialize, we:
- allocate the LLVMRuntime, using the same allocator
- here
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/runtime/llvm/runtime_module/runtime.cpp#L932-L933
- interestingly, the address allocated for the LLVMRuntime memory is
identical to the address of the result_buffer memory
- verifiable by printing out the two addresses. Over multiple runs, they
consistently have the same address as each other (though the exact
addresses vary between runs)
- these are both allocated from the exact same allocator
- if you print out the address of the allocator in each location, they
are identical
- and no deallocations take place between the allocations
- so, how is this possible?
- looking at the unified allocator, there is a concept of 'exclusive'
- here
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/rhi/common/unified_allocator.cpp#L32
- if a request for memory is not marked as exclusive, previously
allocated buffers can be re-used, and allocated to new requests
- here
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/rhi/common/unified_allocator.cpp#L57
- the default is exclusive = false
- here
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/rhi/common/unified_allocator.h#L31
- therefore, by default, memory chunks allocated can be
re-used/returned/allocated across multiple requests
Let's first walk through the effects of LLVMRuntime and result_buffer
occupying the same space.
2. The return code of a kernel overwrites snode tree address 19
- following a kernel launch, the method
runtime_retrieve_and_reset_error_code is run on runtime.cpp
- here
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/runtime/llvm/runtime_module/runtime.cpp#L727-L730
- this method calls `runtime->set_result(taichi_result_buffer_error_id,
runtime->error_code);`
- the first parameter is a constant
- defined here
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/inc/constants.h#L21
- `constexpr std::size_t taichi_max_num_ret_value = 30;`
- `set_result`:
- is here
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/runtime/llvm/runtime_module/runtime.cpp#L600-L604
- sets result_buffer[i] to t
- here
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/runtime/llvm/runtime_module/runtime.cpp#L602
- in this case, i is taichi_max_num_ret_value
- which is 30
- t is the return code
- empirically this has a value of 0, in the test cases described above
- i is used to index onto an array of i64
- here
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/runtime/llvm/runtime_module/runtime.cpp#L602
- therefore each element of the array has 8 bytes
- and therefore to get the address of the element which will be set to
0, we should multiply the index, which is 30, by 8
- thus, we will zero out 8 bytes at byte offset 30 * 8 = 240
- the base address for this offset is result_buffer
- however, result_buffer has the same address as LLVMRuntime
- (as discussed in the first section)
- so we are going to clobber 8 bytes in LLVMRuntime with zeros, at
offset 240
- let's now look at where byte offset 240 is in LLVMRuntime
- LLVMRuntime struct:
- is here
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/runtime/llvm/runtime_module/runtime.cpp#L552-L562
- there are two PreallocatedMemoryChunks, each of which contains two
pointers and a size_t
-
https://github.com/taichi-dev/taichi/blob/562e05fa70ba196b200f03e4006c51bd0962341b/taichi/runtime/llvm/runtime_module/runtime.cpp#L546-L550
- each pointer is 8 bytes, and size_t likely 8 bytes, for 24 bytes each
- 48 bytes total
- host_allocator_type is a pointer to function -> 8 more bytes
- assert_failed_type, host_printf_type, host_vsnprintf_type, and Ptr are
also all pointers, so 8 bytes each, for a total for them of 32 bytes
- now we arrive at roots, which is the snode tree roots address array
- at this point, we are at an offset of 48 + 8 + 32 = 88
- so our offset into roots will be 240 - 88 = 152
- each element of roots is also a pointer
- so size 8 bytes
- 152 bytes / 8 bytes = 19
- thus when we write the return code of 0 to result_buffer[30], we
clobber the address of tree snode 19 with 0
3. kernel access of tree snode 19
- when a kernel is initialized, and that kernel uses a field that is
allocated on snode tree 19:
- the lowered kernel calls `%10 = call ptr @LLVMRuntime_get_roots(ptr
%9, i32 19` (exact statement index varies depends on the kernel of
course)
- this will return 0
- then when we access a field based on offset 0, we crash.
~Proposed fix~
============
~We need to ensure that the allocator does not allocate the same memory
block twice, both to the LLVMRuntime and to the result_buffer~
- ~my proposed fix is to expose the `exclusive` parameter to the
LLVMRuntime~
- ~and to set this parameter to `true`, when used from the runtime~
~Questions~
=========
~A question in my mind is, why we would ever want exclusive to not be
true. And by default, it is in fact set to false. I feel like there is
some knowledge or insight that is missing to me.~
copilot:walkthrough1 parent 3b9bdd8 commit 8fe47c7
4 files changed
Lines changed: 44 additions & 5 deletions
File tree
- cmake
- taichi/rhi/common
- tests/cpp/rhi/common
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
| 20 | + | |
20 | 21 | | |
21 | 22 | | |
22 | 23 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
9 | | - | |
| 9 | + | |
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
| |||
35 | 35 | | |
36 | 36 | | |
37 | 37 | | |
38 | | - | |
39 | 38 | | |
40 | 39 | | |
41 | 40 | | |
| |||
49 | 48 | | |
50 | 49 | | |
51 | 50 | | |
52 | | - | |
| 51 | + | |
53 | 52 | | |
54 | 53 | | |
55 | 54 | | |
| |||
62 | 61 | | |
63 | 62 | | |
64 | 63 | | |
| 64 | + | |
65 | 65 | | |
66 | 66 | | |
67 | 67 | | |
| |||
74 | 74 | | |
75 | 75 | | |
76 | 76 | | |
77 | | - | |
| 77 | + | |
78 | 78 | | |
79 | 79 | | |
80 | 80 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
25 | | - | |
| 25 | + | |
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
| |||
35 | 35 | | |
36 | 36 | | |
37 | 37 | | |
| 38 | + | |
38 | 39 | | |
39 | 40 | | |
40 | 41 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
0 commit comments