Add Escape Analysis#551
Conversation
pschachte
left a comment
There was a problem hiding this comment.
This looks great, and I really appreciate your documentation here. Makes the code much easier to review.
I've placed a few requests for documentation clarification inline, and one suggested improvement.
The other thing for you to fix is all the failing test cases. Before resubmitting, please run the update-exp script and scroll carefully through the differences in each file, making sure that they are all as expected (they're mostly calls to foreign lpvm alloc that have had a {stack} flag added, and the consequent changes to LLVM code). I.e., make sure that the all those stack allocations are safe. For each test case, if so, say yes, and it will replace the .exp file with the new output. Then add all the changed .exp files to the pull request branch. If not, you'll need to fix the bugs.
You should also add your new test case to the test-cases/final-dump directory and run update-exp; it will show you the output and prompt you to accept or reject it. Then also add the source and .exp files to the branch. Finally, please add a couple more tests to that test case, one that creates a fresh structure, passes it to another proc, which passes it back, and see if the escape analysis still sees that it doesn't escape (it's OK if it doesn't, but we should know), and one that does something similar, but the structure really does escape, so we can see that it doesn't optimise cases it shouldn't.
| -- argument cannot be a tail call (the callee would dereference | ||
| -- a pointer into a stack frame that tail-call would have freed). | ||
| -- LLVM enforces this automatically; sets doesAlloca accordingly. | ||
| Just sizeVal -> do |
There was a problem hiding this comment.
Does LLVM's alloca only work for fixed size allocations? If so, please say that here in a comment. If not, please add a comment saying this is a temporary restriction. Unless you can generalise it so it works for variable sized allocations.
There was a problem hiding this comment.
alloca also works for variable sized allocations from my research, but I'm not super familiar with LLVM in general so I chose to take a safer approach. I'll comment this 👍 Thank you
There was a problem hiding this comment.
I don't see why we can't support variably sized stack allocations :)
There was a problem hiding this comment.
Hello, sorry for the late reply. Been a bit busy with proposal and presentations. But I looked into variable sized alloca and I think it isn't worth the effort. Mainly for two reasons:
- Variable-sized alloca forces LLVM to keep a frame pointer because the stack frame size can't be computed statically. Wybe actively uses TCO (TailCallModCons is an optimization flag). These two features are fundamentally incompatible.
malloccan return null; a stack overflow from a large VLA just segfaults or corrupts the stack silently. For objects that are large or runtime-sized, heap allocation is the safer default.
There was a problem hiding this comment.
I think we should support it in LLVM generation, but never generate it in the escape analysis. There are cases where we alloc a variable length array, and if the user wanted to supply their own stack flag I don't see why we shouldn't support that
There was a problem hiding this comment.
Since tail recursion optimisation only works for certain calling conventions, and from what you've told me, you can't use a calling convention that supports TRO together with variable-sized alloca, I think we should call this Not Yet Implemented for now (there's a nyi function to report such missing features). If we get around to doing our own TRO during LLVM generation, we should revisit this.
| isAllocOrMutate (PrimForeign "lpvm" "mutate" _ _) = True | ||
| isAllocOrMutate _ = False | ||
|
|
||
| -- | Compute the set of variables whose allocated memory must survive the |
There was a problem hiding this comment.
I think you mean "may" rather than "must". This is an important distinction: this optimisation can only be done if it's impossible for the allocation to survive the proc body it appears in. You have to be conservative in approximating this (and you have to approximate; you can't always get it exactly right).
There was a problem hiding this comment.
Oh, I think I understand what you mean. OK, it's difficult to express this clearly, but I think it's clearer to write of what will happen when the program runs, so think of a value as escaping a proc body if it is possible that the value could be referred to after the proc returns.
| prims = List.map content $ collectAllBodyPrims body | ||
| -- For mutate(fIn, fOut, offset, destr, size, startOff, member, ...), | ||
| -- if fOut escapes then: | ||
| -- (a) fIn escapes (same struct, just a new version) |
There was a problem hiding this comment.
fIn might not escape in this case, if the destructive flag is not set in the mutate. In that case, the mutate instruction makes a fresh copy, so fOut escaping doesn't imply that fIn does, unless the structure contains pointers.
If your analysis/transformation is happening before the destructive update transformation, it would be useful to document that this limits the stack allocation transformation here. Otherwise, I think you should be able to do better here.
|
I also had the thought of adding a flag to disable/enable escape analysis. Is this a good idea or should we just leave it as always enabled? |
|
Adding a command line flag is a good idea. It is stack allocation, not the escape analysis, that would be relevant to programmers, so I would name the switch in terms of stack allocation rather than escape analysis. So far, we optimise by default, so I think the switch should disable automatic stack allocation. |
|
I added the flag for |
| -- Side-effect: any call that receives this address as an i64 | ||
| -- argument cannot be a tail call (the callee would dereference | ||
| -- a pointer into a stack frame that tail-call would have freed). | ||
| -- LLVM enforces this automatically; sets doesAlloca accordingly. |
There was a problem hiding this comment.
We should be able to do a tail call if the stack allocated space never enters the tail call.
There was a problem hiding this comment.
Something like
def foo use !io {
?p = position(1, 2)
!print(p)
!foo
}
Should be able to stack allocate p.
I'm thinking this case could be fairly common, say if something is used for an intermediate computation before a tail call.
There was a problem hiding this comment.
Good point! Fixed - instead of blocking all tail calls once any alloca is emitted, we now track which variables hold stack-allocated addresses and only block tail calls that directly receive one of those variables as an argument. Since escape analysis already guarantees stack-allocated addresses can't reach a call through any other mechanism, checking direct arguments is sufficient.
This adds back a lot of tail calls.
| -- argument cannot be a tail call (the callee would dereference | ||
| -- a pointer into a stack frame that tail-call would have freed). | ||
| -- LLVM enforces this automatically; sets doesAlloca accordingly. | ||
| Just sizeVal -> do |
There was a problem hiding this comment.
I don't see why we can't support variably sized stack allocations :)
| -- For mutate(fIn, fOut, offset, destr, size, startOff, member, ...), | ||
| -- if fOut escapes then: | ||
| -- (a) fIn escapes (same struct, just a new version). Note: this | ||
| -- analysis runs before the destructive-update transformation sets |
There was a problem hiding this comment.
Why can't we run it afterwards?
There was a problem hiding this comment.
technically we could, but it would require splitting the current single-pass design. The destr flags are set incrementally during transformBody using an alias map that's built as the body is traversed; computeEscapedVars runs before that traversal starts, so the updated flags don't exist yet. To use post-destr escape analysis we'd need a preliminary sub-pass just to accumulate the alias map and annotate destr flags, then re-run escape analysis on the result. Happy to do that as a follow-up if the precision loss is worth fixing.
There was a problem hiding this comment.
Sure. I will leave this unresolved as a reminder (assuming it doesn't block the approval)
| -- the same type escapes. This is sound because these ops can't | ||
| -- embed a value into a result of a different type. | ||
| -- • user-defined calls (PrimCall) and foreign calls to other | ||
| -- languages (e.g. C): we have no visibility into the callee, so |
There was a problem hiding this comment.
I would hope for Wybe procs we do.
Is there a way we can annotate Params as saying "this input escapes iff some output also escapes"?
Not sure that makes sense, but I'm sure we can do better here. For (co-recursive) price we'd have to be conservative or attempt to find a fixed point.
There was a problem hiding this comment.
For instance,
def foo(p0, ?p1, q0, ?q1) {
p0 = ?p1
q0 = ?q1
}
For a call to foo, p0 should only escape if p1 does, not if q1 does.
Unless I'm misunderstanding, I believe the current analysis would have p0 and q0 escape if p1 or q1 escape.
There was a problem hiding this comment.
You are correct. I did try to implement a fix for this, but my fix is causing ./complex/exp/testcase_multi_specz-int_list.out to fail with a segmentation fault.
Since this is more of a precision issue, rather than correctness, I'd like to move this to my next PR so I can take a closer look at this.
There was a problem hiding this comment.
Again, I'll leave this open as a reminder
ab5bf6e to
25900bd
Compare
| # escaped), so p also does not escape. | ||
| def {noinline} test_pass_through use !io { | ||
| ?p = position(1, 2) | ||
| !pass_back(p, ?q) |
There was a problem hiding this comment.
Why does this have a !? It shouldn't need it
This PR implements intraprocedural escape analysis for LPVM alloc instructions. Allocations whose result does not outlive the current procedure call are now emitted as LLVM alloca (stack) instead of heap allocations, avoiding GC pressure for short-lived local structs.
How it works
Two checks are combined at each alloc site:
Mutation-chain reachability - backwards fixed-point over the body. A variable escapes if it is an output parameter, or reaches one via mutate or pass-through edges.
Alias-map check - checks whether the alloc result is connected to a global or aliased output in the incremental alias disjoint-set.
If neither fires and the size is a compile-time constant, the
{stack}flag is added to the alloc. LLVM codegen then lowers it toalloca i8, i64 <size> + ptrtointinstead of a GC malloc call.Verification
LLVM:
make_point: Heap-allocated.print_midpoint: Stack-allocated.