Skip to content

Add Escape Analysis#551

Open
SaumitraLohokare wants to merge 12 commits into
pschachte:masterfrom
SaumitraLohokare:feat/escape-analysis
Open

Add Escape Analysis#551
SaumitraLohokare wants to merge 12 commits into
pschachte:masterfrom
SaumitraLohokare:feat/escape-analysis

Conversation

@SaumitraLohokare
Copy link
Copy Markdown

This PR implements intraprocedural escape analysis for LPVM alloc instructions. Allocations whose result does not outlive the current procedure call are now emitted as LLVM alloca (stack) instead of heap allocations, avoiding GC pressure for short-lived local structs.

How it works

Two checks are combined at each alloc site:

  1. Mutation-chain reachability - backwards fixed-point over the body. A variable escapes if it is an output parameter, or reaches one via mutate or pass-through edges.

  2. Alias-map check - checks whether the alloc result is connected to a global or aliased output in the incremental alias disjoint-set.

If neither fires and the size is a compile-time constant, the {stack} flag is added to the alloc. LLVM codegen then lowers it to alloca i8, i64 <size> + ptrtoint instead of a GC malloc call.

Verification

pub type point { pub point(x: int, y: int) }

# mid is never returned → should be stack-allocated
pub def print_midpoint(p1: point, p2: point) use !io {
    ?mid = point((p1^x + p2^x) / 2, (p1^y + p2^y) / 2)
    !println(mid^x)
    !println(mid^y)
}

# q is an output parameter → heap-allocated
pub def make_point(?q: point) {
    ?q = point(10, 20)
}

LLVM:

make_point: Heap-allocated.

define external fastcc i64 @"escape_ex#.make_point<0>"() {
  %"tmp#6##0" = call ccc ptr @wybe_malloc(i32 16)   ; heap allocation
  %"tmp#3##0" = ptrtoint ptr %"tmp#6##0" to i64
  %"tmp#7##0" = inttoptr i64 %"tmp#3##0" to ptr
  store i64 10, ptr %"tmp#7##0"                      ; x = 10
  %"tmp#8##0" = add i64 %"tmp#3##0", 8
  %"tmp#9##0" = inttoptr i64 %"tmp#8##0" to ptr
  store i64 20, ptr %"tmp#9##0"                      ; y = 20
  ret i64 %"tmp#3##0"
}

print_midpoint: Stack-allocated.

define external fastcc void @"escape_ex#.print_midpoint<0>"(i64 %"p1##0", i64 %"p2##0") {
  ; compute (p1.x + p2.x) / 2
  ...
  ; compute (p1.y + p2.y) / 2
  ...
  ; mid = point(...) — stack allocated
  %"tmp#44##0" = alloca i8, i64 16, align 8          ; ← alloca, not malloc
  %"tmp#25##0" = ptrtoint ptr %"tmp#44##0" to i64
  %"tmp#45##0" = inttoptr i64 %"tmp#25##0" to ptr
  store i64 %"tmp#1##0", ptr %"tmp#45##0"            ; mid.x
  %"tmp#46##0" = add i64 %"tmp#25##0", 8
  %"tmp#47##0" = inttoptr i64 %"tmp#46##0" to ptr
  store i64 %"tmp#5##0", ptr %"tmp#47##0"            ; mid.y
  ; read back and print
  ...
  ret void
}

Copy link
Copy Markdown
Owner

@pschachte pschachte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, and I really appreciate your documentation here. Makes the code much easier to review.

I've placed a few requests for documentation clarification inline, and one suggested improvement.

The other thing for you to fix is all the failing test cases. Before resubmitting, please run the update-exp script and scroll carefully through the differences in each file, making sure that they are all as expected (they're mostly calls to foreign lpvm alloc that have had a {stack} flag added, and the consequent changes to LLVM code). I.e., make sure that the all those stack allocations are safe. For each test case, if so, say yes, and it will replace the .exp file with the new output. Then add all the changed .exp files to the pull request branch. If not, you'll need to fix the bugs.

You should also add your new test case to the test-cases/final-dump directory and run update-exp; it will show you the output and prompt you to accept or reject it. Then also add the source and .exp files to the branch. Finally, please add a couple more tests to that test case, one that creates a fresh structure, passes it to another proc, which passes it back, and see if the escape analysis still sees that it doesn't escape (it's OK if it doesn't, but we should know), and one that does something similar, but the structure really does escape, so we can see that it doesn't optimise cases it shouldn't.

Comment thread src/LLVM.hs
-- argument cannot be a tail call (the callee would dereference
-- a pointer into a stack frame that tail-call would have freed).
-- LLVM enforces this automatically; sets doesAlloca accordingly.
Just sizeVal -> do
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does LLVM's alloca only work for fixed size allocations? If so, please say that here in a comment. If not, please add a comment saying this is a temporary restriction. Unless you can generalise it so it works for variable sized allocations.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alloca also works for variable sized allocations from my research, but I'm not super familiar with LLVM in general so I chose to take a safer approach. I'll comment this 👍 Thank you

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we can't support variably sized stack allocations :)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, sorry for the late reply. Been a bit busy with proposal and presentations. But I looked into variable sized alloca and I think it isn't worth the effort. Mainly for two reasons:

  • Variable-sized alloca forces LLVM to keep a frame pointer because the stack frame size can't be computed statically. Wybe actively uses TCO (TailCallModCons is an optimization flag). These two features are fundamentally incompatible.
  • malloc can return null; a stack overflow from a large VLA just segfaults or corrupts the stack silently. For objects that are large or runtime-sized, heap allocation is the safer default.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should support it in LLVM generation, but never generate it in the escape analysis. There are cases where we alloc a variable length array, and if the user wanted to supply their own stack flag I don't see why we shouldn't support that

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since tail recursion optimisation only works for certain calling conventions, and from what you've told me, you can't use a calling convention that supports TRO together with variable-sized alloca, I think we should call this Not Yet Implemented for now (there's a nyi function to report such missing features). If we get around to doing our own TRO during LLVM generation, we should revisit this.

Comment thread src/Transform.hs Outdated
isAllocOrMutate (PrimForeign "lpvm" "mutate" _ _) = True
isAllocOrMutate _ = False

-- | Compute the set of variables whose allocated memory must survive the
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean "may" rather than "must". This is an important distinction: this optimisation can only be done if it's impossible for the allocation to survive the proc body it appears in. You have to be conservative in approximating this (and you have to approximate; you can't always get it exactly right).

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I understand what you mean. OK, it's difficult to express this clearly, but I think it's clearer to write of what will happen when the program runs, so think of a value as escaping a proc body if it is possible that the value could be referred to after the proc returns.

Comment thread src/Transform.hs Outdated
prims = List.map content $ collectAllBodyPrims body
-- For mutate(fIn, fOut, offset, destr, size, startOff, member, ...),
-- if fOut escapes then:
-- (a) fIn escapes (same struct, just a new version)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fIn might not escape in this case, if the destructive flag is not set in the mutate. In that case, the mutate instruction makes a fresh copy, so fOut escaping doesn't imply that fIn does, unless the structure contains pointers.

If your analysis/transformation is happening before the destructive update transformation, it would be useful to document that this limits the stack allocation transformation here. Otherwise, I think you should be able to do better here.

Comment thread src/Transform.hs
Comment thread src/Transform.hs Outdated
Comment thread src/Transform.hs
@SaumitraLohokare
Copy link
Copy Markdown
Author

I also had the thought of adding a flag to disable/enable escape analysis. Is this a good idea or should we just leave it as always enabled?

@pschachte
Copy link
Copy Markdown
Owner

Adding a command line flag is a good idea. It is stack allocation, not the escape analysis, that would be relevant to programmers, so I would name the switch in terms of stack allocation rather than escape analysis. So far, we optimise by default, so I think the switch should disable automatic stack allocation.

@SaumitraLohokare
Copy link
Copy Markdown
Author

I added the flag for stack-alloc under --opt. It is on by default. And also added a new test case for stack allocation.
Ran update-exp as well, all tests passing.

Comment thread src/LLVM.hs Outdated
-- Side-effect: any call that receives this address as an i64
-- argument cannot be a tail call (the callee would dereference
-- a pointer into a stack frame that tail-call would have freed).
-- LLVM enforces this automatically; sets doesAlloca accordingly.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to do a tail call if the stack allocated space never enters the tail call.

Copy link
Copy Markdown
Collaborator

@jimbxb jimbxb Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

def foo use !io {
    ?p = position(1, 2)
    !print(p)
    !foo
}

Should be able to stack allocate p.

I'm thinking this case could be fairly common, say if something is used for an intermediate computation before a tail call.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Fixed - instead of blocking all tail calls once any alloca is emitted, we now track which variables hold stack-allocated addresses and only block tail calls that directly receive one of those variables as an argument. Since escape analysis already guarantees stack-allocated addresses can't reach a call through any other mechanism, checking direct arguments is sufficient.
This adds back a lot of tail calls.

Comment thread src/LLVM.hs
-- argument cannot be a tail call (the callee would dereference
-- a pointer into a stack frame that tail-call would have freed).
-- LLVM enforces this automatically; sets doesAlloca accordingly.
Just sizeVal -> do
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we can't support variably sized stack allocations :)

Comment thread src/Transform.hs
-- For mutate(fIn, fOut, offset, destr, size, startOff, member, ...),
-- if fOut escapes then:
-- (a) fIn escapes (same struct, just a new version). Note: this
-- analysis runs before the destructive-update transformation sets
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we run it afterwards?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically we could, but it would require splitting the current single-pass design. The destr flags are set incrementally during transformBody using an alias map that's built as the body is traversed; computeEscapedVars runs before that traversal starts, so the updated flags don't exist yet. To use post-destr escape analysis we'd need a preliminary sub-pass just to accumulate the alias map and annotate destr flags, then re-run escape analysis on the result. Happy to do that as a follow-up if the precision loss is worth fixing.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will leave this unresolved as a reminder (assuming it doesn't block the approval)

Comment thread src/Transform.hs
-- the same type escapes. This is sound because these ops can't
-- embed a value into a result of a different type.
-- • user-defined calls (PrimCall) and foreign calls to other
-- languages (e.g. C): we have no visibility into the callee, so
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would hope for Wybe procs we do.

Is there a way we can annotate Params as saying "this input escapes iff some output also escapes"?

Not sure that makes sense, but I'm sure we can do better here. For (co-recursive) price we'd have to be conservative or attempt to find a fixed point.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance,

def foo(p0, ?p1, q0, ?q1) {
    p0 = ?p1
    q0 = ?q1
}

For a call to foo, p0 should only escape if p1 does, not if q1 does.

Unless I'm misunderstanding, I believe the current analysis would have p0 and q0 escape if p1 or q1 escape.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. I did try to implement a fix for this, but my fix is causing ./complex/exp/testcase_multi_specz-int_list.out to fail with a segmentation fault.
Since this is more of a precision issue, rather than correctness, I'd like to move this to my next PR so I can take a closer look at this.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I'll leave this open as a reminder

@SaumitraLohokare SaumitraLohokare marked this pull request as draft April 1, 2026 00:50
@SaumitraLohokare SaumitraLohokare marked this pull request as ready for review May 19, 2026 23:07
# escaped), so p also does not escape.
def {noinline} test_pass_through use !io {
?p = position(1, 2)
!pass_back(p, ?q)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this have a !? It shouldn't need it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants