Introduce Rewriter abstraction to cache use lookups by maleadt · Pull Request #235 · JuliaGPU/cuTile.jl

maleadt · 2026-05-28T13:45:20Z

As observed in #233, compilation of large kernels is slow. The culprit turned out to be the user/use lookup, which walks the entire IR on every query because our IR doesn't encode use-def chains (unlike LLVM or MLIR). To avoid this cost, introduce a Rewriter abstraction that caches these lookups and invalidates the cache upon insertion, mutation, etc. Once again modeled after MLIR.

randn with Float64 goes down from 13s to 1.3s on my system. Test time goes from 1m50 to 1m05.
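For readers unfamiliar with the pattern, here is a minimal, language-agnostic sketch of the idea described above, written in Python rather than Julia. It is not the code from this PR: the `Instr` and `Rewriter` classes, their methods, and the `scans` counter are all hypothetical names chosen for illustration. The sketch assumes a flat list of instructions with no stored use-def chains, so finding the users of a value needs a full walk. Routing every mutation through the rewriter lets it cache that walk and drop the cache whenever the IR changes.

```python
from collections import defaultdict


class Instr:
    """Toy instruction (hypothetical): a result name plus operand names."""

    def __init__(self, result, operands):
        self.result = result
        self.operands = list(operands)


class Rewriter:
    """Hypothetical rewriter that owns all IR mutations so the use cache stays valid."""

    def __init__(self, block):
        self.block = block   # list of Instr, standing in for the whole IR
        self._uses = None    # cached map: value name -> list of using Instr
        self.scans = 0       # counts full IR walks, for illustration only

    def _build_uses(self):
        # The expensive step: one walk over the entire IR.
        self.scans += 1
        uses = defaultdict(list)
        for instr in self.block:
            for op in set(instr.operands):  # record each user once per value
                uses[op].append(instr)
        return uses

    def users(self, value):
        # Rebuild lazily, only if a mutation invalidated the cache.
        if self._uses is None:
            self._uses = self._build_uses()
        return list(self._uses.get(value, []))

    def insert(self, index, instr):
        self.block.insert(index, instr)
        self._uses = None    # invalidate on insertion

    def replace_uses(self, old, new):
        for instr in self.users(old):
            instr.operands = [new if op == old else op for op in instr.operands]
        self._uses = None    # invalidate on mutation


# Usage: repeated queries share one walk; mutations trigger a rebuild.
block = [Instr("a", []), Instr("b", ["a"]), Instr("c", ["a", "b"])]
rw = Rewriter(block)
first = [i.result for i in rw.users("a")]
rw.users("b")
scans_before_mutation = rw.scans

rw.insert(1, Instr("d", ["a"]))
after_insert = [i.result for i in rw.users("a")]
rw.replace_uses("b", "d")
after_replace = [i.result for i in rw.users("d")]
```

This sketch throws away the whole cache on any change, which is the simplest correct policy; a real implementation could instead update only the affected entries. Which of the two this PR does is not stated here.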

maleadt · 2026-05-28T14:08:10Z

CI goes from 7m50 to 5m14, with the random tests going from 298s to 57s... I think this obviates #233 then.

AntonOresten · 2026-05-28T14:48:09Z

Great, thanks!

maleadt added 3 commits May 28, 2026 15:18

Cache user lookups.

37f140e

Introduce Rewriter abstraction.

ebce359

Rewrite comments.

cf5b486

maleadt merged commit 4a3cc60 into main May 28, 2026
1 check passed

maleadt deleted the tb/optimize_rewriter branch May 28, 2026 14:08

maleadt merged 3 commits into main from tb/optimize_rewriter
