Add CUDA Examples/Improve CUDA robustness #202

Open
SousaTrashBin wants to merge 104 commits into master from add_some_cuda_examples

Conversation

@SousaTrashBin
Collaborator

The only example that is not working properly is histogram.ae; I'm fairly sure that's because it has a kernel nested inside another kernel. LLVM/CPU was not tested and still needs to be verified later.

@SousaTrashBin requested a review from alcides on April 27, 2026 23:39
@alcides force-pushed the add_some_cuda_examples branch from 152e24b to 54a8846 on April 29, 2026 13:45
… normal/core ast and gpu ast (not sure if this is the smartest way of doing this)
…hings are more isolated and less likely to break
… normal/core ast and gpu ast (not sure if this is the smartest way of doing this)
SousaTrashBin and others added 14 commits April 30, 2026 09:54
…rations (still didn't test if this works on gpu)
…werer

When _extract_call_info returns prev_args from a partial application,
eff_ty only contains the remaining parameter types. Using eff_ty with
offset=len(prev_args) caused incorrect type expectations and premature
full-application detection, dropping trailing arguments. This fix uses
target.type (the full function type) when prev_args exist, matching
what _lower_builtin_call already does.

Fixes histogram.ae producing all-zero word counts due to the wordcount
call being silently dropped during lowering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
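
The double-offset failure described in the commit above can be reproduced in miniature. This is only a sketch under stated assumptions: `FnType`, `remaining_after`, `expected_types`, and `is_full_application` are hypothetical stand-ins rather than the lowerer's real types or helpers, and the parameter types are made up. Only the names `prev_args`, `eff_ty`, and the idea of indexing the full function type come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FnType:
    """Hypothetical stand-in for a curried function type."""
    params: tuple
    ret: str


def remaining_after(full_ty: FnType, n_applied: int) -> FnType:
    """Type left after n_applied arguments were supplied (plays the role of eff_ty)."""
    return FnType(full_ty.params[n_applied:], full_ty.ret)


def expected_types(fn_ty: FnType, offset: int, n_new: int) -> list:
    """Parameter types expected for n_new arguments starting at position `offset`."""
    return list(fn_ty.params[offset:offset + n_new])


def is_full_application(fn_ty: FnType, offset: int, n_new: int) -> bool:
    """True when the new arguments exhaust fn_ty's parameters counted from `offset`."""
    return offset + n_new >= len(fn_ty.params)


# A three-argument function; the types are invented for the example.
full_ty = FnType(("Vector", "Int", "String"), "Int")
prev_args = ["words"]   # already captured by the partial application
new_args = ["start"]    # one more argument arrives; one is still missing
eff_ty = remaining_after(full_ty, len(prev_args))

# Buggy pairing: eff_ty already skips prev_args, so the offset is applied twice.
buggy_types = expected_types(eff_ty, len(prev_args), len(new_args))
buggy_full = is_full_application(eff_ty, len(prev_args), len(new_args))

# Fixed pairing: index the full function type with the same offset.
fixed_types = expected_types(full_ty, len(prev_args), len(new_args))
fixed_full = is_full_application(full_ty, len(prev_args), len(new_args))
```

With the buggy pairing the second argument is checked against the third parameter's type and the call is treated as fully applied one argument too early, which matches the "premature full-application detection" the commit describes.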
…nodes

- Remove stale gpu/llvm imports from decorators/__init__.py
- Add missing LLVMVectorGet, LLVMVectorSet, LLVMVectorSize classes to llvm_ast.py
- Add find_calls() method to LLVMTerm and all compound term subclasses
- Fix target_machine attribute error in CPULLVMExecutionEngine
- Update old import "X.ae" syntax to open X in all LLVM examples

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alcides force-pushed the add_some_cuda_examples branch from 6345117 to 2dbc106 on April 30, 2026 08:54
alcides and others added 5 commits April 30, 2026 12:44
The e2e tests use @llvm, which was renamed to @cpu but never aliased.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The LLVM test examples use Vector_size, which is handled natively by
the LLVM backend but was missing from the Vector library, causing a
KeyError when the evaluator falls back.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
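
The fallback failure in the commit above has roughly the following shape. This is a sketch, not the project's code: the `library` dictionary, the `interpret` function, and the `Vector_get` entry are assumptions made for illustration. Only `Vector_size` and the KeyError come from the commit message.

```python
# Hypothetical interpreter-side library: operation name -> Python implementation.
library = {
    "Vector_get": lambda v, i: v[i],
    # "Vector_size" is absent: only the native backend knew about it.
}


def interpret(name, *args):
    """Fallback path: look the operation up in the library and call it."""
    return library[name](*args)


# Before the fix: the fallback path cannot find Vector_size.
try:
    interpret("Vector_size", [1, 2, 3])
    missing_raised = False
except KeyError:
    missing_raised = True

# After the fix: the library defines the operation as well.
library["Vector_size"] = len
size_after_fix = interpret("Vector_size", [1, 2, 3])
```

Defining the operation in both places keeps the native backend and the fallback evaluator interchangeable for the same example.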
…g LLVMVectorSize alias

- Remove leftover `print(f"DEBUG: ...")` in lowerer.py
- Upgrade silent fallback/disable log messages from debug to warning level
- Add `LLVMVectorSize = Any` to runtime else-branch in core.py for consistency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CPULLVMPipeline was never imported anywhere. MultiBackendPipeline in
aeon/llvm/pipeline.py handles CPU-only as its default and is the one
used by the driver.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Owner

@alcides left a comment

A bit of code cleanup would make this ready to merge.

Comment thread aeon/llvm/cuda/executor.py Outdated
if ir_hash not in self._module_cache:
    ptx = self._compile_to_ptx(llvm_ir)
    self._module_cache[ir_hash] = self._load_module(ptx)
with open("debug.ll", "w") as f:
Owner

Can this go into a helper function and sit behind a flag?

Collaborator Author

@SousaTrashBin May 1, 2026

Just the part that writes to a file, or the whole PTX compilation part?

Owner

The file writing.

Collaborator Author

Resolved in commit bf44b74.
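
For reference, one possible shape for such a helper, as a sketch only: the environment variable `AEON_CUDA_DEBUG_IR` and the function name `dump_ir_if_enabled` are assumptions made for illustration, and commit bf44b74 may implement the flag differently.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical flag name; the project may use a CLI option or logger setting instead.
DEBUG_IR_ENV = "AEON_CUDA_DEBUG_IR"


def dump_ir_if_enabled(llvm_ir: str, path: str = "debug.ll") -> Optional[Path]:
    """Write the LLVM IR to `path` only when the debug flag is set.

    Returns the path written to, or None when the flag is off.
    """
    if os.environ.get(DEBUG_IR_ENV, "") not in ("1", "true", "yes"):
        return None
    out = Path(path)
    out.write_text(llvm_ir)
    return out
```

The caching branch in executor.py could then call `dump_ir_if_enabled(llvm_ir)` instead of opening debug.ll unconditionally, so normal runs leave no stray file behind.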

Comment thread aeon/llvm/cuda/executor.py Outdated
finally:
    for d_ptr in device_ptrs:
        self.libcuda.cuMemFree_v2(d_ptr)
if isinstance(ret_type, LLVMPointerType) or "Vector" in str(ret_type):
Owner

Instead of "Vector" in str(...), we should have a name-resolution step that makes every name qualified, and then compare against the qualified name.

Collaborator Author

Qualified, or in this case unqualified? I assume the "qualified" part is whatever is to the left of the '.', and the unqualified part is whatever is to the right. Should I add two fields to the LLVMTypes, one for the qualified part and one for the unqualified part (and then check whether qualified == "Vector" and unqualified == "Vector")?

Collaborator Author

Resolved in commits bf44b74, 1001175 and 05473ec (Vector is now a built-in type, and the conversion is done correctly, without workarounds, by the function that converts a native type to its LLVM counterpart).
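
To illustrate why the substring check was fragile, here is a minimal sketch of the qualified-name comparison discussed in this thread. `QualifiedName`, `NamedType`, and both `is_vector*` functions are hypothetical and are not taken from the aeon code base; the commits referenced above went a different way by making Vector a built-in type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedName:
    """Hypothetical fully resolved name: defining module plus the name inside it."""
    module: str
    name: str

    def __str__(self) -> str:
        return f"{self.module}.{self.name}"


@dataclass(frozen=True)
class NamedType:
    """Hypothetical stand-in for a type that carries its resolved name."""
    qname: QualifiedName


VECTOR = QualifiedName("Vector", "Vector")


def is_vector_by_substring(ty: NamedType) -> bool:
    """Fragile check: any type whose printed form mentions 'Vector' matches."""
    return "Vector" in str(ty.qname)


def is_vector(ty: NamedType) -> bool:
    """Robust check: exact comparison against the qualified name."""
    return ty.qname == VECTOR


vector_ty = NamedType(QualifiedName("Vector", "Vector"))
lookalike_ty = NamedType(QualifiedName("Geometry", "VectorField"))
```

The substring test accepts `lookalike_ty` as a vector, while the exact comparison does not.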

Comment thread aeon/llvm/pipeline.py Outdated
@SousaTrashBin requested a review from alcides on May 6, 2026 10:46

2 participants