Add CUDA Examples/Improve CUDA robustness #202
152e24b to 54a8846
… normal/core ast and gpu ast (not sure if this is the smartest way of doing this)
… before finishing this)
…e GPU AST and llvm instructions
…hings are more isolated and less likely to break
…and LLVM IR generation
… normal/core ast and gpu ast (not sure if this is the smartest way of doing this)
… before finishing this)
…e GPU AST and llvm instructions
…nagement on vectors
…rations (still didn't test if this works on gpu)
…werer
When _extract_call_info returns prev_args from a partial application, eff_ty only contains the remaining parameter types. Using eff_ty with offset=len(prev_args) caused incorrect type expectations and premature full-application detection, dropping trailing arguments. This fix uses target.type (the full function type) when prev_args exist, matching what _lower_builtin_call already does.
Fixes histogram.ae producing all-zero word counts due to the wordcount call being silently dropped during lowering.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
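The offset problem described in this commit message can be reproduced with a toy model. The sketch below is not the lowerer's code: `pair_args_with_params` and the type names are hypothetical, and it only assumes that argument slots are found by skipping `offset` parameters of whatever type list is passed in.

```python
def pair_args_with_params(param_types, offset, new_args):
    """Pair each new argument with the parameter slot at position offset + i.

    Hypothetical helper for illustration only. zip() stops at the shorter
    sequence, so any argument without a matching slot is silently dropped.
    """
    slots = param_types[offset:]
    return list(zip(new_args, slots))


# Toy function type with three parameters; one argument was already applied.
full_type = ["Int", "String", "Float"]
prev_args = ["1"]
remaining_type = full_type[len(prev_args):]  # what eff_ty would hold
new_args = ['"a"', "2.0"]

# Buggy combination: remaining types AND an offset, so one parameter is
# skipped twice. The last argument is lost and the first gets the wrong type.
buggy = pair_args_with_params(remaining_type, len(prev_args), new_args)

# Fixed combination: the offset is applied to the full function type.
fixed = pair_args_with_params(full_type, len(prev_args), new_args)
```

In this toy model `buggy` holds a single pair with the wrong type, while `fixed` keeps both arguments, which mirrors the "dropping trailing arguments" symptom the message reports.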
…nodes
- Remove stale gpu/llvm imports from decorators/__init__.py
- Add missing LLVMVectorGet, LLVMVectorSet, LLVMVectorSize classes to llvm_ast.py
- Add find_calls() method to LLVMTerm and all compound term subclasses
- Fix target_machine attribute error in CPULLVMExecutionEngine
- Update old import "X.ae" syntax to open X in all LLVM examples
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6345117 to 2dbc106
The LLVM test examples use Vector_size, which is handled natively by the LLVM backend but was missing from the Vector library, causing a KeyError when the evaluator falls back.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…g LLVMVectorSize alias
- Remove leftover `print(f"DEBUG: ...")` in lowerer.py
- Upgrade silent fallback/disable log messages from debug to warning level
- Add `LLVMVectorSize = Any` to runtime else-branch in core.py for consistency
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CPULLVMPipeline was never imported anywhere. MultiBackendPipeline in aeon/llvm/pipeline.py handles CPU-only as its default and is the one used by the driver.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
alcides left a comment
A bit of code cleanup would make this ready to merge.
if ir_hash not in self._module_cache:
    ptx = self._compile_to_ptx(llvm_ir)
    self._module_cache[ir_hash] = self._load_module(ptx)
with open("debug.ll", "w") as f:
Can this go into a helper function and sit behind a flag?
Just the part that writes to a file, or the whole PTX compilation part?
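One way to read the request, if only the file write is meant, is a small helper that is a no-op unless a flag is set. This is a minimal sketch under that assumption: the helper name `maybe_dump_ir` and the environment variable `AEON_DUMP_LLVM_IR` are hypothetical and do not appear in the PR.

```python
import os
from pathlib import Path

# Hypothetical flag name; the PR does not define one.
DEBUG_DUMP_ENV = "AEON_DUMP_LLVM_IR"


def maybe_dump_ir(llvm_ir: str, path: str = "debug.ll") -> bool:
    """Write the IR text to `path` only when the debug flag is set to "1".

    Returns True if a file was written and False otherwise, so callers
    and tests can tell whether the dump actually happened.
    """
    if os.environ.get(DEBUG_DUMP_ENV) != "1":
        return False
    Path(path).write_text(llvm_ir)
    return True
```

The cache-miss branch would then call `maybe_dump_ir(llvm_ir)` in place of the inline `with open("debug.ll", "w")`, leaving the PTX compilation itself untouched.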
finally:
    for d_ptr in device_ptrs:
        self.libcuda.cuMemFree_v2(d_ptr)
if isinstance(ret_type, LLVMPointerType) or "Vector" in str(ret_type):
Instead of "Vector" in str(...), we should have a name-resolution step that makes every name qualified, and then compare against the qualified name.
Qualified, or in this case unqualified? I assume the "qualified" part is what sits to the left of the '.' and the unqualified part is what sits to the right. Should I add two fields to the LLVMTypes, one for the qualified part and one for the unqualified part (and then check whether qualified == "Vector" and unqualified == "Vector")?
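The two-field idea from this thread could look roughly like the sketch below. Everything here is an assumption for illustration: `QualifiedName`, `is_vector`, and the fully qualified name `Vector.Vector` are hypothetical, and the thread does not settle how LLVMTypes would carry the name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedName:
    """Hypothetical name record: `module` is left of the last '.', `name` is right of it."""

    module: str
    name: str

    @classmethod
    def parse(cls, text: str) -> "QualifiedName":
        # rpartition yields an empty module when the text contains no '.'.
        module, _, name = text.rpartition(".")
        return cls(module, name)


# Assumed fully qualified name of the vector type; not confirmed by the thread.
VECTOR = QualifiedName("Vector", "Vector")


def is_vector(type_name: QualifiedName) -> bool:
    """Exact comparison on both fields instead of a substring test."""
    return type_name == VECTOR
```

The point of the exact comparison is that an unrelated name such as `MyVectorUtils.Thing` passes the substring check `"Vector" in str(...)` but is rejected here.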
…er by extracting multiple implementation steps to simpler individual functions
The only example that is not working properly is histogram.ae. I'm fairly sure that is because it has a kernel inside another kernel. LLVM/CPU was not tested and still needs to be verified.