cudad is a for-fun CUDA SASS decompiler and binary static-analysis playground.
It reads .sass text dumps from NVIDIA tooling (cuobjdump / nvdisasm) and tries to recover CUDA-like pseudo-C. The project is intentionally honest rather than heroic: if a final SASS construct is not understood well enough, the output keeps an explicit helper/debt marker instead of pretending to be clean source code.
The current north star is simple: recover better pseudo-C for real CUDA kernels while keeping enough structured facts around that bugs in the decompiler are easy to see. This is not a source-to-source compiler, not a recompilable CUDA C generator, and not a formal verifier.
- Final SASS is the ground truth; PTX/source-like names are only recovered when backed by ABI, CFG, SSA, or memory-root evidence.
- Pretty output must not erase uncertainty. Unresolved branches, unmodeled helper calls, unknown opcodes, live-ins, and ambiguous memory roots stay visible.
- Analysis facts are first-class. The optional JSON report exists so tools and tests do not have to scrape pseudo-C.
- Tests should catch plausible lies: raw helper leaks, undeclared temps, fake pointer indexes, hidden call debt, lost roots, and unbound gotos.
- Synthetic happy-path-only tests are discouraged; prefer real bundled SASS kernels and corpus-wide negative invariants.
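One of the negative invariants named above, unbound gotos, can be sketched as a small checker over rendered pseudo-C. This is an illustrative sketch, not code from the repository: the function name `unbound_gotos` is hypothetical, and it assumes labels are rendered as `BB<n>:` on their own line and jumps as `goto BB<n>;`.

```rust
use std::collections::BTreeSet;

/// True when `s` looks like a C identifier body (letters, digits, underscores).
fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Returns every `goto` target that has no matching `<label>:` line.
/// Assumption: labels sit alone on a line, e.g. `BB3:`.
pub fn unbound_gotos(pseudo_c: &str) -> Vec<String> {
    let mut labels = BTreeSet::new();
    let mut targets = BTreeSet::new();
    for line in pseudo_c.lines() {
        let line = line.trim();
        // A line such as `BB3:` declares a label.
        if let Some(name) = line.strip_suffix(':') {
            if is_ident(name) {
                labels.insert(name.to_string());
            }
            continue;
        }
        // Collect every `goto <name>` on the line, including `if (p) goto BB3;`.
        let mut rest = line;
        while let Some(pos) = rest.find("goto ") {
            rest = &rest[pos + "goto ".len()..];
            let name: String = rest
                .trim_start()
                .chars()
                .take_while(|c| c.is_ascii_alphanumeric() || *c == '_')
                .collect();
            if !name.is_empty() {
                targets.insert(name);
            }
        }
    }
    // Targets without a label are the "plausible lie" the test should catch.
    targets.difference(&labels).cloned().collect()
}
```

A corpus-wide test in this style would run the check over every bundled kernel and assert that the returned list is empty.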
- Tolerant SASS decoding for `cuobjdump`/`nvdisasm`-style text, including multi-function dumps and SM metadata.
- CFG construction, SSA lifting, and deterministic IR optimization before high-level lowering.
- CUDA ABI/profile recovery for kernel params, common builtins, parameter aliases, const-memory observations, and shared/local/global memory spaces.
- Memory-space-aware pseudo-C lowering for `Param`, `Const`, `Global`, `Shared`, `Local`, and `Generic` accesses.
- Root-preserving address rendering: proven global pointer roots remain rooted even when the element index is not clean enough for `arg[i]` syntax.
- Idiom recovery for common SASS patterns: `PRMT` byte shuffles/sign extension, `LOP3` crypto boolean idioms, `SHF` rotates/shifts, `POPC`, `FLO`, `VOTE`/`VOTEU`, `UFFMA`, `R2UR`, half2 zero/init cases, carry-chain `IADD3.X`/`UIADD3.X`, and selected shared atomic idioms.
- Explicit unresolved-control helpers for indirect `BRX` and convergence break/continue shapes that are not yet proven source-level `switch`/`break`/`continue`.
- Reciprocal/division helper cleanup that removes `FCHK`/`CALL.REL.NOINC` scaffolding only when the AST proves the fast path and slow path are locally equivalent; otherwise the helper remains visible.
- Machine-readable inventories for ABI facts, memory accesses, live-ins, CFG metrics, structurizer fallback pressure, call proof debt, control-flow hazards, opcode coverage, and AST proof records.
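To make the `LOP3` idiom recovery mentioned above concrete, here is a minimal sketch of how a three-input lookup-table operation can be evaluated and matched against a few well-known boolean shapes. It relies on the usual convention that the 8-bit table is indexed by the bit triple `(a, b, c)`, which gives `0x96` for three-way XOR, `0xE8` for majority, and `0xCA` for bitwise select. The names `lop3` and `classify_lut` are hypothetical and do not come from the cudad source.

```rust
/// Evaluates a LOP3-style operation: for each bit position, the three input
/// bits form an index (a is the high bit) into the 8-bit truth table `lut`.
pub fn lop3(a: u32, b: u32, c: u32, lut: u8) -> u32 {
    let mut out = 0u32;
    for bit in 0..32u32 {
        let idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1);
        out |= (((lut as u32) >> idx) & 1) << bit;
    }
    out
}

/// Maps a few truth tables to readable expressions (illustrative subset).
/// Anything unrecognized returns `None` so the caller can keep a raw helper.
pub fn classify_lut(lut: u8) -> Option<&'static str> {
    match lut {
        0x96 => Some("a ^ b ^ c"),
        0xE8 => Some("(a & b) | (a & c) | (b & c)"), // majority, as in SHA-2 Maj
        0xCA => Some("(a & b) | (~a & c)"),          // select, as in SHA-2 Ch
        _ => None,
    }
}
```

Returning `None` for unknown tables matches the project's stated preference for a visible helper over an invented expression.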
- Pseudo-C is often temp-heavy and may still contain `goto BB...` fallback regions.
- The output is not guaranteed source-equivalent or recompilable CUDA C.
- `CALL.REL.NOINC` and `FCHK` are intentionally visible when helper-call semantics or slow-path equivalence are not proven.
- Indirect branches, convergence tokens, and some helper/libdevice paths are still conservative debt markers.
- Hard pointer/index cases can render as rooted byte-address expressions instead of clean typed array indexes.
- Real reduction-heavy kernels still expose remaining call/control/address proof debt; that is better than hiding it behind fake C.
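The difference between a clean typed index and a rooted byte-address expression can be illustrated with a small renderer. This is a sketch under stated assumptions: the `ByteOffset` type, the `render_access` function, and the exact fallback spelling are hypothetical and only approximate the style described above.

```rust
/// Byte offset from a proven pointer root (hypothetical model, not cudad's IR).
pub enum ByteOffset {
    /// Compile-time constant number of bytes.
    Const(u64),
    /// `index * scale` bytes, where `index` is an already rendered expression.
    Scaled { index: String, scale: u64 },
    /// Any other byte expression that could not be decomposed.
    Opaque(String),
}

/// Renders `root[i]` only when the offset is provably a whole number of
/// elements; otherwise keeps the root visible in a byte-address expression.
pub fn render_access(root: &str, elem_ty: &str, elem_size: u64, off: &ByteOffset) -> String {
    match off {
        ByteOffset::Const(n) if elem_size != 0 && n % elem_size == 0 => {
            format!("{root}[{}]", n / elem_size)
        }
        ByteOffset::Scaled { index, scale } if *scale == elem_size => {
            format!("{root}[{index}]")
        }
        other => {
            // Fallback: do not pretend the index is clean, but keep the root.
            let bytes = match other {
                ByteOffset::Const(n) => n.to_string(),
                ByteOffset::Scaled { index, scale } => format!("{index} * {scale}"),
                ByteOffset::Opaque(expr) => expr.clone(),
            };
            format!("*({elem_ty}*)((uint8_t*){root} + {bytes})")
        }
    }
}
```

The fallback branch is deliberately ugly: the pointer root survives, and the reader can see that the index was not proven.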
Run on a bundled fixture:

cargo run -- -i test_cu/test_div.sass

Emit machine-readable analysis facts:

cargo run -- -i test_cu/test_div.sass --analysis-json
cargo run -- -i test_cu/test_div.sass --analysis-json --hazards-only

Inspect CFG or optimized SSA:

cargo run -- -i test_cu/if_loop.sass --cfg-dot > /tmp/if_loop.cfg.dot
cargo run -- -i test_cu/if_loop.sass --ssa-dot -o /tmp/if_loop.ssa.dot

Include ABI commentary in rendered pseudo-C:

cargo run -- -i test_cu/test_div.sass --explain-abi

Run validation:

cargo fmt --check
cargo test --quiet

Regenerate curated full-pass snapshots after an intentional backend change:

cargo run --example regen_goldens

SASS text
  -> parser.rs                         DecodedInstruction stream
  -> cfg.rs                            basic blocks and explicit edges
  -> ir.rs                             SSA-friendly IR and SSA renaming
  -> ir_dce/constprop/algebra/cse/...  optimized IR
  -> function_analysis.rs              ABI, memory, roots, types, live machine facts
  -> *_inventory.rs                    JSON-facing fact inventories and hazard/debt ledgers
  -> structurizer/                     structured regions plus goto fallbacks
  -> ast_lowering/                     CUDA-like expressions, memory accesses, and statements
  -> symbol_plan.rs                    deterministic params, locals, live-ins, shared/local objects
  -> ast_passes/                       bounded cleanup and proof-backed helper elision
  -> ast render                        pseudo-C
The important rule is that later pretty-code stages consume structured facts; they do not rediscover semantics by parsing rendered text.
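As an illustration of the second stage in the pipeline above, the following sketch shows the textbook leader-based way of cutting a decoded instruction stream into basic blocks. It is not the code in `cfg.rs`: the `Inst` type and `block_leaders` function are hypothetical, simplified stand-ins, and real SASS needs more care around predication, indirect branches, and convergence instructions.

```rust
use std::collections::BTreeSet;

/// Simplified decoded instruction (hypothetical, not cudad's DecodedInstruction).
pub struct Inst {
    pub addr: u32,
    /// Direct branch target, if this instruction is a branch.
    pub branch_target: Option<u32>,
    /// True for instructions that never fall through, e.g. EXIT or RET.
    pub is_terminator: bool,
}

/// Returns the sorted addresses at which basic blocks start.
/// Leaders are: the first instruction, every in-function branch target,
/// and every instruction that follows a branch or terminator.
pub fn block_leaders(insts: &[Inst]) -> Vec<u32> {
    let known: BTreeSet<u32> = insts.iter().map(|i| i.addr).collect();
    let mut leaders = BTreeSet::new();
    if let Some(first) = insts.first() {
        leaders.insert(first.addr);
    }
    for (i, inst) in insts.iter().enumerate() {
        if let Some(target) = inst.branch_target {
            // Ignore targets outside this function instead of inventing blocks.
            if known.contains(&target) {
                leaders.insert(target);
            }
        }
        if inst.branch_target.is_some() || inst.is_terminator {
            if let Some(next) = insts.get(i + 1) {
                leaders.insert(next.addr);
            }
        }
    }
    leaders.into_iter().collect()
}
```

Each later stage would then receive blocks and edges as data, in line with the rule that structured facts flow forward and rendered text is never re-parsed.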
--analysis-json is the framework-facing output. It reports what the backend knows and what it does not know without requiring downstream users to scrape pseudo-C. Each function report can include:
- schema/view metadata so raw lifted facts, optimized facts, and render-view facts are not confused
- CFG/SSA shape, reachability, unresolved terminators, missing branches/fallthroughs, and complexity metrics
- ABI profile provenance, parameter aliases, const-memory observations, and profile-sensitive warnings
- memory accesses by kind/space/width/root, root confidence, external const-memory roots, unknown ABI const-memory roots, and unresolved-address reasons
- live-in machine registers with use-site role, kind, confidence, and reason
- call-site facts for `CALL`/`CAL`/`JCAL`, including decoded target, block IDs, lexical return successor, explicit return-register setup, callee target-path summary, integration blockers, per-target grouping, and proof-debt counts
- call-rendering facts that cross-check final pseudo-C against the call inventory and AST proof records, so hidden unproven call debt is reported as a bug/debt condition
- opcode-family coverage split into modeled pure operations, must-preserve side-effect operations, and unknown families
- AST proof records such as division-helper elision certificates keyed by decoded call address
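The call-rendering cross-check in the list above boils down to a set difference: every call site in the inventory must either be covered by a proof record or still be visible in the rendered output. A minimal sketch, assuming call sites are identified by their decoded address; the function name `hidden_call_debt` and its slice-based signature are hypothetical, not the crate's API.

```rust
use std::collections::BTreeSet;

/// Returns call-site addresses that are in the inventory but are neither
/// covered by an elision proof nor visible in the rendered pseudo-C.
/// A non-empty result means the output is hiding unproven call debt.
pub fn hidden_call_debt(inventory: &[u64], proved_elided: &[u64], rendered: &[u64]) -> Vec<u64> {
    let proved: BTreeSet<u64> = proved_elided.iter().copied().collect();
    let visible: BTreeSet<u64> = rendered.iter().copied().collect();
    let all: BTreeSet<u64> = inventory.iter().copied().collect();
    all.into_iter()
        .filter(|addr| !proved.contains(addr) && !visible.contains(addr))
        .collect()
}
```

Keying the check on addresses rather than on rendered helper names is what lets it work without scraping pseudo-C text.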
Library users can call analyze_sass, analyze_sass_with_options, analyze_decoded_function, or analyze_decoded_functions and serialize SassAnalysisReport directly. Smaller entrypoints such as build_call_inventory, scan_control_flow_hazards, summarize_opcode_coverage, build_abi_inventory, build_memory_inventory, build_live_in_inventory, summarize_cfg_metrics, and build_structurizer_inventory_with_diagnostics expose individual fact surfaces without running the whole CLI.
A small kernel currently decompiles to output like this:
void _Z15test_2_para_intiiPi(int32_t arg0, int32_t arg1, uint32_t* arg2_ptr) {
    uint32_t r5_0;
    uint32_t ur4_0;
    uint32_t ur4_1;
    uint32_t ur5_0;
    uint32_t r0_0;
    bool p1_0;
    ...
    r5_0 = abs(arg1);
    ur4_0 = arg0;
    ur4_1 = ur4_0 ^ ur5_0;
    r0_0 = __int2float_ru(r5_0);
    ...
    arg2_ptr[0] = r5_3;
    return;
}

That is the intended current style: useful structure first, beauty later. If the backend cannot justify a nicer construct, it leaves an explicit helper such as `__cudad_unresolved_indirect_branch(...)` or a visible `CALL.REL.NOINC(...)` instead of making up source.
cudad expects SASS text, not .cu source.
nvcc -arch=sm_89 -cubin kernel.cu -o kernel.cubin
cuobjdump --dump-sass kernel.cubin > kernel.sass
cargo run -- -i kernel.sass -o kernel.pseudo.c

Multi-function dumps are split automatically. `--analysis-json` reports every selected function; use `--function <name>` to narrow the report or when you want one function for `--cfg-dot` / `--ssa-dot`.
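Splitting a multi-function dump can be pictured as grouping lines under each `Function :` header that `cuobjdump --dump-sass` prints. The sketch below is an assumption about the general approach, not the logic in `parser.rs`; `split_functions` is a hypothetical name, and a tolerant decoder would handle more header variants than this.

```rust
/// Splits a SASS text dump into `(function name, body lines)` pairs.
/// Lines before the first `Function :` header (file-level metadata) are dropped.
pub fn split_functions(dump: &str) -> Vec<(String, Vec<String>)> {
    let mut out: Vec<(String, Vec<String>)> = Vec::new();
    for line in dump.lines() {
        let trimmed = line.trim();
        if let Some(rest) = trimmed.strip_prefix("Function :") {
            // A new header starts a new function section.
            out.push((rest.trim().to_string(), Vec::new()));
        } else if let Some((_, body)) = out.last_mut() {
            if !trimmed.is_empty() {
                body.push(trimmed.to_string());
            }
        }
    }
    out
}
```

Selecting one function by name, as `--function` does, is then a lookup on the first element of each pair.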
  -i, --input <INPUT>              Input SASS file (default: bundled sample)
  -o, --output <OUTPUT>            Output file for pseudocode, DOT, or analysis JSON
      --cfg-dot                    Dump CFG as DOT
      --ssa-dot                    Dump optimized SSA as DOT
      --analysis-json              Emit static-analysis facts as JSON
      --hazards-only               With --analysis-json, emit only functions that contain hazards
      --function <FUNCTION>        Select one `Function :` section by name
      --abi-profile <ABI_PROFILE>  Force ABI profile (`auto|legacy140|modern160`)
      --explain-abi                Include ABI analysis comments in structured output
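The effect of forcing an ABI profile can be illustrated with a small resolver from constant-bank offsets to kernel parameters. This sketch rests on an assumption the text above does not state: that `legacy140` and `modern160` refer to kernel parameters starting at byte offsets `0x140` and `0x160` of constant bank 0. The `AbiProfile` enum, `from_flag`, and `resolve_param` are hypothetical names, and the layout rule (each parameter aligned to its own size) only holds for scalars and pointers.

```rust
/// Hypothetical model of the two forced profiles; `auto` detection is not modelled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AbiProfile {
    Legacy140,
    Modern160,
}

impl AbiProfile {
    /// Parses the forced values of `--abi-profile`; returns `None` for `auto`.
    pub fn from_flag(value: &str) -> Option<Self> {
        match value {
            "legacy140" => Some(Self::Legacy140),
            "modern160" => Some(Self::Modern160),
            _ => None,
        }
    }

    /// ASSUMPTION: parameter area base inside constant bank 0.
    pub fn param_base(self) -> u32 {
        match self {
            Self::Legacy140 => 0x140,
            Self::Modern160 => 0x160,
        }
    }
}

/// Maps a `c[0x0][const_offset]` read to `(parameter index, byte offset inside it)`.
/// `param_sizes` lists scalar/pointer sizes in bytes; each is aligned to its size.
pub fn resolve_param(profile: AbiProfile, param_sizes: &[u32], const_offset: u32) -> Option<(usize, u32)> {
    let rel = const_offset.checked_sub(profile.param_base())?;
    let mut cursor = 0u32;
    for (index, &size) in param_sizes.iter().enumerate() {
        if size == 0 {
            continue;
        }
        cursor = (cursor + size - 1) / size * size; // round up to natural alignment
        if rel >= cursor && rel < cursor + size {
            return Some((index, rel - cursor));
        }
        cursor += size;
    }
    None
}
```

For a kernel shaped like `(int, int, int*)`, the sizes are `[4, 4, 8]`, so under the assumed `modern160` base a read at offset `0x168` resolves to the pointer parameter, while the same offset under `legacy140` falls outside the parameter area and stays unresolved.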
- `src/parser.rs` - tolerant SASS decode
- `src/cfg.rs` - basic-block and CFG construction
- `src/ir.rs` - SSA IR construction and DOT rendering
- `src/ir_*` - IR optimization passes
- `src/function_analysis.rs` - canonical post-SSA fact base
- `src/analysis_report.rs` - JSON/static-analysis report surface
- `src/abi/` and `src/abi_inventory.rs` - ABI profile detection, aliases, const-memory facts
- `src/memory_model.rs` and `src/memory_inventory.rs` - memory opcode semantics and memory fact reporting
- `src/live_in_inventory.rs` - live-in / undefined-use inventory
- `src/call_inventory.rs` - call-site proof-debt classification
- `src/control_flow_hazards.rs` - conservative control-flow uncertainty scanner
- `src/cfg_metrics.rs` - CFG complexity and irreducibility metrics
- `src/structurizer/` and `src/structurizer_inventory.rs` - control-flow collapse and fallback diagnostics
- `src/ast_lowering/` - structured lowering from SSA + analysis facts to AST
- `src/ast_passes/` and `src/ast_proofs.rs` - cleanup passes and proof records
- `src/backend_pipeline.rs` - canonical driver used by the CLI and examples
- `tests/` - unit, integration, corpus, and API-surface tests
- `pipeline_audit.html` - visual map of the pipeline and the main fixed/open design issues
For implementation details, see docs/dev/current_architecture.md and docs/dev/development.md.