cudad

cudad is a for-fun CUDA SASS decompiler and binary static-analysis playground.

It reads .sass text dumps from NVIDIA tooling (cuobjdump / nvdisasm) and tries to recover CUDA-like pseudo-C. The project is intentionally honest rather than heroic: if a final SASS construct is not understood well enough, the output keeps an explicit helper/debt marker instead of pretending to be clean source code.

The current north star is simple: recover better pseudo-C for real CUDA kernels while keeping enough structured facts around that bugs in the decompiler are easy to see. This is not a source-to-source compiler, not a recompilable CUDA C generator, and not a formal verifier.

Design stance

Final SASS is the ground truth; PTX/source-like names are only recovered when backed by ABI, CFG, SSA, or memory-root evidence.
Pretty output must not erase uncertainty. Unresolved branches, unmodeled helper calls, unknown opcodes, live-ins, and ambiguous memory roots stay visible.
Analysis facts are first-class. The optional JSON report exists so tools and tests do not have to scrape pseudo-C.
Tests should catch plausible lies: raw helper leaks, undeclared temps, fake pointer indexes, hidden call debt, lost roots, and unbound gotos.
Synthetic happy-path-only tests are discouraged; prefer real bundled SASS kernels and corpus-wide negative invariants.
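
As an illustration of what a corpus-wide negative invariant can look like, here is a minimal sketch of an "unbound goto" check over rendered pseudo-C. The function names and the assumed text shape (labels on their own line as NAME:, jumps written as goto NAME;) are hypothetical and are not taken from cudad's test suite.

```rust
use std::collections::BTreeSet;

/// Return every `goto` target that has no matching `NAME:` label line.
/// Assumption: labels sit on their own line and jumps look like `goto NAME;`.
fn unbound_gotos(pseudo_c: &str) -> Vec<String> {
    let mut labels = BTreeSet::new();
    let mut targets = BTreeSet::new();

    for line in pseudo_c.lines() {
        let line = line.trim();

        // A label line looks like `BB3:`.
        if let Some(name) = line.strip_suffix(':') {
            if is_ident(name) {
                labels.insert(name.to_string());
            }
        }

        // A line may carry one or more jumps, e.g. `if (p0) goto BB3;`.
        let mut rest = line;
        while let Some(pos) = rest.find("goto ") {
            let after = &rest[pos + 5..];
            let end = after.find(';').unwrap_or(after.len());
            let name = after[..end].trim();
            if is_ident(name) {
                targets.insert(name.to_string());
            }
            rest = &after[end..];
        }
    }

    // Targets without a label are the "plausible lie" we want to catch.
    targets.difference(&labels).cloned().collect()
}

/// True for non-empty ASCII identifiers such as `BB12` or `exit_block`.
fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

A corpus test would run a check like this over the output for every bundled kernel and assert that the returned list is empty.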

What works today

Tolerant SASS decoding for cuobjdump / nvdisasm-style text, including multi-function dumps and SM metadata.
CFG construction, SSA lifting, and deterministic IR optimization before high-level lowering.
CUDA ABI/profile recovery for kernel params, common builtins, parameter aliases, const-memory observations, and shared/local/global memory spaces.
Memory-space-aware pseudo-C lowering for Param, Const, Global, Shared, Local, and Generic accesses.
Root-preserving address rendering: proven global pointer roots remain rooted even when the element index is not clean enough for arg[i] syntax.
Idiom recovery for common SASS patterns: PRMT byte shuffles/sign extension, LOP3 crypto boolean idioms, SHF rotates/shifts, POPC, FLO, VOTE/VOTEU, UFFMA, R2UR, half2 zero/init cases, carry-chain IADD3.X / UIADD3.X, and selected shared atomic idioms.
Explicit unresolved-control helpers for indirect BRX and convergence break/continue shapes that are not yet proven source-level switch / break / continue.
Reciprocal/division helper cleanup that removes FCHK / CALL.REL.NOINC scaffolding only when the AST proves the fast path and slow path are locally equivalent; otherwise the helper remains visible.
Machine-readable inventories for ABI facts, memory accesses, live-ins, CFG metrics, structurizer fallback pressure, call proof debt, control-flow hazards, opcode coverage, and AST proof records.
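
To make the LOP3 idiom item concrete, the sketch below evaluates a three-input truth-table operation using the convention PTX documents for lop3 (the bits of a, b, c index into an 8-bit table), and names a few tables common in hashing code. It assumes SASS LOP3.LUT uses the same table convention. The function names and the small table list are illustrative only and do not reflect cudad's actual matcher.

```rust
/// Evaluate a three-input bitwise LUT operation. For every bit position,
/// the bits of a, b, c form the index (a << 2 | b << 1 | c) into `lut`.
fn lop3(a: u32, b: u32, c: u32, lut: u8) -> u32 {
    let mut out = 0u32;
    for bit in 0..32 {
        let idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1);
        out |= (((lut as u32) >> idx) & 1) << bit;
    }
    out
}

/// Map a few well-known truth tables to readable expressions.
/// Illustrative subset; unknown tables return None so the caller can keep
/// the raw operation visible instead of guessing.
fn describe_lut(lut: u8) -> Option<&'static str> {
    match lut {
        0x96 => Some("a ^ b ^ c"),
        0xE8 => Some("(a & b) | (a & c) | (b & c)"), // majority
        0xCA => Some("(a & b) | (~a & c)"),          // bitwise select
        _ => None,
    }
}
```

Returning None for unrecognized tables matches the stance above: an unrecognized table stays an explicit LUT operation rather than being rendered as a guessed expression.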

Current limits

Pseudo-C is often temp-heavy and may still contain goto BB... fallback regions.
The output is not guaranteed source-equivalent or recompilable CUDA C.
CALL.REL.NOINC and FCHK are intentionally visible when helper-call semantics or slow-path equivalence are not proven.
Indirect branches, convergence tokens, and some helper/libdevice paths are still conservative debt markers.
Hard pointer/index cases can render as rooted byte-address expressions instead of clean typed array indexes.
Real reduction-heavy kernels still expose remaining call/control/address proof debt; that is better than hiding it behind fake C.

Quick start

Run on a bundled fixture:

cargo run -- -i test_cu/test_div.sass

Emit machine-readable analysis facts:

cargo run -- -i test_cu/test_div.sass --analysis-json
cargo run -- -i test_cu/test_div.sass --analysis-json --hazards-only

Inspect CFG or optimized SSA:

cargo run -- -i test_cu/if_loop.sass --cfg-dot > /tmp/if_loop.cfg.dot
cargo run -- -i test_cu/if_loop.sass --ssa-dot -o /tmp/if_loop.ssa.dot

Include ABI commentary in rendered pseudo-C:

cargo run -- -i test_cu/test_div.sass --explain-abi

Run validation:

cargo fmt --check
cargo test --quiet

Regenerate curated full-pass snapshots after an intentional backend change:

cargo run --example regen_goldens

Pipeline

SASS text
  -> parser.rs                         DecodedInstruction stream
  -> cfg.rs                            basic blocks and explicit edges
  -> ir.rs                             SSA-friendly IR and SSA renaming
  -> ir_dce/constprop/algebra/cse/...  optimized IR
  -> function_analysis.rs              ABI, memory, roots, types, live machine facts
  -> *_inventory.rs                    JSON-facing fact inventories and hazard/debt ledgers
  -> structurizer/                     structured regions plus goto fallbacks
  -> ast_lowering/                     CUDA-like expressions, memory accesses, and statements
  -> symbol_plan.rs                    deterministic params, locals, live-ins, shared/local objects
  -> ast_passes/                       bounded cleanup and proof-backed helper elision
  -> ast render                        pseudo-C

The important rule is that later pretty-code stages consume structured facts; they do not rediscover semantics by parsing rendered text.
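
For readers unfamiliar with the CFG stage, the following is a minimal sketch of the classic leader rule for splitting a linear instruction list into basic blocks. The Inst struct and split_blocks function are hypothetical simplifications; cudad's DecodedInstruction and cfg.rs are not shown in this README and certainly handle more cases (predication, indirect branches, calls).

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified instruction record for illustration only.
struct Inst {
    addr: u32,
    /// Target address if this instruction is a direct branch.
    target: Option<u32>,
}

/// Split instructions into basic blocks using the leader rule: the first
/// instruction, every branch target, and every instruction that follows a
/// branch start a new block. Returns half-open index ranges (start, end).
fn split_blocks(insts: &[Inst]) -> Vec<(usize, usize)> {
    if insts.is_empty() {
        return Vec::new();
    }

    let mut leaders = BTreeSet::new();
    leaders.insert(0usize);

    for (i, inst) in insts.iter().enumerate() {
        if let Some(t) = inst.target {
            // The branch target starts a block, if it is inside this function.
            if let Some(j) = insts.iter().position(|x| x.addr == t) {
                leaders.insert(j);
            }
            // The instruction after a branch starts a block.
            if i + 1 < insts.len() {
                leaders.insert(i + 1);
            }
        }
    }

    // Consecutive leaders delimit the blocks.
    let starts: Vec<usize> = leaders.into_iter().collect();
    starts
        .iter()
        .enumerate()
        .map(|(k, &s)| (s, starts.get(k + 1).copied().unwrap_or(insts.len())))
        .collect()
}
```

The output is plain structured data (index ranges), which is the kind of fact later stages can consume without re-parsing text.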

Static analysis JSON

--analysis-json is the framework-facing output. It reports what the backend knows and what it does not know without requiring downstream users to scrape pseudo-C. Each function report can include:

schema/view metadata so raw lifted facts, optimized facts, and render-view facts are not confused
CFG/SSA shape, reachability, unresolved terminators, missing branches/fallthroughs, and complexity metrics
ABI profile provenance, parameter aliases, const-memory observations, and profile-sensitive warnings
memory accesses by kind/space/width/root, root confidence, external const-memory roots, unknown ABI const-memory roots, and unresolved-address reasons
live-in machine registers with use-site role, kind, confidence, and reason
call-site facts for CALL / CAL / JCAL, including decoded target, block IDs, lexical return successor, explicit return-register setup, callee target-path summary, integration blockers, per-target grouping, and proof-debt counts
call-rendering facts that cross-check final pseudo-C against the call inventory and AST proof records, so hidden unproven call debt is reported as a bug/debt condition
opcode-family coverage split into modeled pure operations, must-preserve side-effect operations, and unknown families
AST proof records such as division-helper elision certificates keyed by decoded call address

Library users can call analyze_sass, analyze_sass_with_options, analyze_decoded_function, or analyze_decoded_functions and serialize SassAnalysisReport directly. Smaller entrypoints such as build_call_inventory, scan_control_flow_hazards, summarize_opcode_coverage, build_abi_inventory, build_memory_inventory, build_live_in_inventory, summarize_cfg_metrics, and build_structurizer_inventory_with_diagnostics expose individual fact surfaces without running the whole CLI.

What the output looks like

A small kernel currently decompiles to output like this:

void _Z15test_2_para_intiiPi(int32_t arg0, int32_t arg1, uint32_t* arg2_ptr) {
  uint32_t r5_0;
  uint32_t ur4_0;
  uint32_t ur4_1;
  uint32_t ur5_0;
  uint32_t r0_0;
  bool p1_0;
  ...

  r5_0 = abs(arg1);
  ur4_0 = arg0;
  ur4_1 = ur4_0 ^ ur5_0;
  r0_0 = __int2float_ru(r5_0);
  ...
  arg2_ptr[0] = r5_3;
  return;
}

That is the intended current style: useful structure first, beauty later. If the backend cannot justify a nicer construct, it leaves an explicit helper such as __cudad_unresolved_indirect_branch(...) or a visible CALL.REL.NOINC(...) instead of making up source.
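
For a quick smoke check on rendered output, a sketch like the one below counts the two explicit markers named above. The --analysis-json report remains the intended machine-readable surface; this helper and its marker list are hypothetical and only cover the two strings mentioned in this section.

```rust
/// Marker strings taken from the paragraph above; the real set of helpers
/// emitted by cudad may be larger.
const DEBT_MARKERS: [&str; 2] = ["__cudad_unresolved_indirect_branch", "CALL.REL.NOINC"];

/// Count how often each known debt marker appears in rendered pseudo-C.
fn count_debt_markers(pseudo_c: &str) -> Vec<(&'static str, usize)> {
    DEBT_MARKERS
        .iter()
        .map(|m| (*m, pseudo_c.matches(*m).count()))
        .collect()
}

/// True when none of the known markers appear in the text.
fn is_marker_free(pseudo_c: &str) -> bool {
    count_debt_markers(pseudo_c).iter().all(|(_, n)| *n == 0)
}
```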

Real input workflow

cudad expects SASS text, not .cu source.

nvcc -arch=sm_89 -cubin kernel.cu -o kernel.cubin
cuobjdump --dump-sass kernel.cubin > kernel.sass
cargo run -- -i kernel.sass -o kernel.pseudo.c

Multi-function dumps are split automatically. --analysis-json reports every selected function. Use --function <name> to narrow the report to one function, or to pick the single function rendered by --cfg-dot / --ssa-dot.
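
The splitting idea can be sketched as follows, assuming each function section begins with a header line of the form Function : <name> (the shape named in the CLI help). The split_functions and select_function helpers are hypothetical and do not reproduce cudad's tolerant parser.

```rust
/// Split a dump into (function name, section text) pairs by looking for
/// header lines of the form `Function : <name>`. Text before the first
/// header (for example architecture metadata) is ignored in this sketch.
fn split_functions(dump: &str) -> Vec<(String, String)> {
    let mut out: Vec<(String, String)> = Vec::new();
    for line in dump.lines() {
        if let Some(rest) = line.trim_start().strip_prefix("Function :") {
            // Start a new section named after the header.
            out.push((rest.trim().to_string(), String::new()));
        } else if let Some((_, body)) = out.last_mut() {
            // Append the line to the section currently being collected.
            body.push_str(line);
            body.push('\n');
        }
    }
    out
}

/// Pick one section by exact name, the way a name filter would.
fn select_function<'a>(sections: &'a [(String, String)], name: &str) -> Option<&'a str> {
    sections
        .iter()
        .find(|(n, _)| n == name)
        .map(|(_, body)| body.as_str())
}
```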

CLI

-i, --input <INPUT>              Input SASS file (default: bundled sample)
-o, --output <OUTPUT>            Output file for pseudocode, DOT, or analysis JSON
    --cfg-dot                    Dump CFG as DOT
    --ssa-dot                    Dump optimized SSA as DOT
    --analysis-json              Emit static-analysis facts as JSON
    --hazards-only               With --analysis-json, emit only functions that contain hazards
    --function <FUNCTION>        Select one `Function :` section by name
    --abi-profile <ABI_PROFILE>  Force ABI profile (`auto|legacy140|modern160`)
    --explain-abi                Include ABI analysis comments in structured output

Repo layout

src/parser.rs - tolerant SASS decode
src/cfg.rs - basic-block and CFG construction
src/ir.rs - SSA IR construction and DOT rendering
src/ir_* - IR optimization passes
src/function_analysis.rs - canonical post-SSA fact base
src/analysis_report.rs - JSON/static-analysis report surface
src/abi/ and src/abi_inventory.rs - ABI profile detection, aliases, const-memory facts
src/memory_model.rs and src/memory_inventory.rs - memory opcode semantics and memory fact reporting
src/live_in_inventory.rs - live-in / undefined-use inventory
src/call_inventory.rs - call-site proof-debt classification
src/control_flow_hazards.rs - conservative control-flow uncertainty scanner
src/cfg_metrics.rs - CFG complexity and irreducibility metrics
src/structurizer/ and src/structurizer_inventory.rs - control-flow collapse and fallback diagnostics
src/ast_lowering/ - structured lowering from SSA + analysis facts to AST
src/ast_passes/ and src/ast_proofs.rs - cleanup passes and proof records
src/backend_pipeline.rs - canonical driver used by the CLI and examples
tests/ - unit, integration, corpus, and API-surface tests
pipeline_audit.html - visual map of the pipeline and the main fixed/open design issues

For implementation details, see docs/dev/current_architecture.md and docs/dev/development.md.
