itewqq/cudad

cudad

cudad is a for-fun CUDA SASS decompiler and binary static-analysis playground.

It reads .sass text dumps from NVIDIA tooling (cuobjdump / nvdisasm) and tries to recover CUDA-like pseudo-C. The project is intentionally honest rather than heroic: if a final SASS construct is not understood well enough, the output keeps an explicit helper/debt marker instead of pretending to be clean source code.

The current north star is simple: recover better pseudo-C for real CUDA kernels while keeping enough structured facts around that bugs in the decompiler are easy to see. This is not a source-to-source compiler, not a recompilable CUDA C generator, and not a formal verifier.

Design stance

  • Final SASS is the ground truth; PTX/source-like names are only recovered when backed by ABI, CFG, SSA, or memory-root evidence.
  • Pretty output must not erase uncertainty. Unresolved branches, unmodeled helper calls, unknown opcodes, live-ins, and ambiguous memory roots stay visible.
  • Analysis facts are first-class. The optional JSON report exists so tools and tests do not have to scrape pseudo-C.
  • Tests should catch plausible lies: raw helper leaks, undeclared temps, fake pointer indexes, hidden call debt, lost roots, and unbound gotos.
  • Synthetic happy-path-only tests are discouraged; prefer real bundled SASS kernels and corpus-wide negative invariants.
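One of the "plausible lies" listed above, the unbound goto, can be checked with a small negative invariant over rendered output. The following is a minimal sketch, not cudad's actual test code: the function name `unbound_gotos` is hypothetical, and it assumes labels are rendered as `BBn:` on their own line, which this README does not confirm.

```rust
use std::collections::BTreeSet;

/// Return every `goto` target that has no matching label in the text.
/// Assumption: labels look like `NAME:` on a line of their own.
pub fn unbound_gotos(pseudo_c: &str) -> Vec<String> {
    let is_ident = |c: char| c.is_ascii_alphanumeric() || c == '_';
    let mut labels = BTreeSet::new();
    let mut targets = BTreeSet::new();

    for line in pseudo_c.lines() {
        let line = line.trim();

        // A line such as `BB3:` defines a label.
        if let Some(label) = line.strip_suffix(':') {
            if !label.is_empty() && label.chars().all(is_ident) {
                labels.insert(label.to_string());
            }
        }

        // A `goto NAME;` anywhere on the line uses a label.
        if let Some(pos) = line.find("goto ") {
            let rest = &line[pos + "goto ".len()..];
            let target: String = rest.chars().take_while(|c| is_ident(*c)).collect();
            if !target.is_empty() {
                targets.insert(target);
            }
        }
    }

    // Targets without a definition are the invariant violations.
    targets.difference(&labels).cloned().collect()
}
```

A corpus-wide test in this style would run the check over every bundled kernel and assert that the returned list is empty.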

What works today

  • Tolerant SASS decoding for cuobjdump / nvdisasm-style text, including multi-function dumps and SM metadata.
  • CFG construction, SSA lifting, and deterministic IR optimization before high-level lowering.
  • CUDA ABI/profile recovery for kernel params, common builtins, parameter aliases, const-memory observations, and shared/local/global memory spaces.
  • Memory-space-aware pseudo-C lowering for Param, Const, Global, Shared, Local, and Generic accesses.
  • Root-preserving address rendering: proven global pointer roots remain rooted even when the element index is not clean enough for arg[i] syntax.
  • Idiom recovery for common SASS patterns: PRMT byte shuffles/sign extension, LOP3 crypto boolean idioms, SHF rotates/shifts, POPC, FLO, VOTE/VOTEU, UFFMA, R2UR, half2 zero/init cases, carry-chain IADD3.X / UIADD3.X, and selected shared atomic idioms.
  • Explicit unresolved-control helpers for indirect BRX and convergence break/continue shapes that are not yet proven source-level switch / break / continue.
  • Reciprocal/division helper cleanup that removes FCHK / CALL.REL.NOINC scaffolding only when the AST proves the fast path and slow path are locally equivalent; otherwise the helper remains visible.
  • Machine-readable inventories for ABI facts, memory accesses, live-ins, CFG metrics, structurizer fallback pressure, call proof debt, control-flow hazards, opcode coverage, and AST proof records.
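To make the LOP3 idiom recovery mentioned above concrete, here is a minimal sketch of how a three-input lookup-table operation can be evaluated and mapped back to a readable boolean expression. It follows the truth-table convention documented for PTX `lop3.b32` (inputs encoded as 0xF0, 0xCC, 0xAA) and assumes SASS LOP3.LUT uses the same immediate. The function names `lop3` and `name_lut` are hypothetical and are not part of cudad's API.

```rust
/// Evaluate a three-input LUT operation bit by bit.
/// For each bit position, the three input bits form an index into `lut`.
pub fn lop3(a: u32, b: u32, c: u32, lut: u8) -> u32 {
    let mut out = 0u32;
    for bit in 0..32 {
        let idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1);
        out |= ((u32::from(lut) >> idx) & 1) << bit;
    }
    out
}

/// Map a LUT immediate to a readable expression for a few known idioms.
/// The candidate table is illustrative only; it covers XOR3 plus the
/// SHA-2 style "choose" and "majority" functions.
pub fn name_lut(lut: u8) -> Option<&'static str> {
    // Truth-table encodings of the three inputs.
    const A: u8 = 0xF0;
    const B: u8 = 0xCC;
    const C: u8 = 0xAA;

    let known = [
        (A ^ B ^ C, "a ^ b ^ c"),
        ((A & B) ^ (!A & C), "(a & b) ^ (~a & c)"),
        ((A & B) | (A & C) | (B & C), "(a & b) | (a & c) | (b & c)"),
    ];
    for (value, text) in known {
        if value == lut {
            return Some(text);
        }
    }
    // Unrecognized tables stay unnamed rather than being guessed.
    None
}
```

Returning `None` for an unknown table mirrors the design stance of this project: an unrecognized LUT should stay visible as a raw operation instead of being rendered as a guessed expression.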

Current limits

  • Pseudo-C is often temp-heavy and may still contain goto BB... fallback regions.
  • The output is not guaranteed source-equivalent or recompilable CUDA C.
  • CALL.REL.NOINC and FCHK are intentionally visible when helper-call semantics or slow-path equivalence are not proven.
  • Indirect branches, convergence tokens, and some helper/libdevice paths are still conservative debt markers.
  • Hard pointer/index cases can render as rooted byte-address expressions instead of clean typed array indexes.
  • Real reduction-heavy kernels still expose remaining call/control/address proof debt; that is better than hiding it behind fake C.

Quick start

Run on a bundled fixture:

cargo run -- -i test_cu/test_div.sass

Emit machine-readable analysis facts:

cargo run -- -i test_cu/test_div.sass --analysis-json
cargo run -- -i test_cu/test_div.sass --analysis-json --hazards-only

Inspect CFG or optimized SSA:

cargo run -- -i test_cu/if_loop.sass --cfg-dot > /tmp/if_loop.cfg.dot
cargo run -- -i test_cu/if_loop.sass --ssa-dot -o /tmp/if_loop.ssa.dot

Include ABI commentary in rendered pseudo-C:

cargo run -- -i test_cu/test_div.sass --explain-abi

Run validation:

cargo fmt --check
cargo test --quiet

Regenerate curated full-pass snapshots after an intentional backend change:

cargo run --example regen_goldens

Pipeline

SASS text
  -> parser.rs                         DecodedInstruction stream
  -> cfg.rs                            basic blocks and explicit edges
  -> ir.rs                             SSA-friendly IR and SSA renaming
  -> ir_dce/constprop/algebra/cse/...  optimized IR
  -> function_analysis.rs              ABI, memory, roots, types, live machine facts
  -> *_inventory.rs                    JSON-facing fact inventories and hazard/debt ledgers
  -> structurizer/                     structured regions plus goto fallbacks
  -> ast_lowering/                     CUDA-like expressions, memory accesses, and statements
  -> symbol_plan.rs                    deterministic params, locals, live-ins, shared/local objects
  -> ast_passes/                       bounded cleanup and proof-backed helper elision
  -> ast render                        pseudo-C

The important rule is that later pretty-code stages consume structured facts; they do not rediscover semantics by parsing rendered text.
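As an illustration of the second pipeline stage, the sketch below splits a decoded instruction stream into basic blocks using the classic leader rule: the first instruction, every branch target, and every instruction that follows a branch or exit starts a new block. It is a textbook simplification, not the code in cfg.rs; the `Inst` type and `split_blocks` function are hypothetical stand-ins for cudad's real structures.

```rust
use std::collections::BTreeSet;

/// Hypothetical, heavily simplified decoded instruction.
#[derive(Debug, Clone)]
pub struct Inst {
    pub addr: u32,
    pub opcode: &'static str,
    pub branch_target: Option<u32>,
}

/// An instruction ends its block if it branches or exits.
fn ends_block(inst: &Inst) -> bool {
    inst.branch_target.is_some() || inst.opcode == "EXIT"
}

/// Return half-open `(start, end)` index ranges, one per basic block.
pub fn split_blocks(insts: &[Inst]) -> Vec<(usize, usize)> {
    let mut leaders = BTreeSet::new();
    if !insts.is_empty() {
        leaders.insert(0usize);
    }

    for (i, inst) in insts.iter().enumerate() {
        // A branch target starts a block, if it lies inside this function.
        if let Some(target) = inst.branch_target {
            if let Some(t) = insts.iter().position(|x| x.addr == target) {
                leaders.insert(t);
            }
        }
        // The instruction after a branch or exit starts a block.
        if ends_block(inst) && i + 1 < insts.len() {
            leaders.insert(i + 1);
        }
    }

    // Each block runs from its leader up to the next leader.
    let starts: Vec<usize> = leaders.into_iter().collect();
    starts
        .iter()
        .enumerate()
        .map(|(k, &start)| {
            let end = starts.get(k + 1).copied().unwrap_or(insts.len());
            (start, end)
        })
        .collect()
}
```

Real SASS needs more care than this, for example around predicated branches, indirect BRX, and convergence instructions, which is why the later stages keep explicit hazard and debt records.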

Static analysis JSON

--analysis-json is the framework-facing output. It reports what the backend knows and what it does not know without requiring downstream users to scrape pseudo-C. Each function report can include:

  • schema/view metadata so raw lifted facts, optimized facts, and render-view facts are not confused
  • CFG/SSA shape, reachability, unresolved terminators, missing branches/fallthroughs, and complexity metrics
  • ABI profile provenance, parameter aliases, const-memory observations, and profile-sensitive warnings
  • memory accesses by kind/space/width/root, root confidence, external const-memory roots, unknown ABI const-memory roots, and unresolved-address reasons
  • live-in machine registers with use-site role, kind, confidence, and reason
  • call-site facts for CALL / CAL / JCAL, including decoded target, block IDs, lexical return successor, explicit return-register setup, callee target-path summary, integration blockers, per-target grouping, and proof-debt counts
  • call-rendering facts that cross-check final pseudo-C against the call inventory and AST proof records, so hidden unproven call debt is reported as a bug/debt condition
  • opcode-family coverage split into modeled pure operations, must-preserve side-effect operations, and unknown families
  • AST proof records such as division-helper elision certificates keyed by decoded call address

Library users can call analyze_sass, analyze_sass_with_options, analyze_decoded_function, or analyze_decoded_functions and serialize SassAnalysisReport directly. Smaller entrypoints such as build_call_inventory, scan_control_flow_hazards, summarize_opcode_coverage, build_abi_inventory, build_memory_inventory, build_live_in_inventory, summarize_cfg_metrics, and build_structurizer_inventory_with_diagnostics expose individual fact surfaces without running the whole CLI.
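The three-way opcode coverage split described above can be pictured with a small classifier. This is a sketch under stated assumptions: the `Coverage` enum, the `classify` and `summarize` functions, and the membership of each opcode table are all hypothetical and chosen for illustration, and they do not reflect the real signature or output of `summarize_opcode_coverage`.

```rust
/// Hypothetical coverage classes mirroring the three-way split.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Coverage {
    ModeledPure,
    MustPreserveSideEffect,
    Unknown,
}

/// Strip modifiers: `IADD3.X` and `IADD3` belong to the same family.
pub fn family(mnemonic: &str) -> &str {
    mnemonic.split('.').next().unwrap_or(mnemonic)
}

/// Classify one mnemonic. The tables below are illustrative assumptions.
pub fn classify(mnemonic: &str) -> Coverage {
    match family(mnemonic) {
        "IADD3" | "LOP3" | "SHF" | "PRMT" | "POPC" | "FLO" => Coverage::ModeledPure,
        "STG" | "STS" | "ATOMS" | "BAR" | "CALL" => Coverage::MustPreserveSideEffect,
        // Anything not listed stays visible as unknown.
        _ => Coverage::Unknown,
    }
}

/// Count mnemonics per class as `(pure, side_effect, unknown)`.
pub fn summarize(mnemonics: &[&str]) -> (usize, usize, usize) {
    let mut counts = (0, 0, 0);
    for m in mnemonics {
        match classify(m) {
            Coverage::ModeledPure => counts.0 += 1,
            Coverage::MustPreserveSideEffect => counts.1 += 1,
            Coverage::Unknown => counts.2 += 1,
        }
    }
    counts
}
```

Keeping `Unknown` as an explicit class, rather than folding it into one of the others, is what lets a report say what the backend does not know.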

What the output looks like

A small kernel currently decompiles to output like this:

void _Z15test_2_para_intiiPi(int32_t arg0, int32_t arg1, uint32_t* arg2_ptr) {
  uint32_t r5_0;
  uint32_t ur4_0;
  uint32_t ur4_1;
  uint32_t ur5_0;
  uint32_t r0_0;
  bool p1_0;
  ...

  r5_0 = abs(arg1);
  ur4_0 = arg0;
  ur4_1 = ur4_0 ^ ur5_0;
  r0_0 = __int2float_ru(r5_0);
  ...
  arg2_ptr[0] = r5_3;
  return;
}

That is the intended current style: useful structure first, beauty later. If the backend cannot justify a nicer construct, it leaves an explicit helper such as __cudad_unresolved_indirect_branch(...) or a visible CALL.REL.NOINC(...) instead of making up source.

Real input workflow

cudad expects SASS text, not .cu source.

nvcc -arch=sm_89 -cubin kernel.cu -o kernel.cubin
cuobjdump --dump-sass kernel.cubin > kernel.sass
cargo run -- -i kernel.sass -o kernel.pseudo.c

Multi-function dumps are split automatically. --analysis-json reports every selected function; use --function <name> to narrow the report or when you want one function for --cfg-dot / --ssa-dot.
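The splitting of multi-function dumps can be sketched as follows, assuming each function in the dump begins with a `Function : <name>` header line as in cuobjdump output. This is an illustration only, not the logic in parser.rs; `split_functions` and `select` are hypothetical names.

```rust
/// Split a SASS text dump into `(function_name, body_text)` pairs.
/// Lines before the first `Function :` header are dropped.
pub fn split_functions(dump: &str) -> Vec<(String, String)> {
    let mut out: Vec<(String, String)> = Vec::new();
    for line in dump.lines() {
        if let Some(rest) = line.trim_start().strip_prefix("Function :") {
            // A header line starts a new function section.
            out.push((rest.trim().to_string(), String::new()));
        } else if let Some((_, body)) = out.last_mut() {
            // Every other line belongs to the most recent section.
            body.push_str(line);
            body.push('\n');
        }
    }
    out
}

/// Pick one section by exact name, similar in spirit to `--function <name>`.
pub fn select<'a>(funcs: &'a [(String, String)], name: &str) -> Option<&'a str> {
    funcs
        .iter()
        .find(|(n, _)| n == name)
        .map(|(_, body)| body.as_str())
}
```

Whether the real tool matches names exactly, by prefix, or by demangled form is not stated here, so the exact-match rule in `select` is an assumption.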

CLI

-i, --input <INPUT>              Input SASS file (default: bundled sample)
-o, --output <OUTPUT>            Output file for pseudocode, DOT, or analysis JSON
    --cfg-dot                    Dump CFG as DOT
    --ssa-dot                    Dump optimized SSA as DOT
    --analysis-json              Emit static-analysis facts as JSON
    --hazards-only               With --analysis-json, emit only functions that contain hazards
    --function <FUNCTION>        Select one `Function :` section by name
    --abi-profile <ABI_PROFILE>  Force ABI profile (`auto|legacy140|modern160`)
    --explain-abi                Include ABI analysis comments in structured output

Repo layout

  • src/parser.rs - tolerant SASS decode
  • src/cfg.rs - basic-block and CFG construction
  • src/ir.rs - SSA IR construction and DOT rendering
  • src/ir_* - IR optimization passes
  • src/function_analysis.rs - canonical post-SSA fact base
  • src/analysis_report.rs - JSON/static-analysis report surface
  • src/abi/ and src/abi_inventory.rs - ABI profile detection, aliases, const-memory facts
  • src/memory_model.rs and src/memory_inventory.rs - memory opcode semantics and memory fact reporting
  • src/live_in_inventory.rs - live-in / undefined-use inventory
  • src/call_inventory.rs - call-site proof-debt classification
  • src/control_flow_hazards.rs - conservative control-flow uncertainty scanner
  • src/cfg_metrics.rs - CFG complexity and irreducibility metrics
  • src/structurizer/ and src/structurizer_inventory.rs - control-flow collapse and fallback diagnostics
  • src/ast_lowering/ - structured lowering from SSA + analysis facts to AST
  • src/ast_passes/ and src/ast_proofs.rs - cleanup passes and proof records
  • src/backend_pipeline.rs - canonical driver used by the CLI and examples
  • tests/ - unit, integration, corpus, and API-surface tests
  • pipeline_audit.html - visual map of the pipeline and the main fixed/open design issues

For implementation details, see docs/dev/current_architecture.md and docs/dev/development.md.

About

A vibe-coded, experimental CUDA SASS decompiler
