Rust-GPU
diff --git a/‎crates/cuda_builder/src/lib.rs‎
Lines changed: 20 additions & 6 deletions b/‎crates/cuda_builder/src/lib.rs‎
Lines changed: 20 additions & 6 deletions
@@ -94,14 +94,26 @@ pub struct CudaBuilder {
     /// Maxwell (5.x) will be deprecated in CUDA 12 and we anticipate for that. Moreover,
     /// `6.x` contains support for things like f64 atomic add and half precision float ops.
     ///
-    /// ## Target Features for Conditional Compilation
+    /// ## Architecture Suffixes (CUDA 12.9+)
+    ///
+    /// Starting with CUDA 12.9, architectures can have suffixes:
+    ///
+    /// - **No suffix** (e.g., `Compute70`): Forward-compatible across all future GPUs.
+    ///   Best for general compatibility.
+    /// - **'f' suffix** (e.g., `Compute100f`): Family-specific features, forward-compatible
+    ///   within same major version (10.0, 10.3, etc.) but NOT across major versions.
+    /// - **'a' suffix** (e.g., `Compute100a`): Architecture-specific features (mainly Tensor Cores).
+    ///   Code ONLY runs on that exact compute capability, no compatibility with any other GPU.
     ///
-    /// The chosen architecture enables a target feature that can be used for
-    /// conditional compilation with `#[cfg(target_feature = "compute_XX")]`.
-    /// This feature means "at least this capability", matching NVIDIA's semantics.
+    /// Most applications should use base architectures (no suffix). Only use 'f' or 'a'
+    /// if you need specific features and understand the compatibility trade-offs.
     ///
-    /// For other patterns (exact ranges, maximum capabilities), use boolean `cfg` logic.
-    /// See the compute capabilities guide for examples.
+    /// ## Target Features for Conditional Compilation
+    ///
+    /// The chosen architecture enables target features for conditional compilation:
+    /// - Base arch: `#[cfg(target_feature = "compute_70")]` - enabled on 7.0+
+    /// - Family variant: `#[cfg(target_feature = "compute_100f")]` - enabled only on 10.x family
+    /// - Arch variant: `#[cfg(target_feature = "compute_100a")]` - enabled only on exact 10.0
     ///
     /// For example, with `.arch(NvvmArch::Compute61)`:
     /// ```ignore
@@ -110,6 +122,8 @@ pub struct CudaBuilder {
     ///     // Code that requires compute capability 6.1+
     /// }
     /// ```
+    ///
+    /// See: <https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/>
     pub arch: NvvmArch,
     /// Flush denormal values to zero when performing single-precision floating point operations.
     /// `false` by default.