| 1 | +# Nix Builder design |
| 2 | + |
| 3 | +## Introduction |
| 4 | + |
| 5 | +kernel-builder uses a Nix-based builder that orchestrates the build. The Nix |
| 6 | +builder provides: |
| 7 | + |
| 8 | +- Reproducible evaluation. The same Nix builder version will always produce |
| 9 | + the same derivations (build recipes). |
| 10 | +- Largely reproducible builds by using a build sandbox that only has the |
| 11 | + dependencies specified in a derivation. |
| 12 | +- Seamless creation of different build environments (e.g. different Torch |
| 13 | + and CUDA combinations). |
| 14 | + |
| 15 | +## Kernel build steps |
| 16 | + |
| 17 | +A kernel derivation builds a kernel in the following steps: |
| 18 | + |
| 19 | +1. Generate CMake files for the kernel using |
| 20 | + `kernel-builder create-pyproject`. |
| 21 | +2. Generate Ninja build files using CMake. |
| 22 | +3. Build the kernel using Ninja. |
| 23 | +4. Perform various checks on the compiled kernel, such as: |
| 24 | + - Verify that the kernel only uses ABI3/`manylinux_2_28` symbols. |
| 25 | + - Verify that the kernel can be loaded by the `kernels` Python package. |
| 26 | +5. Strip runpaths (ELF-embedded library directories) from kernel binaries |
| 27 | + to make the kernel distribution-independent. |
| 28 | + |
| 29 | +## manylinux_2_28 compatibility |
| 30 | + |
| 31 | +To achieve `manylinux_2_28` compatibility, kernels are built using a |
| 32 | +toolchain similar to the `manylinux_2_28` Docker images. This toolchain |
| 33 | +is based on the gcc toolsets from AlmaLinux 8. `manylinux_2_28` [uses |
| 34 | +AlmaLinux 8 as its base](https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based), |
| 35 | +so we have to compile against the same glibc/libstdc++ versions to |
| 36 | +ensure compatibility. |
| 37 | + |
| 38 | +We repackage the AlmaLinux 8 toolsets and libstdc++ as Nix derivations (see |
| 39 | +the `nix-builder/pkgs/manylinux_2_28` source directory). Then we merge the |
| 40 | +toolset packages into an unwrapped gcc that resembles the unwrapped gcc in |
| 41 | +nixpkgs. Finally, we wrap binutils and gcc to combine them into a stdenv. |
| 42 | + |
| 43 | +The stdenv does not reuse glibc from AlmaLinux, since its dynamic loader has |
| 44 | +hardcoded FHS paths (`/lib64` etc.) that are not valid in Nix. Using this |
| 45 | +dynamic loader results in linking errors, because it falls back to these |
| 46 | +hardcoded paths as a last resort when locating glibc libraries. Instead, we |
| 47 | +build our own glibc 2.28 package |
| 48 | +(see `nix-builder/pkgs/manylinux_2_28/stdenv.nix`) and use that. |
| 49 | + |
| 50 | +## The package set pattern |
| 51 | + |
| 52 | +We repackage various existing package sets as Nix derivations. For instance, |
| 53 | +this is done for ROCm, XPU, and manylinux_2_28 packages. We do this because |
| 54 | +we want these libraries to be as close as possible to what the user would |
| 55 | +install. This avoids compatibility issues between the kernels and the official |
| 56 | +vendor packages. For instance, suppose that we built a ROCm library as a |
| 57 | +shared library while ROCm provides the same library as a static library. Then |
| 58 | +compiled kernels could use symbols that cannot be resolved when the official |
| 59 | +ROCm packages are installed. Using the official packages also allows us to |
| 60 | +test against the official upstream packages. |
| 61 | + |
| 62 | +These package sets all follow the same pattern: |
| 63 | + |
| 64 | +```nix |
| 65 | +{ |
| 66 | +  lib, |
| 67 | +  callPackage, |
| 68 | +  newScope, |
| 69 | +  pkgs, |
| 70 | +}: |
| 71 | + |
| 72 | +{ |
| 73 | +  packageMetadata, |
| 74 | +}: |
| 75 | + |
| 76 | +let |
| 77 | +  inherit (lib.fixedPoints) extends composeManyExtensions; |
| 78 | + |
| 79 | +  fixedPoint = final: { |
| 80 | +    inherit lib; |
| 81 | +  }; |
| 82 | +  composed = lib.composeManyExtensions [ |
| 83 | +    # Base package set. |
| 84 | +    (import ./components.nix { inherit packageMetadata; }) |
| 85 | + |
| 86 | +    # Package-specific overrides. |
| 87 | +    (import ./overrides.nix) |
| 88 | + |
| 89 | +    # Additional overlays that extend the package set. |
| 90 | +    (import ./some-overlay.nix) |
| 91 | +  ]; |
| 92 | +in |
| 93 | +lib.makeScope newScope (lib.extends composed fixedPoint) |
| 94 | +``` |
| 95 | + |
| 96 | +We use a fixed point to build up the package set as a list of |
| 97 | +[overlays](https://nixos.org/manual/nixpkgs/stable/#sec-overlays-definition). |
| 98 | +This has various benefits. For instance, it allows us to refine the |
| 99 | +package set incrementally and we can refer to the final versions of |
| 100 | +packages in intermediate overlays. |
| 101 | + |
| 102 | +The package sets all use a similar list of overlays: |
| 103 | + |
| 104 | +- An initial overlay (`components.nix`) that applies a generic builder |
| 105 | + to the package set metadata. The metadata typically comes from a Yum/DNF |
| 106 | + repository that contains RPM packages. The generic builder will extract the |
| 107 | + RPMs and move binaries, libraries, and headers to the right location. This |
| 108 | + results in a set of Nix derivations that may or may not build. |
| 109 | +- The next overlay (`overrides.nix`) fixes up derivations generated by the |
| 110 | + generic builder in the previous overlay that do not build. Fixing the |
| 111 | + derivations typically consists of adding missing dependencies and changing |
| 112 | + embedded FHS paths to Nix store paths. |
| 113 | +- Additional overlays with derivations that combine outputs from previous |
| 114 | + overlays. A typical example is a derivation that constructs a full compiler |
| 115 | + toolchain (e.g. `nix-builder/pkgs/manylinux_2_28/gcc-unwrapped.nix`). |
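| 116 | + |
| 117 | +As a minimal sketch of how such overlays interact (not the actual builder |
| 118 | +code), the toy example below composes two overlays over a base set. The |
| 119 | +attribute names are hypothetical, plain `lib.fix` stands in for |
| 120 | +`lib.makeScope newScope`, and `<nixpkgs>` is assumed to be on `NIX_PATH`: |
| 121 | + |
| 122 | +```nix |
| 123 | +let |
| 124 | +  # Assumption: <nixpkgs> resolves through NIX_PATH. |
| 125 | +  lib = import <nixpkgs/lib>; |
| 126 | + |
| 127 | +  # Toy base set; `greeting` is defined in terms of the final `name`. |
| 128 | +  fixedPoint = final: { |
| 129 | +    name = "base"; |
| 130 | +    greeting = "hello ${final.name}"; |
| 131 | +  }; |
| 132 | + |
| 133 | +  composed = lib.composeManyExtensions [ |
| 134 | +    # Plays the role of overrides.nix: replaces an attribute. |
| 135 | +    (final: prev: { name = "overridden"; }) |
| 136 | + |
| 137 | +    # Plays the role of a later overlay: combines earlier attributes. |
| 138 | +    (final: prev: { summary = "${prev.name}: ${final.greeting}"; }) |
| 139 | +  ]; |
| 140 | +in |
| 141 | +lib.fix (lib.extends composed fixedPoint) |
| 142 | +``` |
| 143 | + |
| 144 | +Evaluating this file, for example with `nix-instantiate --eval --strict`, |
| 145 | +gives `greeting = "hello overridden"`: the base set resolves `name` against |
| 146 | +the final set, so it picks up the override from a later overlay. |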