
Commit 8ffffd3

nix-builder: switch to manylinux_2_28, dynamically link libstdc++ (#558)
Before this change, we were ensuring manylinux_2_28 compliance by:

* Building and linking against glibc 2.27 (there was no nixpkgs revision
  with 2.28).
* Linking libstdc++ statically.

Linking libstdc++ statically has turned out to be a pain point. If a kernel
uses libstdc++ functionality that does some global initialization, the
initialization of the dynamically linked libstdc++ (e.g. through Torch) and
the statically linked libstdc++ could conflict. One example is `std::regex`,
which indirectly initializes some global code for locale handling.

As an alternative, I explored just linking libstdc++ dynamically, which
worked up to a point by avoiding certain C++ features. However, gcc 13 moved
the initialization of the standard stream objects into the shared library.
As a result, a library compiled with gcc 13 or later requires a libstdc++
that is newer than the one manylinux_2_28 provides.

The EL8 (RHEL, AlmaLinux, etc.) gcc toolset used in the manylinux_2_28
images solves this by using the EL8 libstdc++ and providing a static library
that gets linked into a binary for newer C++ features. Since this is part of
a fairly large patchset to gcc, it does not seem feasible to reproduce this
easily in the Nix gcc derivations.

So, instead, we repackage the gcc toolsets from AlmaLinux as Nix derivations
in this change. This gives us a toolchain that is as close to the official
manylinux_2_28 toolchain as possible. The packaged toolchains are exposed as
a standard nixpkgs stdenv, so that they can be used with other Nix
derivations (such as the Torch/tvm-ffi extensions).

One exception is made in reproducing the toolchain: we build glibc 2.28
ourselves. glibc carries the dynamic loader. The dynamic loader in AlmaLinux
embeds FHS paths such as `/lib64`, which are not valid in Nix and lead to
linking errors, since the library directory of e.g. `libc.so.6` cannot be
found. By rebuilding glibc 2.28, we get a dynamic loader with the correct
paths into the Nix store.
1 parent 7f87c71 commit 8ffffd3

53 files changed

Lines changed: 7197 additions & 137 deletions


.github/workflows/build_kernel.yaml

Lines changed: 1 addition & 0 deletions
@@ -50,6 +50,7 @@ jobs:
 name: built-kernels-${{ matrix.arch }}
 path: |
   activation-kernel
+  cpp20-symbols-kernel
   cutlass-gemm-kernel
   cutlass-gemm-tvm-ffi-kernel
   extra-data

docs/source/_toctree.yml

Lines changed: 4 additions & 0 deletions
@@ -66,3 +66,7 @@
   - local: builder-cli
     title: Builder CLI Reference
   title: CLI Reference
+- sections:
+  - local: builder/design-nix-builder
+    title: Nix Builder
+  title: Design
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
# Nix Builder design

## Introduction

kernel-builder uses a Nix-based builder that orchestrates the build. The Nix
builder provides:

- Reproducible evaluation. The same Nix builder version will always produce
  the same derivations (build recipes).
- Largely reproducible builds by using a build sandbox that only has the
  dependencies specified in a derivation.
- Seamless creation of different build environments (e.g. different Torch
  and CUDA combinations).

## Kernel build steps

A kernel derivation builds a kernel in the following steps:

1. Generate CMake files for the kernel using
   `kernel-builder create-pyproject`.
2. Generate Ninja build files using CMake.
3. Build the kernel using Ninja.
4. Perform various checks on the compiled kernel, such as:
   - Verify that the kernel only uses ABI3/`manylinux_2_28` symbols.
   - Verify that the kernel can be loaded by the `kernels` Python package.
5. Strip runpaths (ELF-embedded library directories) from kernel binaries
   to make the kernel distribution-independent.
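
The symbol check in step 4 can be illustrated with a minimal Python sketch. This is not the check the builder actually runs. It assumes the input is the text printed by a tool such as `objdump -T`, and it assumes ceilings of glibc 2.28 and `GLIBCXX_3.4.25` (the EL8 libstdc++ level mentioned in the test kernel's comments). The function name and the `MAX_VERSIONS` table are hypothetical.

```python
import re

# Assumed ceilings: glibc 2.28 and the GLIBCXX_3.4.25 level of EL8 libstdc++.
MAX_VERSIONS = {"GLIBC": (2, 28), "GLIBCXX": (3, 4, 25)}

# Matches version tags such as GLIBC_2.2.5 or GLIBCXX_3.4.29.
VERSION_TAG = re.compile(r"\b(GLIBC|GLIBCXX)_(\d+(?:\.\d+)*)\b")


def too_new_symbol_versions(dump: str) -> set[str]:
    """Return the version tags in `dump` that exceed the assumed ceilings."""
    offending = set()
    for name, version in VERSION_TAG.findall(dump):
        # Compare versions numerically, component by component.
        parsed = tuple(int(part) for part in version.split("."))
        if parsed > MAX_VERSIONS[name]:
            offending.add(f"{name}_{version}")
    return offending
```

To try it on a real binary, the dynamic symbol table could be captured with `subprocess.run(["objdump", "-T", path], capture_output=True, text=True).stdout` and passed to `too_new_symbol_versions`. An empty result means no tag above the assumed ceilings was found. Other version namespaces (for example `CXXABI`) are ignored by this sketch.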

## manylinux_2_28 compatibility

To achieve `manylinux_2_28` compatibility, kernels are built using a
toolchain similar to the `manylinux_2_28` Docker images. This toolchain
is based on the gcc toolsets from AlmaLinux 8. `manylinux_2_28` [uses
AlmaLinux 8 as its base](https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based),
so we have to compile against the same glibc/libstdc++ versions to
ensure compatibility.

We repackage the AlmaLinux 8 toolsets and libstdc++ as Nix derivations (see
the `nix-builder/pkgs/manylinux_2_28` source directory). Then we merge
various toolset packages into an unwrapped gcc that resembles the unwrapped
gcc in nixpkgs. Finally, we wrap binutils and gcc to combine them into a
stdenv.

The stdenv does not reuse glibc from AlmaLinux, since its dynamic loader has
hardcoded FHS paths (`/lib64` etc.) that are not valid in Nix. Using this
dynamic loader results in linking errors, since the paths in the dynamic
loader are used as a last resort (to link glibc libraries). So, instead we
build our own glibc 2.28 package
(see `nix-builder/pkgs/manylinux_2_28/stdenv.nix`) and use that.

## The package set pattern

We repackage various existing package sets as Nix derivations. For instance,
this is done for ROCm, XPU, and manylinux_2_28 packages. We do this because
we want these libraries to be as close as possible to what the user would
install. This avoids compatibility issues between the kernels and the
official vendor packages. For instance, suppose that we built a ROCm library
as a shared library and ROCm provides the same library as a static library,
then compiled kernels could use symbols that cannot be resolved when
installing the official ROCm packages. Similarly, using the official packages
allows us to test against the official upstream packages.

These package sets all follow the same pattern:

```nix
{
  lib,
  callPackage,
  newScope,
  pkgs,
}:

{
  packageMetadata,
}:

let
  inherit (lib.fixedPoints) extends composeManyExtensions;

  fixedPoint = final: {
    inherit lib;
  };
  composed = lib.composeManyExtensions [
    # Base package set.
    (import ./components.nix { inherit packageMetadata; })

    # Package-specific overrides.
    (import ./overrides.nix)

    # Additional overlays that extend the package set.
    (import ./some-overlay.nix)
  ];
in
lib.makeScope newScope (lib.extends composed fixedPoint)
```

We use a fixed point to build up the package set as a list of
[overlays](https://nixos.org/manual/nixpkgs/stable/#sec-overlays-definition).
This has various benefits. For instance, it allows us to refine the
package set incrementally and we can refer to the final versions of
packages in intermediate overlays.
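
The idea of referring to final packages from an earlier overlay can be mimicked outside Nix. The following Python sketch is only an analogy, not how nixpkgs implements `extends` or `composeManyExtensions`. It assumes each overlay is a function of `(final, prev)` that returns lazily evaluated entries. The `Scope` class and the package names are hypothetical.

```python
from typing import Callable, Dict, List

Thunk = Callable[[], object]


class Scope:
    """Package set whose entries are evaluated lazily against the final set."""

    def __init__(self, overlays: List[Callable]) -> None:
        self._thunks: Dict[str, Thunk] = {}
        self._cache: Dict[str, object] = {}
        for overlay in overlays:
            # Each overlay sees the final set (self) and the set so far (prev).
            prev = dict(self._thunks)
            self._thunks = {**prev, **overlay(self, prev)}

    def __getitem__(self, name: str) -> object:
        # Evaluate on first access, after all overlays have been applied.
        if name not in self._cache:
            self._cache[name] = self._thunks[name]()
        return self._cache[name]


def components(final, prev):
    # Base set: gcc refers to whatever binutils ends up in the final set.
    return {
        "binutils": lambda: "binutils-base",
        "gcc": lambda: f"gcc (uses {final['binutils']})",
    }


def overrides(final, prev):
    # Refine an earlier definition by building on the previous one.
    return {"binutils": lambda: prev["binutils"]() + "-patched"}


scope = Scope([components, overrides])
```

Here `scope["gcc"]` is defined in the first overlay but picks up the `binutils` that the second overlay patched, because the lookup through `final` is deferred until the whole list has been applied.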

The package sets all use a similar list of overlays:

- An initial overlay (`components.nix`) that applies a generic builder
  to the package set metadata. The metadata typically comes from a Yum/DNF
  repository that contains RPM packages. The generic builder will extract the
  RPMs and move binaries, libraries, and headers to the right location. This
  results in a set of Nix derivations that may or may not build.
- The next overlay (`overrides.nix`) fixes up derivations generated by the
  generic builder in the previous overlay that do not build. Fixing the
  derivations typically consists of adding missing dependencies and changing
  embedded FHS paths to Nix store paths.
- Additional overlays with derivations that combine outputs from previous
  overlays. One typical example is a derivation that constructs a full
  compiler toolchain (e.g. `nix-builder/pkgs/manylinux_2_28/gcc-unwrapped.nix`).
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
[general]
name = "cpp20-symbols"
version = 1
license = "Apache-2.0"
backends = ["cpu"]

[torch]
src = [
  "torch-ext/torch_binding.cpp",
  "torch-ext/torch_binding.h",
]

[kernel.cpp20_symbols_cpu]
backend = "cpu"
depends = ["torch"]
src = ["cpu/cpu.cpp"]
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
#include <array>
#include <charconv>
#include <stdexcept>

#include <torch/all.h>

// std::to_chars(char*, char*, double) is a floating-point overload that
// requires GLIBCXX_3.4.29, introduced in GCC 11. We use this to verify
// that manylinux_2_28 kernels build correctly: the Red Hat toolset
// statically links the newer libstdc++ symbols that exceed the system
// GLIBCXX_3.4.25 ceiling of AlmaLinux 8 / RHEL 8.
torch::Tensor float_to_chars(torch::Tensor const &input) {
  std::array<char, 32> buf;
  auto [ptr, ec] = std::to_chars(buf.data(), buf.data() + buf.size(), input.item<double>());
  if (ec != std::errc{})
    throw std::runtime_error("to_chars failed");
  return input;
}
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
{
  description = "Flake for cpp20-symbols kernel";

  inputs = {
    kernel-builder.url = "path:../../..";
  };

  outputs =
    {
      self,
      kernel-builder,
    }:
    kernel-builder.lib.genKernelFlakeOutputs {
      inherit self;
      path = ./.;
    };
}

examples/kernels/cpp20-symbols/tests/__init__.py

Whitespace-only changes.
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
import cpp20_symbols
import pytest
import torch


@pytest.mark.kernels_ci
def test_float_to_chars_runs():
    x = torch.tensor([3.14], dtype=torch.float64)
    out = cpp20_symbols.float_to_chars(x)
    torch.testing.assert_close(out, x)


@pytest.mark.kernels_ci
def test_float_to_chars_float32():
    x = torch.tensor([2.71828], dtype=torch.float32)
    out = cpp20_symbols.float_to_chars(x)
    torch.testing.assert_close(out, x)
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
import torch

from ._ops import ops


def float_to_chars(input: torch.Tensor) -> torch.Tensor:
    return ops.float_to_chars(input)


__all__ = ["float_to_chars"]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
#include <torch/library.h>

#include "registration.h"
#include "torch_binding.h"

TORCH_LIBRARY_EXPAND(TORCH_EXTENSION_NAME, ops) {
  ops.def("float_to_chars(Tensor input) -> Tensor");
#if defined(CPU_KERNEL)
  ops.impl("float_to_chars", torch::kCPU, &float_to_chars);
#endif
}

REGISTER_EXTENSION(TORCH_EXTENSION_NAME)
