
Commit 3615479

Merge pull request rust-lang#2851 from rust-lang/tshepang/sembr
sembr a few files
2 parents 784330e + 4e867ff commit 3615479

5 files changed

Lines changed: 114 additions & 61 deletions


src/autodiff/installation.md

Lines changed: 43 additions & 31 deletions
Original file line number | Diff line number | Diff line change
@@ -1,38 +1,44 @@
11
# Installation
22

3-
In the near future, `std::autodiff` should become available for users via rustup. As a rustc/enzyme/autodiff contributor however, you will still need to build rustc from source.
4-
For the meantime, you can download up-to-date builds to enable `std::autodiff` on your latest nightly toolchain, if you are using either of:
5-
**Linux**, with `x86_64-unknown-linux-gnu` or `aarch64-unknown-linux-gnu`
6-
**Windows**, with `x86_64-llvm-mingw` or `aarch64-llvm-mingw`
3+
In the near future, `std::autodiff` should become available for users via rustup.
4+
As a rustc/enzyme/autodiff contributor however, you will still need to build rustc from source.
5+
For the meantime, you can download up-to-date builds to enable `std::autodiff` on your latest nightly toolchain, if you are using either of:
6+
**Linux**, with `x86_64-unknown-linux-gnu` or `aarch64-unknown-linux-gnu`
7+
**Windows**, with `x86_64-llvm-mingw` or `aarch64-llvm-mingw`
78

8-
You can also download slightly outdated builds for **Apple** (aarch64-apple), which should generally work for now.
9+
You can also download slightly outdated builds for **Apple** (aarch64-apple), which should generally work for now.
910

10-
If you need any other platform, you can build rustc including autodiff from source. Please open an issue if you want to help enable automatic builds for your preferred target.
11+
If you need any other platform, you can build rustc including autodiff from source.
12+
Please open an issue if you want to help enable automatic builds for your preferred target.
1113

1214
## Installation guide
1315

14-
If you want to use `std::autodiff` and don't plan to contribute PR's to the project, then we recommend to just use your existing nightly installation and download the missing component. In the future, rustup will be able to do it for you.
16+
If you want to use `std::autodiff` and don't plan to contribute PR's to the project, then we recommend to just use your existing nightly installation and download the missing component.
17+
In the future, rustup will be able to do it for you.
1518
For now, you'll have to manually download and copy it.
1619

1720
1) On our github repository, find the last merged PR: [`Repo`]
18-
2) Scroll down to the lower end of the PR, where you'll find a rust-bors message saying `Test successful` with a `CI` link.
19-
3) Click on the `CI` link, and grep for your target. E.g. `dist-x86_64-linux` or `dist-aarch64-llvm-mingw` and click `Load summary`.
21+
2) Scroll down to the lower end of the PR, where you'll find a rust-bors message saying `Test successful` with a `CI` link.
22+
3) Click on the `CI` link, and grep for your target. E.g. `dist-x86_64-linux` or `dist-aarch64-llvm-mingw` and click `Load summary`.
2023
4) Under the `CI artifacts` section, find the `enzyme-nightly` artifact, download, and unpack it.
2124
5) Copy the artifact (libEnzyme-22.so for linux, libEnzyme-22.dylib for apple, etc.), which should be in a folder named `enzyme-preview`, to your rust toolchain directory. E.g. for linux: `cp ~/Downloads/enzyme-nightly-x86_64-unknown-linux-gnu/enzyme-preview/lib/rustlib/x86_64-unknown-linux-gnu/lib/libEnzyme-22.so ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib`
2225

23-
Apple support was temporarily reverted, due to downstream breakages. If you want to download autodiff for apple, please look at the artifacts from this [`run`].
26+
Apple support was temporarily reverted, due to downstream breakages.
27+
If you want to download autodiff for apple, please look at the artifacts from this [`run`].
2428

2529
## Installation guide for Nix user.
2630

27-
This setup was recommended by a nix and autodiff user. It uses [`Overlay`]. Please verify for yourself if you are comfortable using that repository.
31+
This setup was recommended by a nix and autodiff user.
32+
It uses [`Overlay`].
33+
Please verify for yourself if you are comfortable using that repository.
2834
In that case you might use the following nix configuration to get a rustc that supports `std::autodiff`.
2935
```nix
3036
{
3137
enzymeLib = pkgs.fetchzip {
3238
url = "https://ci-artifacts.rust-lang.org/rustc-builds/ec818fda361ca216eb186f5cf45131bd9c776bb4/enzyme-nightly-x86_64-unknown-linux-gnu.tar.xz";
3339
sha256 = "sha256-Rnrop44vzS+qmYNaRoMNNMFyAc3YsMnwdNGYMXpZ5VY=";
3440
};
35-
41+
3642
rustToolchain = pkgs.symlinkJoin {
3743
name = "rust-with-enzyme";
3844
paths = [pkgs.rust-bin.nightly.latest.default];
@@ -48,59 +54,64 @@ In that case you might use the following nix configuration to get a rustc that s
4854

4955
## Build instructions
5056

51-
First you need to clone and configure the Rust repository. Based on your preferences, you might also want to `--enable-clang` or `--enable-lld`.
52-
```bash
57+
First you need to clone and configure the Rust repository.
58+
Based on your preferences, you might also want to `--enable-clang` or `--enable-lld`.
59+
```console
5360
git clone git@github.com:rust-lang/rust
5461
cd rust
5562
./configure --release-channel=nightly --enable-llvm-enzyme --enable-llvm-link-shared --enable-llvm-assertions --enable-ninja --enable-option-checking --disable-docs --set llvm.download-ci-llvm=false
5663
```
5764

5865
Afterwards you can build rustc using:
59-
```bash
66+
```console
6067
./x build --stage 1 library
6168
```
6269

6370
Afterwards, `rustup toolchain link` will allow you to use it through cargo:
64-
```
71+
```console
6572
rustup toolchain link enzyme build/host/stage1
6673
rustup toolchain install nightly # enables -Z unstable-options
6774
```
6875

6976
You can then run our test cases:
7077

71-
```bash
78+
```console
7279
./x test --stage 1 tests/codegen-llvm/autodiff
7380
./x test --stage 1 tests/pretty/autodiff
7481
./x test --stage 1 tests/ui/autodiff
7582
./x test --stage 1 tests/run-make/autodiff
7683
./x test --stage 1 tests/ui/feature-gates/feature-gate-autodiff.rs
7784
```
7885

79-
Autodiff is still experimental, so if you want to use it in your own projects, you will need to add `lto="fat"` to your Cargo.toml
80-
and use `RUSTFLAGS="-Zautodiff=Enable" cargo +enzyme` instead of `cargo` or `cargo +nightly`.
86+
Autodiff is still experimental, so if you want to use it in your own projects, you will need to add `lto="fat"` to your Cargo.toml
87+
and use `RUSTFLAGS="-Zautodiff=Enable" cargo +enzyme` instead of `cargo` or `cargo +nightly`.
8188

8289
## Compiler Explorer and dist builds
8390

84-
Our compiler explorer instance can be updated to a newer rustc in a similar way. First, prepare a docker instance.
85-
```bash
91+
Our compiler explorer instance can be updated to a newer rustc in a similar way.
92+
First, prepare a docker instance.
93+
```console
8694
docker run -it ubuntu:22.04
8795
export CC=clang CXX=clang++
8896
apt update
89-
apt install wget vim python3 git curl libssl-dev pkg-config lld ninja-build cmake clang build-essential
97+
apt install wget vim python3 git curl libssl-dev pkg-config lld ninja-build cmake clang build-essential
9098
```
9199
Then build rustc in a slightly altered way:
92-
```bash
100+
```console
93101
git clone https://github.com/rust-lang/rust
94102
cd rust
95103
./configure --release-channel=nightly --enable-llvm-enzyme --enable-llvm-link-shared --enable-llvm-assertions --enable-ninja --enable-option-checking --disable-docs --set llvm.download-ci-llvm=false
96104
./x dist
97105
```
98-
We then copy the tarball to our host. The dockerid is the newest entry under `docker ps -a`.
99-
```bash
106+
We then copy the tarball to our host.
107+
The dockerid is the newest entry under `docker ps -a`.
108+
```console
100109
docker cp <dockerid>:/rust/build/dist/rust-nightly-x86_64-unknown-linux-gnu.tar.gz rust-nightly-x86_64-unknown-linux-gnu.tar.gz
101110
```
102111
Afterwards we can create a new (pre-release) tag on the EnzymeAD/rust repository and make a PR against the EnzymeAD/enzyme-explorer repository to update the tag.
103-
Remember to ping `tgymnich` on the PR to run his update script. Note: We should archive EnzymeAD/rust and update the instructions here. The explorer should soon
112+
Remember to ping `tgymnich` on the PR to run his update script.
113+
Note: We should archive EnzymeAD/rust and update the instructions here.
114+
The explorer should soon
104115
be able to get the rustc toolchain from the official rust servers.
105116

106117

@@ -110,7 +121,7 @@ Following the Rust build instruction above will build LLVMEnzyme, LLDEnzyme, and
110121
We recommend that approach, if you just want to use any of them and have no experience with cmake.
111122
However, if you prefer to just build Enzyme without Rust, then these instructions might help.
112123

113-
```bash
124+
```console
114125
git clone git@github.com:llvm/llvm-project
115126
cd llvm-project
116127
mkdir build
@@ -121,15 +132,16 @@ ninja install
121132
```
122133
This gives you a working LLVM build, now we can continue with building Enzyme.
123134
Leave the `llvm-project` folder, and execute the following commands:
124-
```bash
135+
```console
125136
git clone git@github.com:EnzymeAD/Enzyme
126137
cd Enzyme/enzyme
127-
mkdir build
128-
cd build
138+
mkdir build
139+
cd build
129140
cmake .. -G Ninja -DLLVM_DIR=<YourLocalPath>/llvm-project/build/lib/cmake/llvm/ -DLLVM_EXTERNAL_LIT=<YourLocalPath>/llvm-project/llvm/utils/lit/lit.py -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=YES -DBUILD_SHARED_LIBS=ON
130141
ninja
131142
```
132-
This will build Enzyme, and you can find it in `Enzyme/enzyme/build/lib/<LLD/Clang/LLVM/lib>Enzyme.so`. (Endings might differ based on your OS).
143+
This will build Enzyme, and you can find it in `Enzyme/enzyme/build/lib/<LLD/Clang/LLVM/lib>Enzyme.so`.
144+
(Endings might differ based on your OS).
133145

134146
[`Repo`]: https://github.com/rust-lang/rust/
135147
[`run`]: https://github.com/rust-lang/rust/pull/153026#issuecomment-3950046599
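
Step 5 of the installation guide above (copying the library into the toolchain directory) can be sketched as a small shell helper. This is only an illustration: the function names `enzyme_dest_dir` and `install_enzyme` are hypothetical and not part of any Rust tooling, and the directory layout is taken from the `cp` example in the guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of rustup or rustc):
# print the directory that the Enzyme library is copied into.
#   $1 = toolchain sysroot, $2 = target triple
enzyme_dest_dir() {
    printf '%s/lib/rustlib/%s/lib\n' "$1" "$2"
}

# Hypothetical helper: copy the library out of an unpacked artifact.
#   $1 = unpacked artifact directory
#   $2 = toolchain sysroot
#   $3 = target triple
#   $4 = library file name (e.g. libEnzyme-22.so on Linux)
install_enzyme() {
    src="$1/enzyme-preview/lib/rustlib/$3/lib/$4"
    dest="$(enzyme_dest_dir "$2" "$3")"
    # Fail early instead of letting cp print a less specific error.
    [ -f "$src" ] || { echo "missing library: $src" >&2; return 1; }
    [ -d "$dest" ] || { echo "missing toolchain dir: $dest" >&2; return 1; }
    cp "$src" "$dest/"
}
```

Assuming the nightly toolchain is managed by rustup, a call might look like `install_enzyme ~/Downloads/enzyme-nightly-x86_64-unknown-linux-gnu "$(rustc +nightly --print sysroot)" x86_64-unknown-linux-gnu libEnzyme-22.so`, which resolves to the same source and destination paths as the `cp` command in the guide.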

src/offload/contributing.md

Lines changed: 10 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,13 @@
11
# Contributing
22

3-
Contributions are always welcome. This project is experimental, so the documentation and code are likely incomplete. Please ask on [Zulip](https://rust-lang.zulipchat.com/#narrow/channel/422870-t-compiler.2Fgpgpu-backend) (preferred) or the Rust Community Discord for help if you get stuck or if our documentation is unclear.
3+
Contributions are always welcome.
4+
This project is experimental, so the documentation and code are likely incomplete.
5+
Please ask on [Zulip](https://rust-lang.zulipchat.com/#narrow/channel/422870-t-compiler.2Fgpgpu-backend) (preferred) or the Rust Community Discord for help if you get stuck or if our documentation is unclear.
46

5-
We generally try to automate as much of the compilation process as possible for users. However, as a contributor it might sometimes be easier to directly rewrite and compile the LLVM-IR modules (.ll) to quickly iterate on changes, without needing to repeatedly recompile rustc. For people familiar with LLVM we therefore have the shell script below. Only when you are then happy with the IR changes you can work on updating rustc to generate the new, desired output.
7+
We generally try to automate as much of the compilation process as possible for users.
8+
However, as a contributor it might sometimes be easier to directly rewrite and compile the LLVM-IR modules (.ll) to quickly iterate on changes, without needing to repeatedly recompile rustc.
9+
For people familiar with LLVM we therefore have the shell script below.
10+
Only when you are then happy with the IR changes you can work on updating rustc to generate the new, desired output.
611

712
```sh
813
set -e
@@ -29,4 +34,6 @@ opt lib.ll -o lib.bc
2934
LIBOMPTARGET_INFO=-1 OFFLOAD_TRACK_ALLOCATION_TRACES=true ./a.out
3035
```
3136

32-
Please update the `<path>` placeholders on the `clang-linker-wrapper` invocation. You will likely also need to adjust the library paths. See the linked usage section for details: [usage](usage.md#compile-instructions)
37+
Please update the `<path>` placeholders on the `clang-linker-wrapper` invocation.
38+
You will likely also need to adjust the library paths.
39+
See the linked usage section for details: [usage](usage.md#compile-instructions)

src/offload/installation.md

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,31 +1,32 @@
11
# Installation
22

3-
`std::offload` is partly available in nightly builds for users. For now, everyone however still needs to build rustc from source to use all features of it.
3+
`std::offload` is partly available in nightly builds for users.
4+
For now, everyone however still needs to build rustc from source to use all features of it.
45

56
## Build instructions
67

78
First you need to clone and configure the Rust repository:
8-
```bash
9+
```console
910
git clone git@github.com:rust-lang/rust
1011
cd rust
1112
./configure --enable-llvm-link-shared --release-channel=nightly --enable-llvm-assertions --enable-llvm-offload --enable-llvm-enzyme --enable-clang --enable-lld --enable-option-checking --enable-ninja --disable-docs
1213
```
1314

1415
Afterwards you can build rustc using:
15-
```bash
16+
```console
1617
./x build --stage 1 library
1718
```
1819

1920
Afterwards, `rustup toolchain link` will allow you to use it through cargo:
20-
```
21+
```console
2122
rustup toolchain link offload build/host/stage1
2223
rustup toolchain install nightly # enables -Z unstable-options
2324
```
2425

2526

2627

2728
## Build instruction for LLVM itself
28-
```bash
29+
```console
2930
git clone git@github.com:llvm/llvm-project
3031
cd llvm-project
3132
mkdir build
@@ -39,6 +40,6 @@ This gives you a working LLVM build.
3940

4041
## Testing
4142
run
42-
```
43+
```console
4344
./x test --stage 1 tests/codegen-llvm/gpu_offload
4445
```

src/offload/internals.md

Lines changed: 40 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -1,22 +1,47 @@
11
# std::offload
22

3-
This module is under active development. Once upstream, it should allow Rust developers to run Rust code on GPUs.
3+
This module is under active development.
4+
Once upstream, it should allow Rust developers to run Rust code on GPUs.
45
We aim to develop a `rusty` GPU programming interface, which is safe, convenient and sufficiently fast by default.
5-
This includes automatic data movement to and from the GPU, in an efficient way. We will (later)
6-
also offer more advanced, possibly unsafe, interfaces which allow a higher degree of control.
6+
This includes automatic data movement to and from the GPU, in an efficient way.
7+
We will (later) also offer more advanced,
8+
possibly unsafe, interfaces which allow a higher degree of control.
79

8-
The implementation is based on LLVM's "offload" project, which is already used by OpenMP to run Fortran or C++ code on GPUs.
9-
While the project is under development, users will need to call other compilers like clang to finish the compilation process.
10+
The implementation is based on LLVM's "offload" project,
11+
which is already used by OpenMP to run Fortran or C++ code on GPUs.
12+
While the project is under development,
13+
users will need to call other compilers like clang to finish the compilation process.
1014

1115
## High-level compilation design:
12-
We use a single-source, two-pass compilation approach.
13-
14-
First we compile all functions that should be offloaded for the device (e.g. nvptx64, amdgcn-amd-amdhsa, Intel in the future). Currently we require cumbersome `#cfg(target_os="")` annotations, but we intend to recognize those in the future based on our offload intrinsic.
15-
This first compilation currently does not leverage rustc's internal Query system, so it will always recompile your kernels at the moment. This should be easy to fix, but we prioritize features and runtime performance improvements at the moment. Please reach out if you want to implement it, though!
16-
17-
We then compile the code for the host (e.g. x86-64), where most of the offloading logic happens. On the host side, we generate calls to the openmp offload runtime, to inform it about the layout of the types (a simplified version of the autodiff TypeTrees). We also use the type system to figure out whether kernel arguments have to be moved only to the device (e.g. `&[f32;1024]`), from the device, or both (e.g. `&mut [f64]`). We then launch the kernel, after which we inform the runtime to end this environment and move data back (as far as needed).
18-
19-
The second pass for the host will load the kernel artifacts from the previous compilation. rustc in general may not "guess" or hardcode the build directory layout, and as such it must be told the path to the kernel artifacts in the second invocation. The logic for this could be integrated into cargo, but it also only requires a trivial cargo wrapper, which we could trivially provide via crates.io till we see larger adoption.
20-
21-
It might seem tempting to think about a single-source, single pass compilation approach. However, a lot of the rustc frontend (e.g. AST) will drop any dead code (e.g. code behind an inactive `cfg`). Getting the frontend to expand and lower code for two targets naively will result in multiple definitions of the same symbol (and other issues). Trying to teach the whole rustc middle and backend to be aware that any symbol now might contain two implementations is a large undertaking, and it is questionable why we should make the whole compiler more complex, if the alternative is a ~5 line cargo wrapper. We still control the full compilation pipeline and have both host and device code available, therefore there shouldn't be a runtime performance difference between the two approaches.
2216

17+
We use a single-source, two-pass compilation approach.
18+
19+
First we compile all functions that should be offloaded for the device
20+
(e.g. nvptx64, amdgcn-amd-amdhsa, Intel in the future).
21+
Currently we require cumbersome `#cfg(target_os="")` annotations, but we intend to recognize those in the future based on our offload intrinsic.
22+
This first compilation currently does not leverage rustc's internal Query system, so it will always recompile your kernels at the moment.
23+
This should be easy to fix, but we prioritize features and runtime performance improvements at the moment.
24+
Please reach out if you want to implement it, though!
25+
26+
We then compile the code for the host (e.g. x86-64), where most of the offloading logic happens.
27+
On the host side, we generate calls to the openmp offload runtime,
28+
to inform it about the layout of the types (a simplified version of the autodiff TypeTrees).
29+
We also use the type system to figure out whether kernel arguments have to be moved only to the device (e.g. `&[f32;1024]`),
30+
from the device, or both (e.g. `&mut [f64]`).
31+
We then launch the kernel,
32+
after which we inform the runtime to end this environment and move data back (as far as needed).
33+
34+
The second pass for the host will load the kernel artifacts from the previous compilation.
35+
rustc in general may not "guess" or hardcode the build directory layout,
36+
and as such it must be told the path to the kernel artifacts in the second invocation.
37+
The logic for this could be integrated into cargo,
38+
but it also only requires a trivial cargo wrapper,
39+
which we could trivially provide via crates.io till we see larger adoption.
40+
41+
It might seem tempting to think about a single-source, single pass compilation approach.
42+
However, a lot of the rustc frontend (e.g. AST) will drop any dead code (e.g. code behind an inactive `cfg`).
43+
Getting the frontend to expand and lower code for two targets naively will result in multiple definitions of the same symbol (and other issues).
44+
Trying to teach the whole rustc middle and backend to be aware that any symbol now might contain two implementations is a large undertaking,
45+
and it is questionable why we should make the whole compiler more complex, if the alternative is a ~5 line cargo wrapper.
46+
We still control the full compilation pipeline and have both host and device code available,
47+
therefore there shouldn't be a runtime performance difference between the two approaches.
