|
3 | 3 | The Replit RTLD Loader allows dynamically loaded shared libraries (.so) |
4 | 4 | to work seamlessly in Repls. It uses |
5 | 5 | the [rtld-audit API](https://man7.org/linux/man-pages/man7/rtld-audit.7.html) |
6 | | -to observe a process' library loading activity to learn which Nix channel |
7 | | -the binary was built from (if any), and then uses REPLIT_LD_LIBRARY_PATH variable to resolve |
8 | | -libraries. It is a better alternative than using `LD_LIBRARY_PATH` because: |
| 6 | +to observe a process's library loading activity. When the native loader, ld-linux,
| 7 | +fails to find a library, the RTLD loader steps in and resolves the desired library using
| 8 | +directories in the `REPLIT_LD_LIBRARY_PATH` variable. It is a better alternative |
| 9 | +than using `LD_LIBRARY_PATH` because rather than overriding the default behavior |
| 10 | +of the system loader, it acts as a fallback. |
9 | 11 |
|
10 | | -1. rather than overriding the default behavior of the system loader, it acts as a fallback |
11 | | -2. it is aware of how Nix-built binaries work |
| 12 | +## Background and Motivation |
12 | 13 |
|
13 | | -See [LD_AUDIT-based Shared Library Loader Experiments](https://docs.google.com/document/d/1llRzZdBZIKDFk5n5NQromYMCeDaYUCB9pVLy2vahTH4) |
14 | | -for more background. |
| 14 | +At Replit we use Nix to deliver almost all our software to users. However, we noticed
| 15 | +that users' programs were crashing with errors like these:
| 16 | + |
| 17 | +``` |
| 18 | +symbol lookup error: /nix/store/dg8mpqqykmw9c7l0bgzzb5znkymlbfjw-glibc-2.37-8/lib/libc.so.6: undefined symbol: _dl_audit_symbind_alt, version GLIBC_PRIVATE |
| 19 | +``` |
| 20 | + |
| 21 | +``` |
| 22 | +/nix/store/dg8mpqqykmw9c7l0bgzzb5znkymlbfjw-glibc-2.37-8/lib/libm.so.6: version `GLIBC_2.38' not found (required by /nix/store/8w6mm5q1n7i7cs1933im5vkbgvjlglfn-python3-3.10.13/lib/libpython3.10.so.1.0) |
| 23 | +``` |
| 24 | + |
| 25 | +glibc is the GNU standard C library, a fundamental library used by virtually all programs. |
| 26 | +These errors mean there is a mismatch between the version of glibc required by a library and the one that's available. |
| 27 | +This can happen when we have programs and libraries from different Nix channels interacting with each other. |
| 28 | + |
| 29 | +For example, if a Python program uses libcairo (maybe via pycairo), a directory containing the `libcairo.so` shared library would be added to the `LD_LIBRARY_PATH` variable, telling the system library loader to look for libraries there in addition to the normal places. But if Python is built from a different Nix channel than libcairo, they may depend on different versions of glibc. Python gets to choose its desired glibc version, but if that version is incompatible with libcairo because Python's Nix channel is older than libcairo's, the program will crash when it tries to load libcairo.
| 30 | + |
| 31 | +We found the approach of using `LD_LIBRARY_PATH` too heavy-handed: it forces programs to abide by it even if the program already knows where its compatible libraries are, via its own [runpath](https://amir.rachum.com/shared-libraries/). A tamer approach is called for. With the RTLD loader, we now use the `REPLIT_LD_LIBRARY_PATH` variable, which is searched only after the system loader fails to find the required libraries within the program's runpath. This, plus delivering our software on the latest Nix channel, will help us get rid of those glibc version mismatch problems.
15 | 32 |
|
16 | 33 | ## How it Works |
17 | 34 |
|
18 | 35 | 1. Activate the loader via the LD_AUDIT variable when running a program, ex: `LD_AUDIT=rtld_loader.so python main.py` |
19 | | -2. The loader will observe which `libc.so` is loaded when the program runs based on the la_objopen hook. If it is |
20 | | - from a known Nix path, we'll use it to infer the Nix channel the program was built from. |
21 | | -3. If the system loader fails locate the library, it will also search the directory paths within `REPLIT_LD_LIBRARY_PATH` |
22 | | - for a matching library. |
| 36 | +2. If the system loader fails to locate a library, say `libcairo.so`, the RTLD loader will search the directories within `REPLIT_LD_LIBRARY_PATH`
| 37 | +for a library with that name.
| 38 | + |
| 39 | +How does the RTLD loader tell that the system loader has failed to load a library? We use the `la_objsearch` hook
| 40 | +in the [rtld-audit API](https://man7.org/linux/man-pages/man7/rtld-audit.7.html). If the `flag` argument
| 41 | +passed in is equal to `LA_SER_DEFAULT`, that means the system loader has failed to find the requested library
| 42 | +from the `runpath` entries of the binary executable and is instead defaulting to searching the system library
| 43 | +paths. The RTLD loader detects this when it occurs, intercepts the request, searches for the requested library
| 44 | +in the directories listed in `REPLIT_LD_LIBRARY_PATH`, and returns the full path of the library if it is found.
23 | 45 |
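The fallback described above can be sketched roughly as follows. This is a minimal illustration in C, not the actual Replit implementation: the helper `find_in_dirs`, the static result buffer, and the choice to match on the base name of the requested path are all assumptions made for the example. Only `la_version`, `la_objsearch`, `LAV_CURRENT`, and `LA_SER_DEFAULT` come from the rtld-audit API.

```c
#define _GNU_SOURCE
#include <link.h>     /* LAV_CURRENT, LA_SER_DEFAULT */
#include <limits.h>   /* PATH_MAX */
#include <stdint.h>   /* uintptr_t */
#include <stdio.h>    /* snprintf */
#include <stdlib.h>   /* getenv */
#include <string.h>   /* strchr, strrchr, strlen */
#include <unistd.h>   /* access */

/* Hypothetical helper (not part of rtld-audit): look for `libname` in each
 * directory of the colon-separated list `dirs`. On the first readable match,
 * write the full path into `out` and return 1; otherwise return 0. */
static int find_in_dirs(const char *dirs, const char *libname,
                        char *out, size_t outlen)
{
    if (dirs == NULL || libname == NULL)
        return 0;

    const char *p = dirs;
    while (*p != '\0') {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len > 0) {
            int n = snprintf(out, outlen, "%.*s/%s", (int)len, p, libname);
            if (n > 0 && (size_t)n < outlen && access(out, R_OK) == 0)
                return 1;
        }

        p += len;
        if (*p == ':')
            p++;  /* skip the separator */
    }
    return 0;
}

/* Required by rtld-audit: report the audit interface version we expect. */
unsigned int la_version(unsigned int version)
{
    (void)version;
    return LAV_CURRENT;
}

/* Called by the dynamic linker for each candidate path it considers.
 * Assumption: we only step in when the linker has reached its default
 * directories, and we retry using the base name of the candidate path. */
char *la_objsearch(const char *name, uintptr_t *cookie, unsigned int flag)
{
    /* Static buffer keeps the sketch short; it is not thread-safe. */
    static char resolved[PATH_MAX];
    (void)cookie;

    if (flag == LA_SER_DEFAULT) {
        const char *base = strrchr(name, '/');
        base = base ? base + 1 : name;

        if (find_in_dirs(getenv("REPLIT_LD_LIBRARY_PATH"), base,
                         resolved, sizeof resolved))
            return resolved;
    }

    /* No match, or not a default-path lookup: leave the search unchanged. */
    return (char *)name;
}
```

Assuming the file is saved under the hypothetical name `example_audit.c`, it could be built with `gcc -shared -fPIC -o example_audit.so example_audit.c` and activated with `LD_AUDIT=./example_audit.so`, in the same way as the `rtld_loader.so` example in step 1.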
|
24 | 46 | ## Logging |
25 | 47 |
|
|