Phase 2 — The modern, feature-rich Linux 7.1 kernel

Goal: a kernel built around the newest and best features, deliberately dropping deprecated / 32-bit / legacy support, targeting current hardware.

Approach: start from the arch default (make defconfig — already a modern, QEMU-bootable config), then merge a single documented fragment, modern.config, that:

  • turns on flagship modern features and hardening,
  • drops 32-bit and legacy baggage,
  • enables newest-hardware support,

and build the whole thing with Clang/LLD (LLVM=1) — which is what unlocks kernel Control-Flow Integrity (kCFI).
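For readers new to config fragments: a fragment is a partial .config that merge_config.sh overlays on a base config. The sketch below writes a small illustrative excerpt using option names mentioned in this document. It is not the actual modern.config, and the function name and output path are hypothetical.

```shell
# Illustrative only: writes a tiny fragment in the syntax merge_config.sh accepts.
# The real modern.config is much longer; this excerpt is an assumption about its shape.
write_fragment_excerpt() {
    # $1: output path (hypothetical, e.g. modern-excerpt.config)
    cat > "$1" <<'EOF'
# hardening
CONFIG_CFI=y
CONFIG_INIT_STACK_ALL_ZERO=y
# memory management
CONFIG_LRU_GEN=y
CONFIG_LRU_GEN_ENABLED=y
# legacy dropped (this comment-like form is how a fragment turns an option off)
# CONFIG_IA32_EMULATION is not set
# CONFIG_COMPAT_32BIT_TIME is not set
EOF
}
```

Options gated on the compiler (such as CONFIG_CFI) only survive the later olddefconfig pass if the same toolchain flags (LLVM=1) are used throughout.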

The build artifacts and final .config are under artifacts/modern/.


Why Clang/LLVM instead of GCC?

The single biggest modern security feature for the kernel today is kCFI (CONFIG_CFI), forward-edge Control-Flow Integrity. At every indirect call the compiler inserts a check that the target function's type signature matches the call site, which blocks most exploits that hijack a function pointer (JOP/COP-style control-flow hijacking). Return addresses are a backward edge and are not covered by kCFI itself. kCFI requires Clang. On x86-64 the modern kCFI implementation does not need full LTO, so we get this flagship protection at modest build cost. (In 7.1 the old CONFIG_CFI_CLANG symbol is transitional; the live symbol is now CONFIG_CFI.)

The feature set (and why each matters)

Security & hardening (the headline)

| Option | What it buys |
|---|---|
| CONFIG_CFI | kCFI: type-checked indirect calls; blocks most function-pointer hijacking (JOP/COP) |
| CONFIG_INIT_STACK_ALL_ZERO | every stack variable zero-initialised, so no uninitialised-stack leaks |
| CONFIG_INIT_ON_ALLOC_DEFAULT_ON | heap zeroed on allocation |
| CONFIG_ZERO_CALL_USED_REGS | scrub call-used registers on return (fewer useful ROP gadgets) |
| CONFIG_RANDOMIZE_BASE/_MEMORY | KASLR for kernel text and memory map |
| CONFIG_RANDOM_KMALLOC_CACHES | randomised slab caches, making heap grooming harder |
| CONFIG_SLAB_FREELIST_HARDENED/_RANDOM, SHUFFLE_PAGE_ALLOCATOR | allocator hardening |
| CONFIG_LIST_HARDENED, BUG_ON_DATA_CORRUPTION | detect linked-list corruption |
| CONFIG_HARDENED_USERCOPY, FORTIFY_SOURCE, STACKPROTECTOR_STRONG | classic, still essential |
| CONFIG_STRICT_KERNEL_RWX, STRICT_MODULE_RWX, VMAP_STACK | W^X + guard-paged stacks |
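A table like this is only useful if the options actually survived dependency resolution. As a minimal sketch (the function name is hypothetical and this is not part of the original build), the following lists any option that is not built in (=y) in a given .config:

```shell
# Print every named option that is NOT set to =y in the given .config.
# Prints nothing and returns 0 when all of them are built in.
missing_builtin_opts() {
    cfg=$1; shift
    rc=0
    for opt in "$@"; do
        if ! grep -qx "CONFIG_${opt}=y" "$cfg"; then
            echo "$opt"
            rc=1
        fi
    done
    return "$rc"
}

# Example, using the build directory from this document:
# missing_builtin_opts "$BMOD/.config" CFI INIT_STACK_ALL_ZERO VMAP_STACK
```

An option silently dropped by olddefconfig (for example because the compiler does not support it) shows up here as a printed name.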

LSMs — modern sandboxing & access control

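LANDLOCK (unprivileged sandboxing), LOCKDOWN_LSM, YAMA, SELINUX, APPARMOR, and BPF_LSM (policy in eBPF). SECURITY_DMESG_RESTRICT restricts reading the kernel log (dmesg) to privileged users, so addresses and other details printed there do not reach unprivileged processes.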

Scheduler / preemption

PREEMPT_DYNAMIC: switch the preemption model at boot (preempt=none|voluntary|full|lazy), including the recent lazy preemption model (PREEMPT_LAZY, in mainline since 6.13). SCHED_CORE gives SMT core-scheduling (side-channel isolation), and schedutil is the default cpufreq governor. (Linux's scheduler core is EEVDF since 6.6, and we're on top of it.)
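To see which model a PREEMPT_DYNAMIC kernel is actually running, the scheduler's debugfs file lists all models and wraps the active one in parentheses. A small sketch of parsing that line (the helper name is hypothetical; reading the file needs root and a mounted debugfs):

```shell
# Extract the active model from a line such as "none voluntary (full) lazy".
active_preempt_mode() {
    printf '%s\n' "$1" | sed -n 's/.*(\([a-z]*\)).*/\1/p'
}

# On a running kernel:
# active_preempt_mode "$(cat /sys/kernel/debug/sched/preempt)"
```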

Memory management

Multi-Gen LRU enabled at boot (LRU_GEN_ENABLED), which generally makes better page-reclaim decisions under memory pressure than the classic active/inactive LRU. Transparent Huge Pages (madvise), zswap with ZSTD backed by zsmalloc, DAMON (data-access monitoring for proactive reclaim), KSM, userfaultfd.
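Whether Multi-Gen LRU is really on can be read from sysfs, which reports a hex bitmask whose lowest bit is the main switch. A minimal sketch of decoding it (the helper name is hypothetical):

```shell
# Succeeds when bit 0 (the MGLRU main switch) is set in a value such as "0x0007".
lru_gen_main_switch_on() {
    [ $(( $1 & 1 )) -eq 1 ]
}

# On a running kernel:
# lru_gen_main_switch_on "$(cat /sys/kernel/mm/lru_gen/enabled)" && echo "MGLRU on"
```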

Networking

BBR as the default TCP congestion control, fq/fq_codel/CAKE qdiscs, in-kernel WireGuard, kernel-TLS offload, nftables (iptables is legacy), and MPTCP (multipath TCP).

Storage & filesystems (modern, legacy dropped)

ext4, btrfs, XFS, F2FS, EROFS, overlayfs, NTFS3 (the modern read-write NTFS; the old read-only ntfs is gone), FUSE + virtiofs. NVMe including multipath and NVMe/TCP fabrics. io_uring for async I/O. (Note: bcachefs was removed from mainline before 7.1, so it is intentionally absent.)

Observability

BTF debug info (DEBUG_INFO_BTF + DWARF5) for eBPF CO-RE, BPF_JIT always-on, ftrace/kprobes/uprobes.

Newest hardware

amd_pstate (EPP) for Zen CPUs, CXL (Compute Express Link, the newest server memory/accelerator fabric), USB4/Thunderbolt, PCIe ASPM + PTM, confidential computing (AMD SME/SEV-SNP, Intel TDX guest), and modern GPUs as modules, including the new Intel Xe driver. Also enabled: ntsync (fast Win32-style sync primitives for Wine/Proton, merged in 6.14).

Image & modules

ZSTD-compressed kernel image, module signing with SHA-512.

What we dropped (the "no backward-compat" part)

| Dropped | How (verified in the final .config) |
|---|---|
| 32-bit syscall emulation | IA32_EMULATION explicitly off |
| legacy modify_ldt | MODIFY_LDT_SYSCALL explicitly off (needs EXPERT) |
| 32-bit compat time | COMPAT_32BIT_TIME explicitly off |
| x32 ABI | X86_X32_ABI auto-off (depends on IA32_EMULATION) |
| 16-bit / vm86 segments | X86_16BIT auto-off (no longer compiled once LDT is gone) |
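One way to do the "verified in the final .config" step is to report each symbol's state, in the spirit of scripts/config --state. The sketch below is a standalone approximation (the function name is hypothetical): it prints the value if set, n for an "is not set" line, and undef when the symbol does not appear at all, which is what happens when its dependencies are unmet.

```shell
# Print the state of one option in a .config: its value, "n", or "undef".
opt_state() {
    # $1: path to .config, $2: option name without the CONFIG_ prefix
    if grep -q "^CONFIG_$2=" "$1"; then
        sed -n "s/^CONFIG_$2=//p" "$1"
    elif grep -qx "# CONFIG_$2 is not set" "$1"; then
        echo n
    else
        echo undef
    fi
}

# Example:
# for o in IA32_EMULATION MODIFY_LDT_SYSCALL COMPAT_32BIT_TIME X86_X32_ABI X86_16BIT; do
#     printf '%s: %s\n' "$o" "$(opt_state "$BMOD/.config" "$o")"
# done
```

For a "dropped" option, either n or undef is the expected result.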

On targeting "newest hardware" via -march: the kernel intentionally compiles at the x86-64 baseline and only uses advanced ISA (AVX2/AVX-512…) inside explicit FPU-guarded regions — forcing a global -march=x86-64-v3 can break early-boot code, so we did not do it. In 7.1 the clean knob is CONFIG_X86_NATIVE_CPU (-march=native), but it requires clang ≥ 19.1 (we have 18), so "target newest hardware" here means driver/feature enablement, which is what actually matters.
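The compiler-version gate described above can be checked before touching the config. A minimal sketch, assuming GNU sort (for sort -V); the helper name is hypothetical:

```shell
# True when dotted version $1 is greater than or equal to dotted version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: the clang 18.1.3 used here fails the 19.1 requirement.
# version_ge "18.1.3" "19.1.0" || echo "clang too old for CONFIG_X86_NATIVE_CPU"
```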

Build steps (exactly what was run)

KSRC=~/os_experiments/linux_kernel/src
BMOD=~/os_experiments/linux_kernel/build-modern

# generate config WITH the LLVM toolchain so compiler-gated options (kCFI) resolve
make -C "$KSRC" LLVM=1 O="$BMOD" defconfig
"$KSRC"/scripts/kconfig/merge_config.sh -m -O "$BMOD" "$BMOD/.config" modern.config
# drop deep legacy bits that need EXPERT, tag the build
"$KSRC"/scripts/config --file "$BMOD/.config" \
    -e EXPERT -d X86_16BIT -d MODIFY_LDT_SYSCALL --set-str LOCALVERSION "-modern"
make -C "$KSRC" LLVM=1 O="$BMOD" olddefconfig

# build everything with clang/lld
make -C "$KSRC" LLVM=1 O="$BMOD" -j"$(nproc)" all

Results

| Metric | Value |
|---|---|
| Build result | exit 0 |
| Build time | 18 min 55 s (12 cores, clang) |
| Peak build RAM | 7.7 GiB |
| bzImage | 20 MB (20 406 272 B), ZSTD-compressed |
| vmlinux | 467 MB (large because of BTF + DWARF5 debug info) |
| Enabled =y symbols | 1852 |
| Modules built | 25 (.ko, incl. amdgpu/xe/nvme-tcp/ntsync) |
| Version string | 7.1.0-modern … clang 18.1.3 … #1 SMP PREEMPT_DYNAMIC |
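The symbol and module counts can be reproduced from the build directory. This is a sketch of one way to count them, not necessarily how the figures above were produced; both function names are hypothetical.

```shell
# Count options built into the kernel image (=y) in a .config.
count_builtin_symbols() {
    grep -c '^CONFIG_[A-Za-z0-9_]*=y$' "$1"
}

# Count uncompressed module files (.ko) anywhere under a build directory.
count_modules() {
    find "$1" -type f -name '*.ko' | wc -l
}

# Example:
# count_builtin_symbols "$BMOD/.config"
# count_modules "$BMOD"
```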

Boot test — q35, KVM, -cpu host, 2 vCPUs

QEMU exited rc=0 (clean ACPI power-off), proving a full boot to userspace. Evidence pulled from the captured dmesg (artifacts/modern/boot.log):

Linux version 7.1.0-modern … (Ubuntu clang version 18.1.3, Ubuntu LLD 18.1.3) #1 SMP PREEMPT_DYNAMIC
Spectre V2 : Mitigation: Retpolines
Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
SMP alternatives: CFI: Using rehashed retpoline kCFI      <-- kCFI is live
landlock: Up and running.                                  <-- Landlock LSM
SELinux:  Initializing.                                    <-- SELinux LSM
smpboot: Total of 2 processors activated (14399.67 BogoMIPS)
Run /init as init process
reboot: Power down                                          <-- clean ACPI poweroff

The single most important line is CFI: Using rehashed retpoline kCFI — the flagship Clang-only Control-Flow-Integrity protection is actually active in the running kernel, alongside Landlock + SELinux + Spectre retpolines/IBPB. The image boots SMP on the modern q35 chipset and powers off cleanly.
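For anyone repeating the boot test, the sketch below pairs a QEMU invocation matching the stated setup (q35, KVM, -cpu host, 2 vCPUs) with a check that the captured log contains the expected markers. The memory size, kernel command line, and initramfs argument are assumptions and were not taken from the original run; both function names are hypothetical.

```shell
# Hypothetical invocation: $1 = bzImage, $2 = an initramfs that runs /init and powers off.
# -m 2G and console=ttyS0 are assumptions, not values from the original run.
boot_modern() {
    qemu-system-x86_64 -machine q35,accel=kvm -cpu host -smp 2 -m 2G \
        -kernel "$1" -initrd "$2" \
        -append "console=ttyS0" -nographic -no-reboot
}

# Succeeds only if every marker string appears somewhere in the log file.
log_has_all() {
    log=$1; shift
    for marker in "$@"; do
        grep -qF -- "$marker" "$log" || { echo "missing: $marker"; return 1; }
    done
}

# Example:
# boot_modern arch/x86/boot/bzImage initramfs.cpio.gz | tee boot.log
# log_has_all boot.log 'retpoline kCFI' 'landlock: Up and running.' 'reboot: Power down'
```

Matching fixed strings rather than regular expressions keeps the check simple and avoids surprises with characters such as the dot in "running.".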