Add multi-vendor GPU support (NVIDIA, AMD, Intel, Vulkan, CPU) with first-boot auto-detection
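
The new `detect-gpu.sh` probes for NVIDIA first, then AMD (ROCm), then Intel, and only falls back to Vulkan and finally CPU when none of those match. As a rough sketch of that selection order only, the hypothetical `select_backend` function below (not part of this commit) takes the probe results as arguments instead of touching hardware, so the ordering can be checked on any machine. The argument names and the `1`/`0` encoding are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper, NOT part of the commit: mirrors the probe order used by
# detect-gpu.sh, but receives probe results as arguments so it runs anywhere.
#
# Usage: select_backend <has_nvidia> <has_kfd> <has_render_node> <drm_vendor> <has_vulkan>
#   has_*      : "1" if the probe succeeded, "0" otherwise
#   drm_vendor : PCI vendor ID as read from sysfs, e.g. "0x8086" for Intel
select_backend() {
    local has_nvidia="$1" has_kfd="$2" has_render="$3" drm_vendor="$4" has_vulkan="$5"

    if [ "$has_nvidia" = "1" ]; then
        # A working nvidia-smi wins over everything else
        echo "cuda"
    elif [ "$has_kfd" = "1" ] && [ "$has_render" = "1" ]; then
        # /dev/kfd plus a render node is treated as AMD ROCm
        echo "rocm"
    elif [ "$has_render" = "1" ] && [ "$drm_vendor" = "0x8086" ]; then
        # A render node whose vendor ID is Intel's
        echo "xpu"
    elif [ "$has_vulkan" = "1" ]; then
        # Nothing vendor-specific matched, but Vulkan reports a device
        echo "vulkan"
    else
        echo "cpu"
    fi
}
```

For example, `select_backend 0 1 1 0x1002 1` prints `rocm`, while a render node with a non-Intel vendor ID and no `/dev/kfd` (`select_backend 0 0 1 0x1002 1`) falls through to `vulkan`.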

SecAI-Hub · SecAI-Hub · commit 7b4b35b63b19 · 2026-03-07T12:18:00.000-08:00
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # SecAI OS
 
-A bootable, local-first AI appliance with defense-in-depth security for consumer RTX workstations and Apple Silicon.
+A bootable, local-first AI appliance with defense-in-depth security. Supports NVIDIA, AMD, Intel, and Apple Silicon GPUs — all compute stays on-device.
 
 Built on [uBlue](https://universal-blue.org/) (Fedora Atomic / Silverblue) with an immutable OS, encrypted vault, and sealed runtime where sensitive data never leaves the device by default.
 
@@ -37,18 +37,41 @@ Built on [uBlue](https://universal-blue.org/) (Fedora Atomic / Silverblue) with
 | Tool Firewall | 8475 | Go | Policy-gated tool invocation gateway |
 | Web UI | 8480 | Python | Chat, image/video generation, model management |
 | Airlock | 8490 | Go | Sanitized egress proxy (disabled by default) |
-| Inference Worker | 8465 | llama.cpp | LLM inference (CUDA + Metal) |
-| Diffusion Worker | 8455 | Python | Image and video generation (Stable Diffusion) |
+| Inference Worker | 8465 | llama.cpp | LLM inference (CUDA / ROCm / Vulkan / Metal / CPU) |
+| Diffusion Worker | 8455 | Python | Image and video generation (CUDA / ROCm / XPU / MPS / CPU) |
 | Quarantine | -- | Python | 7-stage verify, scan, and promote pipeline |
 
 ## Hardware Support
 
-| Platform | GPU Acceleration | Notes |
-|----------|-----------------|-------|
-| NVIDIA RTX 5080 | CUDA (full offload) | Primary target; uses nvidia-open drivers |
-| NVIDIA RTX 4090/4080/3090 | CUDA (full offload) | Any RTX card with sufficient VRAM |
-| Apple M4 / M3 / M2 / M1 | Metal (via llama.cpp) | CPU-only container, Metal on host |
-| Any x86_64 | CPU fallback | Slower but functional |
+GPU is **auto-detected at first boot**, so no manual configuration is needed. The `detect-gpu.sh` script identifies your hardware and writes `GPU_BACKEND`, `GPU_NAME`, and `GPU_LAYERS` to `/var/lib/secure-ai/inference.env`.
+
+### Supported GPUs
+
+| Vendor | GPUs | Backend | LLM (llama.cpp) | Diffusion (PyTorch) |
+|--------|------|---------|-----------------|-------------------|
+| **NVIDIA** | RTX 5090/5080/4090/4080/3090/3080, any CUDA GPU | CUDA | Full offload | Full offload |
+| **AMD** | RX 7900 XTX/XT, RX 7800/7700, RX 6900/6800, any RDNA/CDNA | ROCm (HIP) | Full offload | Full offload |
+| **Intel** | Arc A770/A750/A580, Arc B-series, Data Center Max | XPU (oneAPI) | Via Vulkan | Via IPEX |
+| **Apple** | M4/M3/M2/M1 (Pro/Max/Ultra) | Metal / MPS | Full offload | MPS acceleration |
+| **Any CPU** | x86_64 (AVX2/AVX-512), ARM64 (NEON) | CPU | Optimized | Functional |
+
+### Backend Priority
+
+The system auto-selects the best available backend in this order:
+1. **CUDA** (NVIDIA) — highest throughput for both LLM and diffusion
+2. **ROCm** (AMD) — near-CUDA performance on RDNA3/CDNA
+3. **MPS** (Apple Silicon) — Metal acceleration on macOS
+4. **XPU** (Intel Arc) — oneAPI/SYCL for discrete Intel GPUs
+5. **Vulkan** (cross-vendor) — universal GPU compute fallback for llama.cpp
+6. **CPU** — AVX2/AVX-512/NEON auto-vectorized, works on everything
+
+### Security Note
+
+All GPU backends run locally with the same sandboxing:
+- `PrivateNetwork=yes` — no network access regardless of GPU vendor
+- `DeviceAllow` restricts access to only the specific GPU device nodes needed
+- AMD ROCm uses `/dev/kfd` + `/dev/dri/*`; NVIDIA uses `/dev/nvidia*`; Intel uses `/dev/dri/*`
+- No cloud compute, no driver telemetry endpoints (blocked by nftables default-deny)
 
 **Minimum requirements:**
 
@@ -202,6 +225,9 @@ cd services/tool-firewall && go build -o ../../bin/tool-firewall . && cd ../..
 cd services/airlock && go build -o ../../bin/airlock . && cd ../..
 
 # Install Python dependencies
+# For NVIDIA: pip install torch --index-url https://download.pytorch.org/whl/cu124
+# For AMD:    pip install torch --index-url https://download.pytorch.org/whl/rocm6.1
+# For CPU:    pip install torch --index-url https://download.pytorch.org/whl/cpu
 pip install flask requests pyyaml diffusers transformers accelerate torch safetensors
 
 # Run the UI (Flask)
@@ -396,7 +422,7 @@ Every model — whether downloaded from the catalog or imported by the user —
 | **Tools** | Default-deny policy, path allowlisting, traversal protection, rate limiting |
 | **Egress** | Airlock disabled by default, PII/credential scanning, destination allowlist |
 | **Services** | Systemd sandboxing: ProtectSystem=strict, PrivateNetwork, syscall filters |
-| **GPU Isolation** | Diffusion worker sandboxed with explicit DeviceAllow for GPU access only |
+| **GPU Isolation** | Vendor-specific DeviceAllow (NVIDIA `/dev/nvidia*`, AMD `/dev/kfd` + `/dev/dri/*`, Intel `/dev/dri/*`), PrivateNetwork on all |
 | **Emergency** | Panic switch: instant network kill + route flush + service stop |
 
 ### Systemd Sandboxing
@@ -412,10 +438,13 @@ Every service runs with defense-in-depth sandboxing:
 - `SystemCallFilter=@system-service` — restricted syscalls
 - `MemoryDenyWriteExecute=yes` — no JIT/RWX memory
 
-The diffusion worker has additional GPU-specific sandboxing:
-- `DeviceAllow=/dev/nvidia* rw` and `DeviceAllow=/dev/dri/* rw` — explicit GPU access
+Both inference and diffusion workers have GPU-specific sandboxing:
+- `DeviceAllow=/dev/nvidia* rw` — NVIDIA CUDA access
+- `DeviceAllow=/dev/kfd rw` — AMD ROCm compute access
+- `DeviceAllow=/dev/dri/* rw` — AMD/Intel DRI render nodes
 - `ReadWritePaths=/var/lib/secure-ai/vault/outputs` — write only to outputs directory
 - `ReadOnlyPaths=/var/lib/secure-ai/registry` — read-only model access
+- Unused GPU device nodes are harmless — systemd silently ignores DeviceAllow for non-existent devices
 
 ### Verify Image Signatures
 
@@ -440,6 +469,12 @@ All configuration lives in `/etc/secure-ai/` (baked into the image, read-only at
 
 ### Key Configuration Options
 
+**GPU backend** (`config/appliance.yaml`):
+```yaml
+gpu:
+  backend: "auto"   # auto | cuda | rocm | xpu | vulkan | mps | cpu
+```
+
 **Inference settings** (`config/appliance.yaml`):
 ```yaml
 inference:
@@ -601,14 +636,28 @@ mount | grep secure-ai
 ### GPU not detected
 
 ```bash
-# Check NVIDIA driver
-nvidia-smi
+# Re-run GPU detection
+sudo /usr/libexec/secure-ai/detect-gpu.sh
 
-# If not loaded, check kernel modules
+# Check what was detected
+cat /var/lib/secure-ai/inference.env
+
+# NVIDIA: check driver
+nvidia-smi
 lsmod | grep nvidia
 
-# For Apple Silicon, GPU acceleration runs on the host (not in container)
-# Verify Metal support:
+# AMD: check ROCm
+rocminfo
+ls -la /dev/kfd /dev/dri/renderD128
+
+# Intel: check DRI
+ls -la /dev/dri/renderD128
+cat /sys/class/drm/card0/device/vendor  # should be 0x8086
+
+# Vulkan (any vendor)
+vulkaninfo --summary
+
+# Apple Silicon (Metal runs on host, not in container)
 system_profiler SPDisplaysDataType
 ```
 
diff --git a/files/scripts/build-services.sh b/files/scripts/build-services.sh
@@ -56,6 +56,13 @@ WRAPPER
 chmod +x "${INSTALL_DIR}/ui"
 echo "  -> ${INSTALL_DIR}/ui"
 
+# Diffusion worker
+echo "Installing: diffusion-worker"
+DIFFUSION_DIR="/opt/secure-ai/services/diffusion-worker"
+mkdir -p "$DIFFUSION_DIR"
+cp /tmp/services/diffusion-worker/app.py "$DIFFUSION_DIR/app.py"
+echo "  -> ${DIFFUSION_DIR}/app.py"
+
 # Cleanup build artifacts
 rm -rf "$SRC_DIR"
 dnf remove -y golang 2>/dev/null || true
diff --git a/files/system/etc/secure-ai/config/appliance.yaml b/files/system/etc/secure-ai/config/appliance.yaml
@@ -12,6 +12,11 @@ paths:
   tmpdir: "/run/secure-ai/tmp"
   outputs: "/var/lib/secure-ai/vault/outputs"
 
+gpu:
+  # Auto-detected at first boot by detect-gpu.sh. Override here if needed.
+  # backend: auto | cuda | rocm | xpu | vulkan | mps | cpu
+  backend: "auto"
+
 inference:
   engine: "llama-cpp"
   bind: "127.0.0.1:8465"
diff --git a/files/system/usr/lib/systemd/system/secure-ai-diffusion.service b/files/system/usr/lib/systemd/system/secure-ai-diffusion.service
@@ -15,6 +15,7 @@ Environment=OUTPUTS_DIR=/var/lib/secure-ai/vault/outputs
 Environment=APPLIANCE_CONFIG=/etc/secure-ai/config/appliance.yaml
 Environment=MAX_RESOLUTION=2048
 Environment=MAX_STEPS=100
+EnvironmentFile=-/var/lib/secure-ai/inference.env
 
 # Sandboxing
 ProtectSystem=strict
@@ -29,9 +30,13 @@ ProtectControlGroups=yes
 NoNewPrivileges=yes
 RestrictSUIDSGID=yes
 MemoryDenyWriteExecute=no
-# GPU access requires broader syscalls and device access
+# GPU access — NVIDIA (CUDA), AMD (ROCm), Intel (DRI/XPU)
 DeviceAllow=/dev/nvidia* rw
+DeviceAllow=/dev/nvidiactl rw
+DeviceAllow=/dev/nvidia-uvm rw
+DeviceAllow=/dev/nvidia-uvm-tools rw
 DeviceAllow=/dev/dri/* rw
+DeviceAllow=/dev/kfd rw
 SupplementaryGroups=video render
 
 [Install]
diff --git a/files/system/usr/lib/systemd/system/secure-ai-inference.service b/files/system/usr/lib/systemd/system/secure-ai-inference.service
@@ -58,12 +58,13 @@ SystemCallFilter=~@privileged @mount @clock @debug @swap @reboot @module @cpu-em
 SystemCallArchitectures=native
 SystemCallErrorNumber=EPERM
 
-# GPU access — do NOT set PrivateDevices (need /dev/nvidia*, /dev/dri)
+# GPU access — NVIDIA (CUDA), AMD (ROCm via /dev/kfd + /dev/dri), Intel (DRI)
 DeviceAllow=/dev/nvidia* rw
 DeviceAllow=/dev/dri/* rw
 DeviceAllow=/dev/nvidiactl rw
 DeviceAllow=/dev/nvidia-uvm rw
 DeviceAllow=/dev/nvidia-uvm-tools rw
+DeviceAllow=/dev/kfd rw
 
 # Resource limits — generous for inference
 MemoryMax=32G
diff --git a/files/system/usr/libexec/secure-ai/detect-gpu.sh b/files/system/usr/libexec/secure-ai/detect-gpu.sh
@@ -0,0 +1,77 @@
+#!/bin/bash
+#
+# Detect available GPU compute backends and write results to inference.env.
+# Called by secure-ai-firstboot.service and can be re-run manually.
+# Writes: GPU_BACKEND, GPU_NAME, GPU_LAYERS to /var/lib/secure-ai/inference.env
+#
+set -euo pipefail
+
+ENV_FILE="/var/lib/secure-ai/inference.env"
+BACKEND="cpu"
+GPU_NAME="CPU (no GPU detected)"
+GPU_LAYERS="0"
+
+echo "=== SecAI GPU Detection ==="
+
+# --- NVIDIA (CUDA) ---
+if command -v nvidia-smi &>/dev/null && nvidia-smi &>/dev/null; then
+    BACKEND="cuda"
+    GPU_NAME=$(nvidia-smi --query-gpu=name --format=csv,noheader,nounits | head -1)
+    GPU_LAYERS="-1"
+    echo "Detected NVIDIA GPU: ${GPU_NAME}"
+
+# --- AMD (ROCm) ---
+elif [ -e /dev/kfd ] && [ -e /dev/dri/renderD128 ]; then
+    BACKEND="rocm"
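+    # Try rocminfo first (it lists CPU agents before GPUs, so take the first GPU agent), fall back to DRI
+    if command -v rocminfo &>/dev/null; then
+        GPU_NAME=$(rocminfo 2>/dev/null | awk -F': *' '/Marketing Name/ {n=$2; sub(/ +$/, "", n)} /Device Type:.*GPU/ {print n; found=1; exit} END {if (!found) print "AMD GPU"}' || true)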
+    else
+        GPU_NAME=$(cat /sys/class/drm/card0/device/product_name 2>/dev/null || echo "AMD GPU")
+    fi
+    GPU_LAYERS="-1"
+    echo "Detected AMD GPU (ROCm): ${GPU_NAME}"
+
+# --- Intel (XPU / Arc / integrated) ---
+elif [ -e /dev/dri/renderD128 ]; then
+    # Check if it's an Intel GPU via sysfs
+    DRM_VENDOR=$(cat /sys/class/drm/card0/device/vendor 2>/dev/null || echo "")
+    if [ "$DRM_VENDOR" = "0x8086" ]; then
+        BACKEND="xpu"
+        GPU_NAME=$(cat /sys/class/drm/card0/device/product_name 2>/dev/null || echo "Intel GPU")
+        # Intel Arc discrete GPUs get full offload; integrated GPUs get none (layers=0)
+        if command -v intel_gpu_top &>/dev/null || [[ "$GPU_NAME" == *"Arc"* ]]; then
+            GPU_LAYERS="-1"
+        else
+            GPU_LAYERS="0"  # integrated Intel — CPU inference is usually faster
+        fi
+        echo "Detected Intel GPU: ${GPU_NAME}"
+    else
+        echo "DRI device found but vendor ${DRM_VENDOR} not recognized for compute"
+    fi
+fi
+
+# --- Vulkan fallback check ---
+if [ "$BACKEND" = "cpu" ] && command -v vulkaninfo &>/dev/null; then
+    VULKAN_GPU=$(vulkaninfo --summary 2>/dev/null | grep -m1 "deviceName" | sed 's/.*= *//' || echo "")
+    if [ -n "$VULKAN_GPU" ]; then
+        BACKEND="vulkan"
+        GPU_NAME="$VULKAN_GPU (Vulkan)"
+        GPU_LAYERS="-1"
+        echo "Detected Vulkan-capable GPU: ${GPU_NAME}"
+    fi
+fi
+
+echo "Result: backend=${BACKEND} gpu=${GPU_NAME} layers=${GPU_LAYERS}"
+
+# Write environment file for inference and diffusion services
+mkdir -p "$(dirname "$ENV_FILE")"
+cat > "$ENV_FILE" <<EOF
+# Auto-detected by detect-gpu.sh — re-run to update
+GPU_BACKEND=${BACKEND}
+GPU_NAME="${GPU_NAME}"
+GPU_LAYERS=${GPU_LAYERS}
+EOF
+
+echo "Written to ${ENV_FILE}"
+echo "=== GPU Detection Complete ==="
diff --git a/files/system/usr/libexec/secure-ai/firstboot.sh b/files/system/usr/libexec/secure-ai/firstboot.sh
@@ -79,6 +79,17 @@ EOF
     chmod 644 "${SECURE_AI_ROOT}/registry/manifest.json"
 fi
 
+# Detect GPU and write inference.env
+log "Running GPU detection..."
+if ! ( set -o pipefail; /usr/libexec/secure-ai/detect-gpu.sh 2>&1 | while IFS= read -r line; do log "$line"; done ); then
+    log "WARNING: GPU detection failed. Defaulting to CPU."
+    cat > "${SECURE_AI_ROOT}/inference.env" <<'GPUEOF'
+GPU_BACKEND=cpu
+GPU_NAME="CPU (detection failed)"
+GPU_LAYERS=0
+GPUEOF
+fi
+
 # Disable swap (belt-and-suspenders alongside kernel arg)
 log "Ensuring swap is disabled..."
 swapoff -a 2>/dev/null || true
diff --git a/recipes/recipe.yml b/recipes/recipe.yml
@@ -7,7 +7,7 @@ base-image: ghcr.io/ublue-os/silverblue-main
 image-version: 42
 
 modules:
-  # 1) Install required packages.
+  # 1) Install required packages (includes GPU compute libraries).
   - type: rpm-ostree
     install:
       - nftables
@@ -20,6 +20,13 @@ modules:
       - python3-flask
       - python3-requests
       - golang
+      # GPU compute support
+      - mesa-dri-drivers          # Intel/AMD OpenGL (DRI) drivers
+      - mesa-vulkan-drivers       # Vulkan for Intel/AMD
+      - vulkan-loader             # Vulkan ICD loader
+      - vulkan-tools              # vulkaninfo for diagnostics
+      - libdrm                    # DRM library (all GPUs)
+      - clinfo                    # OpenCL diagnostics
 
   # 2) Copy appliance config, systemd units, firewall rules, sysctl into image.
   - type: files
@@ -32,18 +39,23 @@ modules:
     scripts:
       - build-services.sh
 
-  # 4) Install NVIDIA kernel modules (nvidia-open for RTX 5080).
+  # 4) NVIDIA kernel modules (nvidia-open for RTX 5080+).
+  #    AMD and Intel use in-kernel drivers (amdgpu, i915/xe) — no akmods needed.
   - type: akmods
     base: main
     install: []
     nvidia-driver: nvidia-open
 
-  # 5) Kernel args: NVIDIA setup + disable swap + security hardening.
+  # 5) Kernel args: GPU setup + disable swap + security hardening.
   - type: kargs
     append:
+      # NVIDIA
       - rd.driver.blacklist=nouveau
       - modprobe.blacklist=nouveau
       - nvidia-drm.modeset=1
+      # AMD: amdgpu is in-kernel; explicitly enable its Display Core (DC) display path
+      - amdgpu.dc=1
+      # Security hardening
       - systemd.swap=0
       - slab_nomerge
       - init_on_alloc=1
@@ -65,6 +77,7 @@ modules:
         - secure-ai-ui.service
         - secure-ai-quarantine-watcher.service
         - secure-ai-inference.service
+        - secure-ai-diffusion.service
         - nftables.service
         - secure-ai-firstboot.service
         - secure-ai-tmpdir.mount
diff --git a/services/diffusion-worker/Containerfile b/services/diffusion-worker/Containerfile
@@ -1,5 +1,8 @@
 # Diffusion worker: image and video generation via diffusers library.
-# Build arg COMPUTE selects CUDA or CPU-only.
+# Build arg COMPUTE selects the GPU backend:
+#   cuda  - NVIDIA GPUs (CUDA 12.4)
+#   rocm  - AMD GPUs (ROCm 6.1)
+#   cpu   - CPU only (AVX2/AVX-512 optimized, works everywhere)
 ARG COMPUTE=cuda
 
 FROM python:3.12-slim AS base
@@ -10,9 +13,11 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     libgl1-mesa-glx libglib2.0-0 && \
     rm -rf /var/lib/apt/lists/*
 
-# Install PyTorch (CUDA or CPU)
+# Install PyTorch with the appropriate compute backend
 RUN if [ "$COMPUTE" = "cuda" ]; then \
         pip install --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/cu124; \
+    elif [ "$COMPUTE" = "rocm" ]; then \
+        pip install --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1; \
     else \
         pip install --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/cpu; \
     fi
@@ -26,6 +31,11 @@ RUN pip install --no-cache-dir \
     pyyaml \
     Pillow
 
+# Intel XPU support (install IPEX when building for CPU — it auto-detects Intel GPUs)
+RUN if [ "$COMPUTE" = "cpu" ]; then \
+        pip install --no-cache-dir intel-extension-for-pytorch 2>/dev/null || true; \
+    fi
+
 COPY app.py /app/app.py
 
 WORKDIR /app
diff --git a/services/diffusion-worker/app.py b/services/diffusion-worker/app.py
diff --git a/services/inference-worker/Containerfile b/services/inference-worker/Containerfile