docs(ci): explain the GGUF model cache (purpose, no flag, vs sccache)

claude · claude · commit 9a1d4931c966 · 2026-06-20T14:40:39.000Z
Expand the inline comment on the model-cache step. The cache exists to avoid re-downloading ~5 GB of GGUF test models from HuggingFace on every run (and to dodge HF rate limits). It is always ON by design, with no on/off flag, unlike the sccache compiler cache, which the `use_cache` input / USE_CACHE env toggles. The comment also notes that the step uses GitHub's free cache, not Depot.

Comment-only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LjWiKSyNzqqpobSKYRiew5
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
@@ -586,14 +586,18 @@ jobs:
         with:
           name: Linux-x86_64-libraries
           path: ${{ github.workspace }}/src/main/resources/net/ladenthin/llama/
+      # GGUF model cache: introduced to stop re-downloading ~5 GB of test models from
+      # HuggingFace on every run (also dodges HF rate limits). Complements the sccache compiler
+      # cache but is always ON: there is intentionally NO on/off flag for it (it is GitHub's
+      # free cache, so safe to leave on), whereas the sccache cache is toggled by the `use_cache`
+      # workflow_dispatch input / USE_CACHE env. Not Depot: GB-scale blobs are usage-priced
+      # there and its file cache needs Depot-hosted runners. See CLAUDE.md.
       - name: Cache GGUF models (GitHub Actions cache; avoids re-downloading from HuggingFace)
         uses: actions/cache@v5
         with:
           path: models/
-          # Shared, stable key across all test jobs (GGUF files are platform-independent, so
-          # ubuntu + macOS share one entry). Bump the suffix when the model set/URLs change.
-          # Uses GitHub's free 10 GB/repo cache — NOT Depot: these are GB-scale blobs and Depot
-          # is usage-priced + its file cache needs Depot-hosted runners (see CLAUDE.md).
+          # GGUF is platform-independent, so ubuntu + macOS share one entry;
+          # bump the suffix when the model set / URLs change.
           key: gguf-models-v1
       - name: Download text generation model
         run: test -f models/${MODEL_NAME} || curl -L --proto =https --proto-redir =https --fail --retry 5 --retry-all-errors ${MODEL_URL} --create-dirs -o models/${MODEL_NAME}
@@ -719,14 +723,18 @@ jobs:
         with:
           name: macos-14-libraries
           path: ${{ github.workspace }}/src/main/resources/net/ladenthin/llama/
+      # GGUF model cache: introduced to stop re-downloading ~5 GB of test models from
+      # HuggingFace on every run (also dodges HF rate limits). Complements the sccache compiler
+      # cache but is always ON: there is intentionally NO on/off flag for it (it is GitHub's
+      # free cache, so safe to leave on), whereas the sccache cache is toggled by the `use_cache`
+      # workflow_dispatch input / USE_CACHE env. Not Depot: GB-scale blobs are usage-priced
+      # there and its file cache needs Depot-hosted runners. See CLAUDE.md.
       - name: Cache GGUF models (GitHub Actions cache; avoids re-downloading from HuggingFace)
         uses: actions/cache@v5
         with:
           path: models/
-          # Shared, stable key across all test jobs (GGUF files are platform-independent, so
-          # ubuntu + macOS share one entry). Bump the suffix when the model set/URLs change.
-          # Uses GitHub's free 10 GB/repo cache — NOT Depot: these are GB-scale blobs and Depot
-          # is usage-priced + its file cache needs Depot-hosted runners (see CLAUDE.md).
+          # GGUF is platform-independent, so ubuntu + macOS share one entry;
+          # bump the suffix when the model set / URLs change.
           key: gguf-models-v1
       - name: Download text generation model
         run: test -f models/${MODEL_NAME} || curl -L --proto =https --proto-redir =https --fail --retry 5 --retry-all-errors ${MODEL_URL} --create-dirs -o models/${MODEL_NAME}
@@ -795,14 +803,18 @@ jobs:
         with:
           name: macos-15-libraries
           path: ${{ github.workspace }}/src/main/resources/net/ladenthin/llama/
+      # GGUF model cache: introduced to stop re-downloading ~5 GB of test models from
+      # HuggingFace on every run (also dodges HF rate limits). Complements the sccache compiler
+      # cache but is always ON: there is intentionally NO on/off flag for it (it is GitHub's
+      # free cache, so safe to leave on), whereas the sccache cache is toggled by the `use_cache`
+      # workflow_dispatch input / USE_CACHE env. Not Depot: GB-scale blobs are usage-priced
+      # there and its file cache needs Depot-hosted runners. See CLAUDE.md.
       - name: Cache GGUF models (GitHub Actions cache; avoids re-downloading from HuggingFace)
         uses: actions/cache@v5
         with:
           path: models/
-          # Shared, stable key across all test jobs (GGUF files are platform-independent, so
-          # ubuntu + macOS share one entry). Bump the suffix when the model set/URLs change.
-          # Uses GitHub's free 10 GB/repo cache — NOT Depot: these are GB-scale blobs and Depot
-          # is usage-priced + its file cache needs Depot-hosted runners (see CLAUDE.md).
+          # GGUF is platform-independent, so ubuntu + macOS share one entry;
+          # bump the suffix when the model set / URLs change.
           key: gguf-models-v1
       - name: Download text generation model
         run: test -f models/${MODEL_NAME} || curl -L --proto =https --proto-redir =https --fail --retry 5 --retry-all-errors ${MODEL_URL} --create-dirs -o models/${MODEL_NAME}
@@ -871,14 +883,18 @@ jobs:
         with:
           name: macos-15-metal-libraries
           path: ${{ github.workspace }}/src/main/resources/net/ladenthin/llama/
+      # GGUF model cache: introduced to stop re-downloading ~5 GB of test models from
+      # HuggingFace on every run (also dodges HF rate limits). Complements the sccache compiler
+      # cache but is always ON: there is intentionally NO on/off flag for it (it is GitHub's
+      # free cache, so safe to leave on), whereas the sccache cache is toggled by the `use_cache`
+      # workflow_dispatch input / USE_CACHE env. Not Depot: GB-scale blobs are usage-priced
+      # there and its file cache needs Depot-hosted runners. See CLAUDE.md.
       - name: Cache GGUF models (GitHub Actions cache; avoids re-downloading from HuggingFace)
         uses: actions/cache@v5
         with:
           path: models/
-          # Shared, stable key across all test jobs (GGUF files are platform-independent, so
-          # ubuntu + macOS share one entry). Bump the suffix when the model set/URLs change.
-          # Uses GitHub's free 10 GB/repo cache — NOT Depot: these are GB-scale blobs and Depot
-          # is usage-priced + its file cache needs Depot-hosted runners (see CLAUDE.md).
+          # GGUF is platform-independent, so ubuntu + macOS share one entry;
+          # bump the suffix when the model set / URLs change.
           key: gguf-models-v1
       - name: Download text generation model
         run: test -f models/${MODEL_NAME} || curl -L --proto =https --proto-redir =https --fail --retry 5 --retry-all-errors ${MODEL_URL} --create-dirs -o models/${MODEL_NAME}