Commit 369c50a

fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc (#9423)
* fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc

  The upstream PR #21203 (server: respect the ignore_eos flag) has been merged into the TheTom/llama-cpp-turboquant feature/turboquant-kv-cache branch. With the fix now in-tree, 0001-server-respect-the-ignore-eos-flag.patch no longer applies (git apply sees its additions already present) and the nightly turboquant bump fails.

  Retire the patch and bump the pin to the first fork revision that carries the merged fix (tag feature-turboquant-kv-cache-b8967-627ebbc). This matches the contract in apply-patches.sh: drop patches once the fork catches up.

* fix(turboquant): patch out get_media_marker() call in grpc-server copy

  CI turboquant docker build was failing with:

    grpc-server.cpp:2825:40: error: use of undeclared identifier 'get_media_marker'

  The call was added by 7809c5f (PR #9412) to propagate the mtmd random per-server media marker that upstream landed in ggml-org/llama.cpp#21962. The TheTom/llama-cpp-turboquant fork branched before that PR, so its server-common.cpp has no such symbol.

  Extend patch-grpc-server.sh to substitute get_media_marker() with the legacy "<__media__>" literal in the build-time grpc-server.cpp copy under turboquant-<flavor>-build/. The fork's mtmd_default_marker() returns exactly that string, and the Go layer falls back to the same sentinel when media_marker is empty, so behavior on the turboquant path is unchanged.

  Patched copy only: the shared source under backend/cpp/llama-cpp/ keeps compiling against vanilla upstream.

  Verified by running `make docker-build-turboquant` locally end-to-end: all five flavors (avx, avx2, avx512, fallback, grpc+rpc-server) now compile past the previous failure and the image tags successfully.
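The get_media_marker() substitution described in the second fix can be sketched in isolation as follows. This is a minimal sketch, not the script from this commit: the function name `replace_media_marker` and the `patched` / `skipped` status strings are hypothetical, and the demo runs against a throwaway temp file rather than a real grpc-server.cpp copy.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not part of the commit): replace every
# get_media_marker() call in a file with the legacy "<__media__>" literal.
# Prints "patched" when it rewrote the file and "skipped" when there was
# nothing to replace, so re-running it on the same file is harmless.
replace_media_marker() {
    local src="$1"
    if grep -q 'get_media_marker()' "$src"; then
        # Write to a temp file and move it into place instead of using sed -i.
        sed 's/get_media_marker()/"<__media__>"/g' "$src" > "$src.tmp"
        mv "$src.tmp" "$src"
        echo "patched"
    else
        echo "skipped"
    fi
}

# Demo on a scratch file standing in for the build-time copy.
demo_file="$(mktemp)"
printf 'std::string marker = get_media_marker();\n' > "$demo_file"
replace_media_marker "$demo_file"   # first run rewrites the call site
replace_media_marker "$demo_file"   # second run finds nothing to do
rm -f "$demo_file"
```

Because the guard greps for the call before touching the file, a second run on an already-patched copy is a no-op, which is the idempotence property the commit message relies on.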
1 parent 75a63f8 commit 369c50a

3 files changed

Lines changed: 58 additions & 118 deletions

backend/cpp/turboquant/Makefile

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 
 # Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
 # Auto-bumped nightly by .github/workflows/bump_deps.yaml.
-TURBOQUANT_VERSION?=45f8a066ed5f5bb38c695cec532f6cef9f4efa9d
+TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8
 LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant
 
 CMAKE_ARGS?=
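
Since the pin is rewritten automatically, a quick sanity check on its shape can catch a truncated or malformed value before a build starts. The sketch below is an illustration only: `read_pin` and `is_full_sha` are hypothetical helpers that do not appear in this commit, and the check assumes pins are always full 40-character lowercase hex commit hashes, as both values in the hunk above are.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print the value of the first "NAME?=value" line
# found in a Makefile. Returns non-zero if no such line exists.
read_pin() {
    local makefile="$1" name="$2" line
    # In a basic regular expression, "?" is a literal character.
    line="$(grep -m1 "^${name}?=" "$makefile")" || return 1
    # Strip everything up to and including the first "=".
    printf '%s\n' "${line#*=}"
}

# Hypothetical helper: succeed only for a full 40-character hex commit hash.
is_full_sha() {
    [[ "$1" =~ ^[0-9a-f]{40}$ ]]
}

# Demo on a scratch Makefile that mirrors the lines in the hunk above.
demo_mk="$(mktemp)"
printf '%s\n' \
    'TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8' \
    'LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant' > "$demo_mk"
pin="$(read_pin "$demo_mk" TURBOQUANT_VERSION)"
if is_full_sha "$pin"; then
    echo "pin looks like a full commit hash: $pin"
fi
rm -f "$demo_mk"
```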
Lines changed: 57 additions & 34 deletions
@@ -1,13 +1,22 @@
 #!/bin/bash
-# Augment the shared backend/cpp/llama-cpp/grpc-server.cpp allow-list of KV-cache
-# types so the gRPC `LoadModel` call accepts the TurboQuant-specific
-# `turbo2` / `turbo3` / `turbo4` cache types.
+# Patch the shared backend/cpp/llama-cpp/grpc-server.cpp *copy* used by the
+# turboquant build to account for two gaps between upstream and the fork:
 #
-# We do this on the *copy* sitting in turboquant-<flavor>-build/, never on the
-# original under backend/cpp/llama-cpp/, so the stock llama-cpp build keeps
-# compiling against vanilla upstream which does not know about GGML_TYPE_TURBO*.
+#   1. Augment the kv_cache_types[] allow-list so `LoadModel` accepts the
+#      fork-specific `turbo2` / `turbo3` / `turbo4` cache types.
+#   2. Replace `get_media_marker()` (added upstream in ggml-org/llama.cpp#21962,
+#      server-side random per-instance marker) with the legacy "<__media__>"
+#      literal. The fork branched before that PR, so server-common.cpp has no
+#      get_media_marker symbol. The fork's mtmd_default_marker() still returns
+#      "<__media__>", and Go-side tooling falls back to that sentinel when the
+#      backend does not expose media_marker, so substituting the literal keeps
+#      behavior identical on the turboquant path.
 #
-# Idempotent: skips the insertion if the marker is already present (so re-runs
+# We patch the *copy* sitting in turboquant-<flavor>-build/, never the original
+# under backend/cpp/llama-cpp/, so the stock llama-cpp build keeps compiling
+# against vanilla upstream.
+#
+# Idempotent: skips each insertion if its marker is already present (so re-runs
 # of the same build dir don't double-insert).
 
 set -euo pipefail
@@ -25,33 +34,47 @@ if [[ ! -f "$SRC" ]]; then
 fi
 
 if grep -q 'GGML_TYPE_TURBO2_0' "$SRC"; then
-    echo "==> $SRC already has TurboQuant cache types, skipping"
-    exit 0
-fi
+    echo "==> $SRC already has TurboQuant cache types, skipping KV allow-list patch"
+else
+    echo "==> patching $SRC to allow turbo2/turbo3/turbo4 KV-cache types"
 
-echo "==> patching $SRC to allow turbo2/turbo3/turbo4 KV-cache types"
-
-# Insert the three TURBO entries right after the first ` GGML_TYPE_Q5_1,`
-# line (the kv_cache_types[] allow-list). Using awk because the builder image
-# does not ship python3, and GNU sed's multi-line `a\` quoting is awkward.
-awk '
-/^ GGML_TYPE_Q5_1,$/ && !done {
-    print
-    print " // turboquant fork extras — added by patch-grpc-server.sh"
-    print " GGML_TYPE_TURBO2_0,"
-    print " GGML_TYPE_TURBO3_0,"
-    print " GGML_TYPE_TURBO4_0,"
-    done = 1
-    next
-}
-{ print }
-END {
-    if (!done) {
-        print "patch-grpc-server.sh: anchor ` GGML_TYPE_Q5_1,` not found" > "/dev/stderr"
-        exit 1
+    # Insert the three TURBO entries right after the first ` GGML_TYPE_Q5_1,`
+    # line (the kv_cache_types[] allow-list). Using awk because the builder image
+    # does not ship python3, and GNU sed's multi-line `a\` quoting is awkward.
+    awk '
+    /^ GGML_TYPE_Q5_1,$/ && !done {
+        print
+        print " // turboquant fork extras — added by patch-grpc-server.sh"
+        print " GGML_TYPE_TURBO2_0,"
+        print " GGML_TYPE_TURBO3_0,"
+        print " GGML_TYPE_TURBO4_0,"
+        done = 1
+        next
     }
-}
-' "$SRC" > "$SRC.tmp"
-mv "$SRC.tmp" "$SRC"
+    { print }
+    END {
+        if (!done) {
+            print "patch-grpc-server.sh: anchor ` GGML_TYPE_Q5_1,` not found" > "/dev/stderr"
+            exit 1
+        }
+    }
+    ' "$SRC" > "$SRC.tmp"
+    mv "$SRC.tmp" "$SRC"
+
+    echo "==> KV allow-list patch OK"
+fi
+
+if grep -q 'get_media_marker()' "$SRC"; then
+    echo "==> patching $SRC to replace get_media_marker() with legacy \"<__media__>\" literal"
+    # Only one call site today (ModelMetadata), but replace all occurrences to
+    # stay robust if upstream adds more. Use a temp file to avoid relying on
+    # sed -i portability (the builder image uses GNU sed, but keeping this
+    # consistent with the awk block above).
+    sed 's/get_media_marker()/"<__media__>"/g' "$SRC" > "$SRC.tmp"
+    mv "$SRC.tmp" "$SRC"
+    echo "==> get_media_marker() substitution OK"
+else
+    echo "==> $SRC has no get_media_marker() call, skipping media-marker patch"
+fi
 
-echo "==> patched OK"
+echo "==> all patches applied"
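
The guarded insert-after-anchor pattern used for the KV allow-list can be tried in isolation with the sketch below. It is a simplified stand-in for the script, not a copy of it: `insert_turbo_types` and its `patched` / `skipped` status strings are hypothetical, the anchor regex tolerates any leading spaces or tabs (an assumption, since the exact indentation of the real anchor is not recoverable from this page), and the four-space indentation of the inserted lines is likewise illustrative.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: insert the three TURBO entries after the first
# GGML_TYPE_Q5_1 line of a file. Skips the work when the marker is already
# present and fails (leaving the file untouched) when the anchor is missing.
insert_turbo_types() {
    local src="$1"
    if grep -q 'GGML_TYPE_TURBO2_0' "$src"; then
        echo "skipped"
        return 0
    fi
    if ! awk '
        /^[ \t]*GGML_TYPE_Q5_1,$/ && !done {
            print
            print "    GGML_TYPE_TURBO2_0,"
            print "    GGML_TYPE_TURBO3_0,"
            print "    GGML_TYPE_TURBO4_0,"
            done = 1
            next
        }
        { print }
        END { if (!done) exit 1 }
    ' "$src" > "$src.tmp"; then
        rm -f "$src.tmp"
        echo "anchor GGML_TYPE_Q5_1 not found in $src" >&2
        return 1
    fi
    mv "$src.tmp" "$src"
    echo "patched"
}

# Demo on a scratch file that imitates the allow-list.
demo_src="$(mktemp)"
printf '%s\n' '    GGML_TYPE_Q5_0,' '    GGML_TYPE_Q5_1,' '};' > "$demo_src"
insert_turbo_types "$demo_src"   # inserts the three entries
insert_turbo_types "$demo_src"   # marker found, nothing inserted twice
rm -f "$demo_src"
```

Failing loudly when the anchor is absent matters here: if upstream ever reorders the allow-list, a silent no-op would produce a build that rejects the turbo cache types at load time instead of at patch time.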

backend/cpp/turboquant/patches/0001-server-respect-the-ignore-eos-flag.patch

Lines changed: 0 additions & 83 deletions
This file was deleted.