Commit 4e1bd3d

kvc0samuraieng
authored and committed
server: (anthropic API) fix prefix caching (ggml-org#21793)
When testing claude code against llama.cpp, I noticed that only n_past 18577 was used even when context was 60k or more. The log in llama-server says:

```
slot update_slots: id 3 | task 10342 | old: ... ; cch= | defa0;You are
slot update_slots: id 3 | task 10342 | new: ... ; cch= | 1c8b4;
```

I observed that the cch value changed every time. Reading about that, the x-anthropic-billing-header system message seems to be specially handled inside of the anthropic api. I could remove it, but there is a meaningful string sometimes included at the end. So instead, I just replace the changing cch checksum with fffff.

I'm treating this as an anthropic message body API detail - I think this is the right way to do this, but by all means please correct me! It's always 5 hexadecimal characters, but I've written the replacement defensively in case they change the protocol.
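To make the symptom concrete, here is a minimal sketch of why a changing `cch` value caps cache reuse. It assumes a simplified character-level comparison; the server's cache works on tokens, and `common_prefix_length`, `make_prompt`, and `demo_prefix_break` are hypothetical names that do not exist in llama.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of llama.cpp): number of leading characters
// two prompts share. A prefix cache can only reuse work for this shared run.
static size_t common_prefix_length(const std::string & a, const std::string & b) {
    size_t n = 0;
    while (n < a.size() && n < b.size() && a[n] == b[n]) {
        ++n;
    }
    return n;
}

// Hypothetical helper: builds a system prompt shaped like the example header
// from this commit, with the given cch value spliced in.
static std::string make_prompt(const std::string & cch) {
    return "x-anthropic-billing-header: cc_version=2.1.101.e51; cc_entrypoint=cli; cch=" + cch +
           ";You are Claude Code, Anthropic's official CLI for Claude.";
}

// Usage sketch: two requests that differ only in cch share nothing past "cch=".
static void demo_prefix_break() {
    const std::string a = make_prompt("defa0");
    const std::string b = make_prompt("1c8b4");
    assert(common_prefix_length(a, b) == a.find("cch=") + 4);

    // With the value pinned to the same string, the whole prompt matches.
    assert(common_prefix_length(make_prompt("fffff"), make_prompt("fffff")) == a.size());
}
```

Everything after the first differing character has to be reprocessed, which matches the observation that the reused prefix stopped far short of the full context.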
1 parent a23af0f commit 4e1bd3d

1 file changed

Lines changed: 40 additions & 1 deletion

tools/server/server-chat.cpp

@@ -281,6 +281,42 @@ json server_chat_convert_responses_to_chatcmpl(const json & response_body) {
     return chatcmpl_body;
 }
 
+// Edits the cch section of an "x-anthropic-billing-header" system prompt.
+// Does nothing to any other prompt.
+//
+// This is a claude message with a "cch=ef01a" attribute that breaks prefix caching.
+// The cch stamp is a whitebox end-to-end integrity hint. It's not meaningful as a
+// system prompt data, particularly to llama.cpp, but its presence means the prefix
+// cache will not get past it: It changes on each request.
+//
+// Reference: https://github.com/ggml-org/llama.cpp/pull/21793
+// Example header:
+// ```
+// x-anthropic-billing-header: cc_version=2.1.101.e51; cc_entrypoint=cli; cch=a5145;You are Claude Code, Anthropic's official CLI for Claude.
+//                                                                            ^^^^^
+// ```
+static void normalize_anthropic_billing_header(std::string & system_text) {
+    if (system_text.rfind("x-anthropic-billing-header:", 0) != 0) {
+        return;
+    }
+
+    const size_t header_prefix_length = strlen("x-anthropic-billing-header:");
+    const size_t cch_length = 5;
+    const size_t index_cch = system_text.find("cch=", header_prefix_length);
+    if (index_cch == std::string::npos) {
+        return;
+    }
+
+    const size_t index_replace = index_cch + 4;
+    if (index_replace + cch_length < system_text.length() && system_text[index_replace + cch_length] == ';') {
+        for (size_t i = 0; i < cch_length; ++i) {
+            system_text[index_replace + i] = 'f';
+        }
+    } else {
+        LOG_ERR("anthropic string not as expected: %s", system_text.c_str());
+    }
+}
+
 json server_chat_convert_anthropic_to_oai(const json & body) {
     json oai_body;

@@ -292,10 +328,13 @@ json server_chat_convert_anthropic_to_oai(const json & body) {
 
         if (system_param.is_string()) {
             system_content = system_param.get<std::string>();
+            normalize_anthropic_billing_header(system_content);
         } else if (system_param.is_array()) {
             for (const auto & block : system_param) {
                 if (json_value(block, "type", std::string()) == "text") {
-                    system_content += json_value(block, "text", std::string());
+                    auto system_text = json_value(block, "text", std::string());
+                    normalize_anthropic_billing_header(system_text);
+                    system_content += system_text;
                 }
             }
         }
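
For readers who want to try the normalization outside the server, the following is a standalone adaptation of the function in the diff, offered as a sketch and not as the committed code. Two things are assumptions made for illustration: the name `normalize_billing_header_sketch`, and the `bool` return value that stands in for the `LOG_ERR` call so the snippet compiles without llama.cpp's logging headers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Standalone adaptation of normalize_anthropic_billing_header from the diff.
// Assumption for this sketch: it returns a bool instead of calling LOG_ERR.
//   true  -> not a billing header, no "cch=" field, or cch was overwritten
//   false -> "cch=" was found but is not 5 characters followed by ';'
static bool normalize_billing_header_sketch(std::string & system_text) {
    static const std::string header_prefix = "x-anthropic-billing-header:";
    const size_t cch_length = 5;

    // Only touch prompts that start with the billing header.
    if (system_text.compare(0, header_prefix.size(), header_prefix) != 0) {
        return true;
    }

    const size_t index_cch = system_text.find("cch=", header_prefix.size());
    if (index_cch == std::string::npos) {
        return true;
    }

    // Overwrite the value only if it has the expected shape: 5 chars then ';'.
    const size_t index_replace = index_cch + 4;
    if (index_replace + cch_length < system_text.length() &&
        system_text[index_replace + cch_length] == ';') {
        system_text.replace(index_replace, cch_length, cch_length, 'f');
        return true;
    }
    return false;
}

// Usage sketch with the example header from the code comment in the diff.
static void demo_normalize() {
    std::string header =
        "x-anthropic-billing-header: cc_version=2.1.101.e51; cc_entrypoint=cli; "
        "cch=a5145;You are Claude Code, Anthropic's official CLI for Claude.";
    assert(normalize_billing_header_sketch(header));
    assert(header.find("cch=fffff;You are Claude Code") != std::string::npos);
}
```

Because the overwrite is the same length as the original value, the rest of the prompt keeps its position, and the text after the `;` is left untouched.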
