
Commit ffba4f2

examples : add debug utility/example (#18464)
* examples : add debug utility/example

This commit introduces a new example named llama-debug, a utility intended to assist with developing and debugging a converted model.

The motivation for this utility is to assist in model conversion work by verifying that the model produces the expected outputs. It is intended to replace logits.cpp in examples/model-conversion.

Example usage:
```console
./build/bin/llama-debug \
    -m models/Qwen2.5-0.5B-Instruct.gguf \
    --prompt "Hello, my name is" \
    --save-logits
...
Model add_bos: false
Input prompt: "Hello, my name is"
Token ids (5): Hello(9707) ,(11) my(847) name(829) is(374)
Data saved to data/llamacpp-Qwen2.5-0.5B-Instruct.bin
Data saved to data/llamacpp-Qwen2.5-0.5B-Instruct.txt
Prompt saved to data/llamacpp-Qwen2.5-0.5B-Instruct-prompt.txt
Tokens saved to data/llamacpp-Qwen2.5-0.5B-Instruct-tokens.bin
```

For more details about the options available for this example, please refer to examples/debug/README.md.

* throw runtime error instead of logging error

* remove params.warmup and enable the warmup/nowarmup option

* model-conversion : remove logits.cpp

This commit removes logits.cpp in favor of using llama-debug for generating logits and embeddings.

* examples : remove model-conversion directory

This was missed in the previous commit.

* model-conversion : add support for saving prompt and token ids

This commit adds support for storing the prompt and the token ids for the prompt when running the original models.

The motivation for this is that it allows us to compare the prompt and the tokens generated for the prompt when verifying the converted model. Currently, even if the same prompt is used, the generated tokens can differ when the tokenization differs between the original and the converted model, and this would go unnoticed (the verification will most likely fail, but it might not be obvious why).

* squash! model-conversion : add support for saving prompt and token ids

fix pyright errors.

* model-conversion : add compare_tokens utility

This commit adds a script to compare token outputs between original and converted models.

Example usage:
```console
(venv) $ ./scripts/utils/compare_tokens.py pytorch-gemma-3-270m-it llamacpp-gemma-3-270m-it-bf16

Comparing tokens between:
  Original : pytorch-gemma-3-270m-it (6 tokens)
  Converted: llamacpp-gemma-3-270m-it-bf16 (6 tokens)

✅ All 6 tokens match!
```

There is also a verbose flag that prints out the prompts:
```console
(venv) $ ./scripts/utils/compare_tokens.py pytorch-gemma-3-270m-it llamacpp-gemma-3-270m-it-bf16 -v

Original model prompt (pytorch-gemma-3-270m-it):
  prompt: Hello, my name is
  n_tokens: 6
  token ids: 2, 9259, 236764, 1041, 1463, 563

Converted model prompt (llamacpp-gemma-3-270m-it-bf16):
  prompt: Hello, my name is
  n_tokens: 6
  token ids: 2, 9259, 236764, 1041, 1463, 563

Comparing tokens between:
  Original : pytorch-gemma-3-270m-it (6 tokens)
  Converted: llamacpp-gemma-3-270m-it-bf16 (6 tokens)

✅ All 6 tokens match!
```

* model-conversion : add token comparison to verification scripts

This commit adds a call to the compare_tokens function in compare-logits.py and semantic_check.py to ensure that the token ids the tokenizers produce are the same before proceeding with verifying the logits/embeddings.

Placing the call in the existing scripts, instead of calling it separately, ensures that the token comparison is always done prior to the logit/embedding verification.

A follow-up commit/PR could refactor the causal logits verification into a single script instead of the two that exist now. This would reduce the code and make it consistent with the embeddings verification, which only has a single script.

* debug : use llama_model_n_embd_out

This commit updates the debug example to use the new function llama_model_n_embd_out instead of llama_model_n_embd.

The motivation for this change is to support late interaction retriever models, like LFM2-ColBert-350M, where the output embeddings are down-projected to a lower dimension.

* debug : add print_usage function

This commit adds a print_usage function that is passed to common_params_parse. The motivation for this is that it enables a specific usage message which is printed after all the options, for example:
```console
example usage:

  Print tensors:

    ./build/bin/llama-debug -m model.gguf -p "Hello my name is" --verbose

    The tensors to be printed can be filtered with --tensor-filter option.

  Save logits/embeddings:

    ./build/bin/llama-debug -m model.gguf -p "Hello my name is" --save-logits

    Add --embedding to save embeddings
```
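
The token comparison described in the commit message above can be illustrated with a minimal C++ sketch. This is not the code of compare_tokens.py; the helper name `first_token_mismatch` is hypothetical, and the sketch assumes the token ids have already been loaded into memory as 32-bit integers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not part of llama.cpp): returns the index of the first
// position where the two token-id sequences differ, or std::nullopt when they
// are identical. A length difference is reported as a mismatch at the end of
// the shorter sequence.
std::optional<std::size_t> first_token_mismatch(const std::vector<int32_t> & original,
                                                const std::vector<int32_t> & converted) {
    const std::size_t n = std::min(original.size(), converted.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (original[i] != converted[i]) {
            return i; // same position, different token id
        }
    }
    if (original.size() != converted.size()) {
        return n; // one sequence has extra tokens
    }
    return std::nullopt; // all tokens match
}
```

Returning the mismatch position rather than a plain boolean makes it easier to see where the two tokenizers start to diverge, which is the situation the commit message says would otherwise go unnoticed.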
1 parent 3333951 commit ffba4f2

17 files changed

Lines changed: 725 additions & 319 deletions

common/arg.cpp

Lines changed: 25 additions & 4 deletions
@@ -1445,7 +1445,7 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
         [](common_params & params, bool value) {
             params.warmup = value;
         }
-    ).set_examples({LLAMA_EXAMPLE_COMPLETION, LLAMA_EXAMPLE_CLI, LLAMA_EXAMPLE_SERVER, LLAMA_EXAMPLE_MTMD, LLAMA_EXAMPLE_EMBEDDING, LLAMA_EXAMPLE_RETRIEVAL, LLAMA_EXAMPLE_PERPLEXITY}));
+    ).set_examples({LLAMA_EXAMPLE_COMPLETION, LLAMA_EXAMPLE_CLI, LLAMA_EXAMPLE_SERVER, LLAMA_EXAMPLE_MTMD, LLAMA_EXAMPLE_EMBEDDING, LLAMA_EXAMPLE_RETRIEVAL, LLAMA_EXAMPLE_PERPLEXITY, LLAMA_EXAMPLE_DEBUG}));
     add_opt(common_arg(
         {"--spm-infill"},
         string_format(
@@ -1761,7 +1761,7 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
             else if (value == "rank") { params.pooling_type = LLAMA_POOLING_TYPE_RANK; }
             else { throw std::invalid_argument("invalid value"); }
         }
-    ).set_examples({LLAMA_EXAMPLE_EMBEDDING, LLAMA_EXAMPLE_RETRIEVAL, LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_POOLING"));
+    ).set_examples({LLAMA_EXAMPLE_EMBEDDING, LLAMA_EXAMPLE_RETRIEVAL, LLAMA_EXAMPLE_SERVER, LLAMA_EXAMPLE_DEBUG}).set_env("LLAMA_ARG_POOLING"));
     add_opt(common_arg(
         {"--attention"}, "{causal,non-causal}",
         "attention type for embeddings, use model default if unspecified",
@@ -2609,7 +2609,7 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
         [](common_params & params, int value) {
             params.embd_normalize = value;
         }
-    ).set_examples({LLAMA_EXAMPLE_EMBEDDING}));
+    ).set_examples({LLAMA_EXAMPLE_EMBEDDING, LLAMA_EXAMPLE_DEBUG}));
     add_opt(common_arg(
         {"--embd-output-format"}, "FORMAT",
         "empty = default, \"array\" = [[],[]...], \"json\" = openai style, \"json+\" = same \"json\" + cosine similarity matrix, \"raw\" = plain whitespace-delimited output (one embedding per line)",
@@ -2687,7 +2687,7 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
         [](common_params & params) {
             params.embedding = true;
         }
-    ).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_EMBEDDINGS"));
+    ).set_examples({LLAMA_EXAMPLE_SERVER, LLAMA_EXAMPLE_DEBUG}).set_env("LLAMA_ARG_EMBEDDINGS"));
     add_opt(common_arg(
         {"--rerank", "--reranking"},
         string_format("enable reranking endpoint on server (default: %s)", "disabled"),
@@ -3378,6 +3378,27 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
             }
         }
     ).set_examples({ LLAMA_EXAMPLE_FINETUNE }));
+    add_opt(common_arg(
+        {"--save-logits"},
+        string_format("save final logits to files for verification (default: %s)", params.save_logits ? "true" : "false"),
+        [](common_params & params) {
+            params.save_logits = true;
+        }
+    ).set_examples({LLAMA_EXAMPLE_DEBUG}));
+    add_opt(common_arg(
+        {"--logits-output-dir"}, "PATH",
+        string_format("directory for saving logits output files (default: %s)", params.logits_output_dir.c_str()),
+        [](common_params & params, const std::string & value) {
+            params.logits_output_dir = value;
+        }
+    ).set_examples({LLAMA_EXAMPLE_DEBUG}));
+    add_opt(common_arg(
+        {"--tensor-filter"}, "REGEX",
+        "filter tensor names for debug output (regex pattern, can be specified multiple times)",
+        [](common_params & params, const std::string & value) {
+            params.tensor_filter.push_back(value);
+        }
+    ).set_examples({LLAMA_EXAMPLE_DEBUG}));
 
     // presets
     add_opt(common_arg(
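
The `--tensor-filter` option above collects one regex per occurrence into `params.tensor_filter`. How debug.cpp applies those patterns is not shown in this part of the diff, so the following is only a sketch under stated assumptions: a tensor is selected when any pattern matches part of its name, and an empty filter list selects every tensor. The helper name `tensor_name_selected` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not taken from debug.cpp): decides whether a tensor
// should be printed given the patterns collected from --tensor-filter.
// Assumptions: an empty pattern list selects everything, and a pattern only
// needs to match a substring of the name (std::regex_search, not regex_match).
bool tensor_name_selected(const std::string & name,
                          const std::vector<std::string> & patterns) {
    if (patterns.empty()) {
        return true; // no filter given: keep every tensor
    }
    for (const std::string & pattern : patterns) {
        const std::regex re(pattern); // throws std::regex_error on an invalid pattern
        if (std::regex_search(name, re)) {
            return true; // any single matching pattern is enough
        }
    }
    return false;
}
```

With these assumptions, passing the option several times acts as a logical OR over the patterns. A real implementation would likely compile each regex once up front rather than once per tensor.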

common/common.h

Lines changed: 6 additions & 0 deletions
@@ -80,6 +80,7 @@ int32_t cpu_get_num_math();
 //
 
 enum llama_example {
+    LLAMA_EXAMPLE_DEBUG,
     LLAMA_EXAMPLE_COMMON,
     LLAMA_EXAMPLE_SPECULATIVE,
     LLAMA_EXAMPLE_COMPLETION,
@@ -372,6 +373,11 @@ struct common_params {
     std::string lookup_cache_dynamic = ""; // path of dynamic ngram cache file for lookup decoding // NOLINT
     std::string logits_file = ""; // file for saving *all* logits // NOLINT
 
+    // llama-debug specific options
+    std::string logits_output_dir = "data"; // directory for saving logits output files // NOLINT
+    bool save_logits = false; // whether to save logits to files // NOLINT
+    std::vector<std::string> tensor_filter; // filter tensor names for debug output (regex) // NOLINT
+
     std::vector<std::string> in_files; // all input files
     std::vector<std::string> antiprompt; // strings upon which more user input is prompted (a.k.a. reverse prompts)
     std::vector<llama_model_kv_override> kv_overrides;

examples/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -15,6 +15,7 @@ llama_add_compile_flags()
 if (EMSCRIPTEN)
 else()
     add_subdirectory(batched)
+    add_subdirectory(debug)
     add_subdirectory(embedding)
     add_subdirectory(eval-callback)
 
@@ -34,7 +35,6 @@ else()
     add_subdirectory(gen-docs)
     add_subdirectory(training)
     add_subdirectory(diffusion)
-    add_subdirectory(model-conversion)
     if (NOT GGML_BACKEND_DL)
         add_subdirectory(convert-llama2c-to-ggml)
         # these examples use the backends directly and cannot be built with dynamic loading
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
-set(TARGET llama-logits)
-add_executable(${TARGET} logits.cpp)
+set(TARGET llama-debug)
+add_executable(${TARGET} debug.cpp)
 install(TARGETS ${TARGET} RUNTIME)
 target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
 target_compile_features(${TARGET} PRIVATE cxx_std_17)

examples/debug/README.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+# llama.cpp/examples/debug
+
+This is a utility intended to help debug a model by registering a callback that
+logs GGML operations and tensor data. It can also store the generated logits or
+embeddings as well as the prompt and token ids for comparison with the original
+model.
+
+### Usage
+
+```shell
+llama-debug \
+    --hf-repo ggml-org/models \
+    --hf-file phi-2/ggml-model-q4_0.gguf \
+    --model phi-2-q4_0.gguf \
+    --prompt hello \
+    --save-logits \
+    --verbose
+```
+The tensor data is logged at debug level and requires the `--verbose` flag. The
+reason for this is that, while useful, for a model with many layers there can be
+a lot of output. You can filter the tensor names using the `--tensor-filter` option.
+
+A recommended approach is to first run without `--verbose` and see if the
+generated logits/embeddings are close to the original model. If they are not,
+then it might be required to inspect tensor by tensor and in that case it is
+useful to enable the `--verbose` flag along with `--tensor-filter` to focus on
+specific tensors.
+
+### Options
+This example supports all standard `llama.cpp` options and also accepts the
+following options:
+```console
+$ llama-debug --help
+...
+
+----- example-specific params -----
+
+--save-logits                           save final logits to files for verification (default: false)
+--logits-output-dir PATH                directory for saving logits output files (default: data)
+--tensor-filter REGEX                   filter tensor names for debug output (regex pattern, can be specified multiple times)
+```
+
+### Output Files
+
+When `--save-logits` is enabled, the following files are created in the output
+directory:
+
+* `llamacpp-<model>[-embeddings].bin` - Binary output (logits or embeddings)
+* `llamacpp-<model>[-embeddings].txt` - Text output (logits or embeddings, one per line)
+* `llamacpp-<model>[-embeddings]-prompt.txt` - Prompt text and token IDs
+* `llamacpp-<model>[-embeddings]-tokens.bin` - Binary token IDs for programmatic comparison
+
+These files can be compared against the original model's output to verify the
+converted model.
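
As an illustration of what comparing the saved logits against the original model's output could look like, here is a minimal C++ sketch. It is not the metric used by the repository's verification scripts, which this diff does not show. It assumes both outputs have already been loaded as equally sized vectors of 32-bit floats, and the helper name `max_abs_diff` is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of llama.cpp): largest element-wise absolute
// difference between two logit (or embedding) vectors of the same length.
float max_abs_diff(const std::vector<float> & original,
                   const std::vector<float> & converted) {
    if (original.size() != converted.size()) {
        // Different lengths usually mean different vocab or embedding sizes.
        throw std::invalid_argument("vectors differ in length");
    }
    float worst = 0.0f;
    for (std::size_t i = 0; i < original.size(); ++i) {
        worst = std::max(worst, std::fabs(original[i] - converted[i]));
    }
    return worst;
}
```

A caller would compare the returned value against a tolerance of its own choosing. A suitable threshold depends on the model and on the data type of the converted weights, so none is suggested here.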
