Conversation
- Updates the llama.cpp submodule to the latest version.
- Adapts the code to the new llama.cpp API.
- Fixes the build process.
- Updates the tests to reflect the changes in the embeddings.
Fixes #18
Worked for me on Ubuntu as a minimal test script. If anyone else sees the size difference and is confused, that seems(?) to be expected (2 MB -> 76 kB). I wasn't able to figure out cross-compiling it to Windows, sadly. Thanks for this @rodydavis
Excellent work on this PR @rodydavis! The API adaptations are spot-on.

One thing worth noting: the llama.cpp update changes the embedding values produced by BERT models. The test uses "alex garcia" with all-MiniLM-L6-v2:

- Old llama.cpp (2b33896): first float ≈ -0.092

I investigated the cause. It appears to be due to ggml-org/llama.cpp@6562e5a ("context: allow cache-less context for embeddings"), which optimizes BERT models to skip KV cache allocation. A side effect is that llama_decode() now redirects to llama_encode() for these models, which returns the [CLS] token embedding instead of the last token embedding. Claude advises me that "this new behavior is actually more correct for BERT models: [CLS] is the designated sentence-level representation."

Absolute values have changed, but semantic similarity is preserved. Nonetheless, embeddings from the old and new versions aren't directly comparable. The lesson: users need to be aware that llama.cpp updates can change the embeddings a model produces, which may require stored embeddings to be regenerated.
@rodydavis you might also be interested in checking out PR#21
Updates the llama.cpp submodule and adapts code to the new API:

- llama_tokenize() now takes a vocab from llama_model_get_vocab()
- llama_n_embd() -> llama_model_n_embd()
- llama_kv_cache_clear() -> llama_memory_clear(llama_get_memory(), false)
- llama_token_get_score() -> llama_vocab_get_score()
- llama_token_to_piece() now takes a vocab and an additional parameter
- llama_load_model_from_file() -> llama_model_load_from_file()
- llama_new_context_with_model() -> llama_init_from_model()
- llama_free_model() -> llama_model_free()
- ggml_static -> ggml in CMakeLists.txt
- Remove seed from context_options (no longer supported)

Based on PR asg017#19 by @rodydavis.

Co-Authored-By: Rody Davis <rody.davis.jr@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
- Add Darwin arm64/x86_64 architecture detection in Makefile
- Add tests/__pycache__/ to .gitignore

From PR asg017#19 by @rodydavis.

Co-Authored-By: Rody Davis <rody.davis.jr@gmail.com>