Commit fa7c133

Merge pull request #497 from janhq/update-dev-from-master-2026-04-26-01-01
Sync master with upstream release b8933
2 parents ee1b17c + dcad77c commit fa7c133

29 files changed

Lines changed: 5195 additions & 1985 deletions

.github/pull_request_template.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 
 <!-- You can provide more details and link related discussions here. Delete this section if not applicable -->
 
-# Requirements
+## Requirements
 
 <!-- IMPORTANT: Please do NOT delete this section, otherwise your PR may be rejected -->
 

.gitignore

Lines changed: 11 additions & 1 deletion
@@ -34,7 +34,6 @@
 /.vscode/
 /nppBackup
 
-
 # Coverage
 
 /gcovr-report/
@@ -74,6 +73,7 @@
 !/models/templates
 
 # Zig
+
 /zig-out/
 /zig-cache/
 
@@ -93,6 +93,7 @@
 !/examples/sycl/*.sh
 
 # Server Web UI temporary files
+
 /tools/server/webui/node_modules
 /tools/server/webui/dist
 # we no longer use gz for index.html
@@ -106,9 +107,11 @@ __pycache__/
 poetry.toml
 
 # Nix
+
 /result
 
 # Test binaries
+
 /tests/test-backend-ops
 /tests/test-double-float
 /tests/test-grad0
@@ -124,6 +127,7 @@ poetry.toml
 /tests/test-tokenizer-1-spm
 
 # Scripts
+
 !/scripts/install-oneapi.bat
 
 # Generated by scripts
@@ -132,18 +136,24 @@ poetry.toml
 /wikitext-2-raw/
 
 # Test models for lora adapters
+
 /lora-tests
 
 # Local scripts
+
 /run-vim.sh
 /run-chat.sh
 /run-spec.sh
 /.ccache/
 
 # IDE
+
 /*.code-workspace
 /.windsurf/
 # emscripten
 a.out.*
 
+# AGENTS
+
 AGENTS.local.md
+.pi/SYSTEM.md

.pi/gg/SYSTEM.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+You are a coding agent. Here are some very important rules that you must follow:
+
+General:
+- By very precise and concise when writing code, comments, explanations, etc.
+- PR and commit titles format: `<module> : <title>`. Lookup recents for examples
+- Don't try to build or run the code unless you are explicitly asked to do so
+
+Coding:
+- When in doubt, always refer to the CONTRIBUTING.md file of the project
+- When referencing issues or PRs in comments, use the format:
+  - C/C++ code: `// ref: <url>`
+  - Other (CMake, etc.): `# ref: <url>`
+
+Pull requests (PRs):
+- New branch names are prefixed with "gg/"
+- Before opening a pull request, ask the user to confirm the description
+- When creating a pull request, look for the repository's PR template and follow it
+- For the AI usage disclosure section, write "YES. llama.cpp + pi"
+- Always create the pull requests in draft mode
+
+Commits:
+- On every commit that you make, include a "Assisted-by: llama.cpp:local pi" tag
+- Do not explicitly set the git author in commits - rely on the default git config
+
+Resources (read on demand):
+- [CONTRIBUTING.md](CONTRIBUTING.md)
+- [Build documentation](docs/build.md)
+- [Server usage documentation](tools/server/README.md)
+- [Server development documentation](tools/server/README-dev.md)
+- [PEG parser](docs/development/parsing.md)
+- [Auto parser](docs/autoparser.md)
+- [Jinja engine](common/jinja/README.md)
+- [PR template](.github/pull_request_template.md)

common/chat-diff-analyzer.cpp

Lines changed: 4 additions & 4 deletions
@@ -296,7 +296,7 @@ void analyze_reasoning::compare_reasoning_presence() {
             return p.literal(reasoning_content) + p.space() + p.optional(p.tag("post", (p.marker() + p.space())) + p.rest());
         });
         auto parser_wrapped = build_tagged_peg_parser([&](common_peg_parser_builder &p) {
-            return p.tag("pre", p.marker() + p.space()) + p.literal(reasoning_content) + p.space() + p.tag("post", (p.marker() + p.space())) + p.rest();
+            return p.tag("pre", p.marker() + p.space()) + p.literal(reasoning_content) + p.tag("post", (p.space() + p.marker() + p.space())) + p.rest();
         });
         // try the more aggressive parse first, if it fails, fall back to the delimiter one
         auto result = parser_wrapped.parse_anywhere_and_extract(comparison->output_B);
@@ -306,11 +306,11 @@ void analyze_reasoning::compare_reasoning_presence() {
         if (result.result.success()) {
             if (!result.tags["pre"].empty() && !result.tags["post"].empty()) {
                 mode = reasoning_mode::TAG_BASED;
-                start = trim_leading_whitespace(result.tags["pre"]);
-                end = trim_trailing_whitespace(result.tags["post"]);
+                start = result.tags["pre"];
+                end = result.tags["post"];
             } else if (!result.tags["post"].empty()) {
                 mode = reasoning_mode::TAG_BASED;
-                end = trim_trailing_whitespace(result.tags["post"]);
+                end = result.tags["post"];
             }
         }
     }

common/speculative.cpp

Lines changed: 17 additions & 9 deletions
@@ -61,18 +61,26 @@ static bool common_speculative_are_compatible(
     LOG_DBG("%s: vocab_type dft: %d\n", __func__, vocab_type_dft);
 
     if (vocab_type_tgt != vocab_type_dft) {
-        LOG_DBG("%s: draft model vocab type must match target model to use speculation but ", __func__);
-        LOG_DBG("vocab_type_dft = %d while vocab_type_tgt = %d\n", vocab_type_dft, vocab_type_tgt);
+        LOG_WRN("%s: draft model vocab type must match target model to use speculation but "
+                "vocab_type_dft = %d while vocab_type_tgt = %d\n", __func__, vocab_type_dft, vocab_type_tgt);
         return false;
     }
 
-    if (
-        llama_vocab_get_add_bos(vocab_tgt) != llama_vocab_get_add_bos(vocab_dft) ||
-        llama_vocab_get_add_eos(vocab_tgt) != llama_vocab_get_add_eos(vocab_dft) ||
-        llama_vocab_bos(vocab_tgt) != llama_vocab_bos(vocab_dft) ||
-        llama_vocab_eos(vocab_tgt) != llama_vocab_eos(vocab_dft)
-    ) {
-        LOG_DBG("%s: draft model special tokens must match target model to use speculation\n", __func__);
+    if (llama_vocab_get_add_bos(vocab_tgt) != llama_vocab_get_add_bos(vocab_dft) ||
+        (llama_vocab_get_add_bos(vocab_tgt) && llama_vocab_bos(vocab_tgt) != llama_vocab_bos(vocab_dft))) {
+        LOG_WRN("%s: draft model bos tokens must match target model to use speculation. add: %d - %d, id: %d - %d)\n",
+                __func__,
+                llama_vocab_get_add_bos(vocab_tgt), llama_vocab_get_add_bos(vocab_dft),
+                llama_vocab_bos(vocab_tgt), llama_vocab_bos(vocab_dft));
+        return false;
+    }
+
+    if (llama_vocab_get_add_eos(vocab_tgt) != llama_vocab_get_add_eos(vocab_dft) ||
+        (llama_vocab_get_add_eos(vocab_tgt) && llama_vocab_eos(vocab_tgt) != llama_vocab_eos(vocab_dft))) {
+        LOG_WRN("%s: draft model eos tokens must match target model to use speculation. add: %d - %d, id: %d - %d)\n",
+                __func__,
+                llama_vocab_get_add_eos(vocab_tgt), llama_vocab_get_add_eos(vocab_dft),
+                llama_vocab_eos(vocab_tgt), llama_vocab_eos(vocab_dft));
         return false;
     }
 

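Note on the common/speculative.cpp change: the new check appears to relax the old rule. BOS/EOS token ids now only need to match when the corresponding "add" flag is set, while the flags themselves must still agree. The following is a minimal standalone sketch of that predicate, not the project's code. The `vocab_info` struct and both helper functions are hypothetical stand-ins for the `llama_vocab_*` queries in the diff.

```cpp
#include <cstdint>

// Hypothetical stand-in for the values returned by llama_vocab_get_add_bos(),
// llama_vocab_bos() and their EOS counterparts in the diff.
struct vocab_info {
    bool    add_bos;
    bool    add_eos;
    int32_t bos_id;
    int32_t eos_id;
};

// A special token is compatible when both models agree on whether it is added
// and, only if it is added, they also agree on its id.
static bool special_token_compatible(bool add_tgt, bool add_dft, int32_t id_tgt, int32_t id_dft) {
    if (add_tgt != add_dft) {
        return false;
    }
    return !add_tgt || id_tgt == id_dft;
}

// Mirrors the two separate BOS and EOS checks introduced by the diff.
static bool special_tokens_compatible(const vocab_info & tgt, const vocab_info & dft) {
    return special_token_compatible(tgt.add_bos, dft.add_bos, tgt.bos_id, dft.bos_id) &&
           special_token_compatible(tgt.add_eos, dft.add_eos, tgt.eos_id, dft.eos_id);
}
```

Under this sketch, two models that never add BOS or EOS are treated as compatible even if their BOS/EOS ids differ, which the removed condition would have rejected.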
docs/backend/SYCL.md

Lines changed: 20 additions & 0 deletions
@@ -51,6 +51,12 @@ The packages for FP32 and FP16 would have different accuracy and performance on
 
 ## News
 
+- 2026.04
+
+  - Optimize mul_mat by reorder feature for data type: Q4_K, Q5_K, Q_K, Q8_0.
+  - Fused MoE.
+  - Upgrate CI and built package for oneAPI 2025.3.3, support Ubuntu 24.04 built package.
+
 - 2026.03
   - Support Flash-Attention: less memory usage, performance impact depends on LLM.
 
@@ -349,6 +355,12 @@ Choose one of following methods to run.
 ./examples/sycl/test.sh
 ```
 
+- Run llama-server:
+
+```sh
+./examples/sycl/start-svr.sh -m PATH/MODEL_FILE
+```
+
 2. Command line
 Launch inference
 
@@ -637,10 +649,18 @@ Choose one of following methods to run.
 
 1. Script
 
+- Run test:
+
 ```
 examples\sycl\win-test.bat
 ```
 
+- Run llama-server:
+
+```
+examples\sycl\win-start-svr.bat -m PATH\MODEL_FILE
+```
+
 2. Command line
 
 Launch inference

docs/ops.md

Lines changed: 3 additions & 3 deletions
@@ -26,7 +26,7 @@ Legend:
 | CLAMP |||||| 🟡 | 🟡 | 🟡 ||||
 | CONCAT |||| 🟡 || 🟡 ||||||
 | CONT || 🟡 |||| 🟡 | 🟡 || 🟡 |||
-| CONV_2D ||||||||| |||
+| CONV_2D ||||||||| |||
 | CONV_2D_DW ||||||||||||
 | CONV_3D ||||||||||||
 | CONV_TRANSPOSE_1D ||||||||||||
@@ -60,7 +60,7 @@ Legend:
 | GROUP_NORM ||||||||||||
 | HARDSIGMOID |||| 🟡 |||| 🟡 ||||
 | HARDSWISH |||| 🟡 |||| 🟡 ||||
-| IM2COL ||||||||| |||
+| IM2COL ||||||||| |||
 | IM2COL_3D ||||||||||||
 | L2_NORM ||||||||||||
 | LEAKY_RELU ||||| 🟡 ||| 🟡 ||||
@@ -105,7 +105,7 @@ Legend:
 | SQR ||||||| 🟡 | 🟡 ||||
 | SQRT ||||||| 🟡 | 🟡 ||||
 | SSM_CONV ||||||||||||
-| SSM_SCAN |||||||| 🟡 | |||
+| SSM_SCAN |||||||| 🟡 | |||
 | STEP |||| 🟡 |||| 🟡 ||||
 | SUB ||||| 🟡 |||||||
 | SUM || 🟡 || 🟡 | 🟡 || 🟡 | 🟡 | 🟡 |||
