
Commit fa3eedc

PMZFXArberSephirotheca authored and committed
[SYCL] Fix reorder MMVQ assert on unaligned vocab sizes (ggml-org#22035)
* [SYCL] Fix reorder MMVQ assert on unaligned vocab sizes

  The reorder mul_mat_vec_q dispatchers for Q4_0, Q8_0, Q4_K, and Q6_K asserted that block_num_y was a multiple of 16 subgroups. Models with a vocab size not divisible by 16 (for example HY-MT at 120818) aborted on model load when the output projection tripped the assert.

  I replaced the assert with padding: block_num_y now rounds up to a whole number of subgroup-sized workgroups. The kernel already has the row bounds check (`if (row >= nrows) return;`), so the extra padded threads early-exit cleanly. Row values are uniform across a subgroup, so the collective reduce stays safe. For aligned vocab sizes the padded block_num_y equals the old value, so the kernel launch is identical and there is no regression.

  Thanks to @arthw for flagging the relationship to ggml-org#21527. Fixes ggml-org#22020. AI assisted coding, tested on Intel B70 hardware.

* sycl: use WARP_SIZE for num_subgroups in reorder MMVQ launches

  Replaces the hardcoded 16 with WARP_SIZE in the four reorder_mul_mat_vec launch helpers (Q4_0, Q8_0, Q4_K, Q6_K). Compile-time no-op on the Intel target where WARP_SIZE is 16, but makes the relationship to subgroup size explicit.

  Per review by @NeoZhangJianyu on ggml-org#22035. Assisted by Claude.
1 parent ae1f5b8 commit fa3eedc

1 file changed

Lines changed: 12 additions & 12 deletions

File tree

ggml/src/ggml-sycl/mmvq.cpp

@@ -537,9 +537,9 @@ static void mul_mat_vec_q_iq4_xs_q8_1(const void *__restrict__ vx,
 static void reorder_mul_mat_vec_q4_0_q8_1_sycl(const void * vx, const void * vy, float * dst, const int ncols,
                                                const int nrows, dpct::queue_ptr stream) {
     GGML_ASSERT(ncols % QK4_0 == 0);
-    const int block_num_y = ceil_div(nrows, GGML_SYCL_MMV_Y);
-    constexpr size_t num_subgroups = 16;
-    GGML_ASSERT(block_num_y % num_subgroups == 0);
+    // Round up to a whole number of subgroup-sized workgroups; out-of-range rows are skipped inside the kernel.
+    constexpr size_t num_subgroups = WARP_SIZE;
+    const int block_num_y = ceil_div(nrows, GGML_SYCL_MMV_Y * (int) num_subgroups) * (int) num_subgroups;
 
     const sycl::range<3> global_size(1, GGML_SYCL_MMV_Y, (block_num_y * WARP_SIZE));
     const sycl::range<3> workgroup_size(1, GGML_SYCL_MMV_Y, num_subgroups * WARP_SIZE);
@@ -682,9 +682,9 @@ static void mul_mat_vec_q5_1_q8_1_sycl(const void *vx, const void *vy,
 static void reorder_mul_mat_vec_q8_0_q8_1_sycl(const void * vx, const void * vy, float * dst, const int ncols,
                                                const int nrows, dpct::queue_ptr stream) {
     GGML_ASSERT(ncols % QK8_0 == 0);
-    const int block_num_y = ceil_div(nrows, GGML_SYCL_MMV_Y);
-    constexpr size_t num_subgroups = 16;
-    GGML_ASSERT(block_num_y % num_subgroups == 0);
+    // Round up to a whole number of subgroup-sized workgroups; out-of-range rows are skipped inside the kernel.
+    constexpr size_t num_subgroups = WARP_SIZE;
+    const int block_num_y = ceil_div(nrows, GGML_SYCL_MMV_Y * (int) num_subgroups) * (int) num_subgroups;
 
     const sycl::range<3> global_size(1, GGML_SYCL_MMV_Y, (block_num_y * WARP_SIZE));
     const sycl::range<3> workgroup_size(1, GGML_SYCL_MMV_Y, num_subgroups * WARP_SIZE);
@@ -798,9 +798,9 @@ static void reorder_mul_mat_vec_q4_k_q8_1_sycl(const void * vx, const void * vy,
                                                const int nrows, dpct::queue_ptr stream) {
     GGML_ASSERT(ncols % QK_K == 0);
 
-    const int block_num_y = ceil_div(nrows, GGML_SYCL_MMV_Y);
-    constexpr size_t num_subgroups = 16;
-    GGML_ASSERT(block_num_y % num_subgroups == 0);
+    // Round up to a whole number of subgroup-sized workgroups; out-of-range rows are skipped inside the kernel.
+    constexpr size_t num_subgroups = WARP_SIZE;
+    const int block_num_y = ceil_div(nrows, GGML_SYCL_MMV_Y * (int) num_subgroups) * (int) num_subgroups;
 
     const sycl::range<3> global_size(1, GGML_SYCL_MMV_Y, block_num_y * WARP_SIZE);
     const sycl::range<3> workgroup_size(1, GGML_SYCL_MMV_Y, num_subgroups * WARP_SIZE);
@@ -842,9 +842,9 @@ static void mul_mat_vec_q5_K_q8_1_sycl(const void *vx, const void *vy,
 static void reorder_mul_mat_vec_q6_k_q8_1_sycl(const void * vx, const void * vy, float * dst, const int ncols,
                                                const int nrows, dpct::queue_ptr stream) {
     GGML_ASSERT(ncols % QK_K == 0);
-    const int block_num_y = ceil_div(nrows, GGML_SYCL_MMV_Y);
-    constexpr size_t num_subgroups = 16;
-    GGML_ASSERT(block_num_y % num_subgroups == 0);
+    // Round up to a whole number of subgroup-sized workgroups; out-of-range rows are skipped inside the kernel.
+    constexpr size_t num_subgroups = WARP_SIZE;
+    const int block_num_y = ceil_div(nrows, GGML_SYCL_MMV_Y * (int) num_subgroups) * (int) num_subgroups;
 
     const sycl::range<3> global_size(1, GGML_SYCL_MMV_Y, block_num_y * WARP_SIZE);
     const sycl::range<3> workgroup_size(1, GGML_SYCL_MMV_Y, num_subgroups * WARP_SIZE);
