|
9 | 9 | - [Linux](#linux) |
10 | 10 | - [Windows](#windows) |
11 | 11 | - [Environment Variable](#environment-variable) |
| 12 | +- [Design Rule](#design-rule) |
12 | 13 | - [Known Issues](#known-issues) |
13 | 14 | - [Q&A](#qa) |
14 | 15 | - [TODO](#todo) |
@@ -41,6 +42,9 @@ The following releases are verified and recommended: |
41 | 42 |
|
42 | 43 | ## News |
43 | 44 |
|
| 45 | +- 2026.03 |
| 46 | + - Support Flash-Attention: it reduces memory usage; the performance impact depends on the LLM. |
| 47 | + |
44 | 48 | - 2026.02 |
45 | 49 | - Remove support for Nvidia & AMD GPUs because the oneAPI plugins for Nvidia & AMD GPUs are unavailable: their download/installation channels no longer work, so users can't build the software for Nvidia & AMD GPUs. |
46 | 50 |
|
@@ -685,13 +689,36 @@ use 1 SYCL GPUs: [0] with Max compute units:512 |
685 | 689 | | Name | Value | Function | |
686 | 690 | |-------------------|------------------|---------------------------------------------------------------------------------------------------------------------------| |
687 | 691 | | GGML_SYCL_DEBUG | 0 (default) or 1 | Enable debug logging through the macro GGML_SYCL_DEBUG | |
| 692 | +| GGML_SYCL_ENABLE_FLASH_ATTN | 1 (default) or 0 | Enable Flash-Attention. It can reduce memory usage. The performance impact depends on the LLM. | |
688 | 693 | | GGML_SYCL_DISABLE_OPT | 0 (default) or 1 | Disable optimization features for Intel GPUs. (Setting it to 1 is recommended for Intel devices older than Gen 10) | |
689 | 694 | | GGML_SYCL_DISABLE_GRAPH | 0 or 1 (default) | Disable running computations through the SYCL Graphs feature. Disabled by default because SYCL Graph is still under development and does not yet improve performance. | |
690 | 695 | | GGML_SYCL_DISABLE_DNN | 0 (default) or 1 | Disable running computations through oneDNN and always use oneMKL. | |
691 | 696 | | ZES_ENABLE_SYSMAN | 0 (default) or 1 | Enable querying the free memory of the GPU through sycl::aspect::ext_intel_free_memory.<br>Recommended when --split-mode = layer | |
692 | 697 | | UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS | 0 (default) or 1 | Allow device memory allocations larger than 4GB. | |
693 | 698 |
|
| 699 | +## Design Rule |
| 700 | + |
| 701 | +- Open to all contributors. |
| 702 | + |
| 703 | +- All code changes should be useful to users: |
| 704 | + - Fix a bug. |
| 705 | + - Add a new function. |
| 706 | + - Improve performance or ease of use. |
| 707 | + - Make the code easier to maintain. |
| 708 | + - ... |
| 709 | + |
| 710 | +- Code in the following cases is not accepted: |
| 711 | + - It breaks a legacy function. |
| 712 | + - It reduces the performance of a legacy case by default. |
| 713 | + - The work is incomplete or its functionality cannot be demonstrated. |
| 714 | + |
| 715 | +- Using environment variables to enable or disable features is encouraged. |
| 716 | + - Users can evaluate a feature without rebuilding the code. |
| 717 | + - Recommend the best features to users by enabling them by default. |
| 718 | + |
| 719 | +- Design the code based on the published official releases of oneAPI packages: compiler, library, driver, OS kernel. |
694 | 720 |
|
| 721 | +- Developers need to maintain the code they submit. |
695 | 722 |
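The environment-variable guideline in the Design Rule list can be illustrated with a minimal C++ sketch. It is not taken from the SYCL backend source: the helper names `get_env_int` and `flash_attn_enabled` are hypothetical, and only the variable name `GGML_SYCL_ENABLE_FLASH_ATTN` and its default of 1 come from the table above. The idea is that a feature is on by default and a user can turn it off at run time without rebuilding.

```cpp
#include <cstdlib>

// Hypothetical helper (an assumption, not the backend's actual code):
// read an integer environment variable, falling back to default_value
// when the variable is unset, empty, or not a valid integer.
static int get_env_int(const char * name, int default_value) {
    const char * value = std::getenv(name);
    if (value == nullptr || *value == '\0') {
        return default_value;
    }
    char * end = nullptr;
    const long parsed = std::strtol(value, &end, 10);
    if (*end != '\0') {
        return default_value;  // trailing garbage: ignore the setting
    }
    return static_cast<int>(parsed);
}

// Hypothetical feature toggle: enabled by default (1), and the user
// sets GGML_SYCL_ENABLE_FLASH_ATTN=0 to disable it without a rebuild.
static bool flash_attn_enabled() {
    return get_env_int("GGML_SYCL_ENABLE_FLASH_ATTN", 1) != 0;
}
```

With a toggle of this shape, the recommended feature stays on for users who set nothing, while anyone comparing behavior can export the variable as 0 before launching the program.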
|
696 | 723 | ## Known Issues |
697 | 724 |
|
|