
Commit b2c5e1b

resolve comments

Signed-off-by: Ceng23333 <441651826@qq.com>
1 parent 9ab03c7

File tree

13 files changed: +991 −351 lines

.gitmodules

Lines changed: 0 additions & 4 deletions

```diff
@@ -5,7 +5,3 @@
 	path = third_party/nlohmann_json
 	url = https://github.com/nlohmann/json.git
 	branch = master
-[submodule "third_party/infllmv2_cuda_impl"]
-	path = third_party/infllmv2_cuda_impl
-	url = https://github.com/Ceng23333/infllmv2_cuda_impl.git
-	branch = minicpm_sala_patches
```
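Working trees that had already initialized the removed submodule can be left with stale state after pulling this commit. A hedged cleanup sketch using standard git commands (each step tolerates the submodule being absent already):

```shell
# Remove leftover state for the now-optional infllmv2_cuda_impl submodule.
git submodule deinit -f third_party/infllmv2_cuda_impl 2>/dev/null || true
git rm -r --cached third_party/infllmv2_cuda_impl 2>/dev/null || true
rm -rf .git/modules/third_party/infllmv2_cuda_impl
git config --remove-section submodule.third_party/infllmv2_cuda_impl 2>/dev/null || true
```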

README.md

Lines changed: 65 additions & 1 deletion

````diff
@@ -39,7 +39,7 @@ API definitions and usage: see [`InfiniCore文档`](https://github.com/Infin
 
 ### 1. Clone the project
 
-Since the repository contains submodules, add `--recursive` or `--recurse-submodules` when cloning, e.g.:
+Since the repository contains submodules (such as `spdlog` / `nlohmann_json`), add `--recursive` or `--recurse-submodules` when cloning, e.g.:
 
 ```shell
 git clone --recursive https://github.com/InfiniTensor/InfiniCore.git
@@ -51,6 +51,10 @@ git clone --recursive https://github.com/InfiniTensor/InfiniCore.git
 git submodule update --init --recursive
 ```
 
+> Note: the InfLLM-V2 CUDA kernels (`infllmv2_cuda_impl`) are an **optional dependency** and are not fetched with the repository's submodules by default.
+> To enable `--infllmv2` (see below), clone and build that project in a directory of your choice and pass the path of the generated `infllm_v2/*.so` to xmake,
+> or place it manually under `InfiniCore/third_party/infllmv2_cuda_impl` and use `--infllmv2=y` for automatic detection.
+
 Set the `INFINI_ROOT` and `LD_LIBRARY_PATH` environment variables.
 `INFINI_ROOT` defaults to `$HOME/.infini`; it can be configured automatically with:
 
@@ -108,6 +112,8 @@ python scripts/install.py [XMAKE_CONFIG_FLAGS]
 | `--ninetoothed=[y\|n]` | Whether to build the NineToothed implementation | n
 | `--ccl=[y\|n]` | Whether to build the InfiniCCL communication library interface | n
 | `--graph=[y\|n]` | Whether to build the CUDA graph interface | n
+| `--aten=[y\|n]` | Whether to link ATen / PyTorch (used by some operators and comparison tests) | n
+| `--infllmv2=[y\|PATH]` | **Optional**: enable InfLLM-V2 attention (requires `--aten=y`). The value is `y` (probe `third_party/infllmv2_cuda_impl`) or a path to `libinfllm_v2.so` / the `infllmv2_cuda_impl` root directory | (empty)
 
 ##### Manually installing the low-level libraries
 
@@ -174,6 +180,64 @@ python scripts/install.py [XMAKE_CONFIG_FLAGS]
 
 ```
 
+##### Experimental feature -- using the InfLLM-V2 CUDA kernels (optional)
+
+InfLLM-V2's varlen/kvcache attention needs extra CUDA kernels (`infllm_v2/*.so`). This dependency is **optional**; you clone and build it yourself.
+
+If you would like to keep `infllmv2_cuda_impl` under this repository's `third_party/` (without managing it as a submodule), fetch and build it as follows, then let xmake auto-detect it with `--infllmv2=y`:
+
+```bash
+cd InfiniCore
+
+# Core submodules only (InfLLM-V2 is not pulled in as a mandatory submodule)
+git submodule sync third_party/spdlog third_party/nlohmann_json
+git submodule update --init third_party/spdlog third_party/nlohmann_json
+
+# Fetch InfLLM-V2 into third_party if missing (NOT a git submodule).
+INFLLMV2_DIR="$PWD/third_party/infllmv2_cuda_impl"
+if [ ! -d "$INFLLMV2_DIR/.git" ]; then
+    rm -rf "$INFLLMV2_DIR"
+    git clone --depth 1 -b minicpm_sala_patches --recurse-submodules \
+        https://github.com/Ceng23333/infllmv2_cuda_impl.git "$INFLLMV2_DIR"
+fi
+
+cd "$INFLLMV2_DIR"
+git submodule update --init --recursive
+python3 setup.py install
+
+cd ../..  # back to the InfiniCore root ("cd .." would only reach third_party/)
+python3 scripts/install.py --root --nv-gpu=y --cuda_arch=sm_80 --aten=y --infllmv2=y --ccl=y
+xmake build -r _infinicore
+xmake install _infinicore
+
+export PYTHONPATH="$PWD/test/infinicore:$PWD/python:${PYTHONPATH:-}"
+python3 "$PWD/test/infinicore/ops/infllmv2_attention.py" --nvidia
+python3 "$PWD/test/infinicore/ops/simple_gla_prefill.py" --nvidia
+python3 "$PWD/test/infinicore/ops/simple_gla_decode_recurrent.py" --nvidia
+```
+
+Alternatively, keep `infllmv2_cuda_impl` outside the repository:
+
+1. Build `infllmv2_cuda_impl` (example; the path is up to you):
+
+```shell
+git clone <your infllmv2_cuda_impl repo url> /abs/path/to/infllmv2_cuda_impl
+cd /abs/path/to/infllmv2_cuda_impl
+python setup.py install
+```
+
+2. Configure and build InfiniCore (requires `--aten=y`):
+
+```shell
+# Option A: pass the absolute path of the .so directly (recommended; most explicit)
+xmake f --nv-gpu=y --aten=y --infllmv2=/abs/path/to/libinfllm_v2.so -cv
+xmake build && xmake install
+
+# Option B: pass the infllmv2_cuda_impl root directory (probes build/lib.*/infllm_v2/*.so)
+xmake f --nv-gpu=y --aten=y --infllmv2=/abs/path/to/infllmv2_cuda_impl -cv
+xmake build && xmake install
+```
+
+At runtime `libinfllm_v2.so` must be resolvable (e.g. its directory is on the rpath or in `LD_LIBRARY_PATH`). The build tries to embed an rpath to that directory at link time, so `LD_PRELOAD` is usually not needed.
+
 2. Compile and install
 
 The default install path is `$HOME/.infini`.
````
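As documented above, `--infllmv2` accepts either a direct path to `libinfllm_v2.so` or the `infllmv2_cuda_impl` root directory, from which `build/lib.*/infllm_v2/*.so` is probed. A hedged shell sketch of that resolution rule (the function name is illustrative, not part of the build system), useful for checking a build tree before configuring xmake:

```shell
#!/bin/sh
# Illustrative re-implementation of the documented probing rule:
# a file argument is taken as the .so itself; a directory argument is
# searched under build/lib.*/infllm_v2/ for the first *.so found.
resolve_infllmv2() {
    p="$1"
    if [ -f "$p" ]; then
        printf '%s\n' "$p"
        return 0
    fi
    for so in "$p"/build/lib.*/infllm_v2/*.so; do
        if [ -f "$so" ]; then
            printf '%s\n' "$so"
            return 0
        fi
    done
    return 1
}

resolve_infllmv2 "${1:-third_party/infllmv2_cuda_impl}" \
    || echo "no infllm_v2 .so found; build infllmv2_cuda_impl first" >&2
```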
infinicore/adaptor/infllmv2_api.hpp (new file)

Lines changed: 68 additions & 0 deletions

```diff
@@ -0,0 +1,68 @@
+/**
+ * Vendor API declarations for InfLLM-v2 attention kernels.
+ *
+ * This header is intentionally placed under `infinicore/adaptor/` because it
+ * declares symbols provided by an external InfLLM-v2 shared library.
+ *
+ * NOTE: The vendor functions are declared in the global namespace to match the
+ * upstream InfLLM-v2 entrypoints (e.g. `entry.cu`) and to keep linkage stable.
+ */
+#pragma once
+
+#if defined(ENABLE_INFLLMV2) && defined(ENABLE_ATEN)
+
+#include <ATen/ATen.h>
+#include <c10/util/Optional.h>
+#include <vector>
+
+/** Varlen forward: unpadded Q/K/V with cu_seqlens. Returns {out, softmax_lse, ...}. */
+std::vector<at::Tensor> mha_varlen_fwd(
+    at::Tensor &q,
+    const at::Tensor &k,
+    const at::Tensor &v,
+    c10::optional<at::Tensor> &out_,
+    const at::Tensor &cu_seqlens_q,
+    const at::Tensor &cu_seqlens_k,
+    c10::optional<at::Tensor> &seqused_k,
+    c10::optional<const at::Tensor> &leftpad_k_,
+    c10::optional<at::Tensor> &block_table_,
+    c10::optional<at::Tensor> &alibi_slopes_,
+    int max_seqlen_q,
+    int max_seqlen_k,
+    float p_dropout,
+    float softmax_scale,
+    bool zero_tensors,
+    bool is_causal,
+    int window_size_left,
+    int window_size_right,
+    float softcap,
+    bool return_softmax,
+    c10::optional<at::Generator> gen_,
+    c10::optional<at::Tensor> &blockmask_);
+
+/** KV-cache forward (decode). Returns {out, softmax_lse}. */
+std::vector<at::Tensor> mha_fwd_kvcache(
+    at::Tensor &q,
+    const at::Tensor &kcache,
+    const at::Tensor &vcache,
+    c10::optional<const at::Tensor> &k_,
+    c10::optional<const at::Tensor> &v_,
+    c10::optional<const at::Tensor> &seqlens_k_,
+    c10::optional<const at::Tensor> &rotary_cos_,
+    c10::optional<const at::Tensor> &rotary_sin_,
+    c10::optional<const at::Tensor> &cache_batch_idx_,
+    c10::optional<const at::Tensor> &leftpad_k_,
+    c10::optional<at::Tensor> &block_table_,
+    c10::optional<at::Tensor> &alibi_slopes_,
+    c10::optional<at::Tensor> &out_,
+    float softmax_scale,
+    bool is_causal,
+    int window_size_left,
+    int window_size_right,
+    float softcap,
+    bool is_rotary_interleaved,
+    int num_splits,
+    c10::optional<at::Tensor> &blockmask_);
+
+#endif // ENABLE_INFLLMV2 && ENABLE_ATEN
```
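The varlen entrypoint above consumes unpadded (packed) Q/K/V indexed by the `cu_seqlens_q` / `cu_seqlens_k` prefix-sum offsets. A minimal sketch of how those offsets relate to per-sequence token counts (the function name is illustrative, not from this repository):

```shell
# cu_seqlens: prefix sum with a leading 0. Sequence i occupies packed rows
# cu[i]..cu[i+1]-1; the last entry is the total token count (max_seqlen is
# simply the largest individual length).
cu_seqlens() {
    total=0
    printf '0'
    for n in "$@"; do
        total=$((total + n))
        printf ' %d' "$total"
    done
    printf '\n'
}

cu_seqlens 3 5 2   # three variable-length sequences -> 0 3 8 10
```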
Lines changed: 8 additions & 61 deletions

```diff
@@ -1,65 +1,12 @@
 /**
- * C++ API declarations for InfLLM-V2 attention kernels.
- * When ENABLE_INFLLMV2 is defined, link against the InfLLM-V2 library
- * (e.g. from infllmv2_cuda_impl) that provides these symbols.
- * Requires ENABLE_ATEN for at::Tensor.
- * Symbols are in global namespace to match entry.cu.
+ * Backward-compatible include for the InfLLM-v2 vendor shim.
+ *
+ * The InfLLM-v2 entrypoints are provided by an external shared library and are
+ * now declared under `infinicore/adaptor/infllmv2_api.hpp` to make the
+ * dependency boundary explicit.
+ *
+ * The vendor symbols themselves remain in the global namespace.
  */
 #pragma once
 
-#if defined(ENABLE_INFLLMV2) && defined(ENABLE_ATEN)
-
-#include <ATen/ATen.h>
-#include <c10/util/Optional.h>
-#include <vector>
-
-/** Varlen forward: unpadded Q/K/V with cu_seqlens. Returns {out, softmax_lse, ...}. */
-std::vector<at::Tensor> mha_varlen_fwd(
-    at::Tensor &q,
-    const at::Tensor &k,
-    const at::Tensor &v,
-    c10::optional<at::Tensor> &out_,
-    const at::Tensor &cu_seqlens_q,
-    const at::Tensor &cu_seqlens_k,
-    c10::optional<at::Tensor> &seqused_k,
-    c10::optional<const at::Tensor> &leftpad_k_,
-    c10::optional<at::Tensor> &block_table_,
-    c10::optional<at::Tensor> &alibi_slopes_,
-    int max_seqlen_q,
-    int max_seqlen_k,
-    float p_dropout,
-    float softmax_scale,
-    bool zero_tensors,
-    bool is_causal,
-    int window_size_left,
-    int window_size_right,
-    float softcap,
-    bool return_softmax,
-    c10::optional<at::Generator> gen_,
-    c10::optional<at::Tensor> &blockmask_);
-
-/** KV-cache forward (decode). Returns {out, softmax_lse}. */
-std::vector<at::Tensor> mha_fwd_kvcache(
-    at::Tensor &q,
-    const at::Tensor &kcache,
-    const at::Tensor &vcache,
-    c10::optional<const at::Tensor> &k_,
-    c10::optional<const at::Tensor> &v_,
-    c10::optional<const at::Tensor> &seqlens_k_,
-    c10::optional<const at::Tensor> &rotary_cos_,
-    c10::optional<const at::Tensor> &rotary_sin_,
-    c10::optional<const at::Tensor> &cache_batch_idx_,
-    c10::optional<const at::Tensor> &leftpad_k_,
-    c10::optional<at::Tensor> &block_table_,
-    c10::optional<at::Tensor> &alibi_slopes_,
-    c10::optional<at::Tensor> &out_,
-    float softmax_scale,
-    bool is_causal,
-    int window_size_left,
-    int window_size_right,
-    float softcap,
-    bool is_rotary_interleaved,
-    int num_splits,
-    c10::optional<at::Tensor> &blockmask_);
-
-#endif // ENABLE_INFLLMV2 && ENABLE_ATEN
+#include "infinicore/adaptor/infllmv2_api.hpp"
```
