Commit 58b7fd4

fix ci and update docs (#940)
1 parent 81ba0ea commit 58b7fd4

16 files changed

Lines changed: 493 additions & 48 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ LightLLM is a Python-based LLM (Large Language Model) inference and serving fram
 
 - [Install LightLLM](https://lightllm-en.readthedocs.io/en/latest/getting_started/installation.html)
 - [Quick Start](https://lightllm-en.readthedocs.io/en/latest/getting_started/quickstart.html)
-- [TuTorial](https://lightllm-en.readthedocs.io/en/latest/tutorial/)
+- [TuTorial](https://lightllm-en.readthedocs.io/en/latest/tutorial/deepseek_deployment.html)
 
 
 ## Performance

docker/Dockerfile.deepep

Lines changed: 2 additions & 5 deletions
@@ -17,8 +17,6 @@ RUN chmod 777 -R /tmp && apt-get update && DEBIAN_FRONTEND=noninteractive apt-ge
 git && \
 rm -rf /var/lib/apt/lists/*
 
-ENV http_proxy=http://devsft:d0663c03baee@10.119.176.202:3128
-ENV https_proxy=http://devsft:d0663c03baee@10.119.176.202:3128
 RUN case ${TARGETPLATFORM} in \
 "linux/arm64") MAMBA_ARCH=aarch64 ;; \
 *) MAMBA_ARCH=x86_64 ;; \
@@ -40,10 +38,9 @@ WORKDIR /root
 COPY ./requirements.txt /lightllm/requirements.txt
 RUN pip install -r /lightllm/requirements.txt --no-cache-dir --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu124
 
-RUN pip install --no-cache-dir nvidia-nccl-cu12==2.25.1 # for allreduce hang issues in multinode H100
+RUN pip install --no-cache-dir https://github.com/ModelTC/flash-attn-3-build/releases/download/v2.7.4.post1/flash_attn-3.0.0b1-cp39-cp39-linux_x86_64.whl
 
-RUN git clone https://github.com/Dao-AILab/flash-attention.git -b v2.7.4.post1
-RUN cd flash-attention/hopper && FLASH_ATTN_CUDA_ARCHS=90 NVCC_THREADS=128 python setup.py install
+RUN pip install --no-cache-dir nvidia-nccl-cu12==2.25.1 # for allreduce hang issues in multinode H100
 
 RUN git clone --recursive https://github.com/deepseek-ai/DeepGEMM.git
 RUN cd DeepGEMM && python setup.py install

docs/CN/source/getting_started/installation.rst

Lines changed: 0 additions & 2 deletions
@@ -74,8 +74,6 @@ Lightllm 是一个纯python开发的推理框架,其中的算子使用triton
 $ # 安装lightllm
 $ python setup.py install
 
-NOTE: 如果您出于一些原因使用了cuda 11.x的torch, 请运行 `pip install nvidia-nccl-cu12==2.20.5` 以支持 torch cuda graph.
-
 .. note::
 
 Lightllm 的代码在多种GPU上都进行了测试,包括 V100, A100, A800, 4090, 和 H800。

docs/CN/source/tutorial/deepseek_deployment.rst

Lines changed: 4 additions & 4 deletions
@@ -199,7 +199,7 @@ PD (Prefill-Decode) 分离模式将预填充和解码阶段分离部署,可以
 --disable_cudagraph \
 --pd_master_ip $pd_master_ip \
 --pd_master_port 60011
-# if you want to enable microbatch overlap, you can uncomment the following lines
+# 如果需要启用微批次重叠,可以取消注释以下行
 #--enable_prefill_microbatch_overlap
 
 **步骤 3: 启动 Decode 服务**
@@ -223,7 +223,7 @@ PD (Prefill-Decode) 分离模式将预填充和解码阶段分离部署,可以
 --disable_cudagraph \
 --pd_master_ip $pd_master_ip \
 --pd_master_port 60011
-# if you want to enable microbatch overlap, you can uncomment the following lines
+# 如果需要启用微批次重叠,可以取消注释以下行
 #--enable_decode_microbatch_overlap
 
 3.2 多 PD Master 模式
@@ -291,7 +291,7 @@ PD (Prefill-Decode) 分离模式将预填充和解码阶段分离部署,可以
 --disable_cudagraph \
 --config_server_host $config_server_host \
 --config_server_port 60088
-# if you want to enable microbatch overlap, you can uncomment the following lines
+# 如果需要启用微批次重叠,可以取消注释以下行
 #--enable_prefill_microbatch_overlap
 
 # Decode 服务
@@ -309,7 +309,7 @@ PD (Prefill-Decode) 分离模式将预填充和解码阶段分离部署,可以
 --enable_fa3 \
 --config_server_host $config_server_host \
 --config_server_port 60088
-# if you want to enable microbatch overlap, you can uncomment the following lines
+# 如果需要启用微批次重叠,可以取消注释以下行
 #--enable_decode_microbatch_overlap
 
 4. 测试和验证

docs/EN/.readthedocs.yaml

File mode changed from 100644 to 100755.

docs/EN/source/framework/framework.rst

File mode changed from 100644 to 100755.

docs/EN/source/framework/router.rst

File mode changed from 100644 to 100755.

docs/EN/source/framework/token_attention.rst

File mode changed from 100644 to 100755.

docs/EN/source/getting_started/benchmark.rst

File mode changed from 100644 to 100755.

docs/EN/source/getting_started/installation.rst

Lines changed: 0 additions & 2 deletions
@@ -74,8 +74,6 @@ You can also install Lightllm from source:
 $ # Install Lightllm
 $ python setup.py install
 
-NOTE: If you use torch with cuda 11.x for some reason, please run `pip install nvidia-nccl-cu12==2.20.5` to support torch cuda graph.
-
 .. note::
 
 Lightllm code has been tested on various GPUs including V100, A100, A800, 4090, and H800.
