pd with nixl backend (rebase main) #1002
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,83 @@ | ||
| ARG CUDA_VERSION=12.6.1 | ||
| FROM nvidia/cuda:${CUDA_VERSION}-cudnn-devel-ubuntu22.04 | ||
| ARG PYTHON_VERSION=3.10 | ||
| ARG MAMBA_VERSION=24.7.1-0 | ||
| ARG TARGETPLATFORM | ||
| ENV PATH=/opt/conda/bin:$PATH \ | ||
| CONDA_PREFIX=/opt/conda | ||
|
|
||
| RUN chmod 777 -R /tmp && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \ | ||
| ca-certificates \ | ||
| libssl-dev \ | ||
| curl \ | ||
| g++ \ | ||
| make \ | ||
| git && \ | ||
| rm -rf /var/lib/apt/lists/* | ||
|
|
||
| RUN case ${TARGETPLATFORM} in \ | ||
| "linux/arm64") MAMBA_ARCH=aarch64 ;; \ | ||
| *) MAMBA_ARCH=x86_64 ;; \ | ||
| esac && \ | ||
| curl -fsSL -o ~/mambaforge.sh -v "https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-${MAMBA_ARCH}.sh" && \ | ||
| bash ~/mambaforge.sh -b -p /opt/conda && \ | ||
| rm ~/mambaforge.sh | ||
|
|
||
| RUN case ${TARGETPLATFORM} in \ | ||
| "linux/arm64") exit 1 ;; \ | ||
| *) /opt/conda/bin/conda update -y conda && \ | ||
| /opt/conda/bin/conda install -y "python=${PYTHON_VERSION}" ;; \ | ||
| esac && \ | ||
| /opt/conda/bin/conda clean -ya | ||
|
|
||
|
|
||
| WORKDIR /root | ||
|
|
||
| COPY ./requirements.txt /lightllm/requirements.txt | ||
| RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu124 | ||
|
|
||
| RUN --mount=type=cache,target=/root/.cache/pip pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly | ||
| RUN --mount=type=cache,target=/root/.cache/pip git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v . | ||
|
Contributor
Cloning from the |
||
|
|
||
| RUN apt-get update && apt-get install -y libnuma-dev # for sgl_kernel | ||
|
|
||
| RUN apt-get update && apt-get install -y cmake automake autotools-dev libtool libz-dev && \ | ||
| DEBIAN_FRONTEND=noninteractive apt-get -y install --reinstall libibverbs-dev rdma-core ibverbs-utils libibumad-dev; \ | ||
| rm -rf /usr/lib/ucx && \ | ||
| rm -rf /opt/hpcx/ucx && \ | ||
| cd /usr/local/src && \ | ||
| git clone https://github.com/openucx/ucx.git && \ | ||
| cd ucx && \ | ||
| git checkout v1.19.x && \ | ||
| ./autogen.sh && ./configure \ | ||
| --enable-shared \ | ||
| --disable-static \ | ||
| --disable-doxygen-doc \ | ||
| --enable-optimizations \ | ||
| --enable-cma \ | ||
| --enable-devel-headers \ | ||
| --with-cuda=/usr/local/cuda \ | ||
| --with-verbs=yes \ | ||
| --with-dm \ | ||
| --with-gdrcopy=/usr/local \ | ||
| --with-efa \ | ||
| --enable-mt && \ | ||
| make -j && \ | ||
| make -j install-strip && \ | ||
| ldconfig; | ||
|
|
||
| RUN apt-get update && apt-get install -y pkg-config tmux net-tools; \ | ||
| cd /usr/local/src; \ | ||
| pip install --upgrade meson pybind11 patchelf; \ | ||
| git clone https://github.com/ai-dynamo/nixl.git -b main && \ | ||
| cd nixl && \ | ||
| rm -rf build && \ | ||
| mkdir build && \ | ||
| meson setup build/ --prefix=/usr/local/nixl --buildtype=release && \ | ||
| cd build && \ | ||
| ninja && \ | ||
| ninja install && \ | ||
| cd .. && pip install . --no-deps; | ||
|
|
||
| COPY . /lightllm | ||
| RUN pip install -e /lightllm --no-cache-dir | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,121 @@ | ||
| ARG CUDA_VERSION=12.6.1 | ||
| FROM nvidia/cuda:${CUDA_VERSION}-cudnn-devel-ubuntu22.04 | ||
|
|
||
| ARG PYTHON_VERSION=3.10 | ||
| ARG MAMBA_VERSION=24.7.1-0 | ||
| ARG TARGETPLATFORM | ||
|
|
||
| ENV PATH=/opt/conda/bin:$PATH \ | ||
| CONDA_PREFIX=/opt/conda | ||
|
|
||
| RUN chmod 777 -R /tmp && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \ | ||
| ca-certificates \ | ||
| libssl-dev \ | ||
| curl \ | ||
| g++ \ | ||
| make \ | ||
| git && \ | ||
| rm -rf /var/lib/apt/lists/* | ||
|
Comment on lines +11 to +18
Contributor
This `RUN` command has a couple of issues that go against Docker best practices: `chmod 777 -R /tmp` makes `/tmp` world-writable, and `apt-get update` is called in several separate `RUN` commands. It's recommended to address these for a more secure and efficient Docker image. |
||
|
|
||
| RUN case ${TARGETPLATFORM} in \ | ||
| "linux/arm64") MAMBA_ARCH=aarch64 ;; \ | ||
| *) MAMBA_ARCH=x86_64 ;; \ | ||
| esac && \ | ||
| curl -fsSL -o ~/mambaforge.sh -v "https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-${MAMBA_ARCH}.sh" && \ | ||
| bash ~/mambaforge.sh -b -p /opt/conda && \ | ||
| rm ~/mambaforge.sh | ||
|
|
||
| RUN case ${TARGETPLATFORM} in \ | ||
| "linux/arm64") exit 1 ;; \ | ||
| *) /opt/conda/bin/conda update -y conda && \ | ||
| /opt/conda/bin/conda install -y "python=${PYTHON_VERSION}" ;; \ | ||
| esac && \ | ||
| /opt/conda/bin/conda clean -ya | ||
|
|
||
|
|
||
| WORKDIR /root | ||
|
|
||
| COPY ./requirements.txt /lightllm/requirements.txt | ||
| RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu124 | ||
|
|
||
| RUN --mount=type=cache,target=/root/.cache/pip pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly | ||
| RUN --mount=type=cache,target=/root/.cache/pip git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v . | ||
|
Contributor
Cloning from the |
||
|
|
||
| RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms | ||
| RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev | ||
|
|
||
| ENV CUDA_HOME=/usr/local/cuda \ | ||
| GDRCOPY_HOME=/usr/src/gdrdrv-2.4.4/ | ||
|
|
||
| RUN mkdir -p /tmp/gdrcopy && cd /tmp \ | ||
| && git clone https://github.com/NVIDIA/gdrcopy.git -b v2.4.4 \ | ||
| && cd gdrcopy/packages \ | ||
| && CUDA=/usr/local/cuda ./build-deb-packages.sh \ | ||
| && dpkg -i gdrdrv-dkms_*.deb libgdrapi_*.deb gdrcopy-tests_*.deb gdrcopy_*.deb \ | ||
| && cd / && rm -rf /tmp/gdrcopy | ||
|
|
||
| # Fix DeepEP IBGDA symlink | ||
| RUN ln -sf /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so | ||
|
|
||
| RUN wget https://developer.download.nvidia.com/compute/redist/nvshmem/3.3.9/source/nvshmem_src_cuda12-all-all-3.3.9.tar.gz \ | ||
| && tar -xf nvshmem_src_cuda12-all-all-3.3.9.tar.gz && mv nvshmem_src nvshmem \ | ||
| && cd nvshmem \ | ||
| && rm -f /root/nvshmem_src_cuda12-all-all-3.3.9.tar.gz \ | ||
| && NVSHMEM_SHMEM_SUPPORT=0 \ | ||
| NVSHMEM_UCX_SUPPORT=0 \ | ||
| NVSHMEM_USE_NCCL=0 \ | ||
| NVSHMEM_MPI_SUPPORT=0 \ | ||
| NVSHMEM_IBGDA_SUPPORT=1 \ | ||
| NVSHMEM_PMIX_SUPPORT=0 \ | ||
| NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \ | ||
| NVSHMEM_USE_GDRCOPY=1 \ | ||
| cmake -S . -B build/ -DCMAKE_INSTALL_PREFIX=/root/nvshmem/install -DCMAKE_CUDA_ARCHITECTURES=90 \ | ||
| && cmake --build build --target install -j64 | ||
|
|
||
| ARG DEEPEP_COMMIT=b6ce310bb0b75079682d09bc2ebc063a074fbd58 | ||
| RUN git clone https://github.com/deepseek-ai/DeepEP.git && cd DeepEP && git checkout ${DEEPEP_COMMIT} && cd .. | ||
|
|
||
| WORKDIR /root/DeepEP | ||
| ENV NVSHMEM_DIR=/root/nvshmem/install | ||
| RUN NVSHMEM_DIR=/root/nvshmem/install python setup.py install | ||
|
|
||
| RUN apt-get update && apt-get install -y cmake automake autotools-dev libtool libz-dev && \ | ||
| DEBIAN_FRONTEND=noninteractive apt-get -y install --reinstall libibverbs-dev rdma-core ibverbs-utils libibumad-dev; \ | ||
| rm -rf /usr/lib/ucx && \ | ||
| rm -rf /opt/hpcx/ucx && \ | ||
| cd /usr/local/src && \ | ||
| git clone https://github.com/openucx/ucx.git && \ | ||
| cd ucx && \ | ||
| git checkout v1.19.x && \ | ||
| ./autogen.sh && ./configure \ | ||
| --enable-shared \ | ||
| --disable-static \ | ||
| --disable-doxygen-doc \ | ||
| --enable-optimizations \ | ||
| --enable-cma \ | ||
| --enable-devel-headers \ | ||
| --with-cuda=/usr/local/cuda \ | ||
| --with-verbs=yes \ | ||
| --with-dm \ | ||
| --with-gdrcopy=/usr/local \ | ||
| --with-efa \ | ||
| --enable-mt && \ | ||
| make -j && \ | ||
| make -j install-strip && \ | ||
| ldconfig; | ||
|
|
||
| RUN apt-get update && apt-get install -y pkg-config tmux net-tools ; \ | ||
| cd /usr/local/src; \ | ||
| pip install --upgrade meson pybind11 patchelf; \ | ||
| git clone https://github.com/ai-dynamo/nixl.git -b main && \ | ||
| cd nixl && \ | ||
| rm -rf build && \ | ||
| mkdir build && \ | ||
| meson setup build/ --prefix=/usr/local/nixl --buildtype=release && \ | ||
| cd build && \ | ||
| ninja && \ | ||
| ninja install && \ | ||
| cd .. && pip install . --no-deps; | ||
|
|
||
| COPY . /lightllm | ||
| RUN pip install -e /lightllm --no-cache-dir | ||
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -96,6 +96,14 @@ def alloc_kv_move_buffer(self, max_req_total_len): | |||||||||
| self.token_dim_size = self.kv_move_buffer.shape[-2] * self.kv_move_buffer.shape[-1] | ||||||||||
| return | ||||||||||
|
|
||||||||||
| def alloc_paged_kv_move_buffer(self, page_num, page_size): | ||||||||||
| if isinstance(self, MemoryManager) and type(self) != MemoryManager: | ||||||||||
| raise NotImplementedError("subclass need reimpl this method") | ||||||||||
|
Comment on lines +100 to +101
Contributor
The check Using Python's
Suggested change
|
||||||||||
| self.kv_move_buffer = torch.empty( | ||||||||||
| (page_num, page_size, self.layer_num, 2 * self.head_num, self.head_dim), dtype=self.dtype, device="cuda" | ||||||||||
| ) | ||||||||||
| return | ||||||||||
|
|
||||||||||
| def send_to_decode_node( | ||||||||||
| self, | ||||||||||
| move_tasks: List[KVMoveTask], | ||||||||||
|
|
||||||||||
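A minimal sketch of one way to tighten the subclass guard in `alloc_paged_kv_move_buffer`, assuming the intent is that only the base `MemoryManager` may use the default implementation and every subclass must provide its own. The stand-in class below does not use torch and only records the requested shape; `kv_move_buffer_shape`, `InheritingManager`, and `OverridingManager` are hypothetical names used for illustration, not part of this PR.

```python
class MemoryManager:
    """Stand-in for the real class: records a shape instead of allocating a CUDA tensor."""

    def alloc_paged_kv_move_buffer(self, page_num, page_size):
        # When the method is called normally on an instance, `self` is already
        # a MemoryManager, so an `isinstance(self, MemoryManager)` test adds
        # nothing. An identity check on the exact type detects a subclass.
        if type(self) is not MemoryManager:
            raise NotImplementedError(
                f"{type(self).__name__} must reimplement alloc_paged_kv_move_buffer"
            )
        # Hypothetical attribute standing in for the real torch.empty(...) buffer.
        self.kv_move_buffer_shape = (page_num, page_size)


class InheritingManager(MemoryManager):
    """Hypothetical subclass that does not override the method."""


class OverridingManager(MemoryManager):
    """Hypothetical subclass that supplies its own buffer layout."""

    def alloc_paged_kv_move_buffer(self, page_num, page_size):
        self.kv_move_buffer_shape = (page_num, page_size, "custom")
```

With this shape, the base class allocates as before, a subclass that inherits the method fails loudly at call time, and a subclass that overrides it is unaffected. Whether a call-time check or a declaration-time mechanism fits better depends on how the existing subclasses are instantiated, which this capture does not show.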
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,5 @@ | ||
| from .sampling_params import SamplingParams | ||
| from .req import Req, FinishStatus | ||
| from .req import Req, FinishStatus, PDNIXLChunkedPrefillReq | ||
| from .shm_req_manager import ShmReqManager | ||
| from .rpc_shm import RpcShmParams, RpcShmResults, ShmSyncStatusArray | ||
| from .start_args_type import StartArgs |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,8 @@ | ||
| from .group_req import GroupReqIndexes, GroupReqObjs, AbortedReqCmd, StopStrMatchedReqCmd | ||
| from .group_req import ( | ||
| GroupReqIndexes, | ||
| GroupReqObjs, | ||
| AbortedReqCmd, | ||
| StopStrMatchedReqCmd, | ||
| NIXLRemotePrefillDoneCmd, | ||
| ReqCmd, | ||
| ) |
This `RUN` command has a couple of issues that go against Docker best practices:
- `chmod 777 -R /tmp`: This is insecure as it gives world-writable permissions to the `/tmp` directory.
- `apt-get update` calls: This Dockerfile contains multiple `RUN apt-get update` commands (here and on lines 42, 44, 69). This is inefficient and can lead to caching problems.

It's recommended to address these for a more secure and efficient Docker image.