Commit d5356e0

MaxTretikov and tohtana authored
Auto-detect CUTLASS for EvoformerAttention (#8000)
DS4Sci EvoformerAttention currently depends on CUTLASS, but requiring users to manually set `CUTLASS_PATH` creates unnecessary friction for an otherwise standard extension build flow. This change makes CUTLASS discovery automatic while preserving `CUTLASS_PATH` as the explicit override.

The discovery approach is based on PyTorch's CUDA detection pattern in `torch.utils.cpp_extension`: honor the explicit environment variable first, then infer from installed packages and conventional filesystem locations, and fail with an actionable message only when discovery cannot succeed.

This improves first-run usability, CI behavior, editable installs, and package-based environments where CUTLASS may already be installed in a discoverable location. It also reduces setup divergence between users who clone CUTLASS manually and users who install NVIDIA's `nvidia-cutlass` package.

DeepSpeed should already have had this. EvoformerAttention is part of DeepSpeed's extension-builder system, and extension builders should locate common build dependencies using predictable heuristics instead of requiring users to export paths manually. PyTorch does not require `CUDA_HOME` to be set either: it uses the variable when present and otherwise discovers CUDA on its own. CUTLASS should follow the same principle.

---------

Signed-off-by: Max Tretikov <max@tretikov.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Co-authored-by: Masahiro Tanaka <mtanaka@anyscale.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
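The following is a minimal sketch of the lookup order the message describes. It assumes a generic dependency with a single override variable. `locate_dependency`, its `is_valid` callback, and the marker file used to validate a directory are hypothetical names for illustration. They are not DeepSpeed or PyTorch APIs.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, Mapping, Optional


def locate_dependency(
    env_var: str,
    guesses: Iterable[Path],
    is_valid: Callable[[Path], bool],
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Hypothetical helper: explicit override first, then ordered guesses."""
    environ = os.environ if environ is None else environ

    explicit = environ.get(env_var)
    if explicit:
        # A set override is authoritative: validate it, but never fall back.
        path = Path(explicit).expanduser()
        if is_valid(path):
            return path
        raise RuntimeError(f"{env_var}={explicit} does not contain the expected files")

    # No override: return the first conventional location that validates.
    for guess in guesses:
        if is_valid(guess):
            return guess

    # Discovery failed: fail with an actionable message.
    raise RuntimeError(f"Unable to locate the dependency; set {env_var} to its install root")
```

The sketch treats a set but invalid override as an error instead of silently falling through to the guesses. This matches how `_candidate_cutlass_paths` in the builder diff returns only `[Path(self.cutlass_path)]` whenever `CUTLASS_PATH` is set.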
1 parent b01a091 commit d5356e0

4 files changed

Lines changed: 199 additions & 37 deletions

docs/_tutorials/ds4sci_evoformerattention.md

Lines changed: 3 additions & 5 deletions
@@ -15,14 +15,13 @@ tags: training inference
 
 `DS4Sci_EvoformerAttention` is released as part of DeepSpeed >= 0.10.3.
 
-`DS4Sci_EvoformerAttention` is implemented based on [CUTLASS](https://github.com/NVIDIA/cutlass). You need to clone the CUTLASS repository and specify the path to it in the environment variable `CUTLASS_PATH`.
+`DS4Sci_EvoformerAttention` is implemented based on [CUTLASS](https://github.com/NVIDIA/cutlass). DeepSpeed automatically looks for CUTLASS in the [nvidia-cutlass](https://pypi.org/project/nvidia-cutlass/) Python package, Python environment and CMake prefixes, compiler include path environment variables, a `cutlass` checkout next to DeepSpeed or in the current working directory, and common system install prefixes such as `/usr/local`.
 CUTLASS setup detection can be ignored by setting ```CUTLASS_PATH="DS_IGNORE_CUTLASS_DETECTION"```, which is useful if you have a well setup compiler (e.g., compiling in a conda package with cutlass and the cuda compilers installed).
-CUTLASS location can be automatically inferred using pypi's [nvidia-cutlass](https://pypi.org/project/nvidia-cutlass/) package by setting ```CUTLASS_PATH="DS_USE_CUTLASS_PYTHON_BINDINGS"```. Note that this is discouraged as ```nvidia-cutlass``` is not maintained anymore and outdated.
+If automatic detection does not find the intended installation, set `CUTLASS_PATH` to either the CUTLASS checkout root or its `include` directory.
 
-You can always simply clone cutlass and setup ```CUTLASS_PATH```:
+You can always simply clone cutlass next to DeepSpeed:
 ```shell
 git clone https://github.com/NVIDIA/cutlass
-export CUTLASS_PATH=/path/to/cutlass
 ```
 The kernels will be compiled when `DS4Sci_EvoformerAttention` is called for the first time.
 
@@ -43,7 +42,6 @@ Evoformer now supports mixed-architecture packaging directly via
 Example:
 
 ```shell
-CUTLASS_PATH=/path/to/cutlass \
 TORCH_CUDA_ARCH_LIST='7.0;8.0' \
 DS_BUILD_OPS=0 DS_BUILD_EVOFORMER_ATTN=1 \
 pip install -e .
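The updated tutorial says `CUTLASS_PATH` may name either the CUTLASS checkout root or its `include` directory. Below is a standalone restatement of that layout check, mirroring `_cutlass_include_dirs` in the builder diff. It is written as a free function so it can be tried without DeepSpeed installed. The name `cutlass_include_dirs` is hypothetical.

```python
from pathlib import Path
from typing import List


def cutlass_include_dirs(path: Path) -> List[str]:
    """Hypothetical standalone restatement of the builder's layout check."""
    path = path.expanduser().resolve()
    if (path / "include" / "cutlass" / "cutlass.h").is_file():
        # `path` is the checkout root.
        include_root = path / "include"
        util_include = path / "tools" / "util" / "include"
    elif (path / "cutlass" / "cutlass.h").is_file():
        # `path` is already <checkout>/include.
        include_root = path
        util_include = path.parent / "tools" / "util" / "include"
    else:
        # Neither layout matched: not a CUTLASS header tree.
        return []

    include_dirs = [include_root]
    # The util headers are optional; add them only when present.
    if util_include.is_dir():
        include_dirs.append(util_include)
    return [str(d) for d in include_dirs]
```

Pointing it at `<checkout>` or at `<checkout>/include` should yield the same two directories, provided `tools/util/include` exists in the checkout.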

op_builder/evoformer_attn.py

Lines changed: 123 additions & 31 deletions
@@ -4,18 +4,23 @@
 # DeepSpeed Team
 
 from .builder import CUDAOpBuilder, installed_cuda_version
+import importlib
 import os
 from pathlib import Path
+import sys
 
 
 class EvoformerAttnBuilder(CUDAOpBuilder):
     BUILD_VAR = "DS_BUILD_EVOFORMER_ATTN"
     NAME = "evoformer_attn"
+    CUTLASS_IGNORE = "DS_IGNORE_CUTLASS_DETECTION"
+    CUTLASS_PYTHON_BINDINGS = "DS_USE_CUTLASS_PYTHON_BINDINGS"
 
     def __init__(self, name=None):
         name = self.NAME if name is None else name
         super().__init__(name=name)
         self.cutlass_path = os.environ.get("CUTLASS_PATH")
+        self._resolved_cutlass_path = None
 
     def absolute_name(self):
         return f"deepspeed.ops.{self.NAME}_op"
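The newly imported `importlib` lets `_python_package_cutlass_paths` probe for `cutlass_library` by name. A missing package is treated as "no candidates" rather than a hard failure. The sketch below shows that probing pattern with a hypothetical `module_candidate_dirs` helper that works for any module name. The `source_path` attribute it checks is the one the builder looks for on `cutlass_library`.

```python
import importlib
from pathlib import Path
from typing import List


def module_candidate_dirs(module_name: str) -> List[Path]:
    """Hypothetical helper: directories near an optional module, or [] if absent."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Optional dependency not installed: contribute no candidates.
        return []

    candidates = []
    # Prefer an explicit location attribute if the package provides one.
    source_path = getattr(module, "source_path", None)
    if source_path is not None:
        candidates.append(Path(source_path))

    # Otherwise guess relative to the module's own file.
    module_file = getattr(module, "__file__", None)
    if module_file is not None:
        package_dir = Path(module_file).resolve().parent
        candidates.extend([package_dir / "source", package_dir.parent, package_dir])
    return candidates
```

For a package such as the standard-library `json`, the result is `json/source`, the directory containing `json`, and `json` itself. For a module that is not installed, the result is an empty list.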
@@ -57,21 +62,20 @@ def is_compatible(self, verbose=False):
                 self.warning("Please install torch if trying to pre-compile kernels")
             return False
 
-        if self.cutlass_path is None:
-            if verbose:
-                self.warning("Please specify CUTLASS location directory as environment variable CUTLASS_PATH")
-                self.warning(
-                    "Possible values are: a path, DS_IGNORE_CUTLASS_DETECTION and DS_USE_CUTLASS_PYTHON_BINDINGS")
-            return False
-
-        if self.cutlass_path != "DS_IGNORE_CUTLASS_DETECTION":
+        if self.cutlass_path != self.CUTLASS_IGNORE:
             try:
                 self.include_paths()
-            except (RuntimeError, ImportError):
+            except (RuntimeError, ImportError) as exc:
+                if verbose:
+                    self.warning(str(exc))
                 return False
             # Check version in case it is a CUTLASS_PATH points to a CUTLASS checkout
-            if os.path.exists(f"{self.cutlass_path}/CHANGELOG.md"):
-                with open(f"{self.cutlass_path}/CHANGELOG.md", "r") as f:
+            if self._resolved_cutlass_path is not None:
+                changelog_path = self._resolved_cutlass_path / "CHANGELOG.md"
+            else:
+                changelog_path = None
+            if changelog_path is not None and changelog_path.exists():
+                with open(changelog_path, "r") as f:
                     if "3.1.0" not in f.read():
                         if verbose:
                             self.warning("Please use CUTLASS version >= 3.1.0")
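After this change, the version gate reads `CHANGELOG.md` from the directory that discovery actually resolved (`_resolved_cutlass_path`). It no longer uses the raw `CUTLASS_PATH` string, which may now be unset. Below is a standalone sketch of that gate, assuming a hypothetical `cutlass_changelog_ok` helper that returns `None` when the check is skipped.

```python
from pathlib import Path
from typing import Optional


def cutlass_changelog_ok(resolved_root: Optional[Path], marker: str = "3.1.0") -> Optional[bool]:
    """Hypothetical restatement of the CHANGELOG-based version gate.

    Returns None when the check is skipped, otherwise whether the
    changelog text contains `marker`.
    """
    if resolved_root is None:
        # Nothing was resolved (e.g. detection was bypassed): skip.
        return None
    changelog_path = resolved_root / "CHANGELOG.md"
    if not changelog_path.exists():
        # No changelog next to the resolved headers: skip.
        return None
    with open(changelog_path, "r") as f:
        # Plain substring test, as in is_compatible.
        return marker in f.read()
```

The check only runs when `CHANGELOG.md` sits directly in the resolved directory. A checkout resolved through its `include` directory has no changelog there, so the check is skipped. Because this is a substring test, any changelog containing the text `3.1.0` passes.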
@@ -94,26 +98,114 @@ def is_compatible(self, verbose=False):
                 cuda_okay = False
         return super().is_compatible(verbose) and cuda_okay
 
+    @staticmethod
+    def _repo_root():
+        return Path(__file__).resolve().parents[1]
+
+    @staticmethod
+    def _dedupe_paths(paths):
+        deduped = []
+        seen = set()
+        for path in paths:
+            path = Path(path).expanduser()
+            key = str(path)
+            if key not in seen:
+                seen.add(key)
+                deduped.append(path)
+        return deduped
+
+    @staticmethod
+    def _env_paths(*names):
+        paths = []
+        for name in names:
+            value = os.environ.get(name)
+            if not value:
+                continue
+            paths.extend(Path(path) for path in value.split(os.pathsep) if path)
+        return paths
+
+    @staticmethod
+    def _python_package_cutlass_paths():
+        try:
+            cutlass_library = importlib.import_module("cutlass_library")
+        except ImportError:
+            return []
+
+        candidates = []
+        source_path = getattr(cutlass_library, "source_path", None)
+        if source_path is not None:
+            candidates.append(Path(source_path))
+
+        package_file = getattr(cutlass_library, "__file__", None)
+        if package_file is not None:
+            package_dir = Path(package_file).resolve().parent
+            candidates.extend([package_dir / "source", package_dir.parent, package_dir])
+        return candidates
+
+    def _candidate_cutlass_paths(self):
+        if self.cutlass_path == self.CUTLASS_PYTHON_BINDINGS:
+            candidates = self._python_package_cutlass_paths()
+            if candidates:
+                return candidates
+            self.warning("Please pip install nvidia-cutlass")
+            raise ImportError("Unable to locate CUTLASS from the nvidia-cutlass Python package")
+
+        if self.cutlass_path:
+            return [Path(self.cutlass_path)]
+
+        repo_root = self._repo_root()
+        python_prefixes = self._dedupe_paths([Path(sys.prefix), Path(sys.exec_prefix), Path(sys.base_prefix)])
+        prefix_paths = self._env_paths("CUTLASS_ROOT", "CUTLASS_HOME", "CONDA_PREFIX", "VIRTUAL_ENV",
+                                       "CMAKE_PREFIX_PATH", "CUDA_HOME", "CUDA_PATH")
+        include_paths = self._env_paths("CPATH", "CPLUS_INCLUDE_PATH", "C_INCLUDE_PATH")
+
+        return self._dedupe_paths([
+            *self._python_package_cutlass_paths(),
+            *prefix_paths,
+            *python_prefixes,
+            *include_paths,
+            Path.cwd() / "cutlass",
+            repo_root / "cutlass",
+            repo_root.parent / "cutlass",
+            Path("/usr/local/cutlass"),
+            Path("/opt/cutlass"),
+            Path("/usr/local"),
+            Path("/usr"),
+        ])
+
+    @staticmethod
+    def _cutlass_include_dirs(cutlass_path):
+        cutlass_path = cutlass_path.expanduser().resolve()
+        if not cutlass_path.is_dir():
+            return []
+
+        if (cutlass_path / "include" / "cutlass" / "cutlass.h").is_file():
+            include_root = cutlass_path / "include"
+            util_include = cutlass_path / "tools" / "util" / "include"
+        elif (cutlass_path / "cutlass" / "cutlass.h").is_file():
+            include_root = cutlass_path
+            util_include = cutlass_path.parent / "tools" / "util" / "include"
+        else:
+            return []
+
+        include_dirs = [include_root]
+        if util_include.is_dir():
+            include_dirs.append(util_include)
+        return [str(include_dir) for include_dir in include_dirs]
+
     def include_paths(self):
         # Assume the user knows best and CUTLASS location is already setup externally
-        if self.cutlass_path == "DS_IGNORE_CUTLASS_DETECTION":
+        if self.cutlass_path == self.CUTLASS_IGNORE:
             return []
-        # Use header files vendored with deprecated python packages
-        if self.cutlass_path == "DS_USE_CUTLASS_PYTHON_BINDINGS":
-            try:
-                import cutlass_library
-                cutlass_path = Path(cutlass_library.__file__).parent / "source"
-            except ImportError:
-                self.warning("Please pip install nvidia-cutlass (note that this is deprecated and likely outdated)")
-                raise
-        # Use hardcoded path in CUTLASS_PATH
-        else:
-            cutlass_path = Path(self.cutlass_path)
-        cutlass_path = cutlass_path.resolve()
-        if not cutlass_path.is_dir():
-            raise RuntimeError(f"CUTLASS_PATH {cutlass_path} does not exist")
-        include_dirs = cutlass_path / "include", cutlass_path / "tools" / "util" / "include"
-        include_dirs = [str(include_dir) for include_dir in include_dirs if include_dir.is_dir()]
-        if not include_dirs:
-            raise RuntimeError(f"CUTLASS_PATH {cutlass_path} does not contain any include directories")
-        return include_dirs
+
+        for cutlass_path in self._candidate_cutlass_paths():
+            include_dirs = self._cutlass_include_dirs(cutlass_path)
+            if include_dirs:
+                self._resolved_cutlass_path = cutlass_path.expanduser().resolve()
+                return include_dirs
+
+        if self.cutlass_path:
+            raise RuntimeError(f"CUTLASS_PATH {self.cutlass_path} does not contain CUTLASS headers")
+
+        raise RuntimeError("Unable to locate CUTLASS. Install nvidia-cutlass, clone CUTLASS next to DeepSpeed, "
+                           "or set CUTLASS_PATH to the CUTLASS checkout.")

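`include_paths` returns the first candidate that contains CUTLASS headers, so the order produced by `_candidate_cutlass_paths` is the effective search priority. Two helpers shape that order. The first splits list-valued variables such as `CPATH` on `os.pathsep`. The second removes duplicates while preserving order. The sketch below writes both as hypothetical free functions, `env_paths` and `dedupe_paths`. They take the environment as a mapping so they can be exercised without touching `os.environ`.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping


def env_paths(environ: Mapping[str, str], *names: str) -> List[Path]:
    """Hypothetical helper: split os.pathsep-separated variables into Paths."""
    paths = []
    for name in names:
        value = environ.get(name)
        if not value:
            continue
        # Empty entries (e.g. from a doubled or trailing separator) are skipped.
        paths.extend(Path(p) for p in value.split(os.pathsep) if p)
    return paths


def dedupe_paths(paths: Iterable[Path]) -> List[Path]:
    """Hypothetical helper: drop repeats, keeping each path's first position."""
    seen = set()
    result = []
    for path in paths:
        path = Path(path).expanduser()
        key = str(path)
        if key not in seen:
            seen.add(key)
            result.append(path)
    return result
```

Keeping the first occurrence matters. For example, if `CONDA_PREFIX` and `sys.prefix` name the same directory, that directory is still probed at its earlier, environment-variable position.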
tests/benchmarks/DS4Sci_EvoformerAttention_bench.py

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 This script is to test the performance of the DS4Sci_EvoformerAttention op.
 To run the script,
 1. Clone the CUTLASS repo. E.g. git clone https://github.com/NVIDIA/cutlass.git
-2. Specify the CUTLASS_PATH environment variable. E.g. export CUTLASS_PATH=$(pwd)/cutlass
+2. DeepSpeed will detect a local or installed CUTLASS. If needed, set CUTLASS_PATH explicitly.
 3. Run the script. E.g. python DS4Sci_EvoformerAttention_bench.py
 """
 
tests/unit/ops/deepspeed4science/test_evoformer_attn_builder.py

Lines changed: 72 additions & 0 deletions
@@ -6,11 +6,22 @@
 from pathlib import Path
 from unittest.mock import patch
 
+import pytest
+
 from deepspeed.ops.op_builder.builder import CUDAOpBuilder
 # Import the concrete builder class instead of the accelerator-dispatched alias.
 from deepspeed.ops.op_builder.evoformer_attn import EvoformerAttnBuilder
 
 
+def make_cutlass_checkout(path):
+    include_dir = path / "include" / "cutlass"
+    include_dir.mkdir(parents=True)
+    (include_dir / "cutlass.h").write_text("// cutlass marker\n")
+    util_dir = path / "tools" / "util" / "include"
+    util_dir.mkdir(parents=True)
+    return path
+
+
 def test_filter_ccs_removes_below_70_and_keeps_ptx_suffix():
     builder = EvoformerAttnBuilder()
     result = builder.filter_ccs(["6.0", "6.1", "7.0", "8.0+PTX"])
@@ -44,3 +55,64 @@ def test_no_cuda_arch_in_checkarch():
     end = text.index("};", start) + 2
     block = text[start:end]
     assert "__CUDA_ARCH__" not in block
+
+
+def test_include_paths_uses_cutlass_path_env(tmp_path):
+    cutlass_path = make_cutlass_checkout(tmp_path / "cutlass")
+
+    with patch.dict("os.environ", {"CUTLASS_PATH": str(cutlass_path)}, clear=False):
+        builder = EvoformerAttnBuilder()
+
+        assert builder.include_paths() == [
+            str(cutlass_path / "include"),
+            str(cutlass_path / "tools" / "util" / "include"),
+        ]
+
+
+def test_include_paths_finds_python_package_candidate_without_env(tmp_path):
+    cutlass_path = make_cutlass_checkout(tmp_path / "python_package_cutlass")
+
+    with patch.dict("os.environ", {}, clear=True):
+        builder = EvoformerAttnBuilder()
+
+        with patch.object(EvoformerAttnBuilder, "_python_package_cutlass_paths", return_value=[cutlass_path]):
+            assert builder.include_paths()[0] == str(cutlass_path / "include")
+
+
+def test_include_paths_finds_cutlass_from_cmake_prefix_path(tmp_path):
+    cutlass_path = make_cutlass_checkout(tmp_path / "prefix")
+
+    with patch.dict("os.environ", {"CMAKE_PREFIX_PATH": str(cutlass_path)}, clear=True):
+        builder = EvoformerAttnBuilder()
+        with patch.object(EvoformerAttnBuilder, "_python_package_cutlass_paths", return_value=[]):
+            assert builder.include_paths()[0] == str(cutlass_path / "include")
+
+
+def test_include_paths_finds_cutlass_from_compiler_include_path(tmp_path):
+    cutlass_path = make_cutlass_checkout(tmp_path / "prefix")
+
+    with patch.dict("os.environ", {"CPATH": str(cutlass_path / "include")}, clear=True):
+        builder = EvoformerAttnBuilder()
+        with patch.object(EvoformerAttnBuilder, "_python_package_cutlass_paths", return_value=[]):
+            assert builder.include_paths()[0] == str(cutlass_path / "include")
+
+
+def test_include_paths_accepts_cutlass_include_dir_directly(tmp_path):
+    cutlass_path = make_cutlass_checkout(tmp_path / "cutlass")
+
+    with patch.dict("os.environ", {"CUTLASS_PATH": str(cutlass_path / "include")}, clear=False):
+        builder = EvoformerAttnBuilder()
+
+        assert builder.include_paths() == [
+            str(cutlass_path / "include"),
+            str(cutlass_path / "tools" / "util" / "include"),
+        ]
+
+
+def test_include_paths_reports_missing_cutlass(tmp_path):
+    with patch.dict("os.environ", {}, clear=True):
+        builder = EvoformerAttnBuilder()
+
+        with patch.object(builder, "_candidate_cutlass_paths", return_value=[tmp_path / "missing"]):
+            with pytest.raises(RuntimeError, match="Unable to locate CUTLASS"):
+                builder.include_paths()
