Commit 687ceea

minor
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
1 parent 3036e8f commit 687ceea

3 files changed: +14 -5 lines changed

examples/vllm_serve/README.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ This is a simple example to demonstrate calibrating and serving ModelOpt fakequa
 
 Compared with realquant, fakequant is 2-5x slower, but doesn't require dedicated kernel support and facilitates research.
 
-This example is tested with vllm 0.9.0 and 0.11.2
+This example is tested with vllm 0.9.0 and 0.19.1
 
 ## Prepare environment
 

examples/vllm_serve/fakequant_worker.py

Lines changed: 5 additions & 2 deletions
@@ -133,11 +133,14 @@ def determine_available_memory(self) -> int:
         with disable_compilation(model):
             return super().determine_available_memory()
 
-    def compile_or_warm_up_model(self) -> None:
+    def compile_or_warm_up_model(self) -> float:
         if (
             quant_config["quant_cfg"]
             or quant_config["kv_quant_cfg"]
             or quant_config["modelopt_state_path"]
         ):
             _fakequant_run_prolog_worker(self)
-        super().compile_or_warm_up_model()
+        # Must return the base worker's compilation time (seconds). Returning None
+        # breaks vLLM V1 executor: initialize_from_config does max(compilation_times)
+        # across TP workers.
+        return super().compile_or_warm_up_model()
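
The failure mode named in the comment above can be reproduced without vLLM. The sketch below is illustrative only: `BaseWorker`, `DroppingWorker`, `ForwardingWorker`, and `slowest_compilation` are hypothetical stand-ins, and the `max(...)` aggregation is assumed from the commit comment rather than copied from vLLM's executor.

```python
class BaseWorker:
    """Hypothetical stand-in for the vLLM base worker (not the real class)."""

    def compile_or_warm_up_model(self) -> float:
        # Pretend compilation and warm-up took 1.5 seconds.
        return 1.5


class DroppingWorker(BaseWorker):
    """Pre-fix shape: calls the base method but discards its result."""

    def compile_or_warm_up_model(self) -> None:  # type: ignore[override]
        super().compile_or_warm_up_model()


class ForwardingWorker(BaseWorker):
    """Post-fix shape: forwards the base worker's compilation time."""

    def compile_or_warm_up_model(self) -> float:
        return super().compile_or_warm_up_model()


def slowest_compilation(workers) -> float:
    # Assumed aggregation, per the commit comment: max over per-worker times.
    return max(w.compile_or_warm_up_model() for w in workers)


if __name__ == "__main__":
    print(slowest_compilation([ForwardingWorker(), ForwardingWorker()]))
    try:
        slowest_compilation([DroppingWorker(), DroppingWorker()])
    except TypeError as exc:
        # Comparing None with None is not supported in Python 3.
        print("TypeError:", exc)
```

In this sketch the error only appears with two or more workers, because `max` over a single `None` performs no comparison and simply returns `None`.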

examples/vllm_serve/vllm_serve_fakequant.py

Lines changed: 8 additions & 2 deletions
@@ -66,7 +66,6 @@
     from vllm.utils import FlexibleArgumentParser
 else:
     from vllm.utils.argparse_utils import FlexibleArgumentParser
-    from vllm.v1.executor.ray_executor import RayDistributedExecutor
 
 
 # Adding the envs you want to pass to the workers
@@ -81,7 +80,14 @@
     "TRUST_REMOTE_CODE",
 }
 
-RayDistributedExecutor.ADDITIONAL_ENV_VARS.update(additional_env_vars)
+if vllm_version <= version.parse("0.11.0"):
+    RayDistributedExecutor.ADDITIONAL_ENV_VARS.update(additional_env_vars)
+else:
+    from vllm.platforms import current_platform
+
+    for _name in additional_env_vars:
+        if _name not in current_platform.additional_env_vars:
+            current_platform.additional_env_vars.append(_name)
 
 
 def main():
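
The `else` branch in the hunk above appends each name only when it is missing, which suggests the target is a list rather than a set. A minimal sketch of that idempotent registration follows. It assumes a plain Python list in place of `current_platform.additional_env_vars`; `register_env_vars` and `EXAMPLE_VAR` are hypothetical names used for illustration and are not part of vLLM or this commit.

```python
from typing import Iterable, List


def register_env_vars(registry: List[str], names: Iterable[str]) -> List[str]:
    """Append each name to `registry` unless it is already present.

    `registry` stands in for a list-like attribute such as the one the
    diff appends to; this helper is hypothetical.
    """
    # Sort so the result is deterministic even when `names` is a set.
    for name in sorted(names):
        if name not in registry:
            registry.append(name)
    return registry


if __name__ == "__main__":
    registry = ["EXAMPLE_VAR"]  # hypothetical pre-existing entry
    register_env_vars(registry, {"TRUST_REMOTE_CODE", "EXAMPLE_VAR"})
    register_env_vars(registry, {"TRUST_REMOTE_CODE"})  # second call adds nothing
    print(registry)
```

The membership check keeps repeated registration from producing duplicates, which a set's `update` would handle implicitly but a list's `append` would not. The diff iterates the set directly; the sort here is an addition that only makes the output order reproducible.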
