Commit 29ac412

benchmark_serving: fail run when request failure rate exceeds 5% (#1379)
Gate the benchmark after results are written so the artifact still uploads, then exit non-zero if (num_prompts - completed) / num_prompts > 0.05. Surfaces partial-failure runs that currently get reported as successful jobs.
1 parent 766b097 commit 29ac412
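
The "exit non-zero" behavior in the commit message comes from raising `SystemExit` with a string: Python prints the string to stderr and exits with status 1. A minimal sketch of how a CI step would observe that follows. The child script is a hypothetical stand-in, not the real benchmark.

```python
import subprocess
import sys

# Hypothetical one-line child script standing in for the gated benchmark.
child = 'raise SystemExit("FAIL: request failure rate 6.0% exceeds 5% threshold")'

# Run it in a separate interpreter, the way a CI job would invoke the script.
proc = subprocess.run(
    [sys.executable, "-c", child],
    capture_output=True,
    text=True,
)

print(proc.returncode)      # 1: a string argument to SystemExit means exit status 1
print(proc.stderr.strip())  # the FAIL message is written to stderr
print(repr(proc.stdout))    # '': nothing is written to stdout
```

Because the message goes to stderr, it appears in the job log without mixing into any stdout output that downstream steps might parse.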

1 file changed

Lines changed: 10 additions & 0 deletions

utils/bench_serving/benchmark_serving.py

@@ -900,6 +900,16 @@ def main(args: argparse.Namespace):
             json.dump(result_json, outfile)
         save_to_pytorch_benchmark_format(args, result_json, file_name)
 
+    max_failure_rate = 0.05
+    completed = benchmark_result["completed"]
+    failure_rate = 1 - completed / args.num_prompts
+    if failure_rate > max_failure_rate:
+        raise SystemExit(
+            f"FAIL: request failure rate {failure_rate:.1%} exceeds "
+            f"{max_failure_rate:.0%} threshold "
+            f"({completed}/{args.num_prompts} completed)"
+        )
+
 
 if __name__ == "__main__":
     parser = FlexibleArgumentParser(
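
One edge case worth checking: the commit message states the rate as `(num_prompts - completed) / num_prompts`, while the diff computes `1 - completed / args.num_prompts`. In floating point the two can disagree exactly at the 5% boundary. The sketch below compares them. The helper names `failure_rate` and `exceeds_threshold` are hypothetical and not part of the commit.

```python
def failure_rate(num_prompts: int, completed: int) -> float:
    """Fraction of requests that did not complete (hypothetical helper)."""
    if num_prompts <= 0:
        raise ValueError("num_prompts must be positive")
    # Subtract the integers first so the boundary case divides cleanly.
    return (num_prompts - completed) / num_prompts


def exceeds_threshold(num_prompts: int, completed: int,
                      max_failure_rate: float = 0.05) -> bool:
    """True when the run should fail under a strict '>' gate."""
    return failure_rate(num_prompts, completed) > max_failure_rate


# 950 of 1000 completed is exactly 5% failed.
as_in_diff = 1 - 950 / 1000          # expression used in the diff
as_in_message = (1000 - 950) / 1000  # expression from the commit message

print(as_in_diff > 0.05)             # True: 1 - 0.95 rounds slightly above 0.05
print(as_in_message > 0.05)          # False: 50 / 1000 equals the literal 0.05
print(exceeds_threshold(1000, 940))  # True: 6% failed
```

If a run at exactly 5% failures is meant to pass, the subtraction-first form (or an integer comparison) avoids the rounding difference. If failing at the boundary is acceptable, the diff's expression is fine as written.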
