Commit ebae5a6

abrichr and claude authored
fix: remove --entrypoint override so evaluate_server.py starts automatically (#133)
The docker run commands used --entrypoint /bin/bash which overrode the Dockerfile ENTRYPOINT (start_with_evaluate.sh). This prevented evaluate_server.py from starting on port 5050, making /evaluate and /task/<id> endpoints unavailable.

Fix: remove --entrypoint, pass entry.sh as a command argument instead. Also publish port 5050 in all three docker run locations.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
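
The effect described above can be illustrated with a small model of how Docker assembles the container's process: `--entrypoint` replaces the image's ENTRYPOINT, and whatever follows the image name is appended as arguments. This is only a sketch; `effective_argv` is a hypothetical helper, and the path `/start_with_evaluate.sh` is an assumption (the commit names the script but not its location).

```python
from typing import List, Optional, Sequence


def effective_argv(
    image_entrypoint: Sequence[str],
    cmd: Sequence[str],
    entrypoint_override: Optional[str] = None,
) -> List[str]:
    """Model the argv Docker runs for a given ENTRYPOINT, override, and command."""
    # --entrypoint replaces the image's ENTRYPOINT entirely.
    if entrypoint_override:
        entrypoint = [entrypoint_override]
    else:
        entrypoint = list(image_entrypoint)
    # Arguments after the image name are appended to the entrypoint in effect.
    return entrypoint + list(cmd)


# Assumed path; only the script name appears in the commit.
IMAGE_ENTRYPOINT = ["/start_with_evaluate.sh"]
ENTRY_ARGS = "./entry.sh --prepare-image false --start-client false"

# Before the fix: bash replaces the ENTRYPOINT, so the wrapper script never runs.
before = effective_argv(IMAGE_ENTRYPOINT, ["-c", ENTRY_ARGS], entrypoint_override="/bin/bash")

# After the fix: the wrapper script runs and receives entry.sh as its arguments.
after = effective_argv(IMAGE_ENTRYPOINT, ENTRY_ARGS.split())
```

In `before`, the wrapper script does not appear at all, which is why evaluate_server.py never started; in `after`, it is the first element and entry.sh follows as its arguments.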
1 parent e45c2b5 commit ebae5a6

1 file changed

Lines changed: 8 additions & 8 deletions

openadapt_evals/benchmarks/vm_cli.py

@@ -1722,8 +1722,8 @@ def cmd_start(args):
 getattr(args, "som_origin", "oss")
 getattr(args, "a11y_backend", "uia")

-# The vanilla windowsarena/winarena:latest image uses --entrypoint /bin/bash
-# and requires entry.sh as the command argument
+# Our waa-auto:latest image uses a custom ENTRYPOINT (start_with_evaluate.sh)
+# that starts evaluate_server.py on port 5050 before running entry.sh
 docker_cmd = f"""docker run -d \\
 --name winarena \\
 --device=/dev/kvm \\
@@ -1737,9 +1737,9 @@ def cmd_start(args):
 -e RAM_SIZE={ram_size} \\
 -e CPU_CORES={cpu_cores} \\
 -e DISK_SIZE=64G \\
---entrypoint /bin/bash \\
+-p 5050:5050 \\
 {DOCKER_IMAGE} \\
--c './entry.sh --prepare-image false --start-client false'"""
+./entry.sh --prepare-image false --start-client false"""
 # Note: --start-client false means just boot Windows + Flask server
 # The benchmark client is started separately by the 'run' command

@@ -1861,9 +1861,9 @@ def cmd_test_golden_image(args):
 -e RAM_SIZE={ram_size} \\
 -e CPU_CORES={cpu_cores} \\
 -e DISK_SIZE=64G \\
---entrypoint /bin/bash \\
+-p 5050:5050 \\
 {DOCKER_IMAGE} \\
--c './entry.sh --prepare-image false --start-client false'"""
+./entry.sh --prepare-image false --start-client false"""

 result = ssh_run(ip, docker_cmd)
 if result.returncode != 0:
@@ -4943,9 +4943,9 @@ class CreateArgs:
 -e RAM_SIZE={ram_size} \\
 -e CPU_CORES={cpu_cores} \\
 -e DISK_SIZE=64G \\
---entrypoint /bin/bash \\
+-p 5050:5050 \\
 waa-auto:latest \\
--c './entry.sh --prepare-image false --start-client false'"""
+./entry.sh --prepare-image false --start-client false"""
 # Note: --start-client false for setup - just boot Windows + Flask server
 # Azure ML compute instances run the benchmark separately via run_entry.py
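
Since the change publishes port 5050 with `-p 5050:5050`, one way to sanity-check it after the container starts is a plain TCP probe. A minimal sketch, not part of vm_cli.py: `port_is_open` is a hypothetical helper, and it assumes the Docker host is directly reachable from where the check runs (the commands in this file are sent over SSH, so a tunnel or a remote invocation may be needed in practice).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("localhost", 5050)` run on the Docker host only shows that something is listening on the published port; it does not confirm that the /evaluate or /task/<id> endpoints respond correctly.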
