fix(runtime): inject CPU/memory/port defaults; skip endpoints on --no-wait; silence SDK validation warnings

Sodawyx · claude · Sodawyx · commit 4b96e87482ef · 2026-05-20T10:54:48.000+08:00
Three problems surfaced when running the README's minimal example
`ar runtime apply -f runtime.yaml`:

1. HTTP 400 "CPU is required; Memory is required; Port is required"
   The CLI passed cpu=null/memory=null/port=null through to the SDK
   even though the docs already promised defaults of 2 cores / 4096 MB /
   port 9000. Those defaults were never actually applied.

   Add DEFAULT_CPU / DEFAULT_MEMORY_MB / DEFAULT_PORT in
   runtime_constants and inject them in to_runtime_create_input /
   to_runtime_update_input. `spec.container.port` keeps its documented
   precedence over `spec.port`; DEFAULT_PORT applies only when both are
   unset.

2. HTTP 400 "runtime must be in READY status to create endpoints"
   under --no-wait
   apply_cmd unconditionally called reconcile_endpoints after
   reconcile_runtime — fine under --wait (we'd already polled runtime
   to READY), but under --no-wait the runtime is still CREATING and the
   backend rejects endpoint create.

   Gate reconcile_endpoints + poll_many_parallel on `wait`. Under
   --no-wait we just submit the runtime; an interactive run prints a
   stderr notice telling the user to re-apply once the runtime is
   READY (TTY-only so it doesn't pollute scripted JSON output).

3. SDK pydantic warning spam
   Every `list_all()` call deserializes every runtime in the workspace,
   and the SDK emits "validate type failed" WARNINGs whenever a
   server-side record doesn't match its current schema (other people's
   runtimes with codeConfiguration.language=java17, empty
   logConfiguration, etc.). A single apply emitted ~10 lines of noise.

   Install a logging.Filter on the `agentrun-logger` logger that drops
   any record whose message contains "validate type failed". `--debug`
   removes the filter so debugging shows full logs.
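
For reviewers unfamiliar with logger-level filters, here is a minimal sketch of the mechanism in item 3. The class and helper names (`DropValidationWarnings`, `install`, `uninstall`) are hypothetical and are not the names used in main.py; only the `agentrun-logger` logger name and the matched substring come from this change.

```python
import logging


class DropValidationWarnings(logging.Filter):
    """Hypothetical stand-in for the filter this commit installs."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False drops the record; everything else passes through.
        return "validate type failed" not in record.getMessage()


def install(logger_name: str = "agentrun-logger") -> logging.Filter:
    """Attach the filter to the named logger and return it."""
    flt = DropValidationWarnings()
    logging.getLogger(logger_name).addFilter(flt)
    return flt


def uninstall(logger_name: str = "agentrun-logger") -> None:
    """Remove every instance of the filter, as a --debug path might."""
    logger = logging.getLogger(logger_name)
    # Iterate over a copy because removeFilter mutates logger.filters.
    for flt in list(logger.filters):
        if isinstance(flt, DropValidationWarnings):
            logger.removeFilter(flt)
```

A filter attached to a logger only sees records logged on that logger itself, not records propagated from its descendants, so this approach assumes the SDK logs directly on `agentrun-logger`.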

Docs:
- runtime.md (en + zh) apply Options table: document the new --no-wait
  semantics; add a paragraph explaining the auto-injected resource
  defaults.
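
The default-injection rule documented above can be sketched in isolation as follows. `Container`, `Spec`, and `resolve_resources` are hypothetical stand-ins for ParsedContainer, ParsedAgentRuntime, and the render functions; the constant values and the port precedence are the ones stated in this commit.

```python
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_CPU = 2.0  # cores
DEFAULT_MEMORY_MB = 4096
DEFAULT_PORT = 9000


@dataclass
class Container:  # hypothetical stand-in for ParsedContainer
    port: Optional[int] = None


@dataclass
class Spec:  # hypothetical stand-in for ParsedAgentRuntime
    cpu: Optional[float] = None
    memory: Optional[int] = None
    port: Optional[int] = None
    container: Container = field(default_factory=Container)


def resolve_resources(spec: Spec) -> dict:
    """Apply container.port > spec.port > DEFAULT_PORT, plus cpu/memory defaults."""
    if spec.container.port is not None:
        port = spec.container.port
    elif spec.port is not None:
        port = spec.port
    else:
        port = DEFAULT_PORT
    return {
        "cpu": spec.cpu if spec.cpu is not None else DEFAULT_CPU,
        "memory": spec.memory if spec.memory is not None else DEFAULT_MEMORY_MB,
        "port": port,
    }
```

The `is not None` checks are deliberate: unlike `value or DEFAULT`, they substitute a default only for a missing field, so an explicit falsy value such as 0 is passed through to the backend unchanged.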

Tests:
- test_create_input_user_values_override_defaults — explicit values win.
- test_create_input_container_port_wins_over_spec_port — precedence.
- test_update_input_applies_same_defaults — symmetry with create.
- Existing test_create_input_injects_system_tag_and_container_artifact
  now also asserts cpu=2.0 / memory=4096 / port=9000.
- test_apply_create_happy_path tightened: under --no-wait,
  create_endpoint MUST NOT be called and endpoints list must be empty.
- test_apply_update_path tightened: under --wait, create_endpoint IS
  called after the runtime reaches READY.
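
The behaviour the two tightened integration tests pin down can be sketched as a small control-flow function. This is a simplified illustration, not apply_cmd itself: `apply_endpoints` and its callable parameters are hypothetical, and the real command also polls the endpoints after reconciling them.

```python
from typing import Callable, List, TextIO


def apply_endpoints(
    wait: bool,
    poll_runtime: Callable[[], None],
    reconcile_endpoints: Callable[[], List[str]],
    stderr: TextIO,
) -> List[str]:
    """Reconcile endpoints only after the runtime has been polled to READY."""
    actions: List[str] = []
    if wait:
        # Block until the runtime reaches a final status, then touch endpoints.
        poll_runtime()
        actions = reconcile_endpoints()
    elif stderr.isatty():
        # Interactive runs get a hint; piped or scripted runs stay silent.
        stderr.write(
            "--no-wait: runtime submitted; re-run apply once it is READY "
            "to reconcile endpoints.\n"
        )
    return actions
```

Under `--no-wait` the returned list stays empty, which is why the test asserts both that create_endpoint is never called and that the reported endpoints list is `[]`.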

Local gate: ruff + mypy clean, 525/525 tests pass, coverage 95.25%.

Signed-off-by: Sodawyx <sodawyx@126.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/docs/en/runtime.md b/docs/en/runtime.md
@@ -43,10 +43,14 @@ ar runtime apply -f FILE [--wait/--no-wait] [--timeout DURATION]
 | Flag | Type | Required | Default | Description |
 |------|------|----------|---------|-------------|
 | `-f`, `--file` | path | yes |  | YAML file path (supports multi-document). |
-| `--wait/--no-wait` | flag | no | `--wait` | Poll runtime + endpoints to final status. |
+| `--wait/--no-wait` | flag | no | `--wait` | Poll runtime + endpoints to final status. Under `--no-wait` the runtime is submitted but **endpoints are not reconciled** — the backend rejects endpoint create/update while the runtime is still `CREATING`/`UPDATING`. Re-run apply once it reaches `READY`. |
 | `--timeout` | duration | no | `10m` | Polling timeout. Accepts `Ns`, `Nm`, `Nh`, or bare seconds. |
 | `--prune-endpoints/--no-prune-endpoints` | flag | no | `--prune-endpoints` | Delete remote endpoints absent from the YAML. |
 
+The CLI injects sensible defaults for `cpu` (2 cores), `memory` (4096 MB) and
+`port` (9000) when the YAML omits them — the backend rejects null values for
+these three fields with HTTP 400.
+
 ### Examples
 
 ```bash
diff --git a/docs/zh/runtime.md b/docs/zh/runtime.md
@@ -41,10 +41,13 @@ ar runtime apply -f FILE [--wait/--no-wait] [--timeout DURATION]
 | Flag | Type | Required | Default | Description |
 |------|------|----------|---------|-------------|
 | `-f`, `--file` | path | yes |  | YAML 文件路径（支持多文档）。 |
-| `--wait/--no-wait` | flag | no | `--wait` | 轮询 runtime + endpoints 到终态。 |
+| `--wait/--no-wait` | flag | no | `--wait` | 轮询 runtime + endpoints 到终态。`--no-wait` 时仅提交 runtime 创建/更新，**不会 reconcile endpoint** —— 后端在 runtime 处于 `CREATING`/`UPDATING` 时会拒绝 endpoint create/update。等 runtime 到 `READY` 后再 apply 一次即可。 |
 | `--timeout` | duration | no | `10m` | 轮询超时。支持 `Ns` / `Nm` / `Nh` 或裸秒数。 |
 | `--prune-endpoints/--no-prune-endpoints` | flag | no | `--prune-endpoints` | 删除远端存在但 YAML 缺失的 endpoint。 |
 
+YAML 中省略 `cpu` / `memory` / `port` 时，CLI 会自动注入合理默认值（2 核 /
+4096 MB / 9000）—— 后端对这三个字段的 null 会回复 HTTP 400。
+
 ### Examples
 
 ```bash
diff --git a/src/agentrun_cli/_utils/runtime_constants.py b/src/agentrun_cli/_utils/runtime_constants.py
@@ -14,6 +14,13 @@
 DEFAULT_ENDPOINT_NAME = "default"
 DEFAULT_TARGET_VERSION = "LATEST"
 
+# Resource defaults — the backend rejects CreateAgentRuntime with HTTP 400
+# "CPU is required; Memory is required; Port is required" when these are null.
+# Injecting them in the render layer keeps the minimal YAML example runnable.
+DEFAULT_CPU = 2.0  # cores
+DEFAULT_MEMORY_MB = 4096
+DEFAULT_PORT = 9000
+
 POLL_INITIAL_INTERVAL = 3.0  # seconds
 POLL_MAX_INTERVAL = 10.0  # seconds (cap of exponential backoff)
 POLL_BACKOFF_FACTOR = 1.5
diff --git a/src/agentrun_cli/_utils/runtime_render.py b/src/agentrun_cli/_utils/runtime_render.py
@@ -17,7 +17,10 @@
 )
 from agentrun_cli._utils.runtime_constants import (
     ARTIFACT_TYPE_CONTAINER,
+    DEFAULT_CPU,
     DEFAULT_ENDPOINT_NAME,
+    DEFAULT_MEMORY_MB,
+    DEFAULT_PORT,
     DEFAULT_TARGET_VERSION,
     SYSTEM_TAG_CLI,
 )
@@ -119,6 +122,16 @@ def _build_container(p: ParsedContainer, m):
     )
 
 
+def _resolve_port(p: ParsedAgentRuntime) -> int:
+    """container.port > spec.port > DEFAULT_PORT — matches the documented
+    precedence and prevents the backend's 'Port is required' 400."""
+    if p.container.port is not None:
+        return p.container.port
+    if p.port is not None:
+        return p.port
+    return DEFAULT_PORT
+
+
 def to_runtime_create_input(p: ParsedAgentRuntime):
     m = _sdk_models()
     return m["create_input"](
@@ -129,9 +142,9 @@ def to_runtime_create_input(p: ParsedAgentRuntime):
         artifact_type=ARTIFACT_TYPE_CONTAINER,
         system_tags=[SYSTEM_TAG_CLI],
         container_configuration=_build_container(p.container, m),
-        cpu=p.cpu,
-        memory=p.memory,
-        port=p.port,
+        cpu=p.cpu if p.cpu is not None else DEFAULT_CPU,
+        memory=p.memory if p.memory is not None else DEFAULT_MEMORY_MB,
+        port=_resolve_port(p),
         disk_size=p.disk_size,
         enable_session_isolation=p.enable_session_isolation,
         protocol_configuration=_build_protocol(p.protocol, m),
@@ -157,9 +170,9 @@ def to_runtime_update_input(p: ParsedAgentRuntime):
         artifact_type=ARTIFACT_TYPE_CONTAINER,
         system_tags=[SYSTEM_TAG_CLI],
         container_configuration=_build_container(p.container, m),
-        cpu=p.cpu,
-        memory=p.memory,
-        port=p.port,
+        cpu=p.cpu if p.cpu is not None else DEFAULT_CPU,
+        memory=p.memory if p.memory is not None else DEFAULT_MEMORY_MB,
+        port=_resolve_port(p),
         disk_size=p.disk_size,
         enable_session_isolation=p.enable_session_isolation,
         protocol_configuration=_build_protocol(p.protocol, m),
diff --git a/src/agentrun_cli/commands/runtime/apply_cmd.py b/src/agentrun_cli/commands/runtime/apply_cmd.py
@@ -116,21 +116,27 @@ def apply_cmd(ctx, file_path, wait, timeout, prune_endpoints):
         rt_res = reconcile_runtime(parsed, client=runtime_cls)
         runtime = rt_res.runtime
 
+        ep_actions: list = []
         if wait:
             poll_until_final(
                 runtime,
                 resource_kind="AgentRuntime",
                 cfg=poll_cfg,
                 on_tick=lambda r, e, p=parsed: _progress(sys.stderr, p, r, e),
             )
+            # Endpoint create/update is rejected by the backend with HTTP 400
+            # ("runtime must be in READY status") whenever the runtime isn't
+            # READY yet — so we only reconcile endpoints after the runtime has
+            # reached a final status. Under --no-wait the runtime is still in
+            # CREATING/UPDATING when we return, so we skip endpoint
+            # reconciliation entirely and the user can re-run apply once the
+            # runtime is READY.
+            ep_actions = reconcile_endpoints(
+                runtime,
+                desired=parsed.endpoints,
+                prune=prune_endpoints,
+            )
 
-        ep_actions = reconcile_endpoints(
-            runtime,
-            desired=parsed.endpoints,
-            prune=prune_endpoints,
-        )
-
-        if wait:
             in_flight = [
                 a.endpoint
                 for a in ep_actions
@@ -143,6 +149,12 @@ def apply_cmd(ctx, file_path, wait, timeout, prune_endpoints):
                 concurrency=ENDPOINT_POLL_CONCURRENCY,
                 on_tick=lambda r, e, p=parsed: _progress(sys.stderr, p, r, e),
             )
+        elif sys.stderr.isatty():
+            sys.stderr.write(
+                f"[runtime {parsed.name}] --no-wait: runtime submitted; "
+                "endpoints will be reconciled on a subsequent apply once the "
+                "runtime reaches READY.\n"
+            )
 
         results.append(
             {
diff --git a/src/agentrun_cli/main.py b/src/agentrun_cli/main.py
@@ -10,6 +10,7 @@
     agentrun super-agent run
 """
 
+import logging
 import os
 
 import click
@@ -26,6 +27,24 @@
 from agentrun_cli.commands.tool_cmd import tool_group
 
 
+class _DropSdkValidationWarnings(logging.Filter):
+    """Drop the SDK's pydantic 'validate type failed' WARNINGs.
+
+    They fire from ``agentrun.utils.model.from_object`` whenever the SDK
+    deserializes a server-side record whose shape doesn't match its current
+    pydantic schema (e.g. a runtime someone else created with
+    ``codeConfiguration.language=java17`` or with an empty ``logConfiguration``).
+    That noise is not actionable for the CLI user — a single ``ar runtime list``
+    can emit a dozen of them. ``--debug`` re-enables full logging.
+    """
+
+    def filter(self, record: logging.LogRecord) -> bool:
+        return "validate type failed" not in record.getMessage()
+
+
+logging.getLogger("agentrun-logger").addFilter(_DropSdkValidationWarnings())
+
+
 class AliasGroup(click.Group):
     """Click Group that supports hidden command aliases."""
 
@@ -95,9 +114,13 @@ def cli(ctx: click.Context, profile, region, output, debug):
     ctx.obj["output"] = output
 
     if debug:
-        import logging
-
         logging.basicConfig(level=logging.DEBUG)
+        # In debug mode users want to see the SDK's validation warnings, so
+        # strip the filter we installed at import time.
+        sdk_logger = logging.getLogger("agentrun-logger")
+        for f in list(sdk_logger.filters):
+            if isinstance(f, _DropSdkValidationWarnings):
+                sdk_logger.removeFilter(f)
 
 
 # Register sub-command groups
diff --git a/tests/integration/test_runtime_cmd.py b/tests/integration/test_runtime_cmd.py
@@ -174,6 +174,10 @@ def _refresh(self=None, *a, **k):
     assert out[0]["action"] == "create"
     assert out[0]["runtime"]["name"] == "my-agent"
     fake_runtime_cls.create.assert_called_once()
+    # --no-wait must not touch endpoints — the backend rejects endpoint
+    # create while the runtime is CREATING/UPDATING.
+    created.create_endpoint.assert_not_called()
+    assert out[0]["endpoints"] == []
 
 
 def test_apply_update_path(monkeypatch):
@@ -205,6 +209,8 @@ def test_apply_update_path(monkeypatch):
     assert result.exit_code == 0, result.output
     out = json.loads(result.output)
     assert out[0]["action"] == "update"
+    # Default --wait path reconciles endpoints after runtime reaches READY.
+    existing.create_endpoint.assert_called_once()
 
 
 def test_apply_runtime_failed_exits_5(monkeypatch):
diff --git a/tests/unit/test_runtime_render.py b/tests/unit/test_runtime_render.py
@@ -53,6 +53,40 @@ def test_create_input_injects_system_tag_and_container_artifact():
     assert inp.container_configuration.image == "img:v1"
     # code_configuration must not be set
     assert inp.code_configuration is None
+    # Defaults injected — backend rejects nulls for these three fields.
+    assert inp.cpu == 2.0
+    assert inp.memory == 4096
+    assert inp.port == 9000
+
+
+def test_create_input_user_values_override_defaults():
+    p = ParsedAgentRuntime(
+        name="my-agent",
+        container=ParsedContainer(image="img:v1"),
+        cpu=4,
+        memory=16384,
+        port=8080,
+    )
+    inp = to_runtime_create_input(p)
+    assert inp.cpu == 4
+    assert inp.memory == 16384
+    assert inp.port == 8080
+
+
+def test_create_input_container_port_wins_over_spec_port():
+    p = ParsedAgentRuntime(
+        name="my-agent",
+        container=ParsedContainer(image="img:v1", port=7777),
+        port=9000,
+    )
+    assert to_runtime_create_input(p).port == 7777
+
+
+def test_update_input_applies_same_defaults():
+    upd = to_runtime_update_input(_minimal_parsed())
+    assert upd.cpu == 2.0
+    assert upd.memory == 4096
+    assert upd.port == 9000
 
 
 def test_endpoints_none_injects_default():