Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -501,6 +501,9 @@ agent.tool.gr00t_inference(action="stop", port=8000)
| `ZENOH_CONNECT` | Comma-separated list of remote Zenoh endpoints to connect to | - |
| `ZENOH_LISTEN` | Comma-separated list of endpoints for the local Zenoh listener | - |
| `STRANDS_MESH_AUDIT_DIR` | Directory for the safety audit log (`mesh_audit.jsonl`) | `~/.strands_robots/` |
| `STRANDS_MESH_CA_PINS` | Comma-separated SHA-256 hex pins, **additive** to the bundled Amazon Root CA1 pin tuple. Operator break-glass for an AWS-side root rotation that arrives before the next `strands-robots` release ships the new pin. Must match `^[0-9a-fA-F]{64}$` per entry. | unset |
| `STRANDS_MESH_DISABLE_CA_PIN` | Set to `true` (case-insensitive) to skip CA pin verification on the *download* path only. The on-disk re-use path always raw-checks the pin regardless. Last-resort break-glass; prefer `STRANDS_MESH_CA_PINS` for rotations. | `false` |
| `STRANDS_MESH_CAMERA_PRESIGN_TTL` | Default TTL (seconds) for S3 presigned URLs emitted on the IoT camera-offload path. Capped at 3600s (1h); a `0` is clamped to 1s. Non-integer values fall back to the default with a WARNING. | `60` |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc-code drift on the negative/zero clamp behaviour.

This row says 'a 0 is clamped to 1s' -- accurate for both kwarg and env-var paths. But the R7 fix in camera_offload.py:118-124 introduced an asymmetric WARNING posture: env-var path of 0 (or any < 1) emits a WARNING, while explicit presign_ttl=0 kwarg silently clamps to 1 (preserving the R1 sentinel for the deliberate-zero case).

The negative-value behaviour is similarly split: env =-99 -> WARN+clamp; kwarg =-99 -> silent clamp. R8 deferred this asymmetry to #259, which is fine, but the README is the operator-facing source of truth -- an operator who sees =-99 typo'd into the env file gets a WARNING and finds the row here. They'll be surprised that passing presign_ttl=-99 from a config-loader sees no warning.

Suggest tightening this row to: Capped at 3600s (1h); env-var values < 1 (including non-integer) fall back to 1s with a WARNING; explicit kwarg of 0 is silently clamped to 1s (deliberate sentinel for 'always-expired'). Negative kwarg values clamp silently -- tracked in #259.

Non-blocking; tracked. Flagging because the README is shipped to operators and the asymmetry is exactly the kind of surprise that wastes 20 minutes of debugging.

| `GROOT_API_TOKEN` | API token for GR00T inference service | - |
| `STRANDS_ROBOT_MODE` | Override `Robot()` factory mode detection (`sim`, `real`, `auto`) | `auto` |
| `STRANDS_TRUST_REMOTE_CODE` | Set to `1` to opt into HuggingFace `trust_remote_code` for `lerobot_local` policies | unset |
Expand Down
10 changes: 5 additions & 5 deletions strands_robots/mesh/iot/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,10 @@

This subpackage owns the cloud-side concerns of the mesh:

- :mod:`provision` — single-Thing bootstrap (cert + policy + Thing).
- :mod:`bootstrap` — account-wide bootstrap (Rules + Lambda +
- :mod:`provision` — single-Thing bootstrap (cert + policy + Thing).
- :mod:`bootstrap` — account-wide bootstrap (Rules + Lambda +
DynamoDB audit + Fleet Provisioning template).
- :mod:`shadow` — Device Shadow named-shadow mirror of presence.
- :mod:`shadow` — Device Shadow named-shadow mirror of presence.
- :mod:`camera_offload` — S3-backed camera frame offload.

The wire-level transport (:class:`IotMqttTransport`) lives in
Expand All @@ -20,14 +20,14 @@
from strands_robots.mesh.iot import bootstrap_account, provision_robot

# 1. Once per AWS account/region: spin up Rules / Lambda / DynamoDB /
# Fleet Provisioning template / etc.
# Fleet Provisioning template / etc.
bootstrap_account()

# 2. Per robot: issue a cert + attach the strands-robot policy.
p = provision_robot("so100-arm-01")

# 3. Run the robot under the iot transport.
# (export the env vars from p.env_vars() and `Robot()` Just Works.)
# (export the env vars from p.env_vars() and `Robot()` Just Works.)

For a single-robot dev setup, step 1 is optional — without it E-stop
fan-out and DynamoDB audit just don't activate; everything else still
Expand Down
84 changes: 72 additions & 12 deletions strands_robots/mesh/iot/camera_offload.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,23 @@
``STRANDS_MESH_CAMERA_S3_PREFIX``
Optional prefix inside the bucket (defaults to ``""``).
``STRANDS_MESH_CAMERA_PRESIGN_TTL``
Seconds the presigned GET URL stays valid (default 3600).
Seconds the presigned GET URL stays valid. Defaults to
:data:`DEFAULT_PRESIGN_TTL_SECONDS` (60s); clamped at
:data:`MAX_PRESIGN_TTL_SECONDS` (1 hour) to prevent accidental
day- or week-long URLs. Pass ``presign_ttl=N`` to override
explicitly; the env-var fallback only applies when the kwarg is
``None``.

Bucket-ownership threat model
-----------------------------
The S3 PutObject path in :meth:`CameraOffloader._upload_frame` does
not pass an ``ACL=`` kwarg. The contract for the offload bucket is
that the operator configures it with object-ownership control
``BucketOwnerEnforced`` (and a bucket policy that denies public
ACLs); that enforcement is out of scope for this library because
deployments differ on whether the bucket is shared with non-mesh
producers. A future code-side ``ACL="private"`` + ``ChecksumAlgorithm``
hardening is tracked in #249.
"""

from __future__ import annotations
Expand All @@ -50,7 +66,13 @@
logger = logging.getLogger(__name__)


DEFAULT_PRESIGN_TTL_SECONDS = 3600
# Lifetime of the presigned GET URL we hand out for each camera frame.
# Kept deliberately short -- anyone who briefly captures a /ref MQTT message
# can use the URL inside this window. ``STRANDS_MESH_CAMERA_PRESIGN_TTL``
# overrides for higher-latency consumers, clamped to MAX_PRESIGN_TTL_SECONDS
# (1 hour) to prevent accidental day- or week-long URLs.
DEFAULT_PRESIGN_TTL_SECONDS = 60
MAX_PRESIGN_TTL_SECONDS = 3600


class CameraOffloader:
Expand All @@ -70,9 +92,37 @@ def __init__(
) -> None:
self.bucket = bucket or os.getenv("STRANDS_MESH_CAMERA_S3_BUCKET", "")
self.prefix = (prefix or os.getenv("STRANDS_MESH_CAMERA_S3_PREFIX") or "").strip("/")
self.presign_ttl = presign_ttl or int(
os.getenv("STRANDS_MESH_CAMERA_PRESIGN_TTL", str(DEFAULT_PRESIGN_TTL_SECONDS))
)
if presign_ttl is not None:
ttl_raw = presign_ttl
else:
raw_env = os.getenv("STRANDS_MESH_CAMERA_PRESIGN_TTL")
if raw_env is None or raw_env == "":
ttl_raw = DEFAULT_PRESIGN_TTL_SECONDS
else:
try:
Comment thread
cagataycali marked this conversation as resolved.
ttl_raw = int(raw_env)
except ValueError:
logger.warning(
"[camera_offload] STRANDS_MESH_CAMERA_PRESIGN_TTL=%r is not an integer; using default %ds",
raw_env,
DEFAULT_PRESIGN_TTL_SECONDS,
)
ttl_raw = DEFAULT_PRESIGN_TTL_SECONDS
if ttl_raw > MAX_PRESIGN_TTL_SECONDS:
Comment thread
cagataycali marked this conversation as resolved.
logger.warning(
"[camera_offload] STRANDS_MESH_CAMERA_PRESIGN_TTL=%d > %d cap; clamping",
ttl_raw,
MAX_PRESIGN_TTL_SECONDS,
)
ttl_raw = MAX_PRESIGN_TTL_SECONDS
if ttl_raw < 1:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Negative-clamp warning is asymmetric: env-var triggers WARNING, kwarg-supplied negative is silent.

The R7 fix made the env-var path log a WARNING when STRANDS_MESH_CAMERA_PRESIGN_TTL=-99 resolves to < 1 (lines 119-123, gated on presign_ttl is None). The intent quoted in the PR description is "explicit presign_ttl=0 kwarg still clamps silently (caller is deliberate)".

But presign_ttl=0 and presign_ttl=-99 are NOT both "deliberate caller values". 0 is a documented edge that the R1 fix specifically pinned. -99 is unambiguously a bug at the call site (no caller deliberately wants a negative TTL clamped to 1). Treating both as "caller is deliberate" misses the intent.

Two options:

  1. WARN on any sub-1 value regardless of source (drop the if presign_ttl is None: gate).
  2. WARN on any sub-1 value except exactly 0 (which tests/mesh/test_presign_ttl_none_vs_zero.py documents as the kwarg-vs-env-precedence sentinel).

Option 2 preserves the R1 invariant. Either is better than the current asymmetry; the current code lets CameraOffloader(bucket='b', presign_ttl=-99) succeed silently, which is the behaviour the env-var WARNING was added to fix on the env-var side.

if presign_ttl is None:
logger.warning(
"[camera_offload] STRANDS_MESH_CAMERA_PRESIGN_TTL=%d < 1 floor; clamping to 1s",
ttl_raw,
)
ttl_raw = 1
self.presign_ttl = ttl_raw
self.region = region or os.getenv("AWS_REGION", os.getenv("AWS_DEFAULT_REGION"))
self._s3: Any | None = None

Expand Down Expand Up @@ -120,7 +170,7 @@ def upload_frame(self, peer_id: str, cam_name: str, jpeg_bytes: bytes, ts: float
Body=jpeg_bytes,
ContentType="image/jpeg",
)
except Exception as exc:
except Exception as exc: # noqa: BLE001 -- boto3 raises ClientError, EndpointConnectionError, NoCredentialsError, etc.; offload is best-effort
Comment thread
cagataycali marked this conversation as resolved.
logger.debug("[camera_offload] put_object %s failed: %s", key, exc)
return None

Expand All @@ -130,7 +180,7 @@ def upload_frame(self, peer_id: str, cam_name: str, jpeg_bytes: bytes, ts: float
Params={"Bucket": self.bucket, "Key": key},
ExpiresIn=self.presign_ttl,
)
except Exception as exc:
except Exception as exc: # noqa: BLE001 -- boto3 ClientError / NoCredentialsError; presign is best-effort
logger.debug("[camera_offload] presign %s failed: %s", key, exc)
url = None

Expand Down Expand Up @@ -181,7 +231,7 @@ def _publish_cameras_once_with_offload() -> None:
# call it to preserve any user customisation that might have been added).
try:
original()
except Exception as exc:
except Exception as exc: # noqa: BLE001 -- original is user-customised; offload must not block on user code
logger.debug("[camera_offload] original _publish_cameras_once raised: %s", exc)

# Now do the S3 offload + ref publish per camera.
Expand All @@ -195,13 +245,14 @@ def _publish_cameras_once_with_offload() -> None:

try:
obs = inner.get_observation()
except Exception:
except Exception as exc: # noqa: BLE001 -- LeRobot get_observation() may raise hardware-specific errors
logger.debug("[camera_offload] get_observation failed: %s", exc)
return

try:
import cv2
except Exception:
logger.debug("[camera_offload] cv2 unavailable skipping S3 upload")
except ImportError:
logger.debug("[camera_offload] cv2 unavailable -- skipping S3 upload")
return

transport = current_transport()
Expand Down Expand Up @@ -231,8 +282,17 @@ def _publish_cameras_once_with_offload() -> None:
if ref is None:
continue
ref["shape"] = list(shape)
# Publish the /ref topic via the transport layer.
# On ``iot`` the IoT Policy's AllowOwnTopics statement
# bounds writes to ``strands/<ThingName>/*`` (covers
# ``camera/*/ref`` via the trailing wildcard); on
# ``bridge`` the Zenoh ACL adds a LAN-side gate on top.
# ``enable_for_mesh`` early-returns unless the active
# backend is one of those two, so the publish reaches
# the wire only when at least one of these gates is in
# force.
transport.put(f"strands/{mesh.peer_id}/camera/{cam_name}/ref", ref)
except Exception as exc:
except Exception as exc: # noqa: BLE001 -- numpy / cv2 / transport.put can raise diverse errors per frame; offload is best-effort
logger.debug(
"[camera_offload] %s/%s offload failed: %s",
mesh.peer_id,
Expand Down
Loading
Loading