feat(gateway): accurate status read for apps and components #455

Merged: bburda merged 11 commits into main from feat/454-accurate-status-read on Jun 22, 2026

@bburda bburda commented Jun 21, 2026

Collaborator

Summary

Make GET /{apps|components}/{id}/status report ready / notReady from real signals instead of
coarse ones. Read-only; the PUT lifecycle transitions stay 501 (actuation is tracked separately).

  • App status: for a managed lifecycle node (one exposing lifecycle_msgs get_state and
    change_state), the status is read from the actual lifecycle state via a GetState service call
    (active is ready; any other state, or an unreachable/timed-out read, is notReady). This
    subsumes liveness, since a dead node's service is gone. Plain (unmanaged) nodes keep using
    App::is_online (presence in the ROS 2 graph), which is the best signal available for an
    unmanaged node.
  • Component status: the synthetic host component (carrying host_metadata) is ready while the
    gateway is serving the request. Any other local component derives readiness from its hosted apps:
    it is notReady when it hosts apps and every one of them is offline (the subsystem is down), and
    ready otherwise, including when it hosts no apps. A single offline hosted app shows up as that
    app's own notReady and is not enough to mark the component down. Remote components are unchanged (reached
    through aggregation forwarding).
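
The two derivation rules above can be sketched as a pair of pure functions. This is a minimal illustration, not the gateway's code: `Status`, `AppView`, `ComponentView`, `app_status`, and `component_status` are hypothetical names introduced here, and the only value taken from ROS 2 is the primary state id 3, which is `lifecycle_msgs::msg::State::PRIMARY_STATE_ACTIVE`.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical types for illustration; the real gateway types are not shown here.
enum class Status { Ready, NotReady };

// Mirrors lifecycle_msgs::msg::State::PRIMARY_STATE_ACTIVE.
constexpr std::uint8_t kPrimaryStateActive = 3;

struct AppView {
  bool is_online;                               // presence in the ROS 2 graph
  bool is_managed;                              // exposes both get_state and change_state
  std::optional<std::uint8_t> lifecycle_state;  // nullopt: read failed or timed out
};

// Managed nodes: only "active" is ready. Unmanaged nodes: fall back to is_online.
Status app_status(const AppView& app) {
  if (!app.is_managed) {
    return app.is_online ? Status::Ready : Status::NotReady;
  }
  const bool active =
      app.lifecycle_state.has_value() && *app.lifecycle_state == kPrimaryStateActive;
  return active ? Status::Ready : Status::NotReady;
}

struct ComponentView {
  bool is_host;                      // the synthetic host component
  std::vector<AppView> hosted_apps;  // apps hosted by this local component
};

// Host component and zero-app components are ready; otherwise the component is
// notReady only when every hosted app is offline.
Status component_status(const ComponentView& component) {
  if (component.is_host || component.hosted_apps.empty()) {
    return Status::Ready;
  }
  const bool any_online =
      std::any_of(component.hosted_apps.begin(), component.hosted_apps.end(),
                  [](const AppView& app) { return app.is_online; });
  return any_online ? Status::Ready : Status::NotReady;
}
```

Note that in this sketch an online managed node in the inactive state maps to notReady, which is the case that `is_online` alone cannot distinguish.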

The lifecycle read goes through a small injected LifecycleStateReader seam: a pure interface in
gateway_core (keeps the handler unit-testable with a stub and preserves core purity) plus a
gateway_ros2 implementation that calls GetState on a private node and private single-threaded
executor spun inline (mirrors the existing ros2_fault_service_transport pattern, so it never
blocks or races the gateway executor). A managed_lifecycle rclcpp_lifecycle demo node and an
integration test prove the path end to end.

The third LifecycleHandlers constructor parameter is defaulted, so existing call sites and
non-lifecycle apps are unchanged. No API break.
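
A rough sketch of how such a seam and a defaulted constructor parameter can fit together is shown below. Only the name `LifecycleStateReader` comes from this PR; the method name `read_state`, the stub class, and the `StatusHandler` class with its single-parameter constructor are assumptions made for illustration and do not reproduce the real `LifecycleHandlers` signature.

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Pure interface: no ROS 2 headers, so core code and unit tests can depend on it.
// The method name and signature are assumptions for this sketch.
class LifecycleStateReader {
 public:
  virtual ~LifecycleStateReader() = default;
  // Returns the primary state id, or nullopt when the service is unreachable
  // or the read times out.
  virtual std::optional<std::uint8_t> read_state(const std::string& get_state_service) = 0;
};

// Test stub: returns a canned state and counts how often it was asked.
class StubLifecycleStateReader : public LifecycleStateReader {
 public:
  explicit StubLifecycleStateReader(std::optional<std::uint8_t> state) : state_(state) {}
  std::optional<std::uint8_t> read_state(const std::string&) override {
    ++calls_;
    return state_;
  }
  int calls() const { return calls_; }

 private:
  std::optional<std::uint8_t> state_;
  int calls_ = 0;
};

// Hypothetical handler: the reader parameter is defaulted, so call sites that
// pass nothing keep the is_online behaviour.
class StatusHandler {
 public:
  explicit StatusHandler(std::shared_ptr<LifecycleStateReader> reader = nullptr)
      : reader_(std::move(reader)) {}

  // An empty get_state_service stands for "not a managed lifecycle node".
  std::string app_status(bool is_online, const std::string& get_state_service) const {
    if (!reader_ || get_state_service.empty()) {
      return is_online ? "ready" : "notReady";
    }
    const auto state = reader_->read_state(get_state_service);
    // 3 is lifecycle_msgs::msg::State::PRIMARY_STATE_ACTIVE.
    return (state.has_value() && *state == 3) ? "ready" : "notReady";
  }

 private:
  std::shared_ptr<LifecycleStateReader> reader_;
};
```

With this shape, a unit test can inject the stub to exercise the managed-node branch, while production wiring would inject an implementation backed by a real GetState client.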

Issue

Type

  • New feature or tests

Testing

Unit (GTest):

  • Pure helpers: active to ready, other state or nullopt to notReady; the lifecycle-service
    detector requires both get_state and change_state.
  • Ros2LifecycleStateReader against an in-process GetState service: reads active / inactive,
    returns nullopt for an unreachable service.
  • Handler: a managed node whose state is inactive reports notReady even though it is online
    (the discriminator); an active managed node reports ready; a plain node still uses is_online.
    Component reads return ready for the host component and a zero-app component, and notReady for
    a component that hosts apps which are all offline.
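
As an illustration of the detector rule exercised by the pure-helper tests (both `get_state` and `change_state` must be present), here is one possible sketch. The function name `find_lifecycle_get_state`, the helper `has_suffix`, and the assumption that an app's services are available as a list of fully qualified service names are all hypothetical; the actual helper in `lifecycle_status_helpers.cpp` may look different.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// True when `text` ends with `suffix`.
bool has_suffix(const std::string& text, const std::string& suffix) {
  return text.size() >= suffix.size() &&
         text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Returns the get_state service path when the same node prefix also exposes
// change_state; returns nullopt otherwise, so the caller falls back to is_online.
std::optional<std::string> find_lifecycle_get_state(const std::vector<std::string>& services) {
  const std::string get_suffix = "/get_state";
  for (const auto& service : services) {
    if (!has_suffix(service, get_suffix)) {
      continue;
    }
    const std::string node_prefix = service.substr(0, service.size() - get_suffix.size());
    const std::string change_state = node_prefix + "/change_state";
    if (std::find(services.begin(), services.end(), change_state) != services.end()) {
      return service;
    }
  }
  return std::nullopt;
}
```

Requiring both services guards against treating a plain node that happens to expose a service named `get_state` as a managed lifecycle node.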

Integration (launch_testing):

  • A managed_lifecycle node is launched and left unconfigured. It is present in the ROS 2 graph
    (so is_online alone would say ready), but GET /apps/managed_lifecycle/status returns
    notReady, proving the status is read from the lifecycle state via GetState. Tagged
    @verifies REQ_INTEROP_076.

clang-tidy is clean on the changed files.

Checklist

  • Breaking changes are clearly described (none - additive; the new constructor parameter is defaulted)
  • Tests were added or updated if needed
  • Docs were updated if behavior or public API changed (design doc + README)

Copilot AI review requested due to automatic review settings June 21, 2026 07:31

Copilot AI left a comment

Contributor

Pull request overview

This PR makes GET /api/v1/{apps|components}/{id}/status reflect more accurate readiness signals: lifecycle-managed apps report readiness from lifecycle_msgs/srv/GetState, and local components are always ready while the gateway is reachable (remote components still rely on aggregation forwarding).

Changes:

  • App status now uses lifecycle state when the app exposes both GetState and ChangeState services (active → ready, otherwise/timeout → notReady), falling back to is_online for unmanaged nodes.
  • Component status (no provider) is simplified: any reachable local component returns ready independent of hosted app presence.
  • Adds a LifecycleStateReader seam (core interface + ROS2 implementation), plus unit/integration tests and a new lifecycle demo node.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/ros2_medkit_integration_tests/test/features/test_lifecycle.test.py | Adds an integration test proving lifecycle-based status for an unconfigured lifecycle node. |
| src/ros2_medkit_integration_tests/ros2_medkit_test_utils/launch_helpers.py | Registers the new managed_lifecycle demo node for launch tests. |
| src/ros2_medkit_integration_tests/package.xml | Adds rclcpp_lifecycle dependency for the new demo node. |
| src/ros2_medkit_integration_tests/demo_nodes/managed_lifecycle_node.cpp | New lifecycle demo node (defaults to unconfigured) used by integration tests. |
| src/ros2_medkit_integration_tests/CMakeLists.txt | Builds/installs the managed_lifecycle demo executable and finds rclcpp_lifecycle. |
| src/ros2_medkit_gateway/include/ros2_medkit_gateway/core/status/lifecycle_state_reader.hpp | Introduces core lifecycle reader interface + helper declarations (core-pure seam). |
| src/ros2_medkit_gateway/src/core/status/lifecycle_status_helpers.cpp | Implements lifecycle-state → SOVD-status mapping and lifecycle-service detection helper. |
| src/ros2_medkit_gateway/include/ros2_medkit_gateway/ros2/status/ros2_lifecycle_state_reader.hpp | Declares ROS2 GetState-backed lifecycle reader implementation. |
| src/ros2_medkit_gateway/src/ros2/status/ros2_lifecycle_state_reader.cpp | Implements lifecycle state reads via a private node/executor and bounded timeout. |
| src/ros2_medkit_gateway/src/http/rest_server.cpp | Wires Ros2LifecycleStateReader into LifecycleHandlers at server construction. |
| src/ros2_medkit_gateway/include/ros2_medkit_gateway/core/http/handlers/lifecycle_handlers.hpp | Extends LifecycleHandlers constructor to accept an optional lifecycle reader (defaulted). |
| src/ros2_medkit_gateway/src/http/handlers/lifecycle_handlers.cpp | Updates default status logic for apps/components (lifecycle-aware apps; local components always ready). |
| src/ros2_medkit_gateway/test/test_lifecycle_status_helpers.cpp | Adds unit coverage for the helper mapping and lifecycle-service detection behavior. |
| src/ros2_medkit_gateway/test/test_ros2_lifecycle_state_reader.cpp | Adds unit coverage for Ros2LifecycleStateReader against an in-process GetState service. |
| src/ros2_medkit_gateway/test/test_lifecycle_handlers.cpp | Updates handler tests for new component semantics and adds stubbed-reader tests for lifecycle apps. |
| src/ros2_medkit_gateway/CMakeLists.txt | Adds new tests/targets and lifecycle_msgs dependency; registers lifecycle reader source in gateway_ros2. |
| src/ros2_medkit_gateway/package.xml | Adds lifecycle_msgs dependency required for GetState usage. |
| src/ros2_medkit_gateway/README.md | Updates user-facing docs for the new status derivation rules. |
| src/ros2_medkit_gateway/design/lifecycle.rst | Updates design doc to reflect lifecycle-based app status and new component readiness rule. |

Comment thread src/ros2_medkit_gateway/CMakeLists.txt
@bburda bburda self-assigned this Jun 21, 2026
@bburda bburda requested a review from mfaferek93 June 21, 2026 09:10
Comment thread src/ros2_medkit_gateway/src/http/handlers/lifecycle_handlers.cpp
Comment thread src/ros2_medkit_gateway/src/ros2/status/ros2_lifecycle_state_reader.cpp Outdated
Comment thread src/ros2_medkit_gateway/src/ros2/status/ros2_lifecycle_state_reader.cpp Outdated
Comment thread src/ros2_medkit_gateway/src/http/handlers/lifecycle_handlers.cpp Outdated
Comment thread src/ros2_medkit_gateway/design/lifecycle.rst Outdated
bburda added 2 commits June 22, 2026 14:49
- Skip the GetState read when an app is offline: an offline node cannot be
  "active", and a crashed managed node whose get_state/change_state services
  still linger in the cached App::services would otherwise force a
  wait_for_service + spin to the read timeout on every /status poll, under the
  reader's global mutex.
- Run wait_for_service outside the reader mutex (it is backed by an independent
  graph listener) so a slow or unreachable node no longer head-of-line-blocks
  concurrent status reads; keep create_client, the spin, and the client
  teardown serialized under the mutex.
- Guard an empty service path and wrap create_client in try/catch so a
  malformed cached path degrades to notReady instead of escaping onto the HTTP
  handler thread.
- Lock the reader mutex during teardown so an in-flight read cannot race
  remove_node.
- Cover the offline short-circuit (unit) and the over-the-wire active->ready
  path (integration via an auto-activated lifecycle node).
- Add the pure helper test to the coverage target list and document the read
  serialization and offline short-circuit.
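
The locking and error-handling changes listed in the commit message above can be sketched without any ROS 2 dependency by injecting the two blocking steps as callables. Everything here is hypothetical: the class name `SerializedReader`, the `WaitFn` / `CallFn` callables standing in for `wait_for_service` and the create_client + spin sequence, and the `app_online` parameter, which folds the offline short-circuit into the same function for brevity.

```cpp
#include <cstdint>
#include <exception>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

class SerializedReader {
 public:
  // Stand-in for wait_for_service: true when the service became available in time.
  using WaitFn = std::function<bool(const std::string&)>;
  // Stand-in for create_client + spin + teardown: returns the state or nullopt.
  using CallFn = std::function<std::optional<std::uint8_t>(const std::string&)>;

  SerializedReader(WaitFn wait, CallFn call) : wait_(std::move(wait)), call_(std::move(call)) {}

  std::optional<std::uint8_t> read(bool app_online, const std::string& service_path) {
    // Offline short-circuit and empty-path guard: no wait, no lock, no call.
    if (!app_online || service_path.empty()) {
      return std::nullopt;
    }
    // The wait runs outside the mutex so a slow node does not block other reads.
    if (!wait_(service_path)) {
      return std::nullopt;
    }
    // Only the client call is serialized.
    std::lock_guard<std::mutex> lock(mutex_);
    try {
      return call_(service_path);
    } catch (const std::exception&) {
      // A malformed path or failed client creation degrades to "no state".
      return std::nullopt;
    }
  }

 private:
  WaitFn wait_;
  CallFn call_;
  std::mutex mutex_;
};
```

A caller would then map a `nullopt` result to notReady, so every failure path ends in the same observable status rather than an exception on the HTTP handler thread.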
A local non-host component now derives readiness from its hosted apps: it is
notReady when it hosts apps and every one of them is offline (its subsystem is
down), and ready otherwise, including when it hosts no apps. The synthetic host
component (carrying host_metadata) stays ready while the gateway is serving the
request. This keeps a manifest-declared subsystem in the entity tree and lets
its status reflect that it is unavailable, instead of always reporting ready.

Runtime-only discovery is unaffected: it only ever produces the host component,
so the new derivation applies to manifest and hybrid components.
@bburda bburda merged commit ec38041 into main Jun 22, 2026
22 of 24 checks passed
@bburda bburda deleted the feat/454-accurate-status-read branch June 22, 2026 14:54

Development

Successfully merging this pull request may close these issues.

Accurate status read for apps and components

3 participants