Skip to content

Commit cf53871

Browse files
committed
fix: address self-review findings
- Wire kRobotResetFailed across ResetRobotStateAction failure branches (goal rejection + non-SUCCEEDED result for reset, unload-traj and load-traj steps). - Instrument Isaac TF transform timeouts with kIsaacFoundationPoseFailed to match the sibling no-message and no-valid-detection timeouts. - Replace silent FAILURE in GripperTrajAction::onRunning !goal_sent_ with BT::RuntimeError, matching the programmer-error policy. - Add reportFaultPassed on onHalted for WaitForInputAction and WaitForObjectAction so abandoned waits do not leave kSignalWaitInputTimeout / kObjectWaitTimeout lingering in FaultManager. - Drop orphan codes kPlannerMissingMoveId (programmer-error path uses throw) and kObjectExistsCheckFailed (CheckObjectExists is intentionally not instrumented; FAILURE is normal control flow there). - Pin medkit apt packages to MEDKIT_VERSION=0.4.0-1 so an upstream bump does not silently change CI behaviour. - Drop humble from the CI matrix until ros-humble-ros2-medkit-* debs appear on packages.ros.org (rosdistro entry exists; debs do not yet). - Update docs/FAULT_CODES.md: add kRobotResetFailed, expand the Isaac FoundationPose entry to cover TF timeouts, and document the reportFaultPassed-on-halt pairing for both wait codes.
1 parent ebb776d commit cf53871

7 files changed

Lines changed: 75 additions & 13 deletions

File tree

.github/workflows/ci.yml

Lines changed: 13 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,12 @@ jobs:
1313
strategy:
1414
fail-fast: false
1515
matrix:
16-
ros_distro: [humble, jazzy]
16+
# Humble dropped from the matrix because ros-humble-ros2-medkit-*
17+
# debs are not yet available on packages.ros.org despite the
18+
# rosdistro entry. The fork targets the medkit integration path,
19+
# so a CI build without those debs is not meaningful. Re-add when
20+
# Humble debs land.
21+
ros_distro: [jazzy]
1722

1823
container:
1924
image: ros:${{ matrix.ros_distro }}-ros-base
@@ -26,6 +31,9 @@ jobs:
2631
CCACHE_BASEDIR: /tmp/ros_ws
2732
CCACHE_COMPRESS: "1"
2833
CCACHE_MAXSIZE: 5G
34+
# Pin medkit to the rosdistro-released version so an upstream bump
35+
# does not silently break this CI. Update deliberately when bumping.
36+
MEDKIT_VERSION: "0.4.0-1"
2937

3038
steps:
3139
- name: Checkout repository
@@ -63,10 +71,10 @@ jobs:
6371
ccache \
6472
ros-$ROS_DISTRO-topic-based-ros2-control \
6573
python3-colcon-common-extensions python3-rosdep git \
66-
ros-$ROS_DISTRO-ros2-medkit-fault-reporter \
67-
ros-$ROS_DISTRO-ros2-medkit-msgs \
68-
ros-$ROS_DISTRO-ros2-medkit-fault-manager \
69-
ros-$ROS_DISTRO-ros2-medkit-cmake \
74+
ros-$ROS_DISTRO-ros2-medkit-fault-reporter=$MEDKIT_VERSION* \
75+
ros-$ROS_DISTRO-ros2-medkit-msgs=$MEDKIT_VERSION* \
76+
ros-$ROS_DISTRO-ros2-medkit-fault-manager=$MEDKIT_VERSION* \
77+
ros-$ROS_DISTRO-ros2-medkit-cmake=$MEDKIT_VERSION* \
7078
ros-$ROS_DISTRO-moveit \
7179
ros-$ROS_DISTRO-ur \
7280
ros-$ROS_DISTRO-ur-moveit-config \

docs/FAULT_CODES.md

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,7 @@ node sources; the source-of-truth string constants live in
3535
| `MANYMOVE_OBJECT_REMOVE_FAILED` | ERROR | `RemoveCollisionObjectAction` action result `success=false`. | - |
3636
| `MANYMOVE_OBJECT_ATTACH_FAILED` | ERROR | `AttachDetachObjectAction` action result `success=false`. | - |
3737
| `MANYMOVE_OBJECT_GET_POSE_FAILED` | ERROR | `GetObjectPoseAction` action result `success=false`. | - |
38-
| `MANYMOVE_OBJECT_WAIT_TIMEOUT` | WARN | `WaitForObjectAction` elapsed without observing the expected presence/absence. | `reportFaultPassed` when the wait condition is met. |
38+
| `MANYMOVE_OBJECT_WAIT_TIMEOUT` | WARN | `WaitForObjectAction` elapsed without observing the expected presence/absence. | `reportFaultPassed` when the wait condition is met or on `onHalted` (subtree abandoned the wait). |
3939

4040
`CheckObjectExistsAction` returning FAILURE for "object missing" is intentional
4141
control flow used by callers as a condition; it is not instrumented.
@@ -46,8 +46,9 @@ control flow used by callers as a condition; it is not instrumented.
4646
|------|----------|---------|------|
4747
| `MANYMOVE_SIGNAL_SET_OUTPUT_FAILED` | ERROR | `SetOutputAction` action result reports failure. | - |
4848
| `MANYMOVE_SIGNAL_GET_INPUT_FAILED` | ERROR | `GetInputAction` action result reports failure, or `WaitForInputAction` cannot connect to the action server within 5s. | - |
49-
| `MANYMOVE_SIGNAL_WAIT_INPUT_TIMEOUT` | WARN | `WaitForInputAction` timeout elapsed without observing the desired value. | `reportFaultPassed` on success. |
49+
| `MANYMOVE_SIGNAL_WAIT_INPUT_TIMEOUT` | WARN | `WaitForInputAction` timeout elapsed without observing the desired value. | `reportFaultPassed` on success or on `onHalted` (subtree abandoned the wait). |
5050
| `MANYMOVE_ROBOT_NOT_READY` | CRITICAL | `CheckRobotStateAction` reports `ready=false`. | `reportFaultPassed` once the robot reports `ready=true`. |
51+
| `MANYMOVE_ROBOT_RESET_FAILED` | ERROR | `ResetRobotStateAction` fails any of its three steps (unload trajectory controller, reset robot state, load trajectory controller); raised on goal rejection or non-SUCCEEDED result. | - |
5152

5253
## Logic / TF (`action_nodes_logic.cpp`)
5354

@@ -67,7 +68,7 @@ control flow used by callers as a condition; it is not instrumented.
6768

6869
| Code | Severity | Trigger | Pair |
6970
|------|----------|---------|------|
70-
| `MANYMOVE_ISAAC_FOUNDATION_POSE_FAILED` | ERROR | `FoundationPoseAlignmentNode` timed out waiting for detections, or no detection passed the `target_id`/`min_score` filter within the configured timeout. | - |
71+
| `MANYMOVE_ISAAC_FOUNDATION_POSE_FAILED` | ERROR | `FoundationPoseAlignmentNode` timed out waiting for detections, no detection passed the `target_id`/`min_score` filter within the configured timeout, or TF transform of the detection pose to the alignment / planning frame timed out. | - |
7172

7273
`GetEntityPoseNode`/`SetEntityPoseNode` service errors and Isaac TF timeouts
7374
are not yet instrumented; they will gain dedicated codes when the

manymove_cpp_trees/include/manymove_cpp_trees/fault_codes.hpp

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -51,8 +51,6 @@ inline constexpr char kPlannerRetryAttempt[] = "MANYMOVE_PLANNER_RETRY_ATTEMPT";
5151
inline constexpr char kPlannerRetriesExhausted[] = "MANYMOVE_PLANNER_RETRIES_EXHAUSTED";
5252
// External e-stop / stop_execution flag triggered while motion was running.
5353
inline constexpr char kPlannerEstopTriggered[] = "MANYMOVE_PLANNER_ESTOP_TRIGGERED";
54-
// Required move_id input missing on the BT node.
55-
inline constexpr char kPlannerMissingMoveId[] = "MANYMOVE_PLANNER_MISSING_MOVE_ID";
5654

5755
// -- Object manager ---------------------------------------------------------
5856
// AddCollisionObject action returned failure (planning scene reject).
@@ -65,8 +63,6 @@ inline constexpr char kObjectAttachFailed[] = "MANYMOVE_OBJECT_ATTACH_FAILED";
6563
inline constexpr char kObjectGetPoseFailed[] = "MANYMOVE_OBJECT_GET_POSE_FAILED";
6664
// WaitForObject elapsed without the expected object presence/absence.
6765
inline constexpr char kObjectWaitTimeout[] = "MANYMOVE_OBJECT_WAIT_TIMEOUT";
68-
// CheckObjectExists action errored out (different from a clean negative).
69-
inline constexpr char kObjectExistsCheckFailed[] = "MANYMOVE_OBJECT_EXISTS_CHECK_FAILED";
7066

7167
// -- Signals (gripper, IO, robot state) -------------------------------------
7268
// SetOutputAction returned a failed result from the action server.

manymove_cpp_trees/src/action_nodes_gripper.cpp

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -284,7 +284,10 @@ BT::NodeStatus GripperTrajAction::onStart()
284284
BT::NodeStatus GripperTrajAction::onRunning()
285285
{
286286
if (!goal_sent_) {
287-
return BT::NodeStatus::FAILURE;
287+
// onRunning reached before onStart actually dispatched a goal: BT
288+
// framework misuse, not an operational fault (per the
289+
// programmer-error-throws / operational-fault-reports policy).
290+
throw BT::RuntimeError("GripperTrajAction::onRunning called before goal was sent");
288291
}
289292

290293
if (!result_received_) {

manymove_cpp_trees/src/action_nodes_isaac.cpp

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -608,6 +608,10 @@ BT::NodeStatus FoundationPoseAlignmentNode::onRunning()
608608
RCLCPP_ERROR(
609609
node_->get_logger(), "[%s] Timed out waiting for TF transform to '%s'", name().c_str(),
610610
alignment_frame.c_str());
611+
reportFault(
612+
fault_codes::kIsaacFoundationPoseFailed, kSeverityError,
613+
"FoundationPose: TF transform to '" + alignment_frame +
614+
"' timed out after " + std::to_string(timeout_seconds_) + "s");
611615
return BT::NodeStatus::FAILURE;
612616
}
613617
}
@@ -646,6 +650,10 @@ BT::NodeStatus FoundationPoseAlignmentNode::onRunning()
646650
RCLCPP_ERROR(
647651
node_->get_logger(), "[%s] Timed out waiting for TF transform to '%s'", name().c_str(),
648652
planning_frame_.c_str());
653+
reportFault(
654+
fault_codes::kIsaacFoundationPoseFailed, kSeverityError,
655+
"FoundationPose: TF transform to planning frame '" + planning_frame_ +
656+
"' timed out after " + std::to_string(timeout_seconds_) + "s");
649657
return BT::NodeStatus::FAILURE;
650658
}
651659
}

manymove_cpp_trees/src/action_nodes_objects.cpp

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1119,6 +1119,11 @@ void WaitForObjectAction::onHalted()
11191119
RCLCPP_INFO(node_->get_logger(), "%s: Goal canceled.", kName);
11201120
}
11211121

1122+
// Heal a previously raised wait-timeout: when the subtree is halted
1123+
// mid-wait, the timeout condition no longer holds. Without this,
1124+
// any earlier `kObjectWaitTimeout` would linger in FaultManager.
1125+
reportFaultPassed(fault_codes::kObjectWaitTimeout);
1126+
11221127
goal_sent_ = false;
11231128
result_received_ = false;
11241129
}

manymove_cpp_trees/src/action_nodes_signals.cpp

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -657,6 +657,10 @@ void ResetRobotStateAction::goalResponseCallback(
657657
name().c_str());
658658
action_result_.success = false;
659659
result_received_ = true;
660+
reportFault(
661+
fault_codes::kRobotResetFailed, kSeverityError,
662+
std::string("ResetRobotStateAction [") + name() +
663+
"]: reset_robot_state goal rejected by server");
660664
} else {
661665
RCLCPP_INFO(
662666
node_->get_logger(), "ResetRobotStateAction [%s]: ResetRobotState Goal ACCEPTED by server.",
@@ -672,6 +676,10 @@ void ResetRobotStateAction::resultCallback(
672676
} else {
673677
action_result_.success = false;
674678
action_result_.message = "ResetRobotState aborted or failed";
679+
reportFault(
680+
fault_codes::kRobotResetFailed, kSeverityError,
681+
std::string("ResetRobotStateAction [") + name() +
682+
"]: reset_robot_state action aborted or failed");
675683
}
676684
result_received_ = true;
677685
}
@@ -684,6 +692,10 @@ void ResetRobotStateAction::goalResponseCallbackUnloadTraj(
684692
node_->get_logger(),
685693
"ResetRobotStateAction [%s]: UnloadTrajController Goal REJECTED by server.", name().c_str());
686694
unload_traj_success_ = false;
695+
reportFault(
696+
fault_codes::kRobotResetFailed, kSeverityError,
697+
std::string("ResetRobotStateAction [") + name() +
698+
"]: unload_trajectory_controller goal rejected by server");
687699
} else {
688700
RCLCPP_INFO(
689701
node_->get_logger(),
@@ -696,8 +708,18 @@ void ResetRobotStateAction::resultCallbackUnloadTraj(
696708
{
697709
if (wrapped_result.code == rclcpp_action::ResultCode::SUCCEEDED) {
698710
unload_traj_success_ = wrapped_result.result->success;
711+
if (!unload_traj_success_) {
712+
reportFault(
713+
fault_codes::kRobotResetFailed, kSeverityError,
714+
std::string("ResetRobotStateAction [") + name() +
715+
"]: unload_trajectory_controller reported success=false");
716+
}
699717
} else {
700718
unload_traj_success_ = false;
719+
reportFault(
720+
fault_codes::kRobotResetFailed, kSeverityError,
721+
std::string("ResetRobotStateAction [") + name() +
722+
"]: unload_trajectory_controller action aborted or failed");
701723
}
702724
}
703725

@@ -709,6 +731,10 @@ void ResetRobotStateAction::goalResponseCallbackLoadTraj(
709731
node_->get_logger(),
710732
"ResetRobotStateAction [%s]: LoadTrajController Goal REJECTED by server.", name().c_str());
711733
load_traj_success_ = false;
734+
reportFault(
735+
fault_codes::kRobotResetFailed, kSeverityError,
736+
std::string("ResetRobotStateAction [") + name() +
737+
"]: load_trajectory_controller goal rejected by server");
712738
} else {
713739
RCLCPP_INFO(
714740
node_->get_logger(),
@@ -721,8 +747,18 @@ void ResetRobotStateAction::resultCallbackLoadTraj(
721747
{
722748
if (wrapped_result.code == rclcpp_action::ResultCode::SUCCEEDED) {
723749
load_traj_success_ = wrapped_result.result->success;
750+
if (!load_traj_success_) {
751+
reportFault(
752+
fault_codes::kRobotResetFailed, kSeverityError,
753+
std::string("ResetRobotStateAction [") + name() +
754+
"]: load_trajectory_controller reported success=false");
755+
}
724756
} else {
725757
load_traj_success_ = false;
758+
reportFault(
759+
fault_codes::kRobotResetFailed, kSeverityError,
760+
std::string("ResetRobotStateAction [") + name() +
761+
"]: load_trajectory_controller action aborted or failed");
726762
}
727763
}
728764

@@ -877,6 +913,11 @@ void WaitForInputAction::onHalted()
877913
if (goal_sent_ && !result_received_) {
878914
action_client_->async_cancel_all_goals();
879915
}
916+
// Heal a previously raised wait-timeout: when the subtree is halted
917+
// mid-wait, the timeout condition no longer holds. Without this,
918+
// any earlier `kSignalWaitInputTimeout` would linger in FaultManager
919+
// until the next successful wait.
920+
reportFaultPassed(fault_codes::kSignalWaitInputTimeout);
880921
goal_sent_ = false;
881922
result_received_ = false;
882923
}

0 commit comments

Comments
 (0)