Skip to content

Commit a297c0c

Browse files
committed
Make ROS 2 RobotInterface joint-state waiting safe and fresh
wait_until_update_all_joints, update_robot_state, and wait_interpolation in skrobot.interfaces.ros2.base previously had three sharp edges that already had matching fixes on the ROS 1 side: - wait_until_update_all_joints had no timeout and no diagnostic, so it waited forever for any URDF joint that was never published (e.g. gripper finger joints when only an arm controller is up). - update_robot_state did not return a usable bool, so callers could not distinguish no joint_states yet from updated successfully. - wait_interpolation returned as soon as the action result arrived, but the final joint_state_broadcaster sample lands ~one cycle later, so an immediate update_robot_state read a frame from before motion completion. This patch backports the ROS 1 base improvements: - _received_joint_names tracks names actually seen on the joint_states topic so the wait logic only waits for joints being published. - robot_state slots are pre-allocated from robot.joint_list, so late publishers (e.g. a gripper node started after controllers) get their slots ready in advance. - wait_until_update_all_joints accepts a timeout, returns bool, and populates _timeout_reason / _not_updated_joints for diagnosis. - update_robot_state returns True/False matching the ROS 1 contract. - wait_interpolation now waits for a fresh joint_state after the action result by default (wait_for_state_update=True, state_update_timeout=1.0), eliminating the post-motion stale-read race. - ROS2RobotInterfaceBase.__init__ gains opt-in wait_for_joint_states / joint_states_timeout to fail fast on misconfigured topics. 10 unit tests cover the new bookkeeping, timeout behaviour, bool returns, and the constructor wait option. A new ros2-tests job is added to .github/workflows/test.yml that installs ROS 2 humble (Ubuntu 22.04) and jazzy (Ubuntu 24.04) via ros-tooling/setup-ros and runs the new test_base.py with pytest --timeout=60 so that any future regression of the wait_until_update_all_joints hang surfaces as a per-test timeout instead of a stuck CI run.
1 parent 1c8a0cb commit a297c0c

3 files changed

Lines changed: 443 additions & 45 deletions

File tree

.github/workflows/test.yml

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -260,3 +260,38 @@ jobs:
260260
fi
261261
done
262262
263+
ros2-tests:
264+
name: Run ROS 2 Unit Tests (${{ matrix.ros-distribution }})
265+
runs-on: ${{ matrix.os }}
266+
strategy:
267+
fail-fast: false
268+
matrix:
269+
include:
270+
- ros-distribution: humble
271+
os: ubuntu-22.04
272+
- ros-distribution: jazzy
273+
os: ubuntu-24.04
274+
steps:
275+
- uses: actions/checkout@v4
276+
- name: Setup ROS 2
277+
uses: ros-tooling/setup-ros@v0.7
278+
with:
279+
required-ros-distributions: ${{ matrix.ros-distribution }}
280+
- name: Install Python dependencies
281+
shell: bash
282+
run: |
283+
source /opt/ros/${{ matrix.ros-distribution }}/setup.bash
284+
python -m pip install --upgrade pip setuptools wheel
285+
pip install pytest pytest-timeout
286+
# Install the project itself; '[all]' would pull pybullet etc., but
287+
# the ROS 2 unit tests only need the core skrobot model + URDF code.
288+
pip install -e .
289+
- name: Run ROS 2 unit tests
290+
shell: bash
291+
# --timeout=60 is intentional: regressions in wait_until_update_all_joints
292+
# surface as a per-test timeout rather than a stuck CI run.
293+
run: |
294+
source /opt/ros/${{ matrix.ros-distribution }}/setup.bash
295+
pytest -v --timeout=60 \
296+
tests/skrobot_tests/interfaces_tests/ros2/test_base.py
297+

skrobot/interfaces/ros2/base.py

Lines changed: 182 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,9 @@ def __init__(self, robot,
5454
joint_states_queue_size=1,
5555
controller_timeout=3,
5656
namespace=None,
57-
node_name='robot_interface'):
57+
node_name='robot_interface',
58+
wait_for_joint_states=False,
59+
joint_states_timeout=10.0):
5860
"""Initialization of RobotInterface
5961
6062
Parameters
@@ -73,6 +75,17 @@ def __init__(self, robot,
7375
namespace of controller
7476
node_name : string
7577
name of the ROS2 node
78+
wait_for_joint_states : bool
79+
If True, block in ``__init__`` until at least one ``JointState``
80+
message is received on ``joint_states_topic``. This makes the
81+
instance immediately usable for ``angle_vector`` / ``update_robot_state``
82+
without manual polling. Default is False to preserve backwards
83+
compatibility — set to True when you want the constructor to fail
84+
fast on misconfigured topics. Has no effect when an executor is
85+
not spinning the node externally; in that case rclpy.spin_once
86+
is used to drain pending messages.
87+
joint_states_timeout : float
88+
Timeout in seconds for the ``wait_for_joint_states`` option.
7689
7790
"""
7891
super().__init__(node_name)
@@ -85,11 +98,20 @@ def __init__(self, robot,
8598
self.joint_action_enable = True
8699
self.namespace = namespace
87100
self._joint_state_msg = None
101+
# Joint names actually seen on the joint_states topic. Used to filter
102+
# which joints wait_until_update_all_joints actually waits for, so
103+
# that joints declared in the URDF but never published (e.g. gripper
104+
# finger joints when the gripper node is not running) do not cause
105+
# an indefinite wait.
106+
self._received_joint_names = set()
107+
self._not_updated_joints = []
108+
self._timeout_reason = ""
88109

89110
if self.namespace:
90111
topic_name = f'{self.namespace}/{joint_states_topic}'
91112
else:
92113
topic_name = joint_states_topic
114+
self._joint_states_topic_name = topic_name
93115

94116
self.joint_state_sub = self.create_subscription(
95117
JointState,
@@ -98,6 +120,9 @@ def __init__(self, robot,
98120
joint_states_queue_size
99121
)
100122

123+
if wait_for_joint_states:
124+
self._wait_for_first_joint_state(joint_states_timeout)
125+
101126
self.controller_table = {}
102127
self.controller_param_table = {}
103128
self.controller_type = default_controller
@@ -164,51 +189,116 @@ def _check_time(self, time, fastest_time, time_scale):
164189
'time is invalid type. {}'.format(time))
165190
return time
166191

167-
def wait_until_update_all_joints(self, tgt_tm):
168-
"""Wait until all joints have been updated with timestamps newer than target time.
192+
def _wait_for_first_joint_state(self, timeout):
193+
"""Spin the node until a JointState is received or timeout elapses."""
194+
deadline = time.time() + timeout
195+
while self._joint_state_msg is None and time.time() < deadline:
196+
try:
197+
rclpy.spin_once(self, timeout_sec=0.1)
198+
except Exception:
199+
# If an external executor is already spinning this node,
200+
# spin_once raises. Just yield and let it deliver the message.
201+
time.sleep(0.05)
202+
if self._joint_state_msg is None:
203+
raise TimeoutError(
204+
"No JointState received on '{}' within {:.1f}s. "
205+
"Check that joint_state_publisher / joint_state_broadcaster "
206+
"are running and that the topic name / namespace matches."
207+
.format(self._joint_states_topic_name, timeout))
208+
209+
@staticmethod
210+
def _stamp_nanoseconds(ts):
211+
if ts is None:
212+
return None
213+
if hasattr(ts, 'nanoseconds'):
214+
return ts.nanoseconds
215+
if hasattr(ts, 'sec') and hasattr(ts, 'nanosec'):
216+
return ts.sec * 1_000_000_000 + ts.nanosec
217+
return None
218+
219+
def wait_until_update_all_joints(self, tgt_tm, timeout=3.0):
220+
"""Wait until joints seen on joint_states have timestamps > tgt_tm.
169221
170-
This method handles both rclpy.time.Time objects (with nanoseconds attribute)
171-
and builtin_interfaces.msg.Time objects (with sec and nanosec attributes).
222+
Only joints that have ever been received on the joint_states topic
223+
are considered (see ``_received_joint_names``). Joints declared in
224+
the URDF but never published — e.g. finger joints when the gripper
225+
node is not running — are ignored, so this method does not hang
226+
when a subset of the URDF is being driven.
172227
173228
Parameters
174229
----------
175-
tgt_tm : rclpy.time.Time or bool
176-
Target time to wait for. If True, uses current time.
230+
tgt_tm : rclpy.time.Time, builtin_interfaces.msg.Time, bool, or None
231+
Target time. If True / None, uses the current ROS time (i.e. waits
232+
for any joint_state newer than now).
233+
timeout : float
234+
Maximum seconds to wait. ``0`` or ``None`` waits forever.
235+
236+
Returns
237+
-------
238+
bool
239+
True if all received joints have a stamp newer than ``tgt_tm``
240+
within the timeout, False otherwise. The reason for a False
241+
return is recorded in ``self._timeout_reason``.
177242
"""
178-
if hasattr(tgt_tm, 'nanoseconds'):
179-
initial_time = tgt_tm.nanoseconds
180-
else:
243+
self._not_updated_joints = []
244+
self._timeout_reason = "wait_until_update_all_joints did not complete"
245+
246+
initial_time = self._stamp_nanoseconds(tgt_tm)
247+
if initial_time is None:
181248
initial_time = self.get_clock().now().nanoseconds
182249

250+
start = time.time()
183251
while True:
184-
if 'stamp_list' in self.robot_state:
185-
all_valid = True
186-
for ts in self.robot_state['stamp_list']:
187-
if ts is None:
188-
all_valid = False
189-
break
190-
191-
# Handle rclpy.time.Time objects
192-
if hasattr(ts, 'nanoseconds'):
193-
if ts.nanoseconds <= initial_time:
194-
all_valid = False
195-
break
196-
# Handle builtin_interfaces.msg.Time objects
197-
elif hasattr(ts, 'sec') and hasattr(ts, 'nanosec'):
198-
ts_nano = ts.sec * 1e9 + ts.nanosec
199-
if ts_nano <= initial_time:
200-
all_valid = False
201-
break
202-
else:
203-
# Unknown timestamp type
204-
all_valid = False
205-
break
206-
207-
if all_valid:
208-
return
209-
210-
# Small sleep to avoid busy waiting and allow other threads to run
211-
time.sleep(0.001)
252+
if 'stamp_list' in self.robot_state and self._received_joint_names:
253+
names = self.robot_state['name']
254+
stamps = self.robot_state['stamp_list']
255+
check_results = []
256+
for i, name in enumerate(names):
257+
if name in self._received_joint_names:
258+
ts_ns = self._stamp_nanoseconds(stamps[i])
259+
check_results.append(
260+
ts_ns is not None and ts_ns > initial_time)
261+
if check_results and all(check_results):
262+
self._timeout_reason = ""
263+
return True
264+
265+
if timeout and (time.time() - start) > timeout:
266+
self._set_timeout_reason(initial_time)
267+
logger.warning(
268+
"wait_until_update_all_joints timeout. %s",
269+
self._timeout_reason)
270+
return False
271+
time.sleep(0.005)
272+
273+
def _set_timeout_reason(self, initial_time):
274+
"""Populate self._timeout_reason for diagnostic logging."""
275+
if 'stamp_list' not in self.robot_state \
276+
or 'name' not in self.robot_state:
277+
self._timeout_reason = (
278+
"No joint_states message received. robot_state keys: {}"
279+
.format(list(self.robot_state.keys())))
280+
return
281+
stamps = self.robot_state['stamp_list']
282+
names = self.robot_state['name']
283+
if not names or not stamps:
284+
self._timeout_reason = (
285+
"joint_names or stamp_list is empty. "
286+
"len(joint_names)={}, len(stamp_list)={}"
287+
.format(len(names), len(stamps)))
288+
return
289+
for name, ts in zip(names, stamps):
290+
if name not in self._received_joint_names:
291+
continue
292+
ts_ns = self._stamp_nanoseconds(ts)
293+
if ts_ns is None or ts_ns <= initial_time:
294+
self._not_updated_joints.append(name)
295+
if self._not_updated_joints:
296+
self._timeout_reason = (
297+
"Not updated joints: {}".format(self._not_updated_joints))
298+
else:
299+
self._timeout_reason = (
300+
"All received joints have stamps but none newer than "
301+
"initial_time. Check use_sim_time / clock alignment.")
212302

213303
def set_robot_state(self, key, msg):
214304
self.robot_state[key] = msg
@@ -234,20 +324,29 @@ def set_moving_status(self, key, msg):
234324
self.moving_status[key] = False
235325

236326
def update_robot_state(self, wait_until_update=False):
237-
"""Update robot state.
327+
"""Update the robot model from the latest joint_states.
238328
239329
Parameters
240330
----------
241-
wait_until_update : bool
242-
if True TODO
331+
wait_until_update : bool, rclpy.time.Time, or float
332+
If truthy, wait via :meth:`wait_until_update_all_joints` for a
333+
joint_state newer than this time (or "now" when True) before
334+
applying the update. Useful right after :meth:`wait_interpolation`
335+
to avoid reading state from before motion completion.
243336
244337
Returns
245338
-------
246-
TODO
339+
bool
340+
True if the model was updated, False if no joint_states have been
341+
received yet or wait_until_update timed out. ``self._timeout_reason``
342+
holds a description in the False case.
247343
"""
248344
if wait_until_update:
249-
self.wait_until_update_all_joints(wait_until_update)
345+
if not self.wait_until_update_all_joints(wait_until_update):
346+
return False
250347
if not self.robot_state:
348+
self._timeout_reason = (
349+
"robot_state is empty (no joint_states callback received)")
251350
return False
252351
joint_names = self.robot_state['name']
253352
positions = self.robot_state['position']
@@ -269,13 +368,23 @@ def update_robot_state(self, wait_until_update=False):
269368
joint.joint_angle(position)
270369
joint.joint_velocity = velocity
271370
joint.joint_torque = effort
371+
return True
272372

273373
def joint_state_callback(self, msg):
274374
self._joint_state_msg = msg
375+
# Track every joint name we have ever seen on this topic so that
376+
# wait_until_update_all_joints only waits for joints that are
377+
# actually being published.
378+
self._received_joint_names.update(msg.name)
379+
275380
if 'name' in self.robot_state:
276381
robot_state_names = self.robot_state['name']
277382
else:
278-
robot_state_names = msg.name
383+
# Pre-allocate the state slots from the URDF model, not from the
384+
# first message — joints arriving from a later publisher (e.g. a
385+
# gripper node started after the controllers) need their slot to
386+
# already exist for stamp tracking.
387+
robot_state_names = [j.name for j in self.robot.joint_list]
279388
self.robot_state['name'] = robot_state_names
280389
for key in ['position', 'velocity', 'effort']:
281390
self.robot_state[key] = np.zeros(len(robot_state_names))
@@ -289,6 +398,11 @@ def joint_state_callback(self, msg):
289398
if len(joint_names) == len(joint_data):
290399
data = self.robot_state[key]
291400
for jn in joint_names:
401+
# Skip joints not in the robot model (e.g. when a stray
402+
# publisher sends names that are not part of the URDF).
403+
if jn not in robot_state_names:
404+
index += 1
405+
continue
292406
joint_index = robot_state_names.index(jn)
293407
data[joint_index] = joint_data[index]
294408
index += 1
@@ -664,7 +778,9 @@ def angle_vector_sequence(self,
664778
traj_points)
665779
return avs
666780

667-
def wait_interpolation(self, controller_type=None, timeout=0):
781+
def wait_interpolation(self, controller_type=None, timeout=0,
782+
wait_for_state_update=True,
783+
state_update_timeout=1.0):
668784
"""Wait until last sent motion is finished.
669785
670786
Parameters
@@ -673,6 +789,17 @@ def wait_interpolation(self, controller_type=None, timeout=0):
673789
controller to be wait
674790
timeout : float
675791
max time of for waiting
792+
wait_for_state_update : bool
793+
If True (default), after each action result arrives, also wait
794+
until a joint_state with stamp newer than the result time is
795+
received. This guarantees that an immediate ``update_robot_state``
796+
/ ``self.robot.angle_vector()`` afterwards sees the post-motion
797+
state instead of a frame from before completion. Set to False to
798+
preserve the old "return as soon as the action result arrives"
799+
behaviour.
800+
state_update_timeout : float
801+
Seconds to wait for the post-motion joint_state. Bounded so a
802+
stalled state publisher cannot make wait_interpolation hang.
676803
677804
Returns
678805
-------
@@ -748,6 +875,16 @@ def wait_interpolation(self, controller_type=None, timeout=0):
748875
self.get_logger().error(f"Error getting result: {e}")
749876
results.append(False)
750877

878+
# After every controller has reported its goal result, optionally wait
879+
# for a fresh joint_state so that callers immediately seeing the post-
880+
# motion model state via update_robot_state() do not get a sample from
881+
# before the action result arrived (action result completes ~1 cycle
882+
# before joint_state_broadcaster publishes the final state).
883+
if wait_for_state_update:
884+
now = self.get_clock().now()
885+
self.wait_until_update_all_joints(
886+
now, timeout=state_update_timeout)
887+
751888
return results
752889

753890
def is_interpolating(self, controller_type=None):

0 commit comments

Comments
 (0)