[Testing Needed] Reducing Robot Boot Time #3745

Draft
suchirss wants to merge 3 commits into UBC-Thunderbots:master from suchirss:raspberry-pi-boot-time


@suchirss suchirss commented May 28, 2026

Description

An issue we have during competition is that robots take too long to reboot during matches. This results in idle time while we wait for boot.

After investigating why this might be the case, I found two likely bottlenecks:

  1. Raspberry Pi Boot Time:

Context:

  • Raspberry Pis run Linux and use systemd as their service manager. systemd starts and configures services on boot.
  • We can use .service files to define custom services that systemd will manage.
  • We define our own thunderloop.service to run thunderloop_main on the Raspberry Pi.

Problem:

  • thunderloop.service specifies that thunderloop_main should wait for the network to be online; specifically, it is blocked by systemd-networkd-wait-online.service.
  • This results in ~10 seconds of idle time while the Pi waits for the network to connect and come online.
  • This wait is redundant because thunderloop already waits for the network itself through its own waitForNetworkUp setup function.

Solution:

  • My solution: This PR changes thunderloop.service so that it no longer requires the network to be online. On Pi boot alone, this cuts ~10 seconds. Ideally, our robots should therefore be functional ~10 seconds sooner after a reboot. This is what needs to be tested.
  • Alternate solution: Disabling the systemd-networkd-wait-online.service altogether. This can be baked into the setup_pi.yml ansible file.
  2. Blocking Setup Functions in thunderloop.cpp

Note: this PR doesn't fix this problem. If boot time is acceptable with the solution to Problem 1 above, this second problem does not need to be addressed prior to RoboCup.

thunderloop.cpp has setup functions that block one another, which adds to the time before our robots can enter the main loop. One of these is waitForNetworkUp, which blocks the thread until the network is up. For example, the power and motor services are blocked by waitForNetworkUp even though they don't require the network. My suggested fix: use the four cores on the Raspberry Pi to run setup tasks in parallel, use multithreading so that setup functions like waitForNetworkUp don't block the others, or use a combination of both.
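A minimal sketch of the suggested fix, assuming the network wait can be moved onto its own thread while the network-independent setup runs on the calling thread. The three `...Stub` functions and `runSetupInParallel` are hypothetical stand-ins, not the actual functions in thunderloop.cpp, and the sleep durations are made up to simulate work:

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-ins for thunderloop's setup steps. The real functions
// and their signatures live in thunderloop.cpp and may differ.
bool waitForNetworkUpStub()
{
    // Simulated network wait (duration is an assumption for illustration).
    std::this_thread::sleep_for(std::chrono::milliseconds(300));
    return true;
}

bool setupPowerServiceStub()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    return true;
}

bool setupMotorServiceStub()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    return true;
}

// Starts the network wait on a separate thread, runs the setup steps that do
// not need the network on the calling thread, and only then joins the wait.
bool runSetupInParallel()
{
    std::future<bool> network_up =
        std::async(std::launch::async, waitForNetworkUpStub);

    bool power_ok = setupPowerServiceStub();
    bool motor_ok = setupMotorServiceStub();

    // Block on the network only once the hardware setup has finished.
    bool network_ok = network_up.get();
    return network_ok && power_ok && motor_ok;
}
```

With these simulated durations, the total setup time is bounded by the longest step (the network wait) rather than the sum of all three. Anything that genuinely needs the network would still have to run after `network_up.get()`.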

Testing Needed

As mentioned above, this PR only addresses the blocking behaviour of thunderloop.service such that thunderloop_main does not wait on systemd-networkd-wait-online.service. On the Raspberry Pi alone, this resulted in a ~10 second time save on startup. Your (the tester's) job will be to check if this time save translates from "time saved on Pi boot" -> "time saved from reboot to when robot is functional".

When I took on this task, we did not have any functional robots. As such, I don't actually know how long it takes from reboot → functional robot.

  • Note: I would define a "functional robot" as when the thunderloop.cpp setup functions are complete and the robot can accept messages and use them to move around. Your definition for what qualifies as a functional robot may depend on what level of working robot we have at this stage.

I would suggest testing in the following incremental steps:

  • Recreate/verify my findings on the Raspberry Pi alone:
  1. Set up the Raspberry Pi with the latest version of our codebase (without the changes made by my PR).
  2. Reboot the Raspberry Pi 5.
  3. Once boot is complete, use the systemd-analyze tool to see how long boot takes. These commands in particular are very useful:
[image: systemd-analyze commands]
  4. Now repeat steps 1-3 but with the changes in this PR and verify that boot time drops.
  • Verify if boot time drops "functionally" on the robot
  1. Set up a Pi without the changes in this PR and hook it up to a working robot.
  2. Reboot the Pi and measure the time it takes for the robot to "function" - you may choose to define what it means for a robot to "function" but please keep this definition consistent and documented.
  3. Repeat steps 1-2 but with the changes in this PR. Assess the difference in the time from reboot -> functioning robot.
  • If boot time does drop with this PR, assess if the changes made are sufficient for competition.
  • What is an ideal boot time during competition?
  • Does the current reboot behaviour meet our needs during competition?
  • Would our fleet benefit from even further reduced boot time? As mentioned in the description of this PR, there are other solutions that can be tried. However, if this PR reduces the boot time "enough", these other solutions can be marked as non-urgent and looked into after RoboCup.
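For the robot-level measurement, one way to keep the definition of "functional" consistent between runs is to log the time since boot at whatever point in the code is chosen as "functional". The sketch below is an assumption about how that could look, not part of this PR: `secondsSinceBoot` and `logFunctionalTimestamp` are hypothetical helpers, and `CLOCK_BOOTTIME` is Linux-specific.

```cpp
#include <ctime>
#include <iostream>

// Seconds elapsed since the kernel started, read from the Linux-specific
// CLOCK_BOOTTIME clock. Returns -1.0 if the clock cannot be read.
// Note: this does not include any time spent before the kernel starts
// (power-on, firmware, bootloader).
double secondsSinceBoot()
{
    timespec ts{};
    if (clock_gettime(CLOCK_BOOTTIME, &ts) != 0)
    {
        return -1.0;
    }
    return static_cast<double>(ts.tv_sec) +
           static_cast<double>(ts.tv_nsec) / 1e9;
}

// Hypothetical helper: call once at the point chosen as "functional",
// for example right after the setup functions complete.
void logFunctionalTimestamp(std::ostream& out)
{
    out << "robot functional " << secondsSinceBoot()
        << " s after kernel boot\n";
}
```

Calling `logFunctionalTimestamp(std::cout)` at the same place with and without the changes in this PR gives two numbers that can be compared directly. Because the clock starts with the kernel, a stopwatch from the moment of reboot is still needed for the full reboot -> functional time.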

Resolved Issues

Length Justification and Key Files to Review

Review Checklist

It is the reviewer's responsibility to also make sure every item here has been covered

  • Function & Class comments: All function definitions (usually in the .h file) should have a javadoc style comment at the start of them. For examples, see the functions defined in thunderbots/software/geom. Similarly, all classes should have an associated Javadoc comment explaining the purpose of the class.
  • Remove all commented out code
  • Remove extra print statements: for example, those just used for testing
  • Resolve all TODOs: All TODO (or similar) statements should either be completed or associated with a GitHub issue
