IEP-1771 Deadlock when terminating a stuck OpenOCD/GDB launch sequence#1484
IEP-1771 Deadlock when terminating a stuck OpenOCD/GDB launch sequence#1484sigmaaa wants to merge 4 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR overhauls the OpenOCD/GDB debug launch lifecycle in the IDF Eclipse plugin: port allocation is moved out of command-line construction into a dedicated pre-launch phase; ChangesOpenOCD/GDB Launch Lifecycle Overhaul
Sequence Diagram(s)sequenceDiagram
rect rgba(70, 130, 180, 0.5)
note over AppLvlTracingHandler,Launch: OpenOCD-only launch path
end
AppLvlTracingHandler->>LaunchConfigurationDelegate: runOpenOcdOnlyLaunch(config, mode, monitor)
LaunchConfigurationDelegate->>LaunchConfigurationDelegate: pendingLaunchOptions.set(LaunchOptions.openOcdOnly())
LaunchConfigurationDelegate->>Launch: createGdbLaunch → setDoStartGdbClient(false)
rect rgba(60, 179, 113, 0.5)
note over LaunchConfigurationDelegate,GdbServerBackend: OpenOCD server startup with bounded poll
end
LaunchConfigurationDelegate->>Configuration: allocateServerPorts(workingCopy)
LaunchConfigurationDelegate->>GdbServerBackend: getServerServicesSequence()
GdbServerBackend->>LaunchProcessDictionary: registerBackendProcess("openocd", process)
loop poll ≤ SERVER_STATUS_POLL_TIMEOUT_MS
LaunchConfigurationDelegate->>GdbServerBackend: getServerState()
end
alt timeout
LaunchConfigurationDelegate->>LaunchProcessDictionary: killAllProcessesInLaunch(launch)
LaunchConfigurationDelegate-->>AppLvlTracingHandler: throw CoreException(OPENOCD_STARTUP_TIMEOUT_STATUS)
end
rect rgba(255, 165, 0, 0.5)
note over LaunchConfigurationDelegate,GdbBackend: Full debug launch — GDB startup
end
LaunchConfigurationDelegate->>GdbBackend: getServicesSequence()
GdbBackend->>GdbBackend: daemon thread drains MI error stream
GdbBackend->>LaunchProcessDictionary: registerBackendProcess("gdb", process)
LaunchConfigurationDelegate->>LaunchConfigurationDelegate: completeLaunchQuery.get(COMPLETE_INIT_TIMEOUT_MS)
alt timeout
LaunchConfigurationDelegate->>Launch: cleanupLaunch (async Job)
LaunchConfigurationDelegate-->>LaunchConfigurationDelegate: throw DebugException "Debugger initialization timed out"
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/LaunchConfigurationDelegate.java (2)
167-179:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake the GDB-version timeout actually terminate stubborn processes.
The timeout job only calls
process.destroy(). If the spawned command ignores graceful termination, the stdout reader can remain blocked and the 10-second timeout does not boundgetGDBVersion().Proposed fix
`@Override` protected IStatus run(IProgressMonitor arg) { process.destroy(); + try + { + if (!process.waitFor(2, TimeUnit.SECONDS)) + { + process.destroyForcibly(); + } + } + catch (InterruptedException e) + { + Thread.currentThread().interrupt(); + process.destroyForcibly(); + } return Status.OK_STATUS; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/LaunchConfigurationDelegate.java` around lines 167 - 179, The timeout job in the anonymous Job class only calls process.destroy() which performs graceful termination that can be ignored by stubborn processes, leaving the stdout reader blocked. In the run method of the timeout job, after calling process.destroy(), also call process.destroyForcibly() to forcefully terminate the process if it does not respond to the graceful termination within a short window, ensuring the 10-second timeout actually bounds the getGDBVersion() operation.
429-470:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep server-status polling bounded and fail closed on executor errors.
The deadline check is bypassed while Line 445 waits on an unbounded
get(). If the DSF executor is stuck, this can still hang indefinitely; if the status query throws, Lines 467-470 only log and continue launching without a verified OpenOCD server.Proposed fix
+import java.util.concurrent.Future; import java.util.concurrent.RejectedExecutionException; import java.util.concurrent.TimeUnit; import java.util.concurrent.TimeoutException;- Thread.sleep(SERVER_STATUS_POLL_INTERVAL_MS); - serverStatus = launch.getSession().getExecutor().submit(callable).get(); + long waitMs = Math.min(SERVER_STATUS_POLL_INTERVAL_MS, + Math.max(1, deadline - System.currentTimeMillis())); + Future<IStatus> serverStatusFuture = launch.getSession().getExecutor().submit(callable); + try + { + serverStatus = serverStatusFuture.get(waitMs, TimeUnit.MILLISECONDS); + } + catch (TimeoutException e) + { + serverStatusFuture.cancel(false); + } } if (serverStatus != Status.OK_STATUS) @@ catch (InterruptedException e) { if (monitor.isCanceled()) { cleanupLaunch(launch); return; } - Activator.log(e); + Thread.currentThread().interrupt(); + cleanupLaunch(launch); + throw new DebugException(new Status(IStatus.ERROR, Activator.PLUGIN_ID, + DebugException.INTERNAL_ERROR, "Interrupted", e)); } catch (ExecutionException e) { - Activator.log(e); + cleanupLaunch(launch); + throw new DebugException(new Status(IStatus.ERROR, Activator.PLUGIN_ID, + DebugException.REQUEST_FAILED, "Error reading OpenOCD server status", e.getCause())); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/LaunchConfigurationDelegate.java` around lines 429 - 470, The polling loop has two issues: the unbounded get() call on the executor submit bypasses the deadline check and can hang indefinitely, and ExecutionException errors only log without failing the launch. Fix this by using a time-bounded get(timeout, unit) call that respects the remaining deadline calculated from the deadline variable, and change the ExecutionException catch block to call cleanupLaunch and throw a CoreException instead of just logging with Activator.log(e).
🧹 Nitpick comments (3)
bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/GdbConsoleCleanup.java (1)
82-96: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueJob cancellation relies on fragile string matching.
The
cancelGdbCliReadJobs()method identifies jobs by checking if their name contains"GDB CLI". If CDT changes job naming conventions in a future version, this could silently fail to cancel the intended jobs. Consider documenting this coupling or adding a fallback mechanism.That said, this is acceptable for a cleanup utility where missing a job just means slightly delayed cleanup rather than data corruption.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/GdbConsoleCleanup.java` around lines 82 - 96, The cancelGdbCliReadJobs() method relies on fragile string matching to identify jobs by checking if the job name contains the GDB_CLI_JOB_MARKER constant. This creates an undocumented coupling with CDT's job naming conventions that could silently fail if CDT changes their naming. Add a comment above the method or near the string matching logic in cancelGdbCliReadJobs() that documents this dependency on the specific GDB_CLI_JOB_MARKER naming convention used by CDT, or alternatively implement a fallback mechanism to handle cases where the naming convention might change.bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/LaunchProcessDictionary.java (1)
386-408: 🧹 Nitpick | 🔵 Trivial | 💤 Low valuePOSIX
kill -9does not terminate child processes like Windowstaskkill /T.On Windows,
taskkill /F /T /PIDterminates the entire process tree. On POSIX,kill -9 <pid>only terminates the specified process, leaving children orphaned. Since the JVM'sProcessHandle.descendants()snapshot anddestroyTree()should have already handled children, this fallback may be less effective on POSIX if any descendants were missed.Consider using
pkill -9 -P <pid>or iterating through/proc/<pid>/task/*/childrenfor true tree termination on Linux, though this may be acceptable if the earlierdestroyTree()calls typically succeed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/LaunchProcessDictionary.java` around lines 386 - 408, In the killTreeViaOsCommand method, the POSIX branch uses kill -9 which only terminates the specified process without affecting children, unlike the Windows taskkill /T which kills the entire process tree. Replace the kill -9 command in the else block with pkill -9 -P to kill both the parent process and its direct children, making the behavior consistent with the Windows approach. This will ensure child processes are properly terminated on POSIX systems in the fallback scenario.bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/GdbBackend.java (1)
1008-1010: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueUnusual usage of
Thread.interrupted()- consider clarifying intent.
Thread.interrupted()reads and clears the interrupt status, but the return value is discarded. Since the job is ending anyway and swallowing the interrupt is intentional, either remove the call entirely (empty catch is fine), or replace with a comment explaining the intent.Suggested alternatives
Option A - Empty catch with comment:
catch (InterruptedException ie) { - Thread.interrupted(); + // Intentionally swallowed: job is ending regardless }Option B - Just clear the flag explicitly:
catch (InterruptedException ie) { - Thread.interrupted(); + Thread.interrupted(); // Clear the flag; job is exiting anyway }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/GdbBackend.java` around lines 1008 - 1010, The catch block for InterruptedException in the GdbBackend class is calling Thread.interrupted() but discarding the return value, making the intent unclear. Either add a comment above the Thread.interrupted() call explaining that the interrupt flag is being intentionally cleared before the job ends, or remove the call entirely and add a comment explaining why the interrupt is being swallowed. Choose the approach that best reflects your intent for handling this interrupt.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/Configuration.java`:
- Around line 345-348: The allocateServerPorts method always starts from the
default GDB port when allocating a port, which ignores any user-configured port
preference already stored in the configuration under
IGDBJtagConstants.ATTR_PORT_NUMBER. Before calling
PortChecker.getAvailablePort(), first check if the configuration already has a
port number attribute set; if it does, pass that configured port as the
preferred starting point to getAvailablePort instead of always using
DefaultPreferences.GDB_SERVER_GDB_PORT_NUMBER_DEFAULT. Only use the default port
as a fallback if no prior configuration exists.
In
`@bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/LaunchConfigurationDelegate.java`:
- Around line 129-137: The issue is that when OpenOCD-only mode is active
(checked via options.isOpenOcdOnly()), the code only sets the launch instance's
doStartGdbClient flag to false, but does not update the working copy
configuration. Since GdbServerBackend.initialize reads the DO_START_GDB_CLIENT
attribute directly from the launch configuration using
Configuration.getDoStartGdbClient(fLaunchConfiguration), the change to the
launch instance alone is insufficient. In the if block where
options.isOpenOcdOnly() is true, after calling
launch.setDoStartGdbClient(false), also set the DO_START_GDB_CLIENT attribute in
the working copy (wc) to false so that GdbServerBackend.initialize will read the
correct value when it later queries the launch configuration.
---
Outside diff comments:
In
`@bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/LaunchConfigurationDelegate.java`:
- Around line 167-179: The timeout job in the anonymous Job class only calls
process.destroy() which performs graceful termination that can be ignored by
stubborn processes, leaving the stdout reader blocked. In the run method of the
timeout job, after calling process.destroy(), also call
process.destroyForcibly() to forcefully terminate the process if it does not
respond to the graceful termination within a short window, ensuring the
10-second timeout actually bounds the getGDBVersion() operation.
- Around line 429-470: The polling loop has two issues: the unbounded get() call
on the executor submit bypasses the deadline check and can hang indefinitely,
and ExecutionException errors only log without failing the launch. Fix this by
using a time-bounded get(timeout, unit) call that respects the remaining
deadline calculated from the deadline variable, and change the
ExecutionException catch block to call cleanupLaunch and throw a CoreException
instead of just logging with Activator.log(e).
---
Nitpick comments:
In
`@bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/GdbBackend.java`:
- Around line 1008-1010: The catch block for InterruptedException in the
GdbBackend class is calling Thread.interrupted() but discarding the return
value, making the intent unclear. Either add a comment above the
Thread.interrupted() call explaining that the interrupt flag is being
intentionally cleared before the job ends, or remove the call entirely and add a
comment explaining why the interrupt is being swallowed. Choose the approach
that best reflects your intent for handling this interrupt.
In
`@bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/GdbConsoleCleanup.java`:
- Around line 82-96: The cancelGdbCliReadJobs() method relies on fragile string
matching to identify jobs by checking if the job name contains the
GDB_CLI_JOB_MARKER constant. This creates an undocumented coupling with CDT's
job naming conventions that could silently fail if CDT changes their naming. Add
a comment above the method or near the string matching logic in
cancelGdbCliReadJobs() that documents this dependency on the specific
GDB_CLI_JOB_MARKER naming convention used by CDT, or alternatively implement a
fallback mechanism to handle cases where the naming convention might change.
In
`@bundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/LaunchProcessDictionary.java`:
- Around line 386-408: In the killTreeViaOsCommand method, the POSIX branch uses
kill -9 which only terminates the specified process without affecting children,
unlike the Windows taskkill /T which kills the entire process tree. Replace the
kill -9 command in the else block with pkill -9 -P to kill both the parent
process and its direct children, making the behavior consistent with the Windows
approach. This will ensure child processes are properly terminated on POSIX
systems in the fallback scenario.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6bd089f5-4155-4bd1-a999-0136ac8d480c
📒 Files selected for processing (12)
bundles/com.espressif.idf.debug.gdbjtag.openocd/plugin.xmlbundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/Activator.javabundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/Configuration.javabundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/GdbBackend.javabundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/GdbConsoleCleanup.javabundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/GdbServerBackend.javabundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/Launch.javabundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/LaunchConfigurationDelegate.javabundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/LaunchOptions.javabundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/LaunchProcessDictionary.javabundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/dsf/process/IdfRuntimeProcess.javabundles/com.espressif.idf.debug.gdbjtag.openocd/src/com/espressif/idf/debug/gdbjtag/openocd/ui/AppLvlTracingHandler.java
💤 Files with no reviewable changes (1)
- bundles/com.espressif.idf.debug.gdbjtag.openocd/plugin.xml
Description
Fixes ESP-IDF OpenOCD/GDB debug sessions that hang indefinitely during launch or when pressing Stop — especially when the target is unresponsive or stuck in a reset loop.
Fixes # (IEP-1771)
Type of change
Please delete options that are not relevant.
How has this been tested?
UI: Open Debug Configurations → confirm only one Debugger tab is shown.
Test Configuration:
Dependent components impacted by this PR:
Checklist
Summary by CodeRabbit