Skip to content

Optionally keep stopped threads stopped on attaching#548

Open
cwalther wants to merge 8 commits into
eclipse-cdt-cloud:mainfrom
indel-ag:keepstopped
Open

Optionally keep stopped threads stopped on attaching#548
cwalther wants to merge 8 commits into
eclipse-cdt-cloud:mainfrom
indel-ag:keepstopped

Conversation

@cwalther

@cwalther cwalther commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

As discussed in #528:

Currently, when attaching to an inferior that already has some stopped threads, these threads are resumed by an indiscriminate continue command in the final continueIfNeeded call at the end of the configuration phase. This is a problem for us: our systems can have breakpoints even when no debugger is attached, and when a thread stops on such a breakpoint, we want to be able to attach a debugger and analyze the thread in this stopped state, and not have it continued. It’s also inconsistent: continueIfNeeded is supposed to be a counterpart to pauseIfNeeded, and pausing was not needed in this case because there was a paused thread already, so there shouldn’t be a matching continue either.

Commit 1

The first of the attached commits fixes that by inserting the appropriate if (this.waitPausedNeeded). Because existing clients, and that in fact includes many of the tests, may expect the previous behavior, gate it behind an option in the launch/attach arguments: The new, optional property "run" can have values "always" or "preserve" (and potentially more in the future). always is the default and corresponds to the previous behavoir: always let all threads run. preserve is the new behavior: leave the running state of threads unchanged. Alternative suggestions for names of option and values are welcome.

This is not sufficient however to fully handle this situation: Because async: stopped notifications from GDB are not relayed to the client as DAP stopped events until this.isInitialized, which happens at the beginning of the configuration phase after the initialized event, and in this case the stopped notification from GDB comes earlier than that, the client would not know about the stopped thread and would display all threads as running. Even manually clicking Pause on such a thread does not help, as that does not cause GDB to send another stopped notification because to it the thread is stopped already. Therefore, any stopped notifications from GDB that arrive during this time of suppression must not be simply discarded, but must be stored and delivered deferredly when the suppression ends. running notifications can still be discarded because running is the default state of a thread, but one that comes after a stopped notification for the same thread can cancel the latter. This is the rest of the first commit.

Moreover, in a situation that I am currently working on, this change is not just nice to have, but things would not work at all without it. The situation is that the stub is not attached to an inferior yet at the time GDB connects to it, but I only issue a -target-attach 1 & command just before the end of the configuration phase, after receiving breakpoints. This is possible when using the extended-remote target in GDB. (I can go into details separately on why I am doing this, if anyone is interested.) In this case, there are no threads yet at the beginning of the configuration phase, so pausing is not needed, but they arrive before the end of the configuration phase and may all be running at that time. An indiscriminate continue in that case results in an error “Cannot execute this command while the selected thread is running.” from GDB that takes down the whole debug session.

The first commit makes some tests in the remote gdb-non-stop scenario fail with “AssertionError [ERR_ASSERTION]: '0' == 'breakpoint'”. That is not due to a problem with the code under test, but due to a problem with DebugClient.hitBreakpoint() from @vscode/debugadapter-testsupport. hitBreakpoint assumes that the very first stopped event comes from its breakpoint. That is not a reasonable assumption to make, and in fact is necessarily false in the situation that this commit seeks to enable, keeping stopped threads stopped. The only reason why it held so far is that we erroneously discarded early stop notifications before.

However, the problematic behavior of hitBreakpoint will be sidestepped as a side effect of the second commit, therefore I decided not to try to fix it here.

Commit 2

The second commit is a further adaptation to this special situation of late attaching, but I think it does not hurt in the general case, so I am proposing it along with the first one: Unlike in the normal early-attach case (remote target), in the late-attach situation (extended-remote target), existing threads (a potentially large number of them) arrive from GDB via async: running notifications after this.isInitialized is set, which therefore are no longer suppressed from going to the client as continued events. The client (at least VS Code) responds to each of them with a threads request, and if you have 110 threads as I do in my current example system, that results in a lot of completely unnecessary XML transmission over the remote connection and takes a long time. (I am guessing this is the reason why the suppression was introduced in the first place.) I therefore propose to defer continued/stopped events not just to the beginning, but to the end of the configuration phase. This is what the second commit does. Does everyone agree that this does not cause any problems? The suppression could be extended even further without any harm, it does not cause us to miss anything except that the client only notices stopped/stopping threads once it is lifted, so there is no need to lift it until the user is ready for that.

This leaves the isInitialized variable unused, but since it is protected and may already be used by third-party subclasses, I am leaving it in.

As a side effect, because the longer suppression lets the continued event caused by the continue command from "run": "always" cancel the deferred early stopped event that would confuse hitBreakpoint, this also avoids the test failures from the first commit. From here on, all tests should pass after every commit.

Commit 3

A launch request with "run": "preserve" leaves GDB in the state where it has not started the program yet. To start the program, GDB needs a run command. So far, there has been no way for the user to explicitly send that command through DAP, other than typing it in the debug console, it was only sent automatically in the "run": "always" (default) case. Therefore, in the preserve case, let the first DAP continue request send run instead of continue to GDB (the latter would not be accepted in this state anyway).

Commits 4–6

Since all the existing tests run without a run option and therefore in the always mode, add some tests for explicit run options and the preserve mode in commit 6. To check whether threads are in the desired running/stopped state, these tests send the -thread-info command to GDB, which is much simpler than piecing together this information from DAP responses and events. Making this possible required some further modifications in commits 4 and 5:

Closes #528.

@cwalther

cwalther commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Force push to fix tests on Windows (skip tests that require async with GDB versions that don’t support it).

It also occurs to me that I haven’t added any tests for commit 3 (send run instead of continue) yet. Will try that in an additional commit. This doesn’t affect the rest.

@cwalther

cwalther commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Tests for commit 3 added. No more changes planned for now, ready for review.

@jreineckearm

Copy link
Copy Markdown
Contributor

Had a first look at the implementation and the direction looks good. Thanks!

Would like to test a couple of things and how this could map to some of our connection strategies before approval.

Regarding elevating the first DAP continue request to a GDB run command: would this impact all threads given it doesn't pass a thread ID? Or is behavior GDB server specific? (Note that I haven't checked the spec yet)

Name run looks sensible, given we are explicitly talking about an initial GDB run command. And something like runAfterConnect might become misleading (and also involves the overloaded term connect).

Delaying stopped events makes sense on first glance. But needs some extra manual testing for regressions.

Will let you know when I had some time for testing.

@cwalther

cwalther commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Regarding elevating the first DAP continue request to a GDB run command: would this impact all threads given it doesn't pass a thread ID? Or is behavior GDB server specific? (Note that I haven't checked the spec yet)

There are no threads yet at that point, the program is not running yet. I don’t think -exec-run looks at --thread at all, at least the docs don’t mention it (but then they don’t for -exec-continue either).

@jreineckearm

jreineckearm commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Did some initial testing: always mode seems to operate like before, couldn't spot any regressions yet.

For preserved, stopped events before the end of the configuration phase are not captured and sent later. I am using an all-stop target (single thread after connect) and I see the expected GDB notifications coming. But something's not right with the this.isRunning logic at the end of the handler.

Got interrupted while investigating and calling a day soon. Will have another look unless something comes to mind on your end.

@cwalther

cwalther commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

By “not captured and sent later”, do you mean that they are sent immediately, or not sent at all? Could be that I messed up all-stop mode, as I only use non-stop (tests should cover it, but maybe not thoroughly enough). I will take a look (results unlikely before Monday).

@cwalther

Copy link
Copy Markdown
Contributor Author

Okay, I can see in test-logs/attach remote/remote/can attach without continuing stopped threads.log that it actually reproduces the problem. No stopped event is sent to the client during the handling of configurationDone, unlike in the non-stop case. Maybe I should check for DAP events in the tests after all and not just rely on -thread-info.

@jreineckearm

jreineckearm commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

I see you have reproduced it....

                if (
                    !suppressHandleGDBStopped &&
                    (this.gdb.isNonStopMode() ||
                        (wasRunning && !this.isRunning))
                ) {

This condition at the end of handleGDBAsync() becomes false. I suspect because this.isRunning is still at it's default value of false before evaluating the first event. Probably need to make it tristate (boolean|undefined) to understand if we are looking at an uninitialized value or a known false state. Should come with minimal impact to existing logic, only the need to explicitly check for undefined in places like here.

No need to rush, just wanted to catch up with PRs.

@cwalther

Copy link
Copy Markdown
Contributor Author

Thinking out loud, I am wondering what the purpose of the condition wasRunning && !this.isRunning is anyway. In what circumstances, other than this initialization situation, can it even be false, and why would we want to suppress the event in that case? Don’t bother trying to answer that, I will probably figure it out on my own. But I’m off now.

@jreineckearm

Copy link
Copy Markdown
Contributor

Got a theory but would need to confirm before sharing. ;-)
Enjoy a long weekend!

@cwalther

Copy link
Copy Markdown
Contributor Author

Note: I am still working on this. My first attempt fixed all-stop but broke sync mode.

@cwalther

Copy link
Copy Markdown
Contributor Author

Regarding my above question: I guess the reasoning behind the wasRunning && !this.isRunning condition was to avoid unnecessary invalidation of object references (handles) as done in sendStoppedEvent(). (Maybe @jonahgraham who wrote it in 50e38d3 has any thoughts.) I am still having a hard time though thinking of situations where wasRunning would be false (!this.isRunning is necessarily true), other than the situation in question, the transition from not-running-because-there-are-no-threads to not-running-because-there-are-stopped-threads, where we do need the event. Possibly something involving the late out-of-band errors from #415, the only place where isRunning is written outside of updateIsRunning.

Anyway, I hesitate to consider this reason enough to throw it out entirely, so here is a tri-state solution, inserted as an additional commit before the others to make sure it doesn’t break anything by itself. Let’s call it “Commit 0” so I don’t have to renumber all the others.

Also fix an issue with canceling deferred events when thread-id=all (occurs in all-stop) (Commit 1), add some DAP event checks to the new tests that would have caught the all-stop breakage (Commits 67), and rebase on main. (See git range-diff 204bc01...1725267.)

I also have another commit on top of Commit 0 (“Commit 0.5”) that is not included here, and I am uncertain whether to include it. (They were not developed in that order, it was a required part in an earlier attempt.) Any opinions would be appreciated. On the one hand, I think it makes things slightly more correct in terms of deciding whether to pause before setting breakpoints. On the other hand, it makes things more complicated, increasing the likelihood of introducing new bugs, and it cannot completely solve the fundamental problem that isRunning is generally a snapshot from the past, and we may always make wrong decisions based on it in race conditions where things have already changed on the target but we have not received or processed the events yet that tell us about it. I tend toward leaving it out as long as we don’t have a demonstrable case where it fixes a practical problem.

@jreineckearm

Copy link
Copy Markdown
Contributor

Thanks for the updates, @cwalther ! Will have another look as soon as I can.

Regarding the reason for this: Like you, I strongly suspect that this is to suppress unnecessary window updates in the IDE and the related chain of scopes, threads, stack, variables, etc requests. They can be quite expensive already on desktop with a debugger attached via USB. And I can imagine this is getting worse in a browser environment and/or a genuine remote connection of the debugged HW.

@cwalther

Copy link
Copy Markdown
Contributor Author

Thanks, take your time. I will be away again Friday through Monday. Will probably attend the meeting on Monday but will not be in the office. I’m sorry to unleash such a monster on you, it keeps getting bigger, but since you said you are interested in the functionality as well I am hoping it will benefit all of us.

@jreineckearm jreineckearm left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another code review. Looks good.
Another test follows later today or tomorrow morning. (Sorry, I know I said I'd look at this earlier this week).

Comment thread src/gdb/GDBDebugSessionBase.ts Outdated
let haveStopped = false;
let haveRunning = false;
let haveUnknown = false;
for (const thread of this.threads) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Probably equally fine to have three some statements:. But means multiple iterations over the same array. So, no real need to bother with this comment (prettier would probably make them multi-line statements anyway).

  const haveStopped = this.threads.some(thread => thread === false);
  const haveRunning = this.threads.some(thread => thread === true);
  const haveUnknown = this.threads.some(thread => thread === undefined);

@cwalther cwalther Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I like that, it’s more concise. Also, the variables can then be inlined, so that some of the multiple iterations may be skipped. prettier does not mangle it.

        const newIsRunning = this.threads.some((t) => t.running === false) // have stopped
            ? false
            : this.threads.some((t) => t.running === undefined) // have unknown
              ? undefined
              : this.threads.some((t) => t.running === true); // have running

Will add that next time I rebase.

@jreineckearm

Copy link
Copy Markdown
Contributor

I'd like to look at https://github.com/indel-ag/cdt-gdb-adapter/tree/awaitrunningconfirmation separately. Main concern is introducing regressions due to changed timings as you stated yourself.

cwalther added 8 commits June 25, 2026 17:33
When a thread first arrives via a `thread-created` notification, we do not know yet whether it is running or not, that information only comes with a `running`/`stopped` notification shortly afterward. Previously, the "unknown" state was treated the same as "stopped". However, we need to distinguish between "stopped" and "unknown" because when the initial state of the first thread arrives and is stopped, we need to tell that to the client using a `stopped` DAP event (the initial state of a thread in DAP is running), even though the overall running state has not changed (it transitioned from not-running-because-there-are-no-threads to not-running-because-there-are-stopped-threads).

The situation where this matters does not currently occur because stopped threads are later continued anyway, but it will starting from the following commit.
Previously, when attaching to an inferior that already has some stopped threads, these threads were resumed by an indiscriminate continue command in the final `continueIfNeeded` call at the end of the configuration phase. This is often undesired and also inconsistent: continueIfNeeded is supposed to be a counterpart to pauseIfNeeded, and pausing was not needed in this case because there was a paused thread already, so there should not be a matching continue either.

Fix this by inserting the appropriate `if (this.waitPausedNeeded)`. Because existing clients, and that in fact includes many of the tests, may expect the previous behavior, gate it behind an option in the launch/attach arguments: The new, optional property "run" can have values "always" or "preserve" (and potentially more in the future). "always" is the default and corresponds to the previous behavoir: always let all threads run. "preserve" is the new behavior: leave the running state of threads unchanged.

Furthermore, `stopped` notifications from GDB for such threads that arrive early on and are suppressed from going to the client must not be simply discarded, but must be stored and delivered deferredly when the suppression ends, otherwise the client does not know about the stopped threads and displays all threads as running. `running` notifications can still be discarded because running is the default state of a thread, but one that comes after a `stopped` notification for the same thread can cancel the latter.

This commit makes some tests in the "remote gdb-non-stop" scenario fail with "AssertionError [ERR_ASSERTION]: '0' == 'breakpoint'". That is not due to a problem with the code under test, but due to a problem with `DebugClient.hitBreakpoint()` from @vscode/debugadapter-testsupport. `hitBreakpoint` assumes that the very first `stopped` event comes from its breakpoint. That is not a reasonable assumption to make, and in fact is necessarily false in the situation that this commit seeks to enable, keeping stopped threads stopped. The only reason why it held so far is that we erroneously discarded early stop notifications before.

However, the problematic behavior of `hitBreakpoint` will be sidestepped as a side effect of the following commit, therefore it is not fixed here.
When attaching, the initial `running` notifications from GDB about existing threads are suppressed from going to the client as `continued` events, these are unnecessary because running is the default state of a thread, and would result in a barrage of `threads` requests from the client. Previously, this suppression was upheld until the beginning of the configuration phase, marked by the `initialized` event. In some scenarios involving GDB’s `extended-remote` target, this is too early. Extend the suppression until the end of the configuration phase, shortly before the `configurationDone` response.

This leaves the `isInitialized` variable unused, but since it is `protected` and may already be used by third-party subclasses, I am leaving it in.

Moving the setting of configuringState = DONE after sendExecContinue is needed to fix the test failures from the previous commit, because it allows the `continued` event caused by this to cancel the problematic `stopped` event in deferredStopEvents. This does not disturb the working of the configuringState state machine.
A 'launch' request with the new option "run": "preserve" leaves GDB in the state where it has not started the program yet. To start the program, GDB needs a 'run' command. So far, there has been no way for the user to explicitly send that command through DAP, other than typing it in the debug console, it was only sent automatically in the "run": "always" (default) case. Therefore, in the "preserve" case, let the first DAP 'continue' request send 'run' instead of 'continue' to GDB (the latter would not be accepted in this state anyway).
…rget" debug type

eclipse-cdt-cloud#414 allowed sending GDB commands by evaluating expressions starting with ">" even when there is no stack frame context (many commands don't need one, and for those that do, GDB can report that itself, no need for the adapter to preempt it). However, it only did that for GDBTargetDebugSession (remote, debug type "gdbtarget"), not for GDBDebugSessionBase (local, debug type "gdb"), for no apparent reason. In the latter case, such requests still fail with "Evaluation of expression without frameId is not supported". This is an annoyance and gets in the way of a test I am about to add, so change it as well and treat both cases the same.

This makes the `alwaysAllowCliCommand` argument to `doEvaluateRequest` redundant (always `true`) inside this project, but because the method is protected and may already be used by third-party subclasses, I am not removing it.
When sending a GDB MI command (one that starts with '-') by evaluating it as an expression prefixed by '>', return the MI response as the result of the evaluation, instead of discarding it and returning a rather useless '\r'. GDB CLI commands (those not starting with '-') still return '\r' as they do not have a return value (their output usually comes on stdout).

The evaluation result must be a string, but the MI response has already been parsed into a JS object at that point, so serialize that as JSON. That is different from the original response string sent by GDB, but probably more convenient for programmatic consumers anyway.

This is useful for getting information from GDB in tests (such as those in the following commit), but may also occasionally be useful when working interactively in the debug console. I do not expect existing callers to be disturbed by the change, as they have probably ignored the constant '\r' anyway.
@cwalther

Copy link
Copy Markdown
Contributor Author

Version 4 (git range-diff 1725267...59194fe):

  • Streamline updateIsRunning as per review comment (no functional change).
  • Rebase on main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

How to keep stopped threads stopped on attaching?

2 participants