Optionally keep stopped threads stopped on attaching#548
Conversation
|
Force push to fix tests on Windows (skip tests that require async with GDB versions that don’t support it). It also occurs to me that I haven’t added any tests for commit 3 (send run instead of continue) yet. Will try that in an additional commit. This doesn’t affect the rest. |
|
Tests for commit 3 added. No more changes planned for now, ready for review. |
|
Had a first look at the implementation and the direction looks good. Thanks! Would like to test a couple of things and how this could map to some of our connection strategies before approval. Regarding elevating the first DAP Name Delaying stopped events makes sense on first glance. But needs some extra manual testing for regressions. Will let you know when I had some time for testing. |
There are no threads yet at that point, the program is not running yet. I don’t think |
|
Did some initial testing: For Got interrupted while investigating and calling a day soon. Will have another look unless something comes to mind on your end. |
|
By “not captured and sent later”, do you mean that they are sent immediately, or not sent at all? Could be that I messed up all-stop mode, as I only use non-stop (tests should cover it, but maybe not thoroughly enough). I will take a look (results unlikely before Monday). |
|
Okay, I can see in test-logs/attach remote/remote/can attach without continuing stopped threads.log that it actually reproduces the problem. No |
|
I see you have reproduced it.... This condition at the end of No need to rush, just wanted to catch up with PRs. |
|
Thinking out loud, I am wondering what the purpose of the condition |
|
Got a theory but would need to confirm before sharing. ;-) |
|
Note: I am still working on this. My first attempt fixed all-stop but broke sync mode. |
|
Regarding my above question: I guess the reasoning behind the Anyway, I hesitate to consider this reason enough to throw it out entirely, so here is a tri-state solution, inserted as an additional commit before the others to make sure it doesn’t break anything by itself. Let’s call it “Commit 0” so I don’t have to renumber all the others. Also fix an issue with canceling deferred events when thread-id=all (occurs in all-stop) (Commit 1), add some DAP event checks to the new tests that would have caught the all-stop breakage (Commits 6–7), and rebase on main. (See I also have another commit on top of Commit 0 (“Commit 0.5”) that is not included here, and I am uncertain whether to include it. (They were not developed in that order, it was a required part in an earlier attempt.) Any opinions would be appreciated. On the one hand, I think it makes things slightly more correct in terms of deciding whether to pause before setting breakpoints. On the other hand, it makes things more complicated, increasing the likelihood of introducing new bugs, and it cannot completely solve the fundamental problem that |
|
Thanks for the updates, @cwalther ! Will have another look as soon as I can. Regarding the reason for this: Like you, I strongly suspect that this is to suppress unnecessary window updates in the IDE and the related chain of scopes, threads, stack, variables, etc requests. They can be quite expensive already on desktop with a debugger attached via USB. And I can imagine this is getting worse in a browser environment and/or a genuine remote connection of the debugged HW. |
|
Thanks, take your time. I will be away again Friday through Monday. Will probably attend the meeting on Monday but will not be in the office. I’m sorry to unleash such a monster on you, it keeps getting bigger, but since you said you are interested in the functionality as well I am hoping it will benefit all of us. |
jreineckearm
left a comment
There was a problem hiding this comment.
Another code review. Looks good.
Another test follows later today or tomorrow morning. (Sorry, I know I said I'd look at this earlier this week).
| let haveStopped = false; | ||
| let haveRunning = false; | ||
| let haveUnknown = false; | ||
| for (const thread of this.threads) { |
There was a problem hiding this comment.
Nit: Probably equally fine to have three some statements:. But means multiple iterations over the same array. So, no real need to bother with this comment (prettier would probably make them multi-line statements anyway).
const haveStopped = this.threads.some(thread => thread === false);
const haveRunning = this.threads.some(thread => thread === true);
const haveUnknown = this.threads.some(thread => thread === undefined);
There was a problem hiding this comment.
Good point, I like that, it’s more concise. Also, the variables can then be inlined, so that some of the multiple iterations may be skipped. prettier does not mangle it.
const newIsRunning = this.threads.some((t) => t.running === false) // have stopped
? false
: this.threads.some((t) => t.running === undefined) // have unknown
? undefined
: this.threads.some((t) => t.running === true); // have runningWill add that next time I rebase.
|
I'd like to look at https://github.com/indel-ag/cdt-gdb-adapter/tree/awaitrunningconfirmation separately. Main concern is introducing regressions due to changed timings as you stated yourself. |
When a thread first arrives via a `thread-created` notification, we do not know yet whether it is running or not, that information only comes with a `running`/`stopped` notification shortly afterward. Previously, the "unknown" state was treated the same as "stopped". However, we need to distinguish between "stopped" and "unknown" because when the initial state of the first thread arrives and is stopped, we need to tell that to the client using a `stopped` DAP event (the initial state of a thread in DAP is running), even though the overall running state has not changed (it transitioned from not-running-because-there-are-no-threads to not-running-because-there-are-stopped-threads). The situation where this matters does not currently occur because stopped threads are later continued anyway, but it will starting from the following commit.
Previously, when attaching to an inferior that already has some stopped threads, these threads were resumed by an indiscriminate continue command in the final `continueIfNeeded` call at the end of the configuration phase. This is often undesired and also inconsistent: continueIfNeeded is supposed to be a counterpart to pauseIfNeeded, and pausing was not needed in this case because there was a paused thread already, so there should not be a matching continue either. Fix this by inserting the appropriate `if (this.waitPausedNeeded)`. Because existing clients, and that in fact includes many of the tests, may expect the previous behavior, gate it behind an option in the launch/attach arguments: The new, optional property "run" can have values "always" or "preserve" (and potentially more in the future). "always" is the default and corresponds to the previous behavoir: always let all threads run. "preserve" is the new behavior: leave the running state of threads unchanged. Furthermore, `stopped` notifications from GDB for such threads that arrive early on and are suppressed from going to the client must not be simply discarded, but must be stored and delivered deferredly when the suppression ends, otherwise the client does not know about the stopped threads and displays all threads as running. `running` notifications can still be discarded because running is the default state of a thread, but one that comes after a `stopped` notification for the same thread can cancel the latter. This commit makes some tests in the "remote gdb-non-stop" scenario fail with "AssertionError [ERR_ASSERTION]: '0' == 'breakpoint'". That is not due to a problem with the code under test, but due to a problem with `DebugClient.hitBreakpoint()` from @vscode/debugadapter-testsupport. `hitBreakpoint` assumes that the very first `stopped` event comes from its breakpoint. That is not a reasonable assumption to make, and in fact is necessarily false in the situation that this commit seeks to enable, keeping stopped threads stopped. The only reason why it held so far is that we erroneously discarded early stop notifications before. However, the problematic behavior of `hitBreakpoint` will be sidestepped as a side effect of the following commit, therefore it is not fixed here.
When attaching, the initial `running` notifications from GDB about existing threads are suppressed from going to the client as `continued` events, these are unnecessary because running is the default state of a thread, and would result in a barrage of `threads` requests from the client. Previously, this suppression was upheld until the beginning of the configuration phase, marked by the `initialized` event. In some scenarios involving GDB’s `extended-remote` target, this is too early. Extend the suppression until the end of the configuration phase, shortly before the `configurationDone` response. This leaves the `isInitialized` variable unused, but since it is `protected` and may already be used by third-party subclasses, I am leaving it in. Moving the setting of configuringState = DONE after sendExecContinue is needed to fix the test failures from the previous commit, because it allows the `continued` event caused by this to cancel the problematic `stopped` event in deferredStopEvents. This does not disturb the working of the configuringState state machine.
A 'launch' request with the new option "run": "preserve" leaves GDB in the state where it has not started the program yet. To start the program, GDB needs a 'run' command. So far, there has been no way for the user to explicitly send that command through DAP, other than typing it in the debug console, it was only sent automatically in the "run": "always" (default) case. Therefore, in the "preserve" case, let the first DAP 'continue' request send 'run' instead of 'continue' to GDB (the latter would not be accepted in this state anyway).
…rget" debug type eclipse-cdt-cloud#414 allowed sending GDB commands by evaluating expressions starting with ">" even when there is no stack frame context (many commands don't need one, and for those that do, GDB can report that itself, no need for the adapter to preempt it). However, it only did that for GDBTargetDebugSession (remote, debug type "gdbtarget"), not for GDBDebugSessionBase (local, debug type "gdb"), for no apparent reason. In the latter case, such requests still fail with "Evaluation of expression without frameId is not supported". This is an annoyance and gets in the way of a test I am about to add, so change it as well and treat both cases the same. This makes the `alwaysAllowCliCommand` argument to `doEvaluateRequest` redundant (always `true`) inside this project, but because the method is protected and may already be used by third-party subclasses, I am not removing it.
When sending a GDB MI command (one that starts with '-') by evaluating it as an expression prefixed by '>', return the MI response as the result of the evaluation, instead of discarding it and returning a rather useless '\r'. GDB CLI commands (those not starting with '-') still return '\r' as they do not have a return value (their output usually comes on stdout). The evaluation result must be a string, but the MI response has already been parsed into a JS object at that point, so serialize that as JSON. That is different from the original response string sent by GDB, but probably more convenient for programmatic consumers anyway. This is useful for getting information from GDB in tests (such as those in the following commit), but may also occasionally be useful when working interactively in the debug console. I do not expect existing callers to be disturbed by the change, as they have probably ignored the constant '\r' anyway.
|
Version 4 (
|
As discussed in #528:
Currently, when attaching to an inferior that already has some stopped threads, these threads are resumed by an indiscriminate continue command in the final
continueIfNeededcall at the end of the configuration phase. This is a problem for us: our systems can have breakpoints even when no debugger is attached, and when a thread stops on such a breakpoint, we want to be able to attach a debugger and analyze the thread in this stopped state, and not have it continued. It’s also inconsistent: continueIfNeeded is supposed to be a counterpart to pauseIfNeeded, and pausing was not needed in this case because there was a paused thread already, so there shouldn’t be a matching continue either.Commit 1
The first of the attached commits fixes that by inserting the appropriate
if (this.waitPausedNeeded). Because existing clients, and that in fact includes many of the tests, may expect the previous behavior, gate it behind an option in the launch/attach arguments: The new, optional property"run"can have values"always"or"preserve"(and potentially more in the future).alwaysis the default and corresponds to the previous behavoir: always let all threads run.preserveis the new behavior: leave the running state of threads unchanged. Alternative suggestions for names of option and values are welcome.This is not sufficient however to fully handle this situation: Because
async: stoppednotifications from GDB are not relayed to the client as DAPstoppedevents untilthis.isInitialized, which happens at the beginning of the configuration phase after theinitializedevent, and in this case thestoppednotification from GDB comes earlier than that, the client would not know about the stopped thread and would display all threads as running. Even manually clicking Pause on such a thread does not help, as that does not cause GDB to send anotherstoppednotification because to it the thread is stopped already. Therefore, anystoppednotifications from GDB that arrive during this time of suppression must not be simply discarded, but must be stored and delivered deferredly when the suppression ends.runningnotifications can still be discarded because running is the default state of a thread, but one that comes after astoppednotification for the same thread can cancel the latter. This is the rest of the first commit.Moreover, in a situation that I am currently working on, this change is not just nice to have, but things would not work at all without it. The situation is that the stub is not attached to an inferior yet at the time GDB connects to it, but I only issue a
-target-attach 1 &command just before the end of the configuration phase, after receiving breakpoints. This is possible when using theextended-remotetarget in GDB. (I can go into details separately on why I am doing this, if anyone is interested.) In this case, there are no threads yet at the beginning of the configuration phase, so pausing is not needed, but they arrive before the end of the configuration phase and may all be running at that time. An indiscriminate continue in that case results in an error “Cannot execute this command while the selected thread is running.” from GDB that takes down the whole debug session.The first commit makes some tests in the remote gdb-non-stop scenario fail with “AssertionError [ERR_ASSERTION]: '0' == 'breakpoint'”. That is not due to a problem with the code under test, but due to a problem with
DebugClient.hitBreakpoint()from @vscode/debugadapter-testsupport.hitBreakpointassumes that the very firststoppedevent comes from its breakpoint. That is not a reasonable assumption to make, and in fact is necessarily false in the situation that this commit seeks to enable, keeping stopped threads stopped. The only reason why it held so far is that we erroneously discarded early stop notifications before.However, the problematic behavior of
hitBreakpointwill be sidestepped as a side effect of the second commit, therefore I decided not to try to fix it here.Commit 2
The second commit is a further adaptation to this special situation of late attaching, but I think it does not hurt in the general case, so I am proposing it along with the first one: Unlike in the normal early-attach case (
remotetarget), in the late-attach situation (extended-remotetarget), existing threads (a potentially large number of them) arrive from GDB viaasync: runningnotifications afterthis.isInitializedis set, which therefore are no longer suppressed from going to the client ascontinuedevents. The client (at least VS Code) responds to each of them with athreadsrequest, and if you have 110 threads as I do in my current example system, that results in a lot of completely unnecessary XML transmission over the remote connection and takes a long time. (I am guessing this is the reason why the suppression was introduced in the first place.) I therefore propose to defer continued/stopped events not just to the beginning, but to the end of the configuration phase. This is what the second commit does. Does everyone agree that this does not cause any problems? The suppression could be extended even further without any harm, it does not cause us to miss anything except that the client only notices stopped/stopping threads once it is lifted, so there is no need to lift it until the user is ready for that.This leaves the
isInitializedvariable unused, but since it isprotectedand may already be used by third-party subclasses, I am leaving it in.As a side effect, because the longer suppression lets the
continuedevent caused by the continue command from"run": "always"cancel the deferred earlystoppedevent that would confusehitBreakpoint, this also avoids the test failures from the first commit. From here on, all tests should pass after every commit.Commit 3
A
launchrequest with"run": "preserve"leaves GDB in the state where it has not started the program yet. To start the program, GDB needs a run command. So far, there has been no way for the user to explicitly send that command through DAP, other than typing it in the debug console, it was only sent automatically in the"run": "always"(default) case. Therefore, in thepreservecase, let the first DAPcontinuerequest send run instead of continue to GDB (the latter would not be accepted in this state anyway).Commits 4–6
Since all the existing tests run without a
runoption and therefore in thealwaysmode, add some tests for explicitrunoptions and thepreservemode in commit 6. To check whether threads are in the desired running/stopped state, these tests send the-thread-infocommand to GDB, which is much simpler than piecing together this information from DAP responses and events. Making this possible required some further modifications in commits 4 and 5:gdb, not justgdbtargetdebug type – see also Allow sending commands to gdb in TargetDebug mode while target is running #414 and add commands to change gdb session radices cdt-gdb-vscode#202 (review) (middle).Closes #528.