fix(node): keep node running when a service exits without error by Ehsan-saradar · Pull Request #3770 · NethermindEth/juno

Ehsan-saradar · 2026-06-28T07:02:47Z

Closes #3685.

StartService deferred cancel() on every service return, so a service that finished its work and returned nil would tear the whole node down. That forced short-lived services to artificially block on <-ctx.Done() just to stay alive.

Now the node is only shut down when a service returns an error or panics. A clean return lets the service exit on its own; if it happens before shutdown was requested (unexpected for a long-running service) we log a warning instead of stopping the node.

Added tests covering the three cases: error, panic, and clean exit.

StartService deferred cancel() on every service return, so a service that finished its work and returned nil would shut the whole node down. That forced short-lived services to artificially block on <-ctx.Done() just to stay alive. Only tear the node down when a service returns an error or panics. A clean return now lets the service exit on its own; if it returns before shutdown was requested (unexpected for a long-running service) we log a warning. Closes NethermindEth#3685

Copilot

Pull request overview

This PR changes node service lifecycle handling so a service that returns nil no longer cancels the node-wide context (preventing unintended full node shutdown). Instead, the node is only canceled on service error or panic; clean early exits are logged as warnings.

Changes:

Update Node.StartService to cancel the node context only on service error or panic (and re-panic to propagate panics).
Add logging for clean service exits (warn if the node wasn’t already shutting down; debug otherwise).
Add unit tests covering error, panic, and clean-exit behaviors.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`node/node.go`	Adjusts `StartService` cancellation/panic behavior and adds clean-exit logging.
`node/start_service_test.go`	Adds tests validating shutdown behavior for error, panic, and clean service exits.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Ehsan-saradar · 2026-06-28T07:15:32Z

+		// re-panic so the WaitGroup can propagate it to the caller.
+		defer func() {
+			if r := recover(); r != nil {
+				n.logger.Error("Service panicked, shutting down node", zap.String("name", name), zap.Any("panic", r))


Good catch. The stack is now captured with debug.Stack() inside the recover, before the re-panic — at that point gopanic is still running our deferred func on top of the original panicking frames, so it points at the real panic site rather than the wg.Wait() re-panic. Added as a stack field on the log entry in 242f665.

Ehsan-saradar · 2026-06-28T07:15:34Z

+		// The service returned without an error. Let it exit on its own without
+		// taking the rest of the node down. A clean return before shutdown was
+		// requested is unexpected for a long-running service, so flag it.
+		if ctx.Err() == nil {
+			n.logger.Warn("Service stopped before node shutdown was requested", zap.String("name", name))


Agreed. Updated the service.Service doc to drop "Services should be blocking" and spell out the contract instead: a long-running service blocks until ctx is cancelled, a short-lived service may return nil once its work is done without shutting the node down, and an error/panic triggers a node-wide shutdown. Done in 242f665.

- Capture debug.Stack() at the recover site so the panic log points at the original panic location instead of the WaitGroup re-panic. - Update the service.Service doc to reflect that a short-lived service may return nil cleanly without shutting the node down.

rodrodros

Thanks for your contribution

rodrodros · 2026-06-29T09:41:32Z

@@ -0,0 +1,97 @@
+package node


Please use package node_test. It is important test node behaviour. The _test suffix forces you to only have acess to the public API when writing tests.

Note that this doesn't mean you should change the public / private API and keep the tests the same but you need to update the tests so they are relevant but only through the use of public API.

I believe after changing the package to _test the modifications to the test file should be minimal.

Done — moved to package node_test. Used a small export_test.go seam (NewWithLogger) so the external test can build a logger-only Node without widening the public API.

rodrodros · 2026-06-29T09:53:41Z

+// TestStartService covers the three ways a service's Run can return and the
+// expected effect on the node-wide context:
+//   - error  -> shut the node down
+//   - panic  -> shut the node down and propagate the panic
+//   - nil    -> let the service exit without taking the node down
+func TestStartService(t *testing.T) {
+	newNode := func() *Node { return &Node{logger: log.NewNopZapLogger()} }
+
+	t.Run("error shuts the node down", func(t *testing.T) {
+		n := newNode()
+		wg := conc.NewWaitGroup()
+		ctx, cancel := context.WithCancel(t.Context())
+		defer cancel()
+
+		failing := &fakeService{run: func(context.Context) error {
+			return errors.New("boom")
+		}}
+		n.StartService(wg, ctx, cancel, failing)
+		wg.Wait()
+
+		require.Error(t, ctx.Err(), "a service error should shut the node down")
+	})


This tests interact with the services but they don't check anything, just that they node stopped.

Please use the zap.Observer utility and then check that the right final error logs where logged, otherwise you've no way of checking node behaviour.

Done — now asserting on the emitted logs via zaptest/observer: Service error on the error path, and the early-exit warning on a clean return.

codecov · 2026-06-29T10:21:47Z

Codecov Report

❌ Patch coverage is 66.66667% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.23%. Comparing base (62d2222) to head (39f6e6e).
⚠️ Report is 16 commits behind head on main.

Files with missing lines	Patch %	Lines
node/node.go	66.66%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3770      +/-   ##
==========================================
+ Coverage   75.07%   75.23%   +0.16%     
==========================================
  Files         427      433       +6     
  Lines       38532    38675     +143     
==========================================
+ Hits        28928    29099     +171     
+ Misses       7608     7574      -34     
- Partials     1996     2002       +6

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Ehsan-saradar · 2026-06-30T08:52:03Z

+				n.logger.Error("Service panicked, shutting down node",
+					zap.String("name", name), zap.Any("panic", r), zap.ByteString("stack", debug.Stack()))


This is moot now — the panic log was dropped entirely in a later commit (we agreed it shouldn't ever fire), so there's no more debug.Stack() to encode. The recover path just cancels and re-panics, and the WaitGroup surfaces the original stack to the caller.

Drop the panic log and manual debug.Stack() capture: the WaitGroup re-panics with the original stack, so logging it here is redundant. Trim the "shutting down node" suffix from the error log (the node already announces its own shutdown), and shorten the Service.Run contract doc.

Move the test to package node_test and assert the emitted logs with zaptest/observer instead of only checking the cancelled context. Add an export_test.go seam so the external test can build a logger-only Node without widening the public API.

rodrodros · 2026-06-29T15:21:38Z

I don't understand why do you need this?

rodrodros · 2026-06-29T15:22:45Z

+	// messages StartService emits.
+	newNode := func() (*node.Node, *observer.ObservedLogs) {
+		core, logs := observer.New(zapcore.DebugLevel)
+		return node.NewWithLogger(log.NewZapLoggerWithCore(core)), logs


Cannot you initialize the log here directly with the right logger?

Yeah, the catch is logger is unexported, so from package node_test I can't set it on the struct directly. The only public way to get a Node with a logger is New(), but that spins up the DB, VM, mempool and all the services — way too heavy for a lifecycle test.

That's all NewWithLogger is for — it lives in export_test.go, so it's compiled only during tests and never becomes part of the package's actual API. It's the standard Go trick for testing from an external _test package while reaching one internal bit, and it's what keeps the test in node_test without changing the public/private API like you asked.

Happy to go the New() route instead if you'd rather, but it'd pull a lot of unrelated setup into the test. Let me know what you prefer.

Yes no NewWithLogger please, use the available public API to do your tests

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

 	wg.Go(func() {
-		// Immediately acknowledge panicing services by shutting down the node
-		// Without the deffered cancel(), we would have to wait for user to hit Ctrl+C
-		defer cancel()
+		name := reflect.TypeOf(s).String()
+


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Ehsan-saradar <ehsan.saradar@gmail.com>

Copilot AI review requested due to automatic review settings June 28, 2026 07:02

Copilot started reviewing on behalf of Ehsan-saradar June 28, 2026 07:03 View session

Copilot AI reviewed Jun 28, 2026

View reviewed changes

rodrodros reviewed Jun 29, 2026

View reviewed changes

Ehsan-saradar requested review from Copilot and rodrodros June 29, 2026 10:14

Copilot started reviewing on behalf of Ehsan-saradar June 29, 2026 10:19 View session

Ehsan-saradar marked this pull request as draft June 29, 2026 10:22

Copilot AI reviewed Jun 29, 2026

View reviewed changes

Ehsan-saradar added 2 commits June 29, 2026 11:24

Ehsan-saradar marked this pull request as ready for review June 29, 2026 12:50

rodrodros reviewed Jun 29, 2026

View reviewed changes

Ehsan-saradar added 2 commits June 30, 2026 08:48

test(node): inline single-use earlyExitMsg constant

c00be34

test(node): wrap require.Equal to satisfy lll diff lint

39f6e6e

Ehsan-saradar requested review from Copilot and rodrodros June 30, 2026 09:12

Copilot started reviewing on behalf of Ehsan-saradar June 30, 2026 09:12 View session

Copilot AI reviewed Jun 30, 2026

View reviewed changes

Potential fix for pull request finding

fdce2ad

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Ehsan-saradar <ehsan.saradar@gmail.com>

		n.logger.Error("Service panicked, shutting down node",
		zap.String("name", name), zap.Any("panic", r), zap.ByteString("stack", debug.Stack()))

Uh oh!

Conversation

Ehsan-saradar commented Jun 28, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rodrodros left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented Jun 29, 2026 •

edited

Loading