Skip to content

fix: wait for log production goroutine to drain on stop#3660

Merged
mdelapenya merged 4 commits into
testcontainers:mainfrom
mdelapenya:fix-log-drain-on-terminate
Apr 24, 2026
Merged

fix: wait for log production goroutine to drain on stop#3660
mdelapenya merged 4 commits into
testcontainers:mainfrom
mdelapenya:fix-log-drain-on-terminate

Conversation

@mdelapenya

Copy link
Copy Markdown
Member

What does this PR do?

Adds a logProductionDone channel that the goroutine closes on exit, and waits on it (with a bounded timeout) in stopLogProduction() before returning.

Why is it important?

stopLogProduction() was cancelling the context and returning immediately, creating a race where in-flight log data could be dropped when Terminate() fired during active log streaming. Confirmed data loss of up to 770/1000 lines in reproduction tests.

Related issues

Fixes #2887

stopLogProduction() was cancelling the context and returning
immediately, creating a race where in-flight log data could be
dropped when Terminate() fired during active log streaming.
Confirmed data loss of up to 77% of lines in reproduction tests.

Add a logProductionDone channel (closed by the goroutine on exit)
and wait for it in stopLogProduction() before returning. For
containers that have already exited, the goroutine finishes
naturally once stdcopy.StdCopy reaches EOF, so all buffered logs
are delivered. A 5-second bounded timeout handles the case where
the container is still actively streaming (e.g. Stop() called on
a running container), after which we force-cancel and wait for
the goroutine to acknowledge.

Adds TestContainerLogDrainOnTerminate to demonstrate the fix:
it runs 5 parallel sub-tests, each starting an alpine container
that emits 1000 log lines and exits, then calls Terminate() and
asserts all 1000 lines were received.

Fixes testcontainers#2887

Co-Authored-By: Claude <noreply@anthropic.com>
@mdelapenya mdelapenya requested a review from a team as a code owner April 24, 2026 16:58
@netlify

netlify Bot commented Apr 24, 2026

Copy link
Copy Markdown

Deploy Preview for testcontainers-go ready!

Name Link
🔨 Latest commit 0a1e14b
🔍 Latest deploy log https://app.netlify.com/projects/testcontainers-go/deploys/69eba5aad706e80008ba1950
😎 Deploy Preview https://deploy-preview-3660--testcontainers-go.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3d184a41-82ae-4009-ac81-1513791ce931

📥 Commits

Reviewing files that changed from the base of the PR and between 2467f2c and 0a1e14b.

📒 Files selected for processing (1)
  • docker.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker.go

Summary by CodeRabbit

  • Bug Fixes

    • Improved logging reliability when stopping containers by adding a coordinated, timeout‑bounded drain and acknowledgement flow so log producers are given a chance to finish and shutdown is detected reliably, reducing the risk of dropped logs during termination.
  • Tests

    • Added a regression test that runs short‑lived containers emitting bursts of logs, terminates them after exit, and verifies all expected log lines are captured.

Walkthrough

DockerContainer's log shutdown is coordinated via a new logProductionDone channel: startLogProduction creates and closes it when the log goroutine exits. stopLogProduction first waits briefly for natural completion, then cancels log production and waits again (with time bounds) for the goroutine to acknowledge shutdown. A regression test asserts no logs are lost on container exit.

Changes

Cohort / File(s) Summary
Log production shutdown
docker.go
Introduce logProductionDone channel; startLogProduction creates and closes it when the log goroutine exits. stopLogProduction uses a two-phase shutdown: short wait for natural completion, call logProductionCancel(errLogProductionStop) if needed, then wait again (bounded) for goroutine acknowledgement and return based on context.Cause(c.logProductionCtx).
Regression test / test utility
logconsumer_test.go
Add a test-only atomic log consumer and TestContainerLogDrainOnTerminate, which starts short-lived containers emitting bursts of lines, terminates after exit, and asserts all emitted logs were received (detects dropped logs).

Sequence Diagram(s)

sequenceDiagram
    participant Controller as Controller
    participant Container as DockerContainer
    participant Producer as LogGoroutine
    participant Ctx as LogContext

    Controller->>Container: startLogProduction()
    Container->>Producer: spawn log producer (creates `logProductionDone`)
    Producer-->>Container: forward logs
    Note over Container,Producer: container exits or producer drains logs
    Controller->>Container: stopLogProduction()
    Container->>Producer: wait up to short timeout for `logProductionDone` close
    alt not closed
        Container->>Ctx: call logProductionCancel(errLogProductionStop)
        Container->>Producer: wait up to bounded timeout for `logProductionDone` close
    end
    Producer-->>Container: close `logProductionDone` (ack)
    Container-->>Controller: stopLogProduction returns (uses `context.Cause` of log ctx)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibbled lines from dusk till dawn,
I watched the goroutine hum and yawn.
I waited soft, then gave a chime,
The channel closed — not one lost line.
Hooray, the logs stayed safe this time! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the primary change: waiting for the log production goroutine to drain on stop.
Description check ✅ Passed The description comprehensively explains what the PR does, why it matters, and references the related issue, all directly related to the changeset.
Linked Issues check ✅ Passed The PR successfully addresses issue #2887 by implementing a mechanism to wait for log production goroutine to drain before returning, restoring container log output that was lost.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the log drain issue: modifications to log shutdown coordination in docker.go and a test verifying log drain behavior in logconsumer_test.go.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2427ce97de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread docker.go Outdated
Comment on lines +931 to +932
if c.logProductionDone != nil {
<-c.logProductionDone

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Bound the post-cancel wait for log producer shutdown

After the 5s pre-cancel timeout expires, stopLogProduction() performs an unbounded receive on logProductionDone. If the log reader goroutine fails to exit after cancellation (for example, a stuck Docker log stream that does not unblock promptly on context cancel), Stop/Terminate can hang indefinitely during cleanup. This commit introduced the blocking wait, so cleanup paths that previously returned can now deadlock under transport/daemon failure conditions.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
logconsumer_test.go (2)

741-744: Consider registering CleanupContainer for failure-path robustness.

The test relies solely on an explicit TerminateContainer(ctr) call. If Run succeeds but the explicit terminate fails (err2 != nil), the container will leak because no deferred cleanup is registered — this differs from every other test in this file which uses CleanupContainer(t, c). Registering a cleanup in addition to the explicit terminate is idempotent and keeps the test safe against failures on the terminate path itself.

🧹 Proposed change
 			ctr, err := Run(
 				ctx,
 				"alpine:latest",
 				// ...
 				WithWaitStrategy(wait.ForExit()),
 			)
+			CleanupContainer(t, ctr)
+			require.NoError(t, err)
+
 			// Terminate after the container has already exited so the race
 			// window is as narrow as possible.
-			err2 := TerminateContainer(ctr)
-			require.NoError(t, err)
-			require.NoError(t, err2)
+			require.NoError(t, TerminateContainer(ctr))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logconsumer_test.go` around lines 741 - 744, The test currently calls
TerminateContainer(ctr) and checks its error but doesn't register deferred
cleanup; add a call to CleanupContainer(t, ctr) (e.g., immediately after
container creation or before invoking TerminateContainer) so the test registers
idempotent cleanup even if TerminateContainer fails, while retaining the
explicit TerminateContainer(ctr) call and require.NoError assertions to keep the
existing success-path behavior.

719-721: Remove redundant loop variable shadowing in Go 1.22+

With Go 1.22's integer range syntax (for run := range runs), the loop variable is already scoped per iteration. The redundant run := run assignment can be safely removed.

Suggested change
 	for run := range runs {
-		run := run
 		t.Run(fmt.Sprintf("run-%d", run), func(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@logconsumer_test.go` around lines 719 - 721, The loop contains an unnecessary
shadowing assignment `run := run` inside the range loop; remove that redundant
line so the per-iteration `run` from `for run := range runs` is used directly in
the closure passed to t.Run (the anonymous func starting with func(t
*testing.T)), avoiding the redundant variable shadowing.
docker.go (1)

911-933: Consider extracting the 5-second timeout and bounding the post-cancel wait.

Two suggestions on the new shutdown coordination:

  1. The hardcoded 5 * time.Second on line 922 is a magic number. Since it happens to equal minLogProductionTimeout, promoting it to a named constant (e.g., logProductionDrainTimeout) would make intent explicit and make the value easier to tune/test.

  2. The second wait on line 932 is unbounded. In practice the goroutine should exit quickly after logProductionCancel is invoked (the HTTP read in copyLogs respects context cancellation, and copyLogsTimeout returns false on c.logProductionCtx.Err() != nil). However, if the Docker client ever fails to honor cancellation (stuck connection, buggy transport), stopLogProduction — and therefore Stop/Terminate — would hang indefinitely. Bounding this wait with a second timeout would make the shutdown path strictly bounded, which is valuable for the PostStop hook path.

🛡️ Proposed change
+const logProductionDrainTimeout = 5 * time.Second
+
 // stopLogProduction will stop the concurrent process that is reading logs
 // and sending them to each added LogConsumer
 func (c *DockerContainer) stopLogProduction() error {
 	if c.logProductionCancel == nil {
 		return nil
 	}

 	// Wait for the log production goroutine to finish draining any buffered
 	// logs before cancelling. When the container has already exited, the
 	// goroutine will reach EOF naturally and close logProductionDone on its
 	// own. The bounded timeout prevents blocking indefinitely when the
 	// container is still actively streaming (e.g. Stop() called on a running
 	// container).
 	if c.logProductionDone != nil {
 		select {
 		case <-c.logProductionDone:
 			// Goroutine already finished naturally; nothing more to do.
 			return nil
-		case <-time.After(5 * time.Second):
+		case <-time.After(logProductionDrainTimeout):
 			// Timed out waiting for natural exit; force-cancel now.
 		}
 	}

 	// Signal the log production to stop (for still-running containers).
 	c.logProductionCancel(errLogProductionStop)

 	// Wait for the goroutine to acknowledge the cancellation.
 	if c.logProductionDone != nil {
-		<-c.logProductionDone
+		select {
+		case <-c.logProductionDone:
+		case <-time.After(logProductionDrainTimeout):
+			// Goroutine failed to acknowledge cancellation in time;
+			// give up rather than hang the caller (e.g. Terminate()).
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker.go` around lines 911 - 933, Extract the hardcoded 5*time.Second into a
named constant (e.g., logProductionDrainTimeout) instead of the magic number and
use it where the current select waits for c.logProductionDone; then make the
post-cancel receive on c.logProductionDone bounded as well by waiting with a
timeout (e.g., logProductionCancelAckTimeout or reuse minLogProductionTimeout)
so that after calling c.logProductionCancel(errLogProductionStop) the code does
not block indefinitely; update references around c.logProductionDone,
c.logProductionCancel, errLogProductionStop, and any tests or callers of
stopLogProduction/Stop/Terminate/PostStop to reflect the new timeout
constant(s).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docker.go`:
- Around line 911-933: Extract the hardcoded 5*time.Second into a named constant
(e.g., logProductionDrainTimeout) instead of the magic number and use it where
the current select waits for c.logProductionDone; then make the post-cancel
receive on c.logProductionDone bounded as well by waiting with a timeout (e.g.,
logProductionCancelAckTimeout or reuse minLogProductionTimeout) so that after
calling c.logProductionCancel(errLogProductionStop) the code does not block
indefinitely; update references around c.logProductionDone,
c.logProductionCancel, errLogProductionStop, and any tests or callers of
stopLogProduction/Stop/Terminate/PostStop to reflect the new timeout
constant(s).

In `@logconsumer_test.go`:
- Around line 741-744: The test currently calls TerminateContainer(ctr) and
checks its error but doesn't register deferred cleanup; add a call to
CleanupContainer(t, ctr) (e.g., immediately after container creation or before
invoking TerminateContainer) so the test registers idempotent cleanup even if
TerminateContainer fails, while retaining the explicit TerminateContainer(ctr)
call and require.NoError assertions to keep the existing success-path behavior.
- Around line 719-721: The loop contains an unnecessary shadowing assignment
`run := run` inside the range loop; remove that redundant line so the
per-iteration `run` from `for run := range runs` is used directly in the closure
passed to t.Run (the anonymous func starting with func(t *testing.T)), avoiding
the redundant variable shadowing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 36a77d39-2274-408d-868c-972a9c7b2912

📥 Commits

Reviewing files that changed from the base of the PR and between b8fc680 and 2427ce9.

📒 Files selected for processing (2)
  • docker.go
  • logconsumer_test.go

The post-cancel <-logProductionDone receive was unbounded. While
context cancellation normally unblocks stdcopy.StdCopy promptly
(the timeoutCtx derived from logProductionCtx is cancelled too),
a stuck kernel socket read or Docker daemon transport failure
could cause the goroutine to hang indefinitely, blocking
Stop/Terminate forever.

Add a select with time.After(minLogProductionTimeout) on the
post-cancel wait, matching the existing pre-cancel timeout.

Co-Authored-By: Claude <noreply@anthropic.com>
@mdelapenya

Copy link
Copy Markdown
Member Author

Re: the P1 review comment on stopLogProduction (unbounded post-cancel wait):

Valid concern — fixed in 56f30d5.

Context cancellation propagates into the Docker transport and normally unblocks stdcopy.StdCopy within milliseconds (timeoutCtx passed to ContainerLogs is a child of logProductionCtx, so it is cancelled immediately). However, a stuck kernel socket read or daemon transport failure could prevent unblocking, leaving the wait truly unbounded.

Added a select with time.After(minLogProductionTimeout) (5 s) on the post-cancel receive, matching the pre-cancel timeout. Both waits are now bounded at 5 s each, so stopLogProduction can block at most ~10 s total under worst-case transport failure.

- Replace raw 5*time.Second literal in stopLogProduction pre-cancel
  wait with minLogProductionTimeout for consistency; both drain
  timeouts now share the same named constant.
- Remove redundant 'run := run' loop-variable shadow (unnecessary
  since Go 1.22 integer range captures loop vars by value).
- Add CleanupContainer(t, ctr) after Run() succeeds so containers
  are always reclaimed even if a subsequent assertion panics.

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
docker.go (2)

922-922: Use minLogProductionTimeout instead of a hardcoded 5 * time.Second.

The commit message states this pre-cancel timeout is meant to match the existing (post-cancel) one, which is minLogProductionTimeout. Duplicating the literal lets the two drift silently if the constant is ever tuned.

♻️ Proposed change
-		case <-time.After(5 * time.Second):
+		case <-time.After(minLogProductionTimeout):
 			// Timed out waiting for natural exit; force-cancel now.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker.go` at line 922, Replace the hardcoded pre-cancel timeout
`time.After(5 * time.Second)` with the existing constant
`minLogProductionTimeout` so it matches the post-cancel timeout; locate the
select/case that currently uses `time.After(5 * time.Second)` and change it to
use `time.After(minLogProductionTimeout)` (ensuring the
`minLogProductionTimeout` symbol is in scope/imports where the select is in the
same package).

935-940: Post-cancel wait looks correct; consider logging on timeout.

If the goroutine fails to acknowledge within minLogProductionTimeout (stuck kernel read / daemon transport hang that the PR description calls out), the function silently moves on to the context.Cause check and returns nil via the errLogProductionStop case. A leaked goroutine in that scenario would be hard to diagnose without at least a debug log line. Optional, but worth considering.

💡 Optional
 	if c.logProductionDone != nil {
 		select {
 		case <-c.logProductionDone:
 		case <-time.After(minLogProductionTimeout):
+			c.logger.Printf("log production goroutine did not exit within %s; leaking", minLogProductionTimeout)
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker.go` around lines 935 - 940, The post-cancel wait on
c.logProductionDone using time.After(minLogProductionTimeout) can silently time
out and hide a leaked goroutine; update the select in the function that reads
c.logProductionDone so that the timeout case logs a debug/warn message (using
the same logger the surrounding code uses) indicating the log production
goroutine did not acknowledge within minLogProductionTimeout and include context
like errLogProductionStop or context.Cause as available; keep the current
behavior of proceeding after the timeout but emit the log so a stuck kernel
read/daemon transport hang is detectable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docker.go`:
- Line 922: Replace the hardcoded pre-cancel timeout `time.After(5 *
time.Second)` with the existing constant `minLogProductionTimeout` so it matches
the post-cancel timeout; locate the select/case that currently uses
`time.After(5 * time.Second)` and change it to use
`time.After(minLogProductionTimeout)` (ensuring the `minLogProductionTimeout`
symbol is in scope/imports where the select is in the same package).
- Around line 935-940: The post-cancel wait on c.logProductionDone using
time.After(minLogProductionTimeout) can silently time out and hide a leaked
goroutine; update the select in the function that reads c.logProductionDone so
that the timeout case logs a debug/warn message (using the same logger the
surrounding code uses) indicating the log production goroutine did not
acknowledge within minLogProductionTimeout and include context like
errLogProductionStop or context.Cause as available; keep the current behavior of
proceeding after the timeout but emit the log so a stuck kernel read/daemon
transport hang is detectable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: efaedbac-76bd-400c-ae94-754187f76b34

📥 Commits

Reviewing files that changed from the base of the PR and between 2427ce9 and 56f30d5.

📒 Files selected for processing (1)
  • docker.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
docker.go (1)

911-940: Pre-cancel 5 s wait is wasted when the container is still actively streaming.

For the Terminate path this is fine because Stop() has already run and the goroutine reaches EOF promptly, so logProductionDone closes within milliseconds and the first select returns early. However, for direct callers of the (deprecated) StopLogProducer() on a still-running container, the goroutine will never exit naturally and every call will block for the full minLogProductionTimeout (5 s) before the cancel fires — a regression versus the previous immediate-cancel behavior.

If you want to preserve the old fast path for that case, you can flip the order: cancel first, then wait bounded for the goroutine to drain whatever it already had buffered. In practice the in-flight stdcopy.StdCopy call still flushes what it has before honoring the cancellation, so you don't actually need the pre-cancel wait to avoid dropping logs — the post-cancel drain is what guarantees delivery.

♻️ Possible simplification
-	// Wait for the log production goroutine to finish draining any buffered
-	// logs before cancelling. When the container has already exited, the
-	// goroutine will reach EOF naturally and close logProductionDone on its
-	// own. The bounded timeout prevents blocking indefinitely when the
-	// container is still actively streaming (e.g. Stop() called on a running
-	// container).
-	if c.logProductionDone != nil {
-		select {
-		case <-c.logProductionDone:
-			// Goroutine already finished naturally; nothing more to do.
-			return nil
-		case <-time.After(minLogProductionTimeout):
-			// Timed out waiting for natural exit; force-cancel now.
-		}
-	}
-
-	// Signal the log production to stop (for still-running containers).
+	// Give the goroutine a brief window to finish naturally (container
+	// already exited) before we force-cancel, so buffered logs are drained.
+	if c.logProductionDone != nil {
+		select {
+		case <-c.logProductionDone:
+			return nil
+		default:
+		}
+	}
+
+	// Signal the log production to stop.
 	c.logProductionCancel(errLogProductionStop)

This relies on the post-cancel drain wait (lines 935–940) to let stdcopy.StdCopy flush what it already has. Worst-case total becomes ~5 s instead of ~10 s.

That said, if the intent is specifically to give slow daemons time to flush buffered log frames through the transport before cancellation cuts them off, the current shape is defensible — in which case consider adding a brief note to the doc comment explaining the tradeoff.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker.go` around lines 911 - 940, The current logic waits up to
minLogProductionTimeout on c.logProductionDone before calling
c.logProductionCancel, which causes StopLogProducer() on a running container to
always wait the full timeout; flip the order so you call
c.logProductionCancel(errLogProductionStop) first and then wait (bounded by
minLogProductionTimeout) on c.logProductionDone to preserve the previous
immediate-cancel behavior while still allowing the post-cancel drain via
stdcopy.StdCopy to flush buffered logs; update the code path that references
c.logProductionDone and c.logProductionCancel (and the StopLogProducer/Terminate
callers) accordingly and add a short comment clarifying the intent of the
post-cancel drain wait if desired.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docker.go`:
- Around line 911-940: The current logic waits up to minLogProductionTimeout on
c.logProductionDone before calling c.logProductionCancel, which causes
StopLogProducer() on a running container to always wait the full timeout; flip
the order so you call c.logProductionCancel(errLogProductionStop) first and then
wait (bounded by minLogProductionTimeout) on c.logProductionDone to preserve the
previous immediate-cancel behavior while still allowing the post-cancel drain
via stdcopy.StdCopy to flush buffered logs; update the code path that references
c.logProductionDone and c.logProductionCancel (and the StopLogProducer/Terminate
callers) accordingly and add a short comment clarifying the intent of the
post-cancel drain wait if desired.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 784863a9-51cd-4588-a34d-88c7b7191f6a

📥 Commits

Reviewing files that changed from the base of the PR and between 56f30d5 and 2467f2c.

📒 Files selected for processing (2)
  • docker.go
  • logconsumer_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • logconsumer_test.go

If stdcopy.StdCopy fails to unblock within minLogProductionTimeout
after context cancellation (stuck kernel socket / daemon hang),
stopLogProduction now logs a warning via c.logger.Printf so leaked
goroutines are diagnosable rather than silently ignored.

Co-Authored-By: Claude <noreply@anthropic.com>
@mdelapenya

Copy link
Copy Markdown
Member Author

Re: CodeRabbit review comment on docker.go:911-940 (pre-cancel 5 s wait regression):

The claim that "cancel first, then wait for drain" is equally safe is incorrect. Tested empirically with cancel-first: 4 out of 10 runs dropped logs (up to 850/1000 lines dropped).

Root cause is in stdcopy.StdCopy itself. When the context is cancelled, the Docker HTTP response body's Read() returns immediately with an error — StdCopy propagates that error and discards the remainder of the stream. It does not flush buffered data first. From the source:

nr2, err = multiplexedSource.Read(buf[nr:])
// ...
if err != nil {
    return 0, err  // drops everything, no flush
}

The pre-cancel wait is load-bearing for correctness on the Terminate() path. The reviewer's StopLogProducer() regression concern is valid (5 s extra latency on still-running containers), but that method is already deprecated and the tradeoff favours zero data loss. Keeping the current wait-before-cancel approach.

@mdelapenya mdelapenya merged commit 74c41a3 into testcontainers:main Apr 24, 2026
222 checks passed
@mdelapenya mdelapenya deleted the fix-log-drain-on-terminate branch April 24, 2026 20:52
@mdelapenya mdelapenya added the bug An issue with the library label May 11, 2026
hbjydev pushed a commit to hbjydev/containers that referenced this pull request Jun 25, 2026
… (v0.42.0 → v0.43.0) (#9)

This PR contains the following updates:

| Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [github.com/testcontainers/testcontainers-go](https://github.com/testcontainers/testcontainers-go) | `v0.42.0` → `v0.43.0` | ![age](https://developer.mend.io/api/mc/badges/age/go/github.com%2ftestcontainers%2ftestcontainers-go/v0.43.0?slim=true) | ![confidence](https://developer.mend.io/api/mc/badges/confidence/go/github.com%2ftestcontainers%2ftestcontainers-go/v0.42.0/v0.43.0?slim=true) |

---

### Release Notes

<details>
<summary>testcontainers/testcontainers-go (github.com/testcontainers/testcontainers-go)</summary>

### [`v0.43.0`](https://github.com/testcontainers/testcontainers-go/releases/tag/v0.43.0)

[Compare Source](testcontainers/testcontainers-go@v0.42.0...v0.43.0)

### What's Changed

#### ⚠️ Breaking Changes

- chore(wait)!: change url callback in wait.ForSQL to accept network.Port ([#&#8203;3650](testcontainers/testcontainers-go#3650)) [@&#8203;thaJeztah](https://github.com/thaJeztah)

> Users of `wait.ForSQL` need to follow the new API contract, using Moby's `network.Port` instead of `string` when building the callback function to check the URL. Please see <https://golang.testcontainers.org/features/wait/sql/>

- feat!: add PullImageWithPlatform to DockerProvider ([#&#8203;3710](testcontainers/testcontainers-go#3710)) [@&#8203;blueprismo](https://github.com/blueprismo)

> Users implementing their own `testcontainers.ImageProvider` need to implement the new `PullImageWithPlatform` method introduced by this PR.

#### 🚀 Features

- feat(k3s):  pull image opts ([#&#8203;3716](testcontainers/testcontainers-go#3716)) [@&#8203;blueprismo](https://github.com/blueprismo)
- feat(wait): implement AnyMultiStrategy: ForAny equivalent to ForAll. ([#&#8203;3719](testcontainers/testcontainers-go#3719)) [@&#8203;jeanbza](https://github.com/jeanbza)
- feat(eventhubs): add WithAzuriteContainer and functional-options config builder ([#&#8203;3722](testcontainers/testcontainers-go#3722)) [@&#8203;mdelapenya](https://github.com/mdelapenya)
- feat!: add PullImageWithPlatform to DockerProvider ([#&#8203;3710](testcontainers/testcontainers-go#3710)) [@&#8203;blueprismo](https://github.com/blueprismo)
- feat(modules/dex): add Dex OIDC provider module ([#&#8203;3659](testcontainers/testcontainers-go#3659)) [@&#8203;guilycst](https://github.com/guilycst)

#### 🐛 Bug Fixes

- fix(security): remove debug code that leaks Docker credentials ([#&#8203;3721](testcontainers/testcontainers-go#3721)) [@&#8203;mdelapenya](https://github.com/mdelapenya)
- fix(ollama): align local exec test with Ollama 0.30.6 log format ([#&#8203;3715](testcontainers/testcontainers-go#3715)) [@&#8203;mdelapenya](https://github.com/mdelapenya)
- fix: close temp file handle before removal  ([#&#8203;3672](testcontainers/testcontainers-go#3672)) [@&#8203;acouvreur](https://github.com/acouvreur)
- fix(compose): close docker clients to prevent goroutine leaks ([#&#8203;3661](testcontainers/testcontainers-go#3661)) [@&#8203;mdelapenya](https://github.com/mdelapenya)
- fix: wait for log production goroutine to drain on stop ([#&#8203;3660](testcontainers/testcontainers-go#3660)) [@&#8203;mdelapenya](https://github.com/mdelapenya)

#### 📖 Documentation

- chore: update usage metrics (2026-06) ([#&#8203;3714](testcontainers/testcontainers-go#3714)) @&#8203;[github-actions\[bot\]](https://github.com/apps/github-actions)

#### 🧹 Housekeeping

- chore(wait)!: change url callback in wait.ForSQL to accept network.Port ([#&#8203;3650](testcontainers/testcontainers-go#3650)) [@&#8203;thaJeztah](https://github.com/thaJeztah)
- chore: update usage metrics (2026-05) ([#&#8203;3670](testcontainers/testcontainers-go#3670)) @&#8203;[github-actions\[bot\]](https://github.com/apps/github-actions)
- chore: remove cgroupnsMode setting from K3s container configuration ([#&#8203;3653](testcontainers/testcontainers-go#3653)) [@&#8203;lixin9311](https://github.com/lixin9311)

#### 📦 Dependency updates

- chore(deps): update dependencies to latest versions in go.mod and go.sum ([#&#8203;3729](testcontainers/testcontainers-go#3729)) [@&#8203;Steven-Harris](https://github.com/Steven-Harris)
- chore: bump sshd-docker image to 1.4.0 ([#&#8203;3727](testcontainers/testcontainers-go#3727)) [@&#8203;mdelapenya](https://github.com/mdelapenya)
- chore(deps): bump Ryuk to v0.14.0 ([#&#8203;3313](testcontainers/testcontainers-go#3313)) [@&#8203;mdelapenya](https://github.com/mdelapenya)
- chore(deps): bump github.com/shirou/gopsutil/v4 from 4.26.4 to 4.26.5 ([#&#8203;3713](testcontainers/testcontainers-go#3713)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump golang.org/x/sys from 0.44.0 to 0.45.0 ([#&#8203;3712](testcontainers/testcontainers-go#3712)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump mkdocs-include-markdown-plugin from 7.2.2 to 7.3.0 ([#&#8203;3711](testcontainers/testcontainers-go#3711)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump slackapi/slack-github-action from 2.1.1 to 3.0.3 ([#&#8203;3677](testcontainers/testcontainers-go#3677)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump idna from 3.11 to 3.15 ([#&#8203;3708](testcontainers/testcontainers-go#3708)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump github.com/containerd/containerd/v2 from 2.2.2 to 2.2.4 in /modules/compose ([#&#8203;3709](testcontainers/testcontainers-go#3709)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump urllib3 from 2.6.3 to 2.7.0 ([#&#8203;3704](testcontainers/testcontainers-go#3704)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump github.com/shirou/gopsutil/v4 from 4.26.3 to 4.26.4 ([#&#8203;3667](testcontainers/testcontainers-go#3667)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump github.com/moby/moby/api from 1.54.1 to 1.54.2 ([#&#8203;3676](testcontainers/testcontainers-go#3676)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump golang.org/x/crypto from 0.48.0 to 0.51.0 ([#&#8203;3689](testcontainers/testcontainers-go#3689)) [@&#8203;mdelapenya](https://github.com/mdelapenya)
- chore(deps): bump google.golang.org/grpc from 1.79.3 to 1.81.0 in /modules/gcloud ([#&#8203;3690](testcontainers/testcontainers-go#3690)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump google.golang.org/grpc from 1.75.0 to 1.81.0 in /modules/dex ([#&#8203;3686](testcontainers/testcontainers-go#3686)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump github.com/in-toto/in-toto-golang from 0.10.0 to 0.11.0 in /modules/compose ([#&#8203;3674](testcontainers/testcontainers-go#3674)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump docker/setup-docker-action from 4.5.0 to 5.1.0 ([#&#8203;3664](testcontainers/testcontainers-go#3664)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump mkdocs-include-markdown-plugin from 7.2.1 to 7.2.2 ([#&#8203;3665](testcontainers/testcontainers-go#3665)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump github.com/apache/thrift from 0.21.0 to 0.23.0 in /modules/nebulagraph ([#&#8203;3673](testcontainers/testcontainers-go#3673)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump github.com/Azure/go-ntlmssp from 0.0.0-20221128193559-754e69321358 to 0.1.1 in /modules/openldap ([#&#8203;3658](testcontainers/testcontainers-go#3658)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump github.com/jackc/pgx/v5 from 5.9.0 to 5.9.2 in /modules/postgres ([#&#8203;3657](testcontainers/testcontainers-go#3657)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump github.com/jackc/pgx/v5 from 5.5.4 to 5.9.2 in /modules/cockroachdb ([#&#8203;3656](testcontainers/testcontainers-go#3656)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)
- chore(deps): bump github.com/jackc/pgx/v5 from 5.5.4 to 5.9.0 in /modules/postgres ([#&#8203;3652](testcontainers/testcontainers-go#3652)) @&#8203;[dependabot\[bot\]](https://github.com/apps/dependabot)

</details>

---

### Configuration

📅 **Schedule**: (in timezone Europe/London)

- Branch creation
  - At any time (no schedule defined)
- Automerge
  - At any time (no schedule defined)

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4yMzIuMSIsInVwZGF0ZWRJblZlciI6IjQzLjI0Mi4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJ0eXBlL21pbm9yIl19-->

Reviewed-on: https://forgejo.hayden.moe/hayden/containers/pulls/9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug An issue with the library

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Regression]: Missing container logs when container exited

1 participant