[BUG]: Memory leak; ServiceEntrySpanOperation.Finish() never calls dyngo.FinishOperation, GLS entry per request #4763

@Nakamurus

Description

Tracer Version(s)

2.8.1

Go Version(s)

go version go1.25.3 darwin/arm64

Bug Report

The Finish() method on ServiceEntrySpanOperation does not call dyngo.FinishOperation, so the operation is never popped from orchestrion's per-goroutine GLS context stack and never has its event listeners cleared.
For long-lived goroutines (HTTP keep-alive, gRPC streams) this causes per-request memory growth that no amount of span flushing can release.
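
The growth pattern can be illustrated with a toy model. This is only a sketch: `ctxStack`, `request`, and `serve` are hypothetical names invented for illustration and are not orchestrion's or dyngo's real types. A single stack stands in for one long-lived goroutine's GLS context stack, and each loop iteration stands in for one request.

```go
package main

import "fmt"

// ctxStack is a toy stand-in for a per-goroutine context stack.
// It is NOT orchestrion's real data structure (hypothetical).
type ctxStack struct {
	entries []any
}

// push models starting an operation.
func (s *ctxStack) push(v any) { s.entries = append(s.entries, v) }

// pop models finishing an operation.
func (s *ctxStack) pop() {
	if n := len(s.entries); n > 0 {
		s.entries[n-1] = nil // drop the reference so the GC can reclaim it
		s.entries = s.entries[:n-1]
	}
}

// request models the per-request state that a stuck entry keeps reachable.
type request struct {
	tags map[string]any
}

// serve handles n requests on one stack and returns the final depth.
// With finish == false it mimics the reported bug: every start pushes,
// but nothing ever pops.
func serve(n int, finish bool) int {
	var stack ctxStack
	for i := 0; i < n; i++ {
		stack.push(&request{tags: map[string]any{"id": i}})
		if finish {
			stack.pop()
		}
	}
	return len(stack.entries)
}

func main() {
	fmt.Println("depth without finish:", serve(1000, false)) // prints 1000
	fmt.Println("depth with finish:   ", serve(1000, true))  // prints 0
}
```

In the model, every entry left on the stack stays reachable for as long as the stack itself lives, which is why flushing spans elsewhere does not help.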

This patch adds a ServiceEntrySpanRes type and inserts a single dyngo.FinishOperation call so the operation properly winds down.
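
Why a new result type is needed at all can be sketched as follows, assuming (as the `IsResultOf` method in the patch suggests) that dyngo ties each result type to its operation type through a generic constraint. All names below (`operation`, `resultOf`, `finish`, `entryOp`, `entryRes`, `otherRes`) are hypothetical simplifications, not dyngo's actual API.

```go
package main

import "fmt"

// operation is a simplified stand-in for an operation interface (hypothetical).
type operation interface{ markFinished() }

// resultOf constrains a result type to one specific operation type.
type resultOf[O operation] interface{ IsResultOf(O) }

// finish only compiles when the result type declares itself a result of O.
func finish[O operation, R resultOf[O]](op O, _ R) { op.markFinished() }

// entryOp and entryRes mirror the operation/result pairing in the patch.
type entryOp struct{ finished bool }

func (o *entryOp) markFinished() { o.finished = true }

type entryRes struct{}

func (entryRes) IsResultOf(*entryOp) {}

// otherRes has no IsResultOf(*entryOp) method, so it cannot finish an entryOp.
type otherRes struct{}

func main() {
	op := &entryOp{}
	finish(op, entryRes{})
	fmt.Println("finished:", op.finished) // prints "finished: true"

	// finish(op, otherRes{}) // would not compile: otherRes does not satisfy resultOf[*entryOp]
}
```

Under that assumption, the empty `ServiceEntrySpanRes` struct carries no data; it exists only so the finish call type-checks for this operation.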

Patch

--- a/instrumentation/appsec/trace/service_entry_span.go
+++ b/instrumentation/appsec/trace/service_entry_span.go
@@ -25,6 +25,9 @@ type (
 	// ServiceEntrySpanArgs is the arguments for a ServiceEntrySpanOperation
 	ServiceEntrySpanArgs struct{}

+	// ServiceEntrySpanRes is the result of a ServiceEntrySpanOperation.
+	ServiceEntrySpanRes struct{}
+
 	// ServiceEntrySpanTag is a key value pair event that is used to tag a service entry span
 	ServiceEntrySpanTag struct {
 		Key   string
@@ -47,6 +50,7 @@ type (
 )

 func (ServiceEntrySpanArgs) IsArgOf(*ServiceEntrySpanOperation) {}
+func (ServiceEntrySpanRes) IsResultOf(*ServiceEntrySpanOperation) {}

 // SetTag adds the key/value pair to the tags to add to the service entry span
 func (op *ServiceEntrySpanOperation) SetTag(key string, value any) {
@@ -141,6 +145,9 @@ func StartServiceEntrySpanOperation(ctx context.Context, span TagSetter) (*Servi
 }

 func (op *ServiceEntrySpanOperation) Finish() {
+	// Pop the op from orchestrion's per-goroutine GLS context stack and clear its event listeners.
+	dyngo.FinishOperation(op, ServiceEntrySpanRes{})
+
 	span := op.tagSetter
 	if _, ok := span.(NoopTagSetter); ok { // If the span is a NoopTagSetter or is nil, we don't need to set any tags
 		return

Reproduction Code

The following tests fail without the patch above and pass after applying it. They check whether the GLS context stack still holds an entry after the operation finishes.

package trace_test

import (
	"context"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"

	"github.com/DataDog/dd-trace-go/v2/instrumentation/appsec/trace"
	"github.com/DataDog/dd-trace-go/v2/internal/orchestrion"
)

// TestServiceEntrySpanOperation_DoesNotLeakGLSEntry verifies that finishing a
// ServiceEntrySpanOperation pops its entry from orchestrion's per-goroutine
// GLS context stack. Without this, every HTTP/gRPC request handled by a
// long-lived goroutine (HTTP keep-alive, gRPC stream) leaves a stuck entry
// that retains the span, its tag maps, and any attached AppSec listeners.
func TestServiceEntrySpanOperation_DoesNotLeakGLSEntry(t *testing.T) {
	t.Cleanup(orchestrion.MockGLS())

	base := orchestrion.GLSStackDepth()

	op, _ := trace.StartServiceEntrySpanOperation(context.Background(), trace.NoopTagSetter{})
	require.NotNil(t, op)
	assert.Equal(t, base+1, orchestrion.GLSStackDepth(),
		"starting the operation must push one entry onto the GLS context stack")

	op.Finish()
	assert.Equal(t, base, orchestrion.GLSStackDepth(),
		"finishing the operation must pop its GLS entry (otherwise it leaks for the goroutine's lifetime)")
}

// TestServiceEntrySpanOperation_RepeatedRequestsDoNotGrowGLS simulates many
// successive requests on the same goroutine (the keep-alive case). With the
// leak present, the stack depth grows by 1 per iteration. With the fix, it
// stays at the baseline.
func TestServiceEntrySpanOperation_RepeatedRequestsDoNotGrowGLS(t *testing.T) {
	t.Cleanup(orchestrion.MockGLS())

	const iterations = 1000
	base := orchestrion.GLSStackDepth()

	for i := 0; i < iterations; i++ {
		op, _ := trace.StartServiceEntrySpanOperation(context.Background(), trace.NoopTagSetter{})
		op.Finish()
	}

	assert.Equal(t, base, orchestrion.GLSStackDepth(),
		"after %d start/finish cycles on the same goroutine, GLS depth must return to baseline", iterations)
}

Error Logs

No response

Go Env Output

No response

Labels

bug, needs-triage
