Tracing Context Corruption When (Nested Pipeline) Component Fails During Span Cleanup #1976

[Note: this whole thing is generated by cursor. I think it is accurate, but I apologize if there are some minor issues somewhere]

[Edit: This issue involves nested pipeline component failures]

## Summary

When a Haystack component contains its own internal `AsyncPipeline` (a nested pipeline) and that internal pipeline fails during execution, the tracing context can become corrupted. This occurs when a component such as `IntentClassifier` creates and runs its own sub-pipeline inside a larger parent pipeline. If the nested pipeline fails and a further exception is raised during span cleanup, the failed span is never removed from the Langfuse tracer's context stack, and the tracing context stays corrupted for all subsequent runs.

## Environment

- **haystack-integrations[langfuse]**: 1.1.2 (later versions may also be affected)
- **haystack**: 2.13.x
- **langfuse**: 2.x
- **Python**: 3.10

## Root Cause Analysis

The issue occurs with this specific architecture:

```
Main AsyncPipeline (Parent Trace Context)
└── IntentClassifier Component
    └── Internal Pipeline (Nested Trace Context)
        ├── ChatPromptBuilder
        ├── ChatGenerator  
        └── IntentParser ❌ (Fails here)
```

**Failure Sequence:**
1. Main pipeline starts execution with parent trace context
2. `IntentClassifier` component is invoked as part of main pipeline
3. `IntentClassifier.run()` creates and executes its own internal `Pipeline`
4. Internal pipeline's `IntentParser` fails (e.g., JSON parsing error from LLM response)
5. Exception propagates to the nested pipeline's `LangfuseTracer.trace()` context manager
6. During span cleanup, `span_handler.handle()` or `raw_span.end()` raises while processing the exception data
7. Because cleanup raised, `self._context.pop()` is never executed in the tracer
8. The failed span remains stuck in the tracer's context stack
9. **All subsequent runs of the main pipeline use the stuck nested span as parent**

<img width="370" alt="Image" src="https://github.com/user-attachments/assets/99a2a269-96c0-4d66-9512-3ba6a0b1dc7a" />

The key insight is that this is not just a component failure: it is specifically **nested pipeline execution** that corrupts the Langfuse tracing context, including the parent pipeline's trace.
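
The failure sequence above can be sketched without Haystack or Langfuse. The following is a minimal model, not the integration's actual code: `ToyTracer`, `ToySpan`, `_end_span`, and `run_once` are hypothetical names, and the model assumes that span cleanup raises whenever the traced block failed. It only illustrates how a skipped `pop()` leaves spans on the stack and how the next run then picks the stale span as its parent.

```python
import contextlib
from typing import Iterator, List, Optional


class ToySpan:
    """Stand-in for a tracing span; records only its name and parent."""

    def __init__(self, name: str, parent: Optional["ToySpan"]) -> None:
        self.name = name
        self.parent = parent


class ToyTracer:
    """Hypothetical tracer that keeps open spans on a plain list."""

    def __init__(self) -> None:
        self._context: List[ToySpan] = []

    def _end_span(self, span: ToySpan, failed: bool) -> None:
        # Stand-in for span_handler.handle() / raw_span.end().
        # Assumption: cleanup itself raises when the traced block failed.
        if failed:
            raise RuntimeError(f"cleanup failed for {span.name}")

    @contextlib.contextmanager
    def trace(self, name: str) -> Iterator[ToySpan]:
        # The parent is whatever span is currently on top of the stack.
        parent = self._context[-1] if self._context else None
        span = ToySpan(name, parent)
        self._context.append(span)
        failed = False
        try:
            yield span
        except Exception:
            failed = True
            raise
        finally:
            self._end_span(span, failed)  # may raise ...
            self._context.pop()  # ... in which case this line is skipped


def run_once(tracer: ToyTracer, fail: bool) -> ToySpan:
    """One 'main pipeline' run wrapping one nested run; returns the main span."""
    with tracer.trace("main") as main_span:
        with tracer.trace("nested"):
            if fail:
                raise ValueError("invalid json")
    return main_span


tracer = ToyTracer()
try:
    run_once(tracer, fail=True)
except RuntimeError:
    pass  # the cleanup error replaced the original ValueError

depth_after_failure = len(tracer._context)  # both spans are still on the stack
second_main = run_once(tracer, fail=False)  # a healthy run on the same tracer
print(depth_after_failure, second_main.parent.name)
```

In this model the healthy second run pushes and pops its own spans correctly, yet its main span is parented to the stale `nested` span from the first run, which is the hierarchy shown in the screenshot above.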

## Reproduction Steps

Here's a minimal reproduction that demonstrates the nested pipeline issue:

```python
import json

from haystack import Pipeline, component
from haystack_integrations.components.connectors.langfuse import LangfuseConnector

@component 
class FailingParser:
    @component.output_types(result=str)
    def run(self, data: str):
        # This will fail with ValueError when data is not valid JSON
        parsed = json.loads(data)
        return {"result": parsed["key"]}

@component
class ComponentWithNestedPipeline:
    def __init__(self):
        # This simulates IntentClassifier's internal pipeline
        self.internal_pipeline = Pipeline()
        self.internal_pipeline.add_component("parser", FailingParser())
    
    @component.output_types(result=str) 
    def run(self, input_data: str):
        # Run nested pipeline - this is where corruption occurs
        result = self.internal_pipeline.run({"parser": {"data": input_data}})
        return {"result": result["parser"]["result"]}

# Set up tracing
tracer = LangfuseConnector("test")

# Create main pipeline with nested component
main_pipeline = Pipeline()
main_pipeline.add_component("nested_component", ComponentWithNestedPipeline())
main_pipeline.add_component("tracer", tracer)

print("=== First Run (Will Fail and Corrupt Context) ===")
try:
    main_pipeline.run({"nested_component": {"input_data": "invalid json"}})
except Exception as e:
    print(f"First run failed as expected: {e}")

print(f"Tracer context after first run: {len(tracer.tracer._context)}")

print("=== Second Run (Will Have Corrupted Tracing Context) ===") 
try:
    result = main_pipeline.run({"nested_component": {"input_data": '{"key": "valid"}'}})
    print(f"Second run succeeded: {result}")
    print("❌ But the trace hierarchy is now corrupted!")
except Exception as e:
    print(f"Second run failed: {e}")

print(f"Tracer context after second run: {len(tracer.tracer._context)}")
# Should be 0, but will be > 0 showing context corruption
```

## Expected Behavior

- Each main pipeline run should create its own independent trace
- Nested pipeline failures should not affect the parent pipeline's tracing context  
- Failed nested spans should be properly cleaned up
- Subsequent main pipeline runs should start with clean tracing context

## Actual Behavior

- Failed nested pipeline spans remain in the tracer's context indefinitely
- Subsequent main pipeline runs inherit the failed nested span as parent
- Trace hierarchy shows main pipeline operations as children of failed nested operations
- Memory leak as failed spans accumulate over time

## Impact in Production

This specifically affects component architectures like:

- **IntentClassifier**: Contains internal pipeline for prompt building → LLM generation → JSON parsing
- **Multi-step RAG components**: Have internal pipelines for retrieval → reranking → generation
- **Validation components**: Run internal pipelines for content checking
- **Any composite component pattern**: Where components encapsulate their own pipelines

Production symptoms:
- All pipeline traces appear as children of old failed operations
- Difficult to debug actual pipeline flows
- Tracing dashboards show confusing hierarchies
- Long-running services accumulate memory in tracer contexts

## Proposed Fix

The fix needs to handle nested pipeline context isolation. One approach is to ensure that nested pipeline failures don't corrupt parent contexts:

```python
# Excerpt of the tracer's trace() method; Span, StatefulSpanClient,
# StatefulGenerationClient and logger come from the surrounding module.
@contextlib.contextmanager
def trace(
    self, operation_name: str, tags: Optional[Dict[str, Any]] = None, parent_span: Optional[Span] = None
) -> Iterator[Span]:
    # ... existing span creation code (defines `span` and `component_type`) ...

    self._context.append(span)
    span.set_tags(tags)

    try:
        yield span
    finally:
        # Always clean up context, even if nested operations fail
        try:
            # Process span data (may fail with nested pipeline exceptions)
            self._span_handler.handle(span, component_type)

            # End span (may fail if span data is corrupted)
            raw_span = span.raw_span()
            if isinstance(raw_span, (StatefulSpanClient, StatefulGenerationClient)):
                raw_span.end()
        except Exception as cleanup_error:
            # Log cleanup errors but don't let them corrupt context
            logger.warning(f"Error during span cleanup for {operation_name}: {cleanup_error}")
            # Consider marking span as failed but still ending it
        finally:
            # CRITICAL: Always pop context to prevent corruption
            # This is especially important for nested pipeline scenarios
            if self._context and self._context[-1] is span:
                self._context.pop()
            else:
                logger.error(f"Context corruption detected: expected {span} at top of stack")

        # Flush inside the outer `finally` so it also runs when the traced block raised
        if self.enforce_flush:
            self.flush()

## Additional Context

This issue was discovered in a production system where:

1. Main chat pipeline processes user messages
2. `IntentClassifier` component runs its own internal pipeline (prompt builder → LLM → JSON parser)  
3. LLM occasionally returns unparseable responses
4. JSON parsing failures corrupt the main pipeline's tracing context
5. All subsequent chat interactions show up as children of the failed intent classification

The nested pipeline pattern is common in Haystack applications, making this a critical issue for production deployments.

## Workaround

Currently, the only workaround is to implement defensive exception handling in components with nested pipelines, but this silences legitimate errors that should be visible in traces.
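
One variation on that defensive handling might avoid swallowing the error: remember the depth of the tracer's context stack before the nested run and trim the stack back afterwards, while letting the exception propagate. The sketch below is an untested idea, not a verified workaround. `restore_context_depth` is a hypothetical helper, and it assumes the tracer keeps its open spans in a mutable list that the component can reach (the reproduction above reads a private `_context` attribute, which may change between versions). It only repairs the stack; spans that were never ended stay unended on the Langfuse side.

```python
import contextlib
from typing import Iterator, List, Optional


@contextlib.contextmanager
def restore_context_depth(context_stack: List[object]) -> Iterator[None]:
    """Trim `context_stack` back to its entry depth on exit.

    Exceptions raised inside the block are not caught here, so they
    still propagate to the caller and remain visible.
    """
    depth_on_entry = len(context_stack)
    try:
        yield
    finally:
        # Drop only what was pushed inside the block; spans opened by
        # the parent pipeline stay where they are.
        del context_stack[depth_on_entry:]


# Usage with a plain list standing in for the tracer's context stack.
stack: List[object] = ["main-span"]
error_seen: Optional[Exception] = None
try:
    with restore_context_depth(stack):
        stack.append("nested-span")  # pushed by the nested run, never popped
        raise ValueError("invalid json")
except ValueError as exc:
    error_seen = exc

print(stack, error_seen)
```

In a component, the `with` block would wrap the call to the internal pipeline's `run()`, with the tracer's context list passed in.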

---

**This affects any component that uses the "component with internal pipeline" pattern, which is a common architectural approach in Haystack applications.** 
