| 1 | +# Context Propagation in Python |
| 2 | + |
| 3 | +This document covers how the Drift Python SDK handles tracing context propagation across different execution contexts, including edge cases and recommended patterns. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The SDK uses OpenTelemetry for distributed tracing, which relies on Python's `contextvars` module for context propagation. Understanding when context propagates automatically vs. when it requires explicit handling is crucial for correct trace hierarchies. |
| 8 | + |
| 9 | +## Context Propagation Behavior |
| 10 | + |
| 11 | +| Scenario | Auto-propagates? | Notes | |
| 12 | +|----------|------------------|-------| |
| 13 | +| `async/await` chains | ✅ Yes | Native `contextvars` support | |
| 14 | +| `ThreadPoolExecutor` | ❌ No | Requires explicit propagation | |
| 15 | +| `ProcessPoolExecutor` | ❌ No | `contextvars` state is per-process; trace context must be passed explicitly | |
| 16 | +| `loop.run_in_executor()` | ❌ No | Same as `ThreadPoolExecutor` | |
| 17 | +| `asyncio.to_thread()` (Python 3.9+) | ✅ Yes | Recommended for blocking calls | |
| 18 | +| Callback-based libraries | ❌ No | Context is lost when the callback runs on a thread the library owns | |
| 19 | + |
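The table can be checked with the standard library alone. The sketch below uses a plain `ContextVar` named `request_id` as a hypothetical stand-in for the active trace context (the Overview notes that OpenTelemetry relies on `contextvars` too) and reads it back through three of the paths listed above. It assumes Python 3.9+ on a default CPython build.

```python
import asyncio
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the active trace context.
request_id = contextvars.ContextVar("request_id", default="unset")


def read_request_id():
    """Blocking function that reports the value it can see."""
    return request_id.get()


async def read_request_id_async():
    """Coroutine that reports the value it can see."""
    return request_id.get()


async def probe():
    request_id.set("req-123")
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The worker thread starts from an empty context, so it sees the default.
        via_executor = await loop.run_in_executor(pool, read_request_id)
    return {
        "await": await read_request_id_async(),  # same task, same context
        "run_in_executor": via_executor,
        "to_thread": await asyncio.to_thread(read_request_id),  # context is copied
    }


results = asyncio.run(probe())
print(results)
```

Only the `run_in_executor` entry should fall back to the default value, which matches the ❌ row for executors.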
| 20 | +## Stack Trace Capture |
| 21 | + |
| 22 | +The SDK captures stack traces for debugging and mock matching. Different components use different truncation levels: |
| 23 | + |
| 24 | +| Component | Max Frames | Use Case | |
| 25 | +|-----------|------------|----------| |
| 26 | +| Socket instrumentation (unpatched alerts) | Unlimited | Full debugging info | |
| 27 | +| `SpanUtils.capture_stack_trace()` | 10 (default) | Span metadata | |
| 28 | +| Communicator debug traces | 20 | Internal debugging | |
| 29 | + |
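To make the "Max Frames" column concrete, here is a minimal sketch of frame-limited capture using only the standard `traceback` module. The helper name `capture_stack` and its output format are assumptions for illustration; this is not the SDK's `SpanUtils.capture_stack_trace()` implementation.

```python
import traceback


def capture_stack(max_frames=10):
    """Return up to max_frames innermost frames as 'file:line in func' strings.

    Hypothetical helper; pass max_frames=None for an untruncated trace.
    """
    # Ask for one extra frame, then drop the last entry (this helper itself).
    limit = None if max_frames is None else max_frames + 1
    frames = traceback.extract_stack(limit=limit)[:-1]
    return [f"{f.filename}:{f.lineno} in {f.name}" for f in frames]


def nested(depth, max_frames):
    """Build a call stack at least `depth` frames deep, then capture it."""
    if depth == 0:
        return capture_stack(max_frames)
    return nested(depth - 1, max_frames)


short = nested(15, 10)    # truncated, like the span-metadata default
full = nested(15, None)   # unlimited, like the unpatched-socket alerts
```

The truncated trace keeps the frames closest to the capture point, which are usually the ones that matter for matching a span to its call site.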
| 30 | +## ThreadPoolExecutor Pattern |
| 31 | + |
| 32 | +Context does **not** automatically propagate to thread pool workers. Use the explicit propagation pattern: |
| 33 | + |
| 34 | +```python |
| 35 | +from opentelemetry import context as otel_context |
| 36 | + |
| 37 | +def _run_with_context(ctx, fn, *args, **kwargs): |
| 38 | +    """Run function with OpenTelemetry context in a thread pool.""" |
| 39 | +    token = otel_context.attach(ctx) |
| 40 | +    try: |
| 41 | +        return fn(*args, **kwargs) |
| 42 | +    finally: |
| 43 | +        otel_context.detach(token) |
| 44 | + |
| 45 | +from concurrent.futures import ThreadPoolExecutor  # usage example starts here |
| 46 | +ctx = otel_context.get_current() |
| 47 | +with ThreadPoolExecutor(max_workers=4) as executor: |
| 48 | +    future = executor.submit(_run_with_context, ctx, my_function, arg1) |
| 49 | +``` |
| 50 | + |
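The `finally: detach(...)` step in the pattern above matters because pool workers are reused: anything left attached stays visible to the next task on the same thread. The sketch below reproduces that with a plain `ContextVar`, whose `set()`/`reset()` play the role of `attach()`/`detach()`. The names `current_span`, `run_with_value`, and `leaky_run` are hypothetical.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the active span.
current_span = contextvars.ContextVar("current_span", default=None)


def run_with_value(value, fn, *args, **kwargs):
    """Set the value for the duration of fn, then restore the previous state."""
    token = current_span.set(value)
    try:
        return fn(*args, **kwargs)
    finally:
        current_span.reset(token)


def leaky_run(value, fn):
    """Same idea without cleanup: the value stays set on the worker thread."""
    current_span.set(value)
    return fn()


# A single worker guarantees that every task runs on the same thread.
with ThreadPoolExecutor(max_workers=1) as pool:
    inside = pool.submit(run_with_value, "span-A", current_span.get).result()
    after_clean = pool.submit(current_span.get).result()
    pool.submit(leaky_run, "span-B", current_span.get).result()
    after_leak = pool.submit(current_span.get).result()
```

After the wrapped call the worker is clean again, while the leaky variant hands `"span-B"` to an unrelated task, which in a tracing setup would attach spans to the wrong parent.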
| 51 | +### Alternative: asyncio.to_thread() (Python 3.9+) |
| 52 | + |
| 53 | +For async code that needs to run blocking operations, `asyncio.to_thread()` propagates context automatically: |
| 54 | + |
| 55 | +```python |
| 56 | +# Context propagates automatically - no wrapper needed |
| 57 | +result = await asyncio.to_thread(blocking_function, arg1) |
| 58 | +``` |
| 59 | + |
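When a custom executor is required, or on Python versions older than 3.9, the same effect can be had by copying the context by hand before calling `loop.run_in_executor()`. The sketch below is a minimal version of that idea; `run_in_executor_with_context` and the `tenant` variable are hypothetical names, not SDK APIs.

```python
import asyncio
import contextvars
import functools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical context variable used to observe propagation.
tenant = contextvars.ContextVar("tenant", default="unset")


async def run_in_executor_with_context(executor, fn, *args, **kwargs):
    """Run fn in the executor inside a copy of the caller's context."""
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()
    call = functools.partial(ctx.run, fn, *args, **kwargs)
    return await loop.run_in_executor(executor, call)


async def main():
    tenant.set("acme")
    with ThreadPoolExecutor(max_workers=2) as pool:
        return await run_in_executor_with_context(pool, tenant.get)


seen = asyncio.run(main())
```

Because the worker runs inside a copy, values it sets do not flow back to the caller; only reads of the caller's state are propagated.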
| 60 | +## Possible SDK-Level Solutions |
| 61 | + |
| 62 | +### Option 1: Manual Helper (Current Approach) |
| 63 | + |
| 64 | +**What:** Provide a documented `_run_with_context()` pattern. |
| 65 | + |
| 66 | +**Pros:** Explicit, no magic, works everywhere |
| 67 | +**Cons:** Requires user code changes |
| 68 | + |
| 69 | +### Option 2: ContextAwareThreadPoolExecutor |
| 70 | + |
| 71 | +**What:** SDK provides a drop-in executor that auto-propagates context. |
| 72 | + |
| 73 | +```python |
| 74 | +from concurrent.futures import ThreadPoolExecutor |
| 75 | +import contextvars |
| 76 | + |
| 77 | +class ContextAwareThreadPoolExecutor(ThreadPoolExecutor): |
| 78 | +    def submit(self, fn, *args, **kwargs): |
| 79 | +        ctx = contextvars.copy_context() |
| 80 | +        return super().submit(ctx.run, fn, *args, **kwargs) |
| 81 | +``` |
| 82 | + |
| 83 | +**Pros:** Clean API, opt-in |
| 84 | +**Cons:** User must change imports |
| 85 | + |
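A quick way to sanity-check Option 2 is to compare it with a stock executor. The sketch below repeats the class so it runs on its own and uses a hypothetical `job` variable in place of real trace context. It also exercises `map()`, which dispatches through `submit()` and so picks up the override.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor


class ContextAwareThreadPoolExecutor(ThreadPoolExecutor):
    def submit(self, fn, /, *args, **kwargs):
        # Copy per submission: one Context object cannot be entered
        # by two threads at the same time.
        ctx = contextvars.copy_context()
        return super().submit(ctx.run, fn, *args, **kwargs)


# Hypothetical context variable used to observe propagation.
job = contextvars.ContextVar("job", default="unset")


def read_job(_item):
    return job.get()


job.set("batch-7")

with ContextAwareThreadPoolExecutor(max_workers=4) as pool:
    submitted = pool.submit(read_job, None).result()
    mapped = list(pool.map(read_job, range(8)))

with ThreadPoolExecutor(max_workers=4) as plain_pool:
    baseline = plain_pool.submit(read_job, None).result()
```

The stock executor returns the default value, while the subclass sees `"batch-7"` in every task, including the ones that run in parallel.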
| 86 | +### Option 3: Monkey-patch ThreadPoolExecutor |
| 87 | + |
| 88 | +**What:** SDK patches `ThreadPoolExecutor.submit()` globally at initialization. |
| 89 | + |
| 90 | +**Pros:** Zero user code changes |
| 91 | +**Cons:** |
| 92 | + |
| 93 | +- High risk of breaking other libraries |
| 94 | +- Hidden global side effects |
| 95 | +- Performance overhead for all executors (even unrelated ones) |
| 96 | +- Debugging becomes harder |
| 97 | + |
| 98 | +**Recommendation:** Not recommended for tracing SDKs. |
| 99 | + |
| 100 | +## Comparison with Node.js SDK |
| 101 | + |
| 102 | +| Aspect | Python | Node.js | |
| 103 | +|--------|--------|---------| |
| 104 | +| Async context mechanism | `contextvars` (native) | `AsyncLocalStorage` via OpenTelemetry | |
| 105 | +| `async/await` propagation | ✅ Automatic | ❌ Requires `context.with()` | |
| 106 | +| Thread pools | ❌ Manual propagation | N/A (single-threaded) | |
| 107 | +| Callbacks | ❌ Context lost | ❌ Requires `context.bind()` | |
| 108 | + |
| 109 | +Python's native `contextvars` makes async code simpler: no explicit binding is needed for `await` chains. However, thread pools in Python and callbacks in both languages still require explicit handling. |
| 110 | + |
| 111 | +## Testing Context Propagation |
| 112 | + |
| 113 | +The FastAPI e2e tests include endpoints that verify context propagation: |
| 114 | + |
| 115 | +- `GET /api/test-async-context` - Verifies context across concurrent async calls |
| 116 | +- `GET /api/test-thread-context` - Verifies explicit thread pool propagation |
| 117 | + |
| 118 | +Run the e2e tests to validate: |
| 119 | + |
| 120 | +```bash |
| 121 | +cd drift/instrumentation/fastapi/e2e-tests |
| 122 | +./run.sh |
| 123 | +``` |
| 124 | + |
| 125 | +## Edge Cases to Watch For |
| 126 | + |
| 127 | +1. **Libraries using internal thread pools** (e.g., some HTTP clients, database drivers) - May lose context unless the library explicitly supports it |
| 128 | + |
| 129 | +2. **Fire-and-forget async tasks** - `asyncio.create_task()` copies the current context when the task is created, but if the task outlives the parent span, its child spans will reference a parent that has already ended |
| 130 | + |
| 131 | +3. **Gevent/eventlet** - Green threads have different context semantics; not currently tested |
| 132 | + |
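| 133 | +4. **Multiprocessing** - `contextvars.Context` objects cannot be pickled, so context does not reach worker processes automatically; each process needs independent tracing setup, and any trace context must be passed explicitly as plain data |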
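
The multiprocessing case can be sketched without spawning a process: a pickle round trip stands in for the process boundary. The helpers `capture_carrier` and `run_with_carrier` and the `trace_id` variable are hypothetical. With OpenTelemetry, the carrier dict would typically be filled and read with `opentelemetry.propagate.inject()` and `extract()` instead.

```python
import contextvars
import pickle

# Hypothetical stand-in for the identifiers a real carrier would hold.
trace_id = contextvars.ContextVar("trace_id", default=None)


def capture_carrier():
    """Copy the values worth propagating into a plain, picklable dict."""
    return {"trace_id": trace_id.get()}


def run_with_carrier(carrier, fn, *args):
    """Restore the carrier inside a fresh, empty context, then call fn."""
    def restore_and_call():
        trace_id.set(carrier["trace_id"])
        return fn(*args)
    return contextvars.Context().run(restore_and_call)


trace_id.set("4bf92f35")

# The Context object itself cannot be sent to another process.
try:
    pickle.dumps(contextvars.copy_context())
    context_is_picklable = True
except (TypeError, pickle.PicklingError):
    context_is_picklable = False

# A plain dict survives the (simulated) process boundary.
wire = pickle.dumps(capture_carrier())
restored = run_with_carrier(pickle.loads(wire), trace_id.get)
```

With a real `ProcessPoolExecutor`, `run_with_carrier` and the carrier dict would be the arguments passed to `submit()`, since both are picklable.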