Remove additional ddtrace appsec packages from site-packages.
5 tasks
rithikanarayan
approved these changes
Apr 6, 2026
Contributor
rithikanarayan
left a comment
LGTM, thanks for doing this! Please wait until the e2e test status job has passed before merging. :)
brettlangdon
approved these changes
Apr 6, 2026
Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
gh-worker-dd-mergequeue-cf854d Bot
pushed a commit
to DataDog/dd-trace-py
that referenced
this pull request
Apr 7, 2026
…h_utils (#17334)

## Summary

- Extract `rel_path()` and `_compute_file_line()` from `VulnerabilityBase` in `_iast/taint_sinks/_base.py` into shared functions (`rel_path` and `get_caller_frame_info`) in `_patch_utils.py`.
- Migrate `insecure_cookie.py` to use the shared `get_caller_frame_info()` instead of `cls._compute_file_line()`.
- Update `test_weak_hash.py` mock target from `get_info_frame` to `get_caller_frame_info`.
- Both IAST and SCA can now reuse these functions without depending on IAST internals.

Split out from #17156 to keep PRs incremental and reviewable.

> **Important:** Before merging this PR, DataDog/datadog-lambda-python#761 must be merged first.

## Test plan

- [ ] Existing IAST vulnerability tests pass (they call `VulnerabilityBase.report()`, which now delegates to `get_caller_frame_info()`)
- [ ] IAST cookie tests pass (`insecure_cookie.py` now uses the shared function)
- [ ] `test_weak_hash.py` edge case test passes with the updated mock target

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: alberto.vara <alberto.vara@datadoghq.com>
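For readers unfamiliar with what a caller-frame helper does, here is a minimal sketch of the idea. It is not the `ddtrace` implementation: the names `rel_path_sketch` and `caller_frame_info_sketch`, their signatures, and the "relative to the current working directory" rule are all assumptions made for illustration.

```python
import os
import sys
from typing import Optional, Tuple


def rel_path_sketch(path: str, root: Optional[str] = None) -> str:
    """Return `path` relative to `root` (default: cwd), or `path` unchanged
    when it cannot be expressed relative to `root` or lies outside it."""
    root = root or os.getcwd()
    try:
        rel = os.path.relpath(path, root)
    except ValueError:  # e.g. paths on different drives on Windows
        return path
    return path if rel.startswith("..") else rel


def caller_frame_info_sketch(depth: int = 1) -> Tuple[str, int, str]:
    """Return (file, line, function) for the frame `depth` levels above this call."""
    frame = sys._getframe(depth)  # CPython-specific frame access
    code = frame.f_code
    return rel_path_sketch(code.co_filename), frame.f_lineno, code.co_name


def handle_request() -> Tuple[str, int, str]:
    # depth=1 is the direct caller of caller_frame_info_sketch, i.e. this function.
    return caller_frame_info_sketch()
```

Calling `handle_request()` returns a tuple whose last element is `"handle_request"`. Keeping such a helper outside any one product's internals is what lets several consumers share it.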
gh-worker-dd-mergequeue-cf854d Bot
pushed a commit
to DataDog/dd-trace-py
that referenced
this pull request
Apr 7, 2026
)

## Summary

- Move the native `_stacktrace` C extension from `ddtrace/appsec/_iast/` to `ddtrace/appsec/_shared/` so it can be reused by both IAST and SCA without creating a dependency from SCA into IAST internals.
- Update all imports, build configuration (`setup.py`), and test references to use the new `ddtrace.appsec._shared._stacktrace` path.

Split out from #17156 to keep PRs incremental and reviewable.

> **Important:** Before merging this PR, DataDog/datadog-lambda-python#761 must be merged first, since `datadog-lambda-python` imports `ddtrace.appsec._iast._stacktrace` and needs to be updated to the new path.

## Test plan

- [ ] Existing IAST stacktrace tests pass (`tests/appsec/iast/test_stacktrace.py`)
- [ ] IAST memcheck tests pass (`tests/appsec/iast_memcheck/test_iast_mem_check.py`)
- [ ] Architecture loading module test passes (`tests/appsec/architectures/test_appsec_loading_modules.py`)
- [ ] Serverless import test passes (`tests/internal/test_serverless.py`)
- [ ] Native C extension builds correctly from the new path

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: alberto.vara <alberto.vara@datadoghq.com>
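A downstream consumer that has to work across both module locations could probe them in order. The sketch below is only an illustration of that pattern; the helper `import_first_available` is hypothetical, and whether `datadog-lambda-python` does anything like this is not stated in the commit message. Only the two module paths come from the text above.

```python
import importlib
from types import ModuleType
from typing import Iterable, Optional

# Paths taken from the commit message: new location first, old location as fallback.
STACKTRACE_CANDIDATES = (
    "ddtrace.appsec._shared._stacktrace",
    "ddtrace.appsec._iast._stacktrace",
)


def import_first_available(candidates: Iterable[str]) -> Optional[ModuleType]:
    """Return the first importable module from `candidates`, or None if none imports."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            continue
    return None
```

Usage would be `stacktrace = import_first_available(STACKTRACE_CANDIDATES)`, followed by a `None` check before touching the module.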
gh-worker-dd-mergequeue-cf854d Bot
pushed a commit
to DataDog/dd-trace-py
that referenced
this pull request
Apr 17, 2026
## Description

Implements **Runtime SCA Reachability**: the tracer reports which vulnerable symbols (functions/methods in third-party libraries with known CVEs) have actually been invoked at runtime, reducing false positives from static SCA analysis.

**RFC**: https://docs.google.com/document/d/1xDw9iG6h41VCEgJGTqoJdruRaNS4pYgNifO6nhiizWA/edit?tab=t.a9gurws0d8ua

### How it works

When `DD_APPSEC_SCA_ENABLED=true`:

1. **CVE data loading**: At tracer startup, reads `_cve_data.json` containing vulnerability targets (symbol + CVE + version constraint). Filters against installed packages.
2. **Runtime instrumentation**: For each applicable CVE target, applies bytecode injection (`inject_hook`) to the vulnerable function. Supports both eager (already-imported modules) and lazy (via `ModuleWatchdog` for deferred imports) instrumentation.
3. **CVE registration**: Immediately registers all applicable CVEs on their dependencies with `reached: []` in the telemetry `app-dependencies-loaded` payload. The backend knows which CVEs apply before any symbol is hit.
4. **Reachability detection**: When an instrumented function executes, the hook captures the caller's file path, method name, and line number, then attaches it to the CVE's `reached` array. Only the first occurrence is reported per CVE (RFC: "reporting a single occurrence is sufficient").
5. **Telemetry reporting**: On each heartbeat (default 60s), dependencies with new reachability data are re-reported with all their metadata via `app-dependencies-loaded`.

### Telemetry payload (RFC v3)

```json
{
  "request_type": "app-dependencies-loaded",
  "payload": {
    "dependencies": [
      {
        "name": "requests",
        "version": "2.31.0",
        "metadata": [
          {
            "type": "reachability",
            "value": "{\"id\":\"CVE-2024-35195\",\"reached\":[{\"path\":\"myapp/views.py\",\"method\":\"handle_request\",\"line\":42}]}"
          }
        ]
      }
    ]
  }
}
```

Before any symbol is hit, CVEs are reported with `"reached":[]`. When a symbol executes, the first caller info is added to the `reached` array.
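Note that in the payload above the `value` field is itself a JSON document encoded as a string. The following sketch shows one way to produce that shape. The helper names `reachability_metadata` and `dependencies_payload` are hypothetical and are not taken from the tracer's code; only the field names and sample values come from the example payload.

```python
import json
from typing import Any, Dict, List, Optional


def reachability_metadata(
    cve_id: str, reached: Optional[List[Dict[str, Any]]] = None
) -> Dict[str, str]:
    """Build one metadata entry; `value` is a compact JSON string, not a nested object."""
    value = json.dumps({"id": cve_id, "reached": reached or []}, separators=(",", ":"))
    return {"type": "reachability", "value": value}


def dependencies_payload(
    name: str, version: str, metadata: List[Dict[str, str]]
) -> Dict[str, Any]:
    """Wrap a single dependency in an app-dependencies-loaded request body."""
    return {
        "request_type": "app-dependencies-loaded",
        "payload": {
            "dependencies": [{"name": name, "version": version, "metadata": metadata}]
        },
    }


# Registration: the CVE applies to the dependency, but nothing has been reached yet.
registered = reachability_metadata("CVE-2024-35195")

# After the first hit: a single caller location is attached.
hit = reachability_metadata(
    "CVE-2024-35195",
    [{"path": "myapp/views.py", "method": "handle_request", "line": 42}],
)
payload = dependencies_payload("requests", "2.31.0", [hit])
```

A consumer has to decode twice: once for the request body and once more with `json.loads(entry["value"])` for each metadata entry.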
## Performance

Benchmark: `scripts/perf_bench_heartbeat_cycles.py`

Measures `collect_report()` execution time per heartbeat cycle under different scenarios. The overhead is paid only in the background telemetry thread (every 60s), never in user request paths.

- **SCA OFF (main)**: baseline on `main` branch, with the old `update_imported_dependencies()` path, no `DependencyTracker`, and no re-report scan
- **SCA OFF**: this branch with `DD_APPSEC_SCA_ENABLED=false`, using the new `DependencyTracker` with `metadata=None`; re-report scan skipped
- **SCA ON**: this branch with `DD_APPSEC_SCA_ENABLED=true`, using the new `DependencyTracker` with `metadata=[]`
- **Overhead**: `(SCA ON − SCA OFF) / SCA OFF`

### 1,000 dependencies (typical large application)

| Heartbeat Cycle | SCA OFF (main) | SCA OFF | SCA ON | Overhead |
|---|---|---|---|---|
| First heartbeat (all new) | 4.56ms | 4.63ms | 5.10ms | **+10.1%** |
| Idle (nothing to report) | 0.2us | 72.2us | 239.6us | **+231.8%** |
| CVE registration (100 CVEs, reached=[]) | 0.3us | 73.5us | 1.20ms | **+1527.5%** |
| SCA hits (100 hits on 50 deps) | 0.3us | 73.7us | 1.44ms | **+1857.1%** |

### 10,000 dependencies (extreme scale)

| Heartbeat Cycle | SCA OFF (main) | SCA OFF | SCA ON | Overhead |
|---|---|---|---|---|
| First heartbeat (all new) | 53.15ms | 57.18ms | 62.12ms | **+8.6%** |
| Idle (nothing to report) | 0.3us | 742.6us | 2.20ms | **+195.7%** |
| CVE registration (1,000 CVEs, reached=[]) | 0.3us | 720.7us | 13.51ms | **+1774.4%** |
| SCA hits (1,000 hits on 500 deps) | 0.3us | 718.9us | 15.56ms | **+2064.2%** |

### Payload size

| Scenario (1,000 deps) | Entries | Size |
|---|---|---|
| First heartbeat SCA OFF | 1,000 | 62 KB |
| First heartbeat SCA ON | 1,000 | 62 KB |
| CVE registration (100 CVEs) | 50 | 10 KB |
| SCA hits (100 hits) | 50 | 16 KB |

### Memory overhead

| Scenario (1,000 deps) | SCA OFF | SCA ON | Delta |
|---|---|---|---|
| Idle heartbeat | 155.3 KB | 192.5 KB | +37.2 KB |
| CVE registration (100 CVEs) | 136.4 KB | 205.8 KB | +69.4 KB |
| SCA hits (100 hits) | 136.2 KB | 208.5 KB | +72.3 KB |

> **Note on overhead**: The percentage overhead for idle, CVE registration, and SCA hits appears very high because the SCA OFF baseline includes mock/benchmark harness overhead (~72us at 1K deps) that dominates the measurement. In absolute terms:
>
> - **First heartbeat**: nearly identical across all three columns (4.56ms → 4.63ms → 5.10ms at 1K deps), confirming the `DependencyTracker` refactor adds **minimal overhead** to initial dependency discovery.
> - **Idle heartbeat with SCA OFF**: ~72us includes benchmark mock overhead; measured independently at ~8us (lock acquisition + `get_newly_imported_modules()` + lazy config check). The re-report scan is **completely skipped** when SCA is disabled.
> - **SCA ON worst case** at 1,000 dependencies: **1.44ms per 60s heartbeat**, which is 0.002% of the cycle.
> - **SCA ON worst case** at 10,000 dependencies with 1,000 CVEs: **16ms**, which is 0.027% of the heartbeat interval.
>
> All overhead runs entirely in the background telemetry thread and does not affect user request latency.

### SLO benchmark

A CI-integrated benchmark suite is available at `benchmarks/telemetry_dependencies/` with 8 scenarios covering first/idle/cve/hits phases at 100 and 1,000 dependencies. It is triggered automatically when `ddtrace/internal/telemetry/dependency*.py` or `ddtrace/appsec/sca/*` files change and compares performance between the base branch and this PR.

## Risks

- **Bytecode injection**: Uses the existing `inject_hook` infrastructure from dd-trace-py. The hook is exception-safe (wrapped in try/except) and never raises in user code.
- **Memory**: `DependencyEntry` objects add ~150 bytes per dependency vs plain strings. At 1,000 deps this is ~150KB total, which is negligible.
- **Lock contention**: The `DependencyTracker._lock` is held briefly during `attach_metadata` calls from the SCA hook. After the first hit per CVE (max reached=1), subsequent hook invocations short-circuit before any lock acquisition.

## Additional Notes

- Merge this PR first: DataDog/datadog-lambda-python#761
- Previous model PR benchmarks: #17092
- The static CVE JSON (`_cve_data.json`) will be replaced by Remote Config in the long-term solution

Co-authored-by: alberto.vara <alberto.vara@datadoghq.com>
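The lock-contention point in the Risks section (only the first hit per CVE takes the lock, later calls return early) can be sketched as follows. `ReachabilityRecorder` is a hypothetical stand-in, not the real `DependencyTracker`; the unlocked fast-path read assumes CPython, where a single `dict.get` is atomic, and `lock_acquisitions` exists only to make the behavior observable.

```python
import threading
from typing import Dict, List, Tuple

CallerInfo = Tuple[str, str, int]  # (path, method, line)


class ReachabilityRecorder:
    """Keeps at most one caller location per CVE (hypothetical sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._reached: Dict[str, List[CallerInfo]] = {}
        self.lock_acquisitions = 0  # illustration only: counts locked record_hit calls

    def register(self, cve_id: str) -> None:
        """Register a CVE with an empty reached list before any symbol is hit."""
        with self._lock:
            self._reached.setdefault(cve_id, [])

    def record_hit(self, cve_id: str, caller: CallerInfo) -> bool:
        """Record the first hit for `cve_id`; return True only if this call stored it."""
        # Fast path: an unlocked read. Once a hit exists, skip the lock entirely.
        if self._reached.get(cve_id):
            return False
        with self._lock:
            self.lock_acquisitions += 1
            hits = self._reached.setdefault(cve_id, [])
            if hits:  # another thread recorded the first hit while we waited
                return False
            hits.append(caller)
            return True

    def reached(self, cve_id: str) -> List[CallerInfo]:
        """Return a copy of the recorded hits for `cve_id`."""
        with self._lock:
            return list(self._reached.get(cve_id, []))


recorder = ReachabilityRecorder()
recorder.register("CVE-2024-35195")
first = recorder.record_hit("CVE-2024-35195", ("myapp/views.py", "handle_request", 42))
second = recorder.record_hit("CVE-2024-35195", ("myapp/other.py", "other", 7))
```

The re-check inside the lock is what keeps the "single occurrence" guarantee when two threads reach the same symbol at the same time.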
dubloom
pushed a commit
to DataDog/dd-trace-py
that referenced
this pull request
Apr 21, 2026
Runtime SCA Reachability: commit message identical to the Apr 17, 2026 commit above.