Remove unused ddtrace appsec packages #761

Merged
avara1986 merged 3 commits into main from avara1986-patch-1 on Apr 7, 2026
Conversation

@avara1986
Member

Remove additional ddtrace appsec packages from site-packages.

What does this PR do?

Motivation

Testing Guidelines

Additional Notes

Types of Changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)

Remove additional ddtrace appsec packages from site-packages.
@avara1986 avara1986 marked this pull request as ready for review April 6, 2026 12:31
@avara1986 avara1986 requested review from a team as code owners April 6, 2026 12:31
@avara1986 avara1986 requested a review from lym953 April 6, 2026 12:31
Contributor

@rithikanarayan rithikanarayan left a comment

LGTM, thanks for doing this! Please wait until the e2e test status job has passed before merging. :)

Comment thread Dockerfile Outdated
Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
@avara1986 avara1986 merged commit affe9c7 into main Apr 7, 2026
105 checks passed
@avara1986 avara1986 deleted the avara1986-patch-1 branch April 7, 2026 07:36
gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/dd-trace-py that referenced this pull request Apr 7, 2026
…h_utils (#17334)

## Summary
- Extract `rel_path()` and `_compute_file_line()` from `VulnerabilityBase` in `_iast/taint_sinks/_base.py` into shared functions (`rel_path` and `get_caller_frame_info`) in `_patch_utils.py`.
- Migrate `insecure_cookie.py` to use the shared `get_caller_frame_info()` instead of `cls._compute_file_line()`.
- Update `test_weak_hash.py` mock target from `get_info_frame` to `get_caller_frame_info`.
- Both IAST and SCA can now reuse these functions without depending on IAST internals.
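
As a rough illustration of what a shared caller-frame helper of this kind can look like, here is a minimal sketch. The names `rel_path_sketch` and `caller_frame_info_sketch` are hypothetical and are not the functions in `_patch_utils.py`; the real signatures and path-normalization rules may differ.

```python
import os
import sys
from typing import Optional, Tuple


def rel_path_sketch(path: str, root: Optional[str] = None) -> str:
    """Return path relative to root (default: cwd) when it lies under root."""
    root = os.path.abspath(root or os.getcwd())
    absolute = os.path.abspath(path)
    try:
        if os.path.commonpath([root, absolute]) == root:
            return os.path.relpath(absolute, root)
    except ValueError:
        pass  # e.g. paths on different drives on Windows
    return path  # outside root: leave the path unchanged


def caller_frame_info_sketch(depth: int = 1) -> Tuple[str, int, str]:
    """Return (file, line, function) for the frame `depth` levels above this helper."""
    frame = sys._getframe(depth)  # CPython-specific frame access
    code = frame.f_code
    return rel_path_sketch(code.co_filename), frame.f_lineno, code.co_name
```

Keeping such helpers free of IAST-specific state is what would let a second product reuse them.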

Split out from #17156 to keep PRs incremental and reviewable.

> **Important:** Before merging this PR, DataDog/datadog-lambda-python#761 must be merged first.

## Test plan
- [ ] Existing IAST vulnerability tests pass (they call `VulnerabilityBase.report()` which now delegates to `get_caller_frame_info()`)
- [ ] IAST cookie tests pass (`insecure_cookie.py` now uses shared function)
- [ ] `test_weak_hash.py` edge case test passes with updated mock target

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: alberto.vara <alberto.vara@datadoghq.com>
gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/dd-trace-py that referenced this pull request Apr 7, 2026
)

## Summary
- Move the native `_stacktrace` C extension from `ddtrace/appsec/_iast/` to `ddtrace/appsec/_shared/` so it can be reused by both IAST and SCA without creating a dependency from SCA into IAST internals.
- Update all imports, build configuration (`setup.py`), and test references to use the new `ddtrace.appsec._shared._stacktrace` path.

Split out from #17156 to keep PRs incremental and reviewable.

> **Important:** Before merging this PR, DataDog/datadog-lambda-python#761 must be merged first, since `datadog-lambda-python` imports `ddtrace.appsec._iast._stacktrace` and needs to be updated to the new path.
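
For a consumer that has to tolerate both module layouts during the transition, one possible approach is a fallback import. This is only a sketch: `import_first_available` is a hypothetical helper, and the only names taken from the text above are the two module paths.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def import_first_available(candidates: Sequence[str]) -> Optional[ModuleType]:
    """Return the first importable module from candidates, or None if none import."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate path
    return None


# New location first, then the pre-move location (both paths come from the summary above).
STACKTRACE_CANDIDATES = (
    "ddtrace.appsec._shared._stacktrace",
    "ddtrace.appsec._iast._stacktrace",
)
stacktrace_module = import_first_available(STACKTRACE_CANDIDATES)
```

If neither path is importable, `stacktrace_module` is `None` and the caller has to decide how to degrade.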

## Test plan
- [ ] Existing IAST stacktrace tests pass (`tests/appsec/iast/test_stacktrace.py`)
- [ ] IAST memcheck tests pass (`tests/appsec/iast_memcheck/test_iast_mem_check.py`)
- [ ] Architecture loading module test passes (`tests/appsec/architectures/test_appsec_loading_modules.py`)
- [ ] Serverless import test passes (`tests/internal/test_serverless.py`)
- [ ] Native C extension builds correctly from new path

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: alberto.vara <alberto.vara@datadoghq.com>
gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/dd-trace-py that referenced this pull request Apr 17, 2026
## Description

Implements **Runtime SCA Reachability** — the tracer reports which vulnerable symbols (functions/methods in third-party libraries with known CVEs) have actually been invoked at runtime, reducing false positives from static SCA analysis.

**RFC**: https://docs.google.com/document/d/1xDw9iG6h41VCEgJGTqoJdruRaNS4pYgNifO6nhiizWA/edit?tab=t.a9gurws0d8ua

### How it works

When `DD_APPSEC_SCA_ENABLED=true`:

1. **CVE data loading** — At tracer startup, reads `_cve_data.json` containing vulnerability targets (symbol + CVE + version constraint). Filters against installed packages.
2. **Runtime instrumentation** — For each applicable CVE target, applies bytecode injection (`inject_hook`) to the vulnerable function. Supports both eager (already-imported modules) and lazy (via `ModuleWatchdog` for deferred imports).
3. **CVE registration** — Immediately registers all applicable CVEs on their dependencies with `reached: []` in the telemetry `app-dependencies-loaded` payload. The backend knows which CVEs apply before any symbol is hit.
4. **Reachability detection** — When an instrumented function executes, the hook captures the caller's file path, method name, and line number, then attaches it to the CVE's `reached` array. Only the first occurrence is reported per CVE (RFC: "reporting a single occurrence is sufficient").
5. **Telemetry reporting** — On each heartbeat (default 60s), dependencies with new reachability data are re-reported with all their metadata via `app-dependencies-loaded`.
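
The registration and first-hit behaviour in steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the tracer's implementation: it uses a plain decorator instead of `inject_hook` bytecode injection, and `reached_by_cve`, `register_cve`, and `track_reachability` are hypothetical names.

```python
import functools
import sys
from typing import Any, Callable, Dict, List

# cve_id -> caller records; an empty list means "registered, not reached yet".
reached_by_cve: Dict[str, List[Dict[str, Any]]] = {}


def register_cve(cve_id: str) -> None:
    """Register a CVE up front with an empty `reached` list."""
    reached_by_cve.setdefault(cve_id, [])


def track_reachability(cve_id: str) -> Callable:
    """Wrap a function so that its first call records the caller for cve_id."""
    register_cve(cve_id)

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                reached = reached_by_cve[cve_id]
                if not reached:  # only the first occurrence is kept
                    caller = sys._getframe(1)  # frame that called the wrapped function
                    reached.append({
                        "path": caller.f_code.co_filename,
                        "method": caller.f_code.co_name,
                        "line": caller.f_lineno,
                    })
            except Exception:
                pass  # tracking must never break the wrapped call
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Decorating a function with `@track_reachability("CVE-...")` leaves its return value unchanged and fills the `reached` list exactly once, however often the function is called afterwards.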

### Telemetry payload (RFC v3)

```json
{
  "request_type": "app-dependencies-loaded",
  "payload": {
    "dependencies": [
      {
        "name": "requests",
        "version": "2.31.0",
        "metadata": [
          {
            "type": "reachability",
            "value": "{\"id\":\"CVE-2024-35195\",\"reached\":[{\"path\":\"myapp/views.py\",\"method\":\"handle_request\",\"line\":42}]}"
          }
        ]
      }
    ]
  }
}
```

Before any symbol is hit, CVEs are reported with `"reached":[]`. When a symbol executes, the first caller info is added to the `reached` array.
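
Note that `value` is a JSON document serialized into a string, so it is encoded twice on the wire. A minimal sketch of building such an entry, assuming only the field names visible in the payload above (`reachability_metadata` and `dependencies_payload` are hypothetical helper names):

```python
import json
from typing import Any, Dict, List


def reachability_metadata(cve_id: str, reached: List[Dict[str, Any]]) -> Dict[str, str]:
    """Build one metadata entry; `value` is itself a compact JSON string."""
    value = json.dumps({"id": cve_id, "reached": reached}, separators=(",", ":"))
    return {"type": "reachability", "value": value}


def dependencies_payload(name: str, version: str, metadata: List[Dict[str, str]]) -> Dict[str, Any]:
    """Wrap the metadata entries for one dependency in the request envelope."""
    return {
        "request_type": "app-dependencies-loaded",
        "payload": {
            "dependencies": [{"name": name, "version": version, "metadata": metadata}]
        },
    }
```

Calling `reachability_metadata("CVE-2024-35195", [])` gives the pre-hit form with `"reached":[]`; passing one caller record gives the post-hit form shown above.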

## Performance

Benchmark: `scripts/perf_bench_heartbeat_cycles.py`

Measures `collect_report()` execution time per heartbeat cycle under different scenarios. The overhead is paid only in the background telemetry thread (every 60s), never in user request paths.

- **SCA OFF (main)**: baseline on `main` branch — old `update_imported_dependencies()` path, no `DependencyTracker`, no re-report scan
- **SCA OFF**: this branch with `DD_APPSEC_SCA_ENABLED=false` — new `DependencyTracker` with `metadata=None`, re-report scan skipped
- **SCA ON**: this branch with `DD_APPSEC_SCA_ENABLED=true` — new `DependencyTracker` with `metadata=[]`
- **Overhead**: `(SCA ON − SCA OFF) / SCA OFF`
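
The overhead formula can be checked against the 1,000-dependency table with a few lines of Python. The inputs are the rounded values printed in that table, so the results may differ from the published percentages in the last digit.

```python
def overhead_percent(sca_off: float, sca_on: float) -> float:
    """Relative overhead of SCA ON over SCA OFF, in percent (same unit for both inputs)."""
    return (sca_on - sca_off) / sca_off * 100.0


# First heartbeat at 1,000 dependencies, both values in milliseconds.
first_heartbeat = overhead_percent(4.63, 5.10)
# Idle heartbeat at 1,000 dependencies, both values in microseconds.
idle = overhead_percent(72.2, 239.6)
```

Because the denominator is the SCA OFF time, a baseline of a few tens of microseconds makes even a sub-millisecond increase show up as a four-digit percentage.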

### 1,000 dependencies (typical large application)

| Heartbeat Cycle | SCA OFF (main) | SCA OFF | SCA ON | Overhead |
|---|---|---|---|---|
| First heartbeat (all new) | 4.56ms | 4.63ms | 5.10ms | **+10.1%** |
| Idle (nothing to report) | 0.2us | 72.2us | 239.6us | **+231.8%** |
| CVE registration (100 CVEs, reached=[]) | 0.3us | 73.5us | 1.20ms | **+1527.5%** |
| SCA hits (100 hits on 50 deps) | 0.3us | 73.7us | 1.44ms | **+1857.1%** |

### 10,000 dependencies (extreme scale)

| Heartbeat Cycle | SCA OFF (main) | SCA OFF | SCA ON | Overhead |
|---|---|---|---|---|
| First heartbeat (all new) | 53.15ms | 57.18ms | 62.12ms | **+8.6%** |
| Idle (nothing to report) | 0.3us | 742.6us | 2.20ms | **+195.7%** |
| CVE registration (1,000 CVEs, reached=[]) | 0.3us | 720.7us | 13.51ms | **+1774.4%** |
| SCA hits (1,000 hits on 500 deps) | 0.3us | 718.9us | 15.56ms | **+2064.2%** |

### Payload size

| Scenario (1,000 deps) | Entries | Size |
|---|---|---|
| First heartbeat SCA OFF | 1,000 | 62 KB |
| First heartbeat SCA ON | 1,000 | 62 KB |
| CVE registration (100 CVEs) | 50 | 10 KB |
| SCA hits (100 hits) | 50 | 16 KB |

### Memory overhead

| Scenario (1,000 deps) | SCA OFF | SCA ON | Delta |
|---|---|---|---|
| Idle heartbeat | 155.3 KB | 192.5 KB | +37.2 KB |
| CVE registration (100 CVEs) | 136.4 KB | 205.8 KB | +69.4 KB |
| SCA hits (100 hits) | 136.2 KB | 208.5 KB | +72.3 KB |

> **Note on overhead**: The percentage overhead for idle, CVE registration, and SCA hits appears very high because the SCA OFF baseline includes mock/benchmark harness overhead (~72us at 1K deps) that dominates the measurement. In absolute terms:
>
> - **First heartbeat**: nearly identical across all three columns (4.56ms → 4.63ms → 5.10ms at 1K deps), confirming the `DependencyTracker` refactor adds **minimal overhead** to initial dependency discovery.
> - **Idle heartbeat with SCA OFF**: ~72us includes benchmark mock overhead; measured independently at ~8us (lock acquisition + `get_newly_imported_modules()` + lazy config check). The re-report scan is **completely skipped** when SCA is disabled.
> - **SCA ON worst case** at 1,000 dependencies: **1.44ms per 60s heartbeat** — 0.002% of the cycle.
> - **SCA ON worst case** at 10,000 dependencies with 1,000 CVEs: **16ms** — 0.027% of the heartbeat interval.
>
> All overhead runs entirely in the background telemetry thread and does not affect user request latency.
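
The "share of the cycle" figures in the note follow from dividing the call duration by the heartbeat interval. A small sketch, assuming the default 60s interval stated earlier (`share_of_interval` is a hypothetical helper name):

```python
HEARTBEAT_INTERVAL_S = 60.0  # default heartbeat interval stated in the description


def share_of_interval(duration_ms: float, interval_s: float = HEARTBEAT_INTERVAL_S) -> float:
    """Percentage of one heartbeat interval spent in a single collect_report() call."""
    return duration_ms / (interval_s * 1000.0) * 100.0


worst_case_1k = share_of_interval(1.44)   # 1,000 dependencies, SCA hits row
worst_case_10k = share_of_interval(16.0)  # 10,000 dependencies, rounded up to 16ms
```

These evaluate to roughly 0.0024% and 0.027% respectively, in line with the figures quoted in the note.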

### SLO benchmark

A CI-integrated benchmark suite is available at `benchmarks/telemetry_dependencies/` with 8 scenarios covering first/idle/cve/hits phases at 100 and 1,000 dependencies. It is triggered automatically when `ddtrace/internal/telemetry/dependency*.py` or `ddtrace/appsec/sca/*` files change and compares performance between the base branch and this PR.

## Risks

- **Bytecode injection**: Uses the existing `inject_hook` infrastructure from dd-trace-py. The hook is exception-safe (wrapped in try/except) and never raises in user code.
- **Memory**: `DependencyEntry` objects add ~150 bytes per dependency vs plain strings. At 1,000 deps this is ~150KB total — negligible.
- **Lock contention**: The `DependencyTracker._lock` is held briefly during `attach_metadata` calls from the SCA hook. After the first hit per CVE (max reached=1), subsequent hook invocations short-circuit before any lock acquisition.
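
The short-circuit described under lock contention is a check-then-lock pattern. The sketch below illustrates the idea with a hypothetical `ReachedStore` class; it is not the `DependencyTracker` code, and it assumes CPython, where a single dictionary read is safe without holding the lock.

```python
import threading
from typing import Any, Dict, List


class ReachedStore:
    """Keeps at most one caller record per CVE."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._reached: Dict[str, List[Dict[str, Any]]] = {}
        self.lock_acquisitions = 0  # counter kept for illustration only

    def record_hit(self, cve_id: str, caller: Dict[str, Any]) -> bool:
        """Store the first caller for cve_id; return True only when it was stored."""
        # Fast path: a CVE that already has a record needs no lock.
        if self._reached.get(cve_id):
            return False
        with self._lock:
            self.lock_acquisitions += 1
            # Re-check under the lock: another thread may have won the race.
            entries = self._reached.setdefault(cve_id, [])
            if entries:
                return False
            entries.append(caller)
            return True
```

After the first successful `record_hit` for a CVE, later calls return before touching the lock, so a hot vulnerable function costs one dictionary lookup per invocation.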

## Additional Notes

- Merge this PR first: DataDog/datadog-lambda-python#761
- Previous model PR benchmarks: #17092
- The static CVE JSON (`_cve_data.json`) will be replaced by Remote Config in the long-term solution

Co-authored-by: alberto.vara <alberto.vara@datadoghq.com>
dubloom pushed a commit to DataDog/dd-trace-py that referenced this pull request Apr 21, 2026