
CPU Optimizations #3088

Merged
liquidsec merged 2 commits into dev from cpu-optimizations-2026
May 21, 2026

@liquidsec

Collaborator

Summary

Follow-up to the CPU inquisition in #2074. After the first round of fixes the top remaining offenders were weighted_shuffle, random.choices (called from weighted_shuffle), and ipaddress.ip_address. This PR flattens all three in pure Python before considering anything more invasive (e.g. Rust/PyO3).

Changes

  • weighted_shuffle — replace the O(n²) "normalize remaining weights and draw with random.choices" loop with Efraimidis–Spirakis: assign each item a single weighted random key, sort once. Same distribution over orderings, O(n log n). random.choices was almost entirely an internal call from this function and disappears from the hot path.
  • make_ip_type — add a cheap _looks_like_ip character-set pre-filter so hostnames (the vast majority of inputs) bail out before paying for ipaddress' parse + double ValueError round-trip. Wrap the function in lru_cache(16384) since the same hosts/IPs are seen by many modules during a scan.
  • cached_ip_address / cached_ip_network — new thin LRU-cached wrappers around the stdlib calls. Same semantics (exceptions are not cached, so invalid input still raises). Swapped into the direct callers that bypass make_ip_type (the IP_ADDRESS event class, the IP_ADDRESS / IP_RANGE event seeds, is_ip, dnsresolve) to absorb the ~5× amplification on ip_address calls visible in the original profile.
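
For readers unfamiliar with Efraimidis-Spirakis, the idea in the first bullet can be sketched roughly as below. This is an illustration rather than the code in this PR: the name `weighted_shuffle_sketch`, its signature, and the `rng` parameter are assumptions. It uses the key `log(u) / w`, which sorts in the same order as the textbook `u ** (1 / w)` but avoids underflow for very small weights.

```python
import math
import random


def weighted_shuffle_sketch(items, weights, rng=random):
    """Return items in weighted-random order (hypothetical helper, not BBOT's code).

    Higher-weight items tend to appear earlier. One random key per item,
    then a single sort: O(n log n) instead of O(n^2) repeated draws.
    """
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    keyed = []
    for item, weight in zip(items, weights):
        if weight <= 0:
            raise ValueError("weights must be positive")
        # 1 - random() lies in (0, 1], so log() never sees zero.
        u = 1.0 - rng.random()
        # log(u) / w is a monotonic transform of u ** (1 / w).
        keyed.append((math.log(u) / weight, item))
    # Largest key first; sorting on the key alone keeps items un-compared.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed]


if __name__ == "__main__":
    modules = ["httpx", "dnsresolve", "excavate", "speculate"]
    print(weighted_shuffle_sketch(modules, [5, 3, 1, 1]))
```

With two items weighted 9 and 1, the heavier item should come first in roughly 90% of runs, which is the same probability a single weighted draw would give it.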

Benchmarks

Measured against dev using the existing test_ipaddress_benchmarks.py and test_weighted_shuffle_benchmarks.py:

| Benchmark | Before | After | Speedup |
| --- | --- | --- | --- |
| make_ip_type (1700 valid IPs) | 6.88 ms | 107.5 µs | ~64× |
| mixed_ip_operations (1000 mixed) | 2.66 ms | 1.37 ms | ~1.9× |
| is_ip (1000 mixed) | 1.88 ms | 1.35 ms | ~1.4× |
| weighted_shuffle typical (n=20) | 31.66 µs | 3.27 µs | ~9.7× |
| weighted_shuffle priority (n=105) | 361.88 µs | 17.03 µs | ~21× |

The 64× on make_ip_type is the LRU cache showing its full effect on repeated inputs (matches real BBOT usage where the same IPs flow through many modules). The mixed/is_ip benchmarks have smaller gains because half their inputs are invalid — those raise ValueError, which lru_cache doesn't cache, so they re-take the slow path each round. The pre-filter still helps "random letters" inputs but not "malformed-but-IP-shaped" inputs.

Expected impact on the original #2074 profile: ip_address and random.choices should fall out of the top 50 entirely, like the regex hotspots did in the prior round.

- weighted_shuffle: replace O(n²) re-normalize-and-draw loop with
  Efraimidis-Spirakis (one biased random key per item, then sort).
  ~10x faster at n=20, ~21x at n=105.
- make_ip_type: lru_cache + cheap "looks like an IP" pre-filter so
  hostnames bail before paying for ipaddress' parse+exception path.
- Add cached_ip_address / cached_ip_network wrappers and use them in
  the IP_ADDRESS event class, event-seed helpers, is_ip, and
  dnsresolve to absorb the ~5x amplification on direct ip_address
  calls visible in #2074's CPU profile.
@liquidsec liquidsec changed the title from "CPU: O(n log n) weighted_shuffle + cached ip_address lookups" to "CPU Optimizations" May 12, 2026
@liquidsec liquidsec self-assigned this May 12, 2026
@codecov

codecov Bot commented May 12, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90%. Comparing base (4cfaa9c) to head (90a4776).
⚠️ Report is 61 commits behind head on dev.

Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #3088   +/-   ##
=====================================
- Coverage     90%     90%   -0%     
=====================================
  Files        444     444           
  Lines      38338   38354   +16     
=====================================
+ Hits       34277   34291   +14     
- Misses      4061    4063    +2     

@github-actions

github-actions Bot commented May 12, 2026

Contributor

📊 Performance Benchmark Report

Comparing dev (baseline) vs cpu-optimizations-2026 (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

| 🧪 Test Name | 📏 Base | 📏 Current | 📈 Change | 🎯 Status |
| --- | --- | --- | --- | --- |
| Bloom Filter Dns Mutation Tracking Performance | 4.24ms | 4.28ms | +0.8% | |
| Bloom Filter Large Scale Dns Brute Force | 17.92ms | 17.91ms | -0.0% | |
| Large Closest Match Lookup | 358.33ms | 361.52ms | +0.9% | |
| Realistic Closest Match Workload | 190.52ms | 189.39ms | -0.6% | |
| Event Memory Medium Scan | 1784 B/event | 1755 B/event | -1.6% | |
| Event Memory Large Scan | 1768 B/event | 1863 B/event | +5.3% | |
| Event Validation Full Scan Startup Small Batch | 421.90ms | 425.67ms | +0.9% | |
| Event Validation Full Scan Startup Large Batch | 593.98ms | 564.25ms | -5.0% | |
| Make Event Autodetection Small | 31.87ms | 25.60ms | -19.7% | 🟢🟢 🚀 |
| Make Event Autodetection Large | 321.52ms | 264.53ms | -17.7% | 🟢🟢 🚀 |
| Make Event Explicit Types | 14.20ms | 11.73ms | -17.3% | 🟢🟢 🚀 |
| Excavate Single Thread Small | 4.082s | 3.755s | -8.0% | |
| Excavate Single Thread Large | 9.596s | 9.270s | -3.4% | |
| Excavate Parallel Tasks Small | 4.242s | 3.955s | -6.8% | |
| Excavate Parallel Tasks Large | 6.654s | 6.423s | -3.5% | |
| Is Ip Performance | 3.21ms | 2.28ms | -28.7% | 🟢🟢🟢 🚀 |
| Make Ip Type Performance | 11.83ms | 232.53µs | -98.0% | 🟢🟢🟢 🚀 |
| Mixed Ip Operations | 4.56ms | 2.38ms | -47.9% | 🟢🟢🟢 🚀 |
| Memory Use Web Crawl | 656.5 MB | 632.2 MB | -3.7% | |
| Memory Use Subdomain Enum | 33.4 MB | 35.7 MB | +7.0% | |
| Memory Use Deep Chain | 7.8 MB | 7.7 MB | -1.0% | |
| Memory Use Parallel Chains | 20.9 MB | 22.9 MB | +9.8% | |
| Scan Throughput 100 | 4.493s | 4.297s | -4.4% | |
| Scan Throughput 1000 | 34.837s | 33.805s | -3.0% | |
| Typical Queue Shuffle | 65.13µs | 5.51µs | -91.5% | 🟢🟢🟢 🚀 |
| Priority Queue Shuffle | 737.57µs | 27.02µs | -96.3% | 🟢🟢🟢 🚀 |

🎯 Performance Summary

+ 8 improvements 🚀
  18 unchanged ✅

🔍 Significant Changes (>10%)

  • Make Event Autodetection Small: 19.7% 🚀 faster
  • Make Event Autodetection Large: 17.7% 🚀 faster
  • Make Event Explicit Types: 17.3% 🚀 faster
  • Is Ip Performance: 28.7% 🚀 faster
  • Make Ip Type Performance: 98.0% 🚀 faster
  • Mixed Ip Operations: 47.9% 🚀 faster
  • Typical Queue Shuffle: 91.5% 🚀 faster
  • Priority Queue Shuffle: 96.3% 🚀 faster

🐍 Python Version 3.11.15

@ausmaster ausmaster added this to the BBOT 3.0 - blazed_elijah milestone May 20, 2026
@ausmaster ausmaster linked an issue May 20, 2026 that may be closed by this pull request
@ausmaster ausmaster self-requested a review May 21, 2026 22:58
@liquidsec liquidsec merged commit ffc5e34 into dev May 21, 2026
20 checks passed
@liquidsec liquidsec mentioned this pull request Jun 9, 2026
@ausmaster ausmaster deleted the cpu-optimizations-2026 branch June 11, 2026 01:27

Development

Successfully merging this pull request may close these issues.

CPU Usage Inquisition

2 participants