CPU Optimizations by liquidsec · Pull Request #3088 · blacklanternsecurity/bbot

liquidsec · 2026-05-12T20:22:22Z

Summary

Follow-up to the CPU inquisition in #2074. After the first round of fixes the top remaining offenders were weighted_shuffle, random.choices (called from weighted_shuffle), and ipaddress.ip_address. This PR flattens all three in pure Python before considering anything more invasive (e.g. Rust/PyO3).

Changes

- `weighted_shuffle`: replace the O(n²) "normalize remaining weights and draw with `random.choices`" loop with Efraimidis–Spirakis: assign each item a single weighted random key, then sort once. Same distribution over orderings, O(n log n). `random.choices` was almost entirely an internal call from this function, so it disappears from the hot path.
- `make_ip_type`: add a cheap `_looks_like_ip` character-set pre-filter so hostnames (the vast majority of inputs) bail out before paying for `ipaddress`' parse + double `ValueError` round-trip. Wrap the function in `lru_cache(16384)`, since the same hosts/IPs are seen by many modules during a scan.
- `cached_ip_address` / `cached_ip_network`: new thin LRU-cached wrappers around the stdlib calls. Same semantics (exceptions are not cached, so invalid input still raises). Swapped into the direct callers that bypass `make_ip_type` (the IP_ADDRESS event class, the IP_ADDRESS / IP_RANGE event seeds, `is_ip`, `dnsresolve`) to absorb the ~5× amplification on `ip_address` calls visible in the original profile.
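
For readers unfamiliar with Efraimidis–Spirakis, here is a minimal sketch of the idea behind the `weighted_shuffle` change. It is not the code in this PR: the function name `weighted_shuffle_sketch`, the `rng` parameter, and the positive-weight check are assumptions made for illustration.

```python
import random


def weighted_shuffle_sketch(items, weights, rng=random):
    """Return items in random order, with heavier items tending to come first.

    Hypothetical illustration of Efraimidis-Spirakis, not BBOT's implementation.
    """
    keyed = []
    for item, weight in zip(items, weights):
        if weight <= 0:
            raise ValueError("weights must be positive")
        # rng.random() is in [0, 1), so 1 - rng.random() is in (0, 1]
        u = 1.0 - rng.random()
        # One weighted key per item: a larger weight pushes the key toward 1
        keyed.append((u ** (1.0 / weight), item))
    # A single sort replaces the repeated re-normalize-and-draw loop
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed]
```

Sorting by the key alone (rather than by the `(key, item)` tuple) means the items themselves never need to be comparable.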

Benchmarks

Measured against dev using the existing test_ipaddress_benchmarks.py and test_weighted_shuffle_benchmarks.py:

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| `make_ip_type` (1700 valid IPs) | 6.88 ms | 107.5 µs | ~64× |
| `mixed_ip_operations` (1000 mixed) | 2.66 ms | 1.37 ms | ~1.9× |
| `is_ip` (1000 mixed) | 1.88 ms | 1.35 ms | ~1.4× |
| `weighted_shuffle` typical (n=20) | 31.66 µs | 3.27 µs | ~9.7× |
| `weighted_shuffle` priority (n=105) | 361.88 µs | 17.03 µs | ~21× |

The 64× on make_ip_type is the LRU cache showing its full effect on repeated inputs (matches real BBOT usage where the same IPs flow through many modules). The mixed/is_ip benchmarks have smaller gains because half their inputs are invalid — those raise ValueError, which lru_cache doesn't cache, so they re-take the slow path each round. The pre-filter still helps "random letters" inputs but not "malformed-but-IP-shaped" inputs.
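
The caching behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the character set in `_IP_CHARS`, the helper name `looks_like_ip`, and the body of `make_ip_type_sketch` are all hypothetical; only `functools.lru_cache` and the `ipaddress` calls are real stdlib APIs.

```python
import ipaddress
from functools import lru_cache

# Assumed character set: hex digits plus the separators used in IPv4/IPv6/CIDR.
# IPv6 zone IDs are ignored to keep the sketch small.
_IP_CHARS = frozenset("0123456789abcdefABCDEF.:/")


def looks_like_ip(value):
    """Cheap pre-filter: reject strings that cannot possibly be an IP or network."""
    return bool(value) and all(ch in _IP_CHARS for ch in value)


@lru_cache(maxsize=16384)
def cached_ip_address(value):
    # lru_cache stores return values only; a raised ValueError is never cached,
    # so invalid input re-runs the parse (and raises) on every call.
    return ipaddress.ip_address(value)


@lru_cache(maxsize=16384)
def make_ip_type_sketch(value):
    """Return an ip_address / ip_network object, or the input string unchanged."""
    if not looks_like_ip(value):
        return value  # typical hostname: skips both parse attempts
    try:
        return ipaddress.ip_address(value)
    except ValueError:
        pass
    try:
        return ipaddress.ip_network(value, strict=False)
    except ValueError:
        return value  # IP-shaped but malformed: paid for both failed parses
```

In this sketch `"example.com"` is rejected by the pre-filter, while something like `"999.1.1.1"` passes it and still goes through both failed parses, which is the "malformed-but-IP-shaped" case mentioned above.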

Expected impact on the original #2074 profile: ip_address and random.choices should fall out of the top 50 entirely, like the regex hotspots did in the prior round.

- weighted_shuffle: replace O(n²) re-normalize-and-draw loop with Efraimidis-Spirakis (one biased random key per item, then sort). ~10x faster at n=20, ~21x at n=105.
- make_ip_type: lru_cache + cheap "looks like an IP" pre-filter so hostnames bail before paying for ipaddress' parse+exception path.
- Add cached_ip_address / cached_ip_network wrappers and use them in the IP_ADDRESS event class, event-seed helpers, is_ip, and dnsresolve to absorb the ~5x amplification on direct ip_address calls visible in #2074's CPU profile.

codecov · 2026-05-12T21:02:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90%. Comparing base (4cfaa9c) to head (90a4776).
⚠️ Report is 61 commits behind head on dev.

Additional details and impacted files

@@          Coverage Diff          @@
##             dev   #3088   +/-   ##
=====================================
- Coverage     90%     90%   -0%     
=====================================
  Files        444     444           
  Lines      38338   38354   +16     
=====================================
+ Hits       34277   34291   +14     
- Misses      4061    4063    +2

github-actions · 2026-05-12T21:06:25Z

📊 Performance Benchmark Report

Comparing dev (baseline) vs cpu-optimizations-2026 (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

| 🧪 Test Name | 📏 Base | 📏 Current | 📈 Change | 🎯 Status |
|---|---|---|---|---|
| Bloom Filter Dns Mutation Tracking Performance | `4.24ms` | `4.28ms` | +0.8% ⚪ | ✅ |
| Bloom Filter Large Scale Dns Brute Force | `17.92ms` | `17.91ms` | -0.0% ⚪ | ✅ |
| Large Closest Match Lookup | `358.33ms` | `361.52ms` | +0.9% ⚪ | ✅ |
| Realistic Closest Match Workload | `190.52ms` | `189.39ms` | -0.6% ⚪ | ✅ |
| Event Memory Medium Scan | `1784 B/event` | `1755 B/event` | -1.6% ⚪ | ✅ |
| Event Memory Large Scan | `1768 B/event` | `1863 B/event` | +5.3% ⚪ | ✅ |
| Event Validation Full Scan Startup Small Batch | `421.90ms` | `425.67ms` | +0.9% ⚪ | ✅ |
| Event Validation Full Scan Startup Large Batch | `593.98ms` | `564.25ms` | -5.0% ⚪ | ✅ |
| Make Event Autodetection Small | `31.87ms` | `25.60ms` | -19.7% 🟢🟢 | 🚀 |
| Make Event Autodetection Large | `321.52ms` | `264.53ms` | -17.7% 🟢🟢 | 🚀 |
| Make Event Explicit Types | `14.20ms` | `11.73ms` | -17.3% 🟢🟢 | 🚀 |
| Excavate Single Thread Small | `4.082s` | `3.755s` | -8.0% ⚪ | ✅ |
| Excavate Single Thread Large | `9.596s` | `9.270s` | -3.4% ⚪ | ✅ |
| Excavate Parallel Tasks Small | `4.242s` | `3.955s` | -6.8% ⚪ | ✅ |
| Excavate Parallel Tasks Large | `6.654s` | `6.423s` | -3.5% ⚪ | ✅ |
| Is Ip Performance | `3.21ms` | `2.28ms` | -28.7% 🟢🟢🟢 | 🚀 |
| Make Ip Type Performance | `11.83ms` | `232.53µs` | -98.0% 🟢🟢🟢 | 🚀 |
| Mixed Ip Operations | `4.56ms` | `2.38ms` | -47.9% 🟢🟢🟢 | 🚀 |
| Memory Use Web Crawl | `656.5 MB` | `632.2 MB` | -3.7% ⚪ | ✅ |
| Memory Use Subdomain Enum | `33.4 MB` | `35.7 MB` | +7.0% ⚪ | ✅ |
| Memory Use Deep Chain | `7.8 MB` | `7.7 MB` | -1.0% ⚪ | ✅ |
| Memory Use Parallel Chains | `20.9 MB` | `22.9 MB` | +9.8% ⚪ | ✅ |
| Scan Throughput 100 | `4.493s` | `4.297s` | -4.4% ⚪ | ✅ |
| Scan Throughput 1000 | `34.837s` | `33.805s` | -3.0% ⚪ | ✅ |
| Typical Queue Shuffle | `65.13µs` | `5.51µs` | -91.5% 🟢🟢🟢 | 🚀 |
| Priority Queue Shuffle | `737.57µs` | `27.02µs` | -96.3% 🟢🟢🟢 | 🚀 |

🎯 Performance Summary

+ 8 improvements 🚀
  18 unchanged ✅

🔍 Significant Changes (>10%)

Make Event Autodetection Small: 19.7% 🚀 faster
Make Event Autodetection Large: 17.7% 🚀 faster
Make Event Explicit Types: 17.3% 🚀 faster
Is Ip Performance: 28.7% 🚀 faster
Make Ip Type Performance: 98.0% 🚀 faster
Mixed Ip Operations: 47.9% 🚀 faster
Typical Queue Shuffle: 91.5% 🚀 faster
Priority Queue Shuffle: 96.3% 🚀 faster
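
The CI report gives percentage changes while the PR description gives speedup factors. A small sketch of the conversion, using the "Make Ip Type Performance" and "Typical Queue Shuffle" rows from the report (the helper names are hypothetical, and CI timings differ from the locally measured numbers in the description):

```python
def percent_change(base, current):
    """Signed change relative to the baseline, in percent (negative means faster)."""
    return (current - base) / base * 100.0


def speedup(base, current):
    """How many times faster the current run is than the baseline."""
    return base / current


# Make Ip Type Performance: 11.83 ms -> 232.53 µs (both expressed in ms here)
make_ip_pct = percent_change(11.83, 0.23253)
make_ip_x = speedup(11.83, 0.23253)

# Typical Queue Shuffle: 65.13 µs -> 5.51 µs
shuffle_pct = percent_change(65.13, 5.51)
shuffle_x = speedup(65.13, 5.51)

print(f"make_ip_type: {make_ip_pct:.1f}% ({make_ip_x:.1f}x)")
print(f"typical shuffle: {shuffle_pct:.1f}% ({shuffle_x:.1f}x)")
```

A change of about -98% therefore corresponds to roughly a 50× speedup on the CI runner, in the same range as the ~64× measured locally.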

🐍 Python Version 3.11.15

liquidsec changed the title ~~CPU: O(n log n) weighted_shuffle + cached ip_address lookups~~ CPU Optimizations May 12, 2026

liquidsec self-assigned this May 12, 2026

Merge branch 'dev' into cpu-optimizations-2026

90a4776

ausmaster added this to the BBOT 3.0 - blazed_elijah milestone May 20, 2026

ausmaster linked an issue May 20, 2026 that may be closed by this pull request

CPU Usage Inquisition #2074

Closed

ausmaster self-requested a review May 21, 2026 22:58

ausmaster approved these changes May 21, 2026

liquidsec merged commit ffc5e34 into dev May 21, 2026
20 checks passed

liquidsec mentioned this pull request Jun 9, 2026

Dev -> Stable 3.0 #3079

Open

ausmaster deleted the cpu-optimizations-2026 branch June 11, 2026 01:27

liquidsec merged 2 commits into dev from cpu-optimizations-2026