Fix graph orphans for omitted event types#3038
Conversation
📊 Performance Benchmark Report
📈 Detailed Results (All Benchmarks)
🎯 Performance Summary✅ No significant performance changes detected (all changes <10%) 🐍 Python Version 3.11.15 |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## dev #3038 +/- ##
=====================================
- Coverage 90% 90% -0%
=====================================
Files 446 446
Lines 38377 38398 +21
=====================================
+ Hits 34319 34336 +17
- Misses 4058 4062 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This was intended functionality. graph importance is meant to override internal events, but not omitted ones. omitted ones should always be excluded from output, and that should never create orphans because the omitted event is skipped, preserving the parent chain. The mystery is why JS URLs specifically seem to be slipping through this system. |
|
Need to add additional tests to try to replicate the original bug to understand if its actually been fixed or not |
So it turns out the "fix" i added never mattered... The _omit=True AND _graph_important=True combination the fix targets is unreachable through normal scan flow. I think this was fixed somewhere else since the issue was opened. I reverted the fix but added a regression test to make sure we don't enter this state again. |
The original change in this PR allowed _omit=True events into output if they were also _graph_important. After tracing every assignment, that combination is structurally unreachable through normal scan flow — _graph_important is only ever set in manager.py:254 against event.get_parent(), which by design skips _omit parents. The black-box reproducer (TestJSONGraphOrphanJSURL) walks a real JS URL chain end-to-end and confirms no parent_uuid orphans appear in output.json with or without the change.
Summary
Closes #1151
_is_graph_important()was rejecting events with_omit=Trueeven when they were marked as_graph_important=True. This caused graph orphans in JSON/neo4j output when an omitted event type (e.g. a JSURL_UNVERIFIED) was the parent of a non-omitted event (e.g.DNS_NAME).The
_omitcheck in_is_graph_important()was redundant — output modules without_preserve_graph(stdout, txt, csv) already returnFalsefrom_is_graph_important()becauseself.preserve_graphisFalse. Only modules with_preserve_graph=True(json, neo4j) use graph importance, and those are exactly the modules that need to accept omitted parents to prevent orphans.Fix
Remove
and not getattr(event, "_omit", False)from_is_graph_important().Test
Added test verifying that a
_preserve_graphoutput module accepts graph-important omitted events, while a non-_preserve_graphmodule still rejects them.