Process-to-network attribution for sandbox analyses #2987
Code Review
This pull request implements a process-to-network attribution system and TLS decryption for CAPE. It introduces a Windows auxiliary module for ETW network capture, a decryptpcap processing module using GoGoRoboCap, and an AttributionIndex to correlate network flows with processes from Sysmon, ETW, and Sigma sources. The resultserver was updated to support periodic file replacements, and UI templates now display process attribution. Review feedback identified casing inconsistencies in Sigma event field names, suggested adding the IPv6 unspecified address to network filters, and recommended normalizing hostnames and stripping whitespace during data extraction for better consistency.
if ev.get("EventID") == 3 and ev.get("ProcessID") is not None:
    image = ev.get("Image", "")
    idx.add_pid_name(ev.get("ProcessID"), image)
    idx.add_connection(
        pid=ev.get("ProcessID"),
There is a casing inconsistency for the process ID field in Sigma matched events. While other parts of the module use ProcessId (matching Sysmon's casing), this loop uses ProcessID. This will likely result in a lookup failure if the Sigma engine preserves the original Sysmon field names.
-        if ev.get("EventID") == 3 and ev.get("ProcessID") is not None:
-            image = ev.get("Image", "")
-            idx.add_pid_name(ev.get("ProcessID"), image)
-            idx.add_connection(
-                pid=ev.get("ProcessID"),
+for det in sigma.get("detections", []) or []:
+    for ev in det.get("matched_events", []) or []:
+        if ev.get("EventID") == 3 and ev.get("ProcessId") is not None:
+            image = ev.get("Image", "")
+            idx.add_pid_name(ev.get("ProcessId"), image)
+            idx.add_connection(
+                pid=ev.get("ProcessId"),
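When it is unclear which spelling a given event source emits, one defensive option is a lookup that tries several spellings in order. The sketch below is illustrative only: `get_field` is a hypothetical helper, not part of this PR, and the sample events are made up.

```python
def get_field(event, *names, default=None):
    """Return the first non-None value among several spellings of a key."""
    for name in names:
        value = event.get(name)
        if value is not None:
            return value
    return default


# Hypothetical events: one with Sysmon-style casing, one with "ProcessID".
sysmon_style = {"EventID": 3, "ProcessId": 4321}
sigma_style = {"EventID": 3, "ProcessID": 4321}

pid_a = get_field(sysmon_style, "ProcessID", "ProcessId")
pid_b = get_field(sigma_style, "ProcessID", "ProcessId")
```

Both lookups yield the same PID regardless of casing, at the cost of hiding which spelling the source really uses.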
for ev in det.get("matched_events", []) or []:
    if ev.get("EventID") != 22:
        continue
    pid = ev.get("ProcessID")

seen_procs = set()
procs = []
for ev in det.get("matched_events", []) or []:
    pid = ev.get("ProcessID")

    return True
if dst_port in self._filter_ports or src_port in self._filter_ports:
    return True
if dst_ip in ("127.0.0.1", "::1", "0.0.0.0", ""):
There was a problem hiding this comment.
    return
pid = str(pid)
dst_ip = _clean_ip(dst_ip)
if not dst_ip or dst_ip in ("127.0.0.1", "::1", "0.0.0.0"):
There was a problem hiding this comment.
def for_http(self, host, uri):
    """(pid, name) from an already-enriched HTTP transaction. Prefer an
    exact (host, uri) match; fall back to host alone; finally DNS."""
    if host and uri:
        hit = self._http_by_uri.get((host, uri))
        if hit:
            return hit
    if host:
        hit = self._http_by_host.get(host)
        if hit:
            return hit
    return self.for_host(host)

def set_http_owner(self, host, uri, pid, name):
    """Register an attributed HTTP transaction for subsequent files lookup."""
    if not pid:
        return
    pid = str(pid)
    if host and uri:
        self._http_by_uri.setdefault((host, uri), (pid, name))
    if host:
        self._http_by_host.setdefault(host, (pid, name))
The host parameter should be normalized to lowercase in both for_http and set_http_owner to ensure consistent lookups, as hostnames are case-insensitive. Currently, for_host (called as a fallback) performs normalization, but the direct dictionary lookups in _http_by_uri and _http_by_host do not.
def for_http(self, host, uri):
    """(pid, name) from an already-enriched HTTP transaction. Prefer an
    exact (host, uri) match; fall back to host alone; finally DNS."""
    if host:
        host = host.lower()
    if host and uri:
        hit = self._http_by_uri.get((host, uri))
        if hit:
            return hit
    if host:
        hit = self._http_by_host.get(host)
        if hit:
            return hit
    return self.for_host(host)

def set_http_owner(self, host, uri, pid, name):
    """Register an attributed HTTP transaction for subsequent files lookup."""
    if not pid:
        return
    pid = str(pid)
    if host:
        host = host.lower()
    if host and uri:
        self._http_by_uri.setdefault((host, uri), (pid, name))
    if host:
        self._http_by_host.setdefault(host, (pid, name))

# normalise to "" so callers can distinguish "missing" via .get()
# default vs "present but empty" via "" — same as before, but now
# entity-decoded (&amp; → &, &lt; → <, &#xNN; → unicode char).
out[name] = d.text or ""

seen_procs.add(key)
procs.append({
    "pid": pid,
    "process_name": image,
Pull request overview
This PR adds end-to-end process-to-network attribution so CAPE’s network artifacts (Suricata outputs + network.* artifacts) can be traced back to the originating Windows process, and introduces optional decrypted/mixed PCAP handling for downstream network processing.
Changes:
- Add a unified host-side `AttributionIndex` processor that correlates Sysmon, ETW, Sigma, and DNS resolution signals and enriches Suricata + `network.*` records with process attribution.
- Add a `decryptpcap` processing module plus a `pcapsrc` selection knob so `network` and `suricata` can analyze original/mixed/decrypted PCAP variants.
- Harden/adjust resultserver uploads to allow periodic overwrite of specific auxiliary logs and update UI templates to display attribution columns/badges.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| web/templates/analysis/network/_suricata_http.html | Adds process attribution column to Suricata HTTP table and aligns field name to http_method. |
| web/templates/analysis/network/_suricata_files.html | Adds a process attribution row for Suricata extracted files. |
| web/templates/analysis/network/_hosts.html | Fixes ASN column rendering and switches host attribution display to multi-process badges. |
| tests/test_network_capture_integration.py | Adds integration-style tests for config wiring, replaceable uploads, PCAP selection, and attribution index behavior. |
| modules/processing/suricata.py | Adds PCAP path resolution via pcapsrc before Suricata runs. |
| modules/processing/network_etw.py | New processing module implementing attribution index build + enrichment loops. |
| modules/processing/network.py | Adds PCAP path resolution via pcapsrc before network processing runs. |
| modules/processing/decryptpcap.py | New processing module producing decrypted/mixed PCAPs via GoGoRoboCap (+ mergecap). |
| lib/cuckoo/core/resultserver.py | Adds replaceable-upload allowlist + fix to banned-char replacement so periodic aux uploads can overwrite safely. |
| conf/default/processing.conf.default | Adds config knobs for pcapsrc, plus decryptpcap and network_etw processing sections (disabled by default). |
| conf/default/auxiliary.conf.default | Adds network_etw auxiliary module toggle (disabled by default). |
| analyzer/windows/modules/auxiliary/network_etw.py | New Windows auxiliary capturing Kernel-Network ETW connect/send events with periodic uploads. |
| analyzer/windows/modules/auxiliary/evtx.py | Adjusts stop priority so EVTX snapshots are taken later in teardown. |
self.output_dir = os.path.join("C:\\", random_string(5, 10))
try:
    os.mkdir(self.output_dir)
except FileExistsError:
    pass

self.log_file_path = os.path.join(self.output_dir, "%s.log" % random_string(5, 10))
output_dir is created under C:\ for each analysis but is never removed. Even on disposable VMs this can accumulate during long runs or on systems that don’t revert snapshots. Consider deleting self.output_dir (and handling errors) in upload_results() after the final upload succeeds.
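A best-effort cleanup along these lines could look like the sketch below. `remove_output_dir` is a hypothetical helper name, and the demonstration uses a temporary directory rather than a directory under C:\; how the auxiliary actually wires this into `upload_results()` is not shown here.

```python
import os
import shutil
import tempfile


def remove_output_dir(path):
    """Best-effort removal of a per-analysis scratch directory.

    Returns True when the directory is gone afterwards, False otherwise.
    """
    try:
        shutil.rmtree(path)
    except FileNotFoundError:
        pass  # already gone; nothing to clean up
    except OSError:
        return False  # e.g. a log file is still held open by another process
    return not os.path.exists(path)


# Demonstration on a throwaway directory.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "capture.log"), "w") as fh:
    fh.write("example")
removed = remove_output_dir(scratch)
```

Returning a boolean instead of raising keeps a failed cleanup from masking a successful final upload.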
def _parse_kernel_network_etw(self, pid_to_name):
    """Parse aux/network_etw.json from the Microsoft-Windows-Kernel-Network
    ETW provider (captured by the dns_etw auxiliary at analysis time)."""
Docstring mismatch: this parser reads aux/network_etw.json, which is produced by the network_etw auxiliary, but the comment says it’s captured by dns_etw. This is confusing when debugging attribution sources; update the docstring to reference the correct auxiliary module.
-    ETW provider (captured by the dns_etw auxiliary at analysis time)."""
+    ETW provider (captured by the network_etw auxiliary at analysis time)."""

<th>Timestamp</th>
<th>Process</th>
<th>Source IP</th>
PR description says the new process-attribution UI columns are gated by NETWORK_PROC_MAP, but this template renders the Process column unconditionally. If NETWORK_PROC_MAP is meant to control whether process attribution is shown (as in other network templates), wrap this column/cell in the same {% if settings.NETWORK_PROC_MAP %} guard to avoid unexpected UI changes when the setting is off.
<tr>
    <th>Process</th>
    <td>{% if file.process_name %}<span class="badge bg-warning text-dark" title="PID: {{file.process_id}}">{{file.process_name}} ({{file.process_id}})</span>{% else %}<span class="text-muted">-</span>{% endif %}</td>
</tr>
PR description says process-attribution UI is gated by NETWORK_PROC_MAP, but this Process row is always rendered. If the intention is to respect that setting (consistent with other network templates), add the {% if settings.NETWORK_PROC_MAP %} guard here as well.
    title="{% if p.source %}source: {{p.source}}{% endif %}{% if p.resolved_hostname %} | resolved via {{p.resolved_hostname}}{% endif %}{% if p.protocol %} | {{p.protocol}}{% endif %}{% if p.dst_port %}:{{p.dst_port}}{% endif %}">
    {% if p.process_name %}{{ p.process_name }}{% else %}(unknown){% endif %}{% if p.pid %} ({{ p.pid }}){% endif %}
</span>
{% endfor %}
The Hosts template now only renders process attribution from host.processes, but the existing process_map enrichment in modules/processing/network.py populates host.process_name / host.process_id (not host.processes). With NETWORK_PROC_MAP enabled and network_etw disabled (default), this will regress the UI to show “-” even when process info is available. Consider falling back to host.process_name/host.process_id when host.processes is missing or empty.
-{% endfor %}
+{% endfor %}
+{% elif host.process_name or host.process_id %}
+<span class="badge bg-warning text-dark">
+{% if host.process_name %}{{ host.process_name }}{% else %}(unknown){% endif %}{% if host.process_id %} ({{ host.process_id }}){% endif %}
+</span>

def _stub_module(name):
    module = ModuleType(name)
    sys.modules.setdefault(name, module)
    return module


gevent_mod = _stub_module("gevent")
gevent_mod.__path__ = []
gevent_pool_mod = _stub_module("gevent.pool")
gevent_server_mod = _stub_module("gevent.server")
gevent_socket_mod = _stub_module("gevent.socket")
gevent_thread_mod = _stub_module("gevent.thread")
gevent_mod.pool = gevent_pool_mod
gevent_mod.server = gevent_server_mod
gevent_mod.socket = gevent_socket_mod
gevent_mod.thread = gevent_thread_mod
gevent_server_mod.StreamServer = object
gevent_pool_mod.Pool = object
This test module stubs gevent (and other CAPE modules) by mutating sys.modules at import/collection time. If the real modules haven’t been imported yet, this can leak into other tests and cause them to run against the stubs (or break imports) depending on collection order. Prefer using monkeypatch/fixtures to inject stubs only for the specific tests and ensure they’re undone afterward (or only stub when the dependency is actually missing).
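One way to scope such stubs is `unittest.mock.patch.dict`, which restores `sys.modules` when the block exits; pytest's `monkeypatch.setitem(sys.modules, name, stub)` achieves the same inside a fixture. The sketch below is only an illustration: `make_stub` and the module name `fake_dep_for_demo` are hypothetical and do not appear in the PR.

```python
import sys
from types import ModuleType
from unittest.mock import patch


def make_stub(name, **attrs):
    """Build a throwaway module object carrying the given attributes."""
    module = ModuleType(name)
    for key, value in attrs.items():
        setattr(module, key, value)
    return module


# "fake_dep_for_demo" is a made-up dependency name used only for this demo.
stub = make_stub("fake_dep_for_demo", Pool=object)

with patch.dict(sys.modules, {"fake_dep_for_demo": stub}):
    import fake_dep_for_demo  # resolves to the stub inside the block
    seen_inside = fake_dep_for_demo.Pool is object

# On exit, patch.dict restores sys.modules to its previous contents.
leaked = "fake_dep_for_demo" in sys.modules
```

Because the stub is removed on exit, tests collected later import the real dependency (or fail loudly if it is missing) instead of silently running against the stub.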
fake_fd = mock_open().return_value

with patch("lib.cuckoo.core.resultserver.path_exists", return_value=True), patch(
    "lib.cuckoo.core.resultserver.open", mock_open()
) as patched_open:
    upload.handle()

patched_open.assert_any_call(str(tmp_path / "aux/network_etw.json"), "wb")
fake_fd.write.assert_not_called()
fake_fd here is created from a different mock_open() instance than the one actually patched into open, so fake_fd.write.assert_not_called() is not asserting anything meaningful. Use the handle returned by the patched open (e.g., patched_open()), and assert on expected writes / truncate behavior, or assert that open_exclusive was not called for replaceable uploads.
-fake_fd = mock_open().return_value
-with patch("lib.cuckoo.core.resultserver.path_exists", return_value=True), patch(
-    "lib.cuckoo.core.resultserver.open", mock_open()
-) as patched_open:
-    upload.handle()
-patched_open.assert_any_call(str(tmp_path / "aux/network_etw.json"), "wb")
-fake_fd.write.assert_not_called()
+open_mock = mock_open()
+with patch("lib.cuckoo.core.resultserver.path_exists", return_value=True), patch(
+    "lib.cuckoo.core.resultserver.open", open_mock
+) as patched_open:
+    upload.handle()
+patched_open.assert_any_call(str(tmp_path / "aux/network_etw.json"), "wb")
+patched_open().write.assert_not_called()
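The underlying point, that each `mock_open()` call produces an independent mock with its own file handle, can be checked in isolation. In this sketch `write_report` is a hypothetical stand-in for the code under test, and `builtins.open` is patched rather than the resultserver module's `open`.

```python
from unittest.mock import mock_open, patch


def write_report(path, payload):
    """Stand-in for code under test: truncate-write a payload to a path."""
    with open(path, "wb") as fh:
        fh.write(payload)


open_mock = mock_open()
unrelated_handle = mock_open().return_value  # belongs to a different mock

with patch("builtins.open", open_mock):
    write_report("aux/network_etw.json", b"{}")

# The handle tied to the patched mock records the write...
open_mock.assert_called_once_with("aux/network_etw.json", "wb")
open_mock().write.assert_called_once_with(b"{}")
# ...while a handle from a separate mock_open() never sees it.
unrelated_handle.write.assert_not_called()
```

An `assert_not_called()` on the unrelated handle therefore always passes, whatever the code under test does.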
Tie every network artifact CAPE captures (suricata alerts/tls/http/files,
network.tcp/udp/dns/hosts) back to the originating Windows process so
analysts and downstream signatures can answer "which process did this?"
without manual correlation work.
Sources, in confidence order:
1. Sysmon Event ID 3 (NetworkConnect) from evtx.zip — full image path,
destination hostname, src/dst 5-tuple
2. Microsoft-Windows-Kernel-Network ETW captured live by a new analyzer
auxiliary, periodically uploaded so attribution survives crashes
3. Sigma EID 3 matched events — tertiary catch for late-fire flows
4. DNS-Client ETW (originating-process DNS) cross-referenced with
suricata.dns/network.dns/network.hosts/sigma EID 22 QueryResults
to attribute IPs we never saw a direct connect for. Avoids the
"everything routes to svchost (dnscache)" failure mode.
5. Sysmon EID 22 (DnsQuery) — covers queries that fired before the
DNS-ETW auxiliary subscribed (early CDN resolutions, etc.)
6. Sysmon EID 1 (ProcessCreate) — names processes that aren't
monitored by capemon but show up in connection/DNS data
A single AttributionIndex consumes all sources; per-target enrichment
(suricata.alerts/tls/http/files, network.tcp/udp/dns/hosts, sigma
detections) goes through one of four query methods. 5-tuple matching
with src_port disambiguates multi-process flows to the same destination.
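To illustrate why the source port matters, here is a toy index that tries an exact 5-tuple match and only then falls back to the destination. `FlowIndex` and its method names are invented for this sketch and are not the `AttributionIndex` API; the addresses are documentation examples.

```python
class FlowIndex:
    """Toy lookup: exact 5-tuple first, then destination-only fallback."""

    def __init__(self):
        # (proto, src_ip, src_port, dst_ip, dst_port) -> (pid, name)
        self._by_tuple = {}
        # (proto, dst_ip, dst_port) -> (pid, name)
        self._by_dst = {}

    def add_connection(self, pid, name, proto, src_ip, src_port, dst_ip, dst_port):
        owner = (str(pid), name)
        self._by_tuple.setdefault((proto, src_ip, src_port, dst_ip, dst_port), owner)
        # First writer wins for the coarse key, so it is only a fallback.
        self._by_dst.setdefault((proto, dst_ip, dst_port), owner)

    def for_flow(self, proto, src_ip, src_port, dst_ip, dst_port):
        hit = self._by_tuple.get((proto, src_ip, src_port, dst_ip, dst_port))
        if hit:
            return hit
        return self._by_dst.get((proto, dst_ip, dst_port))


idx = FlowIndex()
idx.add_connection(100, "a.exe", "tcp", "10.0.0.5", 49700, "203.0.113.7", 443)
idx.add_connection(200, "b.exe", "tcp", "10.0.0.5", 49701, "203.0.113.7", 443)

owner_a = idx.for_flow("tcp", "10.0.0.5", 49700, "203.0.113.7", 443)
owner_b = idx.for_flow("tcp", "10.0.0.5", 49701, "203.0.113.7", 443)
owner_unknown_port = idx.for_flow("tcp", "10.0.0.5", 50000, "203.0.113.7", 443)
```

Without `src_port` in the key, both flows to 203.0.113.7:443 would collapse onto whichever process was registered first, which is exactly what the fallback does for the unknown port.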
New analyzer aux:
analyzer/windows/modules/auxiliary/network_etw.py
Captures TCP/UDP connect events from Microsoft-Windows-Kernel-Network
with periodic full-snapshot uploads. Off by default in
auxiliary.conf.default — opt-in with [Network_ETW] enabled = yes.
Stop-priority on Network_ETW + Evtx:
Set stop_priority = -20 so they shut down AFTER the capemon-related
auxiliaries. Late-fire C2 callbacks that fire between the analysis-
stop signal and VM teardown still get captured + attributed.
Result-server transport:
resultserver.py refactor (allowlisted RESULT_UPLOADABLE +
RESULT_DIRECTORIES) tightens path-traversal protection and adds an
is_replaceable_result_upload() helper so periodic re-uploads from
the auxiliaries (tlsdump.log, dns_etw.json, network_etw.json,
wmi_etw.json, sslkeylogfile/sslkeys.log) truncate-write instead of
silently failing with EEXIST after the first upload.
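The EEXIST behaviour can be reproduced with plain file modes: exclusive create refuses an existing file, while a truncating write replaces it. The sketch assumes a POSIX host; `store_upload` and the `REPLACEABLE` set are hypothetical stand-ins, not the resultserver implementation.

```python
import os
import tempfile

# Hypothetical allowlist for this sketch only.
REPLACEABLE = {"aux/network_etw.json", "aux/dns_etw.json"}


def store_upload(root, rel_path, payload):
    """Write an upload, allowing overwrite only for allowlisted paths."""
    target = os.path.join(root, *rel_path.split("/"))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # "xb" fails with FileExistsError (EEXIST) if the file is already there;
    # "wb" truncates and rewrites it.
    mode = "wb" if rel_path in REPLACEABLE else "xb"
    with open(target, mode) as fh:
        fh.write(payload)


root = tempfile.mkdtemp()
store_upload(root, "aux/network_etw.json", b"snapshot-1-long")
store_upload(root, "aux/network_etw.json", b"snapshot-2")
with open(os.path.join(root, "aux", "network_etw.json"), "rb") as fh:
    latest = fh.read()

store_upload(root, "files/sample.bin", b"first")
try:
    store_upload(root, "files/sample.bin", b"second")
    second_write_rejected = False
except FileExistsError:
    second_write_rejected = True
```

Truncation matters for full-snapshot uploads: a shorter second snapshot must not leave trailing bytes from the first.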
Decryption pipeline:
decryptpcap.py wraps gogorobocap to produce dump_decrypted.pcap +
dump_mixed.pcap. network.py + suricata.py honour a new pcapsrc
config knob (auto/original/mixed/decrypted) so users explicitly
choose which pcap variant to analyse.
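A resolution step for such a knob might look roughly like this. The names `dump_decrypted.pcap` and `dump_mixed.pcap` come from the text above; `dump.pcap` for the original capture, the preference order used for `auto`, and the `resolve_pcap` helper itself are assumptions made for the sketch.

```python
import os
import tempfile

# "dump.pcap" and AUTO_ORDER are assumptions, not taken from the PR.
VARIANTS = {
    "original": "dump.pcap",
    "mixed": "dump_mixed.pcap",
    "decrypted": "dump_decrypted.pcap",
}
AUTO_ORDER = ("mixed", "decrypted", "original")


def resolve_pcap(analysis_dir, pcapsrc="auto"):
    """Return the path of the PCAP variant to analyse, or None if absent."""
    choices = AUTO_ORDER if pcapsrc == "auto" else (pcapsrc,)
    for choice in choices:
        name = VARIANTS.get(choice)
        if name is None:
            raise ValueError("unknown pcapsrc: %r" % (pcapsrc,))
        candidate = os.path.join(analysis_dir, name)
        if os.path.isfile(candidate):
            return candidate
    return None


analysis_dir = tempfile.mkdtemp()
open(os.path.join(analysis_dir, "dump.pcap"), "wb").close()
auto_before = resolve_pcap(analysis_dir)
open(os.path.join(analysis_dir, "dump_mixed.pcap"), "wb").close()
auto_after = resolve_pcap(analysis_dir)
explicit_missing = resolve_pcap(analysis_dir, "decrypted")
```

Returning `None` for an explicitly requested but missing variant leaves the decision to skip or fall back with the caller rather than silently analysing a different file.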
UI:
_suricata_http.html, _suricata_files.html, _hosts.html render
process attribution columns/rows, gated on the existing
NETWORK_PROC_MAP setting. _hosts.html shows multi-process attribution
as badges with hover tooltips that explain the attribution chain
(direct connect / DNS-resolved / etc). Pre-existing "asn cell skipped
when empty causes column shift" bug fixed in the same edit.
Tests:
tests/test_network_capture_integration.py covers AttributionIndex
build + query, pcapsrc resolution, and replaceable-upload behaviour.
c77c422 to 6bc7f95
* Gate _suricata_http.html and _suricata_files.html Process columns on NETWORK_PROC_MAP (consistent with _hosts.html and per the PR description).
* _hosts.html: fall back to legacy host.process_name / host.process_id when host.processes is missing (preserves existing process_map enrichment for users who don't run the network_etw module).
* network_etw.py: include IPv6 unspecified address "::" in the localhost filter; lowercase hostnames in for_http / set_http_owner; strip whitespace from XML element text in _read_evt_data; correct docstring on _parse_kernel_network_etw (was naming the wrong auxiliary); store basename + path separately when hoisting sigma matched_events processes.
* analyzer/network_etw.py: same IPv6 "::" filter; clean up the random C:\<dir> output directory after the final upload so it doesn't accumulate on VMs that aren't reverted from snapshot.
* test_network_capture_integration.py: assert against the actually-patched open mock (and that open_exclusive is NOT called for replaceable uploads); document the sys.modules stub pattern.

Note: the gemini-code-assist suggestion to rename ProcessID -> ProcessId in sigma matched_events lookups was checked against real sigma output on three different reports. Sigma's matched_events use ProcessID (capital D), so the existing code is correct and the suggestion was not applied.
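As an alternative to extending a tuple of literal strings, the standard `ipaddress` module can classify loopback and unspecified addresses for both IPv4 and IPv6. This is a sketch of that idea, not the filter used in the PR; `is_local_noise` is a hypothetical name, and treating unparsable input as noise is a design choice of the example.

```python
import ipaddress


def is_local_noise(ip_text):
    """True for empty, unparsable, loopback, or unspecified addresses."""
    ip_text = (ip_text or "").strip()
    if not ip_text:
        return True
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return True  # treat garbage as noise rather than attributing it
    return ip.is_loopback or ip.is_unspecified


results = {
    ip: is_local_noise(ip)
    for ip in ("127.0.0.1", "::1", "0.0.0.0", "::", "", "203.0.113.7")
}
```

This also covers the rest of 127.0.0.0/8, which a literal comparison against "127.0.0.1" would miss.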
6bc7f95 to 6a1bdf7
Heads up on a force-push: I noticed after the initial review that my fork ( I've now:
The functional scope of the PR is unchanged. The diff is now a clean minimum against upstream. Sorry for the noise, will diff against upstream before opening future PRs.
Removed unused import of 'socket' and 'encode'.
Removed unused import statements from the test file.
Removed configuration for decrypting PCAP files.
Summary
Tie every network artifact CAPE captures (suricata alerts/tls/http/files, network.tcp/udp/dns/hosts) back to the originating Windows process so analysts and downstream signatures can answer "which process did this?" without manual correlation.
A single `AttributionIndex` consumes six attribution sources (Sysmon EID 1/3/22, Microsoft-Windows-Kernel-Network ETW, DNS-Client ETW, sigma matched events) and exposes one query API used by every per-target enrichment loop. 5-tuple matching with `src_port` disambiguates multi-process flows to the same destination.
What's added
- `network_etw.py`: captures TCP/UDP connect events from `Microsoft-Windows-Kernel-Network` with periodic full-snapshot uploads. Off by default (`auxiliary.conf.default`; opt in with `[Network_ETW] enabled = yes`).
- Stop priority `-20` on `Network_ETW` and `Evtx` aux modules so they shut down AFTER capemon-related auxiliaries; late-fire C2 callbacks that fire between the analysis-stop signal and VM teardown still get captured + attributed.
- `RESULT_UPLOADABLE` allowlist tightens path-traversal protection; new `is_replaceable_result_upload()` helper allows periodic re-uploads (tlsdump.log, dns_etw.json, network_etw.json, wmi_etw.json, sslkeylogfile/sslkeys.log) to truncate-write instead of silently failing with EEXIST after the first upload.
- `decryptpcap.py` wraps gogorobocap to produce `dump_decrypted.pcap` + `dump_mixed.pcap`. `network.py` and `suricata.py` honour a new `pcapsrc` config knob (auto/original/mixed/decrypted) so users explicitly choose which pcap variant to analyse.
- `_suricata_http.html`, `_suricata_files.html`, `_hosts.html` render process attribution columns/rows, gated on the existing `NETWORK_PROC_MAP` setting. `_hosts.html` shows multi-process attribution as badges with hover tooltips that explain the attribution chain. Pre-existing "asn cell skipped when empty causes column shift" bug fixed in the same edit.
- `tests/test_network_capture_integration.py` covers `AttributionIndex` build + query, `pcapsrc` resolution, and replaceable-upload behaviour.
Source priority (highest confidence first)
Test plan
- `pytest tests/test_network_capture_integration.py`: 7 passed, 0 failed
- `RESULT_DIRECTORIES` allowlist
- `cape` / `cape-web` / `cape-processor` services restart cleanly with the new code