
apiv2: add download endpoints for pcap variants, TLS keys, ETW logs, bulk archives #2994

Merged
kevoreilly merged 1 commit into kevoreilly:master from wmetcalf:apiv2-download-endpoints
May 1, 2026

Conversation

@wmetcalf
Contributor

Summary

The web UI file-download view has long exposed dump_decrypted.pcap, dump_mixed.pcap, pcap bundles, pcapng (with embedded keys), tlsdump keylog, and whole-directory archives, but the REST API only served dump.pcap plus a tasks_tlspcap endpoint hard-coded to polarproxy/tls.pcap — a path no longer produced since CAPE moved to SSLproxy + GoGoRoboCap. Programmatic consumers currently have no way to pull the decrypted / merged pcaps or any of the ETW-based telemetry that now drives process attribution.

This PR brings apiv2 to parity with the web UI and adds endpoints for several artifacts that have no download path at all today.

New endpoints

PCAP variants (gated by existing [taskpcap]):

  endpoint                          serves
  tasks/get/decrypted_pcap/<id>/    dump_decrypted.pcap
  tasks/get/mixed_pcap/<id>/        dump_mixed.pcap
  tasks/get/sslproxy_pcap/<id>/     sslproxy/sslproxy.pcap
  tasks/get/pcapzip/<id>/           zip of all 5 pcap variants
  tasks/get/pcapng/<id>/            on-the-fly pcapng with TLS keys embedded

TLS key material (new [tasktlskeys] — gated separately since these decrypt captured flows):

  endpoint                          serves
  tasks/get/tlskeys/<id>/           tlsdump/tlsdump.log (NSS keylog)
  tasks/get/sslkeys/<id>/           aux/sslkeylogfile/sslkeys.log
  tasks/get/masterkeys/<id>/        sslproxy/master_keys.log

ETW telemetry (new [tasketw]):

  endpoint                          serves
  tasks/get/dns_etw/<id>/           aux/dns_etw.json (NDJSON)
  tasks/get/network_etw/<id>/       aux/network_etw.json
  tasks/get/wmi_etw/<id>/           aux/wmi_etw.json
  tasks/get/amsi_etw/<id>/          zip of aux/amsi_etw/

Bulk directory archives (new [taskbulkzip] — AES-encrypted with ZIP_PWD for parity with tasks_dropped/tasks_payloadfiles/tasks_procdumpfiles):

  endpoint                            serves
  tasks/get/logs_all/<id>/            logs/
  tasks/get/network_all/<id>/         network/
  tasks/get/memdump_all/<id>/         memory/
  tasks/get/selfextracted_all/<id>/   selfextracted/

Fixes

  • tasks_tlspcap now prefers dump_decrypted.pcap with a polarproxy/tls.pcap fallback so legacy callers keep working on modern analyses (previously returned 404 on every post-PolarProxy run).
  • create_zip picks up a recursive os.walk (replacing os.listdir) and now preserves relative paths. Previously, bulk archives of nested directories — notably logs/filestore/<bucket>/* — silently dropped everything below the top level. Also gains an optional temp_file=True that routes the archive through a disk-backed NamedTemporaryFile for memory-friendly streaming of large folders (used by the new bulk endpoints).
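
As a rough illustration of the create_zip change, here is a minimal sketch of a recursive walk that preserves relative paths and can optionally write through a disk-backed temp file. It is not the actual create_zip code: `zip_folder_recursive` and `use_temp_file` are hypothetical names, and it uses the stdlib zipfile module without any encryption.

```python
import io
import os
import tempfile
import zipfile


def zip_folder_recursive(folder, use_temp_file=False):
    """Zip `folder` recursively, storing each file under its path relative to `folder`.

    Hypothetical stand-in for the behaviour described in the PR; no encryption.
    """
    # Disk-backed buffer for large folders, in-memory buffer otherwise.
    buf = tempfile.NamedTemporaryFile(suffix=".zip") if use_temp_file else io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                full = os.path.join(root, name)
                # relpath keeps nested entries such as filestore/<bucket>/<file>
                zf.write(full, os.path.relpath(full, folder))
    buf.seek(0)  # rewind so the caller can stream the archive from the start
    return buf
```

With os.listdir the inner loop would only ever see the top-level names, which matches the "12 of 22 files" symptom described in the test plan.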

Gating

PCAP variants reuse the existing [taskpcap] section on the theory that operators who opted into pcap access already implicitly trust the caller with packet-capture data. TLS keys, ETW JSON, and whole-directory archives get their own sections so they can be enabled/disabled independently of standard pcap access.
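
To make the gating concrete, the sketch below parses a sample config with the four section names from this PR and checks whether a gate is on. The `enabled` key and the `gate_enabled` helper are assumptions for illustration; the real api.conf sections may carry additional options.

```python
import configparser

# Section names are from this PR; the "enabled" key is an assumed layout.
SAMPLE_API_CONF = """
[taskpcap]
enabled = yes

[tasktlskeys]
enabled = no

[tasketw]
enabled = yes

[taskbulkzip]
enabled = no
"""


def gate_enabled(conf, section):
    """Return True only if `section` exists and has enabled set to a true value."""
    return conf.has_section(section) and conf.getboolean(section, "enabled", fallback=False)


conf = configparser.ConfigParser()
conf.read_string(SAMPLE_API_CONF)
```

In this sample an operator serves pcaps and ETW logs while keeping TLS key material and bulk archives switched off.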

Implementation

Four shared helpers — _resolve_task_id, _serve_analysis_file, _zip_paths, _serve_folder_zip — factor the boilerplate so each new handler reduces to 4–8 lines of actual logic.

Test plan

  • All 16 new endpoints return correct Content-Type and body (verified against live CAPE deployment on tasks with and without TLS decryption data)
  • tasks_tlspcap now serves dump_decrypted.pcap on modern analyses (same SHA256 as tasks_decrypted_pcap) and falls back cleanly when nothing is present
  • pcapng endpoint generates a valid PCAPNG v1.0 with TLS keys embedded (file /tmp/t.pcapng → pcapng capture file - version 1.0)
  • AES-encrypted bulk zips round-trip with ZIP_PWD=infected (7z t -pinfected)
  • Missing artifacts return clean JSON errors, not 500s
  • Invalid task IDs return {"error": true, "error_value": "Task does not exist"}
  • Recursive-walk fix archives nested content — verified logs_all on a task where logs/ has 22 files total, 12 top-level and 10 under logs/filestore/*/*; old code shipped only 12 + empty directory entry, new code ships all 22
  • No regression in existing create_zip callers (flat dirs produce identical output)
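
The pcapng check in the test plan can also be scripted without file(1). The sketch below reads the Section Header Block at the start of a pcapng file (block type 0x0A0D0D0A, byte-order magic 0x1A2B3C4D) and returns the version. `pcapng_version` is a hypothetical helper name, not something in the PR.

```python
import struct

PCAPNG_SHB_TYPE = b"\x0a\x0d\x0d\x0a"  # Section Header Block type (same in both byte orders)
BYTE_ORDER_MAGIC = 0x1A2B3C4D


def pcapng_version(header):
    """Return (major, minor) parsed from the first 16 bytes of a pcapng file, or None."""
    if len(header) < 16 or header[:4] != PCAPNG_SHB_TYPE:
        return None
    # Bytes 8..15 hold the byte-order magic followed by major and minor version.
    for endian in ("<", ">"):
        magic, major, minor = struct.unpack(endian + "IHH", header[8:16])
        if magic == BYTE_ORDER_MAGIC:
            return major, minor
    return None
```

A caller would pass the first 16 bytes of the downloaded body; a classic pcap file fails the block-type comparison and yields None.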

Copilot AI review requested due to automatic review settings April 24, 2026 16:01
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces several new API endpoints for downloading analysis artifacts, such as decrypted PCAPs, ETW logs, and bulk ZIP archives of logs, memory dumps, and network data. It also refactors the download logic into reusable helper functions and enhances the create_zip utility to support recursive directory compression and disk-backed temporary files. Feedback highlights a potential race condition when generating PCAPNG files at a fixed path, suggests making TLP checks case-insensitive for better consistency, and recommends using a more accurate MIME type for PCAPNG downloads.

Comment thread web/apiv2/views.py Outdated
tls_log_path = os.path.join(adir, "tlsdump", "tlsdump.log")
ssl_key_log_path = os.path.join(adir, "aux", "sslkeylogfile", "sslkeys.log")
pcapng = PcapToNg(pcap_path, tls_log_path, ssl_key_log_path)
pcapng.generate(out_path)

high

Generating the PCAPNG file at a fixed path (dump.pcapng) within the analysis directory introduces a race condition. Concurrent API requests for the same task will attempt to write to the same file simultaneously, potentially leading to file corruption or serving incomplete data. Consider using a file lock or generating the PCAPNG to a unique temporary file for each request if it is not intended to be a persistent, shared artifact.

Comment thread web/apiv2/views.py Outdated
return Response(check)

return None, Response(check)
if check_tlp and check.get("tlp", "") in ("red", "Red"):

security-medium medium

The TLP check is case-sensitive and only accounts for "red" and "Red". It should be made case-insensitive to correctly handle "RED" as well, ensuring consistency with other TLP checks in the codebase.

Suggested change
-    if check_tlp and check.get("tlp", "") in ("red", "Red"):
+    if check_tlp and (check.get("tlp") or "").lower() == "red":
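
For reference, a small self-contained sketch of a case-insensitive TLP check that also tolerates a missing or None value. `is_tlp_red` is a hypothetical helper and the dict shape is assumed from the snippet above.

```python
def is_tlp_red(task):
    """True when the task's TLP is red, regardless of case; False if unset or None."""
    return (task.get("tlp") or "").strip().lower() == "red"
```

The `or ""` matters because `task.get("tlp", "")` still returns None when the key is present with a None value, and calling `.lower()` on None would raise.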

Comment thread web/apiv2/views.py Outdated
Comment on lines +1835 to +1836
return _serve_analysis_file(task_id, "dump.pcapng", "dump.pcapng",
content_type="application/vnd.tcpdump.pcap")

medium

The Content-Type for PCAPNG files should be application/x-pcapng or application/vnd.tcpdump.pcapng to accurately reflect the file format and ensure proper handling by client applications.

Suggested change
-    return _serve_analysis_file(task_id, "dump.pcapng", "dump.pcapng",
-                                content_type="application/vnd.tcpdump.pcap")
+    return _serve_analysis_file(task_id, "dump.pcapng", "dump.pcapng",
+                                content_type="application/x-pcapng")

@doomedraven
Collaborator

that is a bit overflow, would be easier to do: get/pcap and get/pcap/<type>

Copilot AI left a comment

Pull request overview

Adds new apiv2 download endpoints to bring REST API artifact access closer to the web UI, including multiple PCAP variants, TLS key material, ETW telemetry, and bulk directory archives; also fixes/extends zip creation to support recursive folder zipping and disk-backed streaming.

Changes:

  • Added shared download helpers and 16 new artifact endpoints in web/apiv2/views.py (pcap variants, pcapng, TLS keys, ETW, bulk zips) and updated tasks_tlspcap fallback behavior.
  • Registered new routes for these endpoints in web/apiv2/urls.py.
  • Enhanced create_zip() to walk folders recursively, preserve relative paths, and optionally use a NamedTemporaryFile for streaming large archives; added new API config gates.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

  File                            Description
  web/apiv2/views.py              Introduces helpers + new streaming download endpoints (pcaps, pcapng, TLS keys, ETW, bulk zips) and updates legacy TLS pcap behavior
  web/apiv2/urls.py               Wires up new task download routes
  lib/cuckoo/common/utils.py      Improves zip creation (recursive walk, relative paths, temp-file option) to support new bulk download endpoints
  conf/default/api.conf.default   Adds new API gating sections for TLS keys, ETW logs, and bulk archive downloads

Comment thread web/apiv2/views.py
Comment on lines +1717 to +1727
Uses `create_zip` (password = ZIP_PWD) for parity with tasks_dropped /
tasks_payloadfiles. Returns a Response with a JSON error if the folder
doesn't exist or is empty."""
srcdir = os.path.join(CUCKOO_ROOT, "storage", "analyses", "%s" % task_id, rel_folder)
if not os.path.normpath(srcdir).startswith(ANALYSIS_BASE_PATH):
return Response({"error": True, "error_value": "Invalid path"})
if not path_exists(srcdir) or not os.listdir(srcdir):
return Response({"error": True, "error_value": empty_msg or f"No {rel_folder} artifacts for task {task_id}"})
mem_zip = create_zip(folder=srcdir, encrypted=True, temp_file=True)
if mem_zip is False:
return Response({"error": True, "error_value": "Can't create zip archive"})

Copilot AI Apr 24, 2026

_serve_folder_zip() assumes encrypted=True yields an AES-encrypted archive, but create_zip() only encrypts when pyzipper is available; otherwise it silently produces an unencrypted zip. If these endpoints must always be encrypted for safety, consider explicitly checking HAVE_PYZIPPER and returning a clear error when it's missing (or adjust the messaging/docs accordingly).

Suggested change
-    Uses `create_zip` (password = ZIP_PWD) for parity with tasks_dropped /
-    tasks_payloadfiles. Returns a Response with a JSON error if the folder
-    doesn't exist or is empty."""
-    srcdir = os.path.join(CUCKOO_ROOT, "storage", "analyses", "%s" % task_id, rel_folder)
-    if not os.path.normpath(srcdir).startswith(ANALYSIS_BASE_PATH):
-        return Response({"error": True, "error_value": "Invalid path"})
-    if not path_exists(srcdir) or not os.listdir(srcdir):
-        return Response({"error": True, "error_value": empty_msg or f"No {rel_folder} artifacts for task {task_id}"})
-    mem_zip = create_zip(folder=srcdir, encrypted=True, temp_file=True)
-    if mem_zip is False:
-        return Response({"error": True, "error_value": "Can't create zip archive"})
+    Uses AES encryption with `pyzipper` (password = ZIP_PWD). Returns a
+    Response with a JSON error if the folder doesn't exist, is empty, or the
+    archive cannot be created."""
+    srcdir = os.path.join(CUCKOO_ROOT, "storage", "analyses", "%s" % task_id, rel_folder)
+    if not os.path.normpath(srcdir).startswith(ANALYSIS_BASE_PATH):
+        return Response({"error": True, "error_value": "Invalid path"})
+    if not path_exists(srcdir) or not os.listdir(srcdir):
+        return Response({"error": True, "error_value": empty_msg or f"No {rel_folder} artifacts for task {task_id}"})
+    mem_zip = tempfile.NamedTemporaryFile(delete=True)
+    written = 0
+    try:
+        with pyzipper.AESZipFile(mem_zip, "w", compression=pyzipper.ZIP_DEFLATED, encryption=pyzipper.WZ_AES) as zf:
+            zf.setpassword(ZIP_PWD.encode())
+            for root, _, files in os.walk(srcdir):
+                for filename in files:
+                    filepath = os.path.join(root, filename)
+                    if path_exists(filepath) and os.path.getsize(filepath) > 0:
+                        arcname = os.path.relpath(filepath, srcdir)
+                        zf.write(filepath, arcname)
+                        written += 1
+    except Exception:
+        mem_zip.close()
+        return Response({"error": True, "error_value": "Can't create zip archive"})
+    if not written:
+        mem_zip.close()
+        return Response({"error": True, "error_value": empty_msg or f"No {rel_folder} artifacts for task {task_id}"})
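
A lighter-weight alternative to inlining the zip logic is an explicit availability check before any archive is built. The sketch below is only an illustration: `encrypted_zip_precheck` and the error wording are hypothetical, and the `HAVE_PYZIPPER` flag is recomputed locally rather than imported from the codebase.

```python
try:
    import pyzipper  # noqa: F401  (third-party, optional)
    HAVE_PYZIPPER = True
except ImportError:
    HAVE_PYZIPPER = False


def encrypted_zip_precheck(have_pyzipper=HAVE_PYZIPPER):
    """Return a JSON-style error payload when AES zips cannot be produced, else None."""
    if not have_pyzipper:
        return {
            "error": True,
            "error_value": "pyzipper is not installed; refusing to serve an unencrypted archive",
        }
    return None
```

A handler could call this first and return the payload as-is, so a missing dependency surfaces as a clear error instead of a silently unencrypted download.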

Comment thread web/apiv2/views.py Outdated
Comment on lines +1671 to +1672
if check_tlp and check.get("tlp", "") in ("red", "Red"):
return None, Response({"error": True, "error_value": "Task has a TLP of RED"})

Copilot AI Apr 24, 2026

The TLP:RED guard only checks for "red"/"Red". Elsewhere the codebase treats "RED" as a possible value too, so tasks with tlp="RED" would bypass this check and allow downloads. Consider normalizing (e.g., tlp.lower() == "red") or including "RED" in the allowed list.

Comment thread web/apiv2/views.py
Comment on lines +1663 to +1664
the same key. TLP:RED checks are skipped only for endpoints that need
to serve regardless (none at present)."""

Copilot AI Apr 24, 2026

The _resolve_task_id docstring says TLP:RED checks are skipped only for endpoints that need it ("none at present"), but tasks_tlspcap() calls it with check_tlp=False. Either update the docstring (and rationale) or enable the TLP check here for consistency with other download endpoints.

Suggested change
-    the same key. TLP:RED checks are skipped only for endpoints that need
-    to serve regardless (none at present)."""
+    the same key. By default, TLP:RED tasks are blocked; callers may pass
+    `check_tlp=False` for endpoints that intentionally need to bypass that
+    restriction."""

Comment thread web/apiv2/views.py
Comment on lines +1756 to +1758
resp["Content-Length"] = os.path.getsize(srcfile)
resp["Content-Disposition"] = f"attachment; filename={task_id}_{fname}"
return resp

Copilot AI Apr 24, 2026

tasks_tlspcap is described as a back-compat endpoint, but it changes the downloaded filename from the previous *_tls.pcap to *_{dump_decrypted.pcap|tls.pcap}. Clients that key off the attachment name may break; consider keeping the historical filename (e.g., always <task>_tls.pcap) while serving the preferred content.
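
One possible shape for that, sketched under the assumption that the preferred file order is dump_decrypted.pcap and then polarproxy/tls.pcap, as the PR describes. `pick_tls_pcap` is a hypothetical helper, not the handler from the diff.

```python
import os


def pick_tls_pcap(analysis_dir, task_id):
    """Return (path, download_name) for the first non-empty candidate, or None.

    The download name stays <task>_tls.pcap whichever file is served.
    """
    candidates = ("dump_decrypted.pcap", os.path.join("polarproxy", "tls.pcap"))
    for rel in candidates:
        path = os.path.join(analysis_dir, rel)
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return path, f"{task_id}_tls.pcap"
    return None
```

Decoupling the served content from the attachment name keeps legacy clients working while still preferring the newer artifact.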

Comment thread web/apiv2/views.py
Comment on lines +1682 to +1686
srcfile = os.path.join(CUCKOO_ROOT, "storage", "analyses", "%s" % task_id, rel_path)
if not os.path.normpath(srcfile).startswith(ANALYSIS_BASE_PATH):
return render(request, "error.html", {"error": f"File not found: {os.path.basename(srcfile)}"})
if path_exists(srcfile):
fname = "%s_tls.pcap" % task_id
resp = StreamingHttpResponse(FileWrapper(open(srcfile, "rb"), 8096), content_type="application/vnd.tcpdump.pcap")
resp["Content-Length"] = os.path.getsize(srcfile)
resp["Content-Disposition"] = "attachment; filename=" + fname
return resp
return Response({"error": True, "error_value": "Invalid path"})
if not path_exists(srcfile) or os.path.getsize(srcfile) == 0:
return Response({"error": True, "error_value": f"{os.path.basename(rel_path)} does not exist"})

Copilot AI Apr 24, 2026

The path validation uses normpath(...).startswith(ANALYSIS_BASE_PATH), which does not account for symlinks. If an analysis artifact path is a symlink to a file outside the analysis directory, this check would still pass and could expose host files. Consider validating with path_safe()/Path.resolve() and/or rejecting symlinks before opening the file.
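
A minimal sketch of a symlink-aware containment check, using only the standard library. `is_inside` is a hypothetical name; it is not the codebase's path_safe() helper, whose exact behaviour is not shown in this thread.

```python
import os


def is_inside(base_dir, candidate):
    """True if `candidate`, after resolving symlinks, lives under `base_dir`."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(candidate)
    try:
        # commonpath compares whole path components, so /analyses_evil
        # does not count as being under /analyses.
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        return False
```

Compared with a plain startswith() on the normalised string, this rejects both symlinks that point outside the base directory and sibling directories that merely share a prefix.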

parent_folder = os.path.dirname(file).rsplit(os.sep, 1)[-1]
path = os.path.join(parent_folder, os.path.basename(file))

zf.write(file, path)

Copilot AI Apr 24, 2026

zf.write(file, path) will follow symlinks by default. If any archived entry is a symlink (e.g., introduced via extracted artifacts), this can zip and expose the link target outside the intended folder. Consider skipping symlinks and/or using follow_symlinks=False plus path_safe()/Path.resolve() validation for each file before archiving.
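
The "skip symlinks" option could look roughly like the generator below, which a zip loop would iterate instead of the raw os.walk output. `regular_files` is a hypothetical helper for illustration only.

```python
import os


def regular_files(folder):
    """Yield (full_path, archive_name) for regular, non-symlink files under `folder`."""
    # followlinks=False keeps os.walk from descending into symlinked directories.
    for root, _dirs, files in os.walk(folder, followlinks=False):
        for name in sorted(files):
            full = os.path.join(root, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks and anything that is not a plain file
            yield full, os.path.relpath(full, folder)
```

Each yielded pair can be passed straight to `zf.write(full_path, archive_name)`.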

Comment thread web/apiv2/views.py Outdated
so Wireshark can decrypt flows inline, no sidecar required."""
task_id, err = _resolve_task_id(task_id, "taskpcap")
if err:
return err

Copilot AI Apr 24, 2026

The pcapng endpoint embeds TLS secrets into the capture, but it is currently gated only by [taskpcap]. This effectively bypasses the new [tasktlskeys] control and may leak decryption keys even when TLS key downloads are disabled. Consider requiring both gates (taskpcap + tasktlskeys) or moving pcapng behind a dedicated TLS-keys-related gate.

Suggested change
-        return err
+        return err
+    task_id, err = _resolve_task_id(task_id, "tasktlskeys")
+    if err:
+        return err

…bulk archives

CAPE supports several TLS-interception pipelines that each produce
different on-disk artifacts — PolarProxy writes polarproxy/tls.pcap and
its processor mergecaps it into dump.pcap; SSLproxy writes a synthetic
sslproxy/sslproxy.pcap plus an NSS keylog; the tlsdump / sslkeylogfile
hooks produce in-guest keylogs that decryptpcap feeds through GoGoRoboCap
to yield dump_decrypted.pcap and dump_mixed.pcap. The web UI
file-download view already surfaces all of these, but the REST API only
served dump.pcap plus a legacy tasks_tlspcap endpoint hard-coded to
polarproxy/tls.pcap — fine for PolarProxy operators, 404s for everyone
else. The new ETW / AMSI telemetry and the three in-guest keylogs have
no download path at all.

This brings apiv2 to parity with the web UI and wires up the newer
artifacts. Endpoints are parameterised rather than split per-artifact so
operators see four route shapes instead of sixteen:

  tasks/get/pcap/<id>/                  dump.pcap (existing, unchanged)
  tasks/get/pcap/<id>/<variant>/        variant ∈ {decrypted, mixed,
                                        sslproxy, zip, pcapng}
  tasks/get/keys/<id>/<kind>/           kind ∈ {tls, ssl, master} —
                                        NSS-format keylogs from the three
                                        hook sources (tls: MockSSL;
                                        ssl: bcrypt+ncrypt; master:
                                        SSLproxy)
  tasks/get/etw/<id>/<kind>/            kind ∈ {dns, network, wmi} NDJSON
                                        streams; kind == amsi zips the
                                        AMSI script buffers
  tasks/get/bulkzip/<id>/<folder>/      folder ∈ {logs, network, memory,
                                        selfextracted} — AES-zipped with
                                        ZIP_PWD for parity with
                                        tasks_dropped / tasks_payloadfiles
                                        / tasks_procdumpfiles
  tasks/get/tlspcap/<id>/               existing endpoint; now prefers
                                        dump_decrypted.pcap and falls
                                        back to polarproxy/tls.pcap, so
                                        both TLS pipelines serve from the
                                        same URL

Three new apiconf sections gate the sensitive / bulk categories
separately:

  [tasktlskeys]  TLS key material (decrypts captured flows)
  [tasketw]      ETW JSON logs
  [taskbulkzip]  whole-directory archives

PCAP variants reuse [taskpcap] since operators who opted into pcap access
already implicitly trust the caller with packet-capture data.

create_zip gains a recursive os.walk (replacing os.listdir) with
relative-path preservation, so bulk archives of nested directories —
notably logs/filestore/<bucket>/* — now include their contents instead
of silently dropping everything below the top level. A new temp_file=True
option routes the archive through a disk-backed NamedTemporaryFile so
large folders stream without loading the full archive into RAM; the
bulkzip handler uses this mode.

The pcapng variant generates into a per-request NamedTemporaryFile and
unlinks it as soon as the fd is handed to FileWrapper. Writing the
pcapng to a shared path inside the analysis dir raced: two concurrent
callers could stream each other truncated or partially-overwritten
output.
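
The per-request pattern can be sketched as follows. This is an illustration rather than the code in the commit: it uses tempfile.mkstemp instead of NamedTemporaryFile, `generate_to_private_file` and `write_output` are hypothetical names, and unlinking a file that is still open relies on POSIX semantics.

```python
import os
import tempfile


def generate_to_private_file(write_output):
    """Run `write_output(path)` against a unique temp path and return an open
    binary handle to the result. The path is unlinked before returning."""
    fd, path = tempfile.mkstemp(suffix=".pcapng")
    os.close(fd)  # the generator writes by path, so the raw fd is not needed
    try:
        write_output(path)
        handle = open(path, "rb")
    finally:
        # On POSIX the data stays readable through `handle` after the unlink,
        # and no other request can ever open this path.
        os.unlink(path)
    return handle
```

Because every request gets its own path, two concurrent callers never write to or read from the same file, and nothing is left behind once the handle is closed.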

Variant / kind / folder inputs are matched against a static whitelist
before any path is built, so the URL parameter can't be used to probe
paths outside the analysis dir.

Implementation uses shared helpers — _resolve_task_id,
_serve_analysis_file, _zip_paths, _serve_folder_zip, _pcapng_response,
_pcapzip_response — so each of the four new handlers reduces to a small
dispatch table.
@wmetcalf force-pushed the apiv2-download-endpoints branch from f5f3e55 to eeddca8 on April 24, 2026 19:23
@wmetcalf
Contributor Author

Force-pushed an updated version addressing both reviews:

@doomedraven — collapsed the 16 separate endpoints into four parameterised routes:

  • tasks/get/pcap/<id>/<variant>/ with variant ∈ {decrypted, mixed, sslproxy, zip, pcapng}
  • tasks/get/keys/<id>/<kind>/ with kind ∈ {tls, ssl, master}
  • tasks/get/etw/<id>/<kind>/ with kind ∈ {dns, network, wmi, amsi}
  • tasks/get/bulkzip/<id>/<folder>/ with folder ∈ {logs, network, memory, selfextracted}

Each handler validates the parameter against a static whitelist before touching the filesystem, so the URL param can't probe outside the analysis dir. The apiconf sections stay the same ([tasktlskeys], [tasketw], [taskbulkzip]). tasks/get/pcap/<id>/ (no variant) and tasks/get/tlspcap/<id>/ keep their existing shape for back-compat.
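
For readers skimming the thread, the whitelist-then-build pattern looks roughly like this. It is a sketch, not the handler code: `PCAP_VARIANTS` and `resolve_variant` are hypothetical names, and the generated variants (zip, pcapng) are left out.

```python
import os

# Variant names and file names are taken from the PR description; the "zip"
# and "pcapng" variants are generated on the fly and omitted from this sketch.
PCAP_VARIANTS = {
    "decrypted": "dump_decrypted.pcap",
    "mixed": "dump_mixed.pcap",
    "sslproxy": os.path.join("sslproxy", "sslproxy.pcap"),
}


def resolve_variant(analysis_dir, variant):
    """Map a URL parameter to a file path, rejecting anything not whitelisted."""
    rel = PCAP_VARIANTS.get(variant)
    if rel is None:
        return None, {"error": True, "error_value": f"Unknown pcap variant: {variant}"}
    # The path is built only from the static table, never from user input.
    return os.path.join(analysis_dir, rel), None
```

Since the URL parameter is only ever used as a dictionary key, a value like `../../etc/passwd` is rejected before any path is assembled.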

@gemini-code-assist — all three inline comments addressed:

  1. pcapng generates into a per-request NamedTemporaryFile and unlinks the path as soon as the fd is handed to FileWrapper, so concurrent callers can't race on a shared path inside the analysis dir.
  2. TLP check in _resolve_task_id normalises with .lower() so RED / Red / red all match.
  3. pcapng Content-Type is now application/x-pcapng.

The commit message also drops the earlier mischaracterisation of PolarProxy — it's still a first-class TLS pipeline (its aux module writes polarproxy/tls.pcap and its processor mergecaps it into dump.pcap), the tasks_tlspcap fix just teaches the endpoint about dump_decrypted.pcap so the SSLproxy + decryptpcap pipeline is served too.

@gemini-code-assist
Contributor

Thank you for the update, @wmetcalf. The refactoring into parameterized routes and the implementation of the requested security and performance improvements (like the NamedTemporaryFile usage and TLP normalization) look excellent. The approach of using a static whitelist for URL parameters is a robust way to prevent directory traversal. This PR is now in a great state.

@kevoreilly merged commit 781161f into kevoreilly:master on May 1, 2026
4 checks passed