apiv2: add download endpoints for pcap variants, TLS keys, ETW logs, bulk archives #2994
Code Review
This pull request introduces several new API endpoints for downloading analysis artifacts, such as decrypted PCAPs, ETW logs, and bulk ZIP archives of logs, memory dumps, and network data. It also refactors the download logic into reusable helper functions and enhances the create_zip utility to support recursive directory compression and disk-backed temporary files. Feedback highlights a potential race condition when generating PCAPNG files at a fixed path, suggests making TLP checks case-insensitive for better consistency, and recommends using a more accurate MIME type for PCAPNG downloads.
| tls_log_path = os.path.join(adir, "tlsdump", "tlsdump.log") | ||
| ssl_key_log_path = os.path.join(adir, "aux", "sslkeylogfile", "sslkeys.log") | ||
| pcapng = PcapToNg(pcap_path, tls_log_path, ssl_key_log_path) | ||
| pcapng.generate(out_path) |
Generating the PCAPNG file at a fixed path (dump.pcapng) within the analysis directory introduces a race condition. Concurrent API requests for the same task will attempt to write to the same file simultaneously, potentially leading to file corruption or serving incomplete data. Consider using a file lock or generating the PCAPNG to a unique temporary file for each request if it is not intended to be a persistent, shared artifact.
| return Response(check) | ||
| return None, Response(check) | ||
| if check_tlp and check.get("tlp", "") in ("red", "Red"): |
The TLP check is case-sensitive and only accounts for "red" and "Red". It should be made case-insensitive to correctly handle "RED" as well, ensuring consistency with other TLP checks in the codebase.
| if check_tlp and check.get("tlp", "") in ("red", "Red"): | |
| if check_tlp and (check.get("tlp") or "").lower() == "red": |
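A minimal sketch of the normalised check suggested above, assuming `check` is the dict returned by the task lookup. The helper name `is_tlp_red` is hypothetical and does not appear in the PR; the extra `strip()` is also an assumption, added only to tolerate stray whitespace.

```python
def is_tlp_red(check: dict) -> bool:
    """Return True when the task's TLP is RED, regardless of letter case."""
    # `check.get("tlp") or ""` also covers a key that is present but None,
    # which a plain `check.get("tlp", "")` would not.
    return (check.get("tlp") or "").strip().lower() == "red"
```

With this form "red", "Red" and "RED" are all treated the same, and a missing or null value never raises.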
| return _serve_analysis_file(task_id, "dump.pcapng", "dump.pcapng", | ||
| content_type="application/vnd.tcpdump.pcap") |
The Content-Type for PCAPNG files should be application/x-pcapng or application/vnd.tcpdump.pcapng to accurately reflect the file format and ensure proper handling by client applications.
| return _serve_analysis_file(task_id, "dump.pcapng", "dump.pcapng", | |
| content_type="application/vnd.tcpdump.pcap") | |
| return _serve_analysis_file(task_id, "dump.pcapng", "dump.pcapng", | |
| content_type="application/x-pcapng") |
|
that is a bit overflow, would be easier to do: |
Pull request overview
Adds new apiv2 download endpoints to bring REST API artifact access closer to the web UI, including multiple PCAP variants, TLS key material, ETW telemetry, and bulk directory archives; also fixes/extends zip creation to support recursive folder zipping and disk-backed streaming.
Changes:
- Added shared download helpers and 16 new artifact endpoints in `web/apiv2/views.py` (pcap variants, pcapng, TLS keys, ETW, bulk zips) and updated `tasks_tlspcap` fallback behavior.
- Registered new routes for these endpoints in `web/apiv2/urls.py`.
- Enhanced `create_zip()` to walk folders recursively, preserve relative paths, and optionally use a `NamedTemporaryFile` for streaming large archives; added new API config gates.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| web/apiv2/views.py | Introduces helpers + new streaming download endpoints (pcaps, pcapng, TLS keys, ETW, bulk zips) and updates legacy TLS pcap behavior |
| web/apiv2/urls.py | Wires up new task download routes |
| lib/cuckoo/common/utils.py | Improves zip creation (recursive walk, relative paths, temp-file option) to support new bulk download endpoints |
| conf/default/api.conf.default | Adds new API gating sections for TLS keys, ETW logs, and bulk archive downloads |
| Uses `create_zip` (password = ZIP_PWD) for parity with tasks_dropped / | ||
| tasks_payloadfiles. Returns a Response with a JSON error if the folder | ||
| doesn't exist or is empty.""" | ||
| srcdir = os.path.join(CUCKOO_ROOT, "storage", "analyses", "%s" % task_id, rel_folder) | ||
| if not os.path.normpath(srcdir).startswith(ANALYSIS_BASE_PATH): | ||
| return Response({"error": True, "error_value": "Invalid path"}) | ||
| if not path_exists(srcdir) or not os.listdir(srcdir): | ||
| return Response({"error": True, "error_value": empty_msg or f"No {rel_folder} artifacts for task {task_id}"}) | ||
| mem_zip = create_zip(folder=srcdir, encrypted=True, temp_file=True) | ||
| if mem_zip is False: | ||
| return Response({"error": True, "error_value": "Can't create zip archive"}) |
_serve_folder_zip() assumes encrypted=True yields an AES-encrypted archive, but create_zip() only encrypts when pyzipper is available; otherwise it silently produces an unencrypted zip. If these endpoints must always be encrypted for safety, consider explicitly checking HAVE_PYZIPPER and returning a clear error when it's missing (or adjust the messaging/docs accordingly).
| Uses `create_zip` (password = ZIP_PWD) for parity with tasks_dropped / | |
| tasks_payloadfiles. Returns a Response with a JSON error if the folder | |
| doesn't exist or is empty.""" | |
| srcdir = os.path.join(CUCKOO_ROOT, "storage", "analyses", "%s" % task_id, rel_folder) | |
| if not os.path.normpath(srcdir).startswith(ANALYSIS_BASE_PATH): | |
| return Response({"error": True, "error_value": "Invalid path"}) | |
| if not path_exists(srcdir) or not os.listdir(srcdir): | |
| return Response({"error": True, "error_value": empty_msg or f"No {rel_folder} artifacts for task {task_id}"}) | |
| mem_zip = create_zip(folder=srcdir, encrypted=True, temp_file=True) | |
| if mem_zip is False: | |
| return Response({"error": True, "error_value": "Can't create zip archive"}) | |
| Uses AES encryption with `pyzipper` (password = ZIP_PWD). Returns a | |
| Response with a JSON error if the folder doesn't exist, is empty, or the | |
| archive cannot be created.""" | |
| srcdir = os.path.join(CUCKOO_ROOT, "storage", "analyses", "%s" % task_id, rel_folder) | |
| if not os.path.normpath(srcdir).startswith(ANALYSIS_BASE_PATH): | |
| return Response({"error": True, "error_value": "Invalid path"}) | |
| if not path_exists(srcdir) or not os.listdir(srcdir): | |
| return Response({"error": True, "error_value": empty_msg or f"No {rel_folder} artifacts for task {task_id}"}) | |
| mem_zip = tempfile.NamedTemporaryFile(delete=True) | |
| written = 0 | |
| try: | |
| with pyzipper.AESZipFile(mem_zip, "w", compression=pyzipper.ZIP_DEFLATED, encryption=pyzipper.WZ_AES) as zf: | |
| zf.setpassword(ZIP_PWD.encode()) | |
| for root, _, files in os.walk(srcdir): | |
| for filename in files: | |
| filepath = os.path.join(root, filename) | |
| if path_exists(filepath) and os.path.getsize(filepath) > 0: | |
| arcname = os.path.relpath(filepath, srcdir) | |
| zf.write(filepath, arcname) | |
| written += 1 | |
| except Exception: | |
| mem_zip.close() | |
| return Response({"error": True, "error_value": "Can't create zip archive"}) | |
| if not written: | |
| mem_zip.close() | |
| return Response({"error": True, "error_value": empty_msg or f"No {rel_folder} artifacts for task {task_id}"}) |
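One way to make the missing-dependency case explicit, as a rough sketch rather than the project's actual code. The `HAVE_PYZIPPER` name follows the review comment, but how the project computes it is an assumption here, and `encryption_error` is a hypothetical helper.

```python
import importlib.util

# Assumption: mirrors the HAVE_PYZIPPER flag named in the review comment;
# the project may compute its own flag differently.
HAVE_PYZIPPER = importlib.util.find_spec("pyzipper") is not None


def encryption_error(encrypted: bool, have_pyzipper: bool = HAVE_PYZIPPER):
    """Return an error payload when encryption is required but unavailable."""
    if encrypted and not have_pyzipper:
        return {"error": True, "error_value": "pyzipper is required for encrypted archives"}
    return None  # safe to go ahead and build the archive
```

A handler could call this before building the archive and return the payload as its response, so an unencrypted zip is never served by accident.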
| if check_tlp and check.get("tlp", "") in ("red", "Red"): | ||
| return None, Response({"error": True, "error_value": "Task has a TLP of RED"}) |
The TLP:RED guard only checks for "red"/"Red". Elsewhere the codebase treats "RED" as a possible value too, so tasks with tlp="RED" would bypass this check and allow downloads. Consider normalizing (e.g., tlp.lower() == "red") or including "RED" in the allowed list.
| the same key. TLP:RED checks are skipped only for endpoints that need | ||
| to serve regardless (none at present).""" |
The _resolve_task_id docstring says TLP:RED checks are skipped only for endpoints that need it ("none at present"), but tasks_tlspcap() calls it with check_tlp=False. Either update the docstring (and rationale) or enable the TLP check here for consistency with other download endpoints.
| the same key. TLP:RED checks are skipped only for endpoints that need | |
| to serve regardless (none at present).""" | |
| the same key. By default, TLP:RED tasks are blocked; callers may pass | |
| `check_tlp=False` for endpoints that intentionally need to bypass that | |
| restriction.""" |
| resp["Content-Length"] = os.path.getsize(srcfile) | ||
| resp["Content-Disposition"] = f"attachment; filename={task_id}_{fname}" | ||
| return resp |
tasks_tlspcap is described as a back-compat endpoint, but it changes the downloaded filename from the previous *_tls.pcap to *_{dump_decrypted.pcap|tls.pcap}. Clients that key off the attachment name may break; consider keeping the historical filename (e.g., always <task>_tls.pcap) while serving the preferred content.
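A small sketch of the "preferred content, historical filename" idea. `tlspcap_attachment` and its `available` argument (a set of relative paths known to exist) are hypothetical stand-ins for the on-disk checks in the handler.

```python
def tlspcap_attachment(task_id, available):
    """Pick the best TLS pcap but keep the legacy download name."""
    for rel_path in ("dump_decrypted.pcap", "polarproxy/tls.pcap"):
        if rel_path in available:
            # The attachment name stays <task>_tls.pcap whichever file is served.
            return rel_path, f"attachment; filename={task_id}_tls.pcap"
    return None, None
```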
| srcfile = os.path.join(CUCKOO_ROOT, "storage", "analyses", "%s" % task_id, rel_path) | ||
| if not os.path.normpath(srcfile).startswith(ANALYSIS_BASE_PATH): | ||
| return render(request, "error.html", {"error": f"File not found: {os.path.basename(srcfile)}"}) | ||
| if path_exists(srcfile): | ||
| fname = "%s_tls.pcap" % task_id | ||
| resp = StreamingHttpResponse(FileWrapper(open(srcfile, "rb"), 8096), content_type="application/vnd.tcpdump.pcap") | ||
| resp["Content-Length"] = os.path.getsize(srcfile) | ||
| resp["Content-Disposition"] = "attachment; filename=" + fname | ||
| return resp | ||
| return Response({"error": True, "error_value": "Invalid path"}) | ||
| if not path_exists(srcfile) or os.path.getsize(srcfile) == 0: | ||
| return Response({"error": True, "error_value": f"{os.path.basename(rel_path)} does not exist"}) |
The path validation uses normpath(...).startswith(ANALYSIS_BASE_PATH), which does not account for symlinks. If an analysis artifact path is a symlink to a file outside the analysis directory, this check would still pass and could expose host files. Consider validating with path_safe()/Path.resolve() and/or rejecting symlinks before opening the file.
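A symlink-aware containment check could look roughly like the following. This is a standard-library sketch, not the project's `path_safe()`; `is_within` is a hypothetical name.

```python
import os


def is_within(base_dir: str, candidate: str) -> bool:
    """True if `candidate`, with symlinks resolved, lies inside `base_dir`."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(candidate)
    # commonpath compares whole path components, so a sibling such as
    # "/data/analyses_evil" is not mistaken for a child of "/data/analyses",
    # which a bare startswith() prefix test would allow.
    return os.path.commonpath([base, target]) == base
```

Because both sides go through `realpath`, a link inside the analysis directory that points at a host file resolves to its real location and fails the check.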
| parent_folder = os.path.dirname(file).rsplit(os.sep, 1)[-1] | ||
| path = os.path.join(parent_folder, os.path.basename(file)) | ||
| zf.write(file, path) |
zf.write(file, path) will follow symlinks by default. If any archived entry is a symlink (e.g., introduced via extracted artifacts), this can zip and expose the link target outside the intended folder. Consider skipping symlinks and/or using follow_symlinks=False plus path_safe()/Path.resolve() validation for each file before archiving.
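Skipping links before archiving might be sketched as below. `iter_archivable` is a hypothetical helper; the caller would pass each yielded pair to `zf.write(path, arcname)`.

```python
import os


def iter_archivable(srcdir: str):
    """Yield (path, arcname) for regular, non-symlink files under `srcdir`."""
    # os.walk does not descend into symlinked directories unless
    # followlinks=True is passed, so only file-level links need filtering.
    for root, _dirs, files in os.walk(srcdir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip links so their targets are never archived
            yield path, os.path.relpath(path, srcdir)
```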
| so Wireshark can decrypt flows inline, no sidecar required.""" | ||
| task_id, err = _resolve_task_id(task_id, "taskpcap") | ||
| if err: | ||
| return err |
The pcapng endpoint embeds TLS secrets into the capture, but it is currently gated only by [taskpcap]. This effectively bypasses the new [tasktlskeys] control and may leak decryption keys even when TLS key downloads are disabled. Consider requiring both gates (taskpcap + tasktlskeys) or moving pcapng behind a dedicated TLS-keys-related gate.
| return err | |
| return err | |
| task_id, err = _resolve_task_id(task_id, "tasktlskeys") | |
| if err: | |
| return err |
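The "require both gates" idea can be illustrated generically. Here `apiconf` is modelled as a plain dict of sections with an `enabled` flag, which is an assumption about the config shape; `gates_enabled` is a hypothetical helper.

```python
def gates_enabled(apiconf: dict, *sections: str) -> bool:
    """True only if every named section has enabled=True in `apiconf`."""
    return all(apiconf.get(name, {}).get("enabled", False) for name in sections)
```

For the pcapng variant the call would be `gates_enabled(apiconf, "taskpcap", "tasktlskeys")`, so disabling TLS key downloads also disables captures with embedded keys.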
…bulk archives
CAPE supports several TLS-interception pipelines that each produce
different on-disk artifacts — PolarProxy writes polarproxy/tls.pcap and
its processor mergecaps it into dump.pcap; SSLproxy writes a synthetic
sslproxy/sslproxy.pcap plus an NSS keylog; the tlsdump / sslkeylogfile
hooks produce in-guest keylogs that decryptpcap feeds through GoGoRoboCap
to yield dump_decrypted.pcap and dump_mixed.pcap. The web UI
file-download view already surfaces all of these, but the REST API only
served dump.pcap plus a legacy tasks_tlspcap endpoint hard-coded to
polarproxy/tls.pcap — fine for PolarProxy operators, 404s for everyone
else. The new ETW / AMSI telemetry and the three in-guest keylogs have
no download path at all.
This brings apiv2 to parity with the web UI and wires up the newer
artifacts. Endpoints are parameterised rather than split per-artifact so
operators see four route shapes instead of sixteen:
tasks/get/pcap/<id>/
    dump.pcap (existing, unchanged)
tasks/get/pcap/<id>/<variant>/
    variant ∈ {decrypted, mixed, sslproxy, zip, pcapng}
tasks/get/keys/<id>/<kind>/
    kind ∈ {tls, ssl, master}: NSS-format keylogs from the three hook
    sources (tls: MockSSL; ssl: bcrypt+ncrypt; master: SSLproxy)
tasks/get/etw/<id>/<kind>/
    kind ∈ {dns, network, wmi}: NDJSON streams; kind == amsi zips the
    AMSI script buffers
tasks/get/bulkzip/<id>/<folder>/
    folder ∈ {logs, network, memory, selfextracted}: AES-zipped with
    ZIP_PWD for parity with tasks_dropped / tasks_payloadfiles /
    tasks_procdumpfiles
tasks/get/tlspcap/<id>/
    existing endpoint; now prefers dump_decrypted.pcap and falls back to
    polarproxy/tls.pcap, so both TLS pipelines serve from the same URL
Three new apiconf sections gate the sensitive / bulk categories
separately:
[tasktlskeys] TLS key material (decrypts captured flows)
[tasketw] ETW JSON logs
[taskbulkzip] whole-directory archives
PCAP variants reuse [taskpcap] since operators who opted into pcap access
already implicitly trust the caller with packet-capture data.
create_zip gains a recursive os.walk (replacing os.listdir) with
relative-path preservation, so bulk archives of nested directories —
notably logs/filestore/<bucket>/* — now include their contents instead
of silently dropping everything below the top level. A new temp_file=True
option routes the archive through a disk-backed NamedTemporaryFile so
large folders stream without loading the full archive into RAM; the
bulkzip handler uses this mode.
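The recursive, disk-backed behaviour described above can be sketched with the standard library alone. This is not the project's `create_zip`: the stdlib `zipfile` module cannot write encrypted archives, so the `ZIP_PWD` / pyzipper part is left out, and `zip_folder_to_tempfile` is a hypothetical name.

```python
import os
import tempfile
import zipfile


def zip_folder_to_tempfile(folder: str):
    """Zip `folder` recursively into a disk-backed temp file, ready to stream."""
    tmp = tempfile.NamedTemporaryFile(suffix=".zip")
    with zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                # Store paths relative to `folder` so the nested layout survives.
                zf.write(path, os.path.relpath(path, folder))
    tmp.seek(0)  # rewind so the caller can read from the start
    return tmp
```

The returned handle can be wrapped for streaming; the temp file is removed when the handle is closed, so the archive never has to sit fully in memory.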
The pcapng variant generates into a per-request NamedTemporaryFile and
unlinks it as soon as the fd is handed to FileWrapper. Writing the
pcapng to a shared path inside the analysis dir raced: two concurrent
callers could stream each other truncated or partially-overwritten
output.
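The per-request pattern can be illustrated as follows. This sketch uses `tempfile.mkstemp` rather than the `NamedTemporaryFile` the commit mentions, relies on POSIX unlink semantics, and takes a hypothetical `generate(path)` callable standing in for the pcapng writer.

```python
import os
import tempfile


def open_unlinked_tempfile(generate):
    """Run `generate(path)` on a unique temp path, then return an open handle
    to the result with the path already removed from disk."""
    fd, path = tempfile.mkstemp(suffix=".pcapng")
    os.close(fd)  # the writer reopens the path itself
    try:
        generate(path)  # hypothetical writer, e.g. a wrapper around PcapToNg
        handle = open(path, "rb")
    finally:
        # On POSIX the open handle stays readable after unlink; the data is
        # freed once the handle is closed. Also cleans up if generate() fails.
        os.unlink(path)
    return handle
```

Each request gets its own file name, so concurrent callers cannot overwrite or truncate each other's output.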
Variant / kind / folder inputs are matched against a static whitelist
before any path is built, so the URL parameter can't be used to probe
paths outside the analysis dir.
Implementation uses shared helpers — _resolve_task_id,
_serve_analysis_file, _zip_paths, _serve_folder_zip, _pcapng_response,
_pcapzip_response — so each of the four new handlers reduces to a small
dispatch table.
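A whitelist-backed dispatch table might look like this. The variant names and file paths are taken from the route list and PR description, but the table layout and the `resolve_variant` helper are assumptions, not the code in `views.py`.

```python
# Hypothetical table: the exact mapping used in views.py is an assumption.
PCAP_VARIANTS = {
    "decrypted": "dump_decrypted.pcap",
    "mixed": "dump_mixed.pcap",
    "sslproxy": "sslproxy/sslproxy.pcap",
}


def resolve_variant(variant: str):
    """Return the relative path for a known variant, or None if not listed."""
    # A dict lookup means the URL parameter is never joined into a path.
    return PCAP_VARIANTS.get(variant)
```

An unknown or hostile value such as `../../etc/passwd` simply misses the table and the handler can return an error before touching the filesystem.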
Force-pushed: f5f3e55 to eeddca8
Force-pushed an updated version addressing both reviews: @doomedraven — collapsed the 16 separate endpoints into four parameterised routes:
Each handler validates the parameter against a static whitelist before touching the filesystem, so the URL param can't probe outside the analysis dir. The apiconf sections stay the same ( @gemini-code-assist — all three inline comments addressed:
The commit message also drops the earlier mischaracterisation of PolarProxy — it's still a first-class TLS pipeline (its aux module writes |
Thank you for the update, @wmetcalf. The refactoring into parameterized routes and the implementation of the requested security and performance improvements (like the |
Summary
The web UI file-download view has long exposed `dump_decrypted.pcap`, `dump_mixed.pcap`, pcap bundles, pcapng (with embedded keys), tlsdump keylog, and whole-directory archives, but the REST API only served `dump.pcap` plus a `tasks_tlspcap` endpoint hard-coded to `polarproxy/tls.pcap`, a path no longer produced since CAPE moved to SSLproxy + GoGoRoboCap. Programmatic consumers currently have no way to pull the decrypted / merged pcaps or any of the ETW-based telemetry that now drives process attribution.

This PR brings apiv2 to parity with the web UI and adds endpoints for several artifacts that have no download path at all today.

New endpoints
PCAP variants (gated by existing `[taskpcap]`):
- `tasks/get/decrypted_pcap/<id>/`: `dump_decrypted.pcap`
- `tasks/get/mixed_pcap/<id>/`: `dump_mixed.pcap`
- `tasks/get/sslproxy_pcap/<id>/`: `sslproxy/sslproxy.pcap`
- `tasks/get/pcapzip/<id>/`
- `tasks/get/pcapng/<id>/`

TLS key material (new `[tasktlskeys]`, gated separately since these decrypt captured flows):
- `tasks/get/tlskeys/<id>/`: `tlsdump/tlsdump.log` (NSS keylog)
- `tasks/get/sslkeys/<id>/`: `aux/sslkeylogfile/sslkeys.log`
- `tasks/get/masterkeys/<id>/`: `sslproxy/master_keys.log`

ETW telemetry (new `[tasketw]`):
- `tasks/get/dns_etw/<id>/`: `aux/dns_etw.json` (NDJSON)
- `tasks/get/network_etw/<id>/`: `aux/network_etw.json`
- `tasks/get/wmi_etw/<id>/`: `aux/wmi_etw.json`
- `tasks/get/amsi_etw/<id>/`: `aux/amsi_etw/`

Bulk directory archives (new `[taskbulkzip]`, AES-encrypted with `ZIP_PWD` for parity with `tasks_dropped` / `tasks_payloadfiles` / `tasks_procdumpfiles`):
- `tasks/get/logs_all/<id>/`: `logs/`
- `tasks/get/network_all/<id>/`: `network/`
- `tasks/get/memdump_all/<id>/`: `memory/`
- `tasks/get/selfextracted_all/<id>/`: `selfextracted/`

Fixes
- `tasks_tlspcap` now prefers `dump_decrypted.pcap` with a `polarproxy/tls.pcap` fallback so legacy callers keep working on modern analyses (previously returned 404 on every post-PolarProxy run).
- `create_zip` picks up a recursive `os.walk` (replacing `os.listdir`) and now preserves relative paths. Previously, bulk archives of nested directories, notably `logs/filestore/<bucket>/*`, silently dropped everything below the top level. Also gains an optional `temp_file=True` that routes the archive through a disk-backed `NamedTemporaryFile` for memory-friendly streaming of large folders (used by the new bulk endpoints).

Gating
PCAP variants reuse the existing `[taskpcap]` section on the theory that operators who opted into pcap access already implicitly trust the caller with packet-capture data. TLS keys, ETW JSON, and whole-directory archives get their own sections so they can be enabled/disabled independently of standard pcap access.

Implementation
Four shared helpers (`_resolve_task_id`, `_serve_analysis_file`, `_zip_paths`, `_serve_folder_zip`) factor the boilerplate so each new handler reduces to 4–8 lines of actual logic.

Test plan
- `Content-Type` and body (verified against live CAPE deployment on tasks with and without TLS decryption data)
- `tasks_tlspcap` now serves `dump_decrypted.pcap` on modern analyses (same SHA256 as `tasks_decrypted_pcap`) and falls back cleanly when nothing is present
- `pcapng` endpoint generates a valid PCAPNG v1.0 with TLS keys embedded (`file /tmp/t.pcapng` → `pcapng capture file - version 1.0`)
- `ZIP_PWD=infected` (`7z t -pinfected`)
- `{"error": true, "error_value": "Task does not exist"}`
- `logs_all` on a task where `logs/` has 22 files total, 12 top-level and 10 under `logs/filestore/*/*`; old code shipped only 12 + empty directory entry, new code ships all 22
- `create_zip` callers (flat dirs produce identical output)