Guac session auth, EVTX periodic snapshots, and web fixes #2964
Conversation
- Replace client-side VNC port/IP passing with server-side libvirt lookup and database-backed session tokens
- Add GuacSession model for tying browser sessions to task lifecycle
- WebSocket consumer validates session cookie, checks task is running, and auto-disconnects when task ends
- Remove guest_ip/vncport from JS client (no longer sent to browser)
- Scale VNC display to fit browser viewport with correct mouse mapping
- Collect EVTX snapshots every 30s to protect against log clearing
- Add regex search filtering for event log viewer
- Group snapshot files by channel with proper EVTX header validation
- Filter analyzer process noise from displayed events
- Add missing crispy_forms_tags load to account templates
- Recover original filename from task DB on resubmit
- Handle multiple evtx zip files (snapshots + final) instead of single evtx.zip
- Extract each snapshot into separate subdirectory for zircolite processing
- Add evtx_dump-based pre-filtering to strip analyzer process noise
- Load exclude patterns from filters.json pre_filters config
- Formatting fixes (black/ruff compliance)

Companion to kevoreilly/CAPEv2#2964 which adds the periodic EVTX snapshot collection in the guest analyzer.
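For context, a minimal sketch of what per-snapshot extraction could look like on the processing side, assuming a hypothetical `evtx*.zip` naming scheme and helper name (neither is the actual CAPE code):

```python
import glob
import os
import re
import zipfile

def extract_evtx_archives(analysis_dir, work_dir):
    """Extract the final evtx.zip plus any snapshot zips, each into its own
    subdirectory, so each snapshot can be processed independently.
    The 'evtx*.zip' naming is illustrative, not the actual CAPE layout."""
    def numeric_key(path):
        # order numbered snapshots chronologically (2 before 10)
        m = re.search(r"(\d+)", os.path.basename(path))
        return (0, int(m.group(1))) if m else (1, 0)

    extracted = []
    archives = sorted(glob.glob(os.path.join(analysis_dir, "evtx*.zip")), key=numeric_key)
    for idx, zip_path in enumerate(archives):
        dest = os.path.join(work_dir, f"snapshot_{idx}")
        os.makedirs(dest, exist_ok=True)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(dest)  # each snapshot lands in its own subdirectory
        extracted.append(dest)
    return extracted
```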
Code Review
This pull request introduces periodic EVTX snapshots in the Windows analyzer to protect against log wiping by malware and adds regex search capabilities for event logs in the web UI. It also implements a more secure Guacamole session management system using database-backed tokens tied to task lifecycles. Key feedback includes addressing lexicographical sorting issues for snapshots that would break chronological order after nine snapshots, refactoring duplicated noise filter logic to improve performance, and using more appropriate database API calls for retrieving task metadata.
```python
with zipfile.ZipFile(self.evtx_dump, "w", zipfile.ZIP_DEFLATED) as zip_obj:
    if os.path.isdir(snapshot_base):
        for snap_name in sorted(os.listdir(snapshot_base)):
```
Sorting directory names lexicographically will result in an incorrect chronological order when the snapshot count exceeds 9 (e.g., '10' will come before '2'). This affects the order in which events are merged into the final archive and subsequently displayed in the web UI.
```diff
- for snap_name in sorted(os.listdir(snapshot_base)):
+ for snap_name in sorted(os.listdir(snapshot_base), key=lambda x: int(x) if x.isdigit() else x):
```
Fixed in 4a93260 — using numeric sort key.
```python
        members_to_extract.append(m)
if not members_to_extract:
    raise ValueError(f"No EVTX members found for channel: {channel}")
members_to_extract.sort()
```
Lexicographical sorting of snapshot-prefixed filenames will cause events to be chained in the wrong order once the snapshot count reaches double digits (e.g., 10_Security.evtx will be processed before 2_Security.evtx). A numeric sort based on the snapshot prefix should be used to maintain chronological integrity.
```diff
- members_to_extract.sort()
+ members_to_extract.sort(key=lambda x: int(x.split("_")[0]) if "_" in x and x.split("_")[0].isdigit() else x)
```
Fixed in 4a93260 — numeric sort on snapshot prefix.
```python
evtx_paths = sorted(
    os.path.join(tmpdir, m) for m in members_to_extract
    if os.path.exists(os.path.join(tmpdir, m))
)
```
This sorted() call is redundant if members_to_extract is already sorted correctly. Furthermore, it re-introduces the lexicographical sorting issue on the full file paths. It should be replaced with a simple list comprehension that preserves the existing order.
```diff
- evtx_paths = sorted(
-     os.path.join(tmpdir, m) for m in members_to_extract
-     if os.path.exists(os.path.join(tmpdir, m))
- )
+ evtx_paths = [
+     os.path.join(tmpdir, m) for m in members_to_extract
+     if os.path.exists(os.path.join(tmpdir, m))
+ ]
```
Fixed in 4a93260 — replaced with list comprehension preserving pre-sorted order.
```python
# Load analyzer noise filter from shared config
_ANALYZER_PARENTS = set()
_ANALYZER_IMAGES = set()
try:
    import os as _os
    _cape_root = _os.path.dirname(_os.path.dirname(_os.path.dirname(_os.path.abspath(__file__))))
    for _fp in ["data/sigma/filters_local.json", "data/sigma/filters.json"]:
        _full = _os.path.join(_cape_root, _fp)
        if _os.path.exists(_full):
            with open(_full) as _f:
                _data = json.load(_f)
            _pf = _data.get("pre_filters", {})
            for _p in _pf.get("exclude_parent_processes", []):
                _ANALYZER_PARENTS.add(_p.lower())
            for _p in _pf.get("exclude_image_processes", []):
                _ANALYZER_IMAGES.add(_p.lower())
    _ANALYZER_PATHS = set()
    for _fp2 in ["data/sigma/filters_local.json", "data/sigma/filters.json"]:
        _full2 = _os.path.join(_cape_root, _fp2)
        if _os.path.exists(_full2):
            with open(_full2) as _f2:
                _data2 = json.load(_f2)
            for _p in _data2.get("pre_filters", {}).get("exclude_target_paths", []):
                _ANALYZER_PATHS.add(_p.lower())
except Exception:
    pass
if not _ANALYZER_PARENTS:
    _ANALYZER_PARENTS = {"icacls.exe", "python.exe", "wevtutil.exe"}
if not _ANALYZER_IMAGES:
    _ANALYZER_IMAGES = {"wevtutil.exe", "conhost.exe"}
```
The logic for loading analyzer noise filters from JSON files is duplicated in _load_evtx_channel_page_cached and _count_evtx_channel_events_cached. It also performs redundant file I/O and manual path calculations instead of using the existing CUCKOO_ROOT constant. This should be refactored into a helper function or a module-level cached property to improve maintainability and performance.
Fixed in 4a93260 — extracted into _load_evtx_noise_filters() helper using CUCKOO_ROOT.
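For reference, a rough sketch of the shape such a helper could take; the name `_load_evtx_noise_filters` comes from the reply above, while the `lru_cache` use and the `CUCKOO_ROOT` import path are assumptions, not the actual fix:

```python
import json
import os
from functools import lru_cache

from lib.cuckoo.common.constants import CUCKOO_ROOT  # assumed import path

@lru_cache(maxsize=1)
def _load_evtx_noise_filters():
    """Load analyzer-noise filter sets once and share them between the page
    loader and the event counter, falling back to built-in defaults."""
    parents, images, paths = set(), set(), set()
    for rel in ("data/sigma/filters_local.json", "data/sigma/filters.json"):
        full = os.path.join(CUCKOO_ROOT, rel)
        if not os.path.exists(full):
            continue
        with open(full) as fh:
            pf = json.load(fh).get("pre_filters", {})
        parents.update(p.lower() for p in pf.get("exclude_parent_processes", []))
        images.update(p.lower() for p in pf.get("exclude_image_processes", []))
        paths.update(p.lower() for p in pf.get("exclude_target_paths", []))
    return (
        parents or {"icacls.exe", "python.exe", "wevtutil.exe"},
        images or {"wevtutil.exe", "conhost.exe"},
        paths,
    )
```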
```python
if task_id:
    resubmit_tasks = db.find_sample(task_id=task_id)
    if resubmit_tasks:
        original_target = resubmit_tasks[0].to_dict().get("target", "")
        if original_target:
            original_filename = sanitize_filename(os.path.basename(original_target))
filename = base_dir + "/" + (original_filename or sanitize_filename(hash))
```
Using db.find_sample(task_id=task_id) to retrieve the original filename of a task is suboptimal. In the standard database API, db.view_task(task_id) should be used to get the Task object, which contains the target field. Additionally, find_sample typically returns Sample objects which do not have a target attribute in the standard schema.
```python
if task_id:
    task = db.view_task(task_id)
    if task and task.target:
        original_filename = sanitize_filename(os.path.basename(task.target))
filename = base_dir + "/" + (original_filename or sanitize_filename(hash))
```
Fixed in 4a93260 — switched to db.view_task().
Pull request overview
This PR enhances CAPEv2’s interactive Guacamole desktop access and Windows EVTX log retention/viewing by moving VNC connection details server-side with DB-backed session tokens, adding periodic EVTX snapshotting during analysis, and addressing a few web UI regressions.
Changes:
- Add DB-backed Guacamole session tokens validated by the WebSocket consumer, with server-side libvirt VNC port lookup.
- Implement periodic (30s) EVTX snapshots on the Windows analyzer and update the web EVTX viewer to group snapshot files by channel and support regex filtering + analyzer-noise removal.
- Fix account templates missing `crispy_forms_tags` and preserve original filenames on task resubmit.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| web/web/urls.py | Routes /guac/ into the main Django URL configuration. |
| web/web/settings.py | Enables Channels/Guac app in the main Django settings. |
| web/web/guac_settings.py | Initializes CAPE DB for guac ASGI settings and attempts table creation. |
| web/templates/analysis/eventlogs/index.html | Adds client-side regex search UI and passes search to the EVTX loader endpoint. |
| web/templates/analysis/eventlogs/_evtx_channel.html | Displays active regex filter in the EVTX channel header. |
| web/templates/account/password_reset.html | Loads crispy_forms_tags to prevent template errors. |
| web/templates/account/password_reset_from_key.html | Loads crispy_forms_tags to prevent template errors. |
| web/templates/account/password_change.html | Loads crispy_forms_tags to prevent template errors. |
| web/templates/account/email.html | Loads crispy_forms_tags to prevent template errors. |
| web/submission/views.py | Attempts to recover original filename during task resubmission. |
| web/static/js/guac-main.js | Removes client-side guest IP/VNC port usage; connects using session-only params. |
| web/guac/views.py | Creates DB token + cookie for guac session and renders guac page without exposing VNC details. |
| web/guac/templates/guac/index.html | Updates JS invocation to match new GuacMe() signature. |
| web/guac/consumers.py | Validates cookie token, checks task running, looks up VNC port via libvirt, and auto-disconnects. |
| web/analysis/views.py | Groups snapshot-prefixed EVTX files by channel; adds regex filtering and analyzer-noise filtering. |
| lib/cuckoo/core/database.py | Adds DB helper methods for creating/reading/deleting guac session tokens. |
| lib/cuckoo/core/data/guac_session.py | Introduces GuacSession SQLAlchemy model. |
| analyzer/windows/modules/auxiliary/evtx.py | Implements periodic EVTX snapshot export/wipe and merges snapshots into evtx.zip on stop. |
```python
def create_guac_session(self, token, task_id, vm_label, guest_ip):
    """Create a new guac session for a task."""
    from lib.cuckoo.core.data.guac_session import GuacSession
    session = self.session()
    try:
        guac = GuacSession(token=str(token), task_id=task_id, vm_label=vm_label, guest_ip=guest_ip)
        session.add(guac)
        session.commit()
        return guac
    except Exception:
        session.rollback()
        raise
```
The new Guac session helpers use a new scoped session instance and call commit()/rollback() directly. This can break when these helpers are called from Django views wrapped by DBTransactionMiddleware (which already manages a transaction with db.session.begin()), potentially causing invalid transaction state/errors. Prefer using the existing thread-local self.session within a nested transaction (e.g., begin_nested()) and avoid explicit commits here, letting the outer transaction middleware commit; also ensure the session is properly removed/closed when used outside request middleware (e.g., from async consumers).
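A minimal sketch of the nested-transaction shape suggested here, assuming SQLAlchemy 1.4+ savepoint semantics and the thread-local `self.session` scoped session (not the actual fix):

```python
def create_guac_session(self, token, task_id, vm_label, guest_ip):
    """Create a guac session without committing the outer transaction.

    begin_nested() opens a SAVEPOINT, so a failure here rolls back only
    this insert while any middleware-managed transaction stays usable and
    is committed (or rolled back) by the middleware itself.
    """
    from lib.cuckoo.core.data.guac_session import GuacSession

    with self.session.begin_nested():
        guac = GuacSession(token=str(token), task_id=task_id,
                           vm_label=vm_label, guest_ip=guest_ip)
        self.session.add(guac)
    return guac
```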
```python
_db = init_database(exists_ok=True)

# Create guac_sessions table if guacamole is enabled
if _CapeConfig('web').guacamole.get('enabled', False):
    from lib.cuckoo.core.data.guac_session import GuacSession  # noqa: F401
    from lib.cuckoo.core.data.db_common import Base
    Base.metadata.create_all(_db.engine)
```
This settings module performs CAPE DB initialization and then calls Base.metadata.create_all() again. _Database.__init__() already calls Base.metadata.create_all(self.engine), so this is redundant and adds extra side effects at settings-import time (and can race in multi-worker startup). Consider removing the extra create_all() here and rely on init_database() (with the GuacSession model imported) to create the table.
```diff
- _db = init_database(exists_ok=True)
- # Create guac_sessions table if guacamole is enabled
- if _CapeConfig('web').guacamole.get('enabled', False):
-     from lib.cuckoo.core.data.guac_session import GuacSession  # noqa: F401
-     from lib.cuckoo.core.data.db_common import Base
-     Base.metadata.create_all(_db.engine)
+ # Import GuacSession before database initialization so init_database()
+ # creates the table as part of the normal metadata setup.
+ if _CapeConfig('web').guacamole.get('enabled', False):
+     from lib.cuckoo.core.data.guac_session import GuacSession  # noqa: F401
+ _db = init_database(exists_ok=True)
```
```diff
@@ -40,55 +130,41 @@ async def connect(self):
      guest_username = web_cfg.guacamole.username or ""
      guest_password = web_cfg.guacamole.password or ""
```
Several config values are coerced with int(...) and then or-defaulted (e.g., guacd_port = int(...) or 4822). If the config value is missing/None/empty, int() will raise and the connection will fail. Use a safer defaulting pattern (check for truthy value before casting, or catch (TypeError, ValueError)) for guacd_port/guest_width/guest_height/guest_rdp_port.
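A small defensive helper along the lines suggested; `_cfg_int` is a hypothetical name, and the RDP default of 3389 is an assumption (4822 comes from the code under review):

```python
def _cfg_int(section, key, default):
    """Return a config value as int, falling back to the default when the
    value is missing, empty, or not a valid integer."""
    try:
        return int(getattr(section, key, None))
    except (TypeError, ValueError):
        return default

# usage sketch, mirroring the settings named in the comment above
guacd_port = _cfg_int(web_cfg.guacamole, "guacd_port", 4822)
guest_rdp_port = _cfg_int(web_cfg.guacamole, "guest_rdp_port", 3389)
```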
```python
channel_members[channel].append(normalized)
if not channel_has_records[channel]:
    # Read just the first 32 bytes to check the header
    header = zf.read(member)[:32]
```
zf.read(member)[:32] reads the entire EVTX file into memory and then slices it, which is expensive for large logs and repeated across channels. Use zf.open(member) and read only the needed header bytes (32) to determine whether records exist.
```diff
- header = zf.read(member)[:32]
+ with zf.open(member) as member_fp:
+     header = member_fp.read(32)
```
```python
for channel, member_list in sorted(channel_members.items()):
    if not channel_has_records.get(channel, False):
        continue
    member_list.sort()
    members.append({
        "member": member_list[0],
        "members": member_list,
        "channel": channel,
    })
```
Snapshot member ordering is currently lexicographic (member_list.sort()), so after 9 snapshots you'll get 10_... ordered before 2_.... This can misorder snapshots in the UI and later parsing. Consider sorting by the numeric snapshot prefix when present (and then by filename) so snapshots remain chronological.
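One way to express the numeric-prefix ordering as a reusable sort key (a sketch; whether unprefixed names such as the final dump should sort before or after numbered snapshots is an assumption here):

```python
import re

def snapshot_sort_key(name):
    """Order '<n>_<Channel>.evtx' names by numeric snapshot prefix, then by
    filename, so '10_Security.evtx' sorts after '2_Security.evtx'."""
    m = re.match(r"(\d+)_(.+)$", name)
    if m:
        return (0, int(m.group(1)), m.group(2))
    return (1, 0, name)  # unprefixed names (assumed final dump) sort last

# e.g. member_list.sort(key=snapshot_sort_key) in the loop above
```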
```python
channel = _evtx_member_display_name(member)
members_to_extract = []
for m in zf.namelist():
    if _evtx_member_display_name(m) == channel and m.lower().endswith(".evtx"):
        members_to_extract.append(m)
if not members_to_extract:
    raise ValueError(f"No EVTX members found for channel: {channel}")
members_to_extract.sort()
```
members_to_extract.sort() sorts snapshot-prefixed filenames lexicographically, which can put 10_*.evtx before 2_*.evtx and cause records to be chained out of chronological order. Sort using the numeric snapshot prefix (when present) to preserve snapshot ordering during parsing.
```python
def _compile_evtx_search_pattern(search_query):
    search_query = (search_query or "").strip()
    if not search_query:
        return None, ""

    try:
        return re.compile(search_query, re.IGNORECASE), ""
    except re.error as e:
        return None, str(e)
```
The regex search feature compiles and applies user-provided patterns against raw event JSON. When re2 isn't installed, Python's re is used, which is vulnerable to catastrophic backtracking (ReDoS) with crafted patterns and large inputs. Consider requiring re2 for this feature, or enforcing strict limits (pattern length, rejecting nested quantifiers) to prevent CPU exhaustion.
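A sketch of the defensive pattern described, preferring the linear-time re2 engine when installed and capping pattern length as a fallback; the 256-character limit and the function name are illustrative:

```python
import re

try:
    import re2  # re-compatible binding; linear-time, no catastrophic backtracking
except ImportError:
    re2 = None

_MAX_PATTERN_LEN = 256  # illustrative cap, not from the PR

def compile_search_pattern(search_query):
    search_query = (search_query or "").strip()
    if not search_query:
        return None, ""
    if len(search_query) > _MAX_PATTERN_LEN:
        return None, "search pattern too long"
    engine = re2 or re
    try:
        return engine.compile(search_query, re.IGNORECASE), ""
    except Exception as e:  # re.error, or re2's own error type
        return None, str(e)
```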
```python
try:
    for channel in self.windows_logs:
        safe_name = channel.replace("/", "%4") + ".evtx"
        export_path = os.path.join(snapshot_dir, safe_name)
        try:
            cmd = f'wevtutil epl "{channel}" "{export_path}" /ow:true'
            subprocess.call(cmd, startupinfo=self.startupinfo, timeout=30)
        except Exception:
            pass

    self.wipe_windows_logs()
    return True
return False
log.debug("Took evtx snapshot %d", self.snapshot_count)
```
take_snapshot() wipes all event logs unconditionally after attempting exports. If wevtutil epl fails for any/all channels, this will still clear the logs and permanently lose events (opposite of the goal of snapshots). Only wipe logs for channels that were successfully exported (or skip wiping entirely if the snapshot export failed).
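A sketch of the safer flow, tracking which channels exported successfully and wiping only those; the free-function form and the per-channel `wipe_channel` callback are hypothetical (the real code is a method, and `wevtutil cl "<channel>"` would be one way to implement the callback):

```python
import os
import subprocess

def take_snapshot(windows_logs, snapshot_dir, startupinfo, wipe_channel):
    """Export each channel, then wipe only the channels that exported cleanly,
    so a failed export never loses live event log data."""
    os.makedirs(snapshot_dir, exist_ok=True)
    exported = []
    for channel in windows_logs:
        safe_name = channel.replace("/", "%4") + ".evtx"
        export_path = os.path.join(snapshot_dir, safe_name)
        try:
            subprocess.call(
                f'wevtutil epl "{channel}" "{export_path}" /ow:true',
                startupinfo=startupinfo,
                timeout=30,
            )
            if os.path.getsize(export_path) > 0:
                exported.append(channel)
        except Exception:
            continue  # export failed: leave this channel's live log untouched
    for channel in exported:
        wipe_channel(channel)
    return bool(exported)
```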
```python
with zipfile.ZipFile(self.evtx_dump, "w", zipfile.ZIP_DEFLATED) as zip_obj:
    if os.path.isdir(snapshot_base):
        for snap_name in sorted(os.listdir(snapshot_base)):
            snap_dir = os.path.join(snapshot_base, snap_name)
            if not os.path.isdir(snap_dir):
                continue
            for evtx_file in os.listdir(snap_dir):
                if not evtx_file.lower().endswith(".evtx"):
                    continue
                full_path = os.path.join(snap_dir, evtx_file)
                if os.path.getsize(full_path) == 0:
                    continue
                # Add with snapshot number prefix to avoid name collisions
                arc_name = f"{snap_name}_{evtx_file}"
                try:
                    zip_obj.write(full_path, arc_name)
```
Snapshot directories are merged using sorted(os.listdir(snapshot_base)), which is lexicographic and will order 10 before 2. This affects archive naming (arc_name) and can later lead to misordered snapshot processing. Sort snapshot directory names numerically when they are digits.
```python
# Load analyzer noise filter from shared config
_ANALYZER_PARENTS = set()
_ANALYZER_IMAGES = set()
try:
    import os as _os
    _cape_root = _os.path.dirname(_os.path.dirname(_os.path.dirname(_os.path.abspath(__file__))))
    # ... identical filter-loading block as quoted in the earlier comment ...
except Exception:
```
The analyzer-noise filter loading/parsing logic is duplicated in both _load_evtx_channel_page_cached() and _count_evtx_channel_events_cached(). This increases the risk of the two paths drifting (e.g., different defaults / keys) and makes the hot path harder to maintain. Consider extracting a shared helper (and possibly caching the parsed filter sets) so both code paths use exactly the same filter data without repeated file IO/JSON parsing.
- Use numeric sort for snapshot directories and filenames (fixes ordering at 10+)
- Extract analyzer noise filter loading into shared helper function
- Use db.view_task() instead of db.find_sample() for resubmit filename recovery
- Remove redundant sorted() on evtx_paths (preserve pre-sorted order)
Removed unused import statement for itertools.
Was hoping to merge but hit some more ruff errors:
Addresses ruff F401 (evtx.py) and F811 (database.py) flagged in PR review.
Summary
- Add `crispy_forms_tags` to account templates; recover original filename on task resubmit.

New files
- `lib/cuckoo/core/data/guac_session.py` — GuacSession SQLAlchemy model (table auto-created on startup via `create_all`)

Test plan
Note
The companion `modules/processing/sigma.py` changes for multi-snapshot EVTX support will be submitted as a separate PR against CAPEsandbox/community.