Skip to content

Guac session auth, EVTX periodic snapshots, and web fixes#2964

Merged
kevoreilly merged 7 commits into
kevoreilly:masterfrom
wmetcalf:feature/guac-auth-evtx-snapshots
Apr 10, 2026
Merged

Guac session auth, EVTX periodic snapshots, and web fixes#2964
kevoreilly merged 7 commits into
kevoreilly:masterfrom
wmetcalf:feature/guac-auth-evtx-snapshots

Conversation

@wmetcalf
Copy link
Copy Markdown
Contributor

@wmetcalf wmetcalf commented Apr 2, 2026

Summary

  • Guac session security: Replace client-side VNC port/IP with server-side libvirt lookup and database-backed session tokens. WebSocket consumer validates session cookie, verifies task is running, and auto-disconnects on task completion. VNC display scales to fit browser viewport.
  • EVTX periodic snapshots: Collect EVTX snapshots every 30s during analysis to protect against malware clearing event logs. Web viewer groups snapshots by channel with regex search filtering and analyzer noise removal.
  • Web fixes: Add missing crispy_forms_tags to account templates; recover original filename on task resubmit.

New files

  • lib/cuckoo/core/data/guac_session.py — GuacSession SQLAlchemy model (table auto-created on startup via create_all)

Test plan

  • Open interactive desktop session — verify VNC connects with no IP/port in browser URL or JS
  • Confirm VNC auto-disconnects when task completes
  • Submit malware that clears event logs — verify snapshots preserve pre-cleared events
  • Test event log search filtering in web UI
  • Verify account pages (password change/reset, email) render without template errors
  • Resubmit a task and confirm original filename is preserved

Note

The companion modules/processing/sigma.py changes for multi-snapshot EVTX support will be submitted as a separate PR against CAPEsandbox/community.

wmetcalf added 3 commits April 2, 2026 21:31
- Replace client-side VNC port/IP passing with server-side libvirt
  lookup and database-backed session tokens
- Add GuacSession model for tying browser sessions to task lifecycle
- WebSocket consumer validates session cookie, checks task is running,
  and auto-disconnects when task ends
- Remove guest_ip/vncport from JS client (no longer sent to browser)
- Scale VNC display to fit browser viewport with correct mouse mapping
- Collect EVTX snapshots every 30s to protect against log clearing
- Add regex search filtering for event log viewer
- Group snapshot files by channel with proper EVTX header validation
- Filter analyzer process noise from displayed events
- Add missing crispy_forms_tags load to account templates
- Recover original filename from task DB on resubmit
Copilot AI review requested due to automatic review settings April 2, 2026 21:32
wmetcalf added a commit to wmetcalf/community that referenced this pull request Apr 2, 2026
- Handle multiple evtx zip files (snapshots + final) instead of single evtx.zip
- Extract each snapshot into separate subdirectory for zircolite processing
- Add evtx_dump-based pre-filtering to strip analyzer process noise
- Load exclude patterns from filters.json pre_filters config
- Formatting fixes (black/ruff compliance)

Companion to kevoreilly/CAPEv2#2964 which adds the periodic EVTX
snapshot collection in the guest analyzer.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces periodic EVTX snapshots in the Windows analyzer to protect against log wiping by malware and adds regex search capabilities for event logs in the web UI. It also implements a more secure Guacamole session management system using database-backed tokens tied to task lifecycles. Key feedback includes addressing lexicographical sorting issues for snapshots that would break chronological order after nine snapshots, refactoring duplicated noise filter logic to improve performance, and using more appropriate database API calls for retrieving task metadata.


with zipfile.ZipFile(self.evtx_dump, "w", zipfile.ZIP_DEFLATED) as zip_obj:
if os.path.isdir(snapshot_base):
for snap_name in sorted(os.listdir(snapshot_base)):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Sorting directory names lexicographically will result in an incorrect chronological order when the snapshot count exceeds 9 (e.g., '10' will come before '2'). This affects the order in which events are merged into the final archive and subsequently displayed in the web UI.

Suggested change
for snap_name in sorted(os.listdir(snapshot_base)):
for snap_name in sorted(os.listdir(snapshot_base), key=lambda x: int(x) if x.isdigit() else x):

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4a93260 — using numeric sort key.

Comment thread web/analysis/views.py Outdated
members_to_extract.append(m)
if not members_to_extract:
raise ValueError(f"No EVTX members found for channel: {channel}")
members_to_extract.sort()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Lexicographical sorting of snapshot-prefixed filenames will cause events to be chained in the wrong order once the snapshot count reaches double digits (e.g., 10_Security.evtx will be processed before 2_Security.evtx). A numeric sort based on the snapshot prefix should be used to maintain chronological integrity.

Suggested change
members_to_extract.sort()
members_to_extract.sort(key=lambda x: int(x.split("_")[0]) if "_" in x and x.split("_")[0].isdigit() else x)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4a93260 — numeric sort on snapshot prefix.

Comment thread web/analysis/views.py Outdated
Comment on lines +736 to +739
evtx_paths = sorted(
os.path.join(tmpdir, m) for m in members_to_extract
if os.path.exists(os.path.join(tmpdir, m))
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This sorted() call is redundant if members_to_extract is already sorted correctly. Furthermore, it re-introduces the lexicographical sorting issue on the full file paths. It should be replaced with a simple list comprehension that preserves the existing order.

Suggested change
evtx_paths = sorted(
os.path.join(tmpdir, m) for m in members_to_extract
if os.path.exists(os.path.join(tmpdir, m))
)
evtx_paths = [
os.path.join(tmpdir, m) for m in members_to_extract
if os.path.exists(os.path.join(tmpdir, m))
]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4a93260 — replaced with list comprehension preserving pre-sorted order.

Comment thread web/analysis/views.py Outdated
Comment on lines +755 to +784
# Load analyzer noise filter from shared config
_ANALYZER_PARENTS = set()
_ANALYZER_IMAGES = set()
try:
import os as _os
_cape_root = _os.path.dirname(_os.path.dirname(_os.path.dirname(_os.path.abspath(__file__))))
for _fp in ["data/sigma/filters_local.json", "data/sigma/filters.json"]:
_full = _os.path.join(_cape_root, _fp)
if _os.path.exists(_full):
with open(_full) as _f:
_data = json.load(_f)
_pf = _data.get("pre_filters", {})
for _p in _pf.get("exclude_parent_processes", []):
_ANALYZER_PARENTS.add(_p.lower())
for _p in _pf.get("exclude_image_processes", []):
_ANALYZER_IMAGES.add(_p.lower())
_ANALYZER_PATHS = set()
for _fp2 in ["data/sigma/filters_local.json", "data/sigma/filters.json"]:
_full2 = _os.path.join(_cape_root, _fp2)
if _os.path.exists(_full2):
with open(_full2) as _f2:
_data2 = json.load(_f2)
for _p in _data2.get("pre_filters", {}).get("exclude_target_paths", []):
_ANALYZER_PATHS.add(_p.lower())
except Exception:
pass
if not _ANALYZER_PARENTS:
_ANALYZER_PARENTS = {"icacls.exe", "python.exe", "wevtutil.exe"}
if not _ANALYZER_IMAGES:
_ANALYZER_IMAGES = {"wevtutil.exe", "conhost.exe"}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for loading analyzer noise filters from JSON files is duplicated in _load_evtx_channel_page_cached and _count_evtx_channel_events_cached. It also performs redundant file I/O and manual path calculations instead of using the existing CUCKOO_ROOT constant. This should be refactored into a helper function or a module-level cached property to improve maintainability and performance.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4a93260 — extracted into _load_evtx_noise_filters() helper using CUCKOO_ROOT.

Comment thread web/submission/views.py
Comment on lines +496 to +502
if task_id:
resubmit_tasks = db.find_sample(task_id=task_id)
if resubmit_tasks:
original_target = resubmit_tasks[0].to_dict().get("target", "")
if original_target:
original_filename = sanitize_filename(os.path.basename(original_target))
filename = base_dir + "/" + (original_filename or sanitize_filename(hash))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using db.find_sample(task_id=task_id) to retrieve the original filename of a task is suboptimal. In the standard database API, db.view_task(task_id) should be used to get the Task object, which contains the target field. Additionally, find_sample typically returns Sample objects which do not have a target attribute in the standard schema.

                    if task_id:
                        task = db.view_task(task_id)
                        if task and task.target:
                            original_filename = sanitize_filename(os.path.basename(task.target))
                    filename = base_dir + "/" + (original_filename or sanitize_filename(hash))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4a93260 — switched to db.view_task().

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances CAPEv2’s interactive Guacamole desktop access and Windows EVTX log retention/viewing by moving VNC connection details server-side with DB-backed session tokens, adding periodic EVTX snapshotting during analysis, and addressing a few web UI regressions.

Changes:

  • Add DB-backed Guacamole session tokens validated by the WebSocket consumer, with server-side libvirt VNC port lookup.
  • Implement periodic (30s) EVTX snapshots on the Windows analyzer and update the web EVTX viewer to group snapshot files by channel and support regex filtering + analyzer-noise removal.
  • Fix account templates missing crispy_forms_tags and preserve original filenames on task resubmit.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
web/web/urls.py Routes /guac/ into the main Django URL configuration.
web/web/settings.py Enables Channels/Guac app in the main Django settings.
web/web/guac_settings.py Initializes CAPE DB for guac ASGI settings and attempts table creation.
web/templates/analysis/eventlogs/index.html Adds client-side regex search UI and passes search to the EVTX loader endpoint.
web/templates/analysis/eventlogs/_evtx_channel.html Displays active regex filter in the EVTX channel header.
web/templates/account/password_reset.html Loads crispy_forms_tags to prevent template errors.
web/templates/account/password_reset_from_key.html Loads crispy_forms_tags to prevent template errors.
web/templates/account/password_change.html Loads crispy_forms_tags to prevent template errors.
web/templates/account/email.html Loads crispy_forms_tags to prevent template errors.
web/submission/views.py Attempts to recover original filename during task resubmission.
web/static/js/guac-main.js Removes client-side guest IP/VNC port usage; connects using session-only params.
web/guac/views.py Creates DB token + cookie for guac session and renders guac page without exposing VNC details.
web/guac/templates/guac/index.html Updates JS invocation to match new GuacMe() signature.
web/guac/consumers.py Validates cookie token, checks task running, looks up VNC port via libvirt, and auto-disconnects.
web/analysis/views.py Groups snapshot-prefixed EVTX files by channel; adds regex filtering and analyzer-noise filtering.
lib/cuckoo/core/database.py Adds DB helper methods for creating/reading/deleting guac session tokens.
lib/cuckoo/core/data/guac_session.py Introduces GuacSession SQLAlchemy model.
analyzer/windows/modules/auxiliary/evtx.py Implements periodic EVTX snapshot export/wipe and merges snapshots into evtx.zip on stop.

Comment on lines +227 to +238
def create_guac_session(self, token, task_id, vm_label, guest_ip):
"""Create a new guac session for a task."""
from lib.cuckoo.core.data.guac_session import GuacSession
session = self.session()
try:
guac = GuacSession(token=str(token), task_id=task_id, vm_label=vm_label, guest_ip=guest_ip)
session.add(guac)
session.commit()
return guac
except Exception:
session.rollback()
raise
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new Guac session helpers use a new scoped session instance and call commit()/rollback() directly. This can break when these helpers are called from Django views wrapped by DBTransactionMiddleware (which already manages a transaction with db.session.begin()), potentially causing invalid transaction state/errors. Prefer using the existing thread-local self.session within a nested transaction (e.g., begin_nested()) and avoid explicit commits here, letting the outer transaction middleware commit; also ensure the session is properly removed/closed when used outside request middleware (e.g., from async consumers).

Copilot uses AI. Check for mistakes.
Comment thread web/web/guac_settings.py
Comment on lines +175 to +182
_db = init_database(exists_ok=True)

# Create guac_sessions table if guacamole is enabled
if _CapeConfig('web').guacamole.get('enabled', False):
from lib.cuckoo.core.data.guac_session import GuacSession # noqa: F401
from lib.cuckoo.core.data.db_common import Base
Base.metadata.create_all(_db.engine)

Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This settings module performs CAPE DB initialization and then calls Base.metadata.create_all() again. _Database.__init__() already calls Base.metadata.create_all(self.engine), so this is redundant and adds extra side effects at settings-import time (and can race in multi-worker startup). Consider removing the extra create_all() here and rely on init_database() (with the GuacSession model imported) to create the table.

Suggested change
_db = init_database(exists_ok=True)
# Create guac_sessions table if guacamole is enabled
if _CapeConfig('web').guacamole.get('enabled', False):
from lib.cuckoo.core.data.guac_session import GuacSession # noqa: F401
from lib.cuckoo.core.data.db_common import Base
Base.metadata.create_all(_db.engine)
# Import GuacSession before database initialization so init_database()
# creates the table as part of the normal metadata setup.
if _CapeConfig('web').guacamole.get('enabled', False):
from lib.cuckoo.core.data.guac_session import GuacSession # noqa: F401
_db = init_database(exists_ok=True)

Copilot uses AI. Check for mistakes.
Comment thread web/guac/consumers.py
Comment on lines 123 to 131
@@ -40,55 +130,41 @@ async def connect(self):
guest_username = web_cfg.guacamole.username or ""
guest_password = web_cfg.guacamole.password or ""
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several config values are coerced with int(...) and then or-defaulted (e.g., guacd_port = int(...) or 4822). If the config value is missing/None/empty, int() will raise and the connection will fail. Use a safer defaulting pattern (check for truthy value before casting, or catch (TypeError, ValueError)) for guacd_port/guest_width/guest_height/guest_rdp_port.

Copilot uses AI. Check for mistakes.
Comment thread web/analysis/views.py
channel_members[channel].append(normalized)
if not channel_has_records[channel]:
# Read just the first 32 bytes to check the header
header = zf.read(member)[:32]
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zf.read(member)[:32] reads the entire EVTX file into memory and then slices it, which is expensive for large logs and repeated across channels. Use zf.open(member) and read only the needed header bytes (32) to determine whether records exist.

Suggested change
header = zf.read(member)[:32]
with zf.open(member) as member_fp:
header = member_fp.read(32)

Copilot uses AI. Check for mistakes.
Comment thread web/analysis/views.py
Comment on lines +682 to +690
for channel, member_list in sorted(channel_members.items()):
if not channel_has_records.get(channel, False):
continue
member_list.sort()
members.append({
"member": member_list[0],
"members": member_list,
"channel": channel,
})
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snapshot member ordering is currently lexicographic (member_list.sort()), so after 9 snapshots you'll get 10_... ordered before 2_.... This can misorder snapshots in the UI and later parsing. Consider sorting by the numeric snapshot prefix when present (and then by filename) so snapshots remain chronological.

Copilot uses AI. Check for mistakes.
Comment thread web/analysis/views.py
Comment on lines +716 to 724
channel = _evtx_member_display_name(member)
members_to_extract = []
for m in zf.namelist():
if _evtx_member_display_name(m) == channel and m.lower().endswith(".evtx"):
members_to_extract.append(m)
if not members_to_extract:
raise ValueError(f"No EVTX members found for channel: {channel}")
members_to_extract.sort()

Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

members_to_extract.sort() sorts snapshot-prefixed filenames lexicographically, which can put 10_*.evtx before 2_*.evtx and cause records to be chained out of chronological order. Sort using the numeric snapshot prefix (when present) to preserve snapshot ordering during parsing.

Copilot uses AI. Check for mistakes.
Comment thread web/analysis/views.py
Comment on lines +621 to +629
def _compile_evtx_search_pattern(search_query):
search_query = (search_query or "").strip()
if not search_query:
return None, ""

try:
return re.compile(search_query, re.IGNORECASE), ""
except re.error as e:
return None, str(e)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex search feature compiles and applies user-provided patterns against raw event JSON. When re2 isn't installed, Python's re is used, which is vulnerable to catastrophic backtracking (ReDoS) with crafted patterns and large inputs. Consider requiring re2 for this feature, or enforcing strict limits (pattern length, rejecting nested quantifiers) to prevent CPU exhaustion.

Copilot uses AI. Check for mistakes.
Comment on lines +266 to +277
try:
for channel in self.windows_logs:
safe_name = channel.replace("/", "%4") + ".evtx"
export_path = os.path.join(snapshot_dir, safe_name)
try:
cmd = f'wevtutil epl "{channel}" "{export_path}" /ow:true'
subprocess.call(cmd, startupinfo=self.startupinfo, timeout=30)
except Exception:
pass

self.wipe_windows_logs()
return True
return False
log.debug("Took evtx snapshot %d", self.snapshot_count)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take_snapshot() wipes all event logs unconditionally after attempting exports. If wevtutil epl fails for any/all channels, this will still clear the logs and permanently lose events (opposite of the goal of snapshots). Only wipe logs for channels that were successfully exported (or skip wiping entirely if the snapshot export failed).

Copilot uses AI. Check for mistakes.
Comment on lines +318 to +333
with zipfile.ZipFile(self.evtx_dump, "w", zipfile.ZIP_DEFLATED) as zip_obj:
if os.path.isdir(snapshot_base):
for snap_name in sorted(os.listdir(snapshot_base)):
snap_dir = os.path.join(snapshot_base, snap_name)
if not os.path.isdir(snap_dir):
continue
for evtx_file in os.listdir(snap_dir):
if not evtx_file.lower().endswith(".evtx"):
continue
full_path = os.path.join(snap_dir, evtx_file)
if os.path.getsize(full_path) == 0:
continue
# Add with snapshot number prefix to avoid name collisions
arc_name = f"{snap_name}_{evtx_file}"
try:
zip_obj.write(full_path, arc_name)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snapshot directories are merged using sorted(os.listdir(snapshot_base)), which is lexicographic and will order 10 before 2. This affects archive naming (arc_name) and can later lead to misordered snapshot processing. Sort snapshot directory names numerically when they are digits.

Copilot uses AI. Check for mistakes.
Comment thread web/analysis/views.py Outdated
Comment on lines +755 to +779
# Load analyzer noise filter from shared config
_ANALYZER_PARENTS = set()
_ANALYZER_IMAGES = set()
try:
import os as _os
_cape_root = _os.path.dirname(_os.path.dirname(_os.path.dirname(_os.path.abspath(__file__))))
for _fp in ["data/sigma/filters_local.json", "data/sigma/filters.json"]:
_full = _os.path.join(_cape_root, _fp)
if _os.path.exists(_full):
with open(_full) as _f:
_data = json.load(_f)
_pf = _data.get("pre_filters", {})
for _p in _pf.get("exclude_parent_processes", []):
_ANALYZER_PARENTS.add(_p.lower())
for _p in _pf.get("exclude_image_processes", []):
_ANALYZER_IMAGES.add(_p.lower())
_ANALYZER_PATHS = set()
for _fp2 in ["data/sigma/filters_local.json", "data/sigma/filters.json"]:
_full2 = _os.path.join(_cape_root, _fp2)
if _os.path.exists(_full2):
with open(_full2) as _f2:
_data2 = json.load(_f2)
for _p in _data2.get("pre_filters", {}).get("exclude_target_paths", []):
_ANALYZER_PATHS.add(_p.lower())
except Exception:
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The analyzer-noise filter loading/parsing logic is duplicated in both _load_evtx_channel_page_cached() and _count_evtx_channel_events_cached(). This increases the risk of the two paths drifting (e.g., different defaults / keys) and makes the hot path harder to maintain. Consider extracting a shared helper (and possibly caching the parsed filter sets) so both code paths use exactly the same filter data without repeated file IO/JSON parsing.

Copilot uses AI. Check for mistakes.
wmetcalf and others added 3 commits April 2, 2026 22:13
- Use numeric sort for snapshot directories and filenames (fixes ordering at 10+)
- Extract analyzer noise filter loading into shared helper function
- Use db.view_task() instead of db.find_sample() for resubmit filename recovery
- Remove redundant sorted() on evtx_paths (preserve pre-sorted order)
Removed unused import statement for itertools.
@kevoreilly
Copy link
Copy Markdown
Owner

Was hoping to merge but hit some more ruff errors:

Error: analyzer/windows/modules/auxiliary/evtx.py:1:8: F401 `itertools` imported but unused
Error: lib/cuckoo/core/database.py:229:55: F811 Redefinition of unused `GuacSession` from line 27

Addresses ruff F401 (evtx.py) and F811 (database.py) flagged in PR review.
@kevoreilly kevoreilly merged commit a4f9fdf into kevoreilly:master Apr 10, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants