- File: filepaths_strings_adversarial.full.bin
- Layer: 3 (Adversarial)
This fixture exercises IOCX’s filepath extractor against a mix of:
- valid Windows, UNC, Unix, relative, tilde, and env‑var paths
- split‑line paths
- URL‑like strings
- log keys and garbage with path‑like fragments
The extractor is intentionally permissive and syntax‑driven: any substring that looks like a path according to its patterns is extracted, even if it is only a fragment (e.g. split across lines or truncated before a space).
The following categories must be extracted as filepaths:
Windows:
- C:\Users\Public\document.txt
- D:\Program Files\App\bin.exe
- C:\Windows\System32\cmd.exe
- C:\Windows\System32\wscript.exe
- C:\Windows\System32\mshta.exe
- C:\Windows\System32evil (syntactically valid, no extension required)

UNC:
- \\server01\share\folder\file.log
- \\10.0.0.5\data$\dump.bin

Unix:
- /usr/local/bin/script.sh
- /opt/app/config.yaml
- /usr/bin/python3.11
- /usr/bin/openssl (no extension, still treated as a valid path)

Relative:
- .\temp\run.cmd
- ../logs/error.log

Tilde and env‑var:
- ~/projects/code/main.py
- ~user/docs/readme.md
- %APPDATA%\MyApp\config.json
- $HOME/.config/tool/settings.ini
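As a rough illustration of what "syntax‑driven" can mean for these categories, the sketch below classifies a single token by path style. The patterns, the `SEG` segment definition, and the `classify` helper are assumptions made for this example, not IOCX's actual rules, and the sketch deliberately sidesteps tokens that contain spaces.

```python
import re

# Hypothetical segment: one or more characters that are not separators,
# whitespace, or characters reserved in Windows file names.
SEG = r"[^\\/\s:*?\"<>|]+"

# Hypothetical per-style patterns, checked in order (most specific first).
STYLES = [
    ("unc",      re.compile(rf"\\\\{SEG}(?:\\{SEG})+")),
    ("windows",  re.compile(rf"[A-Za-z]:\\{SEG}(?:\\{SEG})*")),
    ("env",      re.compile(rf"(?:%\w+%|\$\w+)(?:[\\/]{SEG})+")),
    ("tilde",    re.compile(rf"~\w*(?:/{SEG})+")),
    ("relative", re.compile(rf"\.{{1,2}}(?:[\\/]{SEG})+")),
    ("unix",     re.compile(rf"(?:/{SEG})+")),
]

def classify(token):
    """Return the style name if the whole token looks like a path, else None."""
    for name, pattern in STYLES:
        if pattern.fullmatch(token):
            return name
    return None

print(classify(r"\\10.0.0.5\data$\dump.bin"))   # unc
print(classify("/usr/bin/openssl"))             # unix (no extension needed)
print(classify("network.connection.error"))     # None
```

None of the patterns mentions a file extension, which is why extensionless entries such as /usr/bin/openssl still qualify.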
For these inputs, where \n marks the newline that splits the path:
C:\Users\Pub\nlic\broken.txt
/usr/loc\nal/bin/bad.sh
the extractor matches the first syntactically valid fragment on each split:
C:\Users\Pub
/usr/loc
This behaviour is intentional: the extractor does not reconstruct across newlines; it simply extracts what looks like a path up to the break.
For:
C:\Temp\my file.txt
/var/log/my file.log
the extractor stops at the first space and extracts:
C:\Temp\my
/var/log/my
Spaces are treated as hard terminators for filepath tokens.
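The split‑line and space cases follow from the same idea: if path segments cannot contain whitespace, a match ends at the first newline or space. The sketch below shows this with a simplified pattern that covers only drive‑letter and rooted Unix paths. The `PATH` pattern and the `leading_fragments` helper are hypothetical and are not taken from IOCX.

```python
import re

# Hypothetical, simplified pattern (not IOCX's actual rule).
# A segment is a run of characters that are neither separators nor
# whitespace, so a space or a newline ends the match.
SEGMENT = r"[^\\/\s]+"
PATH = re.compile(
    rf"(?<![\w/\\])"                        # must not start inside a larger token
    rf"(?:[A-Za-z]:\\{SEGMENT}(?:\\{SEGMENT})*"   # drive-letter path
    rf"|(?:/{SEGMENT})+)"                   # rooted Unix path
)

def leading_fragments(text):
    """Return every path-like fragment, each cut at the first whitespace."""
    return PATH.findall(text)

# Split-line inputs: only the part before the newline is returned.
print(leading_fragments("C:\\Users\\Pub\nlic\\broken.txt"))  # ['C:\\Users\\Pub']
print(leading_fragments("/usr/loc\nal/bin/bad.sh"))          # ['/usr/loc']

# Inputs with spaces: the match stops at the space.
print(leading_fragments("C:\\Temp\\my file.txt"))            # ['C:\\Temp\\my']
print(leading_fragments("/var/log/my file.log"))             # ['/var/log/my']
```

No reconstruction step is involved: the remainder after the break (for example al/bin/bad.sh) is left alone because it no longer starts like a path.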
The following inputs must not be classified as filepaths:
- network.connection.error and auth.failure.reason (dotted log keys, no leading drive/UNC/tilde/slash)
- xxx/usr/local/binxxx (embedded path‑like fragment inside a larger token)
- http://example.com/path/file.txt (classified as a URL, not a filepath; appears under urls)
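These exclusions can be pictured as two guards: URLs are claimed first by a separate detector, and a path may not begin in the middle of a larger token. The sketch below is one possible arrangement under those assumptions. The `URL` and `PATH` patterns and the `extract` function are illustrative names, the path pattern handles rooted Unix paths only, and none of it is IOCX's real implementation.

```python
import re

# Hypothetical URL detector: scheme, "://", then everything up to whitespace.
URL = re.compile(r"\b[a-zA-Z][a-zA-Z0-9+.-]*://\S+")

# Hypothetical Unix-only path pattern. The lookbehind rejects a leading "/"
# that is glued to a preceding word character or separator.
PATH = re.compile(r"(?<![\w/\\])(?:/[^\\/\s]+)+")

def extract(text):
    """Split findings into URLs and filepaths; URLs are claimed first."""
    urls = URL.findall(text)
    remainder = URL.sub(" ", text)   # blank out URLs before path matching
    return {"urls": urls, "filepaths": PATH.findall(remainder)}

sample = "\n".join([
    "network.connection.error auth.failure.reason",
    "xxx/usr/local/binxxx",
    "http://example.com/path/file.txt",
    "/opt/app/config.yaml",
])
result = extract(sample)
print(result["urls"])       # ['http://example.com/path/file.txt']
print(result["filepaths"])  # ['/opt/app/config.yaml']
```

The dotted log keys never match because they contain no separator, and every slash in xxx/usr/local/binxxx is preceded by a word character.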
The filepath extractor:
- accepts Windows, UNC, Unix, relative, tilde, and env‑var styles
- does not require file extensions
- allows executables and directories with no extension
- treats spaces as terminators for path tokens
- does not reconstruct paths across newlines, but does extract valid leading fragments
- ignores embedded path‑like substrings inside larger tokens
- defers URL‑like strings to the URL detector
This permissive, syntax‑first behaviour is intentional and matches real‑world DFIR expectations: extract anything that looks like a path, even if it’s partial, and let higher layers decide how to use it.
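A regression check against this fixture could compare whatever the extractor returns with the two expectation lists above. The sketch below assumes the output is available as a plain list of strings; `check_fixture` and the sample values are illustrative and do not describe an existing IOCX test helper.

```python
def check_fixture(extracted, must_extract, must_not_extract):
    """Return (missing, unexpected) for one run against the fixture.

    missing:    expected filepaths that were not extracted
    unexpected: forbidden strings that were extracted as filepaths
    """
    found = set(extracted)
    missing = sorted(set(must_extract) - found)
    unexpected = sorted(set(must_not_extract) & found)
    return missing, unexpected

# `extracted` stands in for real extractor output (an assumption).
extracted = ["C:\\Temp\\my", "/var/log/my", "/usr/bin/openssl"]
missing, unexpected = check_fixture(
    extracted,
    must_extract=["C:\\Temp\\my", "/var/log/my"],
    must_not_extract=["network.connection.error", "xxx/usr/local/binxxx"],
)
print(missing, unexpected)  # [] []
```

Extra extracted values that appear in neither list are not reported, which fits the permissive stance: partial fragments are acceptable as long as the required ones are present and the forbidden ones are absent.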