- File: `malformed_urls_adversarial.full.bin`
- Layer: 3 (Adversarial)
This adversarial fixture validates IOCX’s string‑based IOC extraction pipeline, including:
- String extraction
- Deobfuscation
- Strict URL/domain detection
- IOC‑safe normalisation
- Post‑processing (dedupe, suppression, ordering)
It is intentionally designed to stress the URL and domain detectors with malformed schemes, nested encodings, truncated hosts, and extremely long paths.
The binary is generated by a C program that:
- Writes broken schemes
- Writes valid URLs
- Writes nested and repeated encodings
- Writes truncated URLs
- Writes an extremely long but syntactically valid URL (~2500 chars)
This ensures coverage of:
- scheme validation
- host validation
- percent‑encoding handling
- traversal sequences
- long‑path robustness
- newline‑terminated URL extraction
This appendix reflects the actual IOCX pipeline:
String extraction: all lines in the file become candidate text.
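
As a rough illustration of how newline‑terminated strings in a binary can become candidate text, the sketch below pulls printable ASCII runs out of a byte blob. The name `candidate_lines` and the minimum run length of 4 are assumptions made for this example, not IOCX's actual implementation.

```python
import re

# Runs of 4 or more printable ASCII bytes. Newlines (0x0a) fall outside the
# class, so each newline-terminated string becomes its own candidate.
# The threshold of 4 is an assumption for this sketch.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def candidate_lines(blob: bytes) -> list[str]:
    """Return printable ASCII runs found in a binary blob, in file order."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(blob)]
```

Because a newline ends a run, a URL followed by `\n` is never glued to the next string in the file.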
Deobfuscation rewrites such as the following are applied before URL extraction:
- `hxxp` → `http`
- `[.]` → `.`
- `(\.)` → `.`
- `[:]` → `:`
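
The rewrites above can be sketched as an ordered substitution table. This is a minimal illustration only: the names `DEFANG_RULES` and `refang` are hypothetical, and the real IOCX rule set may be larger or ordered differently.

```python
import re

# Ordered (pattern, replacement) pairs mirroring the rewrites listed above.
# Hypothetical table for illustration; not IOCX's actual rule set.
DEFANG_RULES = [
    (re.compile(r"hxxp", re.IGNORECASE), "http"),  # hxxp  -> http
    (re.compile(r"\[\.\]"), "."),                  # [.]   -> .
    (re.compile(r"\(\\?\.\)"), "."),               # (\.) or (.) -> .
    (re.compile(r"\[:\]"), ":"),                   # [:]   -> :
]


def refang(text: str) -> str:
    """Apply each deobfuscation rewrite in order and return the result."""
    for pattern, replacement in DEFANG_RULES:
        text = pattern.sub(replacement, text)
    return text
```

For example, `refang("hxxp://obfuscated[.]example[.]com")` yields `http://obfuscated.example.com`, which is the form the URL detector then sees.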

Strict URL/domain detection applies the following rules:
- Valid schemes only (`http`, `https`)
- Hostname must be syntactically valid
- Percent‑encoded paths preserved
- Truncated URLs rejected
- Domains extracted even from malformed schemes

IOC‑safe normalisation then applies:
- lowercase scheme
- lowercase hostname
- strip trailing dots
- preserve path/query/fragment
- preserve userinfo + port
- handle IPv6 correctly
- handle bare domains

Post‑processing covers:
- dedupe
- suppress false positives
- final JSON assembly
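
The normalisation and dedupe steps above can be sketched with `urllib.parse`. The function names `normalise_url` and `dedupe` are hypothetical, and keeping first‑seen order during dedupe is an assumption; the actual ordering rule used by IOCX is not stated here.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url: str) -> str:
    """Lowercase scheme and host, strip trailing host dots, keep everything else."""
    parts = urlsplit(url)
    # Split the authority by hand so userinfo and port are preserved verbatim.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if hostport.startswith("["):
        # Bracketed IPv6 literal: the port, if any, follows the closing bracket.
        host, bracket, port = hostport.partition("]")
        host += bracket
    else:
        host, colon, port = hostport.partition(":")
        port = colon + port
    host = host.lower().rstrip(".")
    netloc = userinfo + at + host + port
    # Path, query and fragment are passed through untouched (no re-encoding).
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


def dedupe(items: list[str]) -> list[str]:
    """Drop duplicates while keeping first-seen order (an assumed ordering)."""
    return list(dict.fromkeys(items))
```

Leaving the path untouched matters for this fixture: a nested‑encoded path such as `/%2525252e%252e/%252e/` has to come out byte‑for‑byte as it went in.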
This is the final output produced by IOCX for this fixture:
http://obfuscated.example.com
http://valid.example.com/path?param=value
https://sub.domain.example.org/index.html
http://example.com/%2525252e%252e/%252e/
https://example.com/path/%2e%2e/%2e%2e/
http://example.com/aaaa…aaaa?q=1 (full 2500‑character path preserved)
broken-scheme.example.com
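
The following candidates are not emitted as URLs:
- `htp://broken-scheme.example.com` → invalid scheme
- `http://example.` → incomplete TLD
- `https://` → missing host

No other IOC categories are reported for this fixture:
- no emails
- no IPs
- no filepaths
- no hashes
- no crypto addresses
- no base64
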
This behaviour is exactly what a hardened IOC extractor should produce.
| Case | Expected | Actual | Result |
|---|---|---|---|
| Deobfuscate `hxxp://` → `http://` | ✔ | ✔ | Pass |
| Reject invalid scheme `htp://` | ✔ | ✔ | Pass |
| Extract valid URLs | ✔ | ✔ | Pass |
| Extract nested‑encoded URLs | ✔ | ✔ | Pass |
| Extract traversal‑encoded URLs | ✔ | ✔ | Pass |
| Ignore truncated URLs | ✔ | ✔ | Pass |
| Extract extremely long URL | ✔ | ✔ | Pass |
| Extract domain from malformed scheme | ✔ | ✔ | Pass |
| No false positives | ✔ | ✔ | Pass |

IOCX extracts:
- syntactically valid URLs
- deobfuscated URLs
- nested‑encoded URLs
- traversal‑encoded URLs
- extremely long URLs

IOCX normalises:
- scheme → lowercase
- hostname → lowercase
- strip trailing dots
- preserve path/query/fragment
- preserve userinfo + port

IOCX rejects:
- invalid schemes
- truncated URLs
- incomplete hostnames

Extraction is:
- deterministic
- encoding‑aware
- newline‑aware
- non‑hallucinatory

This adversarial fixture confirms that IOCX’s URL extraction pipeline is:
- robust
- conservative
- deterministic
- adversarially hardened
- safe for automated threat‑intel ingestion
The output is correct, stable, and fully aligned with the engine’s design goals.