This adversarial fixture validates IOCX’s strict URL extraction pipeline under intentionally malformed, obfuscated, and adversarial URL‑like byte sequences. It ensures that the engine:
- Extracts only syntactically valid URLs
- Rejects malformed or partially reconstructed URLs
- Handles IPv6 URL forms correctly
- Preserves salvage behavior for URL‑legal garbage
- Correctly ignores obfuscation patterns unless explicitly deobfuscated
- Maintains deterministic behavior under adversarial input
This fixture is designed to stress the URL detector with split sequences, malformed IPv6 hosts, reversed URLs, wide‑char interspersed nulls, and deobfuscation‑like patterns.
The binary is generated by a C program that embeds:
Split URL fragments: these are intentionally broken across multiple bytes and should not be reconstructed into valid URLs.
Malformed IPv6 hosts, which must be rejected. Examples include:
http://[::::]/bad
http://[2001:db8::g]
Reversed URL: moc.live//:ptth should not be extracted.
Wide‑char interspersed nulls: h\0t\0t\0p\0:\0/\0/… should not be interpreted as a URL.
Deobfuscation‑like pattern: hxxp://evil[.dev/path should not be extracted unless deobfuscation is explicitly enabled.
Valid URLs, which must be extracted exactly:
http://example.com
https://sub.example.co.uk/path?x=1#frag
sftp://files.example.com/home
https://[2001:db8::1]/c2
ftps://secure.example.org/download
http://gateway.local/redirect?target=example.com
https://156.65.42.8/access.php
URL‑legal garbage sequences test salvage behavior and termination logic.
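To make these byte patterns concrete, here is a rough Python approximation of the kinds of entries the fixture embeds. It is not the actual C generator: the entry layout, the NUL separators, and the hosts split.example and wide.example are hypothetical. Only the malformed IPv6, reversed, deobfuscation‑like, and valid URL strings come from this document.

```python
# Rough approximation of the fixture's byte patterns. The real fixture is
# produced by a C program; the layout, the NUL separators, and the hosts
# split.example / wide.example are hypothetical.

VALID_URLS = [
    "http://example.com",
    "https://sub.example.co.uk/path?x=1#frag",
    "sftp://files.example.com/home",
    "https://[2001:db8::1]/c2",
    "ftps://secure.example.org/download",
    "http://gateway.local/redirect?target=example.com",
    "https://156.65.42.8/access.php",
]


def wide(text: str) -> bytes:
    """UTF-16LE: each ASCII character becomes its byte followed by a NUL."""
    return text.encode("utf-16-le")


def build_fixture() -> bytes:
    entries = [
        b"htt\x00p://split.example",  # split fragment (hypothetical host)
        b"http://[::::]/bad",          # malformed IPv6: more than one '::'
        b"http://[2001:db8::g]",       # malformed IPv6: 'g' is not a hex digit
        b"http://evil.com"[::-1],      # reversed URL, i.e. b"moc.live//:ptth"
        wide("http://wide.example"),   # wide-char text (hypothetical host)
        b"hxxp://evil[.dev/path",      # deobfuscation-like, kept literal
        *(url.encode("ascii") for url in VALID_URLS),
    ]
    # NUL-terminate every entry, as C string literals are.
    return b"\x00".join(entries) + b"\x00"


blob = build_fixture()
```

Reversing b"http://evil.com" yields exactly the moc.live//:ptth sequence, and the UTF‑16LE encoding produces the h\0t\0t\0p\0 pattern described above.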
This appendix reflects the actual IOCX pipeline as executed on the compiled binary.
All printable sequences from .rdata, .obfs, and other sections become candidates.
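Candidate generation can be pictured as a strings‑style scan for runs of printable ASCII. The sketch below assumes a minimum run length of 4 bytes, which this document does not state. It shows why NUL‑interspersed wide‑char text yields no candidate and why a split fragment surfaces only as a scheme‑less tail.

```python
import re

# Runs of at least 4 printable ASCII bytes (0x20-0x7E), similar in spirit to
# the Unix `strings` tool. The minimum length of 4 is an assumption.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def candidates(section: bytes) -> list:
    """Return every printable run in a section as an ASCII string."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(section)]


sample = (
    b"htt\x00p://split.example\x00"               # split fragment
    + "http://wide.example".encode("utf-16-le")  # wide-char, NUL after each byte
    + b"\x00http://example.com\x00"
)
print(candidates(sample))  # ['p://split.example', 'http://example.com']
```

The three‑byte "htt" falls below the threshold, and every wide‑char byte is isolated by NULs, so neither survives as a candidate.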
This fixture intentionally does not trigger deobfuscation, so patterns like hxxp:// and [.] remain literal.
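A minimal sketch of what an opt‑in refanging step might look like, assuming simple literal substitution rules. The rule set and the name maybe_refang are hypothetical and do not describe IOCX's API. With the step disabled, as in this fixture, input passes through untouched.

```python
# Hypothetical opt-in refanging step. The rule set and the name maybe_refang
# are illustrative only; this fixture runs with deobfuscation disabled.
REFANG_RULES = [
    ("hxxps://", "https://"),
    ("hxxp://", "http://"),
    ("[.]", "."),
]


def maybe_refang(text: str, enabled: bool = False) -> str:
    if not enabled:
        return text  # hxxp:// and [.] stay literal, as in this fixture
    for old, new in REFANG_RULES:
        text = text.replace(old, new)
    return text


print(maybe_refang("hxxp://evil[.dev/path"))                 # unchanged
print(maybe_refang("hxxp://evil[.]dev/path", enabled=True))  # http://evil.dev/path
```

These hypothetical rules only handle the balanced [.] form; applied to the fixture's hxxp://evil[.dev/path, they would still leave a literal [ in the host.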
The URL extractor:
- Accepts only valid schemes (http, https, sftp, ftps)
- Requires syntactically valid hosts
- Supports IPv6 bracketed hosts
- Rejects malformed IPv6
- Rejects reversed or wide‑char URLs
- Does not reconstruct split sequences
- Does not treat hxxp:// as a URL
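The extractor rules above can be approximated by a small validator. This is an illustrative sketch, not IOCX's implementation: the URL_SHAPE pattern, the DNS‑label rule for hostnames, and the helper names are assumptions. IPv6 hosts are checked with Python's standard ipaddress module.

```python
import ipaddress
import re
from typing import Optional

ALLOWED_SCHEMES = {"http", "https", "sftp", "ftps"}
# scheme "://" authority, then the remainder (path, query, fragment).
URL_SHAPE = re.compile(r"([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)(.*)", re.DOTALL)
# One DNS label: 1-63 letters, digits or hyphens, not starting/ending with '-'.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")
PORT = re.compile(r"\d{1,5}")


def split_host(authority: str) -> Optional[str]:
    """Strip userinfo and port from an authority; None if malformed."""
    hostport = authority.rpartition("@")[2]
    if hostport.startswith("["):
        host, closed, rest = hostport.partition("]")
        if not closed:
            return None  # '[' without ']'
        if rest and not (rest[0] == ":" and PORT.fullmatch(rest[1:])):
            return None  # junk after the bracketed host
        return host + "]"
    host, _, port = hostport.partition(":")
    if port and not PORT.fullmatch(port):
        return None
    return host


def valid_host(host: str) -> bool:
    if host.startswith("["):
        try:
            ipaddress.IPv6Address(host[1:-1])  # rejects '::::' and '2001:db8::g'
        except ValueError:
            return False
        return True
    return bool(host) and all(LABEL.fullmatch(label) for label in host.split("."))


def is_strict_url(candidate: str) -> bool:
    m = URL_SHAPE.fullmatch(candidate)
    if m is None or m.group(1).lower() not in ALLOWED_SCHEMES:
        return False
    host = split_host(m.group(2))
    return host is not None and valid_host(host)
```

Against the fixture strings, this sketch accepts all seven valid URLs and rejects the malformed IPv6, reversed, scheme‑less, and hxxp:// inputs. Because it only validates up to the authority, it also accepts the salvage blob. Treating a dotted IPv4 address such as 156.65.42.8 as a sequence of DNS labels is a simplification.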
Accepted URLs are then post‑processed:
- Lowercase scheme
- Lowercase hostname
- Preserve path/query/fragment
- Preserve IPv6 bracket notation
- Preserve userinfo and port
- Deduplicate
- Suppress false positives
- Preserve deterministic ordering
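The case‑normalization and deduplication steps can be sketched as follows. The sketch assumes that deterministic ordering means first‑seen (input) order, which the document does not state explicitly. It lowercases only the scheme and host[:port], leaving userinfo, path, query, and fragment untouched. False‑positive suppression is not modeled.

```python
import re

# scheme "://" authority, then the remainder (path, query, fragment).
URL_SHAPE = re.compile(r"([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)(.*)", re.DOTALL)


def normalize(url: str) -> str:
    """Lowercase scheme and host[:port]; keep userinfo, path, query, fragment."""
    m = URL_SHAPE.fullmatch(url)
    if m is None:
        return url
    scheme, authority, rest = m.groups()
    userinfo, at, hostport = authority.rpartition("@")
    # Brackets of an IPv6 host survive; only its hex digits are lowercased.
    return f"{scheme.lower()}://{userinfo}{at}{hostport.lower()}{rest}"


def normalize_all(urls):
    """Normalize, then drop duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for url in map(normalize, urls):
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


print(normalize_all([
    "HTTP://Example.COM",
    "http://example.com",
    "https://[2001:DB8::1]/c2",
]))  # ['http://example.com', 'https://[2001:db8::1]/c2']
```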
This is the exact output produced by IOCX for this fixture.
URLs:
http://example.com
https://sub.example.co.uk/path?x=1#frag
sftp://files.example.com/home
https://[2001:db8::1]/c2
ftps://secure.example.org/download
http://gateway.local/redirect?target=example.com
https://156.65.42.8/access.php
http://example.com/pathhttp://[::::]/badhttp://[2001:db8::g]moc.live//:ptthh
http://bad.test
The long concatenated blob beginning with http://example.com/path… is expected.
It is a single syntactically valid URL prefix followed by URL‑legal garbage, and the extractor correctly consumes the entire run.
http://bad.test is extracted from the wide‑char sequence because the ASCII bytes appear in order.
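One way to reproduce the salvage behavior is a greedy scan that, once it sees a whitelisted scheme, consumes every character RFC 3986 allows in a URI and stops at the first character it does not allow. This is an assumption about the mechanism, not a description of IOCX internals. It shows why the concatenated blob comes out as a single match.

```python
import re

# Every character RFC 3986 allows in a URI: unreserved, gen-delims,
# sub-delims, plus '%' for percent-encoding.
URI_CHARS = r"A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%"
# After a whitelisted scheme, consume URI characters greedily.
SALVAGE = re.compile(r"(?:https?|sftp|ftps)://[" + URI_CHARS + r"]+")

blob = "http://example.com/pathhttp://[::::]/badhttp://[2001:db8::g]moc.live//:ptthh"
print(SALVAGE.findall(blob) == [blob])                 # True: one run
print(SALVAGE.findall("x http://example.com/a\x00b"))  # ['http://example.com/a']
```

Since [, ], :, and / are all URI‑legal, the embedded http:// sequences never terminate the run; only a NUL, a space, or another disallowed character does.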
Domains: (none)
File paths:
/gateway.local/redirect
/156.65.42.8/access.php
Rejected (never emitted as standalone URLs):
- Split URL fragments
- Reversed URL sequences
- Wide‑char interspersed nulls
- hxxp://evil[.dev/path (no deobfuscation)
- Malformed IPv6 hosts
- Broken IPv6 URL (http://[::::]/bad)
- Reversed URL (moc.live//:ptth)
Other categories:
- No IPs
- No hashes
- No emails
- No crypto addresses
- No base64
| Case | Expected | Actual | Result |
|---|---|---|---|
| Reject split URL fragments | ✔ | ✔ | Pass |
| Reject malformed IPv6 hosts | ✔ | ✔ | Pass |
| Reject reversed URLs | ✔ | ✔ | Pass |
| Reject wide‑char URLs | ✔ | ✔ | Pass |
| Reject deobfuscation‑like patterns | ✔ | ✔ | Pass |
| Extract all literal valid URLs | ✔ | ✔ | Pass |
| Extract IPv6 bracketed URL | ✔ | ✔ | Pass |
| Extract URL with IP host | ✔ | ✔ | Pass |
| Salvage URL‑legal garbage blob | ✔ | ✔ | Pass |
| Extract wide‑char ASCII URL (bad.test) | ✔ | ✔ | Pass |
| No domain extraction | ✔ | ✔ | Pass |
| No false positives | ✔ | ✔ | Pass |
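The validation table amounts to an exact‑match regression test. Below is a minimal sketch of the comparison, with EXPECTED_URLS copied verbatim from the URL output listed for this fixture. diff_output is a hypothetical helper. How the actual list is obtained from IOCX is left out because its API is not described here.

```python
# Expected URL output, copied verbatim from this fixture's documented output.
EXPECTED_URLS = [
    "http://example.com",
    "https://sub.example.co.uk/path?x=1#frag",
    "sftp://files.example.com/home",
    "https://[2001:db8::1]/c2",
    "ftps://secure.example.org/download",
    "http://gateway.local/redirect?target=example.com",
    "https://156.65.42.8/access.php",
    "http://example.com/pathhttp://[::::]/badhttp://[2001:db8::g]moc.live//:ptthh",
    "http://bad.test",
]


def diff_output(expected, actual):
    """Return (missing, unexpected, exact_order) for a regression check."""
    missing = [url for url in expected if url not in actual]
    unexpected = [url for url in actual if url not in expected]
    return missing, unexpected, list(actual) == list(expected)


# A stray deobfuscation-like string would show up as "unexpected".
print(diff_output(EXPECTED_URLS, EXPECTED_URLS + ["hxxp://evil[.dev/path"]))
```

Checking order separately from membership keeps a pure reordering, which would violate the determinism requirement, distinguishable from a genuine false positive or a missed URL.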
Extracts:
- syntactically valid URLs
- IPv6 bracketed URLs
- URLs with IP hosts
- salvageable URL‑legal garbage sequences
Rejects:
- malformed IPv6
- reversed URLs
- split URL fragments
- wide‑char interspersed nulls
- obfuscation patterns without deobfuscation
Normalizes:
- lowercases the scheme
- lowercases the hostname
- preserves path/query/fragment
Overall behavior:
- deterministic
- conservative
- adversarially hardened
This adversarial fixture confirms that IOCX’s URL extraction engine is:
- robust against malformed and obfuscated input
- strict about URL syntax
- permissive only where intentionally designed (salvage behavior)
- deterministic and stable
- safe for automated ingestion in threat‑intel pipelines
The output is correct, stable, and fully aligned with IOCX’s design goals.