Appendix 3.26 — Malformed URL Adversarial Specification

This adversarial fixture exercises IOCX’s strict URL extraction pipeline against intentionally malformed, obfuscated, and adversarial URL‑like byte sequences. It checks that the engine:

  1. Extracts only syntactically valid URLs
  2. Rejects malformed or partially reconstructed URLs
  3. Handles IPv6 URL forms correctly
  4. Preserves salvage behaviour for URL‑legal garbage
  5. Ignores obfuscation patterns unless deobfuscation is explicitly enabled
  6. Behaves deterministically under adversarial input

The fixture stresses the URL detector with split sequences, malformed IPv6 hosts, reversed URLs, wide‑char strings with interleaved NUL bytes, and defanged (obfuscated) patterns such as hxxp://.

1. Fixture Construction

The binary is generated by a C program that embeds:

A. Split URL fragments

These are URL pieces that are intentionally split apart in the binary. The extractor must not stitch them back together into valid URLs.

B. Malformed IPv6 URL hosts

Examples include:

  • http://[::::]/bad
  • http://[2001:db8::g]

These must be rejected.
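
As an illustration of why both examples fail, the check below validates a bracketed host with Python's standard `ipaddress` module. This is a minimal sketch, not IOCX's implementation; the helper name `is_valid_bracketed_host` is hypothetical.

```python
import ipaddress


def is_valid_bracketed_host(host: str) -> bool:
    """Return True if host is '[...]' wrapping a valid IPv6 literal.

    Hypothetical helper for illustration; not part of IOCX.
    """
    if not (host.startswith("[") and host.endswith("]")):
        return False
    try:
        # IPv6Address raises a ValueError subclass on malformed input.
        ipaddress.IPv6Address(host[1:-1])
    except ValueError:
        return False
    return True


for candidate in ("[2001:db8::1]", "[::::]", "[2001:db8::g]"):
    print(candidate, is_valid_bracketed_host(candidate))
```

`[::::]` fails because it contains more than one `::` compression, and `[2001:db8::g]` fails because `g` is not a hexadecimal digit.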

C. Reversed URL sequences

moc.live//:ptth (http://evil.com written backwards) must not be extracted.

D. Wide‑char interspersed nulls

h\0t\0t\0p\0:\0/\0/… — should not be interpreted as a URL.

E. Deobfuscation‑like patterns

hxxp://evil[.dev/path — should not be extracted unless deobfuscation is explicitly enabled.

F. Valid URLs embedded as literals

These must be extracted exactly:

  • http://example.com
  • https://sub.example.co.uk/path?x=1#frag
  • sftp://files.example.com/home
  • https://[2001:db8::1]/c2
  • ftps://secure.example.org/download
  • http://gateway.local/redirect?target=example.com
  • https://156.65.42.8/access.php

G. URL‑legal garbage sequences

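These exercise salvage behaviour and termination logic: once a syntactically valid URL prefix is matched, the extractor keeps consuming URL‑legal characters until the run ends.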

2. IOCX Processing Pipeline (Applied to This Fixture)

This appendix reflects the actual IOCX pipeline as executed on the compiled binary.

Step 1 — Extract strings

All printable sequences from .rdata, .obfs, and other sections become candidates.
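
A rough sketch of this step, assuming candidates are runs of printable ASCII bytes of some minimum length (the way the classic `strings` utility works). The function name `extract_strings` and the default `min_len=4` are assumptions for illustration. The sketch scans a raw byte blob and does not parse PE sections.

```python
import re


def extract_strings(blob: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (0x20-0x7e) at least min_len bytes long.

    Hypothetical helper; IOCX's real string extraction may differ.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]


# Two literals separated by non-printable bytes; "ab" is too short to keep.
blob = b"\x00\x01http://example.com\x00ab\x00sftp://files.example.com/home\xff"
print(extract_strings(blob))
```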

Step 2 — No deobfuscation

This fixture intentionally does not trigger deobfuscation, so patterns like hxxp:// and [.] remain literal.

Step 3 — Strict URL extraction

The URL extractor:

  • Accepts only valid schemes (http, https, sftp, ftps)
  • Requires syntactically valid hosts
  • Supports IPv6 bracketed hosts
  • Rejects malformed IPv6
  • Rejects reversed or wide‑char URLs
  • Does not reconstruct split sequences
  • Does not treat hxxp:// as a URL
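
The rules above can be approximated with a scheme allow‑list plus a host check. The sketch below is an assumption about how such a check could look, not IOCX's actual code; `ALLOWED_SCHEMES`, `HOSTNAME`, and `is_strict_url` are hypothetical names.

```python
import ipaddress
import re

ALLOWED_SCHEMES = {"http", "https", "sftp", "ftps"}
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
HOSTNAME = re.compile(LABEL + r"(?:\." + LABEL + r")*")


def is_strict_url(candidate: str) -> bool:
    """Hypothetical strict check: allow-listed scheme and a well-formed host."""
    scheme, sep, rest = candidate.partition("://")
    if not sep or scheme.lower() not in ALLOWED_SCHEMES:
        return False
    # The authority ends at the first '/', '?', or '#'.
    authority = re.split(r"[/?#]", rest, maxsplit=1)[0]
    hostport = authority.rpartition("@")[2]  # drop optional userinfo
    if hostport.startswith("["):
        end = hostport.find("]")
        if end == -1:
            return False
        try:
            ipaddress.IPv6Address(hostport[1:end])
        except ValueError:
            return False
        port = hostport[end + 1:]
        return port == "" or (port.startswith(":") and port[1:].isdigit())
    host, _, port = hostport.partition(":")
    if port and not port.isdigit():
        return False
    return HOSTNAME.fullmatch(host) is not None


for url in ("https://[2001:db8::1]/c2", "http://[::::]/bad",
            "hxxp://evil[.dev/path", "moc.live//:ptth"):
    print(url, is_strict_url(url))
```

The reversed sequence fails at the first test because it never contains `://`, and `hxxp` fails the allow‑list, so neither needs a dedicated rule.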

Step 4 — Normalisation

  • Lowercase scheme
  • Lowercase hostname
  • Preserve path/query/fragment
  • Preserve IPv6 bracket notation
  • Preserve userinfo and port
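
A minimal sketch of these normalisation rules, assuming the input has already passed extraction. The function name `normalise_url` is hypothetical. The authority is split by hand so that userinfo, port, brackets, path, query, and fragment pass through untouched.

```python
def normalise_url(url: str) -> str:
    """Lowercase scheme and host; leave everything else as-is.

    Hypothetical helper; not IOCX's implementation.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        return url
    # The authority ends at the first '/', '?', or '#'.
    cut = len(rest)
    for ch in "/?#":
        idx = rest.find(ch)
        if idx != -1:
            cut = min(cut, idx)
    authority, tail = rest[:cut], rest[cut:]
    userinfo, at, hostport = authority.rpartition("@")
    # Lowercasing host:port leaves digits and brackets unchanged.
    return f"{scheme.lower()}://{userinfo}{at}{hostport.lower()}{tail}"


print(normalise_url("HTTPS://User:Pw@Sub.Example.co.uk:8443/Path?X=1#Frag"))
# → https://User:Pw@sub.example.co.uk:8443/Path?X=1#Frag
```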

Step 5 — Post‑processing

  • Deduplicate
  • Suppress false positives
  • Preserve deterministic ordering
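
One simple way to deduplicate while keeping a deterministic order is to keep the first occurrence of each value in the order it was seen. This document does not state which ordering IOCX uses, so first‑seen order is an assumption here, and `dedupe_preserving_order` is a hypothetical name.

```python
def dedupe_preserving_order(items: list[str]) -> list[str]:
    """Drop repeats, keeping the first occurrence of each item."""
    # dict preserves insertion order, so the result is stable across runs.
    return list(dict.fromkeys(items))


urls = ["http://example.com", "http://bad.test", "http://example.com"]
print(dedupe_preserving_order(urls))
# → ['http://example.com', 'http://bad.test']
```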

3. Final IOC Output (After Normalisation)

This is the exact output produced by IOCX for this fixture.

URLs

http://example.com
https://sub.example.co.uk/path?x=1#frag
sftp://files.example.com/home
https://[2001:db8::1]/c2
ftps://secure.example.org/download
http://gateway.local/redirect?target=example.com
https://156.65.42.8/access.php
http://example.com/pathhttp://[::::]/badhttp://[2001:db8::g]moc.live//:ptthh
http://bad.test

Notes:

The long concatenated blob beginning with http://example.com/path… is expected. It is a single syntactically valid URL prefix followed by URL‑legal garbage, and the extractor correctly consumes the entire run.

http://bad.test is extracted from the wide‑char sequence because the ASCII bytes appear in order.
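
The concatenated blob is what a greedy matcher produces when it starts at a valid scheme and consumes every URL‑legal character until the run ends. The sketch below illustrates that with a regular expression built from the RFC 3986 character set. The pattern `URL_RUN` and the function `salvage_runs` are hypothetical, not IOCX's actual matcher.

```python
import re

# Allow-listed scheme, then a greedy run of RFC 3986 unreserved, reserved,
# and percent characters. Hypothetical pattern for illustration only.
URL_RUN = re.compile(
    r"(?:https?|sftp|ftps)://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+"
)


def salvage_runs(text: str) -> list[str]:
    """Return each maximal URL-legal run that starts with an allowed scheme."""
    return URL_RUN.findall(text)


blob = ("xx http://example.com/pathhttp://[::::]/bad"
        "http://[2001:db8::g]moc.live//:ptthh yy")
print(salvage_runs(blob))
```

Because matches do not overlap, the malformed IPv6 and reversed sequences inside the run are swallowed by the first match instead of being reported as separate URLs.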

Domains

(none)

Filepaths

/gateway.local/redirect
/156.65.42.8/access.php

Ignored (correctly)

  • Split URL fragments
  • Reversed URL sequences (moc.live//:ptth)
  • Wide‑char interspersed nulls
  • hxxp://evil[.dev/path (no deobfuscation)
  • Malformed IPv6 hosts (http://[::::]/bad, http://[2001:db8::g])

No false positives

  • No IPs
  • No hashes
  • No emails
  • No crypto addresses
  • No base64

4. Behaviour Matrix

Case                                      Result
Reject split URL fragments                Pass
Reject malformed IPv6 hosts               Pass
Reject reversed URLs                      Pass
Reject wide‑char URLs                     Pass
Reject deobfuscation‑like patterns        Pass
Extract all literal valid URLs            Pass
Extract IPv6 bracketed URL                Pass
Extract URL with IP host                  Pass
Salvage URL‑legal garbage blob            Pass
Extract wide‑char ASCII URL (bad.test)    Pass
No domain extraction                      Pass
No false positives                        Pass

5. Contract Requirements Enforced

Always extract

  • syntactically valid URLs
  • IPv6 bracketed URLs
  • URLs with IP hosts
  • salvageable URL‑legal garbage sequences

Always ignore

  • malformed IPv6
  • reversed URLs
  • split URL fragments
  • wide‑char interspersed nulls
  • obfuscation patterns without deobfuscation

Always normalise

  • scheme (to lowercase)
  • hostname (to lowercase)
  • nothing in the path, query, or fragment (these are preserved as‑is)

Always remain

  • deterministic
  • conservative
  • adversarially hardened

6. Conclusion

This adversarial fixture confirms that IOCX’s URL extraction engine is:

  • robust against malformed and obfuscated input
  • strict about URL syntax
  • permissive only where designed to be (salvage behaviour)
  • deterministic and stable
  • safe for automated ingestion in threat‑intel pipelines

The output is correct, stable, and fully aligned with IOCX’s design goals.