Appendix 3.19 — Malformed URLs Adversarial Fixture

  • File: malformed_urls_adversarial.full.bin
  • Layer: 3 Adversarial

This adversarial fixture validates IOCX’s string‑based IOC extraction pipeline, including:

  1. String extraction
  2. Deobfuscation
  3. Strict URL/domain detection
  4. IOC‑safe normalisation
  5. Post‑processing (dedupe, suppression, ordering)

It is intentionally designed to stress the URL and domain detectors with malformed schemes, nested encodings, truncated hosts, and extremely long paths.

1. Fixture Construction

The binary is generated by a C program that writes:

  • broken schemes
  • valid URLs
  • nested and repeated percent-encodings
  • truncated URLs
  • an extremely long but syntactically valid URL (~2500 characters)

This ensures coverage of:

  • scheme validation
  • host validation
  • percent‑encoding handling
  • traversal sequences
  • long‑path robustness
  • newline‑terminated URL extraction
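The generator source is not reproduced in this appendix. As a rough illustration only, a generator along these lines could look like the sketch below. The exact input strings are assumptions inferred from the final output in Section 3 (for example, the defanged `hxxp://obfuscated.example.com` line), and `write_fixture`, `FIXTURE_LINES` and `LONG_PATH_LEN` are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reconstruction: the real fixture's lines may differ. */
static const char *FIXTURE_LINES[] = {
    /* broken scheme: URL rejected, host still expected as a domain */
    "htp://broken-scheme.example.com",
    /* defanged URL (assumed form), deobfuscated to http://obfuscated.example.com */
    "hxxp://obfuscated.example.com",
    /* valid URLs */
    "http://valid.example.com/path?param=value",
    "https://sub.domain.example.org/index.html",
    /* nested / repeated percent-encodings and encoded traversal */
    "http://example.com/%2525252e%252e/%252e/",
    "https://example.com/path/%2e%2e/%2e%2e/",
    /* truncated URLs */
    "http://example.",
    "https://",
};

#define LONG_PATH_LEN 2500 /* assumed length of the long path */

/* Writes the fixture to `path`; returns 0 on success, -1 on error. */
int write_fixture(const char *path) {
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;

    size_t n = sizeof FIXTURE_LINES / sizeof FIXTURE_LINES[0];
    for (size_t i = 0; i < n; i++) {
        /* one newline-terminated candidate per line */
        if (fprintf(f, "%s\n", FIXTURE_LINES[i]) < 0) { fclose(f); return -1; }
    }

    /* extremely long but syntactically valid URL */
    char *long_path = malloc(LONG_PATH_LEN + 1);
    if (long_path == NULL) { fclose(f); return -1; }
    memset(long_path, 'a', LONG_PATH_LEN);
    long_path[LONG_PATH_LEN] = '\0';
    int rc = fprintf(f, "http://example.com/%s?q=1\n", long_path);
    free(long_path);

    /* fclose is always evaluated first, so the file is closed even if rc < 0 */
    if (fclose(f) != 0 || rc < 0) return -1;
    return 0;
}
```

Writing each case on its own newline-terminated line keeps the cases isolated from one another, which is what lets a line-based extractor treat each one as a separate candidate.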

2. IOCX Processing Pipeline (Applied to This Fixture)

The fixture is processed by the IOCX pipeline in five steps:

Step 1 — Extract strings

All lines in the file become candidate text.
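A minimal sketch of this line-splitting step, assuming the file has already been read into a NUL-terminated buffer. `split_lines` is a hypothetical helper; stripping a `\r` before `\n` and skipping empty lines are assumptions, not documented IOCX behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `buf` in place at '\n', drops a trailing '\r' from each
   line, skips empty lines, and stores at most `max` line pointers in `lines`.
   Lines beyond `max` are not stored. Returns the number of pointers stored. */
size_t split_lines(char *buf, char **lines, size_t max) {
    size_t count = 0;
    char *start = buf;
    while (start != NULL && *start != '\0') {
        char *nl = strchr(start, '\n');
        if (nl != NULL) *nl = '\0';               /* terminate the current line */
        size_t len = strlen(start);
        if (len > 0 && start[len - 1] == '\r') start[--len] = '\0';
        if (len > 0 && count < max) lines[count++] = start;
        start = (nl != NULL) ? nl + 1 : NULL;     /* NULL once the last line is consumed */
    }
    return count;
}
```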

Step 2 — Deobfuscate text

Defanging patterns are rewritten before URL extraction, for example:

  • hxxp → http
  • [.] → .
  • (\.) → .
  • [:] → :
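A minimal sketch of this step, treating each listed pattern as a literal, case-sensitive substring rewrite applied in one left-to-right pass. `deobfuscate` and the rule table are hypothetical; the real engine may use regular expressions or recognise more variants.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { const char *from; const char *to; } Rewrite;

/* Literal rewrites mirroring the patterns listed above (assumed case-sensitive). */
static const Rewrite REWRITES[] = {
    { "hxxp", "http" },
    { "[.]", "." },
    { "(\\.)", "." },
    { "[:]", ":" },
};

/* Returns a newly allocated deobfuscated copy of `in` (caller frees), or NULL. */
char *deobfuscate(const char *in) {
    size_t len = strlen(in);
    /* Each replacement is no longer than its pattern, so len + 1 bytes suffice. */
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;

    size_t nrules = sizeof REWRITES / sizeof REWRITES[0];
    size_t i = 0, o = 0;
    while (i < len) {
        int matched = 0;
        for (size_t r = 0; r < nrules; r++) {
            size_t plen = strlen(REWRITES[r].from);
            if (strncmp(in + i, REWRITES[r].from, plen) == 0) {
                size_t tlen = strlen(REWRITES[r].to);
                memcpy(out + o, REWRITES[r].to, tlen);
                o += tlen;
                i += plen;
                matched = 1;
                break;
            }
        }
        if (!matched) out[o++] = in[i++];   /* copy unchanged byte */
    }
    out[o] = '\0';
    return out;
}
```

Because `hxxp` is rewritten as a prefix, `hxxps` also becomes `https` without a separate rule.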

Step 3 — Extract strict URLs and domains

  • Valid schemes only (http, https)
  • Hostname must be syntactically valid
  • Percent‑encoded paths preserved
  • Truncated URLs rejected
  • Domains extracted even from malformed schemes
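A simplified sketch of these strict checks. It rests on several assumptions: only `http`/`https` (case-insensitive) are accepted, the host ends at the first `/`, `?`, `#` or `:`, one trailing dot is tolerated, and a hostname needs at least two labels with an alphabetic TLD of two or more letters. Userinfo and IPv6 literals are not handled, so this narrower check rejects them even though the pipeline preserves them. `is_strict_url` and `domain_after_scheme` are hypothetical names.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive ASCII prefix test; `prefix` must be lowercase. */
static bool has_prefix_ci(const char *s, const char *prefix) {
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != *prefix) return false;
    return true;
}

/* Assumed rules: >= 2 dot-separated labels of 1-63 letters, digits or hyphens,
   no leading/trailing hyphen, and an alphabetic TLD of >= 2 characters. */
static bool valid_hostname(const char *h, size_t len) {
    if (len == 0 || len > 253) return false;
    size_t label_start = 0, labels = 0;
    for (size_t i = 0; i <= len; i++) {
        if (i == len || h[i] == '.') {
            size_t llen = i - label_start;
            if (llen == 0 || llen > 63) return false;      /* empty label */
            if (h[label_start] == '-' || h[i - 1] == '-') return false;
            labels++;
            if (i == len) {                                 /* last label is the TLD */
                if (llen < 2) return false;
                for (size_t k = label_start; k < i; k++)
                    if (!isalpha((unsigned char)h[k])) return false;
            }
            label_start = i + 1;
        } else if (!isalnum((unsigned char)h[i]) && h[i] != '-') {
            return false;                                   /* '@', '[', '%', ... */
        }
    }
    return labels >= 2;
}

/* Host span after "://": ends at '/', '?', '#' or ':'; one trailing dot is dropped. */
static size_t host_span(const char *host) {
    size_t n = strcspn(host, "/?#:");
    if (n > 0 && host[n - 1] == '.') n--;
    return n;
}

/* True for http/https URLs whose host passes valid_hostname(). */
bool is_strict_url(const char *url) {
    const char *host;
    if (has_prefix_ci(url, "https://")) host = url + 8;
    else if (has_prefix_ci(url, "http://")) host = url + 7;
    else return false;                                      /* e.g. "htp://" */
    return valid_hostname(host, host_span(host));
}

/* Domain fallback after any "xxx://" prefix, including malformed schemes.
   Returns the host length (0 if no valid host) and points *host_out at it. */
size_t domain_after_scheme(const char *text, const char **host_out) {
    const char *sep = strstr(text, "://");
    if (sep == NULL) return 0;
    const char *host = sep + 3;
    size_t n = host_span(host);
    if (!valid_hostname(host, n)) return 0;
    *host_out = host;
    return n;
}
```

With these rules, `http://example.` fails because the host reduces to the single label `example`, and `https://` fails because the host is empty. Meanwhile `htp://broken-scheme.example.com` is rejected as a URL but still yields the domain through the fallback.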

Step 4 — Normalise

  • lowercase scheme
  • lowercase hostname
  • strip trailing dots
  • preserve path/query/fragment
  • preserve userinfo + port
  • handle IPv6 correctly
  • handle bare domains
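A sketch of these normalisation rules for a single candidate, assuming the input is one URL or one bare domain. `normalise_url` is a hypothetical name. Percent-encodings are copied unchanged, consistent with preserving path, query and fragment.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Returns a newly allocated normalised copy of `url` (caller frees),
   or NULL on allocation failure. Lowercases scheme and host, strips trailing dots from a
   DNS host, and copies userinfo, port, path, query and fragment unchanged.
   Input without a valid "scheme://" prefix is treated as a bare domain. */
char *normalise_url(const char *url) {
    size_t len = strlen(url);
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;

    /* Accept "://" as the scheme separator only if everything before it is scheme syntax. */
    const char *sep = strstr(url, "://");
    if (sep != NULL) {
        for (const char *p = url; p < sep; p++) {
            if (!isalnum((unsigned char)*p) && *p != '+' && *p != '-' && *p != '.') {
                sep = NULL;
                break;
            }
        }
    }
    size_t scheme_end = sep ? (size_t)(sep - url) : 0;
    size_t auth_start = sep ? scheme_end + 3 : 0;
    size_t auth_end = auth_start + strcspn(url + auth_start, "/?#");

    /* Userinfo, if present, ends at the last '@' inside the authority. */
    size_t host_start = auth_start;
    for (size_t i = auth_start; i < auth_end; i++)
        if (url[i] == '@') host_start = i + 1;

    size_t host_end;
    int ipv6 = host_start < auth_end && url[host_start] == '[';
    if (ipv6) {
        /* IPv6 literal: host runs to ']' so its colons are not taken as a port. */
        const char *close = memchr(url + host_start, ']', auth_end - host_start);
        host_end = close ? (size_t)(close - url) + 1 : auth_end;
    } else {
        const char *colon = memchr(url + host_start, ':', auth_end - host_start);
        host_end = colon ? (size_t)(colon - url) : auth_end;
    }

    /* Trailing dots are trimmed from DNS hosts only. */
    size_t keep_end = host_end;
    if (!ipv6)
        while (keep_end > host_start && url[keep_end - 1] == '.') keep_end--;

    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (i >= keep_end && i < host_end) continue;        /* skip stripped dots */
        int lower = i < scheme_end || (i >= host_start && i < host_end);
        out[o++] = lower ? (char)tolower((unsigned char)url[i]) : url[i];
    }
    out[o] = '\0';
    return out;
}
```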

Step 5 — Post‑process

  • dedupe
  • suppress false positives
  • final JSON assembly
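A minimal sketch of the dedupe step only. Keeping first-seen order is an assumption, since this appendix does not say whether IOCX sorts its output. Suppression and JSON assembly are not shown, and `dedupe_in_order` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Removes exact duplicates from `items` in place, keeping the first occurrence of each
   string in its original position order. Returns the new count.
   Quadratic, which is acceptable for the handful of IOCs in a fixture like this. */
size_t dedupe_in_order(const char **items, size_t n) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < kept; j++) {
            if (strcmp(items[j], items[i]) == 0) { seen = 1; break; }
        }
        if (!seen) items[kept++] = items[i];
    }
    return kept;
}
```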

3. Final IOC Output (After Deobfuscation + Normalisation)

This is the final output IOCX produces for this fixture.

URLs

http://obfuscated.example.com
http://valid.example.com/path?param=value
https://sub.domain.example.org/index.html
http://example.com/%2525252e%252e/%252e/
https://example.com/path/%2e%2e/%2e%2e/
http://example.com/aaaa…aaaa?q=1 (full 2500‑character path preserved)
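IOCX reports the nested and traversal-encoded paths above in their encoded form. To show how deep the nesting goes, the hypothetical helper below decodes one layer of percent-encoding per call. It is for illustration only and is not part of the pipeline described in this appendix.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit; caller guarantees isxdigit(c). */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Decodes one layer of percent-encoding in place ("%2e" -> "."); malformed escapes
   are copied unchanged. Returns the number of escapes decoded in this pass. */
size_t percent_decode_once(char *s) {
    size_t n = strlen(s), r = 0, w = 0, decoded = 0;
    while (r < n) {
        if (s[r] == '%' && r + 2 < n &&
            isxdigit((unsigned char)s[r + 1]) && isxdigit((unsigned char)s[r + 2])) {
            s[w++] = (char)(hex_value(s[r + 1]) * 16 + hex_value(s[r + 2]));
            r += 3;
            decoded++;
        } else {
            s[w++] = s[r++];
        }
    }
    s[w] = '\0';
    return decoded;
}
```

Applied repeatedly, `/%2525252e%252e/%252e/` needs four decoding passes before no escapes remain, ending as `/.././`. By contrast, `/path/%2e%2e/%2e%2e/` collapses to `/path/../../` in a single pass. An extractor that decoded eagerly would therefore report a path different from the one actually present in the file.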

Domains

broken-scheme.example.com

Ignored (correctly)

  • htp://broken-scheme.example.com → invalid scheme (the URL is rejected; its host is still reported under Domains)
  • http://example. → incomplete TLD
  • https:// → missing host

No false positives

  • no emails
  • no IPs
  • no filepaths
  • no hashes
  • no crypto addresses
  • no base64

This behaviour is exactly what a hardened IOC extractor should produce.

4. Behaviour Matrix

| Case | Result |
|---|---|
| Deobfuscate hxxp:// → http:// | Pass |
| Reject invalid scheme (htp://) | Pass |
| Extract valid URLs | Pass |
| Extract nested‑encoded URLs | Pass |
| Extract traversal‑encoded URLs | Pass |
| Ignore truncated URLs | Pass |
| Extract extremely long URL | Pass |
| Extract domain from malformed scheme | Pass |
| No false positives | Pass |

5. Contract Requirements Enforced

Always extract

  • syntactically valid URLs
  • deobfuscated URLs
  • nested‑encoded URLs
  • traversal‑encoded URLs
  • extremely long URLs

Always normalise

  • scheme → lowercase
  • hostname → lowercase
  • strip trailing dots
  • preserve path/query/fragment
  • preserve userinfo + port

Always ignore

  • invalid schemes
  • truncated URLs
  • incomplete hostnames

Always remain

  • deterministic
  • encoding‑aware
  • newline‑aware
  • non‑hallucinatory (never reports an IOC that is not present in the input)

6. Conclusion

This adversarial fixture confirms that IOCX’s URL extraction pipeline is:

  • robust
  • conservative
  • deterministic
  • adversarially hardened
  • safe for automated threat‑intel ingestion

The output is correct, stable, and fully aligned with the engine’s design goals.