Same file, same output, every time.
This document defines the layered, snapshot‑based testing strategy that keeps IOCX deterministic, stable, and predictable across versions. It covers PE parsing, heuristic evaluation, IOC extraction, and schema normalisation.
Deterministic IOC extraction is a foundational requirement for reproducible malware analysis, longitudinal threat intelligence, and automated security pipelines. Variability in extraction results—whether caused by nondeterministic parsing, heuristic drift, or environment‑dependent behaviour—introduces noise that propagates through downstream systems, undermining correlation, deduplication, and historical comparison. To address this, IOCX adopts a contract‑safe testing model in which each binary is treated as an immutable input–output pair. Once a file enters the test suite, its complete structured output is frozen as a golden snapshot. Any deviation from this snapshot is treated as a contract violation unless explicitly reviewed and approved. This approach ensures that the IOCX pipeline remains stable across code changes, dependency updates, and heuristic refinements. By enforcing deterministic behaviour at every stage—PE parsing, heuristic evaluation, IOC extraction, and schema normalisation—IOCX provides a reproducible analytical foundation suitable for research, automation, and long‑term threat intelligence operations.
Deterministic IOC extraction is critical for reliable threat intelligence, automated triage, and reproducible malware analysis. However, many commercial and open‑source tools exhibit nondeterministic behaviour due to heuristic instability, inconsistent parsing logic, environment‑specific dependencies, or silent updates that alter output formats. These inconsistencies lead to divergent results for identical inputs, breaking correlation pipelines, invalidating baselines, and eroding analyst trust. IOCX addresses this systemic problem through a contract‑safe testing strategy that treats each binary as a fixed behavioural contract. Once a sample is added to the test suite, its full structured output is captured as a golden snapshot and must remain stable across all future versions of the tool. Any deviation—whether caused by code changes, library upgrades, or heuristic adjustments—is flagged as a contract violation unless explicitly approved. This methodology ensures reproducibility, guards against regression, and provides a stable analytical substrate that other tools often fail to guarantee. By formalising determinism as a first‑class requirement, IOCX avoids the common pitfalls of heuristic drift and nondeterministic extraction, delivering consistent results suitable for long‑term operational use.
Contract-safe testing is split into four distinct layers. The following sections formalise the layered testing model and describe how IOCX enforces deterministic behaviour across all classes of inputs.
Layer 1 exists to guarantee that IOCX’s fundamental behaviour is stable, predictable, and correct under normal operating conditions. These inputs are intentionally simple, well‑formed, and representative of the kinds of binaries encountered in everyday triage workflows. The goal is not to test edge cases or adversarial conditions, but to ensure that the core extraction engine, metadata pipeline, and section‑level analysis behave deterministically when the input is valid and unambiguous.
This layer establishes the baseline contract for IOCX:
- literal IOCs must be extracted consistently
- metadata fields must be populated correctly
- section parsing must be stable
- no false positives should appear
- output structure must remain unchanged across versions
Layer 1 provides the “ground truth” against which all higher layers are measured. If a change breaks a Layer 1 test, it indicates a regression in fundamental behaviour rather than an improvement in edge‑case handling. These tests ensure that IOCX’s core remains reliable even as the heuristics engine and adversarial handling evolve.
Layer 2 exists to validate IOCX’s behaviour on inputs that are technically valid but structurally unusual, ambiguous, or borderline. These binaries sit between “normal” and “adversarial”: they follow the PE specification, but they stress the parser in ways that real‑world samples often do — unusual alignments, sparse sections, oversized directories, mixed encodings, or uncommon metadata layouts.
The purpose of this layer is to ensure that IOCX handles these edge‑case conditions:
- without crashing
- without misclassifying benign anomalies as malicious
- without producing inconsistent or unstable output
- without leaking internal parsing state into the public API
Layer 2 tests the robustness of the extraction and parsing logic when confronted with inputs that are legal but unexpected. These cases frequently appear in:
- packer stubs
- compiler‑generated oddities
- embedded resources
- installers
- non‑malicious but unconventional binaries
This layer ensures IOCX remains resilient and predictable even when the input stretches the boundaries of what “normal” looks like.
Layer 3 exists to ensure IOCX behaves predictably when confronted with inputs that are malformed, adversarial, or structurally contradictory — the kinds of binaries real‑world DFIR tools encounter but compilers never produce. These samples are designed to break assumptions, violate the PE specification, and trigger edge‑case logic paths. The goal is not to test correctness against “valid” binaries, but to guarantee that IOCX remains stable, deterministic, and safe even when the input is hostile, corrupted, or intentionally evasive.
Layer 4 exists to ensure that previously fixed bugs never reappear. These samples are not designed to be adversarial or structurally interesting — they are historical reproductions of issues that IOCX has already encountered and resolved. Each binary in this layer corresponds to a specific past failure mode: a crash, a hang, a mis‑extraction, a mis‑classification, or an incorrect metadata interpretation.
The purpose of this layer is simple but critical:
- If IOCX ever regresses on a previously fixed behaviour, Layer 4 catches it immediately.
- If a refactor or heuristic change alters output in an unintended way, Layer 4 highlights it.
- If a new feature accidentally reintroduces an old bug, Layer 4 prevents it from shipping.
Regression tests form the long‑term memory of the project. They ensure that as IOCX grows more capable — with new heuristics, deeper analysis, and more complex adversarial handling — it never loses correctness on the behaviours it has already mastered.
Layer 4 is what allows IOCX to evolve confidently without fear of breaking the past.
tests/
└── contract/
    ├── fixtures/
    │   ├── layer1_core/
    │   ├── layer2_edge/
    │   ├── layer3_adversarial/
    │   └── layer4_regressions/
    ├── snapshots/
    │   ├── layer1_core/
    │   ├── layer2_edge/
    │   ├── layer3_adversarial/
    │   └── layer4_regressions/
    └── test_pipeline.py
Use:
<category>_<descriptive_name>.<analysis_level>.<ext>
Examples:
clean_iocx_demo.core.exe
upx_packed.full.exe
unicode_homoglyph_domains.full.bin
2026_04_bug1234_minimal_repro.full.exe
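The naming pattern above can be checked mechanically. The following is a minimal sketch, not part of IOCX: `FixtureName` and `parse_fixture_name` are hypothetical helpers, and the sketch assumes the last two dot-separated parts are always the analysis level and the extension. Everything before them is kept as one name, because the category/description boundary cannot be recovered from underscores alone.

```python
from typing import NamedTuple


class FixtureName(NamedTuple):
    name: str            # "<category>_<descriptive_name>", plus any extra dotted qualifier
    analysis_level: str  # e.g. "core" or "full"
    ext: str             # e.g. "exe" or "bin"


def parse_fixture_name(filename: str) -> FixtureName:
    """Split a fixture file name into name, analysis level, and extension."""
    parts = filename.rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"{filename!r} does not match <name>.<analysis_level>.<ext>"
        )
    name, level, ext = parts
    return FixtureName(name, level, ext)
```

A name with an extra qualifier, such as `franken_malformed_pe.pe32.full.exe`, parses with `name == "franken_malformed_pe.pe32"`, so variants of one sample stay distinguishable.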
Mirror the fixture name:
<same_name>.json
This ensures:
- 1:1 mapping
- Easy diffing
- Easy regeneration
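The 1:1 mapping can be verified rather than assumed. The sketch below is illustrative only: `snapshot_for` and `mapping_problems` are hypothetical names, and the sketch assumes that the snapshot name replaces only the final extension with `.json` (so `a.core.exe` maps to `a.core.json`) and that fixtures end in `.exe` or `.bin`.

```python
import pathlib


def snapshot_for(fixture: pathlib.Path,
                 fixtures_dir: pathlib.Path,
                 snapshots_dir: pathlib.Path) -> pathlib.Path:
    """Mirror a fixture path into the snapshot tree, swapping the extension."""
    rel = fixture.relative_to(fixtures_dir)
    return snapshots_dir / rel.with_suffix(".json")


def mapping_problems(fixtures_dir: pathlib.Path, snapshots_dir: pathlib.Path):
    """Return (missing, orphaned) snapshot paths, each sorted."""
    fixtures = [
        p for p in fixtures_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in {".exe", ".bin"}
    ]
    expected = {snapshot_for(p, fixtures_dir, snapshots_dir) for p in fixtures}
    actual = {p for p in snapshots_dir.rglob("*.json") if p.is_file()}
    missing = sorted(expected - actual)    # fixture exists, snapshot does not
    orphaned = sorted(actual - expected)   # snapshot exists, fixture does not
    return missing, orphaned
```

Running such a check in CI would surface a deleted fixture or a stale snapshot before it silently weakens the contract.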
Use:
<YYYY>_<MM>_<bug_id>_<short_description>.exe
This encodes:
- chronology
- bug lineage
- reproducibility
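As a rough illustration of how the regression naming pattern could be enforced, the sketch below validates a file name with a regular expression. The pattern is an assumption: it treats the bug ID as a single alphanumeric token, and it accepts an optional analysis-level part before `.exe` because the earlier example `2026_04_bug1234_minimal_repro.full.exe` carries one.

```python
import re

# Hypothetical pattern for <YYYY>_<MM>_<bug_id>_<short_description>[.<level>].exe
REGRESSION_NAME = re.compile(
    r"^(?P<year>\d{4})_(?P<month>0[1-9]|1[0-2])_"
    r"(?P<bug_id>[A-Za-z0-9]+)_(?P<description>[a-z0-9_]+)"
    r"(?:\.(?P<level>[a-z0-9]+))?\.exe$"
)


def parse_regression_name(filename: str) -> dict:
    """Return the named parts of a regression fixture name, or raise ValueError."""
    match = REGRESSION_NAME.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} is not a valid regression fixture name")
    return match.groupdict()
```

Because the year and month are zero-padded and come first, a plain lexicographic sort of the file names is also a chronological sort.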
This matrix defines the minimum viable set of binaries required to lock in deterministic behaviour across normal, edge‑case, adversarial, and regression scenarios.
Representative, non-complex, realistic binaries that exercise the main parsing paths.
These are the baseline contract. If any of these outputs change, it must be intentional and reviewed.
| Sample | Why it matters |
|---|---|
| 1. Clean IOCX demo PE | Locks in baseline behaviour for simple EXEs, fixed strings, normal imports. |
| 2. Typical Windows‑like system binary (e.g., notepad‑like) | Tests imports, exports, signatures, timestamps, sections. |
| 3. Statically linked executable | Minimal imports, simple section layout, tests fallback logic. |
| 4. Typical compiler‑produced PE (MSVC or MinGW) | Normal import table, standard sections, realistic metadata. |
| 5. .NET assembly | Tests CLR header, metadata directories, managed PE quirks. |
| 6. Signed binary | Tests deterministic signature extraction and certificate chain handling. |
This is an aspirational list and does not represent the current core behaviour input corpus. It will be added to gradually.
Tests for each sample:
- PE metadata snapshot
- IOC extraction snapshot
- Heuristic snapshot
- End‑to‑end final JSON snapshot
These snapshots become the IOCX contract.
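When an end-to-end snapshot fails, it helps to know which stage drifted. The sketch below is one possible approach and makes no assumption about IOCX's real schema: `changed_stages` is a hypothetical helper that reports which top-level keys differ between the expected and actual output, whatever those keys are called.

```python
from typing import Any


def changed_stages(expected: dict[str, Any], actual: dict[str, Any]) -> list[str]:
    """Return the sorted top-level keys whose values differ or are missing."""
    missing = object()  # sentinel so that a missing key never equals a real value
    keys = set(expected) | set(actual)
    return sorted(
        key for key in keys
        if expected.get(key, missing) != actual.get(key, missing)
    )
```

If the output keeps metadata, heuristics, and IOCs under separate top-level keys, a result such as `["heuristics"]` points the reviewer directly at the stage that changed.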
Weird, malformed, or unusual binaries that stress the parser but are not hostile.
| Sample | Why it matters |
|---|---|
| 1. UPX‑packed binary | Tests high entropy, packer heuristics, section anomalies. |
| 2. Import‑by‑ordinal binary | Tests ordinal handling and import table robustness. |
| 3. Binary with broken imports | Tests graceful degradation and fallback logic. |
| 4. Binary with weird TLS directory | Tests TLS parsing, callback handling, anomalies. |
| 5. Binary with oversized .rsrc | Tests resource directory traversal. |
| 6. Binary with tiny .text section | Tests entropy heuristics and section validation. |
| 7. Binary with overlapping sections | Tests section boundary validation. |
| 8. Binary with malformed PE header | Tests “best effort” parsing. |
| 9. Binary with unusual subsystem | Tests subsystem parsing and normalisation. |
| 10. Binary with sparse import table | Tests import enumeration stability. |
This is an aspirational list and does not represent the current edge case input corpus. It will be added to gradually.
Tests for each sample:
- Metadata snapshot
- Heuristic snapshot
- End‑to‑end snapshot
- Assertions that the parser does not crash
- Assertions that heuristics fire predictably
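The last two assertions can be approximated by running the extractor several times on the same sample and comparing a canonical serialisation of each result. This is a sketch under assumptions: `extract` stands for any callable that takes a sample path and returns a JSON-serialisable object, and `check_repeatable` is a hypothetical helper name.

```python
import json
from typing import Any, Callable


def canonical(output: Any) -> str:
    """Serialise output with sorted keys and fixed separators."""
    return json.dumps(output, sort_keys=True, ensure_ascii=False,
                      separators=(",", ":"))


def check_repeatable(extract: Callable[[Any], Any], sample: Any, runs: int = 3) -> str:
    """Run extract `runs` times and fail if any run differs from the first."""
    first = canonical(extract(sample))
    for attempt in range(2, runs + 1):
        again = canonical(extract(sample))
        if again != first:
            raise AssertionError(
                f"Run {attempt} on {sample} differs from run 1"
            )
    return first
```

Any exception raised by `extract` propagates and fails the test, which covers the "does not crash" assertion. Repeated runs in one process will not expose every source of nondeterminism (for example, hash seeding that varies between processes), so this complements the snapshot comparison rather than replacing it.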
Inputs designed to stress IOC extraction, PE parsing, RVA mapping, section validation, and heuristic stability under malformed or hostile conditions.
| Sample | Why it matters |
|---|---|
| heuristic_rich.full.exe | Exercises the full heuristic engine across imports, sections, TLS, Rich header, and metadata anomalies. Appendix 3.1 |
| crypto_entropy_payload.full.exe | Tests entropy heuristics, high‑entropy .text, and compressed‑looking overlays. Appendix 3.2 |
| string_obfuscation_tricks.full.exe | Ensures only literal IOCs are extracted; validates suppression of obfuscated or misleading patterns. Appendix 3.3 |
| franken_malformed_pe.full.exe | Hand‑crafted malformed PE combining contradictory headers, invalid directories, overlapping sections, and out‑of‑bounds entrypoints. Appendix 3.4 |
| franken_malformed_pe.pe32.full.exe | PE32 variant of the franken sample; validates optional‑header consistency and PE32‑specific edge cases. Appendix 3.5 |
| malformed_import_table.full.exe | Tests invalid import descriptors, truncated thunks, and out‑of‑range import RVAs. Appendix 3.6 |
| invalid_section_alignment.full.exe | Validates behaviour when raw/virtual sizes contradict alignment rules. Appendix 3.7 |
| corrupted_data_directories.full.exe | Tests overlapping, out‑of‑range, and impossible data‑directory entries. Appendix 3.8 |
| truncated_rich_header.full.exe | Ensures safe handling of malformed or truncated Rich headers. Appendix 3.9 |
| packed_lookalike.full.exe | Positive test for packer heuristics: high entropy + fake packer names + overlay. Appendix 3.10 |
| upx_name_only.full.exe | Negative test for packer heuristics: UPX‑like names only, low entropy, no overlay. Appendix 3.11 |
| broken_rva_addresses.full.exe | Tests invalid RVAs, zero‑length regions, and directory entries pointing outside any section. Appendix 3.12 |
| overlapping_sections.full.exe | Tests overlapping virtual/raw ranges and invalid virtual‑size vs raw‑size relationships. Appendix 3.13 |
| invalid_optional_header.full.exe | Tests malformed PE32+ optional header fields. Appendix 3.14 |
| invalid_optional_header.pe32.full.exe | Tests malformed PE32 optional header fields. Appendix 3.15 |
| long_paths_adversarial.full.bin | Tests extraction limits and boundary handling for extremely long path‑like strings. Appendix 3.16 |
These fixtures provide full adversarial coverage for every IOC category.
| Sample | Why it matters |
|---|---|
| crypto_strings_adversarial.full.bin | Tests BTC/ETH extraction, Base58Check validation, reversed/embedded wallets, and near‑miss patterns. Appendix 3.17 |
| homoglyph_domains_adversarial.full.bin | Tests Unicode homoglyphs, mixed‑script domains, and IDN punycode behaviour. Appendix 3.18 |
| malformed_urls_adversarial.full.bin | Tests broken schemes, nested encodings, truncated URLs, and extremely long URL patterns. Appendix 3.19 |
| filepaths_strings_adversarial.full.bin | Tests MAX_PATH‑breaking Windows paths, malformed UNC prefixes, and deeply nested directory structures. Appendix 3.20 |
| emails_strings_adversarial.full.bin | Tests malformed local parts, Unicode variants, and deceptive email‑like strings. Appendix 3.21 |
| hashes_strings_adversarial.full.bin | Tests truncated digests, near‑miss hex sequences, and false‑positive suppression. Appendix 3.22 |
| base64_strings_adversarial.full.bin | Tests invalid padding, embedded noise, and extremely long base64 runs. Appendix 3.23 |
| malformed_domain.full.exe | Tests domain extraction under malformed, embedded, or deceptive domain‑like patterns. Appendix 3.24 |
| malformed_ip.full.exe | Tests IPv4/IPv6 extraction under corrupted, concatenated, or partial IP patterns. Appendix 3.25 |
| malformed_url.full.exe | Tests URL extraction under broken schemes, malformed IPv6, reversed URLs, and salvage behaviour. Appendix 3.26 |
| franken_url_domain_ip.full.exe | Combined adversarial sample mixing malformed URLs, domains, and IPs inside a PE container. Appendix 3.27 |
- heuristic_rich.full.exe
- crypto_entropy_payload.full.exe
- string_obfuscation_tricks.full.exe
- franken_malformed_pe.full.exe
- franken_malformed_pe.pe32.full.exe
- malformed_import_table.full.exe
- invalid_section_alignment.full.exe
- corrupted_data_directories.full.exe
- truncated_rich_header.full.exe
- packed_lookalike.full.exe
- upx_name_only.full.exe
- broken_rva_addresses.full.exe
- overlapping_sections.full.exe
- invalid_optional_header.full.exe
- invalid_optional_header.pe32.full.exe
- long_paths_adversarial.full.bin
- crypto_strings_adversarial.full.bin
- homoglyph_domains_adversarial.full.bin
- malformed_urls_adversarial.full.bin
- filepaths_strings_adversarial.full.bin
- emails_strings_adversarial.full.bin
- hashes_strings_adversarial.full.bin
- base64_strings_adversarial.full.bin
- malformed_domain.full.exe
- malformed_ip.full.exe
- malformed_url.full.exe
- franken_url_domain_ip.full.exe
Tests for each sample:
- End‑to‑end snapshot
- Assertions that:
  - Output is valid JSON
  - No crashes or hangs occur
  - Fallback behaviour is deterministic
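An in-process test cannot recover from a hang or a hard crash, so one option is to run extraction in a child process with a timeout. The sketch below assumes nothing about how IOCX is invoked: `run_isolated` is a hypothetical helper, and `cmd` is whatever command line prints the JSON result for one sample to standard output.

```python
import json
import subprocess
import sys


def run_isolated(cmd: list[str], timeout_s: float = 30.0):
    """Run cmd in a child process and return its stdout parsed as JSON.

    Raises subprocess.TimeoutExpired on a hang, RuntimeError on a non-zero
    exit code, and json.JSONDecodeError if stdout is not valid JSON.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    if proc.returncode != 0:
        raise RuntimeError(
            f"exit code {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    # Stand-in command: a real suite would invoke the extractor here.
    print(run_isolated([sys.executable, "-c", "print('{\"iocs\": []}')"]))
```

Each of the three failure modes maps to one of the assertions listed above, which keeps test failures easy to triage.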
Every bug fixed becomes a new golden test.
Process:
- Capture the offending file (or minimal reproducer).
- Add it to tests/contract/fixtures/layer4_regressions/.
- Generate the correct output.
- Snapshot it.
- Never delete it.
Guarantee:
No fixed bug ever returns.
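The process above could be wrapped in a small helper so that adding a regression is a single step. This is a sketch, not an existing IOCX tool: `add_regression` is a hypothetical function, `extract` is any callable returning JSON-serialisable output for a path, and the helper refuses to overwrite existing files so that a regression is never replaced by accident.

```python
import json
import pathlib
import shutil
from typing import Any, Callable


def add_regression(sample: pathlib.Path,
                   name: str,
                   extract: Callable[[pathlib.Path], Any],
                   fixtures_dir: pathlib.Path,
                   snapshots_dir: pathlib.Path):
    """Copy sample in as a fixture named `name` and write its snapshot."""
    fixture = fixtures_dir / name
    snapshot = snapshots_dir / pathlib.Path(name).with_suffix(".json")
    if fixture.exists() or snapshot.exists():
        raise FileExistsError(f"{name} is already registered")
    fixtures_dir.mkdir(parents=True, exist_ok=True)
    snapshots_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(sample, fixture)
    # Sorted keys and fixed indentation keep the snapshot diff-friendly.
    text = json.dumps(extract(fixture), indent=2, sort_keys=True) + "\n"
    snapshot.write_text(text, encoding="utf-8")
    return fixture, snapshot
```

The generated snapshot still needs human review before it is committed, because the helper records whatever the extractor currently produces, correct or not.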
Layer 1 — Core (6 samples)
- Clean IOCX demo PE
- Windows‑like system binary
- Statically linked EXE
- Typical compiler‑produced EXE
- .NET assembly
- Signed binary
Layer 2 — Edge cases (10 samples)
- UPX‑packed
- Ordinal imports
- Broken imports
- Weird TLS
- Oversized .rsrc
- Tiny .text
- Overlapping sections
- Malformed header
- Unusual subsystem
- Sparse import table
Layer 3 — Adversarial (27 samples)
- Fake PE headers
- Full heuristics and metadata anomalies
- Unicode homoglyph domains
- Malformed URLs
- Mixed‑script IOCs
- Deep escape sequences
- Random entropy strings
- Malformed import table
- Invalid section alignment
- Corrupted data directories
- Truncated rich header
- Packed lookalikes
- Broken RVAs
- Overlapping sections
- Invalid optional header
- Very long paths
Layer 4 — Regression (unbounded)
Every bug fixed becomes a new golden test.
Together, these samples form the minimal baseline required to guarantee deterministic behaviour across the full spectrum of realistic, edge-case, and adversarial PE inputs. This matrix gives:
- Breadth (normal → weird → hostile)
- Depth (metadata → heuristics → IOCs → final schema)
- Determinism (snapshots freeze behaviour)
- Longevity (regressions accumulate forever)
It is the smallest set that enforces the promise:
Same file, same output, every time.
Fixtures:
- Stored under version control or content‑addressed storage.
- Never rebuilt during tests.
- Immutable.

Snapshots:
- Stored as JSON files.
- Any change requires:
  - Explicit regeneration
  - Code review
  - A commit message explaining why the contract changed
Snapshots are the contract.
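One way to make regeneration explicit is to gate snapshot writes behind an opt-in switch, so that a normal test run can never modify the contract. The sketch below is an assumption, not IOCX's actual mechanism: the environment variable name `IOCX_UPDATE_SNAPSHOTS` and the helper `check_or_update` are hypothetical.

```python
import json
import os
import pathlib
from typing import Any, Mapping

UPDATE_ENV = "IOCX_UPDATE_SNAPSHOTS"  # hypothetical opt-in variable name


def check_or_update(snapshot_path: pathlib.Path,
                    output: Any,
                    env: Mapping[str, str] = os.environ) -> bool:
    """Return True if output matches the snapshot.

    The snapshot is written only when the opt-in variable is set to "1".
    """
    text = json.dumps(output, indent=2, sort_keys=True) + "\n"
    if env.get(UPDATE_ENV) == "1":
        snapshot_path.parent.mkdir(parents=True, exist_ok=True)
        snapshot_path.write_text(text, encoding="utf-8")
        return True
    if not snapshot_path.exists():
        return False
    # Compare serialised text, which is stricter than comparing parsed objects.
    return snapshot_path.read_text(encoding="utf-8") == text
```

With this shape, regenerating a snapshot is a deliberate act that shows up as a file change in the commit, which is what code review then inspects.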
Below is a minimal, production‑ready test harness using pytest.
It performs:
- fixture discovery
- pipeline execution
- snapshot comparison
- automatic diffing
- deterministic output enforcement
import json
import pathlib

import pytest

from iocx.engine import Engine

FIXTURES_DIR = pathlib.Path("tests/contract/fixtures")
SNAPSHOTS_DIR = pathlib.Path("tests/contract/snapshots")


@pytest.fixture
def engine():
    return Engine()


def load_snapshot(snapshot_path):
    with open(snapshot_path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_snapshot(snapshot_path, data):
    snapshot_path.parent.mkdir(parents=True, exist_ok=True)
    with open(snapshot_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, sort_keys=True)


def discover_fixtures():
    """Yield (fixture_path, snapshot_path) pairs for all layers."""
    # Sort so that test collection order is stable across platforms.
    for fixture in sorted(FIXTURES_DIR.rglob("*")):
        if fixture.is_file() and fixture.suffix.lower() in (".exe", ".bin"):
            rel = fixture.relative_to(FIXTURES_DIR)
            snapshot = SNAPSHOTS_DIR / rel.with_suffix(".json")
            yield fixture, snapshot


@pytest.mark.contract
@pytest.mark.parametrize("fixture_path,snapshot_path", discover_fixtures())
def test_contract_safe_pipeline(engine, fixture_path, snapshot_path):
    output = engine.extract(fixture_path)

    # Normalise file path to string for deterministic snapshot comparison
    if isinstance(output.get("file"), pathlib.Path):
        output["file"] = str(output["file"])

    if not snapshot_path.exists():
        # First run: create snapshot, then fail so it gets reviewed
        save_snapshot(snapshot_path, output)
        pytest.fail(f"Snapshot created for {fixture_path}, please review and re-run.")

    expected = load_snapshot(snapshot_path)
    assert output == expected, (
        f"Contract violation for {fixture_path}.\n"
        f"Snapshot: {snapshot_path}\n"
        "Output differs from expected."
    )

1. Automatic fixture discovery
Every binary in tests/contract/fixtures/** is automatically tested.
2. Snapshot enforcement
If a snapshot doesn’t exist:
- It is created
- The test fails
- You review and commit intentionally
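3. Exact comparison
The final JSON output must match the snapshot exactly. The harness compares parsed JSON, so key order inside an object is not significant, but everything else is:
- field names
- element order within lists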
- normalisation
- casing
- heuristics
- IOCs
- metadata
4. Zero tolerance for accidental changes
Any deviation is a contract violation.
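A bare equality assertion reports that the output changed but not where. As a sketch of how a readable diff could be attached to the failure message, the hypothetical helper `snapshot_diff` below renders both objects as sorted, indented JSON and compares them with the standard library's `difflib.unified_diff`.

```python
import difflib
import json
from typing import Any


def render(obj: Any) -> list[str]:
    """Render an object as sorted, indented JSON lines."""
    return json.dumps(obj, indent=2, sort_keys=True, ensure_ascii=False).splitlines()


def snapshot_diff(expected: Any, actual: Any) -> str:
    """Return a unified diff between snapshot and output, or "" if they match."""
    return "\n".join(
        difflib.unified_diff(
            render(expected),
            render(actual),
            fromfile="snapshot",
            tofile="output",
            lineterm="",
        )
    )
```

In a pytest assertion, the returned string can be appended to the failure message so that the reviewer sees the exact fields that drifted.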
- A deterministic, reproducible, future‑proof test suite.
- A clear separation between:
  - normal behaviour
  - edge cases
  - adversarial inputs
  - regressions
- A structure that scales indefinitely.
- A harness that enforces the core promise:
Same file, same output, every time