Contract‑Safe Testing Strategy for IOCX

Philosophy

Same file, same output, every time.

This document defines the layered, snapshot‑based testing strategy that ensures IOCX remains deterministic, stable, and predictable across versions.

Scope

This document defines the testing strategy for deterministic PE parsing, heuristic evaluation, IOC extraction, and schema normalisation within IOCX.

Abstract

Deterministic IOC extraction is a foundational requirement for reproducible malware analysis, longitudinal threat intelligence, and automated security pipelines. Variability in extraction results—whether caused by nondeterministic parsing, heuristic drift, or environment‑dependent behaviour—introduces noise that propagates through downstream systems, undermining correlation, deduplication, and historical comparison. To address this, IOCX adopts a contract‑safe testing model in which each binary is treated as an immutable input–output pair. Once a file enters the test suite, its complete structured output is frozen as a golden snapshot. Any deviation from this snapshot is treated as a contract violation unless explicitly reviewed and approved. This approach ensures that the IOCX pipeline remains stable across code changes, dependency updates, and heuristic refinements. By enforcing deterministic behaviour at every stage—PE parsing, heuristic evaluation, IOC extraction, and schema normalisation—IOCX provides a reproducible analytical foundation suitable for research, automation, and long‑term threat intelligence operations.

Introduction

Deterministic IOC extraction is critical for reliable threat intelligence, automated triage, and reproducible malware analysis. However, many commercial and open‑source tools exhibit nondeterministic behaviour due to heuristic instability, inconsistent parsing logic, environment‑specific dependencies, or silent updates that alter output formats. These inconsistencies lead to divergent results for identical inputs, breaking correlation pipelines, invalidating baselines, and eroding analyst trust. IOCX addresses this systemic problem through a contract‑safe testing strategy that treats each binary as a fixed behavioural contract. Once a sample is added to the test suite, its full structured output is captured as a golden snapshot and must remain stable across all future versions of the tool. Any deviation—whether caused by code changes, library upgrades, or heuristic adjustments—is flagged as a contract violation unless explicitly approved. This methodology ensures reproducibility, guards against regression, and provides a stable analytical substrate that other tools often fail to guarantee. By formalising determinism as a first‑class requirement, IOCX avoids the common pitfalls of heuristic drift and nondeterministic extraction, delivering consistent results suitable for long‑term operational use.

Contract-safe testing is split into four distinct layers. The following sections formalise the layered testing model and describe how IOCX enforces deterministic behaviour across all classes of inputs.

Layer Model

Layer 1: Core behaviour

Layer 1 exists to guarantee that IOCX’s fundamental behaviour is stable, predictable, and correct under normal operating conditions. These inputs are intentionally simple, well‑formed, and representative of the kinds of binaries encountered in everyday triage workflows. The goal is not to test edge cases or adversarial conditions, but to ensure that the core extraction engine, metadata pipeline, and section‑level analysis behave deterministically when the input is valid and unambiguous.

This layer establishes the baseline contract for IOCX:

literal IOCs must be extracted consistently
metadata fields must be populated correctly
section parsing must be stable
no false positives should appear
output structure must remain unchanged across versions

Layer 1 provides the “ground truth” against which all higher layers are measured. If a change breaks a Layer 1 test, it indicates a regression in fundamental behaviour rather than an improvement in edge‑case handling. These tests ensure that IOCX’s core remains reliable even as the heuristics engine and adversarial handling evolve.

Layer 2: Edge cases

Layer 2 exists to validate IOCX’s behaviour on inputs that are technically valid but structurally unusual, ambiguous, or borderline. These binaries sit between “normal” and “adversarial”: they follow the PE specification, but they stress the parser in ways that real‑world samples often do — unusual alignments, sparse sections, oversized directories, mixed encodings, or uncommon metadata layouts.

The purpose of this layer is to ensure that IOCX handles these edge‑case conditions:

without crashing
without misclassifying benign anomalies as malicious
without producing inconsistent or unstable output
without leaking internal parsing state into the public API

Layer 2 tests the robustness of the extraction and parsing logic when confronted with inputs that are legal but unexpected. These cases frequently appear in:

packer stubs
compiler‑generated oddities
embedded resources
installers
non‑malicious but unconventional binaries

This layer ensures IOCX remains resilient and predictable even when the input stretches the boundaries of what “normal” looks like.

Layer 3: Adversarial inputs

Layer 3 exists to ensure IOCX behaves predictably when confronted with inputs that are malformed, adversarial, or structurally contradictory — the kinds of binaries real‑world DFIR tools encounter but compilers never produce. These samples are designed to break assumptions, violate the PE specification, and trigger edge‑case logic paths. The goal is not to test correctness against “valid” binaries, but to guarantee that IOCX remains stable, deterministic, and safe even when the input is hostile, corrupted, or intentionally evasive.

Layer 4: Regression tests

Layer 4 exists to ensure that previously fixed bugs never reappear. These samples are not designed to be adversarial or structurally interesting — they are historical reproductions of issues that IOCX has already encountered and resolved. Each binary in this layer corresponds to a specific past failure mode: a crash, a hang, a mis‑extraction, a mis‑classification, or an incorrect metadata interpretation.

The purpose of this layer is simple but critical:

If IOCX ever regresses on a previously fixed behaviour, Layer 4 catches it immediately.
If a refactor or heuristic change alters output in an unintended way, Layer 4 highlights it.
If a new feature accidentally reintroduces an old bug, Layer 4 prevents it from shipping.

Regression tests form the long‑term memory of the project. They ensure that as IOCX grows more capable — with new heuristics, deeper analysis, and more complex adversarial handling — it never loses correctness on the behaviours it has already mastered.

Layer 4 is what allows IOCX to evolve confidently without fear of breaking the past.

Directory Structure

tests/
└── contract/
    ├── fixtures/
    │   ├── layer1_core/
    │   ├── layer2_edge/
    │   ├── layer3_adversarial/
    │   └── layer4_regressions/
    ├── snapshots/
    │   ├── layer1_core/
    │   ├── layer2_edge/
    │   ├── layer3_adversarial/
    │   └── layer4_regressions/
    └── test_pipeline.py

Naming Conventions

Fixtures (binaries)

Use:

<category>_<descriptive_name>.<analysis_level>.<ext>

Examples:

clean_iocx_demo.core.exe
upx_packed.full.exe
unicode_homoglyph_domains.full.bin
2026_04_bug1234_minimal_repro.full.exe
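
As an illustration, a fixture name following this convention could be split into its dotted components as sketched below. The `FixtureName` type and `parse_fixture_name` helper are hypothetical and not part of IOCX. The sketch assumes that an extra dotted component before the analysis level (such as `pe32`, which appears in some fixture names later in this document) is a variant tag.

```python
from pathlib import PurePath
from typing import NamedTuple, Optional


class FixtureName(NamedTuple):
    stem: str                # "<category>_<descriptive_name>"
    variant: Optional[str]   # optional extra tag, e.g. "pe32" (assumption)
    analysis_level: str      # e.g. "core" or "full"
    ext: str                 # "exe" or "bin"


def parse_fixture_name(filename: str) -> FixtureName:
    """Split a fixture file name into its dotted components."""
    parts = PurePath(filename).name.split(".")
    if len(parts) == 3:
        stem, level, ext = parts
        variant = None
    elif len(parts) == 4:
        stem, variant, level, ext = parts
    else:
        raise ValueError(f"unexpected fixture name: {filename!r}")
    # Only the two fixture extensions used in this document are accepted
    if ext.lower() not in ("exe", "bin") or not stem or not level:
        raise ValueError(f"unexpected fixture name: {filename!r}")
    return FixtureName(stem, variant, level, ext.lower())


print(parse_fixture_name("clean_iocx_demo.core.exe"))
print(parse_fixture_name("franken_malformed_pe.pe32.full.exe"))
```

Rejecting names that do not fit the pattern early keeps stray files from silently entering the contract.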

Snapshots (JSON)

Mirror the fixture name, replacing the final extension with .json:

<category>_<descriptive_name>.<analysis_level>.json

This ensures:

1:1 mapping
Easy diffing
Easy regeneration
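
The 1:1 mapping can be checked mechanically. The following is a minimal sketch, not part of IOCX: `check_mapping` is a hypothetical helper that reports fixtures without a snapshot, snapshots without a fixture, and pairs of fixtures that would map to the same snapshot (for example an `.exe` and a `.bin` sharing a stem).

```python
from pathlib import Path
from typing import Dict, List, Tuple

FIXTURE_SUFFIXES = {".exe", ".bin"}


def snapshot_for(fixture: Path, fixtures_dir: Path, snapshots_dir: Path) -> Path:
    """Mirror the fixture path under snapshots_dir, swapping the final extension for .json."""
    return snapshots_dir / fixture.relative_to(fixtures_dir).with_suffix(".json")


def check_mapping(
    fixtures_dir: Path, snapshots_dir: Path
) -> Tuple[List[Path], List[Path], List[Tuple[Path, Path]]]:
    """Return (missing snapshots, orphaned snapshots, colliding fixture pairs)."""
    expected: Dict[Path, Path] = {}
    collisions: List[Tuple[Path, Path]] = []
    for fixture in sorted(fixtures_dir.rglob("*")):
        if not fixture.is_file() or fixture.suffix.lower() not in FIXTURE_SUFFIXES:
            continue
        snapshot = snapshot_for(fixture, fixtures_dir, snapshots_dir)
        if snapshot in expected:
            # Two fixtures would share one snapshot file
            collisions.append((expected[snapshot], fixture))
        else:
            expected[snapshot] = fixture
    existing = {p for p in snapshots_dir.rglob("*.json") if p.is_file()}
    missing = sorted(set(expected) - existing)
    orphaned = sorted(existing - set(expected))
    return missing, orphaned, collisions
```

A check like this could run as its own test so that a renamed fixture never leaves a stale snapshot behind.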

Regression naming

Use:

<YYYY>_<MM>_<bug_id>_<short_description>.<analysis_level>.exe

This encodes:

chronology
bug lineage
reproducibility
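
A sketch of how regression names could be validated and ordered chronologically. The regular expression is an assumption based on the example `2026_04_bug1234_minimal_repro.full.exe` shown earlier: it treats the bug ID as a run of letters and digits without underscores and accepts the analysis level as optional. `RegressionName` and `parse_regression_name` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: <YYYY>_<MM>_<bug_id>_<short_description>[.<analysis_level>].exe
REGRESSION_NAME = re.compile(
    r"(?P<year>\d{4})_(?P<month>0[1-9]|1[0-2])_(?P<bug_id>[A-Za-z0-9]+)_"
    r"(?P<description>[a-z0-9_]+)(?:\.(?P<level>[a-z0-9]+))?\.exe"
)


class RegressionName(NamedTuple):
    year: int
    month: int
    bug_id: str
    description: str
    analysis_level: Optional[str]


def parse_regression_name(filename: str) -> RegressionName:
    """Parse a regression fixture name, raising ValueError if it does not match."""
    m = REGRESSION_NAME.fullmatch(filename)
    if m is None:
        raise ValueError(f"not a regression fixture name: {filename!r}")
    return RegressionName(
        int(m["year"]), int(m["month"]), m["bug_id"], m["description"], m["level"]
    )


names = ["2026_04_bug1234_minimal_repro.full.exe"]
# Sorting by (year, month) recovers the chronology encoded in the names
ordered = sorted(map(parse_regression_name, names), key=lambda r: (r.year, r.month))
```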

Matrix

This matrix defines the minimum viable set of binaries required to lock in deterministic behaviour across normal, edge‑case, adversarial, and regression scenarios.

Layer 1 — Core Behaviour (4–6 binaries)

Representative, non-complex, realistic binaries that exercise the main parsing paths.

These are the baseline contract. If any of these outputs change, it must be intentional and reviewed.

Sample	Why it matters
1. Clean IOCX demo PE	Locks in baseline behaviour for simple EXEs, fixed strings, normal imports.
2. Typical Windows‑like system binary (e.g., notepad‑like)	Tests imports, exports, signatures, timestamps, sections.
3. Statically linked executable	Minimal imports, simple section layout, tests fallback logic.
4. Typical compiler‑produced PE (MSVC or MinGW)	Normal import table, standard sections, realistic metadata.
5. .NET assembly	Tests CLR header, metadata directories, managed PE quirks.
6. Signed binary	Tests deterministic signature extraction and certificate chain handling.

This is an aspirational list and does not represent the current core behaviour input corpus. It will be added to gradually.

Tests for each sample

PE metadata snapshot
IOC extraction snapshot
Heuristic snapshot
End‑to‑end final JSON snapshot

These snapshots become the IOCX contract.
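
When an end‑to‑end snapshot changes, it helps to know which stage drifted. The sketch below compares the output stage by stage. The top‑level keys `metadata`, `iocs`, and `heuristics` are hypothetical placeholders; the real IOCX output schema may use different names.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical top-level keys; adjust to the real output schema.
STAGES = ("metadata", "iocs", "heuristics")


def drifted_stages(
    actual: Dict[str, Any],
    expected: Dict[str, Any],
    stages: Sequence[str] = STAGES,
) -> List[str]:
    """Return the names of the stages whose output differs from the snapshot."""
    drifted = [s for s in stages if actual.get(s) != expected.get(s)]
    # Fields outside the named stages are reported under a catch-all label
    rest_actual = {k: v for k, v in actual.items() if k not in stages}
    rest_expected = {k: v for k, v in expected.items() if k not in stages}
    if rest_actual != rest_expected:
        drifted.append("other")
    return drifted
```

An empty result means the end‑to‑end output matches; a non‑empty one points the reviewer at the stage to inspect first.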

Layer 2 — Edge Cases (6–10 binaries)

Structurally unusual or partially malformed binaries that stress the parser but are not deliberately hostile.

Sample	Why it matters
1. UPX‑packed binary	Tests high entropy, packer heuristics, section anomalies.
2. Import‑by‑ordinal binary	Tests ordinal handling and import table robustness.
3. Binary with broken imports	Tests graceful degradation and fallback logic.
4. Binary with weird TLS directory	Tests TLS parsing, callback handling, anomalies.
5. Binary with oversized `.rsrc`	Tests resource directory traversal.
6. Binary with tiny `.text` section	Tests entropy heuristics and section validation.
7. Binary with overlapping sections	Tests section boundary validation.
8. Binary with malformed PE header	Tests “best effort” parsing.
9. Binary with unusual subsystem	Tests subsystem parsing and normalisation.
10. Binary with sparse import table	Tests import enumeration stability.

This is an aspirational list and does not represent the current edge case input corpus. It will be added to gradually.

Tests for each sample:

Metadata snapshot
Heuristic snapshot
End‑to‑end snapshot
Assertions that the parser does not crash
Assertions that heuristics fire predictably

Layer 3 — Adversarial Inputs (20-30 binaries)

Inputs designed to stress IOC extraction, PE parsing, RVA mapping, section validation, and heuristic stability under malformed or hostile conditions.

A. Adversarial PE Binaries

Sample	Why it matters
heuristic_rich.full.exe	Exercises the full heuristic engine across imports, sections, TLS, Rich header, and metadata anomalies. Appendix 3.1
crypto_entropy_payload.full.exe	Tests entropy heuristics, high‑entropy `.text`, and compressed‑looking overlays. Appendix 3.2
string_obfuscation_tricks.full.exe	Ensures only literal IOCs are extracted; validates suppression of obfuscated or misleading patterns. Appendix 3.3
franken_malformed_pe.full.exe	Hand‑crafted malformed PE combining contradictory headers, invalid directories, overlapping sections, and out‑of‑bounds entrypoints. Appendix 3.4
franken_malformed_pe.pe32.full.exe	PE32 variant of the franken sample; validates optional‑header consistency and PE32‑specific edge cases. Appendix 3.5
malformed_import_table.full.exe	Tests invalid import descriptors, truncated thunks, and out‑of‑range import RVAs. Appendix 3.6
invalid_section_alignment.full.exe	Validates behaviour when raw/virtual sizes contradict alignment rules. Appendix 3.7
corrupted_data_directories.full.exe	Tests overlapping, out‑of‑range, and impossible data‑directory entries. Appendix 3.8
truncated_rich_header.full.exe	Ensures safe handling of malformed or truncated Rich headers. Appendix 3.9
packed_lookalike.full.exe	Positive test for packer heuristics: high entropy + fake packer names + overlay. Appendix 3.10
upx_name_only.full.exe	Negative test for packer heuristics: UPX‑like names only, low entropy, no overlay. Appendix 3.11
broken_rva_addresses.full.exe	Tests invalid RVAs, zero‑length regions, and directory entries pointing outside any section. Appendix 3.12
overlapping_sections.full.exe	Tests overlapping virtual/raw ranges and invalid virtual‑size vs raw‑size relationships. Appendix 3.13
invalid_optional_header.full.exe	Tests malformed PE32+ optional header fields. Appendix 3.14
invalid_optional_header.pe32.full.exe	Tests malformed PE32 optional header fields. Appendix 3.15
long_paths_adversarial.full.bin	Tests extraction limits and boundary handling for extremely long path‑like strings. Appendix 3.16

B. Adversarial IOC‑String Corpora

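These fixtures provide adversarial coverage for each IOC category: cryptocurrency wallets, domains, URLs, file paths, email addresses, hashes, base64 strings, and IP addresses.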

Sample	Why it matters
crypto_strings_adversarial.full.bin	Tests BTC/ETH extraction, Base58Check validation, reversed/embedded wallets, and near‑miss patterns. Appendix 3.17
homoglyph_domains_adversarial.full.bin	Tests Unicode homoglyphs, mixed‑script domains, and IDN punycode behaviour. Appendix 3.18
malformed_urls_adversarial.full.bin	Tests broken schemes, nested encodings, truncated URLs, and extremely long URL patterns. Appendix 3.19
filepaths_strings_adversarial.full.bin	Tests MAX_PATH‑breaking Windows paths, malformed UNC prefixes, and deeply nested directory structures. Appendix 3.20
emails_strings_adversarial.full.bin	Tests malformed local parts, Unicode variants, and deceptive email‑like strings. Appendix 3.21
hashes_strings_adversarial.full.bin	Tests truncated digests, near‑miss hex sequences, and false‑positive suppression. Appendix 3.22
base64_strings_adversarial.full.bin	Tests invalid padding, embedded noise, and extremely long base64 runs. Appendix 3.23
malformed_domain.full.exe	Tests domain extraction under malformed, embedded, or deceptive domain‑like patterns. Appendix 3.24
malformed_ip.full.exe	Tests IPv4/IPv6 extraction under corrupted, concatenated, or partial IP patterns. Appendix 3.25
malformed_url.full.exe	Tests URL extraction under broken schemes, malformed IPv6, reversed URLs, and salvage behaviour. Appendix 3.26
franken_url_domain_ip.full.exe	Combined adversarial sample mixing malformed URLs, domains, and IPs inside a PE container. Appendix 3.27

C. Consolidated Summary (Current State)

PE Adversarial Fixtures (16 total)

heuristic_rich.full.exe
crypto_entropy_payload.full.exe
string_obfuscation_tricks.full.exe
franken_malformed_pe.full.exe
franken_malformed_pe.pe32.full.exe
malformed_import_table.full.exe
invalid_section_alignment.full.exe
corrupted_data_directories.full.exe
truncated_rich_header.full.exe
packed_lookalike.full.exe
upx_name_only.full.exe
broken_rva_addresses.full.exe
overlapping_sections.full.exe
invalid_optional_header.full.exe
invalid_optional_header.pe32.full.exe
long_paths_adversarial.full.bin

IOC‑String Adversarial Fixtures (11 total)

crypto_strings_adversarial.full.bin
homoglyph_domains_adversarial.full.bin
malformed_urls_adversarial.full.bin
filepaths_strings_adversarial.full.bin
emails_strings_adversarial.full.bin
hashes_strings_adversarial.full.bin
base64_strings_adversarial.full.bin
malformed_domain.full.exe
malformed_ip.full.exe
malformed_url.full.exe
franken_url_domain_ip.full.exe

Tests for each sample:

End‑to‑end snapshot
Assertions that:
- Output is valid JSON
- No crashes or hangs occur
- Fallback behaviour is deterministic
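
The "valid JSON" assertion can be made strict with a round trip through the standard `json` module. The helper name `assert_valid_json` is hypothetical. Hang detection is not shown here; it is usually handled with a per‑test timeout, for example via the pytest-timeout plugin.

```python
import json
from typing import Any


def assert_valid_json(output: Any) -> None:
    """Fail if output cannot be represented faithfully as strict JSON."""
    # allow_nan=False rejects NaN and Infinity, which are not valid JSON;
    # non-serialisable types such as bytes raise TypeError.
    text = json.dumps(output, allow_nan=False)
    # Tuples, non-string keys, and similar values change shape in a round trip
    assert json.loads(text) == output, "output does not survive a JSON round trip"
```

Running this on every adversarial sample ensures that fallback paths never emit values the snapshot format cannot store.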

Layer 4 — Regression Tests (grows over time)

Every bug fixed becomes a new golden test.

Process:

1. Capture the offending file (or minimal reproducer).
2. Add it to tests/contract/fixtures/layer4_regressions/.
3. Generate the correct output.
4. Snapshot it.
5. Never delete it.

Guarantee:

No fixed bug ever returns.
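
The process above could be scripted roughly as follows. This is a sketch under assumptions: `add_regression` is a hypothetical helper, `extract` stands in for the fixed pipeline, and `contract_dir` is expected to point at tests/contract. The generated snapshot still needs human review before it is committed.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Callable


def add_regression(
    sample: Path,
    name: str,
    contract_dir: Path,
    extract: Callable[[Path], Any],
) -> Path:
    """Copy a reproducer into Layer 4 and write its snapshot; return the snapshot path."""
    fixture = contract_dir / "fixtures" / "layer4_regressions" / name
    snapshot = (
        contract_dir / "snapshots" / "layer4_regressions" / Path(name).with_suffix(".json")
    )
    if fixture.exists():
        # Regression fixtures are never overwritten or deleted
        raise FileExistsError(f"regression fixture already exists: {fixture}")
    fixture.parent.mkdir(parents=True, exist_ok=True)
    snapshot.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(sample, fixture)
    output = extract(fixture)
    snapshot.write_text(
        json.dumps(output, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return snapshot
```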

Matrix Summary

Layer 1 — Core (6 samples)

Clean IOCX demo PE
Windows‑like system binary
Statically linked EXE
Typical compiler‑produced EXE
.NET assembly
Signed binary

Layer 2 — Edge cases (10 samples)

UPX‑packed
Ordinal imports
Broken imports
Weird TLS
Oversized .rsrc
Tiny .text
Overlapping sections
Malformed header
Unusual subsystem
Sparse import table

Layer 3 — Adversarial (27 samples)

Fake PE headers
Full heuristics and metadata anomalies
Unicode homoglyph domains
Malformed URLs
Mixed‑script IOCs
Deep escape sequences
Random entropy strings
Malformed import table
Invalid section alignment
Corrupted data directories
Truncated rich header
Packed lookalikes
Broken RVAs
Overlapping sections
Invalid optional header
Very long paths

Layer 4 — Regression (unbounded)

Every bug fixed becomes a new golden test.

Final note

Together, these samples form the minimal baseline required to guarantee deterministic behaviour across the full spectrum of realistic, edge-case, and adversarial PE inputs. This matrix gives:

Breadth (normal → weird → hostile)
Depth (metadata → heuristics → IOCs → final schema)
Determinism (snapshots freeze behaviour)
Longevity (regressions accumulate forever)

It is the smallest set that enforces the promise:

Same file, same output, every time.

Snapshot Policy

Golden binaries

Stored under version control or content‑addressed storage.
Never rebuilt during tests.
Immutable.
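
Immutability can be verified with a digest manifest. The sketch below is one possible approach and not something IOCX is known to ship: it records a SHA‑256 digest per fixture and reports any file whose digest has changed, or which has been added or removed. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(fixtures_dir: Path) -> Dict[str, str]:
    """Map each fixture's POSIX-style relative path to its SHA-256 digest."""
    return {
        p.relative_to(fixtures_dir).as_posix(): sha256_of(p)
        for p in sorted(fixtures_dir.rglob("*"))
        if p.is_file()
    }


def changed_fixtures(fixtures_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return fixtures that were modified, added, or removed relative to the manifest."""
    current = build_manifest(fixtures_dir)
    names = set(current) | set(manifest)
    return sorted(n for n in names if current.get(n) != manifest.get(n))
```

Committing the manifest alongside the fixtures turns an accidental rebuild or edit of a golden binary into a visible test failure.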

Golden outputs

Stored as JSON snapshots.
Any change requires:
- Explicit regeneration
- Code review
- A commit message explaining why the contract changed

Snapshots are the contract.
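
Explicit regeneration can be enforced by only writing snapshots when the developer opts in. The sketch below uses an environment variable for that; the name `IOCX_UPDATE_SNAPSHOTS` and the `check_snapshot` helper are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Any, Mapping, Optional

UPDATE_ENV_VAR = "IOCX_UPDATE_SNAPSHOTS"  # hypothetical opt-in switch


def check_snapshot(
    snapshot_path: Path,
    output: Any,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return "updated", "missing", "match", or "mismatch" for one snapshot."""
    env = os.environ if environ is None else environ
    if env.get(UPDATE_ENV_VAR) == "1":
        # Regeneration happens only when explicitly requested
        snapshot_path.parent.mkdir(parents=True, exist_ok=True)
        snapshot_path.write_text(
            json.dumps(output, indent=2, sort_keys=True) + "\n", encoding="utf-8"
        )
        return "updated"
    if not snapshot_path.exists():
        return "missing"
    expected = json.loads(snapshot_path.read_text(encoding="utf-8"))
    return "match" if expected == output else "mismatch"
```

With this variation a normal test run never modifies the snapshots directory, so every change to a golden output originates from a deliberate command and shows up in code review.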

Test Harness

Below is a minimal, production‑ready test harness using pytest.

It performs:

fixture discovery
pipeline execution
snapshot comparison
automatic diffing
deterministic output enforcement

import json
import pathlib
import pytest
from iocx.engine import Engine

@pytest.fixture
def engine():
    return Engine()


FIXTURES_DIR = pathlib.Path("tests/contract/fixtures")
SNAPSHOTS_DIR = pathlib.Path("tests/contract/snapshots")


def load_snapshot(snapshot_path):
    """Read a golden snapshot from disk and return the parsed JSON."""
    with open(snapshot_path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_snapshot(snapshot_path, data):
    """Write a snapshot with sorted keys so the file is stable and diff-friendly."""
    snapshot_path.parent.mkdir(parents=True, exist_ok=True)
    with open(snapshot_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, sort_keys=True)


def discover_fixtures():
    """Yield (fixture_path, snapshot_path) pairs for all layers."""
    # Sort so that collection order does not depend on filesystem enumeration order
    for fixture in sorted(FIXTURES_DIR.rglob("*")):
        if fixture.is_file() and fixture.suffix.lower() in ('.exe', '.bin'):
            rel = fixture.relative_to(FIXTURES_DIR)
            # Mirror the fixture path, replacing the final extension with .json
            snapshot = SNAPSHOTS_DIR / rel.with_suffix(".json")
            yield fixture, snapshot

@pytest.mark.contract
@pytest.mark.parametrize("fixture_path,snapshot_path", discover_fixtures())
def test_contract_safe_pipeline(engine, fixture_path, snapshot_path):

    output = engine.extract(fixture_path)

    # Normalise file path to string for deterministic snapshot comparison
    if isinstance(output.get("file"), pathlib.Path):
        output["file"] = str(output["file"])

    if not snapshot_path.exists():
        # First run: create snapshot
        save_snapshot(snapshot_path, output)
        pytest.fail(f"Snapshot created for {fixture_path}, please review and re-run.")

    expected = load_snapshot(snapshot_path)

    assert output == expected, (
        f"Contract violation for {fixture_path}.\n"
        f"Snapshot: {snapshot_path}\n"
        f"Output differs from expected."
    )
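
The `contract` marker used above is a custom marker. pytest warns about unregistered markers and rejects them under `--strict-markers`, so it is worth registering. A minimal sketch of a `conftest.py` hook, with the description text chosen here for illustration:

```python
# conftest.py (sketch)


def pytest_configure(config):
    """Register the custom `contract` marker so pytest recognises it."""
    config.addinivalue_line(
        "markers",
        "contract: golden-snapshot contract tests for the IOCX pipeline",
    )
```

With the marker registered, the contract suite can be selected on its own with `pytest -m contract`.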

How This Harness Enforces The Contract

1. Automatic fixture discovery

Every .exe and .bin file under tests/contract/fixtures/** is automatically tested.

2. Snapshot enforcement

If a snapshot doesn’t exist:

It is created
The test fails
You review and commit intentionally

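3. Exact comparison

The engine output must equal the parsed snapshot exactly. The harness compares parsed JSON values rather than raw bytes, so key order inside an object is not significant (snapshots are written with sorted keys), but everything else is:

field names
list order
normalisation
casing
heuristics
IOCs
metadata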
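
When a comparison fails, a line‑level diff of the two structures makes review easier. A minimal sketch using the standard `difflib` module; `snapshot_diff` is a hypothetical helper and is not part of the harness above.

```python
import difflib
import json
from typing import Any, List


def render(obj: Any) -> List[str]:
    """Pretty-print with sorted keys so the diff reflects content, not key order."""
    return json.dumps(obj, indent=2, sort_keys=True, ensure_ascii=False).splitlines()


def snapshot_diff(expected: Any, actual: Any) -> str:
    """Return a unified diff between snapshot and output, or "" if they render equally."""
    return "\n".join(
        difflib.unified_diff(
            render(expected),
            render(actual),
            fromfile="snapshot",
            tofile="output",
            lineterm="",
        )
    )
```

The resulting string could be appended to the assertion message so that a contract violation shows exactly which fields changed.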

4. Zero tolerance for accidental changes

Any deviation is a contract violation.

What This Gives You

A deterministic, reproducible, future‑proof test suite.
A clear separation between:
- normal behaviour
- edge cases
- adversarial inputs
- regressions
A structure that scales indefinitely.
A harness that enforces the core promise:

Same file, same output, every time.
