Structural validators audit and hardening (PE engine) #27

@malx-labs

Summary

Our current PE structural validators are solid but incomplete. The recent refactor gave us a clean architecture and green tests, but a detailed review shows:

  • Some important structural checks are missing.
  • A few inconsistencies and conceptual bugs exist (notably around RVA vs file offsets).
  • Some validators are too minimal given what the PE format allows us to validate.

This issue tracks a comprehensive hardening pass over all structural validators and their interaction with the heuristics layer.

Scope:

  • validate_entrypoint
  • validate_sections
  • validate_optional_header
  • validate_rva_graph
  • validate_tls
  • validate_signature
  • validate_entropy
  • Their consumption via run_structural_validators and _analyse_structural

Goals

  • No meaningful structural anomaly goes undetected (within the data we already parse).
  • No conceptual inconsistencies (e.g., mixing RVA and file offsets).
  • No contradictions between validators.
  • Clear, consistent, and well‑named ReasonCodes for all structural issues.
  • Tests that cover both happy paths and malformed edge cases.

1. Entrypoint validator (validate_entrypoint)

Current behaviour

  • Extracts entry_point from extended["header"].
  • Maps EP RVA to a section via _map_rva_to_section.
  • Emits:
    • ENTRYPOINT_OUT_OF_BOUNDS if no section maps.
    • ENTRYPOINT_SECTION_NOT_EXECUTABLE if section is not executable.
    • ENTRYPOINT_IN_TRUNCATED_REGION for:
      • zero‑length section
      • EP beyond virtual size
    • ENTRYPOINT_IN_OVERLAY if entry_point >= overlay_offset.

Problems

  • Conceptual bug: entry_point is an RVA, overlay_offset is a file offset. Comparing them directly is invalid.
  • Redundancy: zero‑length section + “beyond virtual size” can both fire for the same case.
  • Omissions:
    • EP < SizeOfHeaders (EP inside headers).
    • Obviously bogus EP (e.g., 0). EP is an RVA, so it must not be compared against ImageBase directly.
    • EP inside non‑code sections (e.g., .rsrc, .reloc).
    • EP inside discardable section.
    • EP inside relocations.
    • EP inside resources.
    • EP not mapping to any section but still within SizeOfImage (already partially covered, but worth making explicit).

Tasks

  • Fix RVA vs file offset comparison:
    • Add a helper to map EP RVA → file offset using section table.
    • Re‑implement ENTRYPOINT_IN_OVERLAY using EP file offset vs overlay_offset.
  • Add header/region checks:
    • Use SizeOfHeaders (from optional header) to flag:
      • ENTRYPOINT_IN_HEADERS when ep < size_of_headers.
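    • Optionally flag ENTRYPOINT_ZERO_OR_NEGATIVE when EP is 0 or negative. EP == 0 is legitimate for DLLs without an entry point, so restrict the zero case to non-DLL images.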
  • Add section‑type checks:
    • If EP maps to .rsrc, .reloc, or other non‑code sections, emit a dedicated reason:
      • e.g., ENTRYPOINT_IN_NON_CODE_SECTION.
    • If EP maps to a discardable section, emit:
      • ENTRYPOINT_IN_DISCARDABLE_SECTION.
  • Deduplicate truncated checks:
    • Ensure we don’t emit multiple ENTRYPOINT_IN_TRUNCATED_REGION reasons for the same underlying condition unless we explicitly want that.
  • Tests:
    • EP in header.
    • EP in .rsrc.
    • EP in .reloc.
    • EP in discardable section.
    • EP in overlay (using correct RVA→file mapping).
    • EP zero / obviously bogus.
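
A minimal sketch of the RVA → file offset fix, assuming a simplified section record. The `Section` dataclass and both function names are hypothetical and only illustrate the comparison; the real helper would sit next to _map_rva_to_section and use whatever shape the parser already produces.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Section:
    # Hypothetical minimal section record; the real parser's shape may differ.
    name: str
    virtual_address: int
    virtual_size: int
    raw_address: int
    raw_size: int


def rva_to_file_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """Translate an RVA to a file offset, or None if no file bytes back it."""
    for sec in sections:
        delta = rva - sec.virtual_address
        # Only the first raw_size bytes of a section exist on disk.
        if 0 <= delta < sec.raw_size:
            return sec.raw_address + delta
    return None


def entrypoint_in_overlay(
    ep_rva: int, sections: Sequence[Section], overlay_offset: Optional[int]
) -> bool:
    """Compare like with like: EP file offset against the overlay file offset."""
    if overlay_offset is None:
        return False
    ep_offset = rva_to_file_offset(ep_rva, sections)
    return ep_offset is not None and ep_offset >= overlay_offset
```

For an EP at RVA 0x1000 in a section stored at file offset 0x400, with overlay_offset 0x800, the direct comparison 0x1000 >= 0x800 reports an overlay hit, while the mapped offset 0x400 does not.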

2. Section validator (validate_sections)

Current behaviour

  • Per‑section:
    • SECTION_RWX for executable + writable.
    • SECTION_NON_EXECUTABLE_CODE_LIKE for IMAGE_SCN_CNT_CODE but not executable.
    • SECTION_EXEC_IN_SUSPICIOUS_NAME for code‑like names but not executable.
    • SECTION_NAME_NON_ASCII for non‑ASCII names.
    • SECTION_NAME_EMPTY_OR_PADDING for padding/empty names.
    • SECTION_IMPOSSIBLE_FLAGS for discardable + executable + writable.
    • SECTION_RAW_MISALIGNED for raw misalignment vs FileAlignment.
  • Global:
    • SECTION_OVERLAP for virtual overlaps.

Problems

  • Omissions:
    • No raw overlap detection.
    • No virtual_size < raw_size check.
    • No raw_address < SizeOfHeaders check (sections overlapping headers).
    • No unsorted sections (by raw or virtual address).
    • No zero‑length sections flagged here (only indirectly via entrypoint).
    • No discardable + executable (without writable) flagged.
    • No checks for contradictory flags (e.g., code but no read, write but no read).
  • Minor naming oddity:
    • SECTION_EXEC_IN_SUSPICIOUS_NAME is emitted when the section has a code‑like name but is not executable; the name suggests the opposite.

Tasks

  • Add raw overlap detection:
    • Compute raw ranges (raw_address, raw_size) and emit:
      • SECTION_RAW_OVERLAP when they intersect.
  • Add size consistency checks:
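    • If virtual_size and raw_size are ints and virtual_size < raw_size by more than FileAlignment padding can explain (raw_size is rounded up to FileAlignment, so a small excess is normal), emit: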
      • SECTION_VIRTUAL_SMALLER_THAN_RAW.
  • Add header overlap check:
    • Use SizeOfHeaders to flag:
      • SECTION_OVERLAPS_HEADERS when raw_address < size_of_headers.
  • Add ordering checks:
    • Ensure sections are sorted by raw_address and/or virtual_address.
    • Emit e.g. SECTION_OUT_OF_ORDER_RAW / SECTION_OUT_OF_ORDER_VIRTUAL when violated.
  • Add zero‑length section detection:
    • If both virtual_size == 0 and raw_size == 0, emit:
      • SECTION_ZERO_LENGTH.
  • Add discardable code checks:
    • If discardable and executable (even without writable), emit:
      • SECTION_DISCARDABLE_CODE.
  • Add contradictory flag checks:
    • Examples:
      • IMAGE_SCN_CNT_CODE but no MEM_READ.
      • MEM_WRITE but no MEM_READ.
    • Emit appropriate ReasonCodes (e.g., SECTION_FLAGS_INCONSISTENT with details).
  • Consider renaming / clarifying SECTION_EXEC_IN_SUSPICIOUS_NAME:
    • Either rename to reflect “code‑like name but not executable” or adjust logic to match the name.
  • Tests:
    • Raw overlap.
    • Virtual_size < raw_size.
    • Section overlapping headers.
    • Out‑of‑order sections.
    • Zero‑length section.
    • Discardable + executable.
    • Contradictory flags.
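
A possible shape for the raw overlap check, as a sketch only. The `(name, raw_address, raw_size)` tuple and the function name are assumptions made for illustration; the pairs it returns would each become one SECTION_RAW_OVERLAP issue.

```python
from typing import List, Sequence, Tuple

# Hypothetical input shape: (name, raw_address, raw_size)
RawRange = Tuple[str, int, int]


def find_raw_overlaps(ranges: Sequence[RawRange]) -> List[Tuple[str, str]]:
    """Return pairs of section names whose on-disk ranges intersect."""
    # Zero-sized ranges occupy no file bytes, so they cannot overlap anything.
    live = sorted((r for r in ranges if r[2] > 0), key=lambda r: r[1])
    overlaps: List[Tuple[str, str]] = []
    for i, (name_a, start_a, size_a) in enumerate(live):
        end_a = start_a + size_a
        for name_b, start_b, _ in live[i + 1:]:
            if start_b >= end_a:
                # Sorted by start, so no later range can intersect this one.
                break
            overlaps.append((name_a, name_b))
    return overlaps
```

Sorting a copy keeps the check independent of section table order, so it does not interfere with the separate out‑of‑order checks.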

3. Optional header validator (validate_optional_header)

Current behaviour

  • Extracts size_of_image.
  • Computes max section end (virtual_address + virtual_size).
  • Emits:
    • OPTIONAL_HEADER_INCONSISTENT_SIZE if max_end > size_of_image.

Problems

  • Very minimal given the richness of the optional header.
  • Omissions:
    • SizeOfHeaders consistency.
    • SectionAlignment validity.
    • FileAlignment validity.
    • SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData consistency.
    • AddressOfEntryPoint consistency (partially handled in entrypoint validator).
    • ImageBase alignment.
    • DllCharacteristics sanity.
    • Subsystem validity.
    • NumberOfRvaAndSizes consistency.
    • SizeOfImage alignment to SectionAlignment.

Tasks

  • Add SizeOfHeaders checks:
    • Ensure SizeOfHeaders:
      • ≥ end of headers + section table.
      • aligned to FileAlignment.
    • Emit e.g. OPTIONAL_HEADER_INVALID_SIZE_OF_HEADERS.
  • Add SectionAlignment checks:
    • Ensure:
      • SectionAlignment >= FileAlignment.
      • SectionAlignment is a power of 2.
    • Emit e.g. OPTIONAL_HEADER_INVALID_SECTION_ALIGNMENT.
  • Add FileAlignment checks:
    • Ensure:
      • FileAlignment is a power of 2.
      • FileAlignment within reasonable bounds (e.g., ≥ 512, ≤ 64K).
    • Emit e.g. OPTIONAL_HEADER_INVALID_FILE_ALIGNMENT.
  • Add size field consistency checks:
    • Compare SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData against section totals.
    • Emit e.g. OPTIONAL_HEADER_SIZE_FIELDS_INCONSISTENT.
  • Add ImageBase alignment check:
    • Ensure ImageBase is 64K aligned.
    • Emit e.g. OPTIONAL_HEADER_IMAGE_BASE_MISALIGNED.
  • Add NumberOfRvaAndSizes checks:
    • Ensure it is:
      • ≥ number of directories actually present.
      • ≤ 16 (standard max).
    • Emit e.g. OPTIONAL_HEADER_INVALID_NUMBER_OF_RVA_AND_SIZES.
  • Add SizeOfImage alignment check:
    • Ensure SizeOfImage % SectionAlignment == 0.
    • Emit e.g. OPTIONAL_HEADER_SIZE_OF_IMAGE_MISALIGNED.
  • (Optional) Add Subsystem and DllCharacteristics sanity checks:
    • Validate known subsystem values.
    • Flag obviously bogus combinations (e.g., DYNAMIC_BASE without relocations).
  • Tests:
    • Misaligned SizeOfImage.
    • Invalid FileAlignment / SectionAlignment.
    • Invalid SizeOfHeaders.
    • Inconsistent size fields.
    • Misaligned ImageBase.
    • Invalid NumberOfRvaAndSizes.
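
The alignment tasks above could be sketched roughly as follows. The function names and the flat list of reason strings are assumptions for illustration; the bounds (512 to 64K, 64K ImageBase alignment) are the ones named in the task list.

```python
from typing import List


def is_power_of_two(value: int) -> bool:
    """True for 1, 2, 4, ...; False for zero and negative values."""
    return value > 0 and (value & (value - 1)) == 0


def check_alignments(
    file_alignment: int, section_alignment: int, size_of_image: int, image_base: int
) -> List[str]:
    reasons: List[str] = []
    if not (is_power_of_two(file_alignment) and 512 <= file_alignment <= 0x10000):
        reasons.append("OPTIONAL_HEADER_INVALID_FILE_ALIGNMENT")
    if not (is_power_of_two(section_alignment) and section_alignment >= file_alignment):
        reasons.append("OPTIONAL_HEADER_INVALID_SECTION_ALIGNMENT")
    # Only take the modulo against a usable alignment (avoids division by zero).
    if is_power_of_two(section_alignment) and size_of_image % section_alignment != 0:
        reasons.append("OPTIONAL_HEADER_SIZE_OF_IMAGE_MISALIGNED")
    if image_base % 0x10000 != 0:
        reasons.append("OPTIONAL_HEADER_IMAGE_BASE_MISALIGNED")
    return reasons
```

Guarding the SizeOfImage check on a valid SectionAlignment means a bogus alignment yields one reason rather than a crash or a cascade of follow‑on reasons.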

4. RVA graph validator (validate_rva_graph)

Current behaviour

  • Uses data_directories and size_of_image.
  • Emits:
    • DATA_DIRECTORY_ZERO_RVA_NONZERO_SIZE for rva == 0 and size > 0.
    • DATA_DIRECTORY_OUT_OF_RANGE for rva + size > size_of_image.
    • DATA_DIRECTORY_OVERLAP for overlapping directory ranges.

Problems

  • Omissions:
    • Negative RVAs.
    • Negative sizes.
    • Zero‑size directories where size should be > 0 (e.g., import, resource).
    • Directories pointing into headers.
    • Directories pointing into overlays.
    • Directories pointing into no section (even if within SizeOfImage).
    • Directories spanning multiple sections.
    • Directory ordering.
  • Inconsistency with other validators:
    • EntryPoint and TLS/signature validators don’t leverage this for some of their checks (e.g., overlays, bounds).

Tasks

  • Add basic sanity checks:
    • If rva < 0 or size < 0, emit:
      • DATA_DIRECTORY_INVALID_RANGE.
  • Add zero‑size directory checks (where applicable):
    • For known directories that must have size > 0 when present, emit:
      • DATA_DIRECTORY_ZERO_SIZE_UNEXPECTED.
  • Add header overlap checks:
    • Use SizeOfHeaders to flag:
      • DATA_DIRECTORY_IN_HEADERS when rva < size_of_headers.
  • Add overlay checks:
    • If we can map directory RVA → file offset, flag:
      • DATA_DIRECTORY_IN_OVERLAY when directory lies in overlay.
  • Add section mapping checks:
    • Use section table to ensure:
      • directory range maps to at least one section.
      • directory does not span across multiple sections (unless explicitly allowed).
    • Emit e.g.:
      • DATA_DIRECTORY_NOT_MAPPED_TO_SECTION.
      • DATA_DIRECTORY_SPANS_MULTIPLE_SECTIONS.
  • Add ordering checks (optional):
    • Ensure directories are in ascending order by index or RVA if that’s a requirement we care about.
  • Tests:
    • Negative RVAs / sizes.
    • Directory in headers.
    • Directory in overlay.
    • Directory not mapped to any section.
    • Directory spanning multiple sections.
    • Zero‑size directory where it shouldn’t be.
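
One way the section mapping checks might look, as a sketch. The `(name, virtual_address, virtual_size)` tuple and both function names are hypothetical; the classification simply counts how many sections a directory range intersects.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical input shape: (name, virtual_address, virtual_size)
SectionRange = Tuple[str, int, int]


def sections_touched(rva: int, size: int, sections: Sequence[SectionRange]) -> List[str]:
    """Names of sections whose virtual range intersects the directory range."""
    # Treat a zero-size directory as one byte so its start is still located.
    end = rva + max(size, 1)
    return [name for name, va, vsize in sections if rva < va + vsize and va < end]


def classify_directory(
    rva: int, size: int, sections: Sequence[SectionRange]
) -> Optional[str]:
    if rva < 0 or size < 0:
        return "DATA_DIRECTORY_INVALID_RANGE"
    if rva == 0 and size == 0:
        return None  # directory not present
    touched = sections_touched(rva, size, sections)
    if not touched:
        return "DATA_DIRECTORY_NOT_MAPPED_TO_SECTION"
    if len(touched) > 1:
        return "DATA_DIRECTORY_SPANS_MULTIPLE_SECTIONS"
    return None
```

This version does not flag a directory that starts inside one section and runs past its end into unmapped space; a full containment test would be needed if that case should get its own reason.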

5. TLS validator (validate_tls)

Current behaviour

  • Iterates extended entries with value == "tls_directory".
  • Extracts start_address, end_address, callbacks.
  • Emits:
    • TLS_CALLBACK_OUTSIDE_RANGE if callbacks not in [start, end).

Problems

  • Omissions:
    • Multiple TLS directories.
    • start > end.
    • Zero‑length TLS directory (start == end).
    • callbacks == 0 with non‑zero TLS directory.
    • Callback array termination (0‑terminated list).
    • Callback RVA mapping to valid section.
    • Callback RVA being executable.
    • Callback RVA not in overlay / header.
    • TLS directory bounds vs SizeOfImage (currently left to parser / RVA graph).
  • No integration with section / RVA graph checks for callback mapping.

Tasks

  • Handle multiple TLS directories:
    • If more than one tls_directory entry exists, emit:
      • TLS_MULTIPLE_DIRECTORIES.
  • Add range sanity checks:
    • If start > end, emit:
      • TLS_INVALID_RANGE.
  • Add zero‑length directory check:
    • If start == end but TLS directory is present, emit:
      • TLS_ZERO_LENGTH_DIRECTORY.
  • Add callback presence checks:
    • If callbacks == 0 but TLS directory is non‑empty, emit:
      • TLS_CALLBACKS_MISSING.
  • Add callback mapping checks:
    • Map callback RVA(s) to sections:
      • If no section maps, emit TLS_CALLBACK_NOT_MAPPED_TO_SECTION.
      • If mapped section is not executable, emit TLS_CALLBACK_IN_NON_EXECUTABLE_SECTION.
      • If mapped into overlay or header, emit appropriate reasons.
  • (Optional) Add callback array termination checks:
    • If we have access to the raw callback array, ensure it is 0‑terminated.
  • Tests:
    • Multiple TLS directories.
    • Invalid start/end.
    • Zero‑length TLS directory.
    • Missing callbacks.
    • Callback outside any section.
    • Callback in non‑executable section.
    • Callback in overlay / header.
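
A rough sketch of the range and callback mapping checks. It keeps the two range reasons mutually exclusive by reserving TLS_INVALID_RANGE for start > end, and it assumes callback addresses have already been normalised to RVAs (the on‑disk TLS directory stores virtual addresses, so ImageBase would have to be subtracted first). The tuple shape and function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical input shape: (virtual_address, virtual_size, is_executable)
ExecRange = Tuple[int, int, bool]


def classify_tls_range(start: int, end: int) -> Optional[str]:
    """Return at most one reason, so the two range checks never double-fire."""
    if start > end:
        return "TLS_INVALID_RANGE"
    if start == end:
        return "TLS_ZERO_LENGTH_DIRECTORY"
    return None


def classify_callback(rva: int, sections: Sequence[ExecRange]) -> Optional[str]:
    """Locate one callback RVA and report whether it lands in executable code."""
    for va, vsize, executable in sections:
        if va <= rva < va + vsize:
            return None if executable else "TLS_CALLBACK_IN_NON_EXECUTABLE_SECTION"
    return "TLS_CALLBACK_NOT_MAPPED_TO_SECTION"
```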

6. Signature validator (validate_signature)

Current behaviour

  • Reads has_signature and signatures from metadata.
  • Emits:
    • SIGNATURE_FLAG_SET_BUT_NO_METADATA when has_signature is true but signatures is empty.

Problems

  • Omissions:
    • Signature present but flag not set.
    • Multiple certificates.
    • Certificate size mismatches.
    • Certificate offset out of file bounds.
    • Invalid certificate type / revision.
    • Certificate directory overlap / out‑of‑range (partially RVA graph’s job).
  • Very minimal given the complexity of WIN_CERTIFICATE.

Tasks

  • Add flag/metadata symmetry check:
    • If signatures non‑empty but has_signature is false, emit:
      • SIGNATURE_PRESENT_BUT_FLAG_NOT_SET.
  • Add multiplicity checks:
    • If more than one certificate is present, emit:
      • SIGNATURE_MULTIPLE_CERTIFICATES (or similar).
  • Add basic certificate sanity checks (using what metadata we have):
    • Validate dwLength, wRevision, wCertificateType if available.
    • Emit:
      • SIGNATURE_INVALID_LENGTH.
      • SIGNATURE_INVALID_TYPE.
      • SIGNATURE_INVALID_REVISION.
  • Add bounds checks:
    • If we have certificate offsets/sizes, ensure they lie within file bounds and don’t overlap critical structures.
    • Emit:
      • SIGNATURE_OUT_OF_FILE_BOUNDS.
      • SIGNATURE_OVERLAPS_OTHER_DATA.
  • Coordinate with validate_rva_graph:
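    • Ensure the security directory (certificate) is included in data_directories and benefits from out‑of‑range/overlap checks. Its address field is a file offset rather than an RVA, so it must be checked against the file size, not size_of_image.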
  • Tests:
    • Flag set, no metadata (already covered).
    • Metadata present, flag not set.
    • Multiple certificates.
    • Invalid length/type/revision (as far as metadata allows).
    • Certificate out of file bounds.
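
The basic certificate sanity checks could look something like this. The sketch assumes the raw file bytes are available, which may not be true of the current metadata; if only parsed dwLength, wRevision and wCertificateType values are exposed, the same comparisons apply to those. The function name is hypothetical. The revision and type constants are the documented WIN_CERTIFICATE values.

```python
import struct
from typing import List

# WIN_CERTIFICATE header: dwLength (u32), wRevision (u16), wCertificateType (u16)
WIN_CERT_HEADER = struct.Struct("<IHH")
KNOWN_REVISIONS = {0x0100, 0x0200}            # WIN_CERT_REVISION_1_0 / _2_0
KNOWN_TYPES = {0x0001, 0x0002, 0x0003, 0x0004}  # X509, PKCS_SIGNED_DATA, RESERVED_1, TS_STACK_SIGNED


def check_win_certificate(data: bytes, offset: int) -> List[str]:
    """Sanity-check one certificate entry located at a file offset."""
    if offset < 0 or offset + WIN_CERT_HEADER.size > len(data):
        return ["SIGNATURE_OUT_OF_FILE_BOUNDS"]
    length, revision, cert_type = WIN_CERT_HEADER.unpack_from(data, offset)
    reasons: List[str] = []
    if length < WIN_CERT_HEADER.size:
        # dwLength includes the 8-byte header, so anything smaller is invalid.
        reasons.append("SIGNATURE_INVALID_LENGTH")
    elif offset + length > len(data):
        reasons.append("SIGNATURE_OUT_OF_FILE_BOUNDS")
    if revision not in KNOWN_REVISIONS:
        reasons.append("SIGNATURE_INVALID_REVISION")
    if cert_type not in KNOWN_TYPES:
        reasons.append("SIGNATURE_INVALID_TYPE")
    return reasons
```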

7. Entropy validator (validate_entropy)

Current behaviour

  • Per‑section:
    • ENTROPY_HIGH_SECTION for high entropy sections above size threshold.
  • Overlay:
    • ENTROPY_HIGH_OVERLAY for high entropy overlay above size threshold.
  • Global:
    • ENTROPY_UNIFORM_ACROSS_SECTIONS for uniformly high entropy across sections.

Problems

  • Omissions:
    • Extremely low entropy (zero‑filled / padding abuse).
    • Entropy of specific regions (resources, relocations, imports, TLS, certificate).
    • Entropy spikes inside sections (sub‑section granularity).
  • By design, the validator is very conservative, which is good, but we can still add a few safe checks.

Tasks

  • Add low entropy detection:
    • For sufficiently large sections with entropy below a very low threshold, emit:
      • ENTROPY_VERY_LOW_SECTION.
  • Add region‑specific entropy checks (optional but valuable):
    • If we have per‑region entropy (e.g., .rsrc, .reloc, import table, TLS, certificate), add:
      • ENTROPY_HIGH_RESOURCES.
      • ENTROPY_HIGH_RELOCATIONS.
      • ENTROPY_HIGH_IMPORTS.
      • ENTROPY_HIGH_TLS.
      • ENTROPY_HIGH_CERTIFICATE.
    • Only if the data is already available—no need to extend parsing for now.
  • Keep structural vs heuristic separation:
    • Ensure entropy issues remain structural and continue to be skipped by _analyse_structural via _SKIP_ENTROPY where appropriate.
  • Tests:
    • Very low entropy section.
    • High entropy in specific regions (if supported by metadata).
    • Uniform entropy behaviour remains unchanged.
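
A minimal sketch of the low entropy check, assuming Shannon entropy over byte values in bits per byte. The two thresholds and the function names are placeholders chosen for illustration, not values taken from the existing validator.

```python
import math
from collections import Counter
from typing import Optional

MIN_SECTION_SIZE = 4096   # assumed size floor; tune against real samples
VERY_LOW_ENTROPY = 0.5    # assumed threshold in bits per byte


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, from 0.0 to 8.0 bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def low_entropy_reason(data: bytes) -> Optional[str]:
    """Flag only sections large enough for the measurement to be meaningful."""
    if len(data) < MIN_SECTION_SIZE:
        return None
    if shannon_entropy(data) < VERY_LOW_ENTROPY:
        return "ENTROPY_VERY_LOW_SECTION"
    return None
```

The size floor matters because small or mostly padded sections are common in benign files, and flagging them would undercut the validator's conservative design.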

8. Heuristics layer & dispatcher alignment

Current behaviour

  • run_structural_validators returns a dict of lists of StructuralIssue.
  • _analyse_structural turns them into pe_structure_anomaly detections, skipping entropy issues via _SKIP_ENTROPY.

Tasks

  • Ensure new ReasonCodes are wired correctly:
    • Add new structural reasons to ReasonCodes.
    • Ensure _SKIP_ENTROPY remains correct (only entropy‑related reasons).
  • Add tests for dispatcher stability:
    • Ensure run_structural_validators always returns all keys, even if empty.
    • Add a smoke test that runs all validators on a synthetic PE and asserts:
      • no crashes
      • structural output shape is stable.
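
The "always returns all keys" property might be pinned down along these lines. This is a stand‑in dispatcher, not the real run_structural_validators; the key names are assumed to match the validator names used in this issue, and plain strings stand in for StructuralIssue.

```python
from typing import Callable, Dict, List

# Assumed key set, taken from the validator names listed in this issue.
VALIDATOR_KEYS = (
    "entrypoint", "sections", "optional_header",
    "rva_graph", "tls", "signature", "entropy",
)

Validator = Callable[[dict], List[str]]


def run_validators_sketch(
    validators: Dict[str, Validator], metadata: dict
) -> Dict[str, List[str]]:
    """Run each known validator; every key is present even when nothing fires."""
    results: Dict[str, List[str]] = {key: [] for key in VALIDATOR_KEYS}
    for key in VALIDATOR_KEYS:
        validator = validators.get(key)
        if validator is not None:
            results[key] = list(validator(metadata))
    return results


def test_output_shape_is_stable() -> None:
    out = run_validators_sketch({"tls": lambda meta: ["TLS_INVALID_RANGE"]}, {})
    assert set(out) == set(VALIDATOR_KEYS)
    assert out["tls"] == ["TLS_INVALID_RANGE"]
    assert all(isinstance(issues, list) for issues in out.values())
```

Pre‑populating the result dict before running anything is what makes the shape independent of which validators are registered or emit issues.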

Acceptance criteria

  • All validators (entrypoint, sections, optional_header, rva_graph, tls, signature, entropy) have:
    • Comprehensive, well‑documented checks for their domain.
    • No conceptual bugs (e.g., RVA vs file offset confusion).
    • No obvious omissions for anomalies we can detect with existing metadata.
  • The heuristics layer:
    • Correctly surfaces structural issues as pe_structure_anomaly where appropriate.
    • Continues to skip entropy issues intentionally.
  • The test suite:
    • Covers all new checks with focused, readable tests.
    • Keeps synthetic “full coverage” tests passing.
