Skip to content

IOMeshSWC: replace float sample/parent identifiers with an integer type (precision + map-key collisions) #6391

@hjmjohnson

Description

@hjmjohnson

itk::SWCMeshIO stores SWC sample/parent/type identifiers as float (SampleIdentifierType, TypeIdentifierType, ParentIdentifierType in Modules/IO/IOMeshSWC/include/itkSWCMeshIO.h). SWC identifiers are integers by specification; representing them as float causes precision loss and is used directly as std::unordered_map keys for topology lookup. This is a design/API change deferred from the parser-hardening work, tracked here separately because it alters the exposed point-data component type and the Python wrapping.

Problem detail
  • SampleIdentifierType = float and ParentIdentifierType = float are used as keys in m_SampleIdentifierToPointIndex (std::unordered_map<float, IdentifierType>). Integer sample ids above 2^24 are not exactly representable in float, so two distinct ids just above 2^24 collapse to the same key, mismapping parent->point connectivity in ReadCells.
  • Root detection uses an exact float comparison parentIdentifier != -1; while -1 is exactly representable, the broader pattern of exact float equality on identifiers is fragile.
  • Machine-generated tracings routinely use large monotonic ids, so this is reachable with real data, not just adversarial input.

The header already carries commented-out integer alternatives:

// using SampleIdentifierType = int16_t;
using SampleIdentifierType = float;   // "For Python wrapping"

suggesting the float choice was a wrapping workaround, not a deliberate data-model decision.

Why this is not a drop-in fix

Changing the identifier types to an integer type is an API and wrapping change, not a localized bug fix:

  • ReadPointData exposes these containers as mesh point data; m_PointPixelComponentType is set to FLOAT. Switching the type changes the component type seen by consumers reading SampleIdentifier/ParentIdentifier point data.
  • Modules/IO/IOMeshSWC/wrapping/itkSWCMeshIO.wrap and the Python interface would need corresponding updates.
  • Round-trip and baseline expectations may shift.

It therefore needs a deliberate data-model decision (which integer width) plus wrapping review, rather than riding along with the defensive parser-hardening changes.

Proposed direction
  • Choose an integer identifier type wide enough for real SWC ids (e.g. IdentifierType / uint32_t / int64_t) for SampleIdentifierType / ParentIdentifierType (and likely TypeIdentifierType, which is a small enum-like code).
  • Keep RadiusType = double.
  • Update the unordered_map key types, ReadPointData component type, and the .wrap accordingly.
  • Add a regression test reading an SWC file with ids above 2^24 and asserting connectivity is preserved.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions