Skip to content

feat: add compute_polarization() to collective behavior metrics#875

Draft
khan-u wants to merge 18 commits into
neuroinformatics-unit:mainfrom
khan-u:add-collective-behavior-metrics
Draft

feat: add compute_polarization() to collective behavior metrics#875
khan-u wants to merge 18 commits into
neuroinformatics-unit:mainfrom
khan-u:add-collective-behavior-metrics

Conversation

@khan-u
Copy link
Copy Markdown

@khan-u khan-u commented Mar 7, 2026

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?

Movement lacks metrics for analyzing collective behavior in multi-individual tracking data. Polarization quantifies group alignment, a foundational measure in collective behavior research used to classify group states such as schooling in fish.

What does this PR do?

Adds compute_polarization() to the kinematics module. Polarization measures how aligned individuals' direction vectors are, supporting two modes:

  • Orientation polarization (body-axis mode): Alignment of body orientations (which way they're facing), computed from keypoint pairs like (tail_base, neck)
  • Heading polarization (displacement mode): Alignment of movement directions (which way they're traveling), computed from position displacement

Output range:

  • 1.0 = perfect alignment (all direction vectors point the same way)
  • 0.0 = no alignment (random directions or exact cancellation)
$$\Phi = \frac{1}{N} \left\| \sum_{i=1}^{N} \hat{u}_i \right\|$$

where $\hat{u}_i$ is the unit direction vector for individual $i$, and $N$ is the number of valid individuals at each time point.

Redesigned API – Newer Comment

Usage

from movement.kinematics import compute_polarization

# Orientation polarization: body orientation alignment from keypoint pair
polarization = compute_polarization(
    ds.position,
    body_axis_keypoints=("tail_base", "neck"),
)

# Heading polarization: movement direction alignment from displacement
polarization = compute_polarization(ds.position.sel(keypoints="thorax"))

# Heading polarization: first keypoint (index 0) used if none selected
polarization = compute_polarization(ds.position)

# Orientation polarization with mean body orientation angle (radians)
polarization, mean_angle = compute_polarization(
    ds.position,
    body_axis_keypoints=("tail_base", "neck"),
    return_angle=True,
)

# Heading polarization with mean movement direction angle (degrees)
polarization, mean_angle = compute_polarization(
    ds.position.sel(keypoints="thorax"),
    return_angle=True,
    in_degrees=True,
)

Parameters

Parameter Type Default Description
data xarray.DataArray Position data with time, space, and individuals dims. Requires x, y space coordinates. Must include keypoints dim when using body_axis_keypoints; for heading polarization, pre-select a keypoint via .sel(keypoints="...") else the first (index 0) will be used.
body_axis_keypoints tuple[Hashable, Hashable] None (origin, target) keypoint pair defining body axis for orientation polarization. When None, computes heading polarization from displacement.
displacement_frames int 1 Frame interval for displacement-based heading polarization. Ignored when body_axis_keypoints is set.
return_angle bool False Return mean angle as second output. Returns mean body orientation angle (orientation polarization) or mean movement direction angle (heading polarization).
in_degrees bool False If True, mean angle is returned in degrees; otherwise radians. Only relevant when return_angle=True.

Returns

Condition Return Value
return_angle=False DataArray named "polarization", dims ("time",), values clipped to [0, 1]
return_angle=True Tuple (polarization, mean_angle) where mean_angle is a DataArray named "mean_angle", dims ("time",)

Edge Case Handling

Condition Behavior
Missing data (NaN) Excluded per individual, per frame
Zero-length direction vector Excluded (stationary individuals in heading mode, or coincident keypoints in orientation mode)
All direction vectors invalid Returns NaN for that frame
Exact vector cancellation Polarization = 0, angle = NaN
3D position data Z-coordinate ignored; computed in x/y plane
Heading polarization, first N frames NaN (no prior frame for displacement)

References

How has this PR been tested?

49 test methods across 5 test classes with comprehensive coverage on collective.py:

Test Class # Tests Coverage
TestComputePolarizationValidation 13 DataArray type, required dims, space coord labels, empty keypoints, keypoint validation, displacement_frames type/range
TestComputePolarizationBehavior 18 Aligned→1, opposite→0, cardinal-cancellation→0, partial alignment, single individual, NaN exclusion, stationary exclusion, zero-length body-axis exclusion, permutation invariance, translation/scaling/rotation invariance (displacement mode), body-axis mode invariance
TestHeadingSourceSelection 5 Body-axis vs displacement mode, first-keypoint fallback, explicit .sel() selection, z-coord ignored, first-frame validity with angle
TestDisplacementFrames 4 First-N-frames NaN, reference-frame NaN propagation, multi-frame smoothing, oversized displacement window
TestReturnAngle 9 Output naming/dims, cardinal directions, diagonal motion, cancellation→NaN, angle rotation under global rotation, wraparound near ±π, in_degrees conversion

Is this a breaking change?

No.

Does this PR require an update to the documentation?

No — API docs auto-generate from docstrings.

Checklist

  • Code tested locally
  • Tests added for new functionality
  • Documentation updated (docstrings)
  • Formatted with pre-commit

Polarization Visualization Demos

Visualization script that produces a multi-row figure showing polarization dynamics over a selected time segment, integrating the AP inference introduced later in PR #945 .

  • Uses compute_polarization across multiple displacement timescales:

    • Body-axis orientation (static pose-based alignment)
    • Single-frame heading (instantaneous motion direction)
    • Sustained heading (~1-second displacement; temporally smoothed motion)
  • Drives a segment scoring pipeline that:

    • Automatically identifies time windows where collective alignment emerges, changes, or dissolves
  • Constructs a composite score by integrating:

    • Polarization range and net directional change (orientation and heading independently)
    • Angular sweep agreement (concordance between orientation and heading trajectories)
    • Collective motion magnitude (minimum per-individual displacement, weighted by heading polarization)
  • Prioritizes segments that:

    • Exhibit wide polarization variation rather than static states
    • Show coherent multi-individual dynamics
    • Reward segments where all animals are visibly moving
  • Resulting visualizations demonstrate that polarization:

    • Captures transitions between disordered and aligned group states
    • Shows concordance between:
      • Where individuals point (orientation)
      • Where they move (heading)

Polarization dynamics visualization

A 3-row × N-panel figure (N = NUM_PANELS, default 5) showing the temporal evolution of collective alignment over a selected segment. Each column corresponds to one frame, spaced 1 second apart (= fps frames).

Row 1: Heading Across Frames (velocity_node → velocity_node)

Each panel overlays two frames using adaptive blending

  • Current frame
  • Reference frame:
    • 1 second earlier (continuous mode)
    • 1 frame earlier (sparse mode)
  • prev_frames[p] is valid for every panel (including the first)
  • Ghost overlay is always applied (no missing references)

Overlay content:

  • Per-animal displacement arrows:
    • Anchored at velocity_node
    • Colored by individual
    • Represent motion over the panel interval
Row 2: Orientation Within Frame (from_node → to_node)

Each panel shows the same frame as Row 1 (without ghost overlay), with body-axis annotations.

  • Directed line: from_node → to_node
  • Markers:
    • ■ = from_node
    • ● = to_node

Labeling logic:

  • If ap_validated = True:

    • ■ = Posterior
    • ● = Anterior
  • If ap_validated = False:

    • ■ = (from_node)
    • ● = (to_node)
Row 3: Polar Plots
  • Radius: r ∈ [0, 1], Angle: θ ∈ [0°, 360°]
  • Convention: 0° = East, Counter-clockwise positive

Vectors (origin-centered):

  • Orientation polarization p_b (yellow):

    • Magnitude: resultant length of per-animal body-axis unit vectors
    • Angle:
      • Circular mean of body-axis orientations
      • Transformed from image → Cartesian (Y-axis flipped)
  • Heading polarization p_h (orange):

    • Magnitude: resultant length of per-animal displacement unit vectors
    • Angle:
      • Circular mean of displacement directions
      • Transformed from image → Cartesian (Y-axis flipped)

Segment Selection

Sparse vs Continuous
  • When FPS is known (from user override or video metadata), the script selects the best contiguous segment of NUM_PANELS frames spaced 1 second apart (frame_interval = fps).

  • If no full segment is found, the script searches for partial continuous segments down to MIN_CONTINUOUS_PANELS (default 3).

  • If no continuous segment of any valid length is found, it falls back to sparse mode.

  • In sparse mode, individual frames are selected greedily to maximize variance in polarization, using single-frame displacement for heading.

  • Frames where the combined polarization is high enough to produce a visible resultant vector on the polar plot (≥ MIN_VISUAL_POL, default 0.15) are prioritized, then lower-polarization frames are added if needed.

Coordinate Conventions
  • All angles use image coordinates with a Y-flip: dy_cartesian = -(to_y − from_y) for orientation, and dy_cartesian = -(curr_y − prev_y) for heading displacement, converting from image coordinates (y increases downward) to Cartesian (y increases upward).
  • The library's compute_polarization returns angles in the input coordinate system (image); the script negates these angles (-theta) when rendering on polar plots.
  • Polar plots follow math convention (0° = East, 90° = North, counter-clockwise positive).

Candidate segments are scored by a composite metric combining:

  • Polarization dynamics (orientation and heading summed independently): Rewards segments where polarization sweeps across a wide range with net directional change, plus a small bonus for higher mean.
    • Formula per metric: range × (0.3 + 0.7 × |end−start|/range) + 0.15 × mean
    • Final: orientation_score + heading_score
  • Collective sweep bonus (weight 8.0): Rewards segments where orientation and heading angles both undergo large angular displacement, scaled by directional agreement.
    • Formula: 8.0 × mean(|Δθ_orient|/π, |Δθ_heading|/π) × agreement
    • Agreement: same sign = 1.0, either near-zero = 0.6, opposite sign = 0.3
  • Collective motion bonus (weight 10.0): Rewards segments where heading polarization is high and the per-frame minimum displacement across individuals is large — ensuring every animal is visibly moving.
    • Formula: 10.0 × mean(heading_pol_window) × mean(min_disp_window) / speed_scale
    • Speed scale: max(75th percentile of all per-frame min-displacements, 1.0)

Together, these scoring components select for segments that best demonstrate the temporal evolution of collective alignment: visible movement, changing group coordination, and concordance between where animals point and where they go.


Visualizations of Other Datasets

2 Mice

2Mice polarization

4 Gerbils

4Gerbils polarization

5 Mice

5Mice polarization

2 Bees

2Bees polarization

@khan-u khan-u mentioned this pull request Mar 7, 2026
13 tasks
@khan-u khan-u changed the title feat: Add compute_polarization for collective behavior analysis feat: add compute_polarization() (#873) Mar 9, 2026
@khan-u khan-u changed the title feat: add compute_polarization() (#873) feat: add compute_polarization() to collective behavior metrics Mar 9, 2026
@khan-u khan-u force-pushed the add-collective-behavior-metrics branch 2 times, most recently from 64435ff to 52d8477 Compare March 18, 2026 01:51
@khan-u khan-u marked this pull request as ready for review March 18, 2026 12:38
@khan-u khan-u marked this pull request as draft April 7, 2026 05:45
@khan-u
Copy link
Copy Markdown
Author

khan-u commented Apr 7, 2026

Hey @niksirbi - I’ve held off on tagging you until my ideas felt more developed and cohesive around a clear high-level goal: extending movement with novel utility, especially for social neuroscience researchers, beyond what existing packages already provide. I’m sharing this after substantial thought and self-driven effort. I’m still relatively new to software engineering, but I’ve aimed to bring the same rigor I’ve learned from my mentors in this field while developing clean, maintainable code. At this stage, your feedback and guidance would be especially valuable to me as I continue the work. I’ve been enjoying it a lot.

Redesigned compute_polarization API

Further consideration for real-world experimental / research use cases (e.g. events/epochs, group subset, etc.)

Motivation for why this PR

Quantifying collective behavior requires knowing which way each animal faces. For orientation polarization, this means identifying the body axis - which keypoint is posterior, which anterior. For heading polarization, it means selecting a stable reference point to track displacement. In practice, both tasks require manual specification: researchers must know their skeleton's naming conventions, verify that "node_0" truly lies behind "node_5," and trust that these choices hold across individuals with varying postures or tracking quality.

This PR introduces an inference pipeline that resolves body-axis orientation from first principles. The approach rests on a simple observation: animals generally move head-first. By extracting the body's principal geometric axis and determining which end leads during locomotion, the system identifies anterior-posterior orientation without requiring species-specific anatomy or keypoint naming conventions. The same analysis that reveals orientation also ranks keypoints by their stability along the body axis - distinguishing nodes that track whole-body translation from those that reflect appendage motion. Both polarization types thus emerge from a single underlying inference.

The pipeline is being built to integrate directly with polarization computation. When users omit the body-axis pair, the system infers it. When users provide one, the system validates it against the inferred orientation and rejects incorrect orderings. When heading polarization needs a velocity node, the system supplies the most stable candidate. This makes multi-species studies practical: the same analysis code runs on flies, mice, and gerbils without per-skeleton configuration.

Threshold parameters governing the filter cascade are optimized via grid search, scored by their ability to correctly order ground-truth node pairs - keypoints with manually curated anatomical rankings - across 41 individuals spanning 10 datasets and 4 species. Each individual's skeleton is projected onto a shared reference axis derived from the dataset's most reliably moving individual. This tests whether thresholds generalize beyond the individual on which they were tuned, under the geometric variability that arises when different body shapes project onto a common axis.

Optimizing on ground-truth alone, however, risks overfitting: a configuration might exploit statistical regularities specific to the curated subset rather than capturing genuine anatomical structure. The critical validation therefore examines nodes that survive the lateral filter but lie outside the ground-truth set. These survivors were invisible to the optimization objective - their ordering contributed nothing to configuration selection. Correct placement of these held-out nodes, manually verified against species-specific anatomical expectation, is out-of-sample evidence that the cascade learned the body's true anterior-posterior structure. Systematic misordering would indicate the configuration memorized ground-truth-specific patterns without generalizing to the full skeleton.

Revised body_axis.py API complementing these changes

Revised API – Newer Comment

API Signature

compute_polarization(
    ds,
    polarization_type,
    frame_interval,
    individuals,
    velocity_node,
    body_axis_keypoints,
    displacement_frames,
    return_angle,
    in_degrees,
)

Parameters to be removed from current collective.py

validate_ap          → Integrated: AP validation now runs automatically
                       when body_axis_keypoints is provided (orientation/both)
                       or when velocity_node is auto-detected (heading/both).
                       No longer a user-facing toggle.

ap_validation_config → Integrated into body_axis public API.
                       infer_body_axis() and validate_ap_pair() accept
                       config internally via ValidateAPConfig defaults.

Function Workflow

compute_polarization(ds, ...)
│
│
├── Step 0: Resolve polarization_type
│   │   Resolved first because frame_interval rejection rules
│   │   and Step 3 branching both depend on it
│   │
│   ├── polarization_type: Literal["heading", "orientation", "both"]
│   │   Default: "both"
│   │
│   ├── "heading"    → Step 3 runs heading leg only
│   ├── "orientation"→ Step 3 runs orientation leg only
│   └── "both"       → Step 3 runs orientation leg then heading leg sequentially
│
│
├── Step 1: Resolve frame_interval
│   │   Scopes all downstream per-frame operations
│   │
│   ├── frame_interval: tuple[int, int, int] | None
│   │   Format: (start, end, step)
│   │
│   ├── Default: None
│   │   └── (first_valid, last_valid, 1)
│   │       └── "valid" = ≥2 individuals have non-NaN positions
│   │
│   └── Override:
│       │
│       ├── step only
│       │   ├── start/end remain auto-detected
│       │   └── Reject: step < 1
│       │
│       ├── start + end
│       │   └── Reject:
│       │       ├── start < 0
│       │       ├── end > total_frames
│       │       ├── start >= end (if resolved polarization_type in {"heading", "both"})
│       │       └── start > end (if resolved polarization_type == "orientation")
│       │
│       ├── start only → Reject: must provide end
│       └── end only → Reject: must provide start
│
│
├── Step 2: Resolve individuals
│   │   Determines which individuals contribute to the polarization metric
│   │
│   ├── individuals: list[str] | None
│   │
│   ├── Default: None → all individuals in dataset
│   │   └── Reject: dataset has < 2 individuals
│   │
│   └── Override: subset of individual names
│       └── Reject:
│           ├── len < 2
│           ├── any name not in dataset
│           └── any individual has NaN fraction > 0.6
│               └── NaN fraction computed over resolved frame_interval
│
│
├── Step 3: Resolve direction vectors
│   │   Branches on resolved polarization_type
│   │   Produces raw direction vector per individual per frame
│   │
│   │   NOTE: "both" runs the orientation leg and heading leg
│   │   sequentially. Each leg is described once below.
│   │
│   ├── Orientation leg (runs when polarization_type is "orientation" or "both")
│   │   │
│   │   ├── Resolve body_axis_keypoints → (posterior, anterior)
│   │   │   │
│   │   │   ├── body_axis_keypoints: tuple[str, str] | None
│   │   │   │   Format: (posterior_node, anterior_node)
│   │   │   │
│   │   │   ├── Provided
│   │   │   │   │
│   │   │   │   ├── Reject:
│   │   │   │   │   ├── either name not in dataset keypoints
│   │   │   │   │   └── posterior == anterior
│   │   │   │   │
│   │   │   │   ├── Validate ordering
│   │   │   │   │   └── validate_ap_pair(ds, posterior, anterior)
│   │   │   │   │       └── Reject: ordering doesn't match max R×M individual's AP ordering
│   │   │   │   │
│   │   │   │   └── Result: user pair, validated
│   │   │   │
│   │   │   └── None (default)
│   │   │       └── infer_body_axis(ds)
│   │   │           ├── Reject: dataset has < 2 keypoints
│   │   │           ├── Selects best individual (max R×M)
│   │   │           ├── Runs AP pipeline → filter cascade → top-scoring pair
│   │   │           └── Returns: InferenceResult
│   │   │               ├── posterior_keypoint, anterior_keypoint
│   │   │               ├── confidence_score (R×M)
│   │   │               └── pc1_vector, anterior_sign
│   │   │
│   │   └── Compute orientation vectors
│   │       ├── For each individual in resolved individuals:
│   │       │   └── For each frame in resolved frame_interval:
│   │       │       ├── Extract positions at (posterior, anterior)
│   │       │       └── Vector: posterior → anterior
│   │       └── Output: raw orientation vectors per individual per frame
│   │
│   └── Heading leg (runs when polarization_type is "heading" or "both")
│       │
│       ├── Resolve velocity_node → keypoint for displacement
│       │   │
│       │   ├── velocity_node: str | None
│       │   │
│       │   ├── Provided
│       │   │   ├── Reject: name not in dataset keypoints
│       │   │   ├── Skips AP pipeline entirely
│       │   │   └── Result: user-specified node
│       │   │
│       │   └── None (default)
│       │       └── infer_body_axis(ds)
│       │           ├── Runs same AP pipeline as orientation auto-detect
│       │           └── Extracts: sorted_candidate_nodes[0] from InferenceResult
│       │               └── Most stable AP-axis node (survived Step 1 lateral filter)
│       │
│       ├── Resolve displacement_frames
│       │   │
│       │   ├── displacement_frames: int
│       │   │   Default: 1
│       │   │
│       │   ├── Reject: not a positive integer
│       │   └── Reject: < 1
│       │
│       └── Compute heading vectors
│           ├── For each individual in resolved individuals:
│           │   └── For each frame in resolved frame_interval:
│           │       └── Displacement: position(t) − position(t − displacement_frames)
│           └── Output: raw heading vectors per individual per frame
│
│
├── Step 4: Compute polarization metric
│   │   Shared math, applied to each direction-vector set produced by Step 3
│   │   (once for "heading" or "orientation"; twice for "both")
│   │
│   ├── Normalize each direction vector to unit length
│   │   └── Zero-length or NaN vectors → excluded (not counted in N)
│   │
│   ├── Per frame:
│   │   ├── N = count of individuals with valid unit vectors
│   │   ├── Vector sum = Σ unit vectors across individuals
│   │   ├── Polarization = ‖vector sum‖ / N
│   │   └── Clip to [0, 1]; NaN if N = 0
│   │
│   └── Mean angle (conditional on return_angle)
│       │
│       ├── return_angle: bool
│       │   Default: False → skip angle computation entirely
│       │
│       ├── True:
│       │   ├── Angle of summed vector relative to +x axis
│       │   ├── NaN if N = 0 or summed vector has zero magnitude
│       │   │
│       │   └── in_degrees: bool
│       │       ├── Default: False → radians
│       │       └── True → convert to degrees
│       │
│       └── False:
│           └── No angle in output
│
│
└── Returns: xr.Dataset
    │
    ├── When resolved polarization_type == "orientation":
    │   ├── orientation_polarization: DataArray (time,)
    │   ├── mean_orientation_angle: DataArray (time,)          ← only if return_angle
    │   └── attrs:
    │       ├── posterior_node, anterior_node
    │       └── inference_method: "auto" | "user_provided"
    │
    ├── When resolved polarization_type == "heading":
    │   ├── heading_polarization: DataArray (time,)
    │   ├── mean_heading_angle: DataArray (time,)              ← only if return_angle
    │   └── attrs:
    │       ├── velocity_node
    │       ├── displacement_frames
    │       └── inference_method: "auto" | "user_provided"
    │
    └── When resolved polarization_type == "both":
        └── Union of orientation and heading outputs above

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 8, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant