Skip to content

Add best_direction() helper for canonical (kind, field) directions#212

Merged
iskandr merged 3 commits into
masterfrom
best-direction-helper
May 8, 2026
Merged

Add best_direction() helper for canonical (kind, field) directions#212
iskandr merged 3 commits into
masterfrom
best-direction-helper

Conversation

@iskandr
Copy link
Copy Markdown
Contributor

@iskandr iskandr commented May 8, 2026

Closes #211.

Summary

Surfaces the canonical "best direction" of a prediction field — whether higher or lower is better — as a queryable API alongside Kind and kind_support(). Half of this was already implicit in PeptideResult._best_by_score (max for score) and ._best_by_rank (min for percentile_rank); the new module-level best_direction(kind, field) makes the convention explicit and extends it to value, whose direction is kind-dependent (IC50 nM for pMHC_affinity → min; half-life for pMHC_stability → max).

This PR also promotes the previously-private _best_by_* helpers to a public, direction-aware API on PeptideResult.

New API in mhctools.pred (re-exported from mhctools)

FIELD_BEST_DIRECTIONS = {"score": "max", "percentile_rank": "min"}
VALUE_BEST_DIRECTIONS = {
    Kind.pMHC_affinity: "min",   # IC50 nM
    Kind.pMHC_stability: "max",  # half-life
}

def best_direction(kind, field) -> str: ...

class PeptideResult:
    def best_by(self, kind, field) -> Optional[Prediction]: ...
    def best_by_score(self, kind) -> Optional[Prediction]: ...
    def best_by_rank(self, kind) -> Optional[Prediction]: ...
    def best_by_value(self, kind) -> Optional[Prediction]: ...

best_direction() raises ValueError for field='value' on a kind without a registered direction, with a pointer to the table — the safer "fail loud" path for new value-bearing kinds.

The existing convenience accessors (affinity, presentation, best_affinity_by_rank, …) now route through the public methods. Internal _best_by_score / _best_by_rank are removed in favor of the unified best_by.

Why

Downstream aggregators (openvax/topiary#160's BestAlleleField; vaxrank tier signals) need this to do "best across alleles" or "best across methods" without replicating the table. Same architectural pattern as #210 — per-kind metadata next to Kind.

Behavior change to note

best_*_by_rank accessors now fall back to allele-less rank predictions when no allele-bearing rank prediction of that kind exists, matching the existing fallback in best_* (score) accessors. The old _best_by_rank returned None in that case. In practice rank predictions always come with an allele, so this should be invisible — but downstream consumers relying on "no allele ⇒ None for rank" should be aware. Pinned by test_best_by_rank_falls_back_to_allele_less.

Test plan

  • pytest tests/test_pred.py — 17 new tests covering: field-default constants, kind-specific value directions, best_direction() happy path for score/rank/value, value on unregistered kind raises, unknown field raises, accepts string-or-Kind input, best_by_* happy paths for all three fields, best_by_value raises for unregistered kinds, None-skipping, allele-less fallback for both score and rank, missing-kind returns None, string-kind input on best_by.
  • Full unit suite: 373 passed, 7 skipped (Python 3.10/3.11/3.12 + integration-netmhc all green in CI).

Bumps to 3.14.1.

iskandr added 3 commits May 8, 2026 12:07
Closes #211. Surfaces the canonical "best direction" of a prediction
field (max or min) as a queryable API alongside Kind / kind_support.
Half of this was already implicit in BindingPredictionCollection's
_best_by_score (max) and _best_by_rank (min); the new module-level
``best_direction(kind, field)`` makes the convention explicit and
extends it to ``value``, whose direction is kind-dependent (IC50 nM
for pMHC_affinity → min; half-life for pMHC_stability → max).

- ``FIELD_BEST_DIRECTIONS``: ``{"score": "max", "percentile_rank":
  "min"}`` — uniform across kinds.
- ``VALUE_BEST_DIRECTIONS``: ``{Kind.pMHC_affinity: "min",
  Kind.pMHC_stability: "max"}`` — extend per kind.
- ``best_direction(kind, field)``: lookup helper that raises
  ValueError for ``field='value'`` on an unregistered kind, with a
  pointer to the table.

Downstream consumers (topiary's BestAlleleField; vaxrank tier
signals) can drop their local direction tables and import from
mhctools instead.

7 new tests; full suite green. Bump to 3.14.0 (additive minor).
Adds a single direction-aware best_by(kind, field) that uses
best_direction() for max/min, plus thin best_by_score / best_by_rank /
best_by_value wrappers. The existing kind accessors (affinity,
presentation, best_affinity_by_rank, ...) now route through the public
methods.

Behavior changes:
- best_by_value(kind) raises ValueError for kinds without a registered
  value direction (consistent with best_direction()), since `value`
  semantics are kind-dependent.
- The no-allele fallback that used to live in _best_by_score is now
  uniform across fields: prefer predictions with an allele, fall back to
  allele-less if none have an allele, skip None values.

8 new tests; full suite green. Bump to 3.14.1.
Two completeness tests:
- best_by accepts string-form kinds (mirrors the existing best_direction
  string-kind test).
- best_*_by_rank now returns allele-less rank predictions when no
  allele-bearing rank pred exists. This is a behavior change vs the old
  _best_by_rank (which had no fallback) introduced when best_by was
  unified across fields. Pinning so it can't silently regress.
@iskandr iskandr merged commit b2285d0 into master May 8, 2026
4 checks passed
@iskandr iskandr deleted the best-direction-helper branch May 8, 2026 23:08
iskandr added a commit to openvax/topiary that referenced this pull request May 11, 2026
mhctools 3.14.1 ships ``best_direction(kind, field)`` (openvax/mhctools#212,
closes #211). Drop topiary's local ``_BEST_FIELD_DIRECTIONS`` /
``_BEST_VALUE_DIRECTIONS`` / ``_best_direction`` and import the
canonical helper instead — the direction table is mhctools-canonical
metadata and shouldn't live in two places.

- ``topiary/ranking/nodes.py``: replace the local table + helper with
  ``from mhctools import best_direction as _best_direction``.
- ``requirements.txt``: bump mhctools floor 3.13.7 → 3.14.1.
- ``tests/test_best_allele.py``: match the upstream error message
  ("best_direction undefined" instead of "best_value direction is
  undefined").
- Bump topiary 5.12.0 → 5.13.0 (additive minor — the mhctools floor
  bump tightens deps, but no public API change).
iskandr added a commit to openvax/topiary that referenced this pull request May 11, 2026
)

* Add best-allele aggregation accessors for haplotype-mode predictors

Issue #158 calls out that pMHC_presentation from haplotype-mode
predictors (MHCflurry's class1_presentation in haplotype mode) is
conceptually a multi-allele input → per-peptide answer, but mhctools
emits one row per (peptide, allele) so the DSL's per-allele grouping
returns per-allele scores by default. Add an explicit aggregator that
returns the best score across alleles plus the corresponding allele
attribution, gated behind new accessors so the default per-allele
behavior is preserved.

- BestAlleleField DSLNode: per-peptide max/min of (kind, field) across
  alleles, broadcast back to ctx.group_index. Direction by field:
  score=max, value=min, percentile_rank=min.
- KindAccessor properties: best_value, best_score, best_rank, plus
  *_allele variants returning the allele name string.
- DSL string parser: presentation['mhcflurry'].best_score and
  best_score_allele (and rank/value/percentile spellings) now parse to
  BestAlleleField.
- Refactor: factor _filter_kind_method_version helper out of Field.eval
  for reuse by BestAlleleField.
- Bump topiary to 5.12.0 (additive minor).

Use case (per #158): vaxrank's "presented anywhere" tier signal —
``Presentation['mhcflurry'].best_score`` returns the multi-allele
aggregate, ``best_score_allele`` returns the attribution.

* Address review: drop merge, per-kind directions, kind_support warnings

Four fixes against PR #160 review:

1. Replace the fragile ``__best__``/merge broadcast with
   ``ctx.group_index.droplevel('allele')`` + ``per_peptide.reindex()``.
   No intermediate `__best__` column, no positional-merge ordering
   assumption, fewer allocations. ``sub.copy()`` is gone too — work
   on the filter's already-fresh slice via index masks + ``.assign()``.

2. Replace the field-only ``_BEST_DIRECTIONS`` constant with a
   per-(kind, field) lookup ``_best_direction(kind, field)``. ``score``
   defaults max and ``percentile_rank`` defaults min for every kind,
   but ``value`` requires a per-kind entry — pMHC_affinity uses IC50
   nM (min), pMHC_stability would use half-life (max). Asking for
   ``best_value`` on a kind without a registered direction raises
   loudly with a pointer to the table.

3. Plumb ``kind_support`` through ``EvalContext``,
   ``apply_filter``, ``apply_sort``, and ``evaluate_scores``. When
   ``best_*`` is invoked against a (model, kind) reporting
   ``mhc_dependence='single_allele'``, emit a UserWarning that the
   result is the best per-allele score, not a joint multi-allele
   aggregate. No warning for ``'haplotype'`` (correct usage) or when
   ``kind_support`` is absent.

4. Update BestAlleleField docstring with the verified semantics:
   in haplotype mode mhctools emits a single row per peptide with
   ``allele=`` MHCflurry's deconvolved ``best_allele``, so the
   aggregator is a no-op there; in per-allele mode it returns the
   best per-allele score (not joint).

11 new tests covering: per-kind direction table (best_value undefined
on presentation raises; affinity uses min; score works for any kind),
kind_support warning on single_allele / no warning on haplotype /
no-op without metadata / method substring matching, broadcast
ignores input row order, ties pick first deterministically, and
apply_filter integration (vaxrank's "presented anywhere" use case)
including warning forwarding.

Full suite: 1318 passed, 3 skipped.

* Use mhctools.best_direction; bump floor to 3.14.1

mhctools 3.14.1 ships ``best_direction(kind, field)`` (openvax/mhctools#212,
closes #211). Drop topiary's local ``_BEST_FIELD_DIRECTIONS`` /
``_BEST_VALUE_DIRECTIONS`` / ``_best_direction`` and import the
canonical helper instead — the direction table is mhctools-canonical
metadata and shouldn't live in two places.

- ``topiary/ranking/nodes.py``: replace the local table + helper with
  ``from mhctools import best_direction as _best_direction``.
- ``requirements.txt``: bump mhctools floor 3.13.7 → 3.14.1.
- ``tests/test_best_allele.py``: match the upstream error message
  ("best_direction undefined" instead of "best_value direction is
  undefined").
- Bump topiary 5.12.0 → 5.13.0 (additive minor — the mhctools floor
  bump tightens deps, but no public API change).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Surface canonical "best direction" for (kind, field) — needed by downstream aggregators

1 participant