feat(embed): add MOMENT-1 embedding models (small/base/large) (#795)

hombit · claude · web-flow · commit 3000aef51d14 · 2026-06-26T19:41:24.000Z
* feat(embed): add MOMENT-1 embedding models (small/base/large)

  Add `light_curve.embed.Moment1`, an ONNX-backed univariate (magnitude-only) wrapper for the MOMENT-1 time-series foundation model (Goswami et al. 2024, MIT), exposing `mean` / `sequence` outputs in small/base/large sizes (512/768/1024-dim). Like the Chronos models it discards timestamps and treats observations as sequentially ordered, but uses a fixed 512-observation context (64 patches of 8) rather than a dynamic sequence axis. `size` is a required `from_hf` argument.

  Because the context length is fixed, all reduction windows are batched into a single ONNX call; the `sequence` output is restricted to single-window reductions.

  Also bump the prep-models test submodule to the commit that adds the MOMENT export + reference test data, and add tests, API/docs pages, and a CHANGELOG entry.

  The Python pipeline was validated locally against the prep-models reference parquet (cosine similarity 1.000000 for all three sizes).

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01KEKNjm8oSYzw276okfggzc

* test(embed): bump prep-models to merged MOMENT commit, full reference rows

  The MOMENT prep-models PR merged to main and the reference test-data parquets were regenerated with the full 10 samples. Bump the prep-models submodule 10eda06→a35f0d1 (was the unmerged feat/moment1 commit) and run the reference test over all 10 rows, matching the Chronos tests.

  Verified locally: all MOMENT tests pass against the now-published HuggingFace models (light-curve/moment1-{small,base,large}); cosine similarity 1.000000.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01KEKNjm8oSYzw276okfggzc

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
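
The fixed-context behaviour described in the message can be illustrated with a minimal NumPy-only sketch. It follows the same idea as the `_context` helper in this diff (keep the most recent 512 values, left-pad with NaN), but `build_context`, `SEQ_LEN`, and `PATCH_SIZE` are illustrative names chosen here, not part of the library's public API, and no ONNX model is involved.

```python
import numpy as np

SEQ_LEN = 512    # fixed MOMENT context length stated in the commit message
PATCH_SIZE = 8   # 512 / 8 = 64 non-overlapping patches


def build_context(mag) -> np.ndarray:
    """Keep the most recent SEQ_LEN values and left-pad with NaN.

    Illustrative helper (assumed name); returns an array of shape (1, SEQ_LEN).
    """
    mag = np.asarray(mag, dtype=np.float32)[-SEQ_LEN:]
    context = np.full((1, SEQ_LEN), np.nan, dtype=np.float32)
    # Real observations are right-aligned; the NaN padding sits on the left.
    context[0, SEQ_LEN - mag.shape[0]:] = mag
    return context


short = build_context(np.linspace(18.0, 19.0, 150))     # 150 observations
long = build_context(np.arange(600, dtype=np.float32))  # 600 observations
n_patches = SEQ_LEN // PATCH_SIZE

print(short.shape, int(np.isnan(short).sum()), n_patches)
```

For the 150-point series the first 362 slots are NaN and the observations occupy the last 150; for the 600-point series only the final 512 values survive, which corresponds to the default `"end"` reduction.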
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -9,7 +9,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
---
+- `light_curve.embed.Moment1`: ONNX-backed univariate (magnitude-only) MOMENT-1 time-series
+  foundation model (Goswami et al. 2024, MIT license), exposing `mean` / `sequence` outputs.
+  Available in small/base/large sizes (512/768/1024-dim) via `Moment1.from_hf(size=...)`; uses a
+  fixed 512-observation context (64 patches of 8).
 
 ### Changed
 
diff --git a/docs/embed/api.md b/docs/embed/api.md
@@ -42,6 +42,11 @@
       inherited_members: true
       members:
         - from_hf
+::: light_curve.embed.Moment1
+    options:
+      inherited_members: true
+      members:
+        - from_hf
 
 ## Reduction strategies
 
diff --git a/docs/embed/index.md b/docs/embed/index.md
@@ -31,6 +31,7 @@ If you already have the ONNX model file locally, `huggingface_hub` is not requir
 | `ATCAT` | 6 (ugrizY jointly) | time, flux, flux\_err, band index | 384 | ELAsTiCC |
 | `Chronos2` | single (magnitude only) | mag | 768 | time-series corpus |
 | `ChronosBolt` | single (magnitude only) | mag | 256–768 | time-series corpus |
+| `Moment1` | single (magnitude only) | mag | 512–1024 | Time-series Pile |
 
 ## Single-band: Astromer2
 
@@ -210,6 +211,29 @@ Series longer than the native context (8192 for Chronos 2, 2048 for
 Chronos-Bolt) are reduced first; the default `reduction="end"` keeps the most
 recent observations.
 
+## Single-band: MOMENT-1
+
+[MOMENT](https://huggingface.co/AutonLab/MOMENT-1-base) is a T5-based time-series
+foundation model.  Like the Chronos models, it embeds a **magnitude sequence only**;
+timestamps are discarded.  It comes in three sizes (`small`/`base`/`large` →
+512/768/1024-dim) and uses a **fixed** 512-observation context (64 patches of 8):
+
+```python
+import numpy as np
+from light_curve.embed import Moment1
+
+rng = np.random.default_rng(7)
+mag = rng.normal(18.0, 0.3, 150).astype(np.float32)  # chronological order
+
+model = Moment1.from_hf(size="base", output="mean")
+embedding = model(mag)
+print(embedding.shape)  # (1, 1, 1, 768)
+```
+
+Light curves longer than 512 observations are reduced first; the default
+`reduction="end"` keeps the most recent 512.  The `"sequence"` output always has
+64 patches and supports only single-window reductions.
+
 ## GPU and alternative runtimes
 
 Pass `ort_session_kwargs` to select an execution provider:
diff --git a/light-curve/light_curve/embed/__init__.py b/light-curve/light_curve/embed/__init__.py
@@ -4,6 +4,7 @@
 from .atcat import ATCAT
 from .chronos import Chronos2, ChronosBolt
 from .model import EmbeddingSession, SingleBandModel
+from .moment import Moment1
 from .reduction import (
     Beginning,
     End,
@@ -27,6 +28,7 @@
     "EmbeddingSession",
     "End",
     "Middle",
+    "Moment1",
     "MultipleReductions",
     "NonOverlappingWindows",
     "RandomSubsample",
diff --git a/light-curve/light_curve/embed/moment.py b/light-curve/light_curve/embed/moment.py
@@ -0,0 +1,260 @@
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import TYPE_CHECKING, Literal
+
+import numpy as np
+from numpy.typing import ArrayLike
+
+from light_curve.embed.input_tensors import InputTensors
+from light_curve.embed.model import (
+    SingleBandModel,
+    _hf_hub_download_cached,
+    create_onnx_session,
+)
+from light_curve.embed.reduction import Reduction
+
+if TYPE_CHECKING:
+    from typing import Self
+
+    import onnxruntime as ort
+
+# MOMENT has a fixed 512-step context split into 64 non-overlapping patches of 8.
+_SEQ_LEN = 512
+_PATCH_SIZE = 8
+
+
+@dataclass
+class MomentInputs(InputTensors):
+    """Input tensors for MOMENT-1 models.
+
+    Attributes
+    ----------
+    mag : ndarray, shape ``(n_subsamples, seq_size)``
+        Per-subsample magnitudes, zero-padded to the reduction's ``seq_size``.
+        The actual model context (left NaN-padded to the fixed 512-step window)
+        is built per subsample at inference time from the valid entries.
+    bool_mask : ndarray, shape ``(n_subsamples, seq_size)``
+        Boolean validity — ``True`` for real observations, ``False`` for padding.
+    """
+
+    mag: np.ndarray = field(kw_only=True)
+
+
+class Moment1(SingleBandModel):
+    """MOMENT-1 univariate light-curve embedding model.
+
+    A T5-based time-series foundation model (Goswami et al. 2024) pretrained with
+    a masked-reconstruction objective on the Time-series Pile.  It embeds a single
+    univariate magnitude series: timestamps are discarded and observations are
+    treated as sequentially ordered (the same convention used for the Chronos
+    models).  With the default ``"end"`` reduction the series is capped to the most
+    recent 512 observations; shorter series are left-padded with NaN to that window.
+    Reversible instance normalisation (RevIN) is applied internally by the model.
+
+    The model comes in three sizes with different embedding dimensions: ``small``
+    (512), ``base`` (768), and ``large`` (1024).  Unlike Chronos, the context
+    length is fixed at 512 observations (64 patches of 8), not a dynamic axis.
+
+    The ONNX models are hosted on HuggingFace at
+    ``https://huggingface.co/light-curve/moment1-<size>``.
+
+    Use :meth:`from_hf` (with ``size=``) to download and load the model.
+
+    Model license
+    -------------
+    MIT (upstream AutonLab/MOMENT-1 license).
+
+    References
+    ----------
+    Goswami et al. (2024), *MOMENT: A Family of Open Time-series Foundation
+    Models*, ICML 2024.  https://huggingface.co/AutonLab/MOMENT-1-base
+
+    Parameters
+    ----------
+    session :
+        ONNX inference session for the MOMENT-1 model file.
+    size : {"small", "base", "large"}
+        Which model size this session corresponds to (sets ``embed_dim``).
+    output : str, optional
+        ``"mean"`` (default) or ``"sequence"``.
+    reduction : str, list of str, or Reduction, optional
+        Observation-selection strategy for light curves longer than 512.
+        Defaults to ``"end"``.
+    reduction_kwargs : dict, optional
+        Extra keyword arguments forwarded to :func:`reduction_from_str`.
+    """
+
+    patch_size: int = _PATCH_SIZE
+    seq_len: int = _SEQ_LEN
+    max_obs: int = _SEQ_LEN
+    model_outputs: frozenset[str] = frozenset({"mean", "sequence"})
+    _EMBED_DIMS: dict[str, int] = {"small": 512, "base": 768, "large": 1024}
+
+    def __init__(
+        self,
+        session: ort.InferenceSession,
+        *,
+        size: Literal["small", "base", "large"],
+        output: Literal["mean", "sequence"] = "mean",
+        reduction: str | list[str] | Reduction = "end",
+        reduction_kwargs: dict[str, object] | None = None,
+    ) -> None:
+        if size not in self._EMBED_DIMS:
+            raise ValueError(f"Unknown size '{size}'. Must be one of: {', '.join(sorted(self._EMBED_DIMS))}")
+        self.size = size
+        self.embed_dim = self._EMBED_DIMS[size]
+        self.hf_repo = f"light-curve/moment1-{size}"
+        self.hf_filename = f"moment1-{size}.onnx"
+        super().__init__(
+            session,
+            bands=None,
+            reduction=reduction,
+            reduction_kwargs=reduction_kwargs,
+        )
+        if output not in self.model_outputs:
+            raise ValueError(f"Unknown output '{output}'. Must be one of: {', '.join(sorted(self.model_outputs))}")
+        self.output = output
+
+    @classmethod
+    def from_hf(
+        cls,
+        size: Literal["small", "base", "large"],
+        output: Literal["mean", "sequence"] = "mean",
+        *,
+        reduction: str | list[str] | Reduction = "end",
+        reduction_kwargs: dict[str, object] | None = None,
+        ort_session_kwargs: dict[str, object] | None = None,
+    ) -> Self:
+        """Load a MOMENT-1 model of the given ``size`` from the HuggingFace Hub.
+
+        Downloads (and caches) the ONNX model file, creates an
+        ``onnxruntime.InferenceSession``, and returns a ready-to-use instance.
+
+        Parameters
+        ----------
+        size : {"small", "base", "large"}
+            Model size to load.  Required: the sizes have different embedding
+            dimensions, so there is no meaningful default.
+        output : str, optional
+            Named ONNX output to return: ``"mean"`` (default, masked mean pool
+            over valid patches → ``(..., 1, embed_dim)``) or ``"sequence"``
+            (per-patch encoder states → ``(..., 64, embed_dim)``).
+        reduction : str, list of str, or Reduction, optional
+            Observation-selection strategy for light curves longer than 512.
+            Defaults to ``"end"`` (the most recent 512 observations, matching the
+            model's native right-aligned context).
+        reduction_kwargs : dict or None, optional
+            Extra keyword arguments forwarded to :func:`reduction_from_str`.
+        ort_session_kwargs : dict or None, optional
+            Keyword arguments forwarded to ``onnxruntime.InferenceSession``.
+
+        Returns
+        -------
+        Moment1
+            Instance with a live ONNX inference session.
+
+        Raises
+        ------
+        ValueError
+            If ``size`` or ``output`` is not recognised.
+        ImportError
+            If ``huggingface_hub`` or an ``onnxruntime`` variant is missing.
+        """
+        if size not in cls._EMBED_DIMS:
+            raise ValueError(f"Unknown size '{size}'. Must be one of: {', '.join(sorted(cls._EMBED_DIMS))}")
+        model_path = _hf_hub_download_cached(f"light-curve/moment1-{size}", f"moment1-{size}.onnx")
+        session = create_onnx_session(model_path, **(ort_session_kwargs or {}))
+        return cls(
+            session=session,
+            size=size,
+            output=output,
+            reduction=reduction,
+            reduction_kwargs=reduction_kwargs,
+        )
+
+    def __call__(self, mag: ArrayLike) -> np.ndarray:
+        """Embed a magnitude series.
+
+        Parameters
+        ----------
+        mag : array-like, shape ``(n,)``
+            Magnitudes in chronological order.  Timestamps are not used by the
+            model, which treats observations as sequentially ordered.
+
+        Returns
+        -------
+        np.ndarray, shape ``(1, n_subsamples, seq_size, embed_dim)``
+            Embedding tensor.  ``seq_size`` is 1 for ``"mean"`` and 64 (the
+            number of patches) for ``"sequence"``.
+        """
+        return super().__call__(mag)
+
+    def preprocess_lc(self, mag: ArrayLike) -> MomentInputs:
+        """Select observations per the reduction; padding to the fixed window is deferred.
+
+        Parameters
+        ----------
+        mag : array-like, shape ``(n,)``
+            Magnitudes in chronological order.
+
+        Returns
+        -------
+        MomentInputs
+        """
+        mag = np.asarray(mag, dtype=np.float32)
+        mag_win, bool_mask = self.reduction.preprocess_lc(mag, seq_size=self.max_obs)
+        return MomentInputs(bool_mask=bool_mask, mag=mag_win.astype(np.float32))
+
+    def _context(self, mag: np.ndarray) -> np.ndarray:
+        """Left-pad valid magnitudes with NaN to the fixed 512-step window."""
+        mag = mag[-self.seq_len :]
+        n = mag.shape[0]
+        context = np.full((1, self.seq_len), np.nan, dtype=np.float32)
+        context[0, self.seq_len - n :] = mag
+        return context
+
+    def predict_tensors(self, tensors: MomentInputs) -> np.ndarray:
+        """Run the ONNX model once over all subsamples and return reduced embeddings.
+
+        Because MOMENT's context length is fixed (512), all subsamples share the
+        same shape and are batched into a single ONNX call.
+
+        Parameters
+        ----------
+        tensors : MomentInputs
+            As returned by :meth:`preprocess_lc`.
+
+        Returns
+        -------
+        np.ndarray, shape ``(n_subsamples, seq_size, embed_dim)``
+            Embeddings after applying the reduction's aggregation.  ``seq_size``
+            is 1 for ``"mean"`` and 64 for ``"sequence"``.
+
+        Raises
+        ------
+        ValueError
+            For the ``"sequence"`` output with a multi-window reduction: the
+            reduction's per-window aggregation operates in observation space,
+            which does not align with the fixed 64-patch sequence.
+        """
+        n_subsamples = tensors.bool_mask.shape[0]
+        if self.output == "sequence" and n_subsamples != 1:
+            raise ValueError(
+                "The 'sequence' output supports only single-subsample reductions for MOMENT "
+                "(per-window aggregation operates in observation space, which does not align "
+                "with the fixed 64-patch sequence)."
+            )
+
+        contexts = np.concatenate(
+            [self._context(tensors.mag[i][tensors.bool_mask[i]]) for i in range(n_subsamples)],
+            axis=0,
+        )  # (n_subsamples, 512)
+        (raw,) = self.session.run([self.output], {"context": contexts})
+        # mean: (n_subsamples, embed_dim); sequence: (n_subsamples, 64, embed_dim)
+
+        if self.output == "mean":
+            embeddings = raw[:, np.newaxis, :]  # (n_subsamples, 1, embed_dim)
+        else:
+            embeddings = raw  # (1, 64, embed_dim)
+        return self.reduction.reduce_embeddings(embeddings, tensors, output=self.output)
diff --git a/light-curve/tests/embed/test_moment.py b/light-curve/tests/embed/test_moment.py
diff --git a/light-curve/tests/prep-models b/light-curve/tests/prep-models
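
As a rough illustration of the batching in `predict_tensors` above, the sketch below stacks several reduction windows of different valid lengths into one `(n_subsamples, 512)` array and reshapes a `mean`-style output to `(n_subsamples, 1, embed_dim)`. It is a sketch under stated assumptions: `left_pad` and `stub_mean_output` are hypothetical helpers, and the stub returns zeros in place of the real ONNX session, so only the shapes are meaningful.

```python
import numpy as np

SEQ_LEN = 512
EMBED_DIM = 768  # embedding width of the "base" size


def left_pad(valid: np.ndarray) -> np.ndarray:
    """Right-align the valid observations in a NaN-filled window of SEQ_LEN."""
    valid = valid[-SEQ_LEN:]
    ctx = np.full(SEQ_LEN, np.nan, dtype=np.float32)
    ctx[SEQ_LEN - valid.shape[0]:] = valid
    return ctx


def stub_mean_output(contexts: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the ONNX session; NOT the real model."""
    return np.zeros((contexts.shape[0], EMBED_DIM), dtype=np.float32)


# Three windows, zero-padded on the right, with a boolean validity mask.
lengths = [512, 300, 40]
mag = np.zeros((len(lengths), SEQ_LEN), dtype=np.float32)
bool_mask = np.zeros((len(lengths), SEQ_LEN), dtype=bool)
for i, n in enumerate(lengths):
    mag[i, :n] = 18.0 + 0.01 * np.arange(n)
    bool_mask[i, :n] = True

# Select valid entries per window, then stack into a single batch.
contexts = np.stack([left_pad(mag[i][bool_mask[i]]) for i in range(len(lengths))])
raw = stub_mean_output(contexts)    # (n_subsamples, embed_dim)
embeddings = raw[:, np.newaxis, :]  # (n_subsamples, 1, embed_dim)

print(contexts.shape, embeddings.shape)
```

Because every window is padded to the same 512-step length, the stacked batch has a uniform shape regardless of how many valid observations each window holds, which is what allows a single inference call.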