Skip to content

Refactor: instruments have sensors#328

Open
j-atkins wants to merge 58 commits intomainfrom
refactor-sensors
Open

Refactor: instruments have sensors#328
j-atkins wants to merge 58 commits intomainfrom
refactor-sensors

Conversation

@j-atkins
Copy link
Copy Markdown
Collaborator

@j-atkins j-atkins commented Apr 10, 2026

Overview

This PR centralises/abstracts instrument sensor/variable sampling logic so that each instrument declares which sensors it carries (e.g. TEMPERATURE, SALINITY, VELOCITY). Users can configure which sensors are active for each instrument in the expedition YAML.

This helps pave the way for easy addition of BGC sensors to the Argo float in a future PR (#234), and consolidation of CTD + CTD_BGC into a single instrument (#260) with a combined sensor allowlist. Also makes it straightforward to add new sensors to any instrument in the future (e.g., #312, #313), and streamlines them process for developers to add new instruments (i.e. #237)

Major changes

  • sensors.py in instruments/ defines the SensorType class and per-instrument allowlists (so that there is control over which sensors each instrument supports and users cannot configure unsupported sensors).
  • New SensorConfig pydantic model and sensors field in every instrument config in expedition.py.
  • New SensorRegistry in utils.py that maps each SensorType to its FieldSet key, Copernicus variable name, category (phys/bgc), and Parcels particle variable name(s).
    • Per-instrument parcticle classes are now built dynamically at runtime based on which sensors are active, but the fixed/mechanical variables are still hard-coded in the instrument files, e.g. cycle_phase for Argo Floats.

API change

As mentioned above, the instruments config section of the expedition YAML now has a sensors list field, where users specify which sensors are active. For example, below the CTD is configured to sample TEMPERATURE and SALINITY:

ctd_config:
  stationkeeping_time_minutes: 50
  min_depth_meter: -11
  max_depth_meter: -2000
  sensors:
    - TEMPERATURE
    - SALINITY
  • If the sensors list is omitted, it default to all valid sensors for that instrument.
  • By using allowlists for each instrument, it will not allow non-sensical sensor combinations (e.g. BGC sensors on an ADCP).
  • Will also not allow an empty sensor list, at least one sensor must be active.

Additional change

  • Argo Float sampling kernels have been separated from the vertical-movement kernel, making it easier to add BGC sensors in a future PR.

Follow-up PRs

  • The plan CLI tool will need updating to account for sensor configuration options. Currently not broken but doesn't give option to configure sensors. (New issue to be opened)
  • Docs update to clearly communicate which sensors each instrument accepts, and how to configure them in either the plan tool or the expedition YAML. (New issue to be opened)
  • Merge CTD + CTD_BGC into a single instrument. (Unify CTD and CTD_BGC to one instrument #260)
  • Add BGC sampling to Argo Floats (New ARGO_BGC instrument #234)

Tests

  • Update existing tests and add new tests to cover new sensor logic

j-atkins added 30 commits March 25, 2026 13:53
… kernels from the argo vertical movement kernel to enable easier scalability
@j-atkins j-atkins marked this pull request as ready for review April 10, 2026 13:34
Copy link
Copy Markdown
Member

@erikvansebille erikvansebille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A few comments below

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to still be a lot of duplication between CTD and CTD_BGC. Was the intention for this PR not to remove that duplication/ Or will that come in a next PR?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed it will come in the next PR... the CTD_BGC will soon be removed but I decided not to do it in this PR so as to keep it focused

Variable("lifetime", dtype=np.float32),
]
)
_DRIFTER_FIXED_VARIABLES = [
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether FIXED is the right term here (and in other instruments). Suggests they are fixed to the instrument. Would NONSENSOR be a better term? Or do you have another term?

Comment on lines +9 to +18
TEMPERATURE = "TEMPERATURE"
SALINITY = "SALINITY"
VELOCITY = "VELOCITY"
OXYGEN = "OXYGEN"
CHLOROPHYLL = "CHLOROPHYLL"
NITRATE = "NITRATE"
PHOSPHATE = "PHOSPHATE"
PH = "PH"
PHYTOPLANKTON = "PHYTOPLANKTON"
PRIMARY_PRODUCTION = "PRIMARY_PRODUCTION"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this list needed? Can we do without? Seems like an extra place to change when adding a sensor..

Copy link
Copy Markdown
Collaborator Author

@j-atkins j-atkins Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep it just because it's a lightweight way of adding a bit of consistency/convention across the codebase and helps with some things like quickly catching typos in the YAML config

Comment on lines +51 to +54
_ST_SENSOR_KERNELS: dict[SensorType, callable] = {
SensorType.TEMPERATURE: _sample_temperature,
SensorType.SALINITY: _sample_salinity,
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why explicitly mention them here, if they are also part of sensors.py? Seems a duplication?

import yaml

from virtualship.errors import InstrumentsConfigError, ScheduleError
from virtualship.instruments.sensors import (
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not import all? So that if a new sensor is added, it can automatically also be imported here?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, are you suggesting from virtualship.instruments import sensors? I think that would be a good idea :)

"""
FieldSet-key → Copernicus-variable mapping for enabled sensors.

VELOCITY is a special case: one sensor provides two FieldSet variables (U and V).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in Parcels, it should sample fieldset.UV, which is one Field?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Overcomplicated things here unnecessarily

],
description=(
"Sensors fitted to the BGC CTD. "
"Supported: CHLOROPHYLL, NITRATE, OXYGEN, PH, PHOSPHATE, PHYTOPLANKTON, PRIMARY_PRODUCTION. "
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this list the same order as the list above? Otherwise, more difficult to see whether the list is complete

Comment on lines +586 to +598
@pydantic.field_validator("sensors", mode="after")
@classmethod
def _check_sensor_compatibility(cls, value) -> list[SensorConfig]:
return _check_sensor_compatibility(value, XBT_SUPPORTED_SENSORS, "XBT")

@pydantic.field_serializer("sensors")
def _serialize_sensors(self, value: list[SensorConfig], _info):
return _serialize_sensor_list(value)

def active_variables(self) -> dict[str, str]:
"""FieldSet-key → Copernicus-variable mapping for enabled sensors."""
return build_variables_from_sensors(self.sensors)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these functions required for every instrument? Can't they be reused?

Comment thread src/virtualship/utils.py
Comment on lines +100 to +103
particle_vars=[
"U",
"V",
], # two particle variables associated with one sensor
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not UV?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to add so much more unit testing!

Copy link
Copy Markdown
Collaborator

@VeckoTheGecko VeckoTheGecko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work so far! Really like the direction that this is going. I have a bunch of comments, some quick fixes and others more cenceptual that I think would be good to go over together.

Comment thread src/virtualship/utils.py
Comment on lines +64 to +66
class _SensorMeta:
fs_key: str # map to Parcels fieldset variables
copernicus_var: str # map to Copernicus Marine Service variable names
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to rename this class to _Sensor, and to include the type in the class itself

Suggested change
class _SensorMeta:
fs_key: str # map to Parcels fieldset variables
copernicus_var: str # map to Copernicus Marine Service variable names
class _Sensor:
type_: SensorType
fs_key: str # map to Parcels fieldset variables
copernicus_var: str # map to Copernicus Marine Service variable names

Then the SENSOR_REGISTRY can become:

SENSOR_REGISTRY: dict[SensorType, _Sensor] = {s.type_: s for s in [
    ... # list of sensors
]} 

making it easy to look up sensors, but also making it so that if a sensor its chosen it's easy to see what type it is again.

PRIMARY_PRODUCTION = "PRIMARY_PRODUCTION"


# per-instrument allowlists of supported sensors (source truth for validation for which sensors each instrument supports)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit confusing the structure of the classes here - perhaps this is something worth drawing up so that we have some clarity?

We have an Instrument class

@register_instrument(InstrumentType.ADCP)
class ADCPInstrument(Instrument):
    """ADCP instrument class."""

    def __init__(self, expedition, from_data):
        """Initialize ADCPInstrument."""
        variables = expedition.instruments_config.adcp_config.active_variables()
        limit_spec = {
            "spatial": True
        }  # spatial limits; lat/lon constrained to waypoint locations + buffer

        super().__init__(
            expedition,
            variables,
            add_bathymetry=False,
            allow_time_extrapolation=True,
            verbose_progress=False,
            spacetime_buffer_size=None,
            limit_spec=limit_spec,
            from_data=from_data,
        )

But I feel that there's a lot of details in this class that aren't really related to what an instrument is, but is more glue code to get it working in VirtualShip

From my POV, its best to design things close to the physical domain. An Instrument instance has a set of sensors installed on the machine. These sensors need to match up with a list of allowed sensors (which are defined on the class, e.g., via a method .get_allowed_sensors() and a check in the __init__ that it adheres - this would remove all the _check_sensor_compatibility methods that are currently here). Then each sensor has its own details potentially related to that sensor.

Maybe it would be good for us to sit down and diagram out the abstractions here. Part of that might be brainstorming how the interface between the configs and model code looks like.

Comment thread src/virtualship/utils.py


@dataclass(frozen=True)
class _SensorMeta:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for us to also add the kernels to this Sensor class? I see a bunch of mappings across the codebase relating the sensors to the kernels (where a lot of them are the same, as Erik mentioned in #328 (comment) ).

I think it would be a big win if we can put the kernels in the sensor class! I don't know how easy it would be since the right kernel might depend on more than just the choice of sensor. If that's the case - what does choosing the right kernel depend on?

Comment thread src/virtualship/utils.py
Comment on lines +58 to +60
# =====================================================
# SECTION: sensor and variable metadata and registries
# =====================================================
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this to sensors.py? The class and the SENSOR_REGISTRY?

Comment thread src/virtualship/utils.py
category: Literal[
"phys", "bgc"
] # physical vs. biogeochemical variable, used for product ID selection logic
particle_vars: list[str] # particle variable name(s) produced by this sensor
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this is only used in

sensor_variables = [
Variable(var_name, dtype=np.float32, initial=np.nan)
for sc in sensors
if sc.enabled
for var_name in sc.meta.particle_vars
]
. Perhaps it would be clearer instead of these being strings for them to explicitly be lists of parcels.Variable objects?

Also, for my own understanding, how does this relate to (e.g.)

_ARGO_FIXED_VARIABLES = [
Variable("cycle_phase", dtype=np.int32, initial=0.0),
Variable("cycle_age", dtype=np.float32, initial=0.0),
Variable("drift_age", dtype=np.float32, initial=0.0),
Variable("min_depth", dtype=np.float32),
Variable("max_depth", dtype=np.float32),
Variable("drift_depth", dtype=np.float32),
Variable("vertical_speed", dtype=np.float32),
Variable("cycle_days", dtype=np.int32),
Variable("drift_days", dtype=np.int32),
Variable("grounded", dtype=np.int32, initial=0),
]
? Are the particle variables here "extra ones" needed for the sensor and to be captured as output, whereas the ones in _ARGO_FIXED_VARIABLES are for the base behaviour of the instrument?

Comment on lines +43 to +45
_ADCP_SENSOR_KERNELS: dict[SensorType, callable] = {
SensorType.VELOCITY: _sample_velocity,
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type hint isn't correct

Suggested change
_ADCP_SENSOR_KERNELS: dict[SensorType, callable] = {
SensorType.VELOCITY: _sample_velocity,
}
from collections.abc.Callable # put at top of file
_ADCP_SENSOR_KERNELS: dict[SensorType, Callable] = {
SensorType.VELOCITY: _sample_velocity,
}

Great use of typing across the PR - I think having this typing is really useful so that it gets us to think more about the structure of the codebase. Currently I don't think typechecking is run in CI (or if it is, we currently have a lot of failures). Should we map out a path forward for enabling typechecking across the codebase as well? The types are only really useful if they're enforced (something I'm working on in Parcels as well).

@@ -50,6 +53,11 @@ def _sample_temperature(particle, fieldset, time):
particle.temperature = fieldset.T[time, particle.depth, particle.lat, particle.lon]


Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here about the callable type annotation - perhaps worth find-all'ing across the codebase for , callable] so all these are replaced

import yaml

from virtualship.errors import InstrumentsConfigError, ScheduleError
from virtualship.instruments.sensors import (
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, are you suggesting from virtualship.instruments import sensors? I think that would be a good idea :)

Comment thread src/virtualship/utils.py

def build_particle_class_from_sensors(
sensors: list[SensorConfig],
fixed_variables: list,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fixed_variables: list,
fixed_variables: list[Variable],

Comment thread src/virtualship/utils.py
def build_particle_class_from_sensors(
sensors: list[SensorConfig],
fixed_variables: list,
particle_class: type,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick. Feel free to ignore this suggestion if you don't find it useful

Suggested change
particle_class: type,
particle_class: type, # generic type annotation needed for v3 particle class behaviour # TODO: Update with Parcels v4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants