Skip to content

Commit 1f0a7d9

Browse files
Barseq blackbox acquisition (#1781)
* First pass at minimal blackbox acquisition. Has file that generates acquisition for two subjects, plus json outputs * Added reference to output files * Added specimen IDs to barseq acquisitions * Removed readme file * Updating examples/barseq_acquisition.py docstring * Cleaned up comments in examples/barseq_acquisition.py * Set timestamps to pacific with offset * Changed experimenters to 'Barseq team' * Removed references to local paths, removed duplicate jsons * Added ExternalDataStream, made instrument optional, added test, updated example barseq * Updated example json outputs for barseq subjects * update docs [skip actions] * Linting fixes * Used DiscriminatedList * Clean up stream adding logic * Simplify barseq acquisition example * Removed jsons * clean up test and import properly * Add ExternalDataStream support to Metadata validators and test that no instrument warning is raised for external acquisitions. * Wording updates as suggested by Dan --------- Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
1 parent f0c93ee commit 1f0a7d9

6 files changed

Lines changed: 160 additions & 12 deletions

File tree

docs/source/acquisition.md

Lines changed: 14 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -85,13 +85,13 @@ while the StimulusEpoch represents all stimuli being presented.
8585
| `experimenters` | `List[str]` | experimenter(s) |
8686
| `protocol_id` | `Optional[List[str]]` | Protocol ID (DOI for protocols.io) |
8787
| `ethics_review_id` | `Optional[List[str]]` | Ethics review ID |
88-
| `instrument_id` | `str` | Instrument ID (Should match the Instrument.instrument_id) |
88+
| `instrument_id` | `Optional[str]` | Instrument ID (Should match the Instrument.instrument_id. Required when instrument metadata is available.) |
8989
| `acquisition_type` | `str` | Acquisition type (Descriptive string detailing the type of acquisition, should be consistent across similar acquisitions for the same experiment.) |
9090
| `notes` | `Optional[str]` | Notes |
9191
| `coordinate_system` | Optional[[CoordinateSystem](components/coordinates.md#coordinatesystem)] | Coordinate system (Origin and axis definitions for determining the configured position of devices during acquisition. Required when coordinates are provided within the Acquisition) |
9292
| `calibrations` | List[[Calibration](components/measurements.md#calibration) or [VolumeCalibration](components/measurements.md#volumecalibration) or [PowerCalibration](components/measurements.md#powercalibration)] | Calibrations (List of calibration measurements taken prior to acquisition.) |
9393
| `maintenance` | List[[Maintenance](components/measurements.md#maintenance)] | Maintenance (List of maintenance on instrument prior to acquisition.) |
94-
| `data_streams` | List[[DataStream](acquisition.md#datastream)] | Data streams (A data stream is a collection of devices that are acquiring data simultaneously. Each acquisition can include multiple streams. Streams should be split when configurations are changed.) |
94+
| `data_streams` | List[[DataStream](acquisition.md#datastream) or [ExternalDataStream](acquisition.md#externaldatastream)] | Data streams (A data stream is a collection of devices that are acquiring data simultaneously. Each acquisition can include multiple streams. Streams should be split when configurations are changed. Use ExternalDataStream for acquisitions where instrument metadata is unavailable.) |
9595
| `stimulus_epochs` | List[[StimulusEpoch](acquisition.md#stimulusepoch)] | Stimulus (A stimulus epoch captures all stimuli being presented during an acquisition. Epochs should be split when the purpose of the stimulus changes.) |
9696
| `manipulations` | List[[Manipulation](acquisition.md#manipulation)] | Manipulations (Procedures performed during the acquisition.) |
9797
| `subject_details` | Optional[[AcquisitionSubjectDetails](acquisition.md#acquisitionsubjectdetails)] | Subject details (Required for in vivo acquisitions.) |
@@ -131,6 +131,18 @@ same time.
131131
| `connections` | List[[Connection](components/connections.md#connection)] | Connections (Connections are links between devices that are specific to this acquisition (i.e. not already defined in the Instrument)) |
132132

133133

134+
### ExternalDataStream
135+
136+
A simplified data stream for acquisitions where instrument metadata is unavailable.
137+
138+
| Field | Type | Title (Description) |
139+
|-------|------|-------------|
140+
| `stream_start_time` | `datetime (timezone-aware)` | Stream start time |
141+
| `stream_end_time` | `datetime (timezone-aware)` | Stream stop time |
142+
| `modalities` | List[[Modality](aind_data_schema_models/modalities.md#modality)] | Modalities (Modalities that are acquired in this stream) |
143+
| `notes` | `Optional[str]` | Notes |
144+
145+
134146
### Manipulation
135147

136148
Description of procedures performed during an acquisition.

examples/barseq_acquisition.py

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
1+
"""Example of a BARseq acquisition using ExternalDataStream.
2+
3+
BARseq is acquired by the AIBS Molecular Anatomy team using an instrument
4+
and workflows not yet fully documented in the AIND schema. ExternalDataStream
5+
is used here to record acquisition metadata without requiring full instrument
6+
configuration details, treating the data as if it were acquired externally.
7+
"""
8+
9+
import argparse
10+
from datetime import datetime
11+
from zoneinfo import ZoneInfo
12+
13+
from aind_data_schema_models.modalities import Modality
14+
15+
from aind_data_schema.core.acquisition import Acquisition, ExternalDataStream
16+
17+
acquisition = Acquisition(
18+
subject_id="123456",
19+
specimen_id=["123456_bar001", "123456_bar002"],
20+
acquisition_start_time=datetime(2025, 1, 1, 9, 0, 0, tzinfo=ZoneInfo("America/Los_Angeles")),
21+
acquisition_end_time=datetime(2025, 1, 31, 17, 0, 0, tzinfo=ZoneInfo("America/Los_Angeles")),
22+
acquisition_type="BarcodeSequencing",
23+
data_streams=[
24+
ExternalDataStream(
25+
stream_start_time=datetime(2025, 1, 1, 9, 0, 0, tzinfo=ZoneInfo("America/Los_Angeles")),
26+
stream_end_time=datetime(2025, 1, 31, 17, 0, 0, tzinfo=ZoneInfo("America/Los_Angeles")),
27+
modalities=[Modality.BARSEQ],
28+
notes="Acquired externally.",
29+
)
30+
],
31+
)
32+
33+
if __name__ == "__main__":
34+
parser = argparse.ArgumentParser()
35+
parser.add_argument("--output-dir", default=None, help="Output directory for generated JSON file")
36+
args = parser.parse_args()
37+
38+
serialized = acquisition.model_dump_json()
39+
deserialized = Acquisition.model_validate_json(serialized)
40+
deserialized.write_standard_file(prefix="barseq", output_directory=args.output_dir)

src/aind_data_schema/core/acquisition.py

Lines changed: 43 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -108,6 +108,7 @@ class DataStream(DataModel):
108108
same time.
109109
"""
110110

111+
object_type: Literal["DataStream"] = "DataStream"
111112
stream_start_time: Annotated[
112113
AwareDatetimeWithDefault,
113114
Field(..., title="Stream start time"),
@@ -250,6 +251,26 @@ def __add__(self, other: "DataStream", overlap_s: int = 120) -> "DataStream":
250251
)
251252

252253

254+
class ExternalDataStream(DataModel):
255+
"""A simplified data stream for acquisitions where instrument metadata is unavailable."""
256+
257+
object_type: Literal["ExternalDataStream"] = "ExternalDataStream"
258+
stream_start_time: Annotated[
259+
AwareDatetimeWithDefault,
260+
Field(..., title="Stream start time"),
261+
TimeValidation.BETWEEN,
262+
]
263+
stream_end_time: Annotated[
264+
AwareDatetimeWithDefault,
265+
Field(..., title="Stream stop time"),
266+
TimeValidation.BETWEEN,
267+
]
268+
modalities: List[Modality.ONE_OF] = Field(
269+
..., title="Modalities", description="Modalities that are acquired in this stream"
270+
)
271+
notes: Optional[str] = Field(default=None, title="Notes")
272+
273+
253274
class StimulusEpoch(DataModel):
254275
"""All stimuli being presented to the subject. starting and stopping at approximately the
255276
same time. Not all acquisitions have StimulusEpochs.
@@ -374,7 +395,11 @@ def coerce_fixed_offset_tz_string(cls, v):
374395
)
375396
protocol_id: Optional[List[str]] = Field(default=None, title="Protocol ID", description="DOI for protocols.io")
376397
ethics_review_id: Optional[List[str]] = Field(default=None, title="Ethics review ID")
377-
instrument_id: str = Field(..., title="Instrument ID", description="Should match the Instrument.instrument_id")
398+
instrument_id: Optional[str] = Field(
399+
default=None,
400+
title="Instrument ID",
401+
description="Should match the Instrument.instrument_id. Required when instrument metadata is available.",
402+
)
378403
acquisition_type: str = Field(
379404
...,
380405
title="Acquisition type",
@@ -406,12 +431,13 @@ def coerce_fixed_offset_tz_string(cls, v):
406431
)
407432

408433
# Acquisition data
409-
data_streams: List[DataStream] = Field(
434+
data_streams: DiscriminatedList[DataStream | ExternalDataStream] = Field(
410435
...,
411436
title="Data streams",
412437
description=(
413438
"A data stream is a collection of devices that are acquiring data simultaneously. Each acquisition can "
414-
"include multiple streams. Streams should be split when configurations are changed."
439+
"include multiple streams. Streams should be split when configurations are changed. "
440+
"Use ExternalDataStream for acquisitions where instrument metadata is unavailable."
415441
),
416442
)
417443
stimulus_epochs: List[StimulusEpoch] = Field(
@@ -463,6 +489,16 @@ def check_subject_specimen_id(self):
463489

464490
return self
465491

492+
@model_validator(mode="after")
493+
def instrument_id_required_for_data_streams(self):
494+
"""Require instrument_id when any standard DataStream is present"""
495+
if not hasattr(self, "data_streams"):
496+
return self
497+
if any(isinstance(stream, DataStream) for stream in self.data_streams):
498+
if not self.instrument_id:
499+
raise ValueError("instrument_id is required when data_streams contains a DataStream")
500+
return self
501+
466502
@model_validator(mode="after")
467503
def specimen_required(self):
468504
"""Check if specimen ID is required for in vitro imaging modalities"""
@@ -546,7 +582,10 @@ def __add__(self, other: "Acquisition") -> "Acquisition":
546582
ethics_review_id = merge_optional_list(self.ethics_review_id, other.ethics_review_id)
547583
calibrations = self.calibrations + other.calibrations
548584
maintenance = self.maintenance + other.maintenance
549-
data_streams = Acquisition._merge_data_streams(self.data_streams + other.data_streams)
585+
all_streams = self.data_streams + other.data_streams
586+
external_streams = [s for s in all_streams if isinstance(s, ExternalDataStream)]
587+
regular_streams = [s for s in all_streams if isinstance(s, DataStream)]
588+
data_streams = Acquisition._merge_data_streams(regular_streams) + external_streams
550589
stimulus_epochs = self.stimulus_epochs + other.stimulus_epochs
551590

552591
# Remove duplicates

src/aind_data_schema/core/metadata.py

Lines changed: 15 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@
2121
from aind_data_schema.base import DataCoreModel
2222
from aind_data_schema.components.identifiers import DatabaseIdentifiers
2323
from aind_data_schema.components.subjects import CalibrationObject
24-
from aind_data_schema.core.acquisition import Acquisition
24+
from aind_data_schema.core.acquisition import Acquisition, DataStream, ExternalDataStream
2525
from aind_data_schema.core.data_description import DataDescription
2626
from aind_data_schema.core.instrument import Instrument
2727
from aind_data_schema.core.model import Model
@@ -149,9 +149,16 @@ def validate_expected_files_by_modality(self):
149149

150150
for file in REQUIRED_FILE_SETS.keys():
151151
if getattr(self, file):
152-
for file in REQUIRED_FILE_SETS[file]:
153-
if not getattr(self, file):
154-
warnings.warn(f"Metadata missing required file: {file}")
152+
for required_file in REQUIRED_FILE_SETS[file]:
153+
if not getattr(self, required_file):
154+
# Skip instrument warning when acquisition only has ExternalDataStream
155+
if (
156+
required_file == "instrument"
157+
and self.acquisition
158+
and all(isinstance(s, ExternalDataStream) for s in self.acquisition.data_streams)
159+
):
160+
continue
161+
warnings.warn(f"Metadata missing required file: {required_file}")
155162

156163
return self
157164

@@ -218,7 +225,8 @@ def validate_acquisition_active_devices(self):
218225

219226
if self.acquisition:
220227
for data_stream in self.acquisition.data_streams:
221-
active_devices.extend(data_stream.active_devices)
228+
if isinstance(data_stream, DataStream):
229+
active_devices.extend(data_stream.active_devices)
222230

223231
device_names = []
224232

@@ -253,6 +261,8 @@ def validate_acquisition_connections(self):
253261
if self.acquisition:
254262
data_streams = self.acquisition.data_streams
255263
for data_stream in data_streams:
264+
if not isinstance(data_stream, DataStream):
265+
continue
256266
for connection in data_stream.connections:
257267
# Check both source and target devices exist
258268
missing_devices = []

tests/test_acquisition.py

Lines changed: 34 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,8 +22,14 @@
2222
SampleChamberConfig,
2323
)
2424
from aind_data_schema.components.coordinates import CoordinateSystemLibrary, Translation
25-
from aind_data_schema.core.acquisition import Acquisition, AcquisitionSubjectDetails, DataStream, StimulusEpoch
25+
from aind_data_schema.core.acquisition import (
26+
Acquisition,
27+
AcquisitionSubjectDetails,
28+
DataStream,
29+
StimulusEpoch,
30+
)
2631
from aind_data_schema.components.connections import Connection
32+
from examples.barseq_acquisition import acquisition as barseq_acquisition
2733
from examples.ephys_acquisition import acquisition as ephys_acquisition
2834
from examples.exaspim_acquisition import acq as exaspim_acquisition
2935
from examples.mri_acquisition import acquisition as mri_acquisition, scan1
@@ -53,6 +59,33 @@ def test_constructors(self):
5359
scan1_dict["notes"] = ""
5460
MRIScan.model_validate(scan1_dict)
5561

62+
def test_external_data_stream(self):
63+
"""Test ExternalDataStream: valid without instrument_id, and DataStream requires instrument_id"""
64+
# Happy path: BARseq example uses ExternalDataStream and has no instrument_id
65+
self.assertIsNotNone(barseq_acquisition)
66+
self.assertIsNone(barseq_acquisition.instrument_id)
67+
68+
# Guard: DataStream without instrument_id should fail
69+
start = datetime(2025, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
70+
end = datetime(2025, 1, 2, 0, 0, 0, tzinfo=timezone.utc)
71+
with self.assertRaises(ValidationError) as context:
72+
Acquisition(
73+
subject_id="123456",
74+
acquisition_start_time=start,
75+
acquisition_end_time=end,
76+
acquisition_type="Test",
77+
data_streams=[
78+
DataStream(
79+
stream_start_time=start,
80+
stream_end_time=end,
81+
modalities=[Modality.BARSEQ],
82+
active_devices=[],
83+
configurations=[],
84+
)
85+
],
86+
)
87+
self.assertIn("instrument_id is required", str(context.exception))
88+
5689
def test_check_subject_specimen_id(self):
5790
"""Test that subject and specimen IDs match"""
5891
with self.assertRaises(ValueError) as context:

tests/test_metadata.py

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,7 @@
2929
from aind_data_schema.components.subject_procedures import TrainingProtocol
3030
from aind_data_schema.core.acquisition import StimulusEpoch
3131

32+
from examples.barseq_acquisition import acquisition as barseq_acquisition
3233
from examples.data_description import d as data_description
3334
from examples.subject import s as subject
3435

@@ -334,6 +335,19 @@ def test_validate_expected_files_by_modality(self):
334335
str(context.exception),
335336
)
336337

338+
def test_external_data_stream_no_instrument_warning(self):
339+
"""Test that ExternalDataStream-only acquisitions do not warn about missing instrument"""
340+
with warnings.catch_warnings(record=True) as w:
341+
warnings.simplefilter("always")
342+
Metadata(
343+
name="655019_2023-04-03T181709",
344+
location="bucket",
345+
subject=subject,
346+
acquisition=barseq_acquisition,
347+
)
348+
instrument_warnings = [str(warning.message) for warning in w if "instrument" in str(warning.message)]
349+
self.assertEqual([], instrument_warnings)
350+
337351
def test_validate_acquisition_connections(self):
338352
"""Tests that acquisition connections are validated correctly."""
339353
# Case where all connection devices are present in instrument components

0 commit comments

Comments
 (0)