[feature][MSD-506] Meteor store annoted data#3451
Conversation
Provides a thread-safe, non-blocking ``DataCollector.record()`` call that any Odemis module can invoke to capture a labelled data sample. CLI interface is implemented to download the data from the cloud storage.
There was a problem hiding this comment.
Pull request overview
This PR adds an annotated data-collection pipeline for cryo features, including user consent management, S3 upload/download utilities, and a per-project sampling flag that is propagated to newly created CryoFeature instances and persisted in features.json.
Changes:
- Introduces a
DataCollectorframework with background serialization/upload to S3 and persistent consent configuration. - Adds GUI consent dialog + Help menu toggle, and wires per-project feature sampling into feature creation/loading.
- Adds an S3 “fetch samples” CLI and extensive unit/integration tests for the new utilities.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/odemis/util/datacollector.py |
New data-collection framework (config, background worker, S3 backend, serialization). |
src/odemis/util/dc_fetch.py |
New S3 retrieval helpers + CLI entrypoint implementation. |
scripts/odemis-dc-fetch.py |
Script wrapper for the S3 sample fetch CLI. |
src/odemis/gui/win/consent.py |
New consent dialog UI. |
src/odemis/gui/main.py |
Shows consent prompt on startup and injects DataCollector into menu controller. |
src/odemis/gui/cont/menu.py |
Adds Help menu checkbox to toggle data sharing consent. |
src/odemis/acq/feature.py |
Adds CryoFeature.collect flag + collection routine and helpers. |
src/odemis/gui/model/tab_gui_data.py |
Propagates per-session features_collectable flag into newly created CryoFeatures. |
src/odemis/gui/cont/tabs/cryo_chamber_tab.py |
Makes a per-project sampling decision and clears collect on loaded features. |
src/odemis/gui/cont/features.py |
Adds triggers for collection on status change, posture transitions, and feature deletion. |
src/odemis/util/test/datacollector_test.py |
New tests for config, serialization, queue limit, retry, and S3 integration (skipped when creds missing). |
src/odemis/util/test/dc_fetch_test.py |
New tests for S3 listing/pagination and download filtering logic. |
src/odemis/acq/test/feature_test.py / src/odemis/acq/test/test-features.json |
Updates feature JSON format and adds tests for collect + collection helpers. |
debian/control |
Adds python3-boto3 dependency. |
| consent_val = self.consent | ||
| remind_val = self.remind_date | ||
|
|
||
| if consent_val is True: | ||
| consent_line = "consent = true" | ||
| elif consent_val is False: | ||
| consent_line = "consent = false" | ||
| else: | ||
| consent_line = "consent = none" | ||
|
|
There was a problem hiding this comment.
DataCollectorConfig._write() writes consent = none when consent is unset, but DataCollectorConfig.consent reads the value using ConfigParser.getboolean(), which cannot parse none and will raise ValueError on the next load. This makes cfg.consent unusable after a save/load cycle when consent is undecided (e.g., after clear_consent() / postpone_consent()). Either omit/comment out the consent option when unset (and keep it absent in the file), or update the getter to explicitly treat none/empty as None (catch ValueError).
| # Limit event_name length so the filename stays within filesystem limits. | ||
| safe_event = item.event_name[:64] if item.event_name else "event" | ||
| zip_name = f"{safe_event}-{timestamp_str}-{uuid8}.zip" | ||
|
|
||
| tmp_dir = Path(tempfile.mkdtemp(prefix="dc_")) | ||
| try: | ||
| payload_meta: dict = {} | ||
| extra_files: list = [] # list of (arcname, abs_path) | ||
|
|
||
| for key, value in item.payload.items(): | ||
| if value is None or isinstance(value, (str, int, float, bool)): | ||
| payload_meta[key] = value | ||
|
|
||
| elif isinstance(value, numpy.ndarray): | ||
| if item.image_format.upper() == "HDF5": | ||
| arc_name = f"{key}.h5" | ||
| abs_path = tmp_dir / arc_name | ||
| try: | ||
| da = value if isinstance(value, model.DataArray) else model.DataArray(value) | ||
| hdf5.export(str(abs_path), da) | ||
| except Exception: | ||
| logging.exception("Failed to export DataArray to HDF5 at %s", abs_path) | ||
| abs_path = None | ||
| else: | ||
| arc_name = f"{key}.ome.tiff" | ||
| abs_path = tmp_dir / arc_name | ||
| try: | ||
| da = value if isinstance(value, model.DataArray) else model.DataArray(value) | ||
| tiff.export(str(abs_path), da) | ||
| except Exception: | ||
| logging.exception("Failed to export DataArray to TIFF at %s", abs_path) | ||
| abs_path = None | ||
|
|
||
| if abs_path is not None and abs_path.exists(): | ||
| extra_files.append((arc_name, abs_path)) | ||
| payload_meta[key] = arc_name | ||
| else: | ||
| payload_meta[key] = None | ||
| payload_meta["export_error"] = True | ||
|
|
||
| elif isinstance(value, (dict, list)): | ||
| arc_name = f"extra_{key}.json" | ||
| abs_path = tmp_dir / arc_name | ||
| abs_path.write_text(json.dumps(value, default=str), encoding="utf-8") | ||
| extra_files.append((arc_name, abs_path)) | ||
| payload_meta[key] = arc_name |
There was a problem hiding this comment.
event_name and payload keys are used directly to construct filenames inside _serialize() (ZIP name, extra_<key>.json, <key>.ome.tiff/.h5). Because DataCollector.record() allows arbitrary strings for event_name and payload keys, this enables path traversal (e.g., event_name='../../x' or payload key containing path separators) and could write/replace files outside queue_dir/tmp_dir. Sanitize event_name and all derived filenames to a safe character set and ensure the final resolved path stays within the intended directory before writing/renaming.
| try: | ||
| _dc = DataCollector() | ||
| if not _dc.get_consent(): | ||
| return |
There was a problem hiding this comment.
collect_feature_data() instantiates a new DataCollector() on every call. When consent is granted this will create a new background worker thread per invocation (and a new config instance), which can leak threads and increase CPU/memory usage—especially since collection is triggered from multiple GUI events and also runs inside separate threads already. Prefer reusing a single shared DataCollector instance (e.g., inject it from the GUI/controller, or use a module-level singleton) rather than constructing a new one each time.
| """Create S3 client and return `(client, bucket)`.""" | ||
| backend = config.get_upload_backend() | ||
| if not isinstance(backend, S3UploadBackend): | ||
| raise RuntimeError("Only S3 backend is supported for retrieval.") | ||
| # Accessing protected members intentionally to reuse existing backend setup. | ||
| client = backend._get_client() # pylint: disable=protected-access | ||
| bucket = backend._bucket # pylint: disable=protected-access | ||
| return client, bucket | ||
|
|
||
|
|
There was a problem hiding this comment.
create_s3_client_from_config() appears unused (the fetch path uses build_s3_client_from_config() instead). Consider removing it or using it consistently to avoid duplicate ways of building an S3 client and reduce maintenance surface.
| """Create S3 client and return `(client, bucket)`.""" | |
| backend = config.get_upload_backend() | |
| if not isinstance(backend, S3UploadBackend): | |
| raise RuntimeError("Only S3 backend is supported for retrieval.") | |
| # Accessing protected members intentionally to reuse existing backend setup. | |
| client = backend._get_client() # pylint: disable=protected-access | |
| bucket = backend._bucket # pylint: disable=protected-access | |
| return client, bucket | |
| """Create an S3 client from configuration. | |
| This wrapper is kept for compatibility and delegates to | |
| build_s3_client_from_config so there is a single implementation | |
| for S3 client construction. | |
| """ | |
| return build_s3_client_from_config(config) |
| # Feature whose status VA we are subscribed to for data-collection triggering. | ||
| self._status_collect_feature: Optional[CryoFeature] = None | ||
|
|
There was a problem hiding this comment.
Optional is used in the type annotation for _status_collect_feature, but it is not imported. Without from __future__ import annotations, this will raise a NameError at import time and break the GUI controller. Import Optional from typing (or switch to typing.Optional[...]).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis pull request adds an anonymized data collection pipeline and UI consent flow: a thread-safe DataCollector with config persistence and S3 upload backend; a background worker that serializes payloads to ZIPs and uploads with retries and queue-size enforcement; a CLI tool to fetch DataCollector ZIP samples from S3; automatic feature-level collection triggers (status changes, posture transitions, deletion) with per-project sampling; a ConsentDialog and GUI integrations to prompt and toggle consent; test coverage for datacollector and fetch utilities; and a Debian packaging dependency for Sequence Diagram(s)sequenceDiagram
participant User
participant GUI as GUI/main.py
participant DC as DataCollector
participant Config as DataCollectorConfig
participant Dialog as ConsentDialog
User->>GUI: Launch application
GUI->>DC: Instantiate DataCollector
GUI->>DC: should_prompt_for_consent()
DC->>Config: Read config file
Config-->>DC: Return consent/reminder decision
alt Prompt needed
GUI->>Dialog: Show ConsentDialog
Dialog-->>User: Prompt (Opt in / Opt out / Remind)
User->>Dialog: Choose option
Dialog-->>GUI: Return result
GUI->>DC: set_consent()/postpone_consent()
DC->>Config: Persist consent state
else No prompt
GUI->>GUI: Continue startup
end
sequenceDiagram
participant Controller as CryoFeatureController
participant Feature as CryoFeature
participant DC as DataCollector
participant Worker as BackgroundWorker
participant S3 as S3 Backend
Feature->>Controller: status change or deletion or posture transition
Controller->>Controller: Check feature.collect & eligibility
alt Eligible & consent granted
Controller->>DC: record(event_name, schema, payload)
DC->>DC: Validate & enqueue _WorkItem
Worker->>DC: Dequeue item
Worker->>Worker: Serialize metadata + images -> ZIP
Worker->>S3: Upload ZIP
S3-->>Worker: Upload success
Worker->>Worker: Delete local ZIP
else Not eligible or no consent
Controller->>Controller: Skip collection
end
sequenceDiagram
participant Admin
participant CLI as odemis-dc-fetch
participant Config as DataCollectorConfig
participant S3 as S3 Bucket
participant Disk as Output Directory
Admin->>CLI: Run with --event/--since/--output
CLI->>Config: Load credentials / backend config
Config-->>CLI: S3 client info
CLI->>S3: list_objects_v2 (one or more prefixes)
S3-->>CLI: Paginated objects
loop For each object key
CLI->>CLI: Parse event & timestamp
CLI->>CLI: Apply filters (--event, --since, --host)
alt Matches
CLI->>S3: GetObject(key)
S3-->>Disk: Write file (skip if exists)
else Skip
end
end
CLI-->>Admin: Exit code (0 success, 1 failures, 2 parse error)
Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (2)
src/odemis/gui/cont/menu.py (1)
46-54: Add type hints to the new consent menu API.The changed constructor and new handlers should annotate all parameters and return types.
♻️ Proposed fix
+from typing import Any + @@ - def __init__(self, main_data, main_frame, data_collector: DataCollector): + def __init__(self, main_data: Any, main_frame: wx.Frame, data_collector: DataCollector) -> None: @@ - def _append_data_sharing_menu_item(self, main_frame): + def _append_data_sharing_menu_item(self, main_frame: wx.Frame) -> wx.MenuItem | None: @@ - def _on_toggle_data_sharing(self, evt): + def _on_toggle_data_sharing(self, evt: wx.CommandEvent) -> None:As per coding guidelines,
**/*.py: Always use type hints for function parameters and return types in Python code.Also applies to: 171-185
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/odemis/gui/cont/menu.py` around lines 46 - 54, Annotate the constructor and the new consent menu handlers with explicit type hints: update __init__ to declare parameter types (e.g., main_data: MainGUIData, main_frame: wx.Frame, data_collector: DataCollector) and the return type -> None; then locate the consent-related handler functions referenced around lines 171-185 and add full parameter and return type annotations (e.g., event: wx.Event or appropriate event type -> None, any other params typed to their domain types). Ensure you import or reference the types (MainGUIData, DataCollector, wx.Frame/Event) at the top of the module so the annotations are valid.src/odemis/gui/win/consent.py (1)
16-59: Add docstrings for the new dialog methods.The event handlers and initializer are new functions and should follow the project docstring rule.
♻️ Proposed fix
def __init__(self, parent: wx.Window, remind_days: int) -> None: + """ + Initialize the consent dialog. + + :param parent: Parent window for the dialog. + :param remind_days: Number of days before prompting again. + """ title = "Share data with Delmic" @@ def _on_opt_in(self, _evt: wx.CommandEvent) -> None: + """ + Handle the opt-in button. + """ self.EndModal(self.RESULT_OPT_IN) def _on_opt_out(self, _evt: wx.CommandEvent) -> None: + """ + Handle the opt-out button. + """ self.EndModal(self.RESULT_OPT_OUT) def _on_remind_later(self, _evt: wx.CommandEvent) -> None: + """ + Handle the remind-later button. + """ self.EndModal(self.RESULT_REMIND_LATER) def _on_close(self, _evt: wx.CloseEvent) -> None: + """ + Handle closing the dialog without an explicit choice. + """ self.EndModal(self.RESULT_REMIND_LATER)As per coding guidelines,
**/*.py: Include docstrings for all functions and classes, following the reStructuredText style guide, without type information and without using inline formatting markers or backticks.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/odemis/gui/win/consent.py` around lines 16 - 59, Add reStructuredText docstrings to the Consent dialog methods: document the __init__ constructor and each event handler method (_on_opt_in, _on_opt_out, _on_remind_later, _on_close). For each docstring include a one-sentence description of the method's purpose and, where helpful, describe important parameters (e.g., evt) and the effect (which modal result is returned) using plain text (no type info, no inline code/backticks), following the project's reST style conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/odemis/acq/feature.py`:
- Around line 573-584: The code currently sets channel_key = None when MD_OUT_WL
is missing, causing all such streams to collide and only the first to be kept;
change to use a per-stream fallback like the stream's position so missing
MD_OUT_WL yields a unique key. Iterate with an index over the concatenated list
(fm_zstacks + fm_images) and set channel_key =
s.raw[0].metadata.get(model.MD_OUT_WL, f"fallback:{i}") (or similar unique
identifier from the stream) before checking seen_channels; update references to
selected_fm and seen_channels accordingly.
- Around line 539-638: The race is that multiple threads can pass the initial if
not feature.collect check and all run until feature.collect is set False at the
end; make the one-shot flip atomic by performing an immediate compare-and-set at
the start of the routine (e.g., in collect_feature_data): replace the plain
boolean check of feature.collect with an atomic operation that sets
feature.collect to False only if it was True (or acquire a per-feature lock
keyed by feature id, check feature.collect and set it False while holding the
lock), and if the CAS/lock indicates collect was already False return early;
reference feature.collect and the top-level routine (collect_feature_data in
this file) when adding the atomic CAS or lock.
In `@src/odemis/acq/test/feature_test.py`:
- Around line 349-365: The test test_payload_has_no_feature_name only checks
that f.name.value ("my_secret_feature_name") is not present among payload keys;
update the assertions to also ensure the feature name does not appear in payload
values by checking captured.values() and the stringified values (for
nested/serialized values) after collect_feature_data runs — e.g., add assertions
using self.assertNotIn("my_secret_feature_name", captured.values()) and
self.assertNotIn("my_secret_feature_name", str(list(captured.values()))) to the
test (keep existing fake_record/captured usage and collect_feature_data
invocation).
In `@src/odemis/gui/cont/features.py`:
- Line 31: The module imports Dict and List from typing but uses Optional in the
annotation for self._status_collect_feature (type Optional[CryoFeature]) which
is undefined; update the import statement that currently lists Dict and List to
also import Optional so Optional is available for the annotation (refer to the
import line and the attribute self._status_collect_feature / CryoFeature usage
to locate the change).
In `@src/odemis/gui/cont/tabs/cryo_chamber_tab.py`:
- Around line 388-389: The code sets
self.tab_data_model.main.features_collectable using random.random() and
FEATURE_COLLECT_PROBABILITY but those names are not defined; import the random
module at the top of this module and add a module-level constant
FEATURE_COLLECT_PROBABILITY (e.g., a float like 0.1) before it’s used so
_change_project_conf() and any code referencing features_collectable (and
class/attribute names like tab_data_model.main.features_collectable) do not
raise NameError.
In `@src/odemis/gui/main.py`:
- Around line 436-441: The consent dialog updates consent via
self._data_collector.set_consent(...) but the Help > Share data menu checkbox
isn't refreshed; add a helper method refresh_data_sharing_state(self) in the
menu controller (e.g. class in src/odemis/gui/cont/menu.py) that does if
self._consent_menu_item is not None:
self._consent_menu_item.Check(self._data_collector.get_consent() is True), then
call that helper after the dialog result handling in main.py (after the branches
that call set_consent(True/False) or postpone_consent()) so the menu checkbox
reflects the new persisted consent state immediately.
In `@src/odemis/util/datacollector.py`:
- Around line 588-604: The loop currently does "if
self._process_pending_zips(...): continue" which starves new in-memory records;
change _run so that when _process_pending_zips(...) returns True you still
attempt to drain or process queued items instead of skipping the queue.get()
step — e.g., replace the continue with a non-blocking attempt to fetch work (use
self._queue.get_nowait() in a try/except queue.Empty or
self._queue.get(timeout=0.0)) and call self._process_work_item(item) if you get
one; keep the existing exception handling (_schedule_retry, logging.exception)
and avoid a tight busy-loop by falling back to the original blocking
self._queue.get(timeout=1.0) when no items are available.
- Around line 328-373: The event_name and payload keys are used directly to
build filesystem names (safe_event, zip_name, arc_name, extra_files paths) which
allows path traversal or unsafe ZIP entries; add a sanitization step that
normalizes item.event_name and each payload key into safe filenames (strip or
replace path separators like "/" and "\" and sequences like "..", collapse to a
whitelist of allowed chars such as alphanumerics, hyphen, underscore, enforce a
max length like 64) before using them to construct zip_name, tmp_dir children,
or arc_name; apply this sanitizer to safe_event, every arc_name (e.g., when
creating extra_{key}.json or {key}.h5/.ome.tiff) and when writing to tmp_dir or
adding to extra_files so no user-supplied string can escape tmp_dir or create
dangerous ZIP entries (update references in the code around safe_event,
zip_name, arc_name, payload_meta assignments, and extra_files population).
In `@src/odemis/util/dc_fetch.py`:
- Around line 40-47: The ISO parse path must normalize a trailing 'Z' before
calling datetime.fromisoformat to maintain Python 3.10 compatibility: in the
code handling text (the branch that calls datetime.fromisoformat(text)), detect
and replace a trailing 'Z' (or '+00:00' equivalent if present) with '+00:00' or
otherwise remove it so fromisoformat won't raise ValueError, then proceed to set
tzinfo to timezone.utc when parsed.tzinfo is None and use
parsed.astimezone(timezone.utc) when it has tzinfo; update the logic around the
parsed = datetime.fromisoformat(text) call accordingly.
---
Nitpick comments:
In `@src/odemis/gui/cont/menu.py`:
- Around line 46-54: Annotate the constructor and the new consent menu handlers
with explicit type hints: update __init__ to declare parameter types (e.g.,
main_data: MainGUIData, main_frame: wx.Frame, data_collector: DataCollector) and
the return type -> None; then locate the consent-related handler functions
referenced around lines 171-185 and add full parameter and return type
annotations (e.g., event: wx.Event or appropriate event type -> None, any other
params typed to their domain types). Ensure you import or reference the types
(MainGUIData, DataCollector, wx.Frame/Event) at the top of the module so the
annotations are valid.
In `@src/odemis/gui/win/consent.py`:
- Around line 16-59: Add reStructuredText docstrings to the Consent dialog
methods: document the __init__ constructor and each event handler method
(_on_opt_in, _on_opt_out, _on_remind_later, _on_close). For each docstring
include a one-sentence description of the method's purpose and, where helpful,
describe important parameters (e.g., evt) and the effect (which modal result is
returned) using plain text (no type info, no inline code/backticks), following
the project's reST style conventions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2aab52b5-1622-44b1-9a6a-a474e984fe44
📒 Files selected for processing (15)
debian/controlscripts/odemis-dc-fetch.pysrc/odemis/acq/feature.pysrc/odemis/acq/test/feature_test.pysrc/odemis/acq/test/test-features.jsonsrc/odemis/gui/cont/features.pysrc/odemis/gui/cont/menu.pysrc/odemis/gui/cont/tabs/cryo_chamber_tab.pysrc/odemis/gui/main.pysrc/odemis/gui/model/tab_gui_data.pysrc/odemis/gui/win/consent.pysrc/odemis/util/datacollector.pysrc/odemis/util/dc_fetch.pysrc/odemis/util/test/datacollector_test.pysrc/odemis/util/test/dc_fetch_test.py
| if not feature.collect: | ||
| return | ||
|
|
||
| try: | ||
| _dc = DataCollector() | ||
| if not _dc.get_consent(): | ||
| return | ||
| except Exception: | ||
| logging.exception("collect_feature_data: failed to access DataCollector; skipping.") | ||
| return | ||
|
|
||
| try: | ||
|
|
||
| # Load feature streams from disk when not yet in memory. | ||
| if not feature.streams.value and project_dir: | ||
| try: | ||
| load_feature_streams_from_disk(feature, project_dir) | ||
| except Exception: | ||
| logging.exception( | ||
| "collect_feature_data: failed to load streams; skipping.") | ||
| return | ||
|
|
||
| feature_streams = list(feature.streams.value) | ||
|
|
||
| # Collect first z-stack per FM channel; fall back to first FM image per channel. | ||
| fm_zstacks: List = [] | ||
| fm_images: List = [] | ||
| for s in feature_streams: | ||
| if isinstance(s, StaticFluoStream): | ||
| if _is_zstack_stream(s): | ||
| fm_zstacks.append(s) | ||
| else: | ||
| fm_images.append(s) | ||
|
|
||
| # Per channel: prefer z-stack, then plain FM image. | ||
| # Channels are delineated by MD_OUT_WL; use index as fallback. | ||
| selected_fm: List = [] | ||
| seen_channels: set = set() | ||
| for s in fm_zstacks + fm_images: | ||
| try: | ||
| channel_key = s.raw[0].metadata.get(model.MD_OUT_WL) | ||
| except (IndexError, AttributeError): | ||
| channel_key = None | ||
| if channel_key not in seen_channels: | ||
| seen_channels.add(channel_key) | ||
| selected_fm.append(s) | ||
|
|
||
| # Collect spatially overlapping overview streams. | ||
| stage_pos = feature.stage_position.value | ||
| feat_x = stage_pos.get("x", 0.0) | ||
| feat_y = stage_pos.get("y", 0.0) | ||
|
|
||
| overview_fm: List = [] | ||
| overview_sem: List = [] | ||
| for s in (overview_streams or []): | ||
| if not _stream_overlaps_position(s, feat_x, feat_y): | ||
| continue | ||
| if isinstance(s, StaticFluoStream): | ||
| overview_fm.append(s) | ||
| elif isinstance(s, (StaticSEMStream, StaticFIBStream)): | ||
| overview_sem.append(s) | ||
|
|
||
| # Build privacy-preserving payload — generic keys, no names or filenames. | ||
| payload: dict = { | ||
| "status": feature.status.value, | ||
| "stage_position": dict(stage_pos), | ||
| "fm_focus_position": dict(feature.fm_focus_position.value), | ||
| } | ||
|
|
||
| def _get_raw(stream: "Stream") -> Optional["model.DataArray"]: | ||
| try: | ||
| return stream.raw[0] if stream.raw else None | ||
| except Exception: | ||
| return None | ||
|
|
||
| for idx, s in enumerate(selected_fm): | ||
| da = _get_raw(s) | ||
| if da is not None: | ||
| payload[f"channel_{idx}"] = da | ||
|
|
||
| for idx, s in enumerate(overview_fm): | ||
| da = _get_raw(s) | ||
| if da is not None: | ||
| payload[f"overview_fm_{idx}"] = da | ||
|
|
||
| for idx, s in enumerate(overview_sem): | ||
| da = _get_raw(s) | ||
| if da is not None: | ||
| payload[f"overview_sem_{idx}"] = da | ||
|
|
||
| image_keys = [k for k in payload if k.startswith(("channel_", "overview_fm_", "overview_sem_"))] | ||
| if not image_keys: | ||
| logging.debug( | ||
| "collect_feature_data: no images found for feature; skipping upload." | ||
| ) | ||
| return | ||
|
|
||
| _dc.record("feature_collected", "1.0", payload) | ||
|
|
||
| feature.collect = False |
There was a problem hiding this comment.
Make the one-shot collect transition atomic.
Concurrent status/posture/delete triggers can all pass Line 539 before Line 638 flips feature.collect, causing duplicate submissions for the same feature.
Proposed fix
FEATURE_COLLECT_PROBABILITY = 0.2
+_COLLECTION_STATE_LOCK = threading.Lock()- _dc.record("feature_collected", "1.0", payload)
-
- feature.collect = False
+ with _COLLECTION_STATE_LOCK:
+ if not feature.collect:
+ return
+ feature.collect = False
+
+ try:
+ _dc.record("feature_collected", "1.0", payload)
+ except Exception:
+ with _COLLECTION_STATE_LOCK:
+ feature.collect = True
+ raise
logging.debug("collect_feature_data: submitted feature data for collection.")🧰 Tools
🪛 Ruff (0.15.10)
[warning] 611-611: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/odemis/acq/feature.py` around lines 539 - 638, The race is that multiple
threads can pass the initial if not feature.collect check and all run until
feature.collect is set False at the end; make the one-shot flip atomic by
performing an immediate compare-and-set at the start of the routine (e.g., in
collect_feature_data): replace the plain boolean check of feature.collect with
an atomic operation that sets feature.collect to False only if it was True (or
acquire a per-feature lock keyed by feature id, check feature.collect and set it
False while holding the lock), and if the CAS/lock indicates collect was already
False return early; reference feature.collect and the top-level routine
(collect_feature_data in this file) when adding the atomic CAS or lock.
| # Per channel: prefer z-stack, then plain FM image. | ||
| # Channels are delineated by MD_OUT_WL; use index as fallback. | ||
| selected_fm: List = [] | ||
| seen_channels: set = set() | ||
| for s in fm_zstacks + fm_images: | ||
| try: | ||
| channel_key = s.raw[0].metadata.get(model.MD_OUT_WL) | ||
| except (IndexError, AttributeError): | ||
| channel_key = None | ||
| if channel_key not in seen_channels: | ||
| seen_channels.add(channel_key) | ||
| selected_fm.append(s) |
There was a problem hiding this comment.
Use a real per-stream fallback channel key.
The comment says “use index as fallback,” but Line 581 sets channel_key = None; every stream missing MD_OUT_WL then shares the same key, so only the first such FM stream is collected.
Proposed fix
- for s in fm_zstacks + fm_images:
+ for idx, s in enumerate(fm_zstacks + fm_images):
+ channel_key = ("index", idx)
try:
- channel_key = s.raw[0].metadata.get(model.MD_OUT_WL)
- except (IndexError, AttributeError):
- channel_key = None
+ out_wl = s.raw[0].metadata.get(model.MD_OUT_WL)
+ if out_wl is not None:
+ channel_key = ("out_wl", repr(out_wl))
+ except (IndexError, AttributeError, TypeError):
+ pass
if channel_key not in seen_channels:
seen_channels.add(channel_key)
selected_fm.append(s)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/odemis/acq/feature.py` around lines 573 - 584, The code currently sets
channel_key = None when MD_OUT_WL is missing, causing all such streams to
collide and only the first to be kept; change to use a per-stream fallback like
the stream's position so missing MD_OUT_WL yields a unique key. Iterate with an
index over the concatenated list (fm_zstacks + fm_images) and set channel_key =
s.raw[0].metadata.get(model.MD_OUT_WL, f"fallback:{i}") (or similar unique
identifier from the stream) before checking seen_channels; update references to
selected_fm and seen_channels accordingly.
| def test_payload_has_no_feature_name(self): | ||
| """Payload must not contain the feature name string as a key or value.""" | ||
| f = self._make_feature_with_stream(collect=True) | ||
| f.name.value = "my_secret_feature_name" | ||
| captured = {} | ||
|
|
||
| def fake_record(event_name, schema_version, payload, **kwargs): | ||
| captured.update(payload) | ||
|
|
||
| with patch("odemis.acq.feature.DataCollector") as MockDC: | ||
| MockDC.return_value.get_consent.return_value = True | ||
| MockDC.return_value.record.side_effect = fake_record | ||
| collect_feature_data(f) | ||
|
|
||
| self.assertNotIn("my_secret_feature_name", captured) | ||
| self.assertNotIn("my_secret_feature_name", str(captured.keys())) | ||
|
|
There was a problem hiding this comment.
Assert the feature name is absent from payload values too.
The docstring says “as a key or value,” but the assertions only inspect keys. A leaked feature name in "status", metadata, or another value would not fail this test.
Proposed fix
self.assertNotIn("my_secret_feature_name", captured)
self.assertNotIn("my_secret_feature_name", str(captured.keys()))
+ self.assertNotIn("my_secret_feature_name", str(captured.values()))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/odemis/acq/test/feature_test.py` around lines 349 - 365, The test
test_payload_has_no_feature_name only checks that f.name.value
("my_secret_feature_name") is not present among payload keys; update the
assertions to also ensure the feature name does not appear in payload values by
checking captured.values() and the stringified values (for nested/serialized
values) after collect_feature_data runs — e.g., add assertions using
self.assertNotIn("my_secret_feature_name", captured.values()) and
self.assertNotIn("my_secret_feature_name", str(list(captured.values()))) to the
test (keep existing fake_record/captured usage and collect_feature_data
invocation).
| import threading | ||
|
|
||
| import wx | ||
| from typing import Dict, List |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
head -n 110 src/odemis/gui/cont/features.py | tail -n 80Repository: delmic/odemis
Length of output: 3014
Import Optional from typing.
Line 98 uses Optional[CryoFeature] in the type annotation for self._status_collect_feature, but line 31 only imports Dict and List from typing. This causes an undefined name error and must be fixed for Python 3.10+ code validity.
Proposed fix
-from typing import Dict, List
+from typing import Dict, List, Optional📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from typing import Dict, List | |
| from typing import Dict, List, Optional |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/odemis/gui/cont/features.py` at line 31, The module imports Dict and List
from typing but uses Optional in the annotation for self._status_collect_feature
(type Optional[CryoFeature]) which is undefined; update the import statement
that currently lists Dict and List to also import Optional so Optional is
available for the annotation (refer to the import line and the attribute
self._status_collect_feature / CryoFeature usage to locate the change).
| if result == ConsentDialog.RESULT_OPT_IN: | ||
| self._data_collector.set_consent(True) | ||
| elif result == ConsentDialog.RESULT_OPT_OUT: | ||
| self._data_collector.set_consent(False) | ||
| else: | ||
| self._data_collector.postpone_consent() |
There was a problem hiding this comment.
Keep the menu consent checkbox synchronized after the dialog.
The menu item is initialized before this dialog runs, so opting in here leaves Help > Share data unchecked while collection consent is enabled.
🛠️ Proposed fix
if result == ConsentDialog.RESULT_OPT_IN:
self._data_collector.set_consent(True)
elif result == ConsentDialog.RESULT_OPT_OUT:
self._data_collector.set_consent(False)
else:
self._data_collector.postpone_consent()
+
+ if self._menu_controller is not None:
+ self._menu_controller.refresh_data_sharing_state()Add this helper in src/odemis/gui/cont/menu.py:
def refresh_data_sharing_state(self) -> None:
"""Refresh the data sharing menu checkbox from persisted consent."""
if self._consent_menu_item is not None:
self._consent_menu_item.Check(self._data_collector.get_consent() is True)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/odemis/gui/main.py` around lines 436 - 441, The consent dialog updates
consent via self._data_collector.set_consent(...) but the Help > Share data menu
checkbox isn't refreshed; add a helper method refresh_data_sharing_state(self)
in the menu controller (e.g. class in src/odemis/gui/cont/menu.py) that does if
self._consent_menu_item is not None:
self._consent_menu_item.Check(self._data_collector.get_consent() is True), then
call that helper after the dialog result handling in main.py (after the branches
that call set_consent(True/False) or postpone_consent()) so the menu checkbox
reflects the new persisted consent state immediately.
| ``consent`` is always present as ``none`` / ``true`` / ``false``. | ||
| ``reminder_date`` is commented out until explicitly set, then written | ||
| in ``YYYY-MM-DD`` format. | ||
| """ | ||
| self.file_path.parent.mkdir(parents=True, exist_ok=True) | ||
|
|
||
| consent_val = self.consent | ||
| remind_val = self.remind_date | ||
|
|
||
| if consent_val is True: | ||
| consent_line = "consent = true" | ||
| elif consent_val is False: | ||
| consent_line = "consent = false" | ||
| else: | ||
| consent_line = "consent = none" |
There was a problem hiding this comment.
Handle persisted consent = none when reading config.
_write() persists undecided consent as none, but getboolean() raises ValueError for that value. A fresh DataCollectorConfig() reading the file can then break consent checks instead of returning None.
Proposed fix
`@property`
def consent(self) -> Optional[bool]:
"""Return the consent state, or ``None`` if not yet set."""
try:
- return self._cp.getboolean("general", "consent")
+ value = self._cp.get("general", "consent").strip().lower()
except (configparser.NoSectionError, configparser.NoOptionError):
return None
+ if value in ("", "none", "null"):
+ return None
+ if value in ("1", "yes", "true", "on"):
+ return True
+ if value in ("0", "no", "false", "off"):
+ return False
+ logging.warning("Invalid datacollector consent value %r; treating as undecided.", value)
+ return NoneAlso applies to: 190-195
| # Limit event_name length so the filename stays within filesystem limits. | ||
| safe_event = item.event_name[:64] if item.event_name else "event" | ||
| zip_name = f"{safe_event}-{timestamp_str}-{uuid8}.zip" | ||
|
|
||
| tmp_dir = Path(tempfile.mkdtemp(prefix="dc_")) | ||
| try: | ||
| payload_meta: dict = {} | ||
| extra_files: list = [] # list of (arcname, abs_path) | ||
|
|
||
| for key, value in item.payload.items(): | ||
| if value is None or isinstance(value, (str, int, float, bool)): | ||
| payload_meta[key] = value | ||
|
|
||
| elif isinstance(value, numpy.ndarray): | ||
| if item.image_format.upper() == "HDF5": | ||
| arc_name = f"{key}.h5" | ||
| abs_path = tmp_dir / arc_name | ||
| try: | ||
| da = value if isinstance(value, model.DataArray) else model.DataArray(value) | ||
| hdf5.export(str(abs_path), da) | ||
| except Exception: | ||
| logging.exception("Failed to export DataArray to HDF5 at %s", abs_path) | ||
| abs_path = None | ||
| else: | ||
| arc_name = f"{key}.ome.tiff" | ||
| abs_path = tmp_dir / arc_name | ||
| try: | ||
| da = value if isinstance(value, model.DataArray) else model.DataArray(value) | ||
| tiff.export(str(abs_path), da) | ||
| except Exception: | ||
| logging.exception("Failed to export DataArray to TIFF at %s", abs_path) | ||
| abs_path = None | ||
|
|
||
| if abs_path is not None and abs_path.exists(): | ||
| extra_files.append((arc_name, abs_path)) | ||
| payload_meta[key] = arc_name | ||
| else: | ||
| payload_meta[key] = None | ||
| payload_meta["export_error"] = True | ||
|
|
||
| elif isinstance(value, (dict, list)): | ||
| arc_name = f"extra_{key}.json" | ||
| abs_path = tmp_dir / arc_name | ||
| abs_path.write_text(json.dumps(value, default=str), encoding="utf-8") | ||
| extra_files.append((arc_name, abs_path)) | ||
| payload_meta[key] = arc_name |
There was a problem hiding this comment.
Sanitize event names and payload keys before using them as paths.
event_name and payload keys flow into filesystem paths and ZIP arcnames. Values containing / or .. can write outside the queue/temp directory or create unsafe ZIP entries.
Proposed fix
+import re+_SAFE_FILENAME_RE = re.compile(r"[^A-Za-z0-9_.-]+")
+
+
+def _safe_name(value: object, fallback: str, max_length: int = 64) -> str:
+ """Return a filesystem-safe component for generated archive names."""
+ safe = _SAFE_FILENAME_RE.sub("_", str(value).strip())[:max_length].strip("._-")
+ return safe or fallback
+
+
def _serialize(item: _WorkItem, queue_dir: Path) -> Path:- # Limit event_name length so the filename stays within filesystem limits.
- safe_event = item.event_name[:64] if item.event_name else "event"
+ safe_event = _safe_name(item.event_name, "event")
zip_name = f"{safe_event}-{timestamp_str}-{uuid8}.zip" for key, value in item.payload.items():
+ safe_key = _safe_name(key, "payload")
if value is None or isinstance(value, (str, int, float, bool)):
payload_meta[key] = value
elif isinstance(value, numpy.ndarray):
if item.image_format.upper() == "HDF5":
- arc_name = f"{key}.h5"
+ arc_name = f"{safe_key}.h5"
abs_path = tmp_dir / arc_name
@@
else:
- arc_name = f"{key}.ome.tiff"
+ arc_name = f"{safe_key}.ome.tiff"
abs_path = tmp_dir / arc_name
@@
elif isinstance(value, (dict, list)):
- arc_name = f"extra_{key}.json"
+ arc_name = f"extra_{safe_key}.json"
abs_path = tmp_dir / arc_name🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/odemis/util/datacollector.py` around lines 328 - 373, The event_name and
payload keys are used directly to build filesystem names (safe_event, zip_name,
arc_name, extra_files paths) which allows path traversal or unsafe ZIP entries;
add a sanitization step that normalizes item.event_name and each payload key
into safe filenames (strip or replace path separators like "/" and "\" and
sequences like "..", collapse to a whitelist of allowed chars such as
alphanumerics, hyphen, underscore, enforce a max length like 64) before using
them to construct zip_name, tmp_dir children, or arc_name; apply this sanitizer
to safe_event, every arc_name (e.g., when creating extra_{key}.json or
{key}.h5/.ome.tiff) and when writing to tmp_dir or adding to extra_files so no
user-supplied string can escape tmp_dir or create dangerous ZIP entries (update
references in the code around safe_event, zip_name, arc_name, payload_meta
assignments, and extra_files population).
| def _run(self) -> None: | ||
| """Main loop: process items until the thread is stopped.""" | ||
| while True: | ||
| try: | ||
| if self._process_pending_zips(self._queue_dir): | ||
| continue | ||
| except Exception: | ||
| logging.exception("DataCollector error while processing pending uploads") | ||
| self._schedule_retry() | ||
| continue | ||
| try: | ||
| item = self._queue.get(timeout=1.0) | ||
| except queue.Empty: | ||
| continue | ||
|
|
||
| try: | ||
| self._process_work_item(item) |
There was a problem hiding this comment.
Do not starve new records while pending uploads are backing off.
When _process_pending_zips() returns True during retry backoff, Line 593 immediately loops and skips _queue.get(). New samples then remain only in memory until uploads recover, so they can be lost on shutdown.
Proposed fix
def _run(self) -> None:
"""Main loop: process items until the thread is stopped."""
while True:
try:
- if self._process_pending_zips(self._queue_dir):
- continue
+ self._process_pending_zips(self._queue_dir)
except Exception:
logging.exception("DataCollector error while processing pending uploads")
self._schedule_retry()
- continue
try:
item = self._queue.get(timeout=1.0)
except queue.Empty:
continue🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/odemis/util/datacollector.py` around lines 588 - 604, The loop currently
does "if self._process_pending_zips(...): continue" which starves new in-memory
records; change _run so that when _process_pending_zips(...) returns True you
still attempt to drain or process queued items instead of skipping the
queue.get() step — e.g., replace the continue with a non-blocking attempt to
fetch work (use self._queue.get_nowait() in a try/except queue.Empty or
self._queue.get(timeout=0.0)) and call self._process_work_item(item) if you get
one; keep the existing exception handling (_schedule_retry, logging.exception)
and avoid a tight busy-loop by falling back to the original blocking
self._queue.get(timeout=1.0) when no items are available.
| text = value.strip() | ||
| if len(text) == 10: | ||
| parsed = datetime.strptime(text, "%Y-%m-%d") | ||
| return parsed.replace(tzinfo=timezone.utc) | ||
| parsed = datetime.fromisoformat(text) | ||
| if parsed.tzinfo is None: | ||
| return parsed.replace(tzinfo=timezone.utc) | ||
| return parsed.astimezone(timezone.utc) |
There was a problem hiding this comment.
❓ Verification inconclusive
Script executed:
#!/bin/bash
python3.10 - <<'PY'
from datetime import datetime
datetime.fromisoformat("2026-03-01T12:30:00Z")
PYRepository: delmic/odemis
Repository: delmic/odemis
Exit code: 127
stderr:
/bin/bash: line 2: python3.10: command not found
Script executed:
# First, locate and examine the target file
head -60 src/odemis/util/dc_fetch.py | tail -30Repository: delmic/odemis
Repository: delmic/odemis
Exit code: 0
stdout:
from odemis.util.datacollector import DataCollectorConfig, S3UploadBackend
def parse_since_utc(value: str) -> datetime:
"""Parse a date/datetime string to UTC-aware datetime.
Accepts ISO-8601 date (`YYYY-MM-DD`) and datetime (`YYYY-MM-DDTHH:MM:SS` with optional timezone).
"""
text = value.strip()
if len(text) == 10:
parsed = datetime.strptime(text, "%Y-%m-%d")
return parsed.replace(tzinfo=timezone.utc)
parsed = datetime.fromisoformat(text)
if parsed.tzinfo is None:
return parsed.replace(tzinfo=timezone.utc)
return parsed.astimezone(timezone.utc)
def parse_key_timestamp_utc(key: str) -> Optional[datetime]:
"""Parse `<event>-<YYYYMMDDTHHmmss>-<uuid>.zip` timestamp from key basename."""
name = Path(key).name
if not name.endswith(".zip"):
return None
stem = name[:-4]
parts = stem.rsplit("-", 2)
if len(parts) != 3:
return None
ts = parts[1]
try:
Script executed:
# Check Python version requirements in the project
find . -name "setup.py" -o -name "setup.cfg" -o -name "pyproject.toml" | head -5 | xargs cat 2>/dev/null | grep -i "python" | head -20Repository: delmic/odemis
Repository: delmic/odemis
Exit code: 0
stdout:
#!/usr/bin/env python3
Run `python setup.py build_ext --inplace` to compile
#!/usr/bin/env python3
# sudo apt-get install python-setuptools cython
# python3 setup.py build_ext --inplace
"Programming Language :: Python",
Script executed:
# Also check if there are any existing tests for this function
find . -name "*test*dc_fetch*" -o -path "*/test*" -name "*dc_fetch*" 2>/dev/nullRepository: delmic/odemis
Repository: delmic/odemis
Exit code: 0
stdout:
./src/odemis/util/test/dc_fetch_test.py
Python 3.10 datetime.fromisoformat() support for trailing Z timezone
No, Python 3.10's datetime.fromisoformat does not support trailing 'Z' timezone suffix. It raises ValueError for strings like '2011-11-04T00:05:23Z'. Support was added in Python 3.11, which expanded parsing to most ISO 8601 formats including 'Z' as UTC alias. In 3.10, it only parses formats matching datetime.isoformat output, which uses '+00:00' not 'Z'. Official 3.10 docs explicitly state it does not support arbitrary ISO 8601 strings. Python 3.10 receives only security fixes, no feature backports. Workaround: replace('Z', '+00:00').
Citations:
- 1: https://docs.python.org/release/3.10.10/library/datetime.html
- 2: https://docs.python.org/release/3.10.19/library/datetime.html
- 3: Datetime fromisoformat does not parse expected string python/cpython#128686
- 4: "ValueError: Invalid isoformat string" from a valid string python/cpython#90187
- 5: https://stackoverflow.com/questions/75867446/documentation-example-for-datetime-fromisoformat-raises-invalid-isoformat-string
- 6: https://www.reddit.com/r/Python/comments/163ugsm/psa_as_of_python_311_datetimefromisoformat/
- 7: https://docs.python.org/3/library/datetime.html
Script executed:
# Find Python version requirement in project root
find . -maxdepth 2 -name "setup.py" -o -name "setup.cfg" -o -name "pyproject.toml" -o -name "README.md" | xargs grep -l "python" 2>/dev/null | head -3Repository: delmic/odemis
Repository: delmic/odemis
Exit code: 0
stdout:
./setup.py
Script executed:
# Check the test file for parse_since_utc
cat src/odemis/util/test/dc_fetch_test.pyRepository: delmic/odemis
Repository: delmic/odemis
Exit code: 0
stdout:
#!/usr/bin/env python3
"""
Created on 11 March 2026
`@author`: Karishma Kumar
Copyright © 2026 Karishma Kumar, Delmic
This file is part of Odemis.
Odemis is free software: you can redistribute it and/or modify it under the
terms of the GNU General Public License version 2 as published by the Free
Software Foundation.
Odemis is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
Odemis. If not, see http://www.gnu.org/licenses/.
"""
import tempfile
import unittest
from datetime import datetime, timezone
from pathlib import Path
from unittest.mock import Mock, patch
from odemis.util import dc_fetch
class DCFetchTest(unittest.TestCase):
"""Unit tests for S3 retrieval helpers."""
def test_parse_since_utc_date(self) -> None:
"""Date input should parse as UTC midnight."""
parsed = dc_fetch.parse_since_utc("2026-03-22")
self.assertEqual(parsed, datetime(2026, 3, 22, 0, 0, 0, tzinfo=timezone.utc))
def test_parse_key_timestamp(self) -> None:
"""Timestamp should be parsed from key basename."""
parsed = dc_fetch.parse_key_timestamp_utc("host/z_stack_acquired-20260322T104530-a1b2c3d4.zip")
self.assertEqual(parsed, datetime(2026, 3, 22, 10, 45, 30, tzinfo=timezone.utc))
def test_should_download_key_filters(self) -> None:
"""Event and since filters should both be enforced."""
key = "host/z_stack_acquired-20260322T104530-a1b2c3d4.zip"
since_before = datetime(2026, 3, 22, 10, 0, 0, tzinfo=timezone.utc)
since_after = datetime(2026, 3, 22, 11, 0, 0, tzinfo=timezone.utc)
self.assertTrue(dc_fetch.should_download_key(key, "z_stack_acquired", since_before))
self.assertFalse(dc_fetch.should_download_key(key, "other_event", since_before))
self.assertFalse(dc_fetch.should_download_key(key, "z_stack_acquired", since_after))
def test_iter_s3_objects_paginates(self) -> None:
"""S3 iterator should follow continuation tokens."""
client = Mock()
client.list_objects_v2.side_effect = [
{
"Contents": [{"Key": "host/a.zip"}],
"IsTruncated": True,
"NextContinuationToken": "token-1",
},
{
"Contents": [{"Key": "host/b.zip"}],
"IsTruncated": False,
},
]
keys = [item["Key"] for item in dc_fetch.iter_s3_objects(client, "bucket", "host/")]
self.assertEqual(keys, ["host/a.zip", "host/b.zip"])
self.assertEqual(client.list_objects_v2.call_count, 2)
def test_parse_host_filters_comma_list(self) -> None:
"""Host parser should accept comma-separated values and normalize them."""
hosts = dc_fetch.parse_host_filters("meteor-5099, atlas-001 ,/secom-22/")
self.assertEqual(hosts, ["meteor-5099", "atlas-001", "secom-22"])
def test_fetch_samples_downloads_matching_keys(self) -> None:
"""Fetch flow should download matching keys and report counters."""
with tempfile.TemporaryDirectory(prefix="dc_fetch_") as tmp_dir:
output_dir = Path(tmp_dir)
client = Mock()
client.list_objects_v2.return_value = {
"Contents": [
{"Key": "host/evt-20260322T100000-aaaa1111.zip"},
{"Key": "host/other-20260322T100000-bbbb2222.zip"},
],
"IsTruncated": False,
}
def _download_file(_bucket: str, _key: str, filename: str) -> None:
Path(filename).write_bytes(b"zip")
client.download_file.side_effect = _download_file
with patch("odemis.util.dc_fetch.build_s3_client_from_config", return_value=(client, "bucket")):
result = dc_fetch.fetch_samples(
event_filter="evt",
since_utc=datetime(2026, 3, 22, 9, 0, 0, tzinfo=timezone.utc),
output_dir=output_dir,
)
self.assertEqual(result["listed"], 2)
self.assertEqual(result["matched"], 1)
self.assertEqual(result["downloaded"], 1)
self.assertEqual(result["failed"], 0)
self.assertTrue((output_dir / "evt-20260322T100000-aaaa1111.zip").exists())
def test_fetch_samples_applies_host_filter_prefix(self) -> None:
"""Host filter should become the S3 list prefix."""
with tempfile.TemporaryDirectory(prefix="dc_fetch_") as tmp_dir:
output_dir = Path(tmp_dir)
client = Mock()
client.list_objects_v2.return_value = {"Contents": [], "IsTruncated": False}
with patch("odemis.util.dc_fetch.build_s3_client_from_config", return_value=(client, "bucket")):
dc_fetch.fetch_samples(
event_filter=None,
since_utc=None,
output_dir=output_dir,
host_filter="meteor-5099",
)
call_kwargs = client.list_objects_v2.call_args.kwargs
self.assertEqual(call_kwargs["Bucket"], "bucket")
self.assertEqual(call_kwargs["Prefix"], "meteor-5099/")
def test_fetch_samples_applies_multiple_host_prefixes(self) -> None:
"""Comma-separated hosts should trigger one listing call per host prefix."""
with tempfile.TemporaryDirectory(prefix="dc_fetch_") as tmp_dir:
output_dir = Path(tmp_dir)
client = Mock()
client.list_objects_v2.return_value = {"Contents": [], "IsTruncated": False}
with patch("odemis.util.dc_fetch.build_s3_client_from_config", return_value=(client, "bucket")):
dc_fetch.fetch_samples(
event_filter=None,
since_utc=None,
output_dir=output_dir,
host_filter="meteor-5099,atlas-001",
)
self.assertEqual(client.list_objects_v2.call_count, 2)
first_prefix = client.list_objects_v2.call_args_list[0].kwargs["Prefix"]
second_prefix = client.list_objects_v2.call_args_list[1].kwargs["Prefix"]
self.assertEqual(first_prefix, "meteor-5099/")
self.assertEqual(second_prefix, "atlas-001/")
def test_fetch_samples_passes_bucket_endpoint_region_overrides(self) -> None:
"""Overrides should be forwarded to the S3 client builder."""
with tempfile.TemporaryDirectory(prefix="dc_fetch_") as tmp_dir:
output_dir = Path(tmp_dir)
client = Mock()
client.list_objects_v2.return_value = {"Contents": [], "IsTruncated": False}
with patch("odemis.util.dc_fetch.build_s3_client_from_config", return_value=(client, "bucket")) as builder:
dc_fetch.fetch_samples(
event_filter=None,
since_utc=None,
output_dir=output_dir,
host_filter=None,
bucket_override="other-bucket",
endpoint_override="https://s3.eu-west-1.amazonaws.com",
region_override="eu-west-1",
)
kwargs = builder.call_args.kwargs
self.assertEqual(kwargs["bucket_override"], "other-bucket")
self.assertEqual(kwargs["endpoint_override"], "https://s3.eu-west-1.amazonaws.com")
self.assertEqual(kwargs["region_override"], "eu-west-1")
def test_build_s3_client_uses_backend_region_as_default(self) -> None:
"""build_s3_client_from_config should use backend._region when no override is given."""
from odemis.util.datacollector import S3UploadBackend, S3_REGION
backend = S3UploadBackend(
access_key="key",
secret_key="secret",
region=S3_REGION,
bucket="test-bucket",
)
mock_config = Mock()
mock_config.get_upload_backend.return_value = backend
with patch("boto3.client") as mock_boto3_client:
mock_boto3_client.return_value = Mock()
dc_fetch.build_s3_client_from_config(mock_config)
call_kwargs = mock_boto3_client.call_args.kwargs
self.assertEqual(call_kwargs.get("region_name"), S3_REGION)
# endpoint_url must be None so boto3 resolves the regional endpoint automatically
self.assertIsNone(call_kwargs.get("endpoint_url"))
if __name__ == "__main__":
unittest.main()
Script executed:
# Look for Python version spec in setup.py
head -100 setup.py 2>/dev/null | grep -A5 -B5 "python"Repository: delmic/odemis
Repository: delmic/odemis
Exit code: 0
stdout:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# To rebuild just the cython modules, use these commands:
# sudo apt-get install python-setuptools cython
# python3 setup.py build_ext --inplace
from setuptools import setup, find_packages, Extension
from Cython.Build import cythonize # Warning: must be _after_ setup import
import glob
import os
import subprocess
Normalize trailing Z before fromisoformat() for Python 3.10 compatibility.
UTC timestamps with trailing Z (e.g., 2026-03-01T12:30:00Z) are standard ISO 8601 format commonly provided as CLI input. However, datetime.fromisoformat() does not support the Z suffix in Python 3.10—it raises ValueError. Support was added in Python 3.11. Since the coding guidelines require Python 3.10+ compatibility, this code path needs a normalization step.
Proposed fix
text = value.strip()
if len(text) == 10:
parsed = datetime.strptime(text, "%Y-%m-%d")
return parsed.replace(tzinfo=timezone.utc)
+ if text.endswith("Z"):
+ text = text[:-1] + "+00:00"
parsed = datetime.fromisoformat(text)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| text = value.strip() | |
| if len(text) == 10: | |
| parsed = datetime.strptime(text, "%Y-%m-%d") | |
| return parsed.replace(tzinfo=timezone.utc) | |
| parsed = datetime.fromisoformat(text) | |
| if parsed.tzinfo is None: | |
| return parsed.replace(tzinfo=timezone.utc) | |
| return parsed.astimezone(timezone.utc) | |
| text = value.strip() | |
| if len(text) == 10: | |
| parsed = datetime.strptime(text, "%Y-%m-%d") | |
| return parsed.replace(tzinfo=timezone.utc) | |
| if text.endswith("Z"): | |
| text = text[:-1] + "+00:00" | |
| parsed = datetime.fromisoformat(text) | |
| if parsed.tzinfo is None: | |
| return parsed.replace(tzinfo=timezone.utc) | |
| return parsed.astimezone(timezone.utc) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/odemis/util/dc_fetch.py` around lines 40 - 47, The ISO parse path must
normalize a trailing 'Z' before calling datetime.fromisoformat to maintain
Python 3.10 compatibility: in the code handling text (the branch that calls
datetime.fromisoformat(text)), detect and replace a trailing 'Z' (or '+00:00'
equivalent if present) with '+00:00' or otherwise remove it so fromisoformat
won't raise ValueError, then proceed to set tzinfo to timezone.utc when
parsed.tzinfo is None and use parsed.astimezone(timezone.utc) when it has
tzinfo; update the logic around the parsed = datetime.fromisoformat(text) call
accordingly.
Builds on PR of Building the data framework #3444
Overview
Introduces per-project sampling for cryo feature data collection. When a project is opened or created, a single random decision is made (at 20% probability) that applies uniformly to all features created during that session. Features loaded from a previous session are immediately excluded from collection.
How it works
Single decision per project (
cryo_chamber_tab.py)_change_project_conf()is called every time a project is opened or created. It makes one random draw (random.random() \< FEATURE_COLLECT_PROBABILITY) and stores the result asmain.features_collectable— a lightweight dynamic attribute on the main GUI data model (not a formal VA, never persisted to disk).All new features inherit the decision (
tab_gui_data.py)CryoGUIData.add_new_feature()readsgetattr(self.main, "features_collectable", False)and passes it ascollect=when constructing eachCryoFeature. Every feature created in the same session therefore shares the same flag value — either all are eligible for collection or none are.Loaded features are excluded (
cryo_chamber_tab.py)In
_load_project_data(), after features are read fromfeatures.json, every loaded feature is immediately reset tocollect=Falsebefore being assigned to the model. Features from a prior session were either already collected or never selected; the new session's sampling decision applies only to features created going forward.CryoFeature.collectflag (feature.py)The
collectparameter onCryoFeature.__init__is a plainbool, defaulting toFalse. The flag is serialised intofeatures.jsonand survives a save/load cycle. If the key is absent in loaded JSON (older data), it defaults toFalse.