[feature][MSD-506] Meteor store annoted data by K4rishma · Pull Request #3451 · delmic/odemis

K4rishma · 2026-04-22T11:56:03Z

Builds on PR of Building the data framework #3444

Overview

Introduces per-project sampling for cryo feature data collection. When a project is opened or created, a single random decision is made (at 20% probability) that applies uniformly to all features created during that session. Features loaded from a previous session are immediately excluded from collection.

How it works

Single decision per project (cryo_chamber_tab.py)
_change_project_conf() is called every time a project is opened or created. It makes one random draw (random.random() \< FEATURE_COLLECT_PROBABILITY) and stores the result as main.features_collectable — a lightweight dynamic attribute on the main GUI data model (not a formal VA, never persisted to disk).

All new features inherit the decision (tab_gui_data.py)
CryoGUIData.add_new_feature() reads getattr(self.main, "features_collectable", False) and passes it as collect= when constructing each CryoFeature. Every feature created in the same session therefore shares the same flag value — either all are eligible for collection or none are.

Loaded features are excluded (cryo_chamber_tab.py)
In _load_project_data(), after features are read from features.json, every loaded feature is immediately reset to collect=False before being assigned to the model. Features from a prior session were either already collected or never selected; the new session's sampling decision applies only to features created going forward.

CryoFeature.collect flag (feature.py)
The collect parameter on CryoFeature.__init__ is a plain bool, defaulting to False. The flag is serialised into features.json and survives a save/load cycle. If the key is absent in loaded JSON (older data), it defaults to False.

Provides a thread-safe, non-blocking ``DataCollector.record()`` call that any Odemis module can invoke to capture a labelled data sample. CLI interface is implemented to download the data from the cloud storage.

Copilot

Pull request overview

This PR adds an annotated data-collection pipeline for cryo features, including user consent management, S3 upload/download utilities, and a per-project sampling flag that is propagated to newly created CryoFeature instances and persisted in features.json.

Changes:

Introduces a DataCollector framework with background serialization/upload to S3 and persistent consent configuration.
Adds GUI consent dialog + Help menu toggle, and wires per-project feature sampling into feature creation/loading.
Adds an S3 “fetch samples” CLI and extensive unit/integration tests for the new utilities.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`src/odemis/util/datacollector.py`	New data-collection framework (config, background worker, S3 backend, serialization).
`src/odemis/util/dc_fetch.py`	New S3 retrieval helpers + CLI entrypoint implementation.
`scripts/odemis-dc-fetch.py`	Script wrapper for the S3 sample fetch CLI.
`src/odemis/gui/win/consent.py`	New consent dialog UI.
`src/odemis/gui/main.py`	Shows consent prompt on startup and injects `DataCollector` into menu controller.
`src/odemis/gui/cont/menu.py`	Adds Help menu checkbox to toggle data sharing consent.
`src/odemis/acq/feature.py`	Adds `CryoFeature.collect` flag + collection routine and helpers.
`src/odemis/gui/model/tab_gui_data.py`	Propagates per-session `features_collectable` flag into newly created `CryoFeature`s.
`src/odemis/gui/cont/tabs/cryo_chamber_tab.py`	Makes a per-project sampling decision and clears `collect` on loaded features.
`src/odemis/gui/cont/features.py`	Adds triggers for collection on status change, posture transitions, and feature deletion.
`src/odemis/util/test/datacollector_test.py`	New tests for config, serialization, queue limit, retry, and S3 integration (skipped when creds missing).
`src/odemis/util/test/dc_fetch_test.py`	New tests for S3 listing/pagination and download filtering logic.
`src/odemis/acq/test/feature_test.py` / `src/odemis/acq/test/test-features.json`	Updates feature JSON format and adds tests for `collect` + collection helpers.
`debian/control`	Adds `python3-boto3` dependency.

Copilot · 2026-04-22T12:02:50Z

+        consent_val = self.consent
+        remind_val = self.remind_date
+
+        if consent_val is True:
+            consent_line = "consent = true"
+        elif consent_val is False:
+            consent_line = "consent = false"
+        else:
+            consent_line = "consent = none"
+


DataCollectorConfig._write() writes consent = none when consent is unset, but DataCollectorConfig.consent reads the value using ConfigParser.getboolean(), which cannot parse none and will raise ValueError on the next load. This makes cfg.consent unusable after a save/load cycle when consent is undecided (e.g., after clear_consent() / postpone_consent()). Either omit/comment out the consent option when unset (and keep it absent in the file), or update the getter to explicitly treat none/empty as None (catch ValueError).

Copilot · 2026-04-22T12:02:51Z

+    # Limit event_name length so the filename stays within filesystem limits.
+    safe_event = item.event_name[:64] if item.event_name else "event"
+    zip_name = f"{safe_event}-{timestamp_str}-{uuid8}.zip"
+
+    tmp_dir = Path(tempfile.mkdtemp(prefix="dc_"))
+    try:
+        payload_meta: dict = {}
+        extra_files: list = []  # list of (arcname, abs_path)
+
+        for key, value in item.payload.items():
+            if value is None or isinstance(value, (str, int, float, bool)):
+                payload_meta[key] = value
+
+            elif isinstance(value, numpy.ndarray):
+                if item.image_format.upper() == "HDF5":
+                    arc_name = f"{key}.h5"
+                    abs_path = tmp_dir / arc_name
+                    try:
+                        da = value if isinstance(value, model.DataArray) else model.DataArray(value)
+                        hdf5.export(str(abs_path), da)
+                    except Exception:
+                        logging.exception("Failed to export DataArray to HDF5 at %s", abs_path)
+                        abs_path = None
+                else:
+                    arc_name = f"{key}.ome.tiff"
+                    abs_path = tmp_dir / arc_name
+                    try:
+                        da = value if isinstance(value, model.DataArray) else model.DataArray(value)
+                        tiff.export(str(abs_path), da)
+                    except Exception:
+                        logging.exception("Failed to export DataArray to TIFF at %s", abs_path)
+                        abs_path = None
+
+                if abs_path is not None and abs_path.exists():
+                    extra_files.append((arc_name, abs_path))
+                    payload_meta[key] = arc_name
+                else:
+                    payload_meta[key] = None
+                    payload_meta["export_error"] = True
+
+            elif isinstance(value, (dict, list)):
+                arc_name = f"extra_{key}.json"
+                abs_path = tmp_dir / arc_name
+                abs_path.write_text(json.dumps(value, default=str), encoding="utf-8")
+                extra_files.append((arc_name, abs_path))
+                payload_meta[key] = arc_name


event_name and payload keys are used directly to construct filenames inside _serialize() (ZIP name, extra_<key>.json, <key>.ome.tiff/.h5). Because DataCollector.record() allows arbitrary strings for event_name and payload keys, this enables path traversal (e.g., event_name='../../x' or payload key containing path separators) and could write/replace files outside queue_dir/tmp_dir. Sanitize event_name and all derived filenames to a safe character set and ensure the final resolved path stays within the intended directory before writing/renaming.

Copilot · 2026-04-22T12:02:51Z

+    try:
+        _dc = DataCollector()
+        if not _dc.get_consent():
+            return


collect_feature_data() instantiates a new DataCollector() on every call. When consent is granted this will create a new background worker thread per invocation (and a new config instance), which can leak threads and increase CPU/memory usage—especially since collection is triggered from multiple GUI events and also runs inside separate threads already. Prefer reusing a single shared DataCollector instance (e.g., inject it from the GUI/controller, or use a module-level singleton) rather than constructing a new one each time.

Copilot · 2026-04-22T12:02:52Z

+    """Create S3 client and return `(client, bucket)`."""
+    backend = config.get_upload_backend()
+    if not isinstance(backend, S3UploadBackend):
+        raise RuntimeError("Only S3 backend is supported for retrieval.")
+    # Accessing protected members intentionally to reuse existing backend setup.
+    client = backend._get_client()  # pylint: disable=protected-access
+    bucket = backend._bucket  # pylint: disable=protected-access
+    return client, bucket
+
+


create_s3_client_from_config() appears unused (the fetch path uses build_s3_client_from_config() instead). Consider removing it or using it consistently to avoid duplicate ways of building an S3 client and reduce maintenance surface.

Suggested change

"""Create S3 client and return `(client, bucket)`."""

backend = config.get_upload_backend()

if not isinstance(backend, S3UploadBackend):

raise RuntimeError("Only S3 backend is supported for retrieval.")

# Accessing protected members intentionally to reuse existing backend setup.

client = backend._get_client() # pylint: disable=protected-access

bucket = backend._bucket # pylint: disable=protected-access

return client, bucket

"""Create an S3 client from configuration.

This wrapper is kept for compatibility and delegates to

build_s3_client_from_config so there is a single implementation

for S3 client construction.

"""

return build_s3_client_from_config(config)

Copilot · 2026-04-22T12:02:52Z

+        # Feature whose status VA we are subscribed to for data-collection triggering.
+        self._status_collect_feature: Optional[CryoFeature] = None
+


Optional is used in the type annotation for _status_collect_feature, but it is not imported. Without from __future__ import annotations, this will raise a NameError at import time and break the GUI controller. Import Optional from typing (or switch to typing.Optional[...]).

coderabbitai · 2026-04-22T12:05:53Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 820f35f2-98c4-41a6-9c63-c21291659338

📥 Commits

Reviewing files that changed from the base of the PR and between 279f1b4 and 510e3ab.

📒 Files selected for processing (1)

src/odemis/gui/cont/tabs/cryo_chamber_tab.py

📝 Walkthrough

Walkthrough

This pull request adds an anonymized data collection pipeline and UI consent flow: a thread-safe DataCollector with config persistence and S3 upload backend; a background worker that serializes payloads to ZIPs and uploads with retries and queue-size enforcement; a CLI tool to fetch DataCollector ZIP samples from S3; automatic feature-level collection triggers (status changes, posture transitions, deletion) with per-project sampling; a ConsentDialog and GUI integrations to prompt and toggle consent; test coverage for datacollector and fetch utilities; and a Debian packaging dependency for python3-boto3.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant GUI as GUI/main.py
    participant DC as DataCollector
    participant Config as DataCollectorConfig
    participant Dialog as ConsentDialog

    User->>GUI: Launch application
    GUI->>DC: Instantiate DataCollector
    GUI->>DC: should_prompt_for_consent()
    DC->>Config: Read config file
    Config-->>DC: Return consent/reminder decision
    alt Prompt needed
        GUI->>Dialog: Show ConsentDialog
        Dialog-->>User: Prompt (Opt in / Opt out / Remind)
        User->>Dialog: Choose option
        Dialog-->>GUI: Return result
        GUI->>DC: set_consent()/postpone_consent()
        DC->>Config: Persist consent state
    else No prompt
        GUI->>GUI: Continue startup
    end

sequenceDiagram
    participant Controller as CryoFeatureController
    participant Feature as CryoFeature
    participant DC as DataCollector
    participant Worker as BackgroundWorker
    participant S3 as S3 Backend

    Feature->>Controller: status change or deletion or posture transition
    Controller->>Controller: Check feature.collect & eligibility
    alt Eligible & consent granted
        Controller->>DC: record(event_name, schema, payload)
        DC->>DC: Validate & enqueue _WorkItem
        Worker->>DC: Dequeue item
        Worker->>Worker: Serialize metadata + images -> ZIP
        Worker->>S3: Upload ZIP
        S3-->>Worker: Upload success
        Worker->>Worker: Delete local ZIP
    else Not eligible or no consent
        Controller->>Controller: Skip collection
    end

sequenceDiagram
    participant Admin
    participant CLI as odemis-dc-fetch
    participant Config as DataCollectorConfig
    participant S3 as S3 Bucket
    participant Disk as Output Directory

    Admin->>CLI: Run with --event/--since/--output
    CLI->>Config: Load credentials / backend config
    Config-->>CLI: S3 client info
    CLI->>S3: list_objects_v2 (one or more prefixes)
    S3-->>CLI: Paginated objects
    loop For each object key
        CLI->>CLI: Parse event & timestamp
        CLI->>CLI: Apply filters (--event, --since, --host)
        alt Matches
            CLI->>S3: GetObject(key)
            S3-->>Disk: Write file (skip if exists)
        else Skip
        end
    end
    CLI-->>Admin: Exit code (0 success, 1 failures, 2 parse error)

Possibly related PRs

[METEOR-1302][feature] Super z workflow #3245: Modifies src/odemis/acq/feature.py to add new CryoFeature attributes and serialization changes that intersect with the new collect flag and JSON round-trip handling in this PR.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The title 'feature Meteor store annoted data' is misleading and does not accurately describe the actual changeset, which implements per-project feature data sampling for cryo chamber analysis.	Revise the title to accurately reflect the main change, e.g., 'feature Per-project sampling for cryo feature data collection' or similar that clarifies the feature's actual purpose.
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.71% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description clearly explains the per-project sampling mechanism, how features inherit the collection flag, and how loaded features are excluded, all of which align with the actual changes in the pull request.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 10

🧹 Nitpick comments (2)

src/odemis/gui/cont/menu.py (1)

46-54: Add type hints to the new consent menu API.

The changed constructor and new handlers should annotate all parameters and return types.

♻️ Proposed fix

+from typing import Any
+
@@
-    def __init__(self, main_data, main_frame, data_collector: DataCollector):
+    def __init__(self, main_data: Any, main_frame: wx.Frame, data_collector: DataCollector) -> None:
@@
-    def _append_data_sharing_menu_item(self, main_frame):
+    def _append_data_sharing_menu_item(self, main_frame: wx.Frame) -> wx.MenuItem | None:
@@
-    def _on_toggle_data_sharing(self, evt):
+    def _on_toggle_data_sharing(self, evt: wx.CommandEvent) -> None:

As per coding guidelines, **/*.py: Always use type hints for function parameters and return types in Python code.

Also applies to: 171-185

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/odemis/gui/cont/menu.py` around lines 46 - 54, Annotate the constructor
and the new consent menu handlers with explicit type hints: update __init__ to
declare parameter types (e.g., main_data: MainGUIData, main_frame: wx.Frame,
data_collector: DataCollector) and the return type -> None; then locate the
consent-related handler functions referenced around lines 171-185 and add full
parameter and return type annotations (e.g., event: wx.Event or appropriate
event type -> None, any other params typed to their domain types). Ensure you
import or reference the types (MainGUIData, DataCollector, wx.Frame/Event) at
the top of the module so the annotations are valid.

src/odemis/gui/win/consent.py (1)

16-59: Add docstrings for the new dialog methods.

The event handlers and initializer are new functions and should follow the project docstring rule.

♻️ Proposed fix

     def __init__(self, parent: wx.Window, remind_days: int) -> None:
+        """
+        Initialize the consent dialog.
+
+        :param parent: Parent window for the dialog.
+        :param remind_days: Number of days before prompting again.
+        """
         title = "Share data with Delmic"
@@
     def _on_opt_in(self, _evt: wx.CommandEvent) -> None:
+        """
+        Handle the opt-in button.
+        """
         self.EndModal(self.RESULT_OPT_IN)
 
     def _on_opt_out(self, _evt: wx.CommandEvent) -> None:
+        """
+        Handle the opt-out button.
+        """
         self.EndModal(self.RESULT_OPT_OUT)
 
     def _on_remind_later(self, _evt: wx.CommandEvent) -> None:
+        """
+        Handle the remind-later button.
+        """
         self.EndModal(self.RESULT_REMIND_LATER)
 
     def _on_close(self, _evt: wx.CloseEvent) -> None:
+        """
+        Handle closing the dialog without an explicit choice.
+        """
         self.EndModal(self.RESULT_REMIND_LATER)

As per coding guidelines, **/*.py: Include docstrings for all functions and classes, following the reStructuredText style guide, without type information and without using inline formatting markers or backticks.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/odemis/gui/win/consent.py` around lines 16 - 59, Add reStructuredText
docstrings to the Consent dialog methods: document the __init__ constructor and
each event handler method (_on_opt_in, _on_opt_out, _on_remind_later,
_on_close). For each docstring include a one-sentence description of the
method's purpose and, where helpful, describe important parameters (e.g., evt)
and the effect (which modal result is returned) using plain text (no type info,
no inline code/backticks), following the project's reST style conventions.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/odemis/acq/feature.py`:
- Around line 573-584: The code currently sets channel_key = None when MD_OUT_WL
is missing, causing all such streams to collide and only the first to be kept;
change to use a per-stream fallback like the stream's position so missing
MD_OUT_WL yields a unique key. Iterate with an index over the concatenated list
(fm_zstacks + fm_images) and set channel_key =
s.raw[0].metadata.get(model.MD_OUT_WL, f"fallback:{i}") (or similar unique
identifier from the stream) before checking seen_channels; update references to
selected_fm and seen_channels accordingly.
- Around line 539-638: The race is that multiple threads can pass the initial if
not feature.collect check and all run until feature.collect is set False at the
end; make the one-shot flip atomic by performing an immediate compare-and-set at
the start of the routine (e.g., in collect_feature_data): replace the plain
boolean check of feature.collect with an atomic operation that sets
feature.collect to False only if it was True (or acquire a per-feature lock
keyed by feature id, check feature.collect and set it False while holding the
lock), and if the CAS/lock indicates collect was already False return early;
reference feature.collect and the top-level routine (collect_feature_data in
this file) when adding the atomic CAS or lock.

In `@src/odemis/acq/test/feature_test.py`:
- Around line 349-365: The test test_payload_has_no_feature_name only checks
that f.name.value ("my_secret_feature_name") is not present among payload keys;
update the assertions to also ensure the feature name does not appear in payload
values by checking captured.values() and the stringified values (for
nested/serialized values) after collect_feature_data runs — e.g., add assertions
using self.assertNotIn("my_secret_feature_name", captured.values()) and
self.assertNotIn("my_secret_feature_name", str(list(captured.values()))) to the
test (keep existing fake_record/captured usage and collect_feature_data
invocation).

In `@src/odemis/gui/cont/features.py`:
- Line 31: The module imports Dict and List from typing but uses Optional in the
annotation for self._status_collect_feature (type Optional[CryoFeature]) which
is undefined; update the import statement that currently lists Dict and List to
also import Optional so Optional is available for the annotation (refer to the
import line and the attribute self._status_collect_feature / CryoFeature usage
to locate the change).

In `@src/odemis/gui/cont/tabs/cryo_chamber_tab.py`:
- Around line 388-389: The code sets
self.tab_data_model.main.features_collectable using random.random() and
FEATURE_COLLECT_PROBABILITY but those names are not defined; import the random
module at the top of this module and add a module-level constant
FEATURE_COLLECT_PROBABILITY (e.g., a float like 0.1) before it’s used so
_change_project_conf() and any code referencing features_collectable (and
class/attribute names like tab_data_model.main.features_collectable) do not
raise NameError.

In `@src/odemis/gui/main.py`:
- Around line 436-441: The consent dialog updates consent via
self._data_collector.set_consent(...) but the Help > Share data menu checkbox
isn't refreshed; add a helper method refresh_data_sharing_state(self) in the
menu controller (e.g. class in src/odemis/gui/cont/menu.py) that does if
self._consent_menu_item is not None:
self._consent_menu_item.Check(self._data_collector.get_consent() is True), then
call that helper after the dialog result handling in main.py (after the branches
that call set_consent(True/False) or postpone_consent()) so the menu checkbox
reflects the new persisted consent state immediately.

In `@src/odemis/util/datacollector.py`:
- Around line 588-604: The loop currently does "if
self._process_pending_zips(...): continue" which starves new in-memory records;
change _run so that when _process_pending_zips(...) returns True you still
attempt to drain or process queued items instead of skipping the queue.get()
step — e.g., replace the continue with a non-blocking attempt to fetch work (use
self._queue.get_nowait() in a try/except queue.Empty or
self._queue.get(timeout=0.0)) and call self._process_work_item(item) if you get
one; keep the existing exception handling (_schedule_retry, logging.exception)
and avoid a tight busy-loop by falling back to the original blocking
self._queue.get(timeout=1.0) when no items are available.
- Around line 328-373: The event_name and payload keys are used directly to
build filesystem names (safe_event, zip_name, arc_name, extra_files paths) which
allows path traversal or unsafe ZIP entries; add a sanitization step that
normalizes item.event_name and each payload key into safe filenames (strip or
replace path separators like "/" and "\" and sequences like "..", collapse to a
whitelist of allowed chars such as alphanumerics, hyphen, underscore, enforce a
max length like 64) before using them to construct zip_name, tmp_dir children,
or arc_name; apply this sanitizer to safe_event, every arc_name (e.g., when
creating extra_{key}.json or {key}.h5/.ome.tiff) and when writing to tmp_dir or
adding to extra_files so no user-supplied string can escape tmp_dir or create
dangerous ZIP entries (update references in the code around safe_event,
zip_name, arc_name, payload_meta assignments, and extra_files population).

In `@src/odemis/util/dc_fetch.py`:
- Around line 40-47: The ISO parse path must normalize a trailing 'Z' before
calling datetime.fromisoformat to maintain Python 3.10 compatibility: in the
code handling text (the branch that calls datetime.fromisoformat(text)), detect
and replace a trailing 'Z' (or '+00:00' equivalent if present) with '+00:00' or
otherwise remove it so fromisoformat won't raise ValueError, then proceed to set
tzinfo to timezone.utc when parsed.tzinfo is None and use
parsed.astimezone(timezone.utc) when it has tzinfo; update the logic around the
parsed = datetime.fromisoformat(text) call accordingly.

---

Nitpick comments:
In `@src/odemis/gui/cont/menu.py`:
- Around line 46-54: Annotate the constructor and the new consent menu handlers
with explicit type hints: update __init__ to declare parameter types (e.g.,
main_data: MainGUIData, main_frame: wx.Frame, data_collector: DataCollector) and
the return type -> None; then locate the consent-related handler functions
referenced around lines 171-185 and add full parameter and return type
annotations (e.g., event: wx.Event or appropriate event type -> None, any other
params typed to their domain types). Ensure you import or reference the types
(MainGUIData, DataCollector, wx.Frame/Event) at the top of the module so the
annotations are valid.

In `@src/odemis/gui/win/consent.py`:
- Around line 16-59: Add reStructuredText docstrings to the Consent dialog
methods: document the __init__ constructor and each event handler method
(_on_opt_in, _on_opt_out, _on_remind_later, _on_close). For each docstring
include a one-sentence description of the method's purpose and, where helpful,
describe important parameters (e.g., evt) and the effect (which modal result is
returned) using plain text (no type info, no inline code/backticks), following
the project's reST style conventions.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2aab52b5-1622-44b1-9a6a-a474e984fe44

📥 Commits

Reviewing files that changed from the base of the PR and between 2b55f3c and 279f1b4.

📒 Files selected for processing (15)

debian/control
scripts/odemis-dc-fetch.py
src/odemis/acq/feature.py
src/odemis/acq/test/feature_test.py
src/odemis/acq/test/test-features.json
src/odemis/gui/cont/features.py
src/odemis/gui/cont/menu.py
src/odemis/gui/cont/tabs/cryo_chamber_tab.py
src/odemis/gui/main.py
src/odemis/gui/model/tab_gui_data.py
src/odemis/gui/win/consent.py
src/odemis/util/datacollector.py
src/odemis/util/dc_fetch.py
src/odemis/util/test/datacollector_test.py
src/odemis/util/test/dc_fetch_test.py

coderabbitai · 2026-04-22T12:05:56Z

+    if not feature.collect:
+        return
+
+    try:
+        _dc = DataCollector()
+        if not _dc.get_consent():
+            return
+    except Exception:
+        logging.exception("collect_feature_data: failed to access DataCollector; skipping.")
+        return
+
+    try:
+
+        # Load feature streams from disk when not yet in memory.
+        if not feature.streams.value and project_dir:
+            try:
+                load_feature_streams_from_disk(feature, project_dir)
+            except Exception:
+                logging.exception(
+                    "collect_feature_data: failed to load streams; skipping.")
+                return
+
+        feature_streams = list(feature.streams.value)
+
+        # Collect first z-stack per FM channel; fall back to first FM image per channel.
+        fm_zstacks: List = []
+        fm_images: List = []
+        for s in feature_streams:
+            if isinstance(s, StaticFluoStream):
+                if _is_zstack_stream(s):
+                    fm_zstacks.append(s)
+                else:
+                    fm_images.append(s)
+
+        # Per channel: prefer z-stack, then plain FM image.
+        # Channels are delineated by MD_OUT_WL; use index as fallback.
+        selected_fm: List = []
+        seen_channels: set = set()
+        for s in fm_zstacks + fm_images:
+            try:
+                channel_key = s.raw[0].metadata.get(model.MD_OUT_WL)
+            except (IndexError, AttributeError):
+                channel_key = None
+            if channel_key not in seen_channels:
+                seen_channels.add(channel_key)
+                selected_fm.append(s)
+
+        # Collect spatially overlapping overview streams.
+        stage_pos = feature.stage_position.value
+        feat_x = stage_pos.get("x", 0.0)
+        feat_y = stage_pos.get("y", 0.0)
+
+        overview_fm: List = []
+        overview_sem: List = []
+        for s in (overview_streams or []):
+            if not _stream_overlaps_position(s, feat_x, feat_y):
+                continue
+            if isinstance(s, StaticFluoStream):
+                overview_fm.append(s)
+            elif isinstance(s, (StaticSEMStream, StaticFIBStream)):
+                overview_sem.append(s)
+
+        # Build privacy-preserving payload — generic keys, no names or filenames.
+        payload: dict = {
+            "status": feature.status.value,
+            "stage_position": dict(stage_pos),
+            "fm_focus_position": dict(feature.fm_focus_position.value),
+        }
+
+        def _get_raw(stream: "Stream") -> Optional["model.DataArray"]:
+            try:
+                return stream.raw[0] if stream.raw else None
+            except Exception:
+                return None
+
+        for idx, s in enumerate(selected_fm):
+            da = _get_raw(s)
+            if da is not None:
+                payload[f"channel_{idx}"] = da
+
+        for idx, s in enumerate(overview_fm):
+            da = _get_raw(s)
+            if da is not None:
+                payload[f"overview_fm_{idx}"] = da
+
+        for idx, s in enumerate(overview_sem):
+            da = _get_raw(s)
+            if da is not None:
+                payload[f"overview_sem_{idx}"] = da
+
+        image_keys = [k for k in payload if k.startswith(("channel_", "overview_fm_", "overview_sem_"))]
+        if not image_keys:
+            logging.debug(
+                "collect_feature_data: no images found for feature; skipping upload."
+            )
+            return
+
+        _dc.record("feature_collected", "1.0", payload)
+
+        feature.collect = False


⚠️ Potential issue | 🟠 Major

Make the one-shot collect transition atomic.

Concurrent status/posture/delete triggers can all pass Line 539 before Line 638 flips feature.collect, causing duplicate submissions for the same feature.

Proposed fix

FEATURE_COLLECT_PROBABILITY = 0.2 +_COLLECTION_STATE_LOCK = threading.Lock()

- _dc.record("feature_collected", "1.0", payload) - - feature.collect = False + with _COLLECTION_STATE_LOCK: + if not feature.collect: + return + feature.collect = False + + try: + _dc.record("feature_collected", "1.0", payload) + except Exception: + with _COLLECTION_STATE_LOCK: + feature.collect = True + raise logging.debug("collect_feature_data: submitted feature data for collection.")

🧰 Tools

🪛 Ruff (0.15.10)

[warning] 611-611: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/odemis/acq/feature.py` around lines 539 - 638, The race is that multiple threads can pass the initial if not feature.collect check and all run until feature.collect is set False at the end; make the one-shot flip atomic by performing an immediate compare-and-set at the start of the routine (e.g., in collect_feature_data): replace the plain boolean check of feature.collect with an atomic operation that sets feature.collect to False only if it was True (or acquire a per-feature lock keyed by feature id, check feature.collect and set it False while holding the lock), and if the CAS/lock indicates collect was already False return early; reference feature.collect and the top-level routine (collect_feature_data in this file) when adding the atomic CAS or lock.

coderabbitai · 2026-04-22T12:05:56Z

+        # Per channel: prefer z-stack, then plain FM image.
+        # Channels are delineated by MD_OUT_WL; use index as fallback.
+        selected_fm: List = []
+        seen_channels: set = set()
+        for s in fm_zstacks + fm_images:
+            try:
+                channel_key = s.raw[0].metadata.get(model.MD_OUT_WL)
+            except (IndexError, AttributeError):
+                channel_key = None
+            if channel_key not in seen_channels:
+                seen_channels.add(channel_key)
+                selected_fm.append(s)


⚠️ Potential issue | 🟠 Major

Use a real per-stream fallback channel key.

The comment says “use index as fallback,” but Line 581 sets channel_key = None; every stream missing MD_OUT_WL then shares the same key, so only the first such FM stream is collected.

Proposed fix

- for s in fm_zstacks + fm_images: + for idx, s in enumerate(fm_zstacks + fm_images): + channel_key = ("index", idx) try: - channel_key = s.raw[0].metadata.get(model.MD_OUT_WL) - except (IndexError, AttributeError): - channel_key = None + out_wl = s.raw[0].metadata.get(model.MD_OUT_WL) + if out_wl is not None: + channel_key = ("out_wl", repr(out_wl)) + except (IndexError, AttributeError, TypeError): + pass if channel_key not in seen_channels: seen_channels.add(channel_key) selected_fm.append(s)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/odemis/acq/feature.py` around lines 573 - 584, The code currently sets channel_key = None when MD_OUT_WL is missing, causing all such streams to collide and only the first to be kept; change to use a per-stream fallback like the stream's position so missing MD_OUT_WL yields a unique key. Iterate with an index over the concatenated list (fm_zstacks + fm_images) and set channel_key = s.raw[0].metadata.get(model.MD_OUT_WL, f"fallback:{i}") (or similar unique identifier from the stream) before checking seen_channels; update references to selected_fm and seen_channels accordingly.

coderabbitai · 2026-04-22T12:05:56Z

+    def test_payload_has_no_feature_name(self):
+        """Payload must not contain the feature name string as a key or value."""
+        f = self._make_feature_with_stream(collect=True)
+        f.name.value = "my_secret_feature_name"
+        captured = {}
+
+        def fake_record(event_name, schema_version, payload, **kwargs):
+            captured.update(payload)
+
+        with patch("odemis.acq.feature.DataCollector") as MockDC:
+            MockDC.return_value.get_consent.return_value = True
+            MockDC.return_value.record.side_effect = fake_record
+            collect_feature_data(f)
+
+        self.assertNotIn("my_secret_feature_name", captured)
+        self.assertNotIn("my_secret_feature_name", str(captured.keys()))
+


⚠️ Potential issue | 🟡 Minor

Assert the feature name is absent from payload values too.

The docstring says “as a key or value,” but the assertions only inspect keys. A leaked feature name in "status", metadata, or another value would not fail this test.

Proposed fix

self.assertNotIn("my_secret_feature_name", captured) self.assertNotIn("my_secret_feature_name", str(captured.keys())) + self.assertNotIn("my_secret_feature_name", str(captured.values()))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/odemis/acq/test/feature_test.py` around lines 349 - 365, The test test_payload_has_no_feature_name only checks that f.name.value ("my_secret_feature_name") is not present among payload keys; update the assertions to also ensure the feature name does not appear in payload values by checking captured.values() and the stringified values (for nested/serialized values) after collect_feature_data runs — e.g., add assertions using self.assertNotIn("my_secret_feature_name", captured.values()) and self.assertNotIn("my_secret_feature_name", str(list(captured.values()))) to the test (keep existing fake_record/captured usage and collect_feature_data invocation).

coderabbitai · 2026-04-22T12:05:56Z

+import threading

 import wx
 from typing import Dict, List


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

head -n 110 src/odemis/gui/cont/features.py | tail -n 80

Repository: delmic/odemis

Length of output: 3014

Import Optional from typing.

Line 98 uses Optional[CryoFeature] in the type annotation for self._status_collect_feature, but line 31 only imports Dict and List from typing. This causes an undefined name error and must be fixed for Python 3.10+ code validity.

Proposed fix

-from typing import Dict, List +from typing import Dict, List, Optional

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from typing import Dict, List

from typing import Dict, List, Optional

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/odemis/gui/cont/features.py` at line 31, The module imports Dict and List from typing but uses Optional in the annotation for self._status_collect_feature (type Optional[CryoFeature]) which is undefined; update the import statement that currently lists Dict and List to also import Optional so Optional is available for the annotation (refer to the import line and the attribute self._status_collect_feature / CryoFeature usage to locate the change).

coderabbitai · 2026-04-22T12:05:57Z

+            if result == ConsentDialog.RESULT_OPT_IN:
+                self._data_collector.set_consent(True)
+            elif result == ConsentDialog.RESULT_OPT_OUT:
+                self._data_collector.set_consent(False)
+            else:
+                self._data_collector.postpone_consent()


⚠️ Potential issue | 🟠 Major

Keep the menu consent checkbox synchronized after the dialog.

The menu item is initialized before this dialog runs, so opting in here leaves Help > Share data unchecked while collection consent is enabled.

🛠️ Proposed fix

if result == ConsentDialog.RESULT_OPT_IN: self._data_collector.set_consent(True) elif result == ConsentDialog.RESULT_OPT_OUT: self._data_collector.set_consent(False) else: self._data_collector.postpone_consent() + + if self._menu_controller is not None: + self._menu_controller.refresh_data_sharing_state()

Add this helper in src/odemis/gui/cont/menu.py:

def refresh_data_sharing_state(self) -> None: """Refresh the data sharing menu checkbox from persisted consent.""" if self._consent_menu_item is not None: self._consent_menu_item.Check(self._data_collector.get_consent() is True)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/odemis/gui/main.py` around lines 436 - 441, The consent dialog updates consent via self._data_collector.set_consent(...) but the Help > Share data menu checkbox isn't refreshed; add a helper method refresh_data_sharing_state(self) in the menu controller (e.g. class in src/odemis/gui/cont/menu.py) that does if self._consent_menu_item is not None: self._consent_menu_item.Check(self._data_collector.get_consent() is True), then call that helper after the dialog result handling in main.py (after the branches that call set_consent(True/False) or postpone_consent()) so the menu checkbox reflects the new persisted consent state immediately.

coderabbitai · 2026-04-22T12:05:57Z

+        ``consent`` is always present as ``none`` / ``true`` / ``false``.
+        ``reminder_date`` is commented out until explicitly set, then written
+        in ``YYYY-MM-DD`` format.
+        """
+        self.file_path.parent.mkdir(parents=True, exist_ok=True)
+
+        consent_val = self.consent
+        remind_val = self.remind_date
+
+        if consent_val is True:
+            consent_line = "consent = true"
+        elif consent_val is False:
+            consent_line = "consent = false"
+        else:
+            consent_line = "consent = none"


⚠️ Potential issue | 🟠 Major

Handle persisted consent = none when reading config.

_write() persists undecided consent as none, but getboolean() raises ValueError for that value. A fresh DataCollectorConfig() reading the file can then break consent checks instead of returning None.

Proposed fix

`@property` def consent(self) -> Optional[bool]: """Return the consent state, or ``None`` if not yet set.""" try: - return self._cp.getboolean("general", "consent") + value = self._cp.get("general", "consent").strip().lower() except (configparser.NoSectionError, configparser.NoOptionError): return None + if value in ("", "none", "null"): + return None + if value in ("1", "yes", "true", "on"): + return True + if value in ("0", "no", "false", "off"): + return False + logging.warning("Invalid datacollector consent value %r; treating as undecided.", value) + return None

Also applies to: 190-195

coderabbitai · 2026-04-22T12:05:57Z

+    # Limit event_name length so the filename stays within filesystem limits.
+    safe_event = item.event_name[:64] if item.event_name else "event"
+    zip_name = f"{safe_event}-{timestamp_str}-{uuid8}.zip"
+
+    tmp_dir = Path(tempfile.mkdtemp(prefix="dc_"))
+    try:
+        payload_meta: dict = {}
+        extra_files: list = []  # list of (arcname, abs_path)
+
+        for key, value in item.payload.items():
+            if value is None or isinstance(value, (str, int, float, bool)):
+                payload_meta[key] = value
+
+            elif isinstance(value, numpy.ndarray):
+                if item.image_format.upper() == "HDF5":
+                    arc_name = f"{key}.h5"
+                    abs_path = tmp_dir / arc_name
+                    try:
+                        da = value if isinstance(value, model.DataArray) else model.DataArray(value)
+                        hdf5.export(str(abs_path), da)
+                    except Exception:
+                        logging.exception("Failed to export DataArray to HDF5 at %s", abs_path)
+                        abs_path = None
+                else:
+                    arc_name = f"{key}.ome.tiff"
+                    abs_path = tmp_dir / arc_name
+                    try:
+                        da = value if isinstance(value, model.DataArray) else model.DataArray(value)
+                        tiff.export(str(abs_path), da)
+                    except Exception:
+                        logging.exception("Failed to export DataArray to TIFF at %s", abs_path)
+                        abs_path = None
+
+                if abs_path is not None and abs_path.exists():
+                    extra_files.append((arc_name, abs_path))
+                    payload_meta[key] = arc_name
+                else:
+                    payload_meta[key] = None
+                    payload_meta["export_error"] = True
+
+            elif isinstance(value, (dict, list)):
+                arc_name = f"extra_{key}.json"
+                abs_path = tmp_dir / arc_name
+                abs_path.write_text(json.dumps(value, default=str), encoding="utf-8")
+                extra_files.append((arc_name, abs_path))
+                payload_meta[key] = arc_name


⚠️ Potential issue | 🟠 Major

Sanitize event names and payload keys before using them as paths.

event_name and payload keys flow into filesystem paths and ZIP arcnames. Values containing / or .. can write outside the queue/temp directory or create unsafe ZIP entries.

Proposed fix

+import re

+_SAFE_FILENAME_RE = re.compile(r"[^A-Za-z0-9_.-]+") + + +def _safe_name(value: object, fallback: str, max_length: int = 64) -> str: + """Return a filesystem-safe component for generated archive names.""" + safe = _SAFE_FILENAME_RE.sub("_", str(value).strip())[:max_length].strip("._-") + return safe or fallback + + def _serialize(item: _WorkItem, queue_dir: Path) -> Path:

- # Limit event_name length so the filename stays within filesystem limits. - safe_event = item.event_name[:64] if item.event_name else "event" + safe_event = _safe_name(item.event_name, "event") zip_name = f"{safe_event}-{timestamp_str}-{uuid8}.zip"

for key, value in item.payload.items(): + safe_key = _safe_name(key, "payload") if value is None or isinstance(value, (str, int, float, bool)): payload_meta[key] = value elif isinstance(value, numpy.ndarray): if item.image_format.upper() == "HDF5": - arc_name = f"{key}.h5" + arc_name = f"{safe_key}.h5" abs_path = tmp_dir / arc_name @@ else: - arc_name = f"{key}.ome.tiff" + arc_name = f"{safe_key}.ome.tiff" abs_path = tmp_dir / arc_name @@ elif isinstance(value, (dict, list)): - arc_name = f"extra_{key}.json" + arc_name = f"extra_{safe_key}.json" abs_path = tmp_dir / arc_name

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/odemis/util/datacollector.py` around lines 328 - 373, The event_name and payload keys are used directly to build filesystem names (safe_event, zip_name, arc_name, extra_files paths) which allows path traversal or unsafe ZIP entries; add a sanitization step that normalizes item.event_name and each payload key into safe filenames (strip or replace path separators like "/" and "\" and sequences like "..", collapse to a whitelist of allowed chars such as alphanumerics, hyphen, underscore, enforce a max length like 64) before using them to construct zip_name, tmp_dir children, or arc_name; apply this sanitizer to safe_event, every arc_name (e.g., when creating extra_{key}.json or {key}.h5/.ome.tiff) and when writing to tmp_dir or adding to extra_files so no user-supplied string can escape tmp_dir or create dangerous ZIP entries (update references in the code around safe_event, zip_name, arc_name, payload_meta assignments, and extra_files population).

coderabbitai · 2026-04-22T12:05:57Z

+    def _run(self) -> None:
+        """Main loop: process items until the thread is stopped."""
+        while True:
+            try:
+                if self._process_pending_zips(self._queue_dir):
+                    continue
+            except Exception:
+                logging.exception("DataCollector error while processing pending uploads")
+                self._schedule_retry()
+                continue
+            try:
+                item = self._queue.get(timeout=1.0)
+            except queue.Empty:
+                continue
+
+            try:
+                self._process_work_item(item)


⚠️ Potential issue | 🟠 Major

Do not starve new records while pending uploads are backing off.

When _process_pending_zips() returns True during retry backoff, Line 593 immediately loops and skips _queue.get(). New samples then remain only in memory until uploads recover, so they can be lost on shutdown.

Proposed fix

def _run(self) -> None: """Main loop: process items until the thread is stopped.""" while True: try: - if self._process_pending_zips(self._queue_dir): - continue + self._process_pending_zips(self._queue_dir) except Exception: logging.exception("DataCollector error while processing pending uploads") self._schedule_retry() - continue try: item = self._queue.get(timeout=1.0) except queue.Empty: continue

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/odemis/util/datacollector.py` around lines 588 - 604, The loop currently does "if self._process_pending_zips(...): continue" which starves new in-memory records; change _run so that when _process_pending_zips(...) returns True you still attempt to drain or process queued items instead of skipping the queue.get() step — e.g., replace the continue with a non-blocking attempt to fetch work (use self._queue.get_nowait() in a try/except queue.Empty or self._queue.get(timeout=0.0)) and call self._process_work_item(item) if you get one; keep the existing exception handling (_schedule_retry, logging.exception) and avoid a tight busy-loop by falling back to the original blocking self._queue.get(timeout=1.0) when no items are available.

coderabbitai · 2026-04-22T12:05:57Z

+    text = value.strip()
+    if len(text) == 10:
+        parsed = datetime.strptime(text, "%Y-%m-%d")
+        return parsed.replace(tzinfo=timezone.utc)
+    parsed = datetime.fromisoformat(text)
+    if parsed.tzinfo is None:
+        return parsed.replace(tzinfo=timezone.utc)
+    return parsed.astimezone(timezone.utc)


⚠️ Potential issue | 🟡 Minor

❓ Verification inconclusive

Script executed:

#!/bin/bash python3.10 - <<'PY' from datetime import datetime datetime.fromisoformat("2026-03-01T12:30:00Z") PY

Repository: delmic/odemis

Repository: delmic/odemis
Exit code: 127

stderr:

/bin/bash: line 2: python3.10: command not found

Script executed:

# First, locate and examine the target file head -60 src/odemis/util/dc_fetch.py | tail -30

Repository: delmic/odemis

Repository: delmic/odemis
Exit code: 0

stdout:

from odemis.util.datacollector import DataCollectorConfig, S3UploadBackend def parse_since_utc(value: str) -> datetime: """Parse a date/datetime string to UTC-aware datetime. Accepts ISO-8601 date (`YYYY-MM-DD`) and datetime (`YYYY-MM-DDTHH:MM:SS` with optional timezone). """ text = value.strip() if len(text) == 10: parsed = datetime.strptime(text, "%Y-%m-%d") return parsed.replace(tzinfo=timezone.utc) parsed = datetime.fromisoformat(text) if parsed.tzinfo is None: return parsed.replace(tzinfo=timezone.utc) return parsed.astimezone(timezone.utc) def parse_key_timestamp_utc(key: str) -> Optional[datetime]: """Parse `<event>-<YYYYMMDDTHHmmss>-<uuid>.zip` timestamp from key basename.""" name = Path(key).name if not name.endswith(".zip"): return None stem = name[:-4] parts = stem.rsplit("-", 2) if len(parts) != 3: return None ts = parts[1] try:

Script executed:

# Check Python version requirements in the project find . -name "setup.py" -o -name "setup.cfg" -o -name "pyproject.toml" | head -5 | xargs cat 2>/dev/null | grep -i "python" | head -20

Repository: delmic/odemis

Repository: delmic/odemis
Exit code: 0

stdout:

#!/usr/bin/env python3 Run `python setup.py build_ext --inplace` to compile #!/usr/bin/env python3 # sudo apt-get install python-setuptools cython # python3 setup.py build_ext --inplace "Programming Language :: Python",

Script executed:

# Also check if there are any existing tests for this function find . -name "*test*dc_fetch*" -o -path "*/test*" -name "*dc_fetch*" 2>/dev/null

Repository: delmic/odemis

Repository: delmic/odemis
Exit code: 0

stdout:

./src/odemis/util/test/dc_fetch_test.py

Python 3.10 datetime.fromisoformat() support for trailing Z timezone

No, Python 3.10's datetime.fromisoformat does not support trailing 'Z' timezone suffix. It raises ValueError for strings like '2011-11-04T00:05:23Z'. Support was added in Python 3.11, which expanded parsing to most ISO 8601 formats including 'Z' as UTC alias. In 3.10, it only parses formats matching datetime.isoformat output, which uses '+00:00' not 'Z'. Official 3.10 docs explicitly state it does not support arbitrary ISO 8601 strings. Python 3.10 receives only security fixes, no feature backports. Workaround: replace('Z', '+00:00').

Citations:

1: https://docs.python.org/release/3.10.10/library/datetime.html

2: https://docs.python.org/release/3.10.19/library/datetime.html

3: Datetime fromisoformat does not parse expected string python/cpython#128686

4: "ValueError: Invalid isoformat string" from a valid string python/cpython#90187

5: https://stackoverflow.com/questions/75867446/documentation-example-for-datetime-fromisoformat-raises-invalid-isoformat-string

6: https://www.reddit.com/r/Python/comments/163ugsm/psa_as_of_python_311_datetimefromisoformat/

7: https://docs.python.org/3/library/datetime.html

Script executed:

# Find Python version requirement in project root find . -maxdepth 2 -name "setup.py" -o -name "setup.cfg" -o -name "pyproject.toml" -o -name "README.md" | xargs grep -l "python" 2>/dev/null | head -3

Repository: delmic/odemis

Repository: delmic/odemis
Exit code: 0

stdout:

./setup.py

Script executed:

# Check the test file for parse_since_utc cat src/odemis/util/test/dc_fetch_test.py

Repository: delmic/odemis

Repository: delmic/odemis
Exit code: 0

stdout:

#!/usr/bin/env python3 """ Created on 11 March 2026 `@author`: Karishma Kumar Copyright © 2026 Karishma Kumar, Delmic This file is part of Odemis. Odemis is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation. Odemis is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Odemis. If not, see http://www.gnu.org/licenses/. """ import tempfile import unittest from datetime import datetime, timezone from pathlib import Path from unittest.mock import Mock, patch from odemis.util import dc_fetch class DCFetchTest(unittest.TestCase): """Unit tests for S3 retrieval helpers.""" def test_parse_since_utc_date(self) -> None: """Date input should parse as UTC midnight.""" parsed = dc_fetch.parse_since_utc("2026-03-22") self.assertEqual(parsed, datetime(2026, 3, 22, 0, 0, 0, tzinfo=timezone.utc)) def test_parse_key_timestamp(self) -> None: """Timestamp should be parsed from key basename.""" parsed = dc_fetch.parse_key_timestamp_utc("host/z_stack_acquired-20260322T104530-a1b2c3d4.zip") self.assertEqual(parsed, datetime(2026, 3, 22, 10, 45, 30, tzinfo=timezone.utc)) def test_should_download_key_filters(self) -> None: """Event and since filters should both be enforced.""" key = "host/z_stack_acquired-20260322T104530-a1b2c3d4.zip" since_before = datetime(2026, 3, 22, 10, 0, 0, tzinfo=timezone.utc) since_after = datetime(2026, 3, 22, 11, 0, 0, tzinfo=timezone.utc) self.assertTrue(dc_fetch.should_download_key(key, "z_stack_acquired", since_before)) self.assertFalse(dc_fetch.should_download_key(key, "other_event", since_before)) self.assertFalse(dc_fetch.should_download_key(key, "z_stack_acquired", since_after)) def test_iter_s3_objects_paginates(self) -> None: """S3 iterator should follow continuation tokens.""" client = Mock() client.list_objects_v2.side_effect = [ { "Contents": [{"Key": "host/a.zip"}], "IsTruncated": True, "NextContinuationToken": "token-1", }, { "Contents": [{"Key": "host/b.zip"}], "IsTruncated": False, }, ] keys = [item["Key"] for item in dc_fetch.iter_s3_objects(client, "bucket", "host/")] self.assertEqual(keys, ["host/a.zip", "host/b.zip"]) self.assertEqual(client.list_objects_v2.call_count, 2) def test_parse_host_filters_comma_list(self) -> None: """Host parser should accept comma-separated values and normalize them.""" hosts = dc_fetch.parse_host_filters("meteor-5099, atlas-001 ,/secom-22/") self.assertEqual(hosts, ["meteor-5099", "atlas-001", "secom-22"]) def test_fetch_samples_downloads_matching_keys(self) -> None: """Fetch flow should download matching keys and report counters.""" with tempfile.TemporaryDirectory(prefix="dc_fetch_") as tmp_dir: output_dir = Path(tmp_dir) client = Mock() client.list_objects_v2.return_value = { "Contents": [ {"Key": "host/evt-20260322T100000-aaaa1111.zip"}, {"Key": "host/other-20260322T100000-bbbb2222.zip"}, ], "IsTruncated": False, } def _download_file(_bucket: str, _key: str, filename: str) -> None: Path(filename).write_bytes(b"zip") client.download_file.side_effect = _download_file with patch("odemis.util.dc_fetch.build_s3_client_from_config", return_value=(client, "bucket")): result = dc_fetch.fetch_samples( event_filter="evt", since_utc=datetime(2026, 3, 22, 9, 0, 0, tzinfo=timezone.utc), output_dir=output_dir, ) self.assertEqual(result["listed"], 2) self.assertEqual(result["matched"], 1) self.assertEqual(result["downloaded"], 1) self.assertEqual(result["failed"], 0) self.assertTrue((output_dir / "evt-20260322T100000-aaaa1111.zip").exists()) def test_fetch_samples_applies_host_filter_prefix(self) -> None: """Host filter should become the S3 list prefix.""" with tempfile.TemporaryDirectory(prefix="dc_fetch_") as tmp_dir: output_dir = Path(tmp_dir) client = Mock() client.list_objects_v2.return_value = {"Contents": [], "IsTruncated": False} with patch("odemis.util.dc_fetch.build_s3_client_from_config", return_value=(client, "bucket")): dc_fetch.fetch_samples( event_filter=None, since_utc=None, output_dir=output_dir, host_filter="meteor-5099", ) call_kwargs = client.list_objects_v2.call_args.kwargs self.assertEqual(call_kwargs["Bucket"], "bucket") self.assertEqual(call_kwargs["Prefix"], "meteor-5099/") def test_fetch_samples_applies_multiple_host_prefixes(self) -> None: """Comma-separated hosts should trigger one listing call per host prefix.""" with tempfile.TemporaryDirectory(prefix="dc_fetch_") as tmp_dir: output_dir = Path(tmp_dir) client = Mock() client.list_objects_v2.return_value = {"Contents": [], "IsTruncated": False} with patch("odemis.util.dc_fetch.build_s3_client_from_config", return_value=(client, "bucket")): dc_fetch.fetch_samples( event_filter=None, since_utc=None, output_dir=output_dir, host_filter="meteor-5099,atlas-001", ) self.assertEqual(client.list_objects_v2.call_count, 2) first_prefix = client.list_objects_v2.call_args_list[0].kwargs["Prefix"] second_prefix = client.list_objects_v2.call_args_list[1].kwargs["Prefix"] self.assertEqual(first_prefix, "meteor-5099/") self.assertEqual(second_prefix, "atlas-001/") def test_fetch_samples_passes_bucket_endpoint_region_overrides(self) -> None: """Overrides should be forwarded to the S3 client builder.""" with tempfile.TemporaryDirectory(prefix="dc_fetch_") as tmp_dir: output_dir = Path(tmp_dir) client = Mock() client.list_objects_v2.return_value = {"Contents": [], "IsTruncated": False} with patch("odemis.util.dc_fetch.build_s3_client_from_config", return_value=(client, "bucket")) as builder: dc_fetch.fetch_samples( event_filter=None, since_utc=None, output_dir=output_dir, host_filter=None, bucket_override="other-bucket", endpoint_override="https://s3.eu-west-1.amazonaws.com", region_override="eu-west-1", ) kwargs = builder.call_args.kwargs self.assertEqual(kwargs["bucket_override"], "other-bucket") self.assertEqual(kwargs["endpoint_override"], "https://s3.eu-west-1.amazonaws.com") self.assertEqual(kwargs["region_override"], "eu-west-1") def test_build_s3_client_uses_backend_region_as_default(self) -> None: """build_s3_client_from_config should use backend._region when no override is given.""" from odemis.util.datacollector import S3UploadBackend, S3_REGION backend = S3UploadBackend( access_key="key", secret_key="secret", region=S3_REGION, bucket="test-bucket", ) mock_config = Mock() mock_config.get_upload_backend.return_value = backend with patch("boto3.client") as mock_boto3_client: mock_boto3_client.return_value = Mock() dc_fetch.build_s3_client_from_config(mock_config) call_kwargs = mock_boto3_client.call_args.kwargs self.assertEqual(call_kwargs.get("region_name"), S3_REGION) # endpoint_url must be None so boto3 resolves the regional endpoint automatically self.assertIsNone(call_kwargs.get("endpoint_url")) if __name__ == "__main__": unittest.main()

Script executed:

# Look for Python version spec in setup.py head -100 setup.py 2>/dev/null | grep -A5 -B5 "python"

Repository: delmic/odemis

Repository: delmic/odemis
Exit code: 0

stdout:

#!/usr/bin/env python3 # -*- coding: utf-8 -*- # To rebuild just the cython modules, use these commands: # sudo apt-get install python-setuptools cython # python3 setup.py build_ext --inplace from setuptools import setup, find_packages, Extension from Cython.Build import cythonize # Warning: must be _after_ setup import import glob import os import subprocess

Normalize trailing Z before fromisoformat() for Python 3.10 compatibility.

UTC timestamps with trailing Z (e.g., 2026-03-01T12:30:00Z) are standard ISO 8601 format commonly provided as CLI input. However, datetime.fromisoformat() does not support the Z suffix in Python 3.10—it raises ValueError. Support was added in Python 3.11. Since the coding guidelines require Python 3.10+ compatibility, this code path needs a normalization step.

Proposed fix

text = value.strip() if len(text) == 10: parsed = datetime.strptime(text, "%Y-%m-%d") return parsed.replace(tzinfo=timezone.utc) + if text.endswith("Z"): + text = text[:-1] + "+00:00" parsed = datetime.fromisoformat(text)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

text = value.strip()

if len(text) == 10:

parsed = datetime.strptime(text, "%Y-%m-%d")

return parsed.replace(tzinfo=timezone.utc)

parsed = datetime.fromisoformat(text)

if parsed.tzinfo is None:

return parsed.replace(tzinfo=timezone.utc)

return parsed.astimezone(timezone.utc)

text = value.strip()

if len(text) == 10:

parsed = datetime.strptime(text, "%Y-%m-%d")

return parsed.replace(tzinfo=timezone.utc)

if text.endswith("Z"):

text = text[:-1] + "+00:00"

parsed = datetime.fromisoformat(text)

if parsed.tzinfo is None:

return parsed.replace(tzinfo=timezone.utc)

return parsed.astimezone(timezone.utc)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/odemis/util/dc_fetch.py` around lines 40 - 47, The ISO parse path must normalize a trailing 'Z' before calling datetime.fromisoformat to maintain Python 3.10 compatibility: in the code handling text (the branch that calls datetime.fromisoformat(text)), detect and replace a trailing 'Z' (or '+00:00' equivalent if present) with '+00:00' or otherwise remove it so fromisoformat won't raise ValueError, then proceed to set tzinfo to timezone.utc when parsed.tzinfo is None and use parsed.astimezone(timezone.utc) when it has tzinfo; update the logic around the parsed = datetime.fromisoformat(text) call accordingly.

K4rishma added 6 commits April 21, 2026 10:37

[feature] implement store annotated data framework

1f2c680

Provides a thread-safe, non-blocking ``DataCollector.record()`` call that any Odemis module can invoke to capture a labelled data sample. CLI interface is implemented to download the data from the cloud storage.

store annoted data on Meteor

895588f

make images required to upload

c94530d

trigger on feature change

547d2c9

use env variable to use test bucket

8c7f409

change the collection per project per session

279f1b4

Copilot AI review requested due to automatic review settings April 22, 2026 11:56

github-actions Bot added the size/L label Apr 22, 2026

Copilot started reviewing on behalf of K4rishma April 22, 2026 11:56 View session

K4rishma marked this pull request as draft April 22, 2026 11:56

Copilot AI reviewed Apr 22, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 22, 2026

View reviewed changes

fix import issues in cryo_chamber_tab

510e3ab

		# Feature whose status VA we are subscribed to for data-collection triggering.
		self._status_collect_feature: Optional[CryoFeature] = None

	from typing import Dict, List
	from typing import Dict, List, Optional

Conversation

K4rishma commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented Apr 22, 2026 • edited by atlassian Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Sequence Diagram(s)

Possibly related PRs

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

K4rishma commented Apr 22, 2026 •

edited

Loading

coderabbitai Bot commented Apr 22, 2026 •

edited by atlassian Bot

Loading