Skip to content

[feature][MSD-506] Meteor store annoted data#3451

Draft
K4rishma wants to merge 7 commits into
delmic:masterfrom
K4rishma:meteor_store_annoted_data
Draft

[feature][MSD-506] Meteor store annoted data#3451
K4rishma wants to merge 7 commits into
delmic:masterfrom
K4rishma:meteor_store_annoted_data

Conversation

@K4rishma
Copy link
Copy Markdown
Contributor

@K4rishma K4rishma commented Apr 22, 2026

Builds on PR of Building the data framework #3444

Overview

Introduces per-project sampling for cryo feature data collection. When a project is opened or created, a single random decision is made (at 20% probability) that applies uniformly to all features created during that session. Features loaded from a previous session are immediately excluded from collection.


How it works

Single decision per project (cryo_chamber_tab.py)
_change_project_conf() is called every time a project is opened or created. It makes one random draw (random.random() \< FEATURE_COLLECT_PROBABILITY) and stores the result as main.features_collectable — a lightweight dynamic attribute on the main GUI data model (not a formal VA, never persisted to disk).

All new features inherit the decision (tab_gui_data.py)
CryoGUIData.add_new_feature() reads getattr(self.main, "features_collectable", False) and passes it as collect= when constructing each CryoFeature. Every feature created in the same session therefore shares the same flag value — either all are eligible for collection or none are.

Loaded features are excluded (cryo_chamber_tab.py)
In _load_project_data(), after features are read from features.json, every loaded feature is immediately reset to collect=False before being assigned to the model. Features from a prior session were either already collected or never selected; the new session's sampling decision applies only to features created going forward.

CryoFeature.collect flag (feature.py)
The collect parameter on CryoFeature.__init__ is a plain bool, defaulting to False. The flag is serialised into features.json and survives a save/load cycle. If the key is absent in loaded JSON (older data), it defaults to False.

Provides a thread-safe, non-blocking ``DataCollector.record()`` call that any Odemis module can invoke to capture a labelled data sample. CLI interface is implemented to download the data from the cloud storage.
Copilot AI review requested due to automatic review settings April 22, 2026 11:56
@K4rishma K4rishma marked this pull request as draft April 22, 2026 11:56
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an annotated data-collection pipeline for cryo features, including user consent management, S3 upload/download utilities, and a per-project sampling flag that is propagated to newly created CryoFeature instances and persisted in features.json.

Changes:

  • Introduces a DataCollector framework with background serialization/upload to S3 and persistent consent configuration.
  • Adds GUI consent dialog + Help menu toggle, and wires per-project feature sampling into feature creation/loading.
  • Adds an S3 “fetch samples” CLI and extensive unit/integration tests for the new utilities.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/odemis/util/datacollector.py New data-collection framework (config, background worker, S3 backend, serialization).
src/odemis/util/dc_fetch.py New S3 retrieval helpers + CLI entrypoint implementation.
scripts/odemis-dc-fetch.py Script wrapper for the S3 sample fetch CLI.
src/odemis/gui/win/consent.py New consent dialog UI.
src/odemis/gui/main.py Shows consent prompt on startup and injects DataCollector into menu controller.
src/odemis/gui/cont/menu.py Adds Help menu checkbox to toggle data sharing consent.
src/odemis/acq/feature.py Adds CryoFeature.collect flag + collection routine and helpers.
src/odemis/gui/model/tab_gui_data.py Propagates per-session features_collectable flag into newly created CryoFeatures.
src/odemis/gui/cont/tabs/cryo_chamber_tab.py Makes a per-project sampling decision and clears collect on loaded features.
src/odemis/gui/cont/features.py Adds triggers for collection on status change, posture transitions, and feature deletion.
src/odemis/util/test/datacollector_test.py New tests for config, serialization, queue limit, retry, and S3 integration (skipped when creds missing).
src/odemis/util/test/dc_fetch_test.py New tests for S3 listing/pagination and download filtering logic.
src/odemis/acq/test/feature_test.py / src/odemis/acq/test/test-features.json Updates feature JSON format and adds tests for collect + collection helpers.
debian/control Adds python3-boto3 dependency.

Comment on lines +157 to +166
consent_val = self.consent
remind_val = self.remind_date

if consent_val is True:
consent_line = "consent = true"
elif consent_val is False:
consent_line = "consent = false"
else:
consent_line = "consent = none"

Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataCollectorConfig._write() writes consent = none when consent is unset, but DataCollectorConfig.consent reads the value using ConfigParser.getboolean(), which cannot parse none and will raise ValueError on the next load. This makes cfg.consent unusable after a save/load cycle when consent is undecided (e.g., after clear_consent() / postpone_consent()). Either omit/comment out the consent option when unset (and keep it absent in the file), or update the getter to explicitly treat none/empty as None (catch ValueError).

Copilot uses AI. Check for mistakes.
Comment on lines +328 to +373
# Limit event_name length so the filename stays within filesystem limits.
safe_event = item.event_name[:64] if item.event_name else "event"
zip_name = f"{safe_event}-{timestamp_str}-{uuid8}.zip"

tmp_dir = Path(tempfile.mkdtemp(prefix="dc_"))
try:
payload_meta: dict = {}
extra_files: list = [] # list of (arcname, abs_path)

for key, value in item.payload.items():
if value is None or isinstance(value, (str, int, float, bool)):
payload_meta[key] = value

elif isinstance(value, numpy.ndarray):
if item.image_format.upper() == "HDF5":
arc_name = f"{key}.h5"
abs_path = tmp_dir / arc_name
try:
da = value if isinstance(value, model.DataArray) else model.DataArray(value)
hdf5.export(str(abs_path), da)
except Exception:
logging.exception("Failed to export DataArray to HDF5 at %s", abs_path)
abs_path = None
else:
arc_name = f"{key}.ome.tiff"
abs_path = tmp_dir / arc_name
try:
da = value if isinstance(value, model.DataArray) else model.DataArray(value)
tiff.export(str(abs_path), da)
except Exception:
logging.exception("Failed to export DataArray to TIFF at %s", abs_path)
abs_path = None

if abs_path is not None and abs_path.exists():
extra_files.append((arc_name, abs_path))
payload_meta[key] = arc_name
else:
payload_meta[key] = None
payload_meta["export_error"] = True

elif isinstance(value, (dict, list)):
arc_name = f"extra_{key}.json"
abs_path = tmp_dir / arc_name
abs_path.write_text(json.dumps(value, default=str), encoding="utf-8")
extra_files.append((arc_name, abs_path))
payload_meta[key] = arc_name
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event_name and payload keys are used directly to construct filenames inside _serialize() (ZIP name, extra_<key>.json, <key>.ome.tiff/.h5). Because DataCollector.record() allows arbitrary strings for event_name and payload keys, this enables path traversal (e.g., event_name='../../x' or payload key containing path separators) and could write/replace files outside queue_dir/tmp_dir. Sanitize event_name and all derived filenames to a safe character set and ensure the final resolved path stays within the intended directory before writing/renaming.

Copilot uses AI. Check for mistakes.
Comment thread src/odemis/acq/feature.py
Comment on lines +542 to +545
try:
_dc = DataCollector()
if not _dc.get_consent():
return
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collect_feature_data() instantiates a new DataCollector() on every call. When consent is granted this will create a new background worker thread per invocation (and a new config instance), which can leak threads and increase CPU/memory usage—especially since collection is triggered from multiple GUI events and also runs inside separate threads already. Prefer reusing a single shared DataCollector instance (e.g., inject it from the GUI/controller, or use a module-level singleton) rather than constructing a new one each time.

Copilot uses AI. Check for mistakes.
Comment on lines +170 to +179
"""Create S3 client and return `(client, bucket)`."""
backend = config.get_upload_backend()
if not isinstance(backend, S3UploadBackend):
raise RuntimeError("Only S3 backend is supported for retrieval.")
# Accessing protected members intentionally to reuse existing backend setup.
client = backend._get_client() # pylint: disable=protected-access
bucket = backend._bucket # pylint: disable=protected-access
return client, bucket


Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_s3_client_from_config() appears unused (the fetch path uses build_s3_client_from_config() instead). Consider removing it or using it consistently to avoid duplicate ways of building an S3 client and reduce maintenance surface.

Suggested change
"""Create S3 client and return `(client, bucket)`."""
backend = config.get_upload_backend()
if not isinstance(backend, S3UploadBackend):
raise RuntimeError("Only S3 backend is supported for retrieval.")
# Accessing protected members intentionally to reuse existing backend setup.
client = backend._get_client() # pylint: disable=protected-access
bucket = backend._bucket # pylint: disable=protected-access
return client, bucket
"""Create an S3 client from configuration.
This wrapper is kept for compatibility and delegates to
build_s3_client_from_config so there is a single implementation
for S3 client construction.
"""
return build_s3_client_from_config(config)

Copilot uses AI. Check for mistakes.
Comment on lines +97 to +99
# Feature whose status VA we are subscribed to for data-collection triggering.
self._status_collect_feature: Optional[CryoFeature] = None

Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional is used in the type annotation for _status_collect_feature, but it is not imported. Without from __future__ import annotations, this will raise a NameError at import time and break the GUI controller. Import Optional from typing (or switch to typing.Optional[...]).

Copilot uses AI. Check for mistakes.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 820f35f2-98c4-41a6-9c63-c21291659338

📥 Commits

Reviewing files that changed from the base of the PR and between 279f1b4 and 510e3ab.

📒 Files selected for processing (1)
  • src/odemis/gui/cont/tabs/cryo_chamber_tab.py

📝 Walkthrough

Walkthrough

This pull request adds an anonymized data collection pipeline and UI consent flow: a thread-safe DataCollector with config persistence and S3 upload backend; a background worker that serializes payloads to ZIPs and uploads with retries and queue-size enforcement; a CLI tool to fetch DataCollector ZIP samples from S3; automatic feature-level collection triggers (status changes, posture transitions, deletion) with per-project sampling; a ConsentDialog and GUI integrations to prompt and toggle consent; test coverage for datacollector and fetch utilities; and a Debian packaging dependency for python3-boto3.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant GUI as GUI/main.py
    participant DC as DataCollector
    participant Config as DataCollectorConfig
    participant Dialog as ConsentDialog

    User->>GUI: Launch application
    GUI->>DC: Instantiate DataCollector
    GUI->>DC: should_prompt_for_consent()
    DC->>Config: Read config file
    Config-->>DC: Return consent/reminder decision
    alt Prompt needed
        GUI->>Dialog: Show ConsentDialog
        Dialog-->>User: Prompt (Opt in / Opt out / Remind)
        User->>Dialog: Choose option
        Dialog-->>GUI: Return result
        GUI->>DC: set_consent()/postpone_consent()
        DC->>Config: Persist consent state
    else No prompt
        GUI->>GUI: Continue startup
    end
Loading
sequenceDiagram
    participant Controller as CryoFeatureController
    participant Feature as CryoFeature
    participant DC as DataCollector
    participant Worker as BackgroundWorker
    participant S3 as S3 Backend

    Feature->>Controller: status change or deletion or posture transition
    Controller->>Controller: Check feature.collect & eligibility
    alt Eligible & consent granted
        Controller->>DC: record(event_name, schema, payload)
        DC->>DC: Validate & enqueue _WorkItem
        Worker->>DC: Dequeue item
        Worker->>Worker: Serialize metadata + images -> ZIP
        Worker->>S3: Upload ZIP
        S3-->>Worker: Upload success
        Worker->>Worker: Delete local ZIP
    else Not eligible or no consent
        Controller->>Controller: Skip collection
    end
Loading
sequenceDiagram
    participant Admin
    participant CLI as odemis-dc-fetch
    participant Config as DataCollectorConfig
    participant S3 as S3 Bucket
    participant Disk as Output Directory

    Admin->>CLI: Run with --event/--since/--output
    CLI->>Config: Load credentials / backend config
    Config-->>CLI: S3 client info
    CLI->>S3: list_objects_v2 (one or more prefixes)
    S3-->>CLI: Paginated objects
    loop For each object key
        CLI->>CLI: Parse event & timestamp
        CLI->>CLI: Apply filters (--event, --since, --host)
        alt Matches
            CLI->>S3: GetObject(key)
            S3-->>Disk: Write file (skip if exists)
        else Skip
        end
    end
    CLI-->>Admin: Exit code (0 success, 1 failures, 2 parse error)
Loading

Possibly related PRs

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title 'feature Meteor store annoted data' is misleading and does not accurately describe the actual changeset, which implements per-project feature data sampling for cryo chamber analysis. Revise the title to accurately reflect the main change, e.g., 'feature Per-project sampling for cryo feature data collection' or similar that clarifies the feature's actual purpose.
Docstring Coverage ⚠️ Warning Docstring coverage is 75.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description check ✅ Passed The description clearly explains the per-project sampling mechanism, how features inherit the collection flag, and how loaded features are excluded, all of which align with the actual changes in the pull request.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (2)
src/odemis/gui/cont/menu.py (1)

46-54: Add type hints to the new consent menu API.

The changed constructor and new handlers should annotate all parameters and return types.

♻️ Proposed fix
+from typing import Any
+
@@
-    def __init__(self, main_data, main_frame, data_collector: DataCollector):
+    def __init__(self, main_data: Any, main_frame: wx.Frame, data_collector: DataCollector) -> None:
@@
-    def _append_data_sharing_menu_item(self, main_frame):
+    def _append_data_sharing_menu_item(self, main_frame: wx.Frame) -> wx.MenuItem | None:
@@
-    def _on_toggle_data_sharing(self, evt):
+    def _on_toggle_data_sharing(self, evt: wx.CommandEvent) -> None:

As per coding guidelines, **/*.py: Always use type hints for function parameters and return types in Python code.

Also applies to: 171-185

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/odemis/gui/cont/menu.py` around lines 46 - 54, Annotate the constructor
and the new consent menu handlers with explicit type hints: update __init__ to
declare parameter types (e.g., main_data: MainGUIData, main_frame: wx.Frame,
data_collector: DataCollector) and the return type -> None; then locate the
consent-related handler functions referenced around lines 171-185 and add full
parameter and return type annotations (e.g., event: wx.Event or appropriate
event type -> None, any other params typed to their domain types). Ensure you
import or reference the types (MainGUIData, DataCollector, wx.Frame/Event) at
the top of the module so the annotations are valid.
src/odemis/gui/win/consent.py (1)

16-59: Add docstrings for the new dialog methods.

The event handlers and initializer are new functions and should follow the project docstring rule.

♻️ Proposed fix
     def __init__(self, parent: wx.Window, remind_days: int) -> None:
+        """
+        Initialize the consent dialog.
+
+        :param parent: Parent window for the dialog.
+        :param remind_days: Number of days before prompting again.
+        """
         title = "Share data with Delmic"
@@
     def _on_opt_in(self, _evt: wx.CommandEvent) -> None:
+        """
+        Handle the opt-in button.
+        """
         self.EndModal(self.RESULT_OPT_IN)
 
     def _on_opt_out(self, _evt: wx.CommandEvent) -> None:
+        """
+        Handle the opt-out button.
+        """
         self.EndModal(self.RESULT_OPT_OUT)
 
     def _on_remind_later(self, _evt: wx.CommandEvent) -> None:
+        """
+        Handle the remind-later button.
+        """
         self.EndModal(self.RESULT_REMIND_LATER)
 
     def _on_close(self, _evt: wx.CloseEvent) -> None:
+        """
+        Handle closing the dialog without an explicit choice.
+        """
         self.EndModal(self.RESULT_REMIND_LATER)

As per coding guidelines, **/*.py: Include docstrings for all functions and classes, following the reStructuredText style guide, without type information and without using inline formatting markers or backticks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/odemis/gui/win/consent.py` around lines 16 - 59, Add reStructuredText
docstrings to the Consent dialog methods: document the __init__ constructor and
each event handler method (_on_opt_in, _on_opt_out, _on_remind_later,
_on_close). For each docstring include a one-sentence description of the
method's purpose and, where helpful, describe important parameters (e.g., evt)
and the effect (which modal result is returned) using plain text (no type info,
no inline code/backticks), following the project's reST style conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/odemis/acq/feature.py`:
- Around line 573-584: The code currently sets channel_key = None when MD_OUT_WL
is missing, causing all such streams to collide and only the first to be kept;
change to use a per-stream fallback like the stream's position so missing
MD_OUT_WL yields a unique key. Iterate with an index over the concatenated list
(fm_zstacks + fm_images) and set channel_key =
s.raw[0].metadata.get(model.MD_OUT_WL, f"fallback:{i}") (or similar unique
identifier from the stream) before checking seen_channels; update references to
selected_fm and seen_channels accordingly.
- Around line 539-638: The race is that multiple threads can pass the initial if
not feature.collect check and all run until feature.collect is set False at the
end; make the one-shot flip atomic by performing an immediate compare-and-set at
the start of the routine (e.g., in collect_feature_data): replace the plain
boolean check of feature.collect with an atomic operation that sets
feature.collect to False only if it was True (or acquire a per-feature lock
keyed by feature id, check feature.collect and set it False while holding the
lock), and if the CAS/lock indicates collect was already False return early;
reference feature.collect and the top-level routine (collect_feature_data in
this file) when adding the atomic CAS or lock.

In `@src/odemis/acq/test/feature_test.py`:
- Around line 349-365: The test test_payload_has_no_feature_name only checks
that f.name.value ("my_secret_feature_name") is not present among payload keys;
update the assertions to also ensure the feature name does not appear in payload
values by checking captured.values() and the stringified values (for
nested/serialized values) after collect_feature_data runs — e.g., add assertions
using self.assertNotIn("my_secret_feature_name", captured.values()) and
self.assertNotIn("my_secret_feature_name", str(list(captured.values()))) to the
test (keep existing fake_record/captured usage and collect_feature_data
invocation).

In `@src/odemis/gui/cont/features.py`:
- Line 31: The module imports Dict and List from typing but uses Optional in the
annotation for self._status_collect_feature (type Optional[CryoFeature]) which
is undefined; update the import statement that currently lists Dict and List to
also import Optional so Optional is available for the annotation (refer to the
import line and the attribute self._status_collect_feature / CryoFeature usage
to locate the change).

In `@src/odemis/gui/cont/tabs/cryo_chamber_tab.py`:
- Around line 388-389: The code sets
self.tab_data_model.main.features_collectable using random.random() and
FEATURE_COLLECT_PROBABILITY but those names are not defined; import the random
module at the top of this module and add a module-level constant
FEATURE_COLLECT_PROBABILITY (e.g., a float like 0.1) before it’s used so
_change_project_conf() and any code referencing features_collectable (and
class/attribute names like tab_data_model.main.features_collectable) do not
raise NameError.

In `@src/odemis/gui/main.py`:
- Around line 436-441: The consent dialog updates consent via
self._data_collector.set_consent(...) but the Help > Share data menu checkbox
isn't refreshed; add a helper method refresh_data_sharing_state(self) in the
menu controller (e.g. class in src/odemis/gui/cont/menu.py) that does if
self._consent_menu_item is not None:
self._consent_menu_item.Check(self._data_collector.get_consent() is True), then
call that helper after the dialog result handling in main.py (after the branches
that call set_consent(True/False) or postpone_consent()) so the menu checkbox
reflects the new persisted consent state immediately.

In `@src/odemis/util/datacollector.py`:
- Around line 588-604: The loop currently does "if
self._process_pending_zips(...): continue" which starves new in-memory records;
change _run so that when _process_pending_zips(...) returns True you still
attempt to drain or process queued items instead of skipping the queue.get()
step — e.g., replace the continue with a non-blocking attempt to fetch work (use
self._queue.get_nowait() in a try/except queue.Empty or
self._queue.get(timeout=0.0)) and call self._process_work_item(item) if you get
one; keep the existing exception handling (_schedule_retry, logging.exception)
and avoid a tight busy-loop by falling back to the original blocking
self._queue.get(timeout=1.0) when no items are available.
- Around line 328-373: The event_name and payload keys are used directly to
build filesystem names (safe_event, zip_name, arc_name, extra_files paths) which
allows path traversal or unsafe ZIP entries; add a sanitization step that
normalizes item.event_name and each payload key into safe filenames (strip or
replace path separators like "/" and "\" and sequences like "..", collapse to a
whitelist of allowed chars such as alphanumerics, hyphen, underscore, enforce a
max length like 64) before using them to construct zip_name, tmp_dir children,
or arc_name; apply this sanitizer to safe_event, every arc_name (e.g., when
creating extra_{key}.json or {key}.h5/.ome.tiff) and when writing to tmp_dir or
adding to extra_files so no user-supplied string can escape tmp_dir or create
dangerous ZIP entries (update references in the code around safe_event,
zip_name, arc_name, payload_meta assignments, and extra_files population).

In `@src/odemis/util/dc_fetch.py`:
- Around line 40-47: The ISO parse path must normalize a trailing 'Z' before
calling datetime.fromisoformat to maintain Python 3.10 compatibility: in the
code handling text (the branch that calls datetime.fromisoformat(text)), detect
and replace a trailing 'Z' (or '+00:00' equivalent if present) with '+00:00' or
otherwise remove it so fromisoformat won't raise ValueError, then proceed to set
tzinfo to timezone.utc when parsed.tzinfo is None and use
parsed.astimezone(timezone.utc) when it has tzinfo; update the logic around the
parsed = datetime.fromisoformat(text) call accordingly.

---

Nitpick comments:
In `@src/odemis/gui/cont/menu.py`:
- Around line 46-54: Annotate the constructor and the new consent menu handlers
with explicit type hints: update __init__ to declare parameter types (e.g.,
main_data: MainGUIData, main_frame: wx.Frame, data_collector: DataCollector) and
the return type -> None; then locate the consent-related handler functions
referenced around lines 171-185 and add full parameter and return type
annotations (e.g., event: wx.Event or appropriate event type -> None, any other
params typed to their domain types). Ensure you import or reference the types
(MainGUIData, DataCollector, wx.Frame/Event) at the top of the module so the
annotations are valid.

In `@src/odemis/gui/win/consent.py`:
- Around line 16-59: Add reStructuredText docstrings to the Consent dialog
methods: document the __init__ constructor and each event handler method
(_on_opt_in, _on_opt_out, _on_remind_later, _on_close). For each docstring
include a one-sentence description of the method's purpose and, where helpful,
describe important parameters (e.g., evt) and the effect (which modal result is
returned) using plain text (no type info, no inline code/backticks), following
the project's reST style conventions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2aab52b5-1622-44b1-9a6a-a474e984fe44

📥 Commits

Reviewing files that changed from the base of the PR and between 2b55f3c and 279f1b4.

📒 Files selected for processing (15)
  • debian/control
  • scripts/odemis-dc-fetch.py
  • src/odemis/acq/feature.py
  • src/odemis/acq/test/feature_test.py
  • src/odemis/acq/test/test-features.json
  • src/odemis/gui/cont/features.py
  • src/odemis/gui/cont/menu.py
  • src/odemis/gui/cont/tabs/cryo_chamber_tab.py
  • src/odemis/gui/main.py
  • src/odemis/gui/model/tab_gui_data.py
  • src/odemis/gui/win/consent.py
  • src/odemis/util/datacollector.py
  • src/odemis/util/dc_fetch.py
  • src/odemis/util/test/datacollector_test.py
  • src/odemis/util/test/dc_fetch_test.py

Comment thread src/odemis/acq/feature.py
Comment on lines +539 to +638
if not feature.collect:
return

try:
_dc = DataCollector()
if not _dc.get_consent():
return
except Exception:
logging.exception("collect_feature_data: failed to access DataCollector; skipping.")
return

try:

# Load feature streams from disk when not yet in memory.
if not feature.streams.value and project_dir:
try:
load_feature_streams_from_disk(feature, project_dir)
except Exception:
logging.exception(
"collect_feature_data: failed to load streams; skipping.")
return

feature_streams = list(feature.streams.value)

# Collect first z-stack per FM channel; fall back to first FM image per channel.
fm_zstacks: List = []
fm_images: List = []
for s in feature_streams:
if isinstance(s, StaticFluoStream):
if _is_zstack_stream(s):
fm_zstacks.append(s)
else:
fm_images.append(s)

# Per channel: prefer z-stack, then plain FM image.
# Channels are delineated by MD_OUT_WL; use index as fallback.
selected_fm: List = []
seen_channels: set = set()
for s in fm_zstacks + fm_images:
try:
channel_key = s.raw[0].metadata.get(model.MD_OUT_WL)
except (IndexError, AttributeError):
channel_key = None
if channel_key not in seen_channels:
seen_channels.add(channel_key)
selected_fm.append(s)

# Collect spatially overlapping overview streams.
stage_pos = feature.stage_position.value
feat_x = stage_pos.get("x", 0.0)
feat_y = stage_pos.get("y", 0.0)

overview_fm: List = []
overview_sem: List = []
for s in (overview_streams or []):
if not _stream_overlaps_position(s, feat_x, feat_y):
continue
if isinstance(s, StaticFluoStream):
overview_fm.append(s)
elif isinstance(s, (StaticSEMStream, StaticFIBStream)):
overview_sem.append(s)

# Build privacy-preserving payload — generic keys, no names or filenames.
payload: dict = {
"status": feature.status.value,
"stage_position": dict(stage_pos),
"fm_focus_position": dict(feature.fm_focus_position.value),
}

def _get_raw(stream: "Stream") -> Optional["model.DataArray"]:
try:
return stream.raw[0] if stream.raw else None
except Exception:
return None

for idx, s in enumerate(selected_fm):
da = _get_raw(s)
if da is not None:
payload[f"channel_{idx}"] = da

for idx, s in enumerate(overview_fm):
da = _get_raw(s)
if da is not None:
payload[f"overview_fm_{idx}"] = da

for idx, s in enumerate(overview_sem):
da = _get_raw(s)
if da is not None:
payload[f"overview_sem_{idx}"] = da

image_keys = [k for k in payload if k.startswith(("channel_", "overview_fm_", "overview_sem_"))]
if not image_keys:
logging.debug(
"collect_feature_data: no images found for feature; skipping upload."
)
return

_dc.record("feature_collected", "1.0", payload)

feature.collect = False
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Make the one-shot collect transition atomic.

Concurrent status/posture/delete triggers can all pass Line 539 before Line 638 flips feature.collect, causing duplicate submissions for the same feature.

Proposed fix
 FEATURE_COLLECT_PROBABILITY = 0.2
+_COLLECTION_STATE_LOCK = threading.Lock()
-        _dc.record("feature_collected", "1.0", payload)
-
-        feature.collect = False
+        with _COLLECTION_STATE_LOCK:
+            if not feature.collect:
+                return
+            feature.collect = False
+
+        try:
+            _dc.record("feature_collected", "1.0", payload)
+        except Exception:
+            with _COLLECTION_STATE_LOCK:
+                feature.collect = True
+            raise
         logging.debug("collect_feature_data: submitted feature data for collection.")
🧰 Tools
🪛 Ruff (0.15.10)

[warning] 611-611: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/odemis/acq/feature.py` around lines 539 - 638, The race is that multiple
threads can pass the initial if not feature.collect check and all run until
feature.collect is set False at the end; make the one-shot flip atomic by
performing an immediate compare-and-set at the start of the routine (e.g., in
collect_feature_data): replace the plain boolean check of feature.collect with
an atomic operation that sets feature.collect to False only if it was True (or
acquire a per-feature lock keyed by feature id, check feature.collect and set it
False while holding the lock), and if the CAS/lock indicates collect was already
False return early; reference feature.collect and the top-level routine
(collect_feature_data in this file) when adding the atomic CAS or lock.

Comment thread src/odemis/acq/feature.py
Comment on lines +573 to +584
# Per channel: prefer z-stack, then plain FM image.
# Channels are delineated by MD_OUT_WL; use index as fallback.
selected_fm: List = []
seen_channels: set = set()
for s in fm_zstacks + fm_images:
try:
channel_key = s.raw[0].metadata.get(model.MD_OUT_WL)
except (IndexError, AttributeError):
channel_key = None
if channel_key not in seen_channels:
seen_channels.add(channel_key)
selected_fm.append(s)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use a real per-stream fallback channel key.

The comment says “use index as fallback,” but Line 581 sets channel_key = None; every stream missing MD_OUT_WL then shares the same key, so only the first such FM stream is collected.

Proposed fix
-        for s in fm_zstacks + fm_images:
+        for idx, s in enumerate(fm_zstacks + fm_images):
+            channel_key = ("index", idx)
             try:
-                channel_key = s.raw[0].metadata.get(model.MD_OUT_WL)
-            except (IndexError, AttributeError):
-                channel_key = None
+                out_wl = s.raw[0].metadata.get(model.MD_OUT_WL)
+                if out_wl is not None:
+                    channel_key = ("out_wl", repr(out_wl))
+            except (IndexError, AttributeError, TypeError):
+                pass
             if channel_key not in seen_channels:
                 seen_channels.add(channel_key)
                 selected_fm.append(s)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/odemis/acq/feature.py` around lines 573 - 584, The code currently sets
channel_key = None when MD_OUT_WL is missing, causing all such streams to
collide and only the first to be kept; change to use a per-stream fallback like
the stream's position so missing MD_OUT_WL yields a unique key. Iterate with an
index over the concatenated list (fm_zstacks + fm_images) and set channel_key =
s.raw[0].metadata.get(model.MD_OUT_WL, f"fallback:{i}") (or similar unique
identifier from the stream) before checking seen_channels; update references to
selected_fm and seen_channels accordingly.

Comment on lines +349 to +365
def test_payload_has_no_feature_name(self):
"""Payload must not contain the feature name string as a key or value."""
f = self._make_feature_with_stream(collect=True)
f.name.value = "my_secret_feature_name"
captured = {}

def fake_record(event_name, schema_version, payload, **kwargs):
captured.update(payload)

with patch("odemis.acq.feature.DataCollector") as MockDC:
MockDC.return_value.get_consent.return_value = True
MockDC.return_value.record.side_effect = fake_record
collect_feature_data(f)

self.assertNotIn("my_secret_feature_name", captured)
self.assertNotIn("my_secret_feature_name", str(captured.keys()))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Assert the feature name is absent from payload values too.

The docstring says “as a key or value,” but the assertions only inspect keys. A leaked feature name in "status", metadata, or another value would not fail this test.

Proposed fix
         self.assertNotIn("my_secret_feature_name", captured)
         self.assertNotIn("my_secret_feature_name", str(captured.keys()))
+        self.assertNotIn("my_secret_feature_name", str(captured.values()))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/odemis/acq/test/feature_test.py` around lines 349 - 365, The test
test_payload_has_no_feature_name only checks that f.name.value
("my_secret_feature_name") is not present among payload keys; update the
assertions to also ensure the feature name does not appear in payload values by
checking captured.values() and the stringified values (for nested/serialized
values) after collect_feature_data runs — e.g., add assertions using
self.assertNotIn("my_secret_feature_name", captured.values()) and
self.assertNotIn("my_secret_feature_name", str(list(captured.values()))) to the
test (keep existing fake_record/captured usage and collect_feature_data
invocation).

import threading

import wx
from typing import Dict, List
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

head -n 110 src/odemis/gui/cont/features.py | tail -n 80

Repository: delmic/odemis

Length of output: 3014


Import Optional from typing.

Line 98 uses Optional[CryoFeature] in the type annotation for self._status_collect_feature, but line 31 only imports Dict and List from typing. This causes an undefined name error and must be fixed for Python 3.10+ code validity.

Proposed fix
-from typing import Dict, List
+from typing import Dict, List, Optional
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from typing import Dict, List
from typing import Dict, List, Optional
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/odemis/gui/cont/features.py` at line 31, The module imports Dict and List
from typing but uses Optional in the annotation for self._status_collect_feature
(type Optional[CryoFeature]) which is undefined; update the import statement
that currently lists Dict and List to also import Optional so Optional is
available for the annotation (refer to the import line and the attribute
self._status_collect_feature / CryoFeature usage to locate the change).

Comment thread src/odemis/gui/cont/tabs/cryo_chamber_tab.py
Comment thread src/odemis/gui/main.py
Comment on lines +436 to +441
if result == ConsentDialog.RESULT_OPT_IN:
self._data_collector.set_consent(True)
elif result == ConsentDialog.RESULT_OPT_OUT:
self._data_collector.set_consent(False)
else:
self._data_collector.postpone_consent()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep the menu consent checkbox synchronized after the dialog.

The menu item is initialized before this dialog runs, so opting in here leaves Help > Share data unchecked while collection consent is enabled.

🛠️ Proposed fix
             if result == ConsentDialog.RESULT_OPT_IN:
                 self._data_collector.set_consent(True)
             elif result == ConsentDialog.RESULT_OPT_OUT:
                 self._data_collector.set_consent(False)
             else:
                 self._data_collector.postpone_consent()
+
+            if self._menu_controller is not None:
+                self._menu_controller.refresh_data_sharing_state()

Add this helper in src/odemis/gui/cont/menu.py:

def refresh_data_sharing_state(self) -> None:
    """Refresh the data sharing menu checkbox from persisted consent."""
    if self._consent_menu_item is not None:
        self._consent_menu_item.Check(self._data_collector.get_consent() is True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/odemis/gui/main.py` around lines 436 - 441, The consent dialog updates
consent via self._data_collector.set_consent(...) but the Help > Share data menu
checkbox isn't refreshed; add a helper method refresh_data_sharing_state(self)
in the menu controller (e.g. class in src/odemis/gui/cont/menu.py) that does if
self._consent_menu_item is not None:
self._consent_menu_item.Check(self._data_collector.get_consent() is True), then
call that helper after the dialog result handling in main.py (after the branches
that call set_consent(True/False) or postpone_consent()) so the menu checkbox
reflects the new persisted consent state immediately.

Comment on lines +151 to +165
``consent`` is always present as ``none`` / ``true`` / ``false``.
``reminder_date`` is commented out until explicitly set, then written
in ``YYYY-MM-DD`` format.
"""
self.file_path.parent.mkdir(parents=True, exist_ok=True)

consent_val = self.consent
remind_val = self.remind_date

if consent_val is True:
consent_line = "consent = true"
elif consent_val is False:
consent_line = "consent = false"
else:
consent_line = "consent = none"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle persisted consent = none when reading config.

_write() persists undecided consent as none, but getboolean() raises ValueError for that value. A fresh DataCollectorConfig() reading the file can then break consent checks instead of returning None.

Proposed fix
     `@property`
     def consent(self) -> Optional[bool]:
         """Return the consent state, or ``None`` if not yet set."""
         try:
-            return self._cp.getboolean("general", "consent")
+            value = self._cp.get("general", "consent").strip().lower()
         except (configparser.NoSectionError, configparser.NoOptionError):
             return None
+        if value in ("", "none", "null"):
+            return None
+        if value in ("1", "yes", "true", "on"):
+            return True
+        if value in ("0", "no", "false", "off"):
+            return False
+        logging.warning("Invalid datacollector consent value %r; treating as undecided.", value)
+        return None

Also applies to: 190-195

Comment on lines +328 to +373
# Limit event_name length so the filename stays within filesystem limits.
safe_event = item.event_name[:64] if item.event_name else "event"
zip_name = f"{safe_event}-{timestamp_str}-{uuid8}.zip"

tmp_dir = Path(tempfile.mkdtemp(prefix="dc_"))
try:
payload_meta: dict = {}
extra_files: list = [] # list of (arcname, abs_path)

for key, value in item.payload.items():
if value is None or isinstance(value, (str, int, float, bool)):
payload_meta[key] = value

elif isinstance(value, numpy.ndarray):
if item.image_format.upper() == "HDF5":
arc_name = f"{key}.h5"
abs_path = tmp_dir / arc_name
try:
da = value if isinstance(value, model.DataArray) else model.DataArray(value)
hdf5.export(str(abs_path), da)
except Exception:
logging.exception("Failed to export DataArray to HDF5 at %s", abs_path)
abs_path = None
else:
arc_name = f"{key}.ome.tiff"
abs_path = tmp_dir / arc_name
try:
da = value if isinstance(value, model.DataArray) else model.DataArray(value)
tiff.export(str(abs_path), da)
except Exception:
logging.exception("Failed to export DataArray to TIFF at %s", abs_path)
abs_path = None

if abs_path is not None and abs_path.exists():
extra_files.append((arc_name, abs_path))
payload_meta[key] = arc_name
else:
payload_meta[key] = None
payload_meta["export_error"] = True

elif isinstance(value, (dict, list)):
arc_name = f"extra_{key}.json"
abs_path = tmp_dir / arc_name
abs_path.write_text(json.dumps(value, default=str), encoding="utf-8")
extra_files.append((arc_name, abs_path))
payload_meta[key] = arc_name
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Sanitize event names and payload keys before using them as paths.

event_name and payload keys flow into filesystem paths and ZIP arcnames. Values containing / or .. can write outside the queue/temp directory or create unsafe ZIP entries.

Proposed fix
+import re
+_SAFE_FILENAME_RE = re.compile(r"[^A-Za-z0-9_.-]+")
+
+
+def _safe_name(value: object, fallback: str, max_length: int = 64) -> str:
+    """Return a filesystem-safe component for generated archive names."""
+    safe = _SAFE_FILENAME_RE.sub("_", str(value).strip())[:max_length].strip("._-")
+    return safe or fallback
+
+
 def _serialize(item: _WorkItem, queue_dir: Path) -> Path:
-    # Limit event_name length so the filename stays within filesystem limits.
-    safe_event = item.event_name[:64] if item.event_name else "event"
+    safe_event = _safe_name(item.event_name, "event")
     zip_name = f"{safe_event}-{timestamp_str}-{uuid8}.zip"
         for key, value in item.payload.items():
+            safe_key = _safe_name(key, "payload")
             if value is None or isinstance(value, (str, int, float, bool)):
                 payload_meta[key] = value
 
             elif isinstance(value, numpy.ndarray):
                 if item.image_format.upper() == "HDF5":
-                    arc_name = f"{key}.h5"
+                    arc_name = f"{safe_key}.h5"
                     abs_path = tmp_dir / arc_name
@@
                 else:
-                    arc_name = f"{key}.ome.tiff"
+                    arc_name = f"{safe_key}.ome.tiff"
                     abs_path = tmp_dir / arc_name
@@
             elif isinstance(value, (dict, list)):
-                arc_name = f"extra_{key}.json"
+                arc_name = f"extra_{safe_key}.json"
                 abs_path = tmp_dir / arc_name
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/odemis/util/datacollector.py` around lines 328 - 373, The event_name and
payload keys are used directly to build filesystem names (safe_event, zip_name,
arc_name, extra_files paths) which allows path traversal or unsafe ZIP entries;
add a sanitization step that normalizes item.event_name and each payload key
into safe filenames (strip or replace path separators like "/" and "\" and
sequences like "..", collapse to a whitelist of allowed chars such as
alphanumerics, hyphen, underscore, enforce a max length like 64) before using
them to construct zip_name, tmp_dir children, or arc_name; apply this sanitizer
to safe_event, every arc_name (e.g., when creating extra_{key}.json or
{key}.h5/.ome.tiff) and when writing to tmp_dir or adding to extra_files so no
user-supplied string can escape tmp_dir or create dangerous ZIP entries (update
references in the code around safe_event, zip_name, arc_name, payload_meta
assignments, and extra_files population).

Comment on lines +588 to +604
def _run(self) -> None:
"""Main loop: process items until the thread is stopped."""
while True:
try:
if self._process_pending_zips(self._queue_dir):
continue
except Exception:
logging.exception("DataCollector error while processing pending uploads")
self._schedule_retry()
continue
try:
item = self._queue.get(timeout=1.0)
except queue.Empty:
continue

try:
self._process_work_item(item)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not starve new records while pending uploads are backing off.

When _process_pending_zips() returns True during retry backoff, Line 593 immediately loops and skips _queue.get(). New samples then remain only in memory until uploads recover, so they can be lost on shutdown.

Proposed fix
     def _run(self) -> None:
         """Main loop: process items until the thread is stopped."""
         while True:
             try:
-                if self._process_pending_zips(self._queue_dir):
-                    continue
+                self._process_pending_zips(self._queue_dir)
             except Exception:
                 logging.exception("DataCollector error while processing pending uploads")
                 self._schedule_retry()
-                continue
             try:
                 item = self._queue.get(timeout=1.0)
             except queue.Empty:
                 continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/odemis/util/datacollector.py` around lines 588 - 604, The loop currently
does "if self._process_pending_zips(...): continue" which starves new in-memory
records; change _run so that when _process_pending_zips(...) returns True you
still attempt to drain or process queued items instead of skipping the
queue.get() step — e.g., replace the continue with a non-blocking attempt to
fetch work (use self._queue.get_nowait() in a try/except queue.Empty or
self._queue.get(timeout=0.0)) and call self._process_work_item(item) if you get
one; keep the existing exception handling (_schedule_retry, logging.exception)
and avoid a tight busy-loop by falling back to the original blocking
self._queue.get(timeout=1.0) when no items are available.

Comment on lines +40 to +47
text = value.strip()
if len(text) == 10:
parsed = datetime.strptime(text, "%Y-%m-%d")
return parsed.replace(tzinfo=timezone.utc)
parsed = datetime.fromisoformat(text)
if parsed.tzinfo is None:
return parsed.replace(tzinfo=timezone.utc)
return parsed.astimezone(timezone.utc)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

❓ Verification inconclusive

Script executed:

#!/bin/bash
python3.10 - <<'PY'
from datetime import datetime
datetime.fromisoformat("2026-03-01T12:30:00Z")
PY

Repository: delmic/odemis


Repository: delmic/odemis
Exit code: 127

stderr:

/bin/bash: line 2: python3.10: command not found

Script executed:

# First, locate and examine the target file
head -60 src/odemis/util/dc_fetch.py | tail -30

Repository: delmic/odemis


Repository: delmic/odemis
Exit code: 0

stdout:


from odemis.util.datacollector import DataCollectorConfig, S3UploadBackend


def parse_since_utc(value: str) -> datetime:
    """Parse a date/datetime string to UTC-aware datetime.

    Accepts ISO-8601 date (`YYYY-MM-DD`) and datetime (`YYYY-MM-DDTHH:MM:SS` with optional timezone).
    """
    text = value.strip()
    if len(text) == 10:
        parsed = datetime.strptime(text, "%Y-%m-%d")
        return parsed.replace(tzinfo=timezone.utc)
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def parse_key_timestamp_utc(key: str) -> Optional[datetime]:
    """Parse `<event>-<YYYYMMDDTHHmmss>-<uuid>.zip` timestamp from key basename."""
    name = Path(key).name
    if not name.endswith(".zip"):
        return None
    stem = name[:-4]
    parts = stem.rsplit("-", 2)
    if len(parts) != 3:
        return None
    ts = parts[1]
    try:

Script executed:

# Check Python version requirements in the project
find . -name "setup.py" -o -name "setup.cfg" -o -name "pyproject.toml" | head -5 | xargs cat 2>/dev/null | grep -i "python" | head -20

Repository: delmic/odemis


Repository: delmic/odemis
Exit code: 0

stdout:

#!/usr/bin/env python3
Run `python setup.py build_ext --inplace` to compile
#!/usr/bin/env python3
# sudo apt-get install python-setuptools cython
# python3 setup.py build_ext --inplace
                          "Programming Language :: Python",

Script executed:

# Also check if there are any existing tests for this function
find . -name "*test*dc_fetch*" -o -path "*/test*" -name "*dc_fetch*" 2>/dev/null

Repository: delmic/odemis


Repository: delmic/odemis
Exit code: 0

stdout:

./src/odemis/util/test/dc_fetch_test.py

Python 3.10 datetime.fromisoformat() support for trailing Z timezone


No, Python 3.10's datetime.fromisoformat does not support trailing 'Z' timezone suffix. It raises ValueError for strings like '2011-11-04T00:05:23Z'. Support was added in Python 3.11, which expanded parsing to most ISO 8601 formats including 'Z' as UTC alias. In 3.10, it only parses formats matching datetime.isoformat output, which uses '+00:00' not 'Z'. Official 3.10 docs explicitly state it does not support arbitrary ISO 8601 strings. Python 3.10 receives only security fixes, no feature backports. Workaround: replace('Z', '+00:00').

Citations:


Script executed:

# Find Python version requirement in project root
find . -maxdepth 2 -name "setup.py" -o -name "setup.cfg" -o -name "pyproject.toml" -o -name "README.md" | xargs grep -l "python" 2>/dev/null | head -3

Repository: delmic/odemis


Repository: delmic/odemis
Exit code: 0

stdout:

./setup.py

Script executed:

# Check the test file for parse_since_utc
cat src/odemis/util/test/dc_fetch_test.py

Repository: delmic/odemis


Repository: delmic/odemis
Exit code: 0

stdout:

#!/usr/bin/env python3
"""
Created on 11 March 2026

`@author`: Karishma Kumar

Copyright © 2026 Karishma Kumar, Delmic

This file is part of Odemis.

Odemis is free software: you can redistribute it and/or modify it under the
terms of the GNU General Public License version 2 as published by the Free
Software Foundation.

Odemis is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
Odemis. If not, see http://www.gnu.org/licenses/.
"""

import tempfile
import unittest
from datetime import datetime, timezone
from pathlib import Path
from unittest.mock import Mock, patch

from odemis.util import dc_fetch


class DCFetchTest(unittest.TestCase):
    """Unit tests for S3 retrieval helpers."""

    def test_parse_since_utc_date(self) -> None:
        """Date input should parse as UTC midnight."""
        parsed = dc_fetch.parse_since_utc("2026-03-22")
        self.assertEqual(parsed, datetime(2026, 3, 22, 0, 0, 0, tzinfo=timezone.utc))

    def test_parse_key_timestamp(self) -> None:
        """Timestamp should be parsed from key basename."""
        parsed = dc_fetch.parse_key_timestamp_utc("host/z_stack_acquired-20260322T104530-a1b2c3d4.zip")
        self.assertEqual(parsed, datetime(2026, 3, 22, 10, 45, 30, tzinfo=timezone.utc))

    def test_should_download_key_filters(self) -> None:
        """Event and since filters should both be enforced."""
        key = "host/z_stack_acquired-20260322T104530-a1b2c3d4.zip"
        since_before = datetime(2026, 3, 22, 10, 0, 0, tzinfo=timezone.utc)
        since_after = datetime(2026, 3, 22, 11, 0, 0, tzinfo=timezone.utc)
        self.assertTrue(dc_fetch.should_download_key(key, "z_stack_acquired", since_before))
        self.assertFalse(dc_fetch.should_download_key(key, "other_event", since_before))
        self.assertFalse(dc_fetch.should_download_key(key, "z_stack_acquired", since_after))

    def test_iter_s3_objects_paginates(self) -> None:
        """S3 iterator should follow continuation tokens."""
        client = Mock()
        client.list_objects_v2.side_effect = [
            {
                "Contents": [{"Key": "host/a.zip"}],
                "IsTruncated": True,
                "NextContinuationToken": "token-1",
            },
            {
                "Contents": [{"Key": "host/b.zip"}],
                "IsTruncated": False,
            },
        ]
        keys = [item["Key"] for item in dc_fetch.iter_s3_objects(client, "bucket", "host/")]
        self.assertEqual(keys, ["host/a.zip", "host/b.zip"])
        self.assertEqual(client.list_objects_v2.call_count, 2)

    def test_parse_host_filters_comma_list(self) -> None:
        """Host parser should accept comma-separated values and normalize them."""
        hosts = dc_fetch.parse_host_filters("meteor-5099, atlas-001 ,/secom-22/")
        self.assertEqual(hosts, ["meteor-5099", "atlas-001", "secom-22"])

    def test_fetch_samples_downloads_matching_keys(self) -> None:
        """Fetch flow should download matching keys and report counters."""
        with tempfile.TemporaryDirectory(prefix="dc_fetch_") as tmp_dir:
            output_dir = Path(tmp_dir)
            client = Mock()
            client.list_objects_v2.return_value = {
                "Contents": [
                    {"Key": "host/evt-20260322T100000-aaaa1111.zip"},
                    {"Key": "host/other-20260322T100000-bbbb2222.zip"},
                ],
                "IsTruncated": False,
            }

            def _download_file(_bucket: str, _key: str, filename: str) -> None:
                Path(filename).write_bytes(b"zip")

            client.download_file.side_effect = _download_file

            with patch("odemis.util.dc_fetch.build_s3_client_from_config", return_value=(client, "bucket")):
                result = dc_fetch.fetch_samples(
                    event_filter="evt",
                    since_utc=datetime(2026, 3, 22, 9, 0, 0, tzinfo=timezone.utc),
                    output_dir=output_dir,
                )

            self.assertEqual(result["listed"], 2)
            self.assertEqual(result["matched"], 1)
            self.assertEqual(result["downloaded"], 1)
            self.assertEqual(result["failed"], 0)
            self.assertTrue((output_dir / "evt-20260322T100000-aaaa1111.zip").exists())

    def test_fetch_samples_applies_host_filter_prefix(self) -> None:
        """Host filter should become the S3 list prefix."""
        with tempfile.TemporaryDirectory(prefix="dc_fetch_") as tmp_dir:
            output_dir = Path(tmp_dir)
            client = Mock()
            client.list_objects_v2.return_value = {"Contents": [], "IsTruncated": False}

            with patch("odemis.util.dc_fetch.build_s3_client_from_config", return_value=(client, "bucket")):
                dc_fetch.fetch_samples(
                    event_filter=None,
                    since_utc=None,
                    output_dir=output_dir,
                    host_filter="meteor-5099",
                )

            call_kwargs = client.list_objects_v2.call_args.kwargs
            self.assertEqual(call_kwargs["Bucket"], "bucket")
            self.assertEqual(call_kwargs["Prefix"], "meteor-5099/")

    def test_fetch_samples_applies_multiple_host_prefixes(self) -> None:
        """Comma-separated hosts should trigger one listing call per host prefix."""
        with tempfile.TemporaryDirectory(prefix="dc_fetch_") as tmp_dir:
            output_dir = Path(tmp_dir)
            client = Mock()
            client.list_objects_v2.return_value = {"Contents": [], "IsTruncated": False}

            with patch("odemis.util.dc_fetch.build_s3_client_from_config", return_value=(client, "bucket")):
                dc_fetch.fetch_samples(
                    event_filter=None,
                    since_utc=None,
                    output_dir=output_dir,
                    host_filter="meteor-5099,atlas-001",
                )

            self.assertEqual(client.list_objects_v2.call_count, 2)
            first_prefix = client.list_objects_v2.call_args_list[0].kwargs["Prefix"]
            second_prefix = client.list_objects_v2.call_args_list[1].kwargs["Prefix"]
            self.assertEqual(first_prefix, "meteor-5099/")
            self.assertEqual(second_prefix, "atlas-001/")

    def test_fetch_samples_passes_bucket_endpoint_region_overrides(self) -> None:
        """Overrides should be forwarded to the S3 client builder."""
        with tempfile.TemporaryDirectory(prefix="dc_fetch_") as tmp_dir:
            output_dir = Path(tmp_dir)
            client = Mock()
            client.list_objects_v2.return_value = {"Contents": [], "IsTruncated": False}

            with patch("odemis.util.dc_fetch.build_s3_client_from_config", return_value=(client, "bucket")) as builder:
                dc_fetch.fetch_samples(
                    event_filter=None,
                    since_utc=None,
                    output_dir=output_dir,
                    host_filter=None,
                    bucket_override="other-bucket",
                    endpoint_override="https://s3.eu-west-1.amazonaws.com",
                    region_override="eu-west-1",
                )

            kwargs = builder.call_args.kwargs
            self.assertEqual(kwargs["bucket_override"], "other-bucket")
            self.assertEqual(kwargs["endpoint_override"], "https://s3.eu-west-1.amazonaws.com")
            self.assertEqual(kwargs["region_override"], "eu-west-1")

    def test_build_s3_client_uses_backend_region_as_default(self) -> None:
        """build_s3_client_from_config should use backend._region when no override is given."""
        from odemis.util.datacollector import S3UploadBackend, S3_REGION

        backend = S3UploadBackend(
            access_key="key",
            secret_key="secret",
            region=S3_REGION,
            bucket="test-bucket",
        )
        mock_config = Mock()
        mock_config.get_upload_backend.return_value = backend

        with patch("boto3.client") as mock_boto3_client:
            mock_boto3_client.return_value = Mock()
            dc_fetch.build_s3_client_from_config(mock_config)

        call_kwargs = mock_boto3_client.call_args.kwargs
        self.assertEqual(call_kwargs.get("region_name"), S3_REGION)
        # endpoint_url must be None so boto3 resolves the regional endpoint automatically
        self.assertIsNone(call_kwargs.get("endpoint_url"))


if __name__ == "__main__":
    unittest.main()

Script executed:

# Look for Python version spec in setup.py
head -100 setup.py 2>/dev/null | grep -A5 -B5 "python"

Repository: delmic/odemis


Repository: delmic/odemis
Exit code: 0

stdout:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# To rebuild just the cython modules, use these commands:
# sudo apt-get install python-setuptools cython
# python3 setup.py build_ext --inplace
from setuptools import setup, find_packages, Extension
from Cython.Build import cythonize # Warning: must be _after_ setup import
import glob
import os
import subprocess

Normalize trailing Z before fromisoformat() for Python 3.10 compatibility.

UTC timestamps with trailing Z (e.g., 2026-03-01T12:30:00Z) are standard ISO 8601 format commonly provided as CLI input. However, datetime.fromisoformat() does not support the Z suffix in Python 3.10—it raises ValueError. Support was added in Python 3.11. Since the coding guidelines require Python 3.10+ compatibility, this code path needs a normalization step.

Proposed fix
     text = value.strip()
     if len(text) == 10:
         parsed = datetime.strptime(text, "%Y-%m-%d")
         return parsed.replace(tzinfo=timezone.utc)
+    if text.endswith("Z"):
+        text = text[:-1] + "+00:00"
     parsed = datetime.fromisoformat(text)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
text = value.strip()
if len(text) == 10:
parsed = datetime.strptime(text, "%Y-%m-%d")
return parsed.replace(tzinfo=timezone.utc)
parsed = datetime.fromisoformat(text)
if parsed.tzinfo is None:
return parsed.replace(tzinfo=timezone.utc)
return parsed.astimezone(timezone.utc)
text = value.strip()
if len(text) == 10:
parsed = datetime.strptime(text, "%Y-%m-%d")
return parsed.replace(tzinfo=timezone.utc)
if text.endswith("Z"):
text = text[:-1] + "+00:00"
parsed = datetime.fromisoformat(text)
if parsed.tzinfo is None:
return parsed.replace(tzinfo=timezone.utc)
return parsed.astimezone(timezone.utc)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/odemis/util/dc_fetch.py` around lines 40 - 47, The ISO parse path must
normalize a trailing 'Z' before calling datetime.fromisoformat to maintain
Python 3.10 compatibility: in the code handling text (the branch that calls
datetime.fromisoformat(text)), detect and replace a trailing 'Z' (or '+00:00'
equivalent if present) with '+00:00' or otherwise remove it so fromisoformat
won't raise ValueError, then proceed to set tzinfo to timezone.utc when
parsed.tzinfo is None and use parsed.astimezone(timezone.utc) when it has
tzinfo; update the logic around the parsed = datetime.fromisoformat(text) call
accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants