Evaluation, Reproducibility, Benchmarks Meeting 44
Michela edited this page May 27, 2026
Date: 27th May, 2026
Attendees:
- Rucha
- Carole
- Anne
- Annika
- Michela
- Oliver
-
From Michela:
- No major progress since the last meeting on the planned task of automatically extracting basic information from DICOM headers, owing to time constraints.
Next Steps:
- Aim to prepare something by the next meeting.
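As a possible starting point for the DICOM header task, here is a minimal sketch. It assumes the third-party pydicom library is used to read the files, and the list of fields is a hypothetical choice, since the notes do not say which header information is wanted.

```python
from typing import Any, Dict, Iterable

# Hypothetical selection of "basic information"; the notes do not specify
# which fields are needed. These are standard DICOM attribute keywords.
BASIC_FIELDS = (
    "Modality",
    "Manufacturer",
    "StudyDate",
    "SliceThickness",
    "PixelSpacing",
    "Rows",
    "Columns",
)


def summarise_header(header: Any, fields: Iterable[str] = BASIC_FIELDS) -> Dict[str, Any]:
    """Collect the requested fields, using None for anything that is absent.

    `header` only needs a dict-style `.get(key, default)` method, which both
    a plain dict and a pydicom Dataset provide.
    """
    return {name: header.get(name, None) for name in fields}


def read_basic_info(path: str) -> Dict[str, Any]:
    """Read one DICOM file's header (skipping pixel data) and summarise it."""
    import pydicom  # assumption: pydicom is installed; imported lazily

    dataset = pydicom.dcmread(path, stop_before_pixels=True)
    return summarise_header(dataset)
```

Keeping `summarise_header` separate from file reading means the field selection can be checked on plain dictionaries before it is run on real data.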
-
From Rucha:
- Attended the first meeting of the deploy working group and gave an overview of potential collaboration between the evaluation and deploy teams.
- Developed a tool, originally built for synthetic data evaluation, that should generalise to other types of problems (e.g., segmentation). It has not yet been tested beyond the current modality/application; it is expected to work where shapes are characteristic and segmentation is robust.
- Possible extension to other modalities: the method is, in theory, generalisable to any domain with characteristic shapes (organs on CT, cell shapes in histopathology, etc.), as long as a robust segmentation mask is available.
- Discussed potential starting datasets: Medical Segmentation Decathlon (many organs, masks available) and Unicorn challenge data.
Next Steps:
- Try the tool on different datasets.
- Follow-up meeting with the deploy working group next week to define more concrete plans.
-
From Oliver:
Medical Image Analysis Reviews:
- Paper reviews received from Medical Image Analysis for the CIA paper.
- Overall feasible revision, but substantial experimentation required.
- Main requested addition: sensitivity analysis for kernel density estimation choices: try different kernels and bandwidths.
- Reviewers request practical recommendations/best practices rather than results only.
- Group discussion: avoid a “formal guidelines” framing. Instead, propose best practices / practical guidance with careful caveats: the recommendations are not consensus-derived by a large consortium, but can be presented as evidence-driven guidance, with broader consensus left as future work.
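The requested sensitivity analysis for kernel density estimation could start from something like the sketch below. It is a generic illustration, not the method used in the CIA paper. The two kernels, the normal reference rule for the baseline bandwidth, the scale factors, and all function names are assumptions made here.

```python
import math
from statistics import stdev
from typing import Callable, Dict, Sequence, Tuple


def gaussian(u: float) -> float:
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


KERNELS: Dict[str, Callable[[float], float]] = {
    "gaussian": gaussian,
    "epanechnikov": epanechnikov,
}


def kde(x: float, samples: Sequence[float], bandwidth: float,
        kernel: Callable[[float], float]) -> float:
    """Kernel density estimate at x: (1 / (n h)) * sum K((x - x_i) / h)."""
    n = len(samples)
    return sum(kernel((x - s) / bandwidth) for s in samples) / (n * bandwidth)


def reference_bandwidth(samples: Sequence[float]) -> float:
    """Normal reference rule of thumb: h = 1.06 * sigma * n^(-1/5)."""
    return 1.06 * stdev(samples) * len(samples) ** (-0.2)


def sensitivity_grid(samples: Sequence[float], x: float,
                     scales: Sequence[float] = (0.5, 1.0, 2.0)
                     ) -> Dict[Tuple[str, float], float]:
    """Density at x for every kernel and every scaled baseline bandwidth."""
    base = reference_bandwidth(samples)
    return {
        (name, scale): kde(x, samples, scale * base, kernel)
        for name, kernel in KERNELS.items()
        for scale in scales
    }


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    toy_scores = [0.71, 0.74, 0.78, 0.80, 0.83, 0.85, 0.88]
    for (name, scale), density in sensitivity_grid(toy_scores, x=0.80).items():
        print(f"{name:>12}  bandwidth x {scale:<4} density = {density:.3f}")
```

Reporting how much the downstream quantity of interest moves across this grid, rather than the density itself, is probably closer to what the reviewers are asking for.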
Systematic review initiative (brain imaging benchmarks/challenges, past decade):
- Oliver highlighted Michela's spider plot/benchmark ID card idea from the previous meeting as potentially very useful for objectively characterising benchmark quality across multiple dimensions.
Next Steps:
- Work on the Medical Image Analysis (MEDIA) revision.
- Share the spider plot with the students for context.
- Invite Michela and Rucha to help screen papers (Carole and Annika have already agreed).
-
From Annika:
Public Data Repository Initiative:
- Originally proposed by MICCAI 2026 general chairs: a shared platform for hosting medical imaging datasets (currently scattered).
- Several groups are pursuing the same goal; the current focus is on identifying who leads this effort (SIG for Challenges, Open Data Initiative, etc.).
- The advisory board has agreed in principle to help facilitate data access once a hosting platform is identified.
- Will keep the group posted.
Foundation Model Pre-training & Duplicate Datasets:
- Approached by Ralph Luca (MONAI Human-AI Interaction Working Group) about the problem of duplicate datasets in foundation model pre-training (models trained on unknown or overlapping data sources).
- Potential connection to Michela's benchmark work (e.g., as a dimension in the spider plot).
- Details still to be confirmed; Annika will report back at the next meeting.
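To make the overlap problem concrete, here is a minimal sketch of one simple check: comparing content hashes across data sources. It assumes each source can be listed as item identifiers with a SHA-256 digest of the raw bytes. All names are hypothetical, and the approach only catches byte-identical copies, not re-encoded, resampled, or cropped versions of the same image.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Mapping, Set


def content_digest(data: bytes) -> str:
    """SHA-256 hex digest of an item's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shared_items(sources: Mapping[str, Mapping[str, str]]) -> Dict[str, Set[str]]:
    """Map each digest found in two or more sources to those source names.

    `sources` maps a source name to {item identifier: content digest}.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for source_name, items in sources.items():
        for digest in items.values():
            seen[digest].add(source_name)
    return {digest: names for digest, names in seen.items() if len(names) > 1}
```

Because the comparison uses content rather than file names, a renamed copy of the same file in a second source is still flagged.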
-
From Carole:
- Bandwidth constraints: heavy marking/teaching/writing; intends to refocus in summer.
- Dependency issue: a package was removed from the setup/installation, which breaks the metrics installation pipeline.
- Good news: a community member has agreed to implement a wrapper for object detection / assignment-localisation metrics for MONAI Metrics. This is positive for the paper, as it demonstrates active community engagement.
- Code coverage is currently at 93%, but the coverage badge is not displaying correctly.
Next Steps:
- Resolve the dependency/package issue blocking metrics installation.
- Fix the code coverage badge display.