Commit 3077694

Initial commit (0 parents)

187 files changed

Lines changed: 22065 additions & 0 deletions


.github/workflows/ci.yml

Lines changed: 34 additions & 0 deletions

```yaml
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: Python application

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10"]
    steps:
      - uses: actions/checkout@v4
      - name: Install the latest version of uv and set the python version
        uses: astral-sh/setup-uv@v5
        with:
          python-version: ${{ matrix.python-version }}
          enable-cache: false
      - name: Install dependencies
        run: uv pip install ".[dev]"
      - name: Log in to wandb
        shell: bash
        # Dummy 40-character key (40 X's) so wandb login succeeds without real credentials
        run: wandb login "$(printf '%*s' 40 '' | tr ' ' 'X')"
      - name: Run tests
        run: pytest
```

.gitignore

Lines changed: 3 additions & 0 deletions

```
__pycache__/
*.py[cod]
*.out
```

README.md

Lines changed: 67 additions & 0 deletions

[![Python application](https://github.com/PierreMarza/fmbench2dev/actions/workflows/ci.yml/badge.svg)](https://github.com/PierreMarza/fmbench2dev/actions/workflows/ci.yml)

# Tile-level Histopathology image Understanding benchmark

<img src="docs/banner.svg" />

We introduce **THUNDER**, a comprehensive benchmark designed to rigorously compare foundation models across various downstream tasks in computational pathology. THUNDER enables the evaluation and analysis of feature representations, robustness, and uncertainty quantification of these models across different datasets. Our benchmark encompasses a diverse collection of well-established datasets, covering multiple cancer types, image magnifications, and varying image and sample sizes. We propose an extensive set of tasks aimed at thoroughly assessing the capabilities and limitations of foundation models in digital pathology.

## Overview

We propose a benchmark to compare and study foundation models across three axes: (i) downstream task performance, (ii) feature space comparisons, and (iii) uncertainty and robustness. The current version integrates 23 foundation models (vision-only and vision-language, trained on pathology or natural images) evaluated on 16 datasets covering different magnifications and organs. THUNDER also supports the use of new user-defined models for direct comparisons.

<img src="docs/overview.svg" />

## Usage

An API and command line interface (CLI) are provided to allow users to download datasets and models and run benchmarks. The API is designed to be user-friendly and allows for easy integration into existing workflows. The CLI provides a convenient way to access the same functionality from the command line.

> [!IMPORTANT]
> **Downloading supported foundation models**: you must visit the Hugging Face page of each supported model you wish to use and accept its usage conditions.

### API Usage

When using the API, you can run the following code to download the dataset and model and run a benchmark:

```python
from thunder import benchmark

benchmark("phikon", "break_his", "knn")
```

### CLI Usage

When using the CLI, you can run the following command to see all available options:

```console
thunder --help
```

To reproduce the example above, run:

```console
thunder benchmark phikon break_his knn
```

## Installing thunder

The code is tested with Python 3.10. To replicate this setup, create and activate the following conda environment:

```console
conda create -n thunder_env python=3.10
conda activate thunder_env
```

To install `thunder`, run one of the following commands:

```console
pip install -e . # install the package in editable mode
pip install . # install the package
```

Before running `thunder`, ensure that the environment variable `THUNDER_BASE_DATA_FOLDER` is defined. This variable specifies the path where outputs, foundation models, and datasets will be stored. You can set it by running:

```console
export THUNDER_BASE_DATA_FOLDER="/path/to/your/data/folder"
```

Replace `/path/to/your/data/folder` with your desired storage directory.

docs/api.md

Lines changed: 9 additions & 0 deletions

::: thunder.benchmark.benchmark

::: thunder.download_datasets

::: thunder.download_models

::: thunder.generate_splits

::: thunder.models.PretrainedModel

docs/banner.svg

Lines changed: 1 addition & 0 deletions

docs/custom_config.md

Lines changed: 83 additions & 0 deletions

## Overriding config parameters

Default parameters are used for various aspects of the benchmark, e.g., batch sizes or learning rates. These defaults can be overridden using the following syntax, shown for both CLI and API use.

```bash
thunder benchmark hiboub bach knn --task.pre_comp_emb_batch_size 123 \
    --task.k_vals "[1, 2, 3]"
```

```python
import thunder

thunder.benchmark('hiboub',
                  'bach',
                  'knn',
                  **{'task.pre_comp_emb_batch_size': 123, 'task.k_vals': [1, 2, 3]})
```

## Overridable parameters

Here is a non-exhaustive list of the parameters you may want to override per task, along with their types and short descriptions.

### Frozen linear probing
| Name | Type | Description |
|------|------|-------------|
| adaptation.batch_size | int | Batch size used for training. |
| adaptation.num_workers | int | Number of workers for the data loader. |
| adaptation.lr | list[float] | List of learning rates used for the grid search. |
| adaptation.weight_decay | list[float] | List of weight decays used for the grid search. |
| adaptation.epochs | int | Number of training epochs. |
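
For instance, a minimal sketch of overriding some of these parameters through the API. The values are illustrative, not recommended settings, and `linear_probing` is assumed to be the frozen linear probing task identifier (it is the name used in the custom model docs):

```python
import thunder

# Illustrative override of the frozen linear probing defaults:
# a smaller grid search and shorter training for a quick run
thunder.benchmark('phikon',
                  'bach',
                  'linear_probing',
                  **{'adaptation.batch_size': 256,
                     'adaptation.lr': [1e-3, 1e-4],
                     'adaptation.epochs': 10})
```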

### LoRA linear probing
| Name | Type | Description |
|------|------|-------------|
| adaptation.lora_rank | int | Rank for the LoRA adapter. |
| adaptation.lora_alpha | int | Alpha parameter for LoRA. |
| adaptation.batch_size | int | Batch size used for training. |
| adaptation.num_workers | int | Number of workers for the data loader. |
| adaptation.lr | list[float] | List of learning rates used for the grid search. |
| adaptation.weight_decay | list[float] | List of weight decays used for the grid search. |
| adaptation.epochs | int | Number of training epochs. |
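
The same pattern applies from the CLI; a hedged sketch, where the `lora` task identifier is an assumption (check `thunder benchmark --help` for the exact name):

```bash
# Task name "lora" is assumed here, not confirmed by these docs
thunder benchmark phikon bach lora --adaptation.lora_rank 8 \
    --adaptation.lora_alpha 16
```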

### Adversarial attack
| Name | Type | Description |
|------|------|-------------|
| task.pre_comp_emb_batch_size | int | Batch size for precomputing the embeddings. |
| task.attack_batch_size | int | Batch size for the attacks. |
| task.nb_attack_images | int | Number of images to use. |
| task.attack.eps | float | Radius of the norm ball. |
| task.attack.alpha | float | Step size per PGD iteration. |
| task.attack.n_steps | int | Number of PGD iterations. |
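
A hedged sketch of tightening the PGD attack from the CLI; the `adversarial_attack` task identifier is an assumption (check `thunder benchmark --help` for the exact name):

```bash
# Task name "adversarial_attack" is assumed here, not confirmed by these docs
thunder benchmark phikon bach adversarial_attack --task.attack.eps 0.01 \
    --task.attack.n_steps 10
```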

### Alignment scoring
| Name | Type | Description |
|------|------|-------------|
| task.pre_comp_emb_batch_size | int | Batch size for precomputing the embeddings. |

### Image retrieval
| Name | Type | Description |
|------|------|-------------|
| task.pre_comp_emb_batch_size | int | Batch size for precomputing the embeddings. |
| task.k_vals | list[int] | Values of k to use. |

### k-NN
| Name | Type | Description |
|------|------|-------------|
| task.pre_comp_emb_batch_size | int | Batch size for precomputing the embeddings. |
| task.k_vals | list[int] | Values of k to use. |

### Precomputing embeddings
| Name | Type | Description |
|------|------|-------------|
| task.pre_comp_emb_batch_size | int | Batch size for precomputing the embeddings. |

### Simple shot
| Name | Type | Description |
|------|------|-------------|
| task.pre_comp_emb_batch_size | int | Batch size for precomputing the embeddings. |

### Transformation invariance
| Name | Type | Description |
|------|------|-------------|
| task.pre_comp_emb_batch_size | int | Batch size for precomputing the embeddings. |
| task.nb_images | int | Number of images to use. |

docs/custom_model.md

Lines changed: 71 additions & 0 deletions

You can use any custom model to run the benchmark by inheriting from the `thunder.models.PretrainedModel` class.

!!!note
    A few examples of such files, described below, can be found in the `examples` folder of the repository.

To do so, you will need to prepare a `.py` file with a class definition of your model that inherits from `thunder.models.PretrainedModel` and overrides the following methods:

- `get_transform`: This method should return a transform function that will be used to preprocess the input data. The transform function should take a single argument, the input data, and return the transformed data.
- `get_linear_probing_embeddings`: This method should return the embeddings for the linear probing task. It should take a single argument, the input data, and return the embeddings of shape (bs, emb_size).
- `get_segmentation_embeddings`: This method should return the embeddings for the segmentation task. It should take a single argument, the input data, and return the embeddings of shape (bs, tokens, emb_size).

Additionally, two properties should be available in the class:

- `name`: This property should return the name of the model.
- `emb_dim`: This property should return the embedding dimension of the model.

Here is an example of such a file:

```python
# my_model.py
from thunder.models import PretrainedModel


class DINOv2Features(PretrainedModel):
    def __init__(self):
        super().__init__()

        import torch
        from torchvision import transforms

        self.dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
        self.t = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Resize((224, 224)),
            ]
        )
        self.name = "dinov2_vits14"
        self.emb_dim = 384

    def forward(self, x):
        feats = self.dinov2.forward_features(x)
        return feats

    def get_transform(self):
        return self.t

    def get_linear_probing_embeddings(self, x):
        # (bs, emb_size) CLS-token embeddings
        x = self.dinov2.forward_features(x)
        return x["x_norm_clstoken"]

    def get_segmentation_embeddings(self, x):
        # (bs, tokens, emb_size) patch-token embeddings
        x = self.dinov2.forward_features(x)
        return x["x_norm_patchtokens"]
```

With this file ready, you can run any benchmark task using the following command:

```console
thunder benchmark custom:my_model.py db_name task_name
```

or through the API:

```python
from thunder import benchmark
from my_model import DINOv2Features

if __name__ == "__main__":
    model = DINOv2Features()
    benchmark(model, dataset="ccrcc", task="linear_probing")
```

docs/getting_started.md

Lines changed: 79 additions & 0 deletions

## Installation

In order to use the package, you first need to install it. You can do this by running the following command in your terminal:

```console
pip install thunder
```

The package stores all datasets, models, and results under a folder that you need to define through the environment variable `THUNDER_BASE_DATA_FOLDER`. You can set it by running the following command in your terminal:
```console
export THUNDER_BASE_DATA_FOLDER=/path/to/thunder_base_data_folder
```

!!!important
    Without this environment variable, the package will not work. The folder should be empty; the package will create the necessary subfolders.

## CLI Usage

You can run the following command to see all available options:
```console
> thunder --help
Usage: thunder [OPTIONS] COMMAND [ARGS]...
```

The available commands are:

- `benchmark`: Benchmarks the models on the datasets for a task.
- `download-datasets`: Downloads datasets.
- `download-models`: Downloads models.
- `generate-data-splits`: Generates data splits for the downloaded datasets.
- `results-summary`: Compiles a summary csv file of the results (see the example at the end of this section).

To benchmark the models, you can run the following commands:
```console
> thunder benchmark --help
Usage: thunder benchmark [OPTIONS] MODEL DATASET TASK
> thunder benchmark phikon ccrcc knn
```

To download datasets, you can run the following commands:
```console
> thunder download-datasets ccrcc patch_camelyon bach
> thunder download-datasets classification
> thunder download-datasets all --make_splits # Generates splits after downloading
```

To download models, you can run the following commands:
```console
> thunder download-models phikon keep
> thunder download-models dinov2base
```

To generate splits for the downloaded datasets, you can run the following commands:
```console
> thunder generate-data-splits ccrcc patch_camelyon bach
> thunder generate-data-splits classification
> thunder generate-data-splits all
```
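
Finally, `results-summary` compiles the results of previous runs into a csv file; a hedged sketch, assuming the command takes no required arguments (check `thunder results-summary --help`):
```console
> thunder results-summary
```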

## API Usage

You can also use the package as a library. For example, the following code downloads datasets and models, generates data splits, and runs a benchmark:
```python
from thunder import download_datasets, download_models, generate_splits, benchmark

# Download datasets
download_datasets(["ccrcc", "patch_camelyon", "bach"])
download_datasets(["all"])
download_datasets(["classification"])

# Download models
download_models(["phikon", "dinov2base"])

# Generate data splits
generate_splits(["all"])

# Benchmark
benchmark(model="phikon", dataset="ccrcc", task="knn")
```
