diff --git a/.DS_Store b/.DS_Store
index 6bedce7..73046c7 100644
Binary files a/.DS_Store and b/.DS_Store differ
diff --git a/.gitignore b/.gitignore
index a337d16..840a91a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -156,6 +156,9 @@ data/
 .DS_Store
 
 # Exclude not needed notebooks files
-notebooks/exploration.ipynb
+#notebooks/exploration.ipynb
 notebooks/class_template.ipynb
-notebooks/few_shot_approaches_setup.ipynb
\ No newline at end of file
+notebooks/few_shot_approaches_setup.ipynb
+
+presentation/tutorial-new-tutorial-group-1-NARRATIVE.html
+presentation/figures/tutorial-new-tutorial-group-1-NARRATIVE.ipynb
\ No newline at end of file
diff --git a/dataset_strategy_comparison.md b/dataset_strategy_comparison.md
deleted file mode 100644
index 4ce596c..0000000
--- a/dataset_strategy_comparison.md
+++ /dev/null
@@ -1,525 +0,0 @@
-# Dataset Strategy Comparison for Few-Shot Rooftop Segmentation Tutorial
-
-**Goal**: Compare three different dataset strategies for teaching few-shot learning for rooftop/building segmentation.
-
-#### ATTENTION: This document is AI generated, and should be used as a checklist / orientative document. Information have been checked, and should be acceptably accurate.
- ---- - -## Executive Summary - -| Strategy | Complexity | Realism | Tutorial Clarity | Implementation Effort | Recommendation | -|----------|------------|---------|------------------|----------------------|----------------| -| **Geneva + Inria** | Medium | High | ⭐⭐⭐⭐⭐ | Medium | **Best for education** | -| **Only Inria** | Low-Medium | Medium-High | ⭐⭐⭐⭐ | Low | **Best for simplicity** | -| **Only RID** | High | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | High | **Best for realism** | - -**Quick Recommendation**: -- **For teaching fundamentals**: Use Geneva + Inria (Strategy 1) -- **For quick implementation**: Use Only Inria (Strategy 2) -- **For real-world impact**: Use Only RID (Strategy 3) - ---- - -## Dataset Deep Dive - -### Geneva Satellite Dataset -**Source**: HuggingFace (`raphaelattias/overfitteam-geneva-satellite-images`) - -**Key Facts**: -- **Size**: 1,050 labeled image-mask pairs -- **Task**: Binary segmentation (rooftop vs background) -- **Geographic splits**: 3 grids (1301_11, 1301_13, 1301_31) -- **Image size**: 250x250 pixels -- **Categories**: All, Industrial, Residential -- **Characteristics**: Single city, well-defined geographic regions - -**Strengths**: -- ✅ Clean binary task -- ✅ Built-in geographic splits -- ✅ Easy to use (already on HuggingFace) -- ✅ Good size for tutorial (not too large) -- ✅ Clear within-city domain shift - -**Limitations**: -- ❌ Only one city -- ❌ Relatively small compared to others -- ❌ Simple task (binary only) - ---- - -### Inria Aerial Image Dataset -**Source**: https://project.inria.fr/aerialimagelabeling/ -**HuggingFace**: `Jonathan/INRIA-Aerial-Dataset` - -**Key Facts**: -- **Size**: 360 images (180 train + 180 test) -- **Coverage**: 180 km² across 5 cities -- **Cities**: - - **Austin, Texas** (US) - Suburban sprawl - - **Chicago, Illinois** (US) - Dense urban - - **Kitsap County, Washington** (US) - Rural/suburban - - **Vienna, Austria** (EU) - European urban - - **Tyrol-Innsbruck, Austria** (EU) - Mountain town -- **Task**: Binary 
segmentation (building vs background) -- **Image size**: 5000x5000 pixels (need tiling) -- **Resolution**: 0.3m per pixel - -**Strengths**: -- ✅ Multiple cities (5 different domains) -- ✅ Geographic diversity (US + Europe) -- ✅ Urban diversity (dense city, suburban, rural) -- ✅ Large coverage area -- ✅ Same task as Geneva (binary) -- ✅ Well-documented benchmark dataset -- ✅ Freely available - -**Limitations**: -- ❌ Large file sizes -- ❌ Only building footprints (not roof-specific) -- ❌ Fixed train/test split per city - -**City Characteristics**: -``` -Austin: Large, low-density sprawl, similar-looking houses -Chicago: Dense urban, varied building types, tall buildings -Kitsap: Rural/suburban, scattered buildings, trees -Vienna: European architecture, moderate density, distinct style -Tyrol: Mountain town, alpine architecture, complex terrain -``` - ---- - -### RID - Roof Information Dataset -**Source**: mediaTUM (TUM) + GitHub (https://github.com/TUMFTM/RID) - -**Key Facts**: -- **Size**: 1,880 images (+ 26 for annotation experiment) -- **Task 1**: Roof segments - 18 classes - - Background + 16 azimuth directions (N, NNE, NE, ENE, E, ...) + Flat -- **Task 2**: Roof superstructures - Multiple classes - - PV modules, windows, chimneys, dormer windows, etc. 
-- **Image source**: Google Maps Static API (georeferenced) -- **Cities**: Multiple (not explicitly listed, but multi-city dataset) -- **Split**: 1880 images with train/val/test splits provided - -**Strengths**: -- ✅ **Directly relevant**: Designed for solar panel assessment -- ✅ **Multi-class**: More realistic task complexity -- ✅ **Rich annotations**: Both segments and superstructures -- ✅ **Large size**: 1,880 images -- ✅ **Quality control**: Reviewed annotations, multiple labelers -- ✅ **Practical application**: Real-world solar potential assessment -- ✅ **Code available**: GitHub repo with data preparation - -**Limitations**: -- ❌ **Complex**: Multi-class segmentation harder to learn -- ❌ **Google Maps imagery**: Different from satellite (closer, angled) - -**Task Complexity Comparison**: -``` -Geneva: 2 classes (rooftop, background) -Inria: 2 classes (building, background) -RID-S: 18 classes (16 directions + flat + background) -RID-SS: ~10 classes (PV, window, chimney, dormer, ...) -``` ---- - -## Strategy 1: Geneva + Inria - -### Overview -Use Geneva as base dataset, Inria cities as cross-city transfer targets. 
- -### Task Structure -``` -Phase 1: Within-City Transfer (Geneva) -├─ Train: Grid 1301_11 (295 images) -├─ Test: Grid 1301_13 (76 images) - Different neighborhood -└─ Test: Grid 1301_31 (49 images) - Different neighborhood - -Phase 2: Cross-City Transfer (Geneva → Inria) -├─ Train: Grid 1301_11 (Geneva) -├─ Test: Vienna (European, similar to Geneva) -├─ Test: Austin (US suburban, different architecture) -└─ Optional: Chicago, Kitsap, Tyrol (varying difficulty) -``` - -### Few-Shot Setup -**Training** (on Geneva Grid 1301_11): -- Fine-tuning baseline: Standard supervised learning -- Prototypical Networks: Episodic training (K=3-5 per episode) -- PANet: Episodic training with MAP - -**Evaluation** (on target domains): -```python -For each target (1301_13, 1301_31, Vienna, Austin): - For K in [1, 3, 5, 10, 20]: - # Select K support examples from target - support_set = random_sample(target_data, K) - query_set = remaining_data - - # Apply method - predictions = method.predict(support_set, query_set) - - # Evaluate - iou = compute_iou(predictions, query_masks) -``` - -### Expected Domain Shift -``` -Geneva 1301_11 → Geneva 1301_13: SMALL (same city, diff neighborhood) -Geneva 1301_11 → Geneva 1301_31: SMALL-MEDIUM (different area) -Geneva 1301_11 → Vienna: MEDIUM (Euro→Euro, diff city) -Geneva 1301_11 → Austin: LARGE (Euro→US, diff architecture) -``` - -### Advantages -✅ **Progressive difficulty**: Small → Large domain shift -✅ **Same task throughout**: Binary segmentation -✅ **Clear learning progression**: Students see increasing challenge -✅ **Multiple test cases**: 5 target domains (2 Geneva + 3 Inria) -✅ **Well-documented**: Both datasets established -✅ **Best for teaching**: Clear concept progression - -### Disadvantages -❌ **Two datasets**: More setup complexity -❌ **Preprocessing needed**: Tile Inria 5000x5000 → 250x250 -❌ **Different sources**: Satellite vs aerial imagery mix - -### Implementation Complexity: **Medium** (6/10) - -### Tutorial Timeline -``` 
-Week 1: Geneva setup + baseline (Grid 1301_11 → 1301_13/31) -Week 2: Prototypical Networks on Geneva -Week 3: PANet on Geneva + comparison -Week 4: Cross-city transfer (Vienna, Austin) -``` - -### Key Learning Outcomes -1. Understand few-shot learning fundamentals -2. See domain adaptation on same task -3. Compare methods across varying domain shifts -4. Learn practical deployment scenario (city → city) - ---- - -## Strategy 2: Only Inria (Multi-City) - -### Overview -Use only Inria dataset, train on one city, test on others. - -### Task Structure -``` -Single-Dataset Multi-City Transfer - -Train: Vienna (36 train images × tiled = ~576 patches) - -Test Domains: -├─ Austin (US suburban - LARGE shift from Vienna) -├─ Chicago (US urban - LARGE shift, but urban like Vienna) -├─ Kitsap (US rural - VERY LARGE shift) -└─ Tyrol (Austrian mountain - MEDIUM shift, both European) -``` - -### Alternative Training Cities -``` -Option A: Train on Vienna (European urban) - → Test on Austin, Chicago, Kitsap, Tyrol - → Bias: European → American - -Option B: Train on Austin (US suburban) - → Test on Vienna, Chicago, Kitsap, Tyrol - → Bias: Suburban → diverse targets - -Option C: Train on Chicago (Dense urban) - → Test on Vienna, Austin, Kitsap, Tyrol - → Bias: Dense urban → varying densities -``` - -### Few-Shot Setup -**Training** (e.g., on Vienna): -```python -# Tile Vienna images into 250x250 patches -vienna_patches = tile_images(vienna_train, patch_size=250) -# Result: ~576 training patches - -# Episodic training -for episode in range(num_episodes): - support, query = sample_episode(vienna_patches, K=5, Q=5) - # Train meta-learning model -``` - -**Evaluation** (on other cities): -```python -For each target_city in [Austin, Chicago, Kitsap, Tyrol]: - # Tile target city images - city_patches = tile_images(target_city, patch_size=250) - - For K in [1, 3, 5, 10, 20]: - support = random_sample(city_patches, K) - query = remaining_patches - - predictions = method.predict(support, 
query) - iou = compute_iou(predictions, query_masks) -``` - -### Expected Domain Shift -``` -Vienna → Tyrol: MEDIUM (both Austrian, diff terrain) -Vienna → Chicago: LARGE (Euro→US, but both urban) -Vienna → Austin: VERY LARGE (Euro→US, urban→suburban) -Vienna → Kitsap: VERY LARGE (Euro→US, urban→rural) -``` - -### Advantages -✅ **Single dataset**: Simpler setup, one data source -✅ **Multiple targets**: 4 different cities to test on -✅ **Established benchmark**: Well-known dataset -✅ **Same task**: Binary segmentation throughout -✅ **Good size**: After tiling, ~500-600 patches per city -✅ **Geographic diversity**: US + Europe, urban + suburban + rural -✅ **Easy to extend**: Can add all 5 cities as targets - -### Disadvantages -❌ **No within-city baseline**: Can't show small domain shift first -❌ **All shifts are large**: Harder to see method differences -❌ **Requires tiling**: Must preprocess 5000x5000 images -❌ **Building vs rooftop**: Not exactly rooftops (full building footprints) -❌ **Less progression**: Jumps straight to hard cross-city transfer - -### Implementation Complexity: **Low-Medium** (4/10) - -### Tutorial Timeline -``` -Week 1: Inria setup + tiling + Vienna baseline -Week 2: Prototypical Networks on Vienna -Week 3: PANet on Vienna + comparison -Week 4: Multi-city evaluation (4 cities) -``` - -### Key Learning Outcomes -1. Understand few-shot learning on real cross-city task -2. See performance across very different domains -3. Compare method robustness to large domain shifts -4. Learn practical multi-city deployment - -### Recommendation for This Strategy -**Best training city**: Vienna -- **Why**: European city, moderate density, good variety -- **Test on**: Austin (easiest), Chicago (medium), Kitsap (hardest) -- **Skip**: Tyrol initially (too similar to Vienna) - ---- - -## Strategy 3: Only RID (Multi-Class, Multi-City) - -### Overview -Use only RID dataset for realistic solar panel assessment task. 
- -### Task Structure - -**Option A: Roof Segments (18-class)** -``` -Task: Predict roof orientation (azimuth + flat) -Classes: Background, N, NNE, NE, ENE, E, ESE, SE, SSE, - S, SSW, SW, WSW, W, WNW, NW, NNW, Flat - -Train: Subset of cities (e.g., 60% of 1880 = ~1128 images) -Val: 20% (~376 images) -Test: 20% (~376 images) - -Few-shot: Use K examples from test set cities -``` - -**Option B: Roof Superstructures (Multi-class)** -``` -Task: Detect roof features for solar assessment -Classes: Background, PV module, Window, Chimney, - Dormer, Satellite dish, etc. - -Same split as Option A -``` - -**Option C: Hierarchical (Recommended)** -``` -Phase 1: Coarse segmentation (Roof vs Background) - - Simplify RID masks to binary - - Train baseline model - -Phase 2: Fine-grained segmentation (Superstructures) - - Given roof regions, detect PV modules, windows, etc. - - Use K examples for few-shot superstructure detection - -This shows: Can we adapt from "find roofs" to "analyze roofs"? -``` - -### Few-Shot Setup (Option B - Superstructures) - -**Data Preparation**: -```python -# RID provides georeferenced images + annotations -# Use their GitHub code to generate masks - -from rid_tools import generate_masks - -# Generate superstructure masks -masks = generate_masks( - annotations_path='RID/annotations/', - task='superstructures', - classes=['background', 'pvmodule', 'window', 'chimney', 'dormer'] -) -``` - -**Training**: -```python -# Option 1: Standard split (not city-based) -train_images = rid_data[:1128] # 60% -val_images = rid_data[1128:1504] # 20% -test_images = rid_data[1504:] # 20% - -# Option 2: City-based split (if city info available) -train_cities = ['CityA', 'CityB', 'CityC'] -test_cities = ['CityD', 'CityE'] - -# Episodic training on train set -for episode in range(num_episodes): - # N-way K-shot episodes - classes = sample_classes(N=3) # e.g., pvmodule, window, chimney - support, query = sample_episode(classes, K=5, Q=5) -``` - -**Evaluation**: -```python -# 
Few-shot evaluation on test set -For K in [1, 3, 5, 10, 20]: - # Sample K examples of each class - support_set = sample_per_class(test_images, K_per_class=K) - query_set = remaining_test_images - - # Predict - predictions = method.predict(support_set, query_set) - - # Compute metrics - iou_per_class = compute_iou(predictions, query_masks, classes) - mean_iou = iou_per_class.mean() -``` - -### Expected Challenges -``` -Binary (Geneva/Inria): Easy - 2 classes, clear boundaries -RID Segments: Hard - 18 classes, subtle differences (NNE vs NE) -RID Superstructures: Medium-Hard - Fewer classes but small objects -``` - -### Advantages -✅ **Most realistic**: Actual solar panel assessment task -✅ **Rich annotations**: Multiple semantic levels -✅ **Direct application**: PV module detection is the end goal -✅ **Multi-class few-shot**: Shows method capability on hard task -✅ **Single dataset**: No need to integrate multiple sources -✅ **Large size**: 1,880 images is substantial -✅ **Quality data**: Reviewed annotations, multiple labelers -✅ **Code available**: GitHub repo with utilities - -### Disadvantages -❌ **High complexity**: Multi-class harder to learn/teach -❌ **Requires preprocessing**: Must generate masks from annotations -❌ **Different task**: Not comparable to Geneva/Inria -❌ **Harder baselines**: Multi-class needs more careful implementation -❌ **Less established**: Newer dataset, fewer examples to follow -❌ **License restrictions**: CC-BY-NC (non-commercial) -❌ **Google Maps imagery**: Different from satellite -❌ **Small objects**: PV modules, windows are small, harder to segment - -### Implementation Complexity: **High** (8/10) - -### Tutorial Timeline -``` -Week 1: RID setup + mask generation + data exploration -Week 2: Binary baseline + multi-class baseline -Week 3-4: Prototypical Networks (need more time for multi-class) -Week 5: PANet + comparison -Week 6: Analysis and visualization -``` - -### Key Learning Outcomes -1. 
Understand few-shot learning on complex multi-class task -2. Learn N-way K-shot episodic training -3. See real-world application (solar assessment) -4. Handle class imbalance (some classes rare) -5. Deal with small objects (PV modules) - -### Recommendation for This Strategy -**If using RID, focus on**: -- **Task**: Roof superstructures (more interpretable than 18 directions) -- **Classes**: PV module, Window, Chimney, Background (4-way task) -- **Approach**: Start with binary (roof vs background), then few-shot superstructures -- **K values**: Use higher K (5-20) due to multi-class complexity - ---- - -## Head-to-Head Comparison - -### 1. Tutorial Clarity - -**Winner: Geneva + Inria** ⭐⭐⭐⭐⭐ - -**Reasoning**: -- Clear progression: small shift (Geneva grids) → large shift (Inria cities) -- Same task (binary) throughout makes concept learning easier -- Students can focus on few-shot methods, not task complexity - -**Rankings**: -1. Geneva + Inria (5/5) - Clearest learning path -2. Only Inria (4/5) - Still clear, but no easy baseline -3. Only RID (3/5) - Multi-class adds complexity - ---- - -### 2. Real-World Relevance - -**Winner: Only RID** ⭐⭐⭐⭐⭐ - -**Reasoning**: -- Direct solar panel application (PV module detection) -- Multi-class is more realistic than binary -- Actually detects roof features, not just roofs - -**Rankings**: -1. Only RID (5/5) - Most realistic solar assessment -2. Geneva + Inria (4/5) - City deployment scenario realistic -3. Only Inria (3.5/5) - Building footprints less specific than roofs - ---- - -### 3. Implementation Effort - -**Winner: Only Inria** ⭐⭐⭐⭐⭐ - -**Reasoning**: -- Single dataset to manage -- Simple binary task -- Well-documented, many examples - -**Rankings**: -1. Only Inria (5/5) - Simplest -2. Geneva + Inria (3/5) - Two datasets, tiling needed -3. Only RID (2/5) - Multi-class, preprocessing, less docs - ---- - -### 4. Dataset Size & Diversity - -**Winner: Only RID** (size) / **Only Inria** (diversity) - -**For Size**: -1. 
Only RID: 1,880 images -2. Geneva + Inria: ~1,050 + ~360 (after tiling ~1,050 + 1,800) = ~2,850 -3. Only Inria: ~360 (after tiling ~1,800) - -**For Geographic Diversity**: -1. Only Inria: 5 cities, 3 countries, varied terrain -2. Geneva + Inria: 6 cities total -3. Only RID: Multiple cities (but not specified) - diff --git a/notebooks/try_few_shot.ipynb b/notebooks/try_few_shot.ipynb index 85f0946..f9c51cc 100644 --- a/notebooks/try_few_shot.ipynb +++ b/notebooks/try_few_shot.ipynb @@ -101,14 +101,14 @@ "import matplotlib.pyplot as plt\n", "\n", "import torch\n", - "import torch.nn as nn\n", + "from torch import nn\n", "import torch.nn.functional as F\n", "from torch.utils.data import Dataset, DataLoader, Subset\n", "from torchvision import transforms, models\n", "from huggingface_hub import snapshot_download\n", "\n", "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", - "print(\"Using device:\", device)\n" + "print(\"Using device:\", device)" ] }, { @@ -131,10 +131,7 @@ "metadata": {}, "outputs": [], "source": [ - "dataset_root = snapshot_download(\n", - " repo_id=\"raphaelattias/overfitteam-geneva-satellite-images\",\n", - " repo_type=\"dataset\"\n", - ")\n", + "dataset_root = snapshot_download(repo_id=\"raphaelattias/overfitteam-geneva-satellite-images\", repo_type=\"dataset\")\n", "print(\"Dataset root:\", dataset_root)" ] }, @@ -166,28 +163,34 @@ "\n", "IMAGE_SIZE = 256 # resize tiles to this\n", "\n", - "img_transform = transforms.Compose([\n", - " transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),\n", - " transforms.ToTensor(),\n", - " transforms.Normalize(mean=[0.485, 0.456, 0.406],\n", - " std=[0.229, 0.224, 0.225]),\n", - "])\n", + "img_transform = transforms.Compose(\n", + " [\n", + " transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),\n", + " transforms.ToTensor(),\n", + " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n", + " ]\n", + ")\n", + "\n", + "mask_transform = transforms.Compose(\n", + " [\n", + " 
transforms.Resize((IMAGE_SIZE, IMAGE_SIZE), interpolation=Image.NEAREST),\n", + " transforms.ToTensor(), # gives float [0,1] for grayscale\n", + " ]\n", + ")\n", "\n", - "mask_transform = transforms.Compose([\n", - " transforms.Resize((IMAGE_SIZE, IMAGE_SIZE), interpolation=Image.NEAREST),\n", - " transforms.ToTensor(), # gives float [0,1] for grayscale\n", - "])\n", "\n", "def get_grid_id_from_filename(fname):\n", " parts = fname.split(\"_\")\n", " # 3th and 3th parts give the grid ID\n", " return f\"{parts[2]}_{parts[3]}\"\n", "\n", + "\n", "class GenevaRooftopDataset(Dataset):\n", " \"\"\"\n", " Dataset filtered by geographic grid IDs.\n", " Can read from multiple splits (train/val/test) at once.\n", " \"\"\"\n", + "\n", " def __init__(self, root, splits=[\"train\", \"val\", \"test\"], category=\"all\", grid_ids=None):\n", " super().__init__()\n", " self.root = root\n", @@ -228,9 +231,8 @@ " return img, mask\n", "\n", "\n", - "\n", "train_grids = [\"1301_11\", \"1301_31\"]\n", - "test_grids = [\"1301_13\"]\n", + "test_grids = [\"1301_13\"]\n", "\n", "# Train dataset reads from all three folders\n", "train_base = GenevaRooftopDataset(dataset_root, splits=[\"train\", \"val\", \"test\"], grid_ids=train_grids)\n", @@ -238,7 +240,7 @@ "# Test dataset can read from just one folder or multiple if needed\n", "test_base = GenevaRooftopDataset(dataset_root, splits=[\"train\", \"val\", \"test\"], grid_ids=test_grids)\n", "\n", - "print(f\"Train samples: {len(train_base)}, Test samples: {len(test_base)}\")\n" + "print(f\"Train samples: {len(train_base)}, Test samples: {len(test_base)}\")" ] }, { @@ -263,8 +265,7 @@ "\n", " # Undo normalisation for plotting\n", " img_np = img.permute(1, 2, 0).numpy()\n", - " img_np = (img_np * np.array([0.229, 0.224, 0.225]) +\n", - " np.array([0.485, 0.456, 0.406]))\n", + " img_np = img_np * np.array([0.229, 0.224, 0.225]) + np.array([0.485, 0.456, 0.406])\n", " img_np = np.clip(img_np, 0, 1)\n", "\n", " # mask: [1, H, W] -> [H, W]\n", @@ 
-282,7 +283,8 @@ " plt.axis(\"off\")\n", " plt.show()\n", "\n", - "show_sample(train_base)\n" + "\n", + "show_sample(train_base)" ] }, { @@ -323,6 +325,7 @@ " \"\"\"\n", " Yields (support_imgs [K,3,H,W], support_masks [K,1,H,W], query_img, query_mask)\n", " \"\"\"\n", + "\n", " def __init__(self, base_dataset, episodes_per_epoch=750, K=1):\n", " self.base = base_dataset\n", " self.episodes_per_epoch = episodes_per_epoch\n", @@ -344,14 +347,15 @@ "\n", " img_q, mask_q = self.base[query_idx]\n", "\n", - " imgs_s = torch.stack(imgs_s, dim=0) # [K,3,H,W]\n", - " masks_s = torch.stack(masks_s, dim=0) # [K,1,H,W]\n", + " imgs_s = torch.stack(imgs_s, dim=0) # [K,3,H,W]\n", + " masks_s = torch.stack(masks_s, dim=0) # [K,1,H,W]\n", "\n", " return imgs_s, masks_s, img_q, mask_q\n", "\n", + "\n", "episodes_per_epoch = 2000\n", "episode_dataset = EpisodeDataset(train_base, episodes_per_epoch=episodes_per_epoch)\n", - "episode_loader = DataLoader(episode_dataset, batch_size=1, shuffle=True)\n" + "episode_loader = DataLoader(episode_dataset, batch_size=1, shuffle=True)" ] }, { @@ -416,6 +420,7 @@ "source": [ "from torchgeo.models import resnet18, ResNet18_Weights\n", "\n", + "\n", "class Encoder(nn.Module):\n", " def __init__(self, out_channels=256, pretrained=True):\n", " super().__init__()\n", @@ -435,14 +440,15 @@ " def forward(self, x):\n", " x = self.stem(x)\n", " f1 = self.layer1(x)\n", - " f2 = self.layer2(f1) # [B,128,H/4,W/4]\n", - " f3 = self.layer3(f2) # [B,256,H/8,W/8]\n", + " f2 = self.layer2(f1) # [B,128,H/4,W/4]\n", + " f3 = self.layer3(f2) # [B,256,H/8,W/8]\n", "\n", " f2_up = F.interpolate(f2, size=f3.shape[-2:], mode=\"bilinear\", align_corners=False)\n", " f = torch.cat([f2_up, f3], dim=1) # [B,384,H',W']\n", " f = self.proj(f)\n", " return f\n", "\n", + "\n", "encoder = Encoder(out_channels=256).to(device)\n", "print(encoder)" ] @@ -480,7 +486,6 @@ "# 6. 
Prototype computation & query classification\n", "# ============================================================\n", "\n", - "import torch.nn.functional as F\n", "\n", "def compute_prototypes(feat_support, mask_support):\n", " \"\"\"\n", @@ -491,15 +496,17 @@ " Returns: prototypes [2, C] (0=background, 1=foreground)\n", " \"\"\"\n", " # Downsample mask to feature resolution\n", - " mask_small = F.interpolate(mask_support, size=feat_support.shape[2:], mode=\"nearest\") # Downsamples masks to feature size via nearest neighbor\n", - " mask_fg = (mask_small > 0.5).float() # [K,1,H',W']\n", - " mask_bg = 1.0 - mask_fg # [K,1,H',W']\n", + " mask_small = F.interpolate(\n", + " mask_support, size=feat_support.shape[2:], mode=\"nearest\"\n", + " ) # Downsamples masks to feature size via nearest neighbor\n", + " mask_fg = (mask_small > 0.5).float() # [K,1,H',W']\n", + " mask_bg = 1.0 - mask_fg # [K,1,H',W']\n", "\n", " K, C, Hf, Wf = feat_support.shape\n", "\n", " # Flatten across batch and spatial dims: [K,C,H',W'] -> [C, K*H'*W']\n", " fs = feat_support.permute(1, 0, 2, 3).contiguous().view(C, -1) # [C, K*H'*W']\n", - " fg_w = mask_fg.view(1, -1) # [1, K*H'*W']\n", + " fg_w = mask_fg.view(1, -1) # [1, K*H'*W']\n", " bg_w = mask_bg.view(1, -1)\n", "\n", " eps = 1e-6\n", @@ -512,6 +519,7 @@ " prototypes = torch.stack([bg_proto, fg_proto], dim=0) # [2,C]\n", " return prototypes\n", "\n", + "\n", "def classify_query(feat_query, prototypes):\n", " \"\"\"\n", " Classify query pixels by distance to prototypes.\n", @@ -533,13 +541,13 @@ " # fq_batch: [1, H'*W', C], protos_batch: [1, 2, C]\n", " dists = torch.cdist(fq.unsqueeze(0), protos.unsqueeze(0)) # [1, H'*W', 2]\n", " dists = dists.squeeze(0) # [H'*W', 2]\n", - " dists = dists ** 2\n", + " dists = dists**2\n", "\n", " # Convert distances to similarity logits: negative distance\n", " logits_flat = -dists # [H'*W', 2]\n", " logits = logits_flat.t().view(1, 2, Hq, Wq) # [1,2,H',W']\n", "\n", - " return logits\n" + " return 
logits" ] }, { @@ -582,7 +590,7 @@ " union = pred.sum() + target.sum() - intersection\n", "\n", " iou = (intersection + eps) / (union + eps)\n", - " return iou.item()\n" + " return iou.item()" ] }, { @@ -616,40 +624,32 @@ " if \"layer3\" not in name and \"layer2\" not in name:\n", " param.requires_grad = False\n", "\n", - "optimizer = torch.optim.Adam(\n", - " encoder.parameters(),\n", - " lr=3e-4,\n", - " weight_decay=1e-4\n", - ")\n", + "optimizer = torch.optim.Adam(encoder.parameters(), lr=3e-4, weight_decay=1e-4)\n", "\n", "scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)\n", "\n", + "\n", "def meta_train(num_epochs=5):\n", " for epoch in range(1, num_epochs + 1):\n", " encoder.train()\n", "\n", " total_loss = 0.0\n", "\n", - " for (img_s, mask_s, img_q, mask_q) in episode_loader:\n", + " for img_s, mask_s, img_q, mask_q in episode_loader:\n", " # img_s: [1,K,3,H,W] – squeeze batch dim\n", - " img_s = img_s.squeeze(0).to(device) # [K,3,H,W]\n", - " mask_s = mask_s.squeeze(0).to(device) # [K,1,H,W]\n", - " img_q = img_q.to(device)\n", + " img_s = img_s.squeeze(0).to(device) # [K,3,H,W]\n", + " mask_s = mask_s.squeeze(0).to(device) # [K,1,H,W]\n", + " img_q = img_q.to(device)\n", " mask_q = mask_q.to(device)\n", "\n", " optimizer.zero_grad()\n", "\n", - " feat_s = encoder(img_s) # [K,C,H',W']\n", - " feat_q = encoder(img_q) # [1,C,H',W']\n", + " feat_s = encoder(img_s) # [K,C,H',W']\n", + " feat_q = encoder(img_q) # [1,C,H',W']\n", "\n", " prototypes = compute_prototypes(feat_s, mask_s)\n", " logits_q = classify_query(feat_q, prototypes)\n", - " logits_q = F.interpolate(\n", - " logits_q,\n", - " size=mask_q.shape[-2:],\n", - " mode=\"bilinear\",\n", - " align_corners=False\n", - " )\n", + " logits_q = F.interpolate(logits_q, size=mask_q.shape[-2:], mode=\"bilinear\", align_corners=False)\n", "\n", " target_q = mask_q.long().squeeze(1)\n", " loss = F.cross_entropy(logits_q, target_q)\n", @@ -664,8 +664,9 @@ "\n", " 
scheduler.step()\n", "\n", + "\n", "# Run meta-training\n", - "meta_train(num_epochs=10)\n" + "meta_train(num_epochs=10)" ] }, { @@ -677,14 +678,17 @@ "# After training epoch loop\n", "checkpoint_path = \"meta_train_checkpoint.pth\"\n", "\n", - "torch.save({\n", - " \"epoch\": 10,\n", - " \"model_state_dict\": encoder.state_dict(),\n", - " \"optimizer_state_dict\": optimizer.state_dict(),\n", - " \"scheduler_state_dict\": scheduler.state_dict(),\n", - "}, checkpoint_path)\n", + "torch.save(\n", + " {\n", + " \"epoch\": 10,\n", + " \"model_state_dict\": encoder.state_dict(),\n", + " \"optimizer_state_dict\": optimizer.state_dict(),\n", + " \"scheduler_state_dict\": scheduler.state_dict(),\n", + " },\n", + " checkpoint_path,\n", + ")\n", "\n", - "print(f\"Saved checkpoint to {checkpoint_path}\")\n" + "print(f\"Saved checkpoint to {checkpoint_path}\")" ] }, { @@ -732,6 +736,7 @@ "# 9. Few-shot inference on test images\n", "# ============================================================\n", "\n", + "\n", "def k_shot_predict(encoder, support_imgs, support_masks, query_img):\n", " \"\"\"\n", " K-shot segmentation for a query image given K support images+masks.\n", @@ -743,13 +748,13 @@ " \"\"\"\n", " encoder.eval()\n", " with torch.no_grad():\n", - " support_imgs = support_imgs.to(device) # [K,3,H,W]\n", - " support_masks = support_masks.to(device) # [K,1,H,W]\n", + " support_imgs = support_imgs.to(device) # [K,3,H,W]\n", + " support_masks = support_masks.to(device) # [K,1,H,W]\n", " query_img = query_img.to(device).unsqueeze(0) # [1,3,H,W]\n", "\n", " # Pass through encoder\n", " feat_s = encoder(support_imgs) # [K,C,H',W']\n", - " feat_q = encoder(query_img) # [1,C,H',W']\n", + " feat_q = encoder(query_img) # [1,C,H',W']\n", "\n", " # Compute prototypes\n", " prototypes = compute_prototypes(feat_s, support_masks) # [2,C]\n", @@ -765,13 +770,14 @@ "\n", " return logits.cpu()\n", "\n", + "\n", "def one_shot_predict(encoder, support_img, support_mask, query_img):\n", " 
\"\"\"\n", " 1-shot helper that wraps single support into K=1 form.\n", " \"\"\"\n", - " support_imgs = support_img.unsqueeze(0) # [1,3,H,W]\n", - " support_masks = support_mask.unsqueeze(0) # [1,1,H,W]\n", - " return k_shot_predict(encoder, support_imgs, support_masks, query_img)\n" + " support_imgs = support_img.unsqueeze(0) # [1,3,H,W]\n", + " support_masks = support_mask.unsqueeze(0) # [1,1,H,W]\n", + " return k_shot_predict(encoder, support_imgs, support_masks, query_img)" ] }, { @@ -801,9 +807,9 @@ "# 9a. K-shot inference on test images\n", "# ============================================================\n", "\n", - "import numpy as np\n", "import torch\n", "\n", + "\n", "def evaluate_kshot_iou(encoder, train_dataset, test_dataset, K=5, num_samples=None):\n", " \"\"\"\n", " Evaluate K-shot IoU on 'num_samples' random test images.\n", @@ -830,8 +836,8 @@ " img_s, mask_s = train_dataset[si]\n", " support_imgs.append(img_s)\n", " support_masks.append(mask_s)\n", - " support_imgs = torch.stack(support_imgs, dim=0) # [K,3,H,W]\n", - " support_masks = torch.stack(support_masks, dim=0) # [K,1,H,W]\n", + " support_imgs = torch.stack(support_imgs, dim=0) # [K,3,H,W]\n", + " support_masks = torch.stack(support_masks, dim=0) # [K,1,H,W]\n", "\n", " # run K-shot prediction\n", " logits = k_shot_predict(encoder, support_imgs, support_masks, img_q) # [1,2,H,W]\n", @@ -870,11 +876,11 @@ "def tensor_to_rgb(img_tensor):\n", " \"\"\"Undo normalisation and convert [3,H,W] tensor to [H,W,3] RGB numpy.\"\"\"\n", " img_np = img_tensor.detach().cpu().permute(1, 2, 0).numpy()\n", - " img_np = (img_np * np.array([0.229, 0.224, 0.225]) +\n", - " np.array([0.485, 0.456, 0.406]))\n", + " img_np = img_np * np.array([0.229, 0.224, 0.225]) + np.array([0.485, 0.456, 0.406])\n", " img_np = np.clip(img_np, 0, 1)\n", " return img_np\n", "\n", + "\n", "def visualise_few_shot_example(encoder, train_dataset, test_dataset):\n", " encoder.eval()\n", " rng = np.random.default_rng()\n", @@ -925,7 
+931,8 @@ " plt.tight_layout()\n", " plt.show()\n", "\n", - "visualise_few_shot_example(encoder, train_base, test_base)\n" + "\n", + "visualise_few_shot_example(encoder, train_base, test_base)" ] }, { @@ -968,8 +975,8 @@ " img_s, mask_s = train_dataset[si]\n", " support_imgs.append(img_s)\n", " support_masks.append(mask_s)\n", - " support_imgs = torch.stack(support_imgs, dim=0) # [K,3,H,W]\n", - " support_masks = torch.stack(support_masks, dim=0) # [K,1,H,W]\n", + " support_imgs = torch.stack(support_imgs, dim=0) # [K,3,H,W]\n", + " support_masks = torch.stack(support_masks, dim=0) # [K,1,H,W]\n", "\n", " # prediction\n", " logits = k_shot_predict(encoder, support_imgs, support_masks, img_q)\n", @@ -1016,8 +1023,9 @@ " plt.tight_layout()\n", " plt.show()\n", "\n", + "\n", "# Example 5-shot visualisation\n", - "visualise_kshot_example(encoder, train_base, test_base, K=5)\n" + "visualise_kshot_example(encoder, train_base, test_base, K=5)" ] }, { @@ -1054,8 +1062,7 @@ "ious_10shot = evaluate_kshot_iou(encoder, train_base, test_base, K=10, num_samples=None)\n", "\n", "# 20-shot\n", - "ious_20shot = evaluate_kshot_iou(encoder, train_base, test_base, K=20, num_samples=None)\n", - "\n" + "ious_20shot = evaluate_kshot_iou(encoder, train_base, test_base, K=20, num_samples=None)" ] }, { diff --git a/presentation/dimmery.scss b/presentation/dimmery.scss index 49c65c2..4957c26 100644 --- a/presentation/dimmery.scss +++ b/presentation/dimmery.scss @@ -44,7 +44,7 @@ $presentation-h3-font-size: 1.3em; // Title slide styling .reveal .title-slide { text-align: center; - + h1.title { font-family: $font-family-sans-serif; color: $primary-color; @@ -53,7 +53,7 @@ $presentation-h3-font-size: 1.3em; margin-bottom: 0.5em; text-shadow: 2px 2px 4px rgba(0,0,0,0.1); } - + .subtitle { font-family: $font-family-sans-serif; color: $theme-dark-gray; @@ -95,12 +95,12 @@ $presentation-h3-font-size: 1.3em; margin-left: 1em; margin-top: 0em; margin-bottom: 0em; - + li { margin-bottom: 0em; 
line-height: 1.2; } - + ul, ol { margin-top: 0em; margin-bottom: 0em; @@ -116,7 +116,7 @@ $presentation-h3-font-size: 1.3em; .reveal ul { list-style-type: none; - + li::before { content: "›"; color: inherit; @@ -125,17 +125,17 @@ $presentation-h3-font-size: 1.3em; margin-left: -1em; position: absolute; } - + ul { list-style-type: none; - + li::before { content: "»"; } - + ul { list-style-type: none; - + li::before { content: "⋙"; } @@ -147,7 +147,7 @@ $presentation-h3-font-size: 1.3em; .reveal table { border-collapse: collapse; margin: 1em auto; - + th { background-color: $secondary-color; color: white; @@ -155,12 +155,12 @@ $presentation-h3-font-size: 1.3em; padding: 0.8em 1em; border: 1px solid darken($secondary-color, 10%); } - + td { padding: 0.6em 1em; border: 1px solid $theme-light-gray; } - + tr:nth-child(even) { background-color: $theme-light-gray; } @@ -172,7 +172,7 @@ $presentation-h3-font-size: 1.3em; border: 1px solid $theme-medium-gray; border-radius: 4px; padding: 1em; - + code { background-color: transparent; color: $text-color; @@ -245,7 +245,7 @@ $presentation-h3-font-size: 1.3em; // Custom classes for specific content .course-schedule { list-style-type: decimal; - + li { position: relative; padding-left: 0; @@ -270,7 +270,7 @@ $presentation-h3-font-size: 1.3em; .reveal a { color: $accent-color; text-decoration: underline; - + &:hover { color: darken($accent-color, 15%); } @@ -291,4 +291,78 @@ $presentation-h3-font-size: 1.3em; .flushleft { text-align: left; -} \ No newline at end of file +} + +// ------------------------------------------------- +// Custom two-column layouts: 60/40 and 80/20 +// ------------------------------------------------- + +// Avoid content (esp. 
images) being cut off by slide viewport +.reveal .slides section { + overflow: visible; +} + +// Shared base for both layouts +.reveal .two-col-60-40, +.reveal .two-col-80-20 { + display: flex; + gap: 1.5rem; + align-items: flex-start; +} + +// 60/40 layout: more text, less illustration +.reveal .two-col-60-40 > .col-left { + flex: 0 0 60%; + min-width: 0; +} + +.reveal .two-col-60-40 > .col-right { + flex: 0 0 40%; + min-width: 0; +} + +// 80/20 layout: almost full-width text, small illustration +.reveal .two-col-80-20 > .col-left { + flex: 0 0 80%; + min-width: 0; +} + +.reveal .two-col-80-20 > .col-right { + flex: 0 0 20%; + min-width: 0; +} + +// Make sure illustrations scale nicely and are not cut off +.reveal .two-col-60-40 img, +.reveal .two-col-80-20 img { + max-width: 100%; + height: auto; + display: block; +} + +// On narrow screens, stack columns +@media (max-width: 900px) { + .reveal .two-col-60-40, + .reveal .two-col-80-20 { + flex-direction: column; + } + + .reveal .two-col-60-40 > .col-left, + .reveal .two-col-60-40 > .col-right, + .reveal .two-col-80-20 > .col-left, + .reveal .two-col-80-20 > .col-right { + flex: 1 1 100%; + } +} + +/* Make references tiny */ +.refs-super-small { + font-size: 0.5em !important; + line-height: 1.1em !important; +} + +/* Reduce bullet spacing further */ +.refs-super-small li { + margin-bottom: 0.15em !important; +} + diff --git a/presentation/figures/geneva-map-gif.gif b/presentation/figures/geneva-map-gif.gif new file mode 100644 index 0000000..25039b0 Binary files /dev/null and b/presentation/figures/geneva-map-gif.gif differ diff --git a/presentation/figures/geneva_image_raw.png b/presentation/figures/geneva_image_raw.png new file mode 100644 index 0000000..6dbb81a Binary files /dev/null and b/presentation/figures/geneva_image_raw.png differ diff --git a/presentation/figures/geneva_masks.png b/presentation/figures/geneva_masks.png new file mode 100644 index 0000000..07c70d9 Binary files /dev/null and 
b/presentation/figures/geneva_masks.png differ diff --git a/presentation/figures/geneva_outline.png b/presentation/figures/geneva_outline.png new file mode 100644 index 0000000..2fa6c90 Binary files /dev/null and b/presentation/figures/geneva_outline.png differ diff --git a/presentation/figures/geneva_overlay.png b/presentation/figures/geneva_overlay.png new file mode 100644 index 0000000..a09b024 Binary files /dev/null and b/presentation/figures/geneva_overlay.png differ diff --git a/presentation/figures/grids_animation.gif b/presentation/figures/grids_animation.gif new file mode 100644 index 0000000..ed9e0fb Binary files /dev/null and b/presentation/figures/grids_animation.gif differ diff --git a/presentation/figures/illustration_prototypical_network.png b/presentation/figures/illustration_prototypical_network.png new file mode 100644 index 0000000..fb999c9 Binary files /dev/null and b/presentation/figures/illustration_prototypical_network.png differ diff --git a/presentation/figures/meta_training_loss.png b/presentation/figures/meta_training_loss.png new file mode 100644 index 0000000..ddbbe31 Binary files /dev/null and b/presentation/figures/meta_training_loss.png differ diff --git a/presentation/figures/picture_use_case.png b/presentation/figures/picture_use_case.png new file mode 100644 index 0000000..433bc4a Binary files /dev/null and b/presentation/figures/picture_use_case.png differ diff --git a/presentation/figures/predicted_mask.png b/presentation/figures/predicted_mask.png new file mode 100644 index 0000000..c49da5e Binary files /dev/null and b/presentation/figures/predicted_mask.png differ diff --git a/presentation/tutorial-new-tutorial-group-1.html b/presentation/tutorial-new-tutorial-group-1.html index b99dc39..d537d2d 100644 --- a/presentation/tutorial-new-tutorial-group-1.html +++ b/presentation/tutorial-new-tutorial-group-1.html @@ -441,7 +441,7 @@ - + Few-Shot Learning for Rooftop Detection in Satellite Imagery @@ -486,7 +486,7 @@ margin: 0 0.8em 
0.2em -1em; vertical-align: middle; } - + \n\t\n\n\t\n\n\t\t
\n\n\t\t @@ -2114,7 +2187,7 @@

Wrap-Up: ","content":""}],"openButton":true}, 'smaller': false, - + // Display controls in the bottom right corner controls: false, @@ -2300,7 +2373,7 @@

Wrap-Up: Wrap-Up: Wrap-Up: +![](figures/grids_animation.gif){width="50%"} + -- Challenges: small rooftops, shadows, label noise, class imbalance +
+Geneva Animation: raw image → overlay rooftop → binary mask +
-**Insert & dont forget**: visuals notebook data preprocessing etc -## Model & Methods -- Data Preprocessing +## Few Shot Learning in General -- Model Architecture +#### Few-Shot Learning (FSL) +- Learning new **tasks, labels, or segmentations** from very few labeled examples + *(N-way, K-shot)* -- Few-Shot in a Nutshell (modified figure from paper) +#### Few-Shot Semantic Segmentation (FSSS) +- **Goal**: Segment novel object classes using only a few annotated examples +- Assigning a class label to **every pixel** -- Few-Shot in implementation (ntoebook reference/ pseudocode for logic?) -- Training strategy +--- + +## Prototypical Networks (ProtNets) + +* Learn a shared **embedding space** via a backbone model +* Pixels belonging to the same class are **close in feature space** +* Class representations are formed as **prototypes** +* Training follows an **episodic framework** +* Each episode consists of: + - **Support set**: + Few images with **pixel-level masks** + Defines the target classes + - **Query image**: + Image where the model must segment the target classes + +## Prototypical Network Overview -- Loss function +#### Workflow +* Support Image → Prototype → Similarity → Query Segmentation -- Evaluation metrics -## Prototypical Networks +#### Feature Extraction +* **Backbone:** ResNet-18 CNN, pretrained on ImageNet +* **Projection:** feature maps → embedding dimension (256 channels) -**Insert modified figure here** -- high-level schematic (support → prototype → similarity → segmentation) +#### Evaluation Metric +$$ +\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|} +$$ + + +--- + +## Prototypical Network Overview + +![](figures/illustration_prototypical_network.png){width=100% fig-align="center"} + +
+Modified figure from (Ding et al. 2022) +
+ + +--- + +## (Preliminary) Results + +#### (1) Meta training loss + +The “avg episode loss” at each epoch is the average cross-entropy error over all support–query tasks in that epoch. The encoder is successfully learning a feature space where prototype-based segmentation works increasingly well. + +![](figures/meta_training_loss.png){width="50%" fig-align="center"} + + +--- -- literature reference: [SRPNet](https://arxiv.org/abs/2210.16829) +## (Preliminary) Results +#### (2) Predicted masks -## Main Notebook in Detail +With 5-shot learning, the predicted masks reach a mean IoU of 0.485 over 102 test samples. -**how deep should we go?** +Here is an example: -lets discuss that regarding time +![](figures/predicted_mask.png){width=80% fig-align="center"} -(presentation should be 10 minutes, followed by 5 minutes of Q&A) +## Discussion +**Room for improvement:** -## Expected Results +- Fine-tune / tweak model parameters + - Add regularization + - Increase number of epochs -- Show performance for 1-shot / 5-shot / full-data comparison +- Implement rough approximation of solar potential + - e.g. based on IoU over roof area -- Show predicted masks -Open to Discuss: +**Open for discussion:** -- strengths +- Try a different encoder? + - e.g. ResNet-50 -- weaknesses +- Change train / test split strategy? + - e.g. random shuffle regardless of geographic regions -- failure cases (shadows, tiny rooftops) -## Wrap-Up: [GitHub Repo](https://github.com/hertie-data-science-lab/tutorial-new-tutorial-group-1/tree/main) +
+ + GitHub Repo + +
-insert more from discussion + memo here -**What we have so far**: +## References -- insert bullet point here +::: {.refs-super-small} -- insert bullet point here +- **Alsentzer, E., Li, M. M., Kobren, S. N., Noori, A., Undiagnosed Diseases Network, Kohane, I. S., & Zitnik, M.** (2025). Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases. *npj Digital Medicine, 8*(1), 380. https://doi.org/10.1038/s41746-025-01749-1 +- **Castello, R., Walch, A., Attias, R., Cadei, R., Jiang, S., & Scartezzini, J.-L.** (2021). Quantification of the suitable rooftop area for solar panel installation from overhead imagery using convolutional neural networks. *Journal of Physics: Conference Series, 2042*(1), 012002. https://doi.org/10.1088/1742-6596/2042/1/012002 -**What we still need to finalize**: +- **Chen, Y., Wei, C., Wang, D., Ji, C., & Li, B.** (2022). Semi-supervised contrastive learning for few-shot segmentation of remote sensing images. *Remote Sensing, 14*(17), 4254. https://doi.org/10.3390/rs14174254 -- insert bullet point here +- **Ding, H., Zhang, H., & Jiang, X.** (2022). Self-regularized prototypical network for few-shot semantic segmentation. *Pattern Recognition, 132*, 109018. https://doi.org/10.1016/j.patcog.2022.109018 -- insert bullet point here +- **Finn, C., Abbeel, P., & Levine, S.** (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In *International Conference on Machine Learning* (pp. 1126–1135). https://doi.org/10.48550/arXiv.1703.03400 +- **Ge, Z., Fan, X., Zhang, J., & Jin, S.** (2025). SegPPD-FS: Segmenting plant pests and diseases in the wild using few-shot learning. *Plant Phenomics*, 100121. https://doi.org/10.1016/j.plaphe.2025.100121 -**Questions to discuss in class/ lynn** +- **Hu, Y., Liu, C., Li, Z., Xu, J., Han, Z., & Guo, J.** (2022). Few-shot building footprint shape classification with relation network. *ISPRS International Journal of Geo-Information, 11*(5), 311. 
https://doi.org/10.3390/ijgi11050311 -- insert bullet point here +- **Jadon, S.** (2021). COVID-19 detection from scarce chest X-ray image data using few-shot deep learning. In *Medical Imaging 2021* (pp. 161–170). https://doi.org/10.1117/12.2581496 +- **Lee, G. Y., Dam, T., Ferdaus, M. M., Poenar, D. P., & Duong, V.** (2025). Enhancing Few-Shot Classification of Benchmark and Disaster Imagery with ATTBHFA-Net. *arXiv preprint* arXiv:2510.18326. https://doi.org/10.48550/arXiv.2510.18326 +- **Li, X., He, Z., Zhang, L., Guo, S., Hu, B., & Guo, K.** (2025). CDCNet: Cross-domain few-shot learning with adaptive representation enhancement. *Pattern Recognition, 162*, 111382. https://doi.org/10.1016/j.patcog.2025.111382 +- **Puthumanaillam, G., & Verma, U.** (2023). Texture based prototypical network for few-shot semantic segmentation of forest cover: Generalizing for different geographical regions. *Neurocomputing, 538*, 126201. https://doi.org/10.1016/j.neucom.2023.03.062 +- **Snell, J., Swersky, K., & Zemel, R.** (2017). Prototypical networks for few-shot learning. *Advances in Neural Information Processing Systems, 30*. https://doi.org/10.48550/arXiv.1703.05175 +- **Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P. H., & Hospedales, T. M.** (2018). Learning to compare: Relation network for few-shot learning. In *CVPR* (pp. 1199–1208). https://doi.org/10.1109/CVPR.2018.00131 +:::
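
The slides define the evaluation metric as IoU = |A ∩ B| / |A ∪ B| and report a mean IoU over the test samples. As a minimal sketch of how such a score could be computed for binary rooftop masks, the snippet below uses plain Python lists to stay dependency-free. The names `binary_iou` and `mean_iou` are hypothetical helpers for illustration, and the treatment of two empty masks is an assumption; the notebook's `evaluate_kshot_iou` may handle these details differently.

```python
def binary_iou(pred, target):
    """IoU = |A ∩ B| / |A ∪ B| for two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            # A pixel is in the intersection if both masks mark it as rooftop,
            # and in the union if at least one of them does.
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: if neither mask contains any rooftop pixel, count it as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(mask_pairs):
    """Average the per-image IoU over a list of (predicted, ground-truth) mask pairs."""
    scores = [binary_iou(pred, target) for pred, target in mask_pairs]
    return sum(scores) / len(scores)


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered by at least one mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

print(binary_iou(pred, target))                         # 2 / 4 = 0.5
print(mean_iou([(pred, target), (target, target)]))     # (0.5 + 1.0) / 2 = 0.75
```

Averaging per-image scores, as done here, weights every test image equally regardless of how much rooftop area it contains; pooling intersections and unions over the whole test set first would be an alternative with different behaviour on images with tiny rooftops.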