|
29 | 29 | "* [Software Requirements](#software-requirements)\n", |
30 | 30 | "* [Data Description](#data-description)\n", |
31 | 31 | "* [Methodology](#methodology)\n", |
32 | | - "* [Results & Discussion](#results-and-discussion)\n", |
| 32 | + "* [Results](#results)\n", |
| 33 | + "* [Discussion & Limitations](#discussion-limitations)\n", |
33 | 34 | "* [⭐ Challenge ⭐](#challenge)\n", |
| 35 | + "* [Further Resources](#further-ressources)\n", |
34 | 36 | "* [References](#references)" |
35 | 37 | ] |
36 | 38 | }, |
|
1436 | 1438 | "cell_type": "markdown", |
1437 | 1439 | "metadata": {}, |
1438 | 1440 | "source": [ |
1439 | | - "<a name=\"discussion\"></a>\n", |
1440 | | - "## Discussion\n", |
| 1441 | + "In this tutorial, we evaluate whether a **Prototypical Network** can learn to segment rooftops using only a few labeled examples per task. Because this is a **few-shot learning** setting, traditional large-scale training metrics are not directly applicable. Instead, we monitor two key indicators.\n", |
1441 | 1442 | "\n", |
1442 | | - "tbd" |
| 1443 | + "### **1. Meta-Training Loss**\n", |
| 1444 | + "\n", |
| 1445 | + "During training, the model optimizes over **episodes**, each composed of:\n", |
| 1446 | + "\n", |
| 1447 | + "- a small **support set** \n", |
| 1448 | + "- a **query image**\n", |
| 1449 | + "\n", |
| 1450 | + "For each epoch, we compute the **average episode loss**, defined as the mean pixel-wise cross-entropy loss across all sampled support–query tasks. \n", |
| 1451 | + "Across epochs, this loss **decreases steadily**, indicating that:\n", |
| 1452 | + "\n", |
| 1453 | + "- the encoder learns a progressively more meaningful **feature representation** \n", |
| 1454 | + "- rooftop vs. non-rooftop pixels become **more separable** in feature space \n", |
| 1455 | + "- prototype-based segmentation improves throughout meta-training \n", |
| 1456 | + "\n", |
| 1457 | + "Even with very limited supervision, the model internalizes **rooftop characteristics** and reduces prediction errors effectively.\n", |
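The episode loss described above can be sketched in plain Python. This is a minimal illustration, not the notebook's training code: it assumes each pixel has already been mapped to a feature vector by the encoder, that logits are negative squared Euclidean distances to the class prototypes, and that both classes appear in the support mask. The helper names (`prototype`, `sq_dist`, `episode_loss`) and the toy feature values are hypothetical.

```python
import math

def prototype(features, mask, label):
    """Mean feature vector of the support pixels carrying `label`.

    Assumes at least one support pixel has this label.
    """
    selected = [f for f, m in zip(features, mask) if m == label]
    dim = len(features[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def episode_loss(support_feats, support_mask, query_feats, query_mask):
    """Mean pixel-wise cross-entropy for one support-query episode."""
    # One prototype per class: 0 = non-rooftop, 1 = rooftop.
    protos = [prototype(support_feats, support_mask, c) for c in (0, 1)]
    total = 0.0
    for feat, target in zip(query_feats, query_mask):
        # Closer prototype -> larger logit (assumed distance-based scoring).
        logits = [-sq_dist(feat, p) for p in protos]
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_norm - logits[target]  # -log softmax(target)
    return total / len(query_feats)

# Toy episode: four support pixels and two query pixels with 2-D features.
support_feats = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]]
support_mask = [0, 0, 1, 1]
query_feats = [[0.1, 0.1], [0.9, 1.0]]
query_mask = [0, 1]

loss = episode_loss(support_feats, support_mask, query_feats, query_mask)

# The per-epoch value is the mean over all episodes sampled in that epoch.
episode_losses = [loss, loss]
epoch_loss = sum(episode_losses) / len(episode_losses)
```

In this toy episode both query pixels lie next to the prototype of their own class, so the loss falls below `log(2)`, the value a two-class model would score by guessing uniformly.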
| 1458 | + "\n", |
| 1459 | + "### **2. Predicted Masks & Quantitative Performance**\n", |
| 1460 | + "\n", |
| 1461 | + "To evaluate the model, we use **5-shot segmentation**: \n", |
| 1462 | + "five labeled support examples define the rooftop prototype for each test episode.\n", |
| 1463 | + "\n", |
| 1464 | + "On a set of **102 test tiles** drawn from a geographically distinct region of Geneva, the model achieves:\n", |
| 1465 | + "\n", |
| 1466 | + "➡️ Mean IoU: ~0.49\n", |
| 1467 | + "\n", |
| 1468 | + "While modest, this result is encouraging given:\n", |
| 1469 | + "\n", |
| 1470 | + "- the very limited supervision (five labeled support examples per episode) \n", |
| 1471 | + "- the complexity of urban rooftop structures \n", |
| 1472 | + "- the geographic domain shift between training and testing \n", |
| 1473 | + "\n", |
| 1474 | + "A qualitative example highlights this behavior:\n", |
| 1475 | + "\n", |
| 1476 | + "- Support images & masks define the relevant rooftop characteristics \n", |
| 1477 | + "- On the query image, the predicted mask captures **major rooftop shapes** \n", |
| 1478 | + "- Large rooftop surfaces are generally identified \n", |
| 1479 | + "- Fine-grained details remain imperfect, but the model generalizes to **textures and geometries not seen during training**\n", |
| 1480 | + "\n", |
| 1481 | + "These results suggest that the Prototypical Network learns a **useful and transferable feature embedding**. \n", |
| 1482 | + "With only five labeled support examples per episode, it can extend rooftop segmentation to **new geographic areas**, which illustrates the potential of metric-based few-shot learning for remote sensing tasks." |
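For readers unfamiliar with the metric, the mean IoU reported above can be computed along these lines. This is a hedged sketch that assumes binary masks flattened to lists of 0/1; the function names and the toy masks are hypothetical, and scoring an empty prediction against an empty ground truth as 1.0 is a convention chosen here, not necessarily the one used in the notebook.

```python
def iou(pred, target):
    """Intersection over union of two flattened binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, target) if p == 1 or t == 1)
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0

def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) mask pairs."""
    scores = [iou(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)

# Two toy tiles of four pixels each.
tiles = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # 1 shared pixel, 2 in the union
    ([0, 1, 1, 1], [0, 1, 1, 0]),  # 2 shared pixels, 3 in the union
]
score = mean_iou(tiles)
```

Averaging per tile, as done here, weights every tile equally regardless of how much rooftop it contains; pooling pixels across tiles before dividing would give a different number.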
| 1483 | + ] |
| 1484 | + }, |
| 1485 | + { |
| 1486 | + "cell_type": "markdown", |
| 1487 | + "metadata": {}, |
| 1488 | + "source": [ |
| 1489 | + "---" |
1443 | 1490 | ] |
1444 | 1491 | }, |
1445 | 1492 | { |
1446 | 1493 | "cell_type": "markdown", |
1447 | 1494 | "metadata": {}, |
1448 | 1495 | "source": [ |
1449 | | - "### Limitations\n", |
| 1496 | + "<a name=\"discussion-limitations\"></a>\n", |
| 1497 | + "\n", |
| 1498 | + "## **Discussion**\n", |
| 1499 | + "\n", |
| 1500 | + "The results show that Prototypical Networks can learn meaningful rooftop representations even with very limited supervision. However, there remains substantial room to improve performance and explore alternative design choices within the few-shot learning setup. Several aspects of the training procedure could be refined to enhance segmentation accuracy:\n", |
| 1501 | + "\n", |
| 1502 | + "### **Model Tuning and Regularization**\n", |
| 1503 | + "\n", |
| 1504 | + "Incorporating techniques such as **weight decay**, **dropout**, **data augmentation**, or **early stopping** could stabilize feature learning and reduce overfitting to the small support sets typically used in few-shot learning.\n", |
| 1505 | + "\n", |
| 1506 | + "### **Training for More Epochs**\n", |
| 1507 | + "\n", |
| 1508 | + "For demonstration purposes, the model was trained for only a limited number of epochs. \n", |
| 1509 | + "Extending training duration or increasing the number of sampled episodes per epoch could help the encoder converge toward a **more discriminative embedding space**, potentially improving segmentation performance.\n", |
| 1510 | + "\n", |
| 1511 | + "### **Extending the Task Toward Policy Relevance**\n", |
| 1512 | + "\n", |
| 1513 | + "A rough solar potential approximation could be built on top of the segmentation task. \n", |
| 1514 | + "For example:\n", |
| 1515 | + "\n", |
| 1516 | + "- predicted rooftop area \n", |
| 1517 | + "- combined with IoU-based uncertainty estimates \n", |
1450 | 1518 | "\n", |
1451 | | - "tbd" |
| 1519 | + "could provide a **first-order indicator of solar suitability**, connecting the model’s outputs to real-world energy planning applications.\n", |
| 1520 | + "\n", |
| 1521 | + "### **Trying Different Encoder Backbones**\n", |
| 1522 | + "\n", |
| 1523 | + "The current prototype uses a lightweight CNN encoder for simplicity. \n", |
| 1524 | + "Replacing it with stronger architectures—such as **ResNet-50** or a **Vision Transformer (ViT)**—may yield more robust and generalizable feature representations, though at increased computational cost.\n", |
| 1525 | + "\n", |
| 1526 | + "### **Train/Test Split Strategy**\n", |
| 1527 | + "\n", |
| 1528 | + "To simulate domain shift across neighborhoods, the dataset was split by geographic region. \n", |
| 1529 | + "An alternative would be a **random shuffle across all tiles**, which usually produces higher accuracy but does **not** evaluate generalization to new geographic areas.\n", |
| 1530 | + "\n", |
| 1531 | + "Exploring both strategies would highlight the trade-offs between:\n", |
| 1532 | + "\n", |
| 1533 | + "- benchmark performance \n", |
| 1534 | + "- robustness in real-world deployment under geographic variation\n", |
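The two split strategies can be contrasted with a small sketch. It assumes each tile carries a region label; the tile records, the `region` key, and the choice of `"south"` as the held-out region are hypothetical and only serve the illustration.

```python
import random

# Hypothetical tile index: 4 tiles per region.
tiles = [
    {"id": i, "region": region}
    for i, region in enumerate(["north"] * 4 + ["center"] * 4 + ["south"] * 4)
]

def geographic_split(tiles, test_regions):
    """Hold out whole regions so that test tiles come from unseen areas."""
    train = [t for t in tiles if t["region"] not in test_regions]
    test = [t for t in tiles if t["region"] in test_regions]
    return train, test

def random_split(tiles, test_fraction, seed=0):
    """Shuffle all tiles, so train and test share the same regions."""
    shuffled = tiles[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

geo_train, geo_test = geographic_split(tiles, test_regions={"south"})
rnd_train, rnd_test = random_split(tiles, test_fraction=0.25)
```

With the geographic split, no test region ever appears in training, which is what makes the evaluation a test of spatial generalization. The random split offers no such guarantee.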
| 1535 | + "\n", |
| 1536 | + "\n", |
| 1537 | + "---\n", |
| 1538 | + "\n", |
| 1539 | + "\n", |
| 1540 | + "## Limitations\n", |
| 1541 | + "\n", |
| 1542 | + "While this tutorial successfully demonstrates the core ideas behind few-shot segmentation with **Prototypical Networks**, several important simplifications limit its applicability. Many of these choices were made intentionally to ensure the tutorial remains computationally lightweight and easy to reproduce.\n", |
| 1543 | + "\n", |
| 1544 | + "### **Simplified Experimental Setup**\n", |
| 1545 | + "\n", |
| 1546 | + "To keep the workflow accessible, we used:\n", |
| 1547 | + "\n", |
| 1548 | + "- a **very small training set**, both in the number of tiles and support examples per episode \n", |
| 1549 | + "- a **lightweight encoder**, rather than higher-capacity backbones common in remote sensing (e.g., ResNet-50, Swin Transformer) \n", |
| 1550 | + "- a **short training schedule**, with few epochs and limited episode sampling \n", |
| 1551 | + "\n", |
| 1552 | + "These design choices improve reproducibility but also **restrict the achievable segmentation performance**. \n", |
| 1553 | + "In practical applications—such as large-scale rooftop or solar mapping—substantially **more data**, **stronger feature extractors**, and **longer training** would be necessary.\n", |
| 1554 | + "\n", |
| 1555 | + "### **Modelling Choices Intentionally Kept Simple**\n", |
| 1556 | + "\n", |
| 1557 | + "Several simplifications reduce the robustness of the resulting predictions:\n", |
| 1558 | + "\n", |
| 1559 | + "- **Binary segmentation** (roof vs. non-roof) ignores roof type, material, shadows, and occlusions—all of which matter for accurate solar potential estimation. \n", |
| 1560 | + "- **No post-processing** was applied (e.g., morphological filters, CRFs), even though such steps typically improve mask quality. \n", |
| 1561 | + "- **No uncertainty estimation** was included, despite being crucial for planning and policy-relevant applications.\n", |
| 1562 | + "\n", |
| 1563 | + "These omissions help focus on core concepts but limit real-world applicability.\n", |
| 1564 | + "\n", |
| 1565 | + "### **Dataset Biases and Generalization Limits**\n", |
| 1566 | + "\n", |
| 1567 | + "The dataset itself introduces structural biases:\n", |
| 1568 | + "\n", |
| 1569 | + "- Imagery is taken exclusively from **Geneva**, a wealthy European city with relatively homogeneous architectural styles. \n", |
| 1570 | + "- Rooftop morphology varies globally—informal housing, climate-adapted roof shapes, and diverse materials are **not represented** here. \n", |
| 1571 | + "- The geographic split (North–Center–South) creates a **stylized domain shift**, but does not reflect true global variation.\n", |
| 1572 | + "\n", |
| 1573 | + "If applied uncritically in policy contexts, these limitations could **reinforce geographic inequities**—for example, overestimating solar potential in well-represented neighborhoods and underestimating it in underrepresented ones.\n", |
| 1574 | + "\n", |
| 1575 | + "### **Addressing These Challenges**\n", |
| 1576 | + "\n", |
| 1577 | + "To improve real-world deployment, data analysts and practitioners should consider:\n", |
| 1578 | + "\n", |
| 1579 | + "- **Expanding dataset diversity** (more cities, varied roof types, lighting conditions, seasons). \n", |
| 1580 | + "- **Evaluating fairness and generalization** across socioeconomic and geographic groups. \n", |
| 1581 | + "- **Incorporating uncertainty estimation**, especially when predictions support infrastructure or planning decisions. \n", |
| 1582 | + "- **Validating model outputs** with domain experts (urban planners, energy modelers, local authorities).\n", |
| 1583 | + "\n", |
| 1584 | + "By recognizing these limitations, we can better understand the conditions under which few-shot rooftop segmentation performs well—and the steps required to make such models reliable for operational or policy-driven use.\n" |
| 1585 | + ] |
| 1586 | + }, |
| 1587 | + { |
| 1588 | + "cell_type": "markdown", |
| 1589 | + "metadata": {}, |
| 1590 | + "source": [ |
| 1591 | + "---" |
1452 | 1592 | ] |
1453 | 1593 | }, |
1454 | 1594 | { |
|
1493 | 1633 | "cell_type": "markdown", |
1494 | 1634 | "metadata": {}, |
1495 | 1635 | "source": [ |
1496 | | - "<a name=\"Further Ressources\"></a>\n", |
| 1636 | + "---" |
| 1637 | + ] |
| 1638 | + }, |
| 1639 | + { |
| 1640 | + "cell_type": "markdown", |
| 1641 | + "metadata": {}, |
| 1642 | + "source": [ |
| 1643 | + "<a name=\"further-ressources\"></a>\n", |
1497 | 1644 | "## Further Resources\n", |
1498 | 1645 | "\n", |
1499 | 1646 | "### **Foundational Papers**\n", |
|
1536 | 1683 | " https://github.com/facebookresearch/segment-anything\n" |
1537 | 1684 | ] |
1538 | 1685 | }, |
| 1686 | + { |
| 1687 | + "cell_type": "markdown", |
| 1688 | + "metadata": {}, |
| 1689 | + "source": [ |
| 1690 | + "---" |
| 1691 | + ] |
| 1692 | + }, |
1539 | 1693 | { |
1540 | 1694 | "cell_type": "markdown", |
1541 | 1695 | "metadata": {}, |
|