Add all 12 figures organized into 4 sections (EDA, Training,
Evaluation, Classification Map) with descriptive titles,
explanations, table of contents, pipeline diagram, and
per-class pixel count table.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# Crop Classification with Graph Convolutional Networks (GCN)

Pixel-level crop classification from **Sentinel-2** satellite imagery using a **Graph Convolutional Network** built with PyTorch Geometric. The model classifies agricultural land into 5 crop/land-cover classes at 10 m spatial resolution.

---

## Table of Contents

- [Overview](#overview)
- [Method](#method)
- [Project Structure](#project-structure)
- [Results](#results)
  - [Exploratory Data Analysis](#1-exploratory-data-analysis)

This project classifies agricultural land into **5 crop/land-cover classes** using 23 spectral and vegetation index features derived from Sentinel-2 imagery:
2. **Graph construction** -- K-nearest neighbor graph (k=8) built in feature space to capture spectral similarity
3. **GCN training** -- 3-layer GCN with batch normalization, dropout (0.5), and inverse-frequency class weighting
4. **Raster inference** -- Tiled KNN-graph prediction over the full 2262x1424 Sentinel-2 composite
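
The graph construction step can be illustrated with a minimal, brute-force sketch. It assumes Euclidean distance in feature space and returns edges in the 2 x E layout that PyTorch Geometric uses for `edge_index`; the function name `knn_edge_index` and the toy pixel values are hypothetical, and the project's own script may build the graph differently (for example with a library KNN routine).

```python
import math


def knn_edge_index(features, k=8):
    """Return a directed KNN edge list as [sources, targets].

    Each node i receives one edge from each of its k nearest neighbours
    (Euclidean distance in feature space).
    """
    n = len(features)
    k = min(k, n - 1)  # a node cannot have more than n - 1 neighbours
    sources, targets = [], []
    for i in range(n):
        # Brute force: rank every other node by distance to node i.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )[:k]
        for j in neighbours:
            sources.append(j)  # message flows from neighbour j ...
            targets.append(i)  # ... to node i
    return [sources, targets]


# Toy example: two spectral clusters of three pixels each (made-up values).
pixels = [
    [0.10, 0.80], [0.12, 0.82], [0.11, 0.79],  # "vegetated" cluster
    [0.60, 0.20], [0.62, 0.18], [0.61, 0.21],  # "bare soil" cluster
]
edge_index = knn_edge_index(pixels, k=2)
```

Because neighbours are chosen by spectral similarity rather than spatial adjacency, every edge in this toy graph stays inside its own cluster.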

## Project Structure

```
.
|-- explore_data.py                  # EDA and feature visualization
|-- gcn_crop_classification.py       # GCN model training and evaluation
|-- apply_gcn_to_raster.py           # Apply trained model to full raster
|-- requirements.txt                 # Python dependencies
|-- LICENSE                          # MIT License
|-- data/                            # Input data (not tracked in git)
|   |-- crop_training_data_5classes_2020.csv
|   |-- S2_composite_24bands_2020_Q1.tif
|   +-- crop_classification_map.tif  (output)
+-- figures/                         # Generated plots and maps
    |-- 01_class_distribution.png
    |-- 02_correlation_heatmap.png
    |-- 03_bands_boxplot.png
    |-- 04_indices_boxplot.png
    |-- 05_key_indices_hist.png
    |-- 06_class_feature_profile.png
    |-- gcn_training_curves.png
    |-- gcn_confusion_matrix.png
    |-- gcn_confusion_matrix_norm.png
    |-- gcn_per_class_accuracy.png
    |-- gcn_tsne_embeddings.png
    +-- crop_classification_map.png
```

---

## Results

### 1. Exploratory Data Analysis

#### 1.1 Class Distribution

The training dataset contains ~24,000 labeled pixels across 5 classes. The distribution is imbalanced -- Fallow dominates at 45%, while Cotton (1.4%) and Water (0.6%) are minority classes. This imbalance is addressed during training using inverse-frequency class weighting.
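
As a rough illustration of inverse-frequency class weighting, the sketch below uses the common "balanced" formula `weight_c = N / (C * n_c)`. This formula is an assumption, since the exact weighting in the training script is not shown here, and the label counts are toy values (only the 45% Fallow share comes from the text above).

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to N / (C * n_c), so rarer classes get larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Toy label list: illustrative counts, not the real dataset.
labels = (
    ["Fallow"] * 45 + ["Wheat"] * 30 + ["Grass"] * 20
    + ["Cotton"] * 3 + ["Water"] * 2
)
weights = inverse_frequency_weights(labels)
```

Weights like these are typically passed to a weighted loss such as `torch.nn.CrossEntropyLoss(weight=...)`, so that errors on Cotton or Water pixels cost more than errors on Fallow pixels.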
The correlation heatmap shows the pairwise relationships among all 23 spectral and index features. Strong positive correlations exist among vegetation indices (NDVI, EVI, SAVI, GNDVI) and among red-edge bands (B5-B7). Negative correlations appear between vegetation indices and the bare-soil and water indices (BSI and MNDWI, respectively), confirming their complementary roles for class discrimination.
Box plots of the 10 Sentinel-2 spectral bands (B2-B12) and BSI broken down by crop class. Each class exhibits a distinct spectral signature -- Wheat shows consistently high reflectance in NIR bands (B7, B8), Water has low reflectance across all bands, and Fallow is characterized by high short-wave infrared (B11, B12) values relative to NIR.

<p align="center">
  <img src="figures/03_bands_boxplot.png" alt="Spectral Bands Distribution per Class" width="800">
</p>

---

#### 1.4 Vegetation Index Distributions per Class

Box plots of the 13 derived vegetation indices per class. Wheat stands out with high NDVI, EVI, and SAVI values (active vegetation), while Fallow and Water cluster near zero or negative ranges. CIgreen and CIrededge provide strong separability between vegetated crops (Wheat, Grass) and non-vegetated surfaces (Fallow, Water).
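
For reference, two of these indices can be sketched with their standard normalized-difference definitions: NDVI from NIR and red (Sentinel-2 B8 and B4), and MNDWI from green and SWIR1 (B3 and B11). The helper names, the `eps` guard, and the reflectance values are illustrative assumptions; the exact formulas and band choices behind the 13 features in this project are not shown here.

```python
def normalized_difference(a, b, eps=1e-10):
    """(a - b) / (a + b), with a small eps to avoid division by zero."""
    return (a - b) / (a + b + eps)


def ndvi(nir, red):
    """Normalized Difference Vegetation Index (Sentinel-2: B8 vs B4)."""
    return normalized_difference(nir, red)


def mndwi(green, swir1):
    """Modified Normalized Difference Water Index (Sentinel-2: B3 vs B11)."""
    return normalized_difference(green, swir1)


# Made-up surface reflectances for a vegetated pixel and a water pixel.
veg_ndvi = ndvi(nir=0.45, red=0.05)
water_ndvi = ndvi(nir=0.02, red=0.04)
water_mndwi = mndwi(green=0.08, swir1=0.02)
```

A dense canopy reflects strongly in NIR and absorbs red, which pushes NDVI toward 1, while open water absorbs NIR and SWIR, giving negative NDVI and positive MNDWI.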

<p align="center">
  <img src="figures/04_indices_boxplot.png" alt="Vegetation Index Distributions per Class" width="800">
</p>

---

#### 1.5 Key Index Histograms by Class

Density histograms of 6 key indices (NDVI, EVI, NDWI, SAVI, MNDWI, BSI) overlaid by class. These reveal the degree of separability each index provides. NDVI and SAVI show clear bimodal patterns separating vegetated from non-vegetated classes. Water is distinctly separated by NDWI and MNDWI with values far from other classes.

<p align="center">
  <img src="figures/05_key_indices_hist.png" alt="Key Index Distributions by Class" width="800">
</p>

---

#### 1.6 Normalized Per-Class Feature Profiles

A grouped bar chart showing the normalized mean value of every feature for each class. This "spectral fingerprint" view highlights how each class has a unique profile across the 23 features. Wheat dominates in vegetation-sensitive features (NDVI, EVI, SAVI), Fallow peaks in bare-soil indicators (BSI, B11), and Water shows near-zero values across most features except MNDWI.
#### 2.1 Training Loss and Validation Accuracy Curves
The training loss decreases rapidly in the first 10 epochs and converges near zero, indicating effective learning. Validation accuracy climbs from ~79% to ~99.9%, with early stopping triggered after epoch 55. The smooth convergence with no divergence between training and validation suggests the model generalizes well without overfitting.
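
The early-stopping behaviour can be sketched as a patience counter on validation accuracy. The class name, the patience value, and the accuracy history below are hypothetical; the actual stopping criterion in the training script may differ (for instance, it could monitor validation loss).

```python
class EarlyStopping:
    """Stop once validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_acc):
        """Record one epoch; return True when training should stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation accuracies, one per epoch.
val_history = [0.79, 0.90, 0.95, 0.97, 0.97, 0.96, 0.97]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, acc in enumerate(val_history, start=1):
    if stopper.step(epoch, acc):
        stopped_at = epoch
        break
```

In practice the model weights from `best_epoch` would be checkpointed and restored after stopping, so that evaluation uses the best validation state rather than the last one.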
<p align="center">
  <img src="figures/gcn_training_curves.png" alt="Training Loss and Validation Accuracy" width="800">
</p>

---

### 3. Model Evaluation

#### 3.1 Confusion Matrix (Test Set)

The confusion matrix on the held-out test set (15% of data) shows near-perfect classification. All 5 classes achieve close to 100% recall, with only 3 misclassified samples total (all from the Grass class: 1 predicted as Cotton, 2 as Wheat). Cotton (48 samples), Wheat (1149), Fallow (1619), and Water (21) are classified with zero errors.
The row-normalized confusion matrix confirms all classes achieve >= 99.6% recall. Cotton, Wheat, Fallow, and Water reach a perfect 1.000, while Grass achieves 0.996. This demonstrates the GCN's ability to handle class imbalance effectively through weighted loss.
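
Row normalization divides each row of the confusion matrix by its row sum, so each diagonal entry becomes that class's recall. The sketch below uses a small made-up matrix with placeholder class names, not the project's actual test results.

```python
def row_normalize(cm):
    """Divide each row by its sum; the diagonal then holds per-class recall."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


# Illustrative 3-class matrix (rows = true class, columns = predicted class).
classes = ["class_a", "class_b", "class_c"]
cm = [
    [48, 0, 0],
    [1, 247, 2],
    [0, 0, 20],
]
cm_norm = row_normalize(cm)
recall = {c: cm_norm[i][i] for i, c in enumerate(classes)}
```

Normalizing by row makes small classes directly comparable with large ones, because a handful of errors in a 20-sample class would be invisible in raw counts next to a class with thousands of samples.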
A bar chart summarizing per-class accuracy on the test set. All 5 classes reach at least 99.6%, confirming consistently strong performance across both majority classes (Fallow, Wheat) and minority classes (Cotton, Water).
#### 3.4 t-SNE Visualization of GCN Node Embeddings
A 2D t-SNE projection of the learned 128-dimensional node embeddings from the GCN's second-to-last layer. The 5 classes form well-separated clusters, confirming that the GCN learns discriminative feature representations. Cotton (blue) and Water (purple) form tight, isolated clusters, while the larger classes (Fallow, Wheat, Grass) occupy distinct regions with clear boundaries.

<p align="center">
  <img src="figures/gcn_tsne_embeddings.png" alt="t-SNE of GCN Node Embeddings" width="600">
</p>

---

### 4. Spatial Classification Map

The final classified crop map produced by applying the trained GCN to the full Sentinel-2 raster (2262 x 1424 pixels, 10 m resolution). Over 1 million valid pixels were classified using tiled KNN-graph inference. Fallow (tan) dominates bare agricultural areas, Wheat (yellow) and Grass (green) cover vegetated parcels, Water (blue) aligns with river and canal features, and Cotton (red) appears in scattered agricultural plots.
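
The tiling step can be sketched as a generator of non-overlapping windows that cover the raster. It assumes 2262 is the width and 1424 the height, and the 512-pixel tile size and the name `iter_tiles` are hypothetical; the actual script may use a different tile size or overlapping tiles.

```python
def iter_tiles(height, width, tile=512):
    """Yield (row_start, row_stop, col_start, col_stop) windows over the raster."""
    for r0 in range(0, height, tile):
        for c0 in range(0, width, tile):
            # Edge tiles are clipped to the raster bounds.
            yield r0, min(r0 + tile, height), c0, min(c0 + tile, width)


# Assumed orientation: 2262 columns (width) x 1424 rows (height).
tiles = list(iter_tiles(height=1424, width=2262, tile=512))
covered = sum((r1 - r0) * (c1 - c0) for r0, r1, c0, c1 in tiles)
```

Under this scheme, the valid pixels of each tile would get their own KNN graph before being passed through the trained GCN, which keeps the graph size bounded regardless of raster size.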