Commit f7641f6

Update README with comprehensive results showcase
Add all 12 figures organized into 4 sections (EDA, Training, Evaluation, Classification Map) with descriptive titles, explanations, table of contents, pipeline diagram, and per-class pixel count table.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent ab7e85b commit f7641f6

1 file changed

Lines changed: 238 additions & 45 deletions

README.md

@@ -1,62 +1,255 @@
11
# Crop Classification with Graph Convolutional Networks (GCN)
22

3-
Pixel-level crop classification from Sentinel-2 satellite imagery using a Graph Convolutional Network built with PyTorch Geometric.
3+
Pixel-level crop classification from **Sentinel-2** satellite imagery using a **Graph Convolutional Network** built with PyTorch Geometric. The model classifies agricultural land into 5 crop/land-cover classes at 10 m spatial resolution.
4+
5+
---
6+
7+
## Table of Contents
8+
9+
- [Overview](#overview)
10+
- [Method](#method)
11+
- [Project Structure](#project-structure)
12+
- [Results](#results)
13+
- [Exploratory Data Analysis](#1-exploratory-data-analysis)
14+
- [Model Training](#2-model-training)
15+
- [Model Evaluation](#3-model-evaluation)
16+
- [Spatial Classification Map](#4-spatial-classification-map)
17+
- [Installation](#installation)
18+
- [Usage](#usage)
19+
- [Data](#data)
20+
- [License](#license)
21+
22+
---
423

524
## Overview
625

726
This project classifies agricultural land into **5 crop/land-cover classes** using 23 spectral and vegetation index features derived from Sentinel-2 imagery:
827

928
| Class | Description |
10-
|-------|-------------|
11-
| Cotton | Cotton fields |
12-
| Wheat | Wheat fields |
13-
| Fallow | Bare / fallow land |
14-
| Grass | Grassland / pasture |
15-
| Water | Water bodies |
29+
|:------|:------------|
30+
| **Cotton** | Cotton crop fields |
31+
| **Wheat** | Wheat crop fields |
32+
| **Fallow** | Bare / fallow agricultural land |
33+
| **Grass** | Grassland and pasture areas |
34+
| **Water** | Rivers, canals, and water bodies |
1635

1736
## Method
1837

19-
1. **Feature extraction** — 10 Sentinel-2 bands (B2–B12) + 13 spectral indices (NDVI, EVI, SAVI, etc.)
20-
2. **Graph construction** — K-nearest neighbor graph (k=8) in feature space
21-
3. **GCN training** — 3-layer GCN with batch normalization, dropout, and class-weighted loss
22-
4. **Raster inference** — Tiled KNN-graph prediction over the full Sentinel-2 composite
38+
```
39+
Sentinel-2 Image (24 bands)
40+
|
41+
v
42+
Feature Extraction (23 features: 10 spectral bands + 13 vegetation indices)
43+
|
44+
v
45+
KNN Graph Construction (k=8 neighbors in feature space)
46+
|
47+
v
48+
3-Layer GCN (128 hidden units, BatchNorm, Dropout, class-weighted loss)
49+
|
50+
v
51+
Tiled Raster Inference (512x512 pixel tiles)
52+
|
53+
v
54+
Classified Crop Map (GeoTIFF + PNG)
55+
```
56+
57+
1. **Feature extraction** -- 10 Sentinel-2 bands (B2-B12) + 13 spectral indices (NDVI, EVI, SAVI, etc.)
58+
2. **Graph construction** -- K-nearest neighbor graph (k=8) built in feature space to capture spectral similarity
59+
3. **GCN training** -- 3-layer GCN with batch normalization, dropout (0.5), and inverse-frequency class weighting
60+
4. **Raster inference** -- Tiled KNN-graph prediction over the full 2262x1424 Sentinel-2 composite
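
The graph-construction step above can be sketched as follows. This is a minimal illustration, assuming a brute-force Euclidean k-NN search over the feature vectors; the function name `knn_edge_index` and the random input are hypothetical, and the repository's script may build the graph differently (for example with scikit-learn or PyTorch Geometric utilities).

```python
import numpy as np


def knn_edge_index(features: np.ndarray, k: int = 8) -> np.ndarray:
    """Return a (2, N*k) array pairing each node with its k nearest
    neighbors in feature space (Euclidean distance, brute force)."""
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = (features ** 2).sum(axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbor

    # Column indices of the k smallest distances per row (unordered within the k)
    neighbors = np.argpartition(dist, k, axis=1)[:, :k]

    sources = np.repeat(np.arange(features.shape[0]), k)
    targets = neighbors.reshape(-1)
    return np.stack([sources, targets])


# Hypothetical stand-in for a table of 23-feature samples (not the project data)
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 23))
edge_index = knn_edge_index(x, k=8)
print(edge_index.shape)  # (2, 800)
```

The dense distance matrix grows quadratically with the number of nodes, which is one reason a full raster is usually processed in tiles rather than as a single graph.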
2361

2462
## Project Structure
2563

2664
```
27-
├── explore_data.py # EDA and feature visualization
28-
├── gcn_crop_classification.py # GCN model training and evaluation
29-
├── apply_gcn_to_raster.py # Apply trained model to full raster
30-
├── data/ # Input data (not tracked in git)
31-
│ ├── crop_training_data_5classes_2020.csv
32-
│ ├── S2_composite_24bands_2020_Q1.tif
33-
│ └── crop_classification_map.tif (output)
34-
└── figures/ # Generated plots and maps
35-
├── 01_class_distribution.png
36-
├── 02_correlation_heatmap.png
37-
├── gcn_training_curves.png
38-
├── gcn_confusion_matrix.png
39-
├── gcn_per_class_accuracy.png
40-
├── gcn_tsne_embeddings.png
41-
└── crop_classification_map.png
65+
.
66+
|-- explore_data.py # EDA and feature visualization
67+
|-- gcn_crop_classification.py # GCN model training and evaluation
68+
|-- apply_gcn_to_raster.py # Apply trained model to full raster
69+
|-- requirements.txt # Python dependencies
70+
|-- LICENSE # MIT License
71+
|-- data/ # Input data (not tracked in git)
72+
| |-- crop_training_data_5classes_2020.csv
73+
| |-- S2_composite_24bands_2020_Q1.tif
74+
| +-- crop_classification_map.tif (output)
75+
+-- figures/ # Generated plots and maps
76+
|-- 01_class_distribution.png
77+
|-- 02_correlation_heatmap.png
78+
|-- 03_bands_boxplot.png
79+
|-- 04_indices_boxplot.png
80+
|-- 05_key_indices_hist.png
81+
|-- 06_class_feature_profile.png
82+
|-- gcn_training_curves.png
83+
|-- gcn_confusion_matrix.png
84+
|-- gcn_confusion_matrix_norm.png
85+
|-- gcn_per_class_accuracy.png
86+
|-- gcn_tsne_embeddings.png
87+
+-- crop_classification_map.png
4288
```
4389

44-
## Requirements
90+
---
91+
92+
## Results
93+
94+
### 1. Exploratory Data Analysis
95+
96+
#### 1.1 Class Distribution
97+
98+
The training dataset contains ~24,000 labeled pixels across 5 classes. The distribution is imbalanced -- Fallow dominates at 45%, while Cotton (1.4%) and Water (0.6%) are minority classes. This imbalance is addressed during training using inverse-frequency class weighting.
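
Inverse-frequency weighting can be sketched as below. This assumes the common definition `N / (K * n_c)` (total samples divided by number of classes times class count); the exact formula used by the training script is not stated here, and the helper name and toy labels are hypothetical.

```python
import numpy as np


def inverse_frequency_weights(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Weight each class by N / (K * n_c), so rare classes get larger weights.

    Assumes every class appears at least once (otherwise n_c would be zero).
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return labels.size / (num_classes * counts)


# Hypothetical toy labels, not the project's data: class 0 is common, class 2 is rare
labels = np.array([0] * 90 + [1] * 9 + [2] * 1)
weights = inverse_frequency_weights(labels, num_classes=3)
print(weights.round(3))
```

A vector like this is typically passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, so that errors on minority classes contribute more to the loss.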
99+
100+
<p align="center">
101+
<img src="figures/01_class_distribution.png" alt="Class Distribution" width="700">
102+
</p>
103+
104+
---
105+
106+
#### 1.2 Feature Correlation Matrix
107+
108+
The correlation heatmap shows the relationships between all 23 spectral and index features. Strong positive correlations exist among vegetation indices (NDVI, EVI, SAVI, GNDVI) and among red-edge bands (B5-B7). Negative correlations appear between vegetation indices and the bare-soil and water indices (BSI, MNDWI), confirming their complementary roles for discrimination.
109+
110+
<p align="center">
111+
<img src="figures/02_correlation_heatmap.png" alt="Feature Correlation Matrix" width="700">
112+
</p>
113+
114+
---
115+
116+
#### 1.3 Spectral Band Distributions per Class
117+
118+
Box plots of the 10 Sentinel-2 spectral bands (B2-B12) and BSI broken down by crop class. Each class exhibits a distinct spectral signature -- Wheat shows consistently high reflectance in NIR bands (B7, B8), Water has low reflectance across all bands, and Fallow is characterized by high short-wave infrared (B11, B12) values relative to NIR.
119+
120+
<p align="center">
121+
<img src="figures/03_bands_boxplot.png" alt="Spectral Bands Distribution per Class" width="800">
122+
</p>
123+
124+
---
125+
126+
#### 1.4 Vegetation Index Distributions per Class
127+
128+
Box plots of the 13 derived vegetation indices per class. Wheat stands out with high NDVI, EVI, and SAVI values (active vegetation), while Fallow and Water cluster near zero or negative ranges. CIgreen and CIrededge provide strong separability between vegetated crops (Wheat, Grass) and non-vegetated surfaces (Fallow, Water).
129+
130+
<p align="center">
131+
<img src="figures/04_indices_boxplot.png" alt="Vegetation Index Distributions per Class" width="800">
132+
</p>
133+
134+
---
135+
136+
#### 1.5 Key Index Histograms by Class
137+
138+
Density histograms of 6 key indices (NDVI, EVI, NDWI, SAVI, MNDWI, BSI) overlaid by class. These reveal the degree of separability each index provides. NDVI and SAVI show clear bimodal patterns separating vegetated from non-vegetated classes. Water is distinctly separated by NDWI and MNDWI with values far from other classes.
139+
140+
<p align="center">
141+
<img src="figures/05_key_indices_hist.png" alt="Key Index Distributions by Class" width="800">
142+
</p>
143+
144+
---
145+
146+
#### 1.6 Normalized Per-Class Feature Profiles
147+
148+
A grouped bar chart showing the normalized mean value of every feature for each class. This "spectral fingerprint" view highlights how each class has a unique profile across the 23 features. Wheat dominates in vegetation-sensitive features (NDVI, EVI, SAVI), Fallow peaks in bare-soil indicators (BSI, B11), and Water shows near-zero values across most features except MNDWI.
149+
150+
<p align="center">
151+
<img src="figures/06_class_feature_profile.png" alt="Normalized Per-Class Feature Profiles" width="900">
152+
</p>
153+
154+
---
155+
156+
### 2. Model Training
157+
158+
#### 2.1 Training Loss and Validation Accuracy Curves
159+
160+
The training loss decreases rapidly in the first 10 epochs and converges near zero. Validation accuracy climbs from ~79% to ~99.9%, with early stopping triggered after epoch 55. Validation accuracy plateaus rather than dropping as the training loss approaches zero, which suggests the model generalizes well without overfitting.
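
The early-stopping rule mentioned above can be sketched as a small helper. This is an illustration only: the class name `EarlyStopping`, the patience value, and the accuracy trace are assumptions, not values taken from the training script.

```python
class EarlyStopping:
    """Stop when validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_acc: float) -> bool:
        """Record one epoch; return True if training should stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.bad_epochs = 0  # this is where a checkpoint would be saved
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical accuracy trace: improves for three epochs, then plateaus
stopper = EarlyStopping(patience=3)
trace = [0.79, 0.95, 0.99, 0.99, 0.98, 0.99, 0.99]
stopped_at = None
for epoch, acc in enumerate(trace, start=1):
    if stopper.step(acc):
        stopped_at = epoch
        break
print(stopped_at)  # 6
```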
45161

46-
- Python 3.9+
47-
- PyTorch + PyTorch Geometric
48-
- rasterio, scikit-learn, pandas, numpy, matplotlib, seaborn
162+
<p align="center">
163+
<img src="figures/gcn_training_curves.png" alt="Training Loss and Validation Accuracy" width="800">
164+
</p>
49165

50-
Install with conda (recommended):
166+
---
167+
168+
### 3. Model Evaluation
169+
170+
#### 3.1 Confusion Matrix (Test Set)
171+
172+
The confusion matrix on the held-out test set (15% of data) shows near-perfect classification. All 5 classes achieve close to 100% recall, with only 3 misclassified samples total (all from the Grass class: 1 predicted as Cotton, 2 as Wheat). Cotton (48 samples), Wheat (1149), Fallow (1619), and Water (21) are classified with zero errors.
173+
174+
<p align="center">
175+
<img src="figures/gcn_confusion_matrix.png" alt="Confusion Matrix" width="550">
176+
</p>
177+
178+
---
179+
180+
#### 3.2 Normalized Confusion Matrix (Test Set)
181+
182+
The row-normalized confusion matrix confirms that all classes achieve at least 99.6% recall. Cotton, Wheat, Fallow, and Water reach a perfect 1.000, while Grass achieves 0.996. This indicates that the class-weighted loss lets the GCN handle the class imbalance effectively.
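
Row normalization divides each row of the count matrix by its total, so the diagonal reads directly as per-class recall. A minimal sketch with a hypothetical 3-class matrix (the counts below are made up for illustration and are not the project's results):

```python
import numpy as np


def row_normalize(cm: np.ndarray) -> np.ndarray:
    """Divide each row by its total so the diagonal holds per-class recall."""
    row_totals = cm.sum(axis=1, keepdims=True)
    return cm / row_totals


# Hypothetical counts: rows are true classes, columns are predicted classes
cm = np.array([
    [48, 0, 0],
    [0, 97, 3],
    [1, 1, 198],
])
recall = np.diag(row_normalize(cm))
print(recall)
```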
183+
184+
<p align="center">
185+
<img src="figures/gcn_confusion_matrix_norm.png" alt="Normalized Confusion Matrix" width="550">
186+
</p>
187+
188+
---
189+
190+
#### 3.3 Per-Class Accuracy
191+
192+
A bar chart summarizing per-class accuracy on the test set. All 5 classes reach at least 99.6%, confirming consistently strong performance across both majority classes (Fallow, Wheat) and minority classes (Cotton, Water).
193+
194+
<p align="center">
195+
<img src="figures/gcn_per_class_accuracy.png" alt="Per-Class Accuracy" width="600">
196+
</p>
197+
198+
---
199+
200+
#### 3.4 t-SNE Visualization of GCN Node Embeddings
201+
202+
A 2D t-SNE projection of the learned 128-dimensional node embeddings from the GCN's second-to-last layer. The 5 classes form well-separated clusters, confirming that the GCN learns discriminative feature representations. Cotton (blue) and Water (purple) form tight, isolated clusters, while the larger classes (Fallow, Wheat, Grass) occupy distinct regions with clear boundaries.
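
For readers unfamiliar with how such embeddings arise, one graph-convolution step can be sketched in NumPy. This assumes the standard symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W); the function name, the toy graph, and the random weights are hypothetical, and the dense adjacency matrix is for illustration only.

```python
import numpy as np


def gcn_layer(x: np.ndarray, edge_index: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = x.shape[0]
    adj = np.zeros((n, n))
    adj[edge_index[0], edge_index[1]] = 1.0
    adj = np.maximum(adj, adj.T)  # treat edges as undirected
    adj += np.eye(n)              # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    adj_norm = adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(adj_norm @ x @ weight, 0.0)


# Hypothetical toy graph: 4 nodes in a chain, 23 input features, 128 hidden units
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 23))
edge_index = np.array([[0, 1, 2], [1, 2, 3]])
w = rng.normal(size=(23, 128)) * 0.1
h = gcn_layer(x, edge_index, w)
print(h.shape)  # (4, 128)
```

Each output row mixes a node's own features with those of its graph neighbors, which is why spectrally similar pixels end up close together in the embedding space.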
203+
204+
<p align="center">
205+
<img src="figures/gcn_tsne_embeddings.png" alt="t-SNE of GCN Node Embeddings" width="600">
206+
</p>
207+
208+
---
209+
210+
### 4. Spatial Classification Map
211+
212+
The final classified crop map produced by applying the trained GCN to the full Sentinel-2 raster (2262 x 1424 pixels, 10 m resolution). Over 1 million valid pixels were classified using tiled KNN-graph inference. Fallow (tan) dominates bare agricultural areas, Wheat (yellow) and Grass (green) cover vegetated parcels, Water (blue) aligns with river and canal features, and Cotton (red) appears in scattered agricultural plots.
213+
214+
| Class | Classified Pixels | Percentage |
215+
|:------|------------------:|-----------:|
216+
| **Fallow** | 697,687 | 66.8% |
217+
| **Grass** | 163,758 | 15.7% |
218+
| **Wheat** | 141,385 | 13.5% |
219+
| **Cotton** | 29,329 | 2.8% |
220+
| **Water** | 11,696 | 1.1% |
221+
222+
<p align="center">
223+
<img src="figures/crop_classification_map.png" alt="GCN Crop Classification Map" width="900">
224+
</p>
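
The tiling scheme behind this map can be sketched as a generator of 512x512 windows with clipped edge tiles. This is a minimal illustration: the helper name `tile_windows` is hypothetical, and which of 2262 and 1424 is the raster width is an assumption.

```python
from typing import Iterator, Tuple


def tile_windows(height: int, width: int, tile: int = 512) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row_off, col_off, n_rows, n_cols) windows covering the raster.

    Tiles on the bottom and right edges are clipped to the raster bounds.
    """
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            yield row, col, min(tile, height - row), min(tile, width - col)


# Assumed orientation: 2262 columns by 1424 rows
windows = list(tile_windows(height=1424, width=2262))
print(len(windows))  # 15
```

Each window would then be read, turned into its own KNN graph, and passed through the trained model, keeping memory use bounded regardless of raster size.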
225+
226+
---
227+
228+
## Installation
229+
230+
**Prerequisites:** Python 3.9+, CUDA-capable GPU (recommended)
51231

52232
```bash
233+
# Create conda environment (recommended)
53234
conda create -n geodl python=3.9
54235
conda activate geodl
55-
conda install pytorch torchvision -c pytorch
236+
237+
# Install PyTorch (adjust CUDA version as needed)
238+
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
239+
240+
# Install PyTorch Geometric
56241
conda install pyg -c pyg
242+
243+
# Install remaining dependencies
57244
conda install rasterio scikit-learn pandas matplotlib seaborn -c conda-forge
58245
```
59246

247+
Or install from `requirements.txt` (PyTorch and PyG must be installed separately):
248+
249+
```bash
250+
pip install -r requirements.txt
251+
```
252+
60253
## Usage
61254

62255
### 1. Explore data
@@ -65,37 +258,37 @@ conda install rasterio scikit-learn pandas matplotlib seaborn -c conda-forge
65258
python explore_data.py
66259
```
67260

261+
Generates EDA visualizations in `figures/`.
262+
68263
### 2. Train the GCN
69264

70265
```bash
71266
python gcn_crop_classification.py
72267
```
73268

74-
Saves `best_gcn_model.pth` and evaluation plots to `figures/`.
269+
Trains the model with early stopping and saves `best_gcn_model.pth` along with evaluation plots.
75270

76271
### 3. Classify full raster
77272

78273
```bash
79274
python apply_gcn_to_raster.py
80275
```
81276

82-
Produces `data/crop_classification_map.tif` (GeoTIFF) and `figures/crop_classification_map.png`.
277+
Applies the trained GCN to the Sentinel-2 composite and produces:
278+
- `data/crop_classification_map.tif` -- Classified GeoTIFF (same CRS/transform as input)
279+
- `figures/crop_classification_map.png` -- Color-coded visualization
83280

84281
## Data
85282

86-
Training data is derived from Sentinel-2 imagery (2020 Q1) over an agricultural region (EPSG:32636, 10m resolution). The 24-band composite includes:
87-
88-
- **Spectral bands**: B2, B3, B4, B5, B6, B7, B8, B8A, B11, B12
89-
- **Vegetation indices**: NDVI, EVI, SAVI, GNDVI, NDRE, NDRE2, NDWI, MNDWI, BSI, NDTI, CIgreen, CIrededge, MSAVI, GCVI
90-
91-
GCVI is dropped during training (duplicate of CIgreen), leaving 23 features.
92-
93-
## Results
283+
Training data is derived from Sentinel-2 imagery (2020 Q1) over an agricultural region (EPSG:32636, 10 m resolution). The 24-band composite includes:
94284

95-
The classification map produced by the GCN:
285+
| Category | Features |
286+
|:---------|:---------|
287+
| **Spectral bands** | B2, B3, B4, B5, B6, B7, B8, B8A, B11, B12 |
288+
| **Vegetation indices** | NDVI, EVI, SAVI, GNDVI, NDRE, NDRE2, NDWI, MNDWI, BSI, NDTI, CIgreen, CIrededge, MSAVI, GCVI |
96289

97-
![Crop Classification Map](figures/crop_classification_map.png)
290+
> GCVI is dropped during training (duplicate of CIgreen), leaving **23 features**.
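
The feature count can be checked with a short sketch. GCVI and CIgreen are both defined as NIR / Green - 1, which is why one of them is redundant. The list names below are hypothetical, and the assumption that the data columns carry exactly these labels is not confirmed by the source.

```python
# Band and index labels as listed in the table above
BANDS = ["B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A", "B11", "B12"]
INDICES = ["NDVI", "EVI", "SAVI", "GNDVI", "NDRE", "NDRE2", "NDWI", "MNDWI",
           "BSI", "NDTI", "CIgreen", "CIrededge", "MSAVI", "GCVI"]

ALL_COLUMNS = BANDS + INDICES                       # 24 composite bands
FEATURES = [c for c in ALL_COLUMNS if c != "GCVI"]  # 23 model inputs

print(len(ALL_COLUMNS), len(FEATURES))  # 24 23
```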
98291
99292
## License
100293

101-
MIT
294+
This project is licensed under the [MIT License](LICENSE).
