@@ -17,6 +17,10 @@ Pixel-level crop classification from **Sentinel-2** satellite imagery using a **
1717- [Installation](#installation)
1818- [Usage](#usage)
1919- [Data](#data)
20+ - [Study Area](#study-area)
21+ - [Sentinel-2 Raster Composite](#sentinel-2-raster-composite)
22+ - [Training Dataset](#training-dataset)
23+ - [Preprocessing Pipeline](#preprocessing-pipeline)
2024- [License](#license)
2125
2226---
@@ -280,14 +284,169 @@ Applies the trained GCN to the Sentinel-2 composite and produces:
280284
281285## Data
282286
283- Training data is derived from Sentinel-2 imagery (2020 Q1) over an agricultural region (EPSG:32636, 10 m resolution). The 24-band composite includes:
287+ ### Study Area
284288
285- | Category | Features |
286- | :---------| :---------|
287- | **Spectral bands** | B2, B3, B4, B5, B6, B7, B8, B8A, B11, B12 |
288- | **Vegetation indices** | NDVI, EVI, SAVI, GNDVI, NDRE, NDRE2, NDWI, MNDWI, BSI, NDTI, CIgreen, CIrededge, MSAVI, GCVI |
289+ The study area lies in the **Gezira agricultural region, Sudan**, one of the largest irrigation schemes in Africa, situated between the Blue Nile and White Nile rivers.
289290
290- > GCVI is dropped during training (duplicate of CIgreen), leaving **23 features**.
291+ | Property | Value |
292+ | :---------| :------|
293+ | **Location** | Gezira State, Sudan |
294+ | **Center coordinates** | 14.053° N, 32.623° E |
295+ | **Coordinate system** | WGS 84 / UTM Zone 36N (EPSG:32636) |
296+ | **Spatial extent** | 22.62 km x 14.24 km (322 km²) |
297+ | **Temporal period** | Q1 2020 (January -- March) |
298+ | **Satellite** | Sentinel-2 (ESA Copernicus) |
299+
300+ ---
301+
302+ ### Sentinel-2 Raster Composite
303+
304+ The input raster is a multi-temporal composite derived from Sentinel-2 Level-2A (surface reflectance) imagery.
305+
306+ `S2_composite_24bands_2020_Q1.tif`
307+
308+ | Property | Value |
309+ | :---------| :------|
310+ | **Dimensions** | 2,262 x 1,424 pixels |
311+ | **Total pixels** | 3,221,088 |
312+ | **Valid pixels** | 1,043,855 (32.4%) |
313+ | **Spatial resolution** | 10 m |
314+ | **Bands** | 24 (float32) |
315+ | **Compression** | LZW |
316+ | **File size** | ~107 MB |
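The figures in this table and the Study Area table can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers listed above; the variable names are illustrative and do not come from the repository.

```python
# Values copied from the property table; names are illustrative.
width_px, height_px = 2262, 1424
resolution_m = 10
valid_px = 1_043_855

total_px = width_px * height_px
width_km = width_px * resolution_m / 1000
height_km = height_px * resolution_m / 1000
area_km2 = width_km * height_km
valid_pct = 100 * valid_px / total_px

print(f"{total_px:,} pixels, {width_km:.2f} km x {height_km:.2f} km, "
      f"{area_km2:.0f} km², {valid_pct:.1f}% valid")
```

At 100 m² per pixel, the valid pixels cover roughly 104 km² of the 322 km² extent.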
317+
318+ #### Spectral Bands (10)
319+
320+ Sentinel-2 surface reflectance bands covering visible, red-edge, near-infrared, and short-wave infrared wavelengths:
321+
322+ | Band | Name | Wavelength (nm) | Description |
323+ | :-----| :-----| :---------------:| :------------|
324+ | 1 | **B2** | 490 | Blue |
325+ | 2 | **B3** | 560 | Green |
326+ | 3 | **B4** | 665 | Red |
327+ | 4 | **B5** | 705 | Red Edge 1 |
328+ | 5 | **B6** | 740 | Red Edge 2 |
329+ | 6 | **B7** | 783 | Red Edge 3 |
330+ | 7 | **B8** | 842 | Near Infrared (NIR) |
331+ | 8 | **B8A** | 865 | Narrow NIR |
332+ | 9 | **B11** | 1610 | Short-Wave Infrared 1 (SWIR-1) |
333+ | 10 | **B12** | 2190 | Short-Wave Infrared 2 (SWIR-2) |
334+
335+ #### Spectral Indices (14)
336+
337+ Derived vegetation, water, and soil indices computed from the spectral bands:
338+
339+ | Index | Formula | Purpose |
340+ | :------| :--------| :--------|
341+ | **NDVI** | (NIR - Red) / (NIR + Red) | Vegetation greenness |
342+ | **EVI** | 2.5 * (NIR - Red) / (NIR + 6 * Red - 7.5 * Blue + 1) | Enhanced vegetation (corrects atmospheric effects) |
343+ | **SAVI** | 1.5 * (NIR - Red) / (NIR + Red + 0.5) | Soil-adjusted vegetation |
344+ | **GNDVI** | (NIR - Green) / (NIR + Green) | Green-band vegetation |
345+ | **NDRE** | (NIR - RedEdge1) / (NIR + RedEdge1) | Red-edge vegetation |
346+ | **NDRE2** | (RedEdge3 - RedEdge1) / (RedEdge3 + RedEdge1) | Narrow red-edge vegetation |
347+ | **NDWI** | (NIR - SWIR1) / (NIR + SWIR1) | Water content in vegetation |
348+ | **MNDWI** | (Green - SWIR1) / (Green + SWIR1) | Modified water index (surface water) |
349+ | **BSI** | ((SWIR1 + Red) - (NIR + Blue)) / ((SWIR1 + Red) + (NIR + Blue)) | Bare soil |
350+ | **NDTI** | (SWIR1 - SWIR2) / (SWIR1 + SWIR2) | Non-photosynthetic vegetation / tillage |
351+ | **CIgreen** | (NIR / Green) - 1 | Chlorophyll index (green) |
352+ | **CIrededge** | (NIR / RedEdge1) - 1 | Chlorophyll index (red edge) |
353+ | **MSAVI** | (2 * NIR + 1 - sqrt((2 * NIR + 1)² - 8 * (NIR - Red))) / 2 | Modified soil-adjusted vegetation |
354+ | **GCVI** | (NIR / Green) - 1 | Green chlorophyll vegetation index |
355+
356+ > **Note:** GCVI is identical to CIgreen and is dropped during training, leaving **23 features**.
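As an illustration of how a few of these formulas translate to array code, here is a minimal NumPy sketch. It is not the script that produced the composite: the function name `spectral_indices` and the sample reflectances are made up for the example, and only a subset of the indices is shown.

```python
import numpy as np

def spectral_indices(blue, green, red, nir):
    """Return a dict of indices computed from reflectance arrays of equal shape."""
    blue, green, red, nir = (
        np.asarray(band, dtype=np.float64) for band in (blue, green, red, nir)
    )
    return {
        "NDVI": (nir - red) / (nir + red),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),
        "MSAVI": (2 * nir + 1 - np.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
        "CIgreen": nir / green - 1,
        "GCVI": nir / green - 1,  # same expression as CIgreen, hence the dropped column
    }

# Made-up reflectances for two pixels (vegetated, bare soil); not taken from the dataset.
idx = spectral_indices(
    blue=[0.05, 0.10], green=[0.09, 0.14], red=[0.10, 0.20], nir=[0.30, 0.25]
)
print({name: np.round(values, 3) for name, values in idx.items()})
```

Pixels with zero denominators (for example nodata pixels stored as 0) would produce `inf` or `nan` here, so a real pipeline would mask invalid pixels before computing ratios.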
357+
358+ ---
359+
360+ ### Training Dataset
361+
362+ The training set consists of labeled ground-truth samples extracted from the raster at known crop field locations.
363+
364+ `crop_training_data_5classes_2020.csv`
365+
366+ | Property | Value |
367+ | :---------| :------|
368+ | **Total samples** | 24,556 |
369+ | **After deduplication** | 24,556 (no duplicates) |
370+ | **Features** | 23 (after dropping GCVI) |
371+ | **Missing values** | 0 |
372+ | **File size** | ~8.6 MB |
373+
374+ #### Class Distribution
375+
376+ | Class ID | Class Name | Samples | Percentage | Category |
377+ | :--------:| :-----------| --------:| :----------:| :---------|
378+ | 0 | **Cotton** | 337 | 1.4% | Minority |
379+ | 1 | **Wheat** | 7,901 | 32.2% | Majority |
380+ | 2 | **Fallow** | 11,150 | 45.4% | Majority |
381+ | 3 | **Grass** | 5,024 | 20.5% | Moderate |
382+ | 4 | **Water** | 144 | 0.6% | Minority |
383+
384+ #### Data Split
385+
386+ The dataset is split using stratified random sampling (seed=42) to preserve class proportions:
387+
388+ | Split | Percentage | Samples | Purpose |
389+ | :------| :----------:| --------:| :--------|
390+ | **Train** | 70% | 17,189 | Model training + scaler fitting |
391+ | **Validation** | 15% | 3,684 | Early stopping & hyperparameter selection |
392+ | **Test** | 15% | 3,683 | Final unbiased evaluation |
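A stratified 70/15/15 split can be sketched in plain NumPy as follows. This illustrates the idea only and is not the repository's splitting code: the helper name `stratified_split` and the per-class rounding are assumptions, so the resulting split sizes can differ from the table by a sample or two.

```python
import numpy as np

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return sorted train/val/test index arrays, splitting each class separately."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    parts = ([], [], [])
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut them by the requested fractions.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * idx.size))
        n_val = int(round(fractions[1] * idx.size))
        parts[0].append(idx[:n_train])
        parts[1].append(idx[n_train:n_train + n_val])
        parts[2].append(idx[n_train + n_val:])  # remainder goes to the test set
    return tuple(np.sort(np.concatenate(p)) for p in parts)

# Synthetic labels with the class counts from the Class Distribution table (IDs 0-4).
counts = [337, 7901, 11150, 5024, 144]
labels = np.repeat(np.arange(5), counts)

train_idx, val_idx, test_idx = stratified_split(labels)
for name, split in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    print(name, len(split), np.bincount(labels[split], minlength=5))
```

Because every class is split on its own, even the two minority classes (Cotton and Water) appear in all three subsets in roughly the same proportion as in the full dataset.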
393+
394+ #### Feature Value Ranges
395+
396+ | Feature | Min | Max | Mean | Std |
397+ | :--------| ----:| ----:| -----:| ----:|
398+ | B2 | 0.0124 | 0.1406 | 0.0567 | 0.0321 |
399+ | B3 | 0.0336 | 0.1806 | 0.0912 | 0.0378 |
400+ | B4 | 0.0179 | 0.2885 | 0.1053 | 0.0682 |
401+ | B5 | 0.0379 | 0.3036 | 0.1490 | 0.0649 |
402+ | B6 | 0.0152 | 0.3286 | 0.1973 | 0.0792 |
403+ | B7 | 0.0187 | 0.3856 | 0.2172 | 0.0906 |
404+ | B8 | 0.0211 | 0.4196 | 0.2320 | 0.0989 |
405+ | B8A | 0.0173 | 0.3943 | 0.2366 | 0.0945 |
406+ | B11 | 0.0276 | 0.3987 | 0.2073 | 0.0641 |
407+ | B12 | 0.0223 | 0.3001 | 0.1526 | 0.0660 |
408+ | NDVI | -0.3794 | 0.9099 | 0.4226 | 0.2892 |
409+ | EVI | -0.1346 | 0.7345 | 0.1951 | 0.1741 |
410+ | SAVI | -0.0748 | 0.6408 | 0.2633 | 0.1842 |
411+ | GNDVI | -0.1824 | 0.8117 | 0.4384 | 0.2215 |
412+ | NDRE | -0.1618 | 0.4439 | 0.1583 | 0.1397 |
413+ | NDRE2 | -0.1485 | 0.2174 | 0.0792 | 0.0733 |
414+ | NDWI | -0.4701 | 0.5739 | 0.1183 | 0.2075 |
415+ | MNDWI | -0.6554 | 0.3752 | -0.3611 | 0.1686 |
416+ | BSI | -0.4143 | 0.2164 | -0.0017 | 0.1064 |
417+ | NDTI | 0.0034 | 0.2614 | 0.1193 | 0.0414 |
418+ | CIgreen | -0.2674 | 11.6385 | 2.0076 | 2.0345 |
419+ | CIrededge | -0.2386 | 3.8363 | 0.5632 | 0.5991 |
420+ | MSAVI | -0.1033 | 0.5920 | 0.1676 | 0.1568 |
421+
422+ ---
423+
424+ ### Preprocessing Pipeline
425+
426+ ```
427+ Raw CSV (24,556 samples, 28 columns)
428+ |
429+ v
430+ Drop metadata columns (system:index, .geo)
431+ |
432+ v
433+ Drop GCVI (duplicate of CIgreen)
434+ |
435+ v
436+ Remove duplicates (0 found)
437+ |
438+ v
439+ Stratified train/val/test split (70/15/15)
440+ |
441+ v
442+ StandardScaler (fit on train set only)
443+ |
444+ v
445+ KNN Graph Construction (k=8 neighbors)
446+ |
447+ v
448+ PyTorch Geometric Data Object
449+ ```
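The last steps of the pipeline (scaling with training-set statistics, then building a k=8 nearest-neighbor graph) can be sketched in NumPy. How the repository actually constructs the graph is not shown in this section, so treat this as an assumption-laden illustration: `fit_standardizer` and `knn_edge_index` are hypothetical helpers, the neighbor search is brute force, and random numbers stand in for the 23 features. The `(2, E)` output shape follows the `edge_index` layout used by PyTorch Geometric.

```python
import numpy as np

def fit_standardizer(train_features):
    """Per-feature mean and population std, computed on the training set only."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std

def knn_edge_index(features, k=8):
    """Brute-force kNN graph; row 0 holds neighbor indices, row 1 the node they belong to."""
    sq_norms = (features ** 2).sum(axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2 * features @ features.T
    np.fill_diagonal(dist2, np.inf)  # a node is never its own neighbor
    neighbors = np.argpartition(dist2, k, axis=1)[:, :k]
    targets = np.repeat(np.arange(features.shape[0]), k)
    return np.stack([neighbors.ravel(), targets])

# Random stand-in data with 23 features; not the real training samples.
rng = np.random.default_rng(0)
train, val = rng.normal(size=(200, 23)), rng.normal(size=(50, 23))

mean, std = fit_standardizer(train)
train_scaled = (train - mean) / std
val_scaled = (val - mean) / std  # reuse training statistics to avoid leakage

edge_index = knn_edge_index(train_scaled, k=8)
print(edge_index.shape)
```

The brute-force version materializes an N x N distance matrix, which would take several gigabytes in float64 for all 24,556 samples, so a tree-based or batched neighbor search is the more practical choice at full scale.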
291450
292451## License
293452