This repository presents what is, to our knowledge, the first cross-sensor adaptation of a conditional diffusion framework for cloud removal on ISRO's LISS-IV high-resolution (5.8 m) satellite imagery.
Because no large-scale paired dataset exists natively for LISS-IV, we used what we call "zero-shot" transfer learning from the pretrained Cornell AllClear model. Using Sentinel-2 imagery as a spectral proxy (matching the Green, Red, and Near-Infrared bands), we adapted its diffusion prior to resolve cloud occlusion in LISS-IV data without retraining on native LISS-IV pairs.
For a deep dive into the engineering, validation, and limitations of this system, explore the detailed documentation:
- Model Architecture & Grafting Process: Details on NAFNet-UNet, 3-channel input layer surgery, and the timestep injection framework.
- Model Validation & Quantitative Results: Detailed metric reports (PSNR/SSIM) and 3x3 visual comparison plots.
- Limitations & Future Directions: Analysis of the "Regression to the Mean" trap under heavy clouds and the roadmap for multi-modal Sentinel-1 SAR fusion.
This project builds on the AllClear benchmark developed at Cornell University and on the DiffCR framework.
- Research Paper: AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery
- Original Codebase: GitHub - Zhou-Hangyu/allclear
- Cornell Project Page: AllClear Website
Because LISS-IV lacks the extensive multi-temporal datasets available for Sentinel-2, training a state-of-the-art diffusion model from scratch was not feasible.
Our core contribution is adapting a pretrained Sentinel-2 diffusion model to LISS-IV: we graft a new input layer for the LISS-IV spectral bands and fine-tune on matched spectral signatures.
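The input grafting step can be sketched as follows. This is a minimal illustration, not the repository's actual code: a NumPy array stands in for the pretrained first-layer convolution kernel, the helper name `graft_input_weights` is hypothetical, and the 13-band order is assumed to follow the standard Sentinel-2 L1C listing. Slicing the pretrained weights (rather than re-initializing the layer) is also an assumption, and the real input layer may carry extra channels for the noisy image or additional timesteps.

```python
import numpy as np

# Assumed band order of a 13-band Sentinel-2 L1C stack (not confirmed by this repository).
S2_BANDS = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
            "B08", "B8A", "B09", "B10", "B11", "B12"]

# Sentinel-2 bands used as proxies for LISS-IV Green, Red, and NIR.
PROXY_BANDS = ["B03", "B04", "B08"]


def graft_input_weights(weight, source_bands=S2_BANDS, keep_bands=PROXY_BANDS):
    """Slice a conv kernel of shape (out, in, kH, kW) down to the kept input channels."""
    if weight.shape[1] != len(source_bands):
        raise ValueError("input-channel axis does not match the source band list")
    indices = [source_bands.index(name) for name in keep_bands]
    return weight[:, indices, :, :].copy()


# Stand-in for a pretrained first-layer kernel: 32 filters, 13 channels, 3x3 (hypothetical sizes).
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(32, 13, 3, 3)).astype(np.float32)
grafted = graft_input_weights(pretrained)
print(grafted.shape)  # (32, 3, 3, 3)
```

Keeping the filters that were already trained on the Green, Red, and NIR proxies means fine-tuning starts from spectrally meaningful weights instead of random ones.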
graph TD
A["Pretrained<br>AllClear Model"] -->|1. Extract Weights| B("Weight Surgery")
C["LISS-IV Bands:<br>Green, Red, NIR"] -->|2. Align Inputs| B
B -->|"3. Graft Input<br>Conv Layer"| D["Fine-Tuned<br>LISS-IV Model"]
D -->|"4. Dynamic<br>Percentile Stretch"| E["Inference on<br>Raw GeoTIFFs"]
E -->|"5. Multi-Step<br>Diffusion Denoise"| F["Cloud-Free Land<br>Cover Output"]
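Step 4 of the pipeline, the dynamic percentile stretch, might look like the sketch below. It assumes the GeoTIFF has already been read into a `(bands, height, width)` array. The 2nd and 98th percentile cut-offs and the nodata value of 0 are illustrative assumptions, not values taken from this repository, and `percentile_stretch` is a hypothetical helper name.

```python
import numpy as np


def percentile_stretch(image, low=2.0, high=98.0, nodata=0):
    """Rescale each band of a (bands, H, W) array to [0, 1] between its own percentiles."""
    image = image.astype(np.float32)
    out = np.zeros_like(image)
    for b in range(image.shape[0]):
        band = image[b]
        valid = band[band != nodata]
        if valid.size == 0:
            continue  # band is entirely nodata; leave it at zero
        lo, hi = np.percentile(valid, [low, high])
        if hi <= lo:
            continue  # constant band; nothing to stretch
        out[b] = np.clip((band - lo) / (hi - lo), 0.0, 1.0)
    return out


# Synthetic digital numbers standing in for a raw 3-band scene.
rng = np.random.default_rng(0)
raw = rng.integers(1, 1024, size=(3, 64, 64)).astype(np.uint16)
stretched = percentile_stretch(raw)
print(stretched.shape, stretched.dtype)
```

Computing the percentiles per scene and per band, rather than using fixed constants, keeps the model input in a consistent range even when scenes differ in illumination or sensor gain.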
During fine-tuning on the LISS-IV adapted dataset, the model converged rapidly. After we replaced the 13-channel input layer with the grafted 3-channel LISS-IV layer, the L1 loss dropped sharply within the first few epochs, which supports our "zero-shot" transfer learning strategy.
Our model was validated on real temporal pairs. Below is a summary of the metrics achieved:
| Test Set | PSNR (dB) | SSIM | Restoration Quality |
|---|---|---|---|
| AllClear Proxy (Sentinel-2) | 32.35 | 0.9008 | Excellent (High Structural Fidelity) |
| Local Validation ROI | 30.82 | 0.8415 | Strong (Clear Field Boundaries) |
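For reference, PSNR is defined as 10 · log10(MAX² / MSE), where MAX is the data range and MSE is the mean squared error between the cloud-free reference and the prediction. The sketch below shows that definition in NumPy. It assumes images scaled to [0, 1] and is not the repository's evaluation script. SSIM is more involved and is commonly computed with a library routine such as `skimage.metrics.structural_similarity`.

```python
import numpy as np


def psnr(reference, prediction, data_range=1.0):
    """Peak signal-to-noise ratio in dB for two arrays of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    prediction = np.asarray(prediction, dtype=np.float64)
    if reference.shape != prediction.shape:
        raise ValueError("inputs must have the same shape")
    mse = np.mean((reference - prediction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, so PSNR = 20 dB.
reference = np.full((3, 8, 8), 0.5)
prediction = reference + 0.1
print(round(psnr(reference, prediction), 2))  # 20.0
```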
The image below shows the model's cloud-free prediction for Sample 500, which reaches high structural similarity (SSIM > 0.90) and PSNR > 32 dB:
To run inference on your local validation dataset or raw Bhoonidhi GeoTIFFs:

    conda env create -f environment.yml
    conda activate allclear

Generate the 3x3 side-by-side comparison on the validation dataset:

    python evaluation/eval_allclear_comparison.py

This project is licensed under the MIT License. See the LICENSE file for details.

