This repository provides a complete pipeline for training semantic segmentation models, applying standardized post-processing, and combining predictions through model ensembles.
All components are written in Python using TensorFlow/Keras and NumPy, and are organized into three independent modules:
- Semantic Models – UNet, FPN, and PSPNet with ResNet50 and MobileNetV3 backbones.
- Post-Processing Module – For cleaning binary segmentation masks.
- Ensemble Module – For combining multiple segmentation outputs into a unified final mask.
The repository assumes that images and masks are already preprocessed.
No data augmentation is performed internally, so augmentation must be handled externally before training.
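Because augmentation happens outside the repository, any geometric transform must be applied identically to an image and its mask so that the labels stay aligned with the pixels. The sketch below is not part of this repository. The function name augment_pair and the choice of flips and 90° rotations are illustrative assumptions, and it uses only NumPy.

```python
import numpy as np


def augment_pair(image: np.ndarray, mask: np.ndarray) -> list[tuple[np.ndarray, np.ndarray]]:
    """Return the original (image, mask) pair plus flipped and rotated copies.

    Hypothetical helper: every transform is applied to both arrays in the
    same way, so each mask pixel still labels the same image pixel.
    """
    pairs = [(image, mask)]
    # Horizontal flip (axis 1 = width) and vertical flip (axis 0 = height).
    pairs.append((np.flip(image, axis=1), np.flip(mask, axis=1)))
    pairs.append((np.flip(image, axis=0), np.flip(mask, axis=0)))
    # Rotations by 90, 180 and 270 degrees in the height/width plane;
    # the channel axis (axis 2) is left untouched.
    for k in (1, 2, 3):
        pairs.append((np.rot90(image, k, axes=(0, 1)), np.rot90(mask, k, axes=(0, 1))))
    return pairs
```

Photometric changes (brightness, contrast, color) should, by contrast, be applied to the image only, since they do not move any pixels.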
Requirements:
- Python 3.9–3.11
- TensorFlow 2.10+ (CPU or GPU)
- NumPy
- OpenCV (cv2)
- Matplotlib
- scikit-image
You'll need to install the required dependencies:
pip install tensorflow numpy opencv-python matplotlib scikit-image
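After installing, you can confirm that every dependency is importable. This is a minimal standard-library sketch, not a script shipped with the repository; the helper name missing_modules is illustrative. Note that the import names differ from the pip package names for opencv-python (cv2) and scikit-image (skimage).

```python
import importlib.util

# Import names of the packages installed with pip above.
REQUIRED_MODULES = ("tensorflow", "numpy", "cv2", "matplotlib", "skimage")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```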
Download the OralEpitheliumDB dataset [2], used in the original paper [1].
To ensure compatibility across all models and modules, images and masks must follow these specifications:
Images:
- Format: .png, .jpg, or .tif
- Channels: 3-channel RGB
- Shape expected by the models: 256 × 256 × 3

Masks:
- Format: .png or .tif
- Shape: 256 × 256 × 1 (grayscale)
- Pixel values:
  - 0 → background
  - 1 → positive class

The ensemble module also expects masks in this exact format.
Masks with grayscale gradients will be binarized before training.
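The exact binarization rule the repository applies is not documented here. The following NumPy sketch shows one way to bring a mask into the 256 × 256 × 1, 0/1 format before training; the function name binarize_mask and the cut-off of 127 for masks stored in the 0 to 255 range are assumptions.

```python
import numpy as np


def binarize_mask(mask: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Return a 256 x 256 x 1 uint8 mask with 0 = background and 1 = positive class.

    Hypothetical helper; `threshold` is an assumed cut-off for 0-255 masks.
    """
    mask = np.asarray(mask)
    if mask.ndim == 3:
        # Keep one channel: handles H x W x 1 masks and masks saved as RGB.
        mask = mask[..., 0]
    if mask.shape != (256, 256):
        raise ValueError(f"expected a 256 x 256 mask, got shape {mask.shape}")
    if mask.max() <= 1:
        # Already in the 0/1 (or 0.0 to 1.0) range: split at the midpoint.
        binary = mask >= 0.5
    else:
        # 0-255 grayscale with gradients: everything above the cut-off is positive.
        binary = mask > threshold
    return binary.astype(np.uint8)[..., np.newaxis]
```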
The main entry point for training models is semantic_segmentation_models.py. You can configure which models to train, dataset paths, and training hyperparameters directly in the main module of this script.
Change the following value to choose between training and inference modes. Accepted values are train and segment.
operation = "train"
To specify the training and validation sets, edit the following paths. All images and masks should follow the image and mask specifications described above.
train_imgs_path = "oed-aumentado/images"
train_masks_path = "oed-aumentado/masks"
val_imgs_path = "oed/images"
val_masks_path = "oed/masks"
Adjust these values based on your dataset size, available GPU memory, and desired training schedule.
batch_size = 4 # Number of samples per batch
train_steps = 10 # Number of steps per training epoch
val_steps = 5 # Number of validation steps
n_epochs = 10 # Number of training epochs
You can specify which models to train by listing them in models_to_train. Only the models included in this list will be trained; models not listed will be skipped.
models_to_train = [
"unet_resnet50",
"unet_mobilenetv3",
"fpn_resnet50",
"fpn_mobilenetv3",
"pspnet_resnet50",
"pspnet_mobilenetv3"
]
Once all paths, hyperparameters, and model selections are set, train the models on your data by running:
python semantic_segmentation_models.py
The script will automatically:
- Load the selected datasets.
- Build and compile the selected models.
- Train each model for the specified number of epochs.
- Save training metrics and plots for each model.
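As a quick check of the training schedule, assuming train_steps is used as the number of batches per epoch (as with the Keras steps_per_epoch argument), each epoch sees batch_size × train_steps samples. With the example values that is 4 × 10 = 40 images per epoch and 400 over 10 epochs, which may be far fewer than your training set holds. The helper names below are illustrative, not part of the repository.

```python
import math


def samples_per_epoch(batch_size: int, train_steps: int) -> int:
    """Samples drawn in one epoch when each step consumes one batch."""
    return batch_size * train_steps


def steps_to_cover(n_samples: int, batch_size: int) -> int:
    """Fewest steps whose batches together hold at least n_samples samples."""
    return math.ceil(n_samples / batch_size)


if __name__ == "__main__":
    batch_size, train_steps, n_epochs = 4, 10, 10  # example values from the configuration
    print(samples_per_epoch(batch_size, train_steps))             # 40
    print(samples_per_epoch(batch_size, train_steps) * n_epochs)  # 400
    print(steps_to_cover(1000, batch_size))  # hypothetical 1000-image training set: 250
```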
To run inference, change the operation value in the main module to segment.
operation = "segment"
The script will automatically:
- Load all trained model files (.h5) from the training directory.
- Segment the test set (defined by test_imgs_path).
- Save the segmentation results in the directory defined by seg_output_dir.
This operation segments every image in the test set with every .h5 file in the training directory. Keep only the models you want to evaluate in that directory to avoid unnecessary inference runs.
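To estimate how much work a segment run will do before launching it, you can list the model files and test images it would pair up. This is a standard-library sketch; the function name plan_inference is hypothetical, and the accepted image extensions follow the image specifications above.

```python
from pathlib import Path

# Image formats listed in the image specifications.
IMAGE_SUFFIXES = {".png", ".jpg", ".tif"}


def plan_inference(models_dir: str, test_imgs_path: str) -> list[tuple[Path, Path]]:
    """List every (model file, test image) pair a segment run would process.

    Hypothetical helper: it only enumerates files and does not load any model.
    """
    models = sorted(Path(models_dir).glob("*.h5"))
    images = sorted(
        p for p in Path(test_imgs_path).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [(model, image) for model in models for image in images]
```

For example, plan_inference("trained_models", "oed/images") (with a hypothetical trained_models directory) returns one pair per model and test image, so six models and 100 test images mean 600 inference runs.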
You can configure the ensemble through the main module in ensemble_voting_rule_semantic.py.
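The ensemble combines the binary masks produced by the individual models through a weighted vote controlled by weight_vector and threshold_percent. The exact rule in ensemble_voting_rule_semantic.py is not spelled out here, so the following NumPy sketch assumes that threshold_percent is a fraction of the total weight and that a pixel is marked positive when its weighted vote share reaches that fraction. The function name weighted_vote is illustrative.

```python
import numpy as np


def weighted_vote(masks, weight_vector, threshold_percent):
    """Fuse per-model 0/1 masks into one 0/1 mask by weighted voting.

    Assumed rule: a pixel is 1 when the weighted share of models voting 1
    is at least threshold_percent (a fraction between 0 and 1).
    """
    if len(masks) != len(weight_vector):
        raise ValueError("weight_vector must have exactly one weight per model")
    weights = np.asarray(weight_vector, dtype=np.float64)
    if weights.sum() <= 0:
        raise ValueError("weights must sum to a positive value")
    # Shape (n_models, H, W) or (n_models, H, W, 1); all masks must match.
    stack = np.stack([np.asarray(m, dtype=np.float64) for m in masks])
    # Weighted sum of votes per pixel, normalised by the total weight.
    score = np.tensordot(weights, stack, axes=1) / weights.sum()
    return (score >= threshold_percent).astype(np.uint8)
```

Under this assumption, with six models, all weights equal to 1 and threshold_percent = 0.40, a pixel needs at least 3 of 6 votes (3/6 = 0.5 reaches 0.40, while 2/6 ≈ 0.33 does not).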
Specify the directories to load and save data. The final masks are saved in output_dir with their original filenames.
root_dir = "results_directory"
original_images_dir = "datasets/dysplasia"
output_dir = "ENSEMBLE_dynamic_output"
To process every class, list all of them in class_list; to process a single class, list only that one.
class_list = ["healthy", "mild", "moderate", "severe"]
Set each model's weight and the threshold value. For a simple (unweighted) sum, set all weights to 1. The number of weights must match the number of models included in root_dir.
weight_vector = [1, 1, 1, 1, 1, 1]
threshold_percent = 0.40

References:
[1] Silva, A. B., Tosta, T. A., Neves, L. A., Martins, A. S., De Faria, P. R., Do Nascimento, M. Z. (2024, September). Ensemble of Semantic Segmentation Models for Oral Epithelial Dysplasia Images. In 2024 37th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI) (pp. 1-6). IEEE. https://doi.org/10.1109/SIBGRAPI62404.2024.10716304
[2] Silva, A. B., Martins, A. S., Tosta, T. A. A., Loyola, A. M., Cardoso, S. V., Neves, L. A., De Faria, P. R., Do Nascimento, M. Z. (2024). OralEpitheliumDB: A Dataset for Oral Epithelial Dysplasia Image Segmentation and Classification. Journal of Imaging Informatics in Medicine, 37(4), 1691-1710. https://doi.org/10.1007/s10278-024-01041-w