This document describes the scientific and computational methodology used by the DermaScan prototype. It also identifies the assumptions the prototype makes and the metadata that is still missing for rigorous, publication-quality use.
DermaScan is framed as a binary image-classification prototype:
- input: RGB skin-lesion image
- output: model score and thresholded class label
- classes: `benign`, `malignant`
The current system should be interpreted as a technical demonstration, not a clinically validated diagnostic method.
The repository acknowledges the International Skin Imaging Collaboration (ISIC) as the dataset source in archived project materials, but the active repository does not include a complete dataset manifest.
Known assumptions:
- Images are organized into class-labeled folders.
- Folder names provide labels.
- The expected labels are `benign` and `malignant`.
- Dataset splits are expected to exist before training.
Missing information:
- TODO: Add exact dataset name, subset, and version.
- TODO: Add dataset license and permitted-use terms.
- TODO: Add download date.
- TODO: Add filtering and exclusion criteria.
- TODO: Add train/validation/test split manifest.
- TODO: Add class counts for each split.
- TODO: Add demographic and image-acquisition metadata if available.
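The class-count and split-manifest items above could be generated directly from the class-labeled folder layout. The following is a minimal sketch, assuming a hypothetical `<root>/<split>/<label>/` layout; the split folder names, the image extensions, and the function name `count_images` are illustrative assumptions and are not taken from the repository.

```python
from collections import Counter
from pathlib import Path
from typing import Dict

# Assumed image extensions; adjust to match the actual dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root: Path) -> Dict[str, Counter]:
    """Count images per class label for each split folder under root.

    Assumes a <root>/<split>/<label>/<image> layout (hypothetical).
    """
    counts: Dict[str, Counter] = {}
    for split_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        per_class: Counter = Counter()
        for label_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            # Only files with a recognised image extension are counted.
            per_class[label_dir.name] = sum(
                1
                for f in label_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
        counts[split_dir.name] = per_class
    return counts
```

Calling `count_images(Path("data/melanoma_cancer_dataset"))` would return one `Counter` per split, which could then be written out next to the dataset as part of a manifest.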
Training and inference apply the same core preprocessing steps:
- Load an image.
- Resize to `256x256`.
- Convert to an RGB-like tensor/array.
- Scale pixel values from `[0, 255]` to `[0, 1]`.
The current implementation does not document color-space conversion assumptions beyond TensorFlow image loading. Future work should ensure that training and inference use one shared preprocessing function.
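One way to keep training and inference aligned is a single function that both code paths import. The sketch below is library-free so that it stays self-contained. It uses nearest-neighbour sampling as a stand-in for whatever interpolation the TensorFlow loader applies, so its output would not match the real pipeline pixel for pixel. The names `resize_nearest` and `preprocess` are hypothetical.

```python
from typing import List, Sequence

TARGET_SIZE = 256  # side length stated in this document

Pixel = Sequence[int]              # (R, G, B), each channel in 0..255
Image = Sequence[Sequence[Pixel]]  # rows of pixels


def resize_nearest(image: Image, size: int = TARGET_SIZE) -> List[List[Pixel]]:
    """Resize to size x size using nearest-neighbour sampling (stand-in technique)."""
    height, width = len(image), len(image[0])
    return [
        [image[(r * height) // size][(c * width) // size] for c in range(size)]
        for r in range(size)
    ]


def preprocess(image: Image, size: int = TARGET_SIZE) -> List[List[List[float]]]:
    """Shared preprocessing: resize, then scale channels from [0, 255] to [0, 1]."""
    resized = resize_nearest(image, size)
    return [[[channel / 255.0 for channel in pixel] for pixel in row] for row in resized]
```

If the training script and the Flask app both imported one such function, any change to the resize or scaling step would reach both at the same time.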
The current model is a Keras Sequential convolutional neural network.
| Stage | Description |
|---|---|
| Input | 256x256x3 image tensor |
| Convolution block 1 | Conv2D(32, 3x3, relu) then max pooling |
| Convolution blocks 2-4 | Conv2D filters 64, 128, 256; batch normalization; ReLU activation; max pooling |
| Flatten | Converts feature maps to a dense vector |
| Dense | 128 ReLU units |
| Regularization | Dropout 0.35 |
| Output | Single sigmoid unit |
The architecture is suitable as an educational CNN baseline. It should not be assumed to be optimal for dermatology imaging.
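To make the table concrete, the sketch below traces the feature-map shape through the four convolution blocks. The table does not state padding, stride, or pooling size, so the code assumes the Keras defaults: `valid` padding, stride 1, and 2x2 max pooling. The helper names are hypothetical.

```python
def conv_out(size: int, kernel: int = 3) -> int:
    """Spatial size after a stride-1 convolution with 'valid' padding (assumed)."""
    return size - kernel + 1


def pool_out(size: int, pool: int = 2) -> int:
    """Spatial size after non-overlapping max pooling (assumed 2x2)."""
    return size // pool


def trace_shapes(input_size: int = 256, filters=(32, 64, 128, 256)):
    """Return (height, width, channels) after each conv + pool block."""
    shapes = []
    size = input_size
    for n_filters in filters:
        size = pool_out(conv_out(size))
        shapes.append((size, size, n_filters))
    return shapes


shapes = trace_shapes()
# Flatten multiplies height, width, and channels of the last block.
flat_units = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
# Weights plus biases of the 128-unit Dense layer.
dense_params = flat_units * 128 + 128
```

Under these assumptions the last block outputs 14x14x256, Flatten yields 50,176 values, and the 128-unit Dense layer alone holds 6,422,656 parameters. With `same` padding the numbers would be larger (16x16x256 before Flatten), so the actual values depend on settings this document does not record.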
Training is implemented in ml_pipeline/training/train_cnn.py.
The script:
- Reads configuration from `ml_pipeline/config/model_parameters.json`.
- Loads image splits from `data/melanoma_cancer_dataset/`.
- Normalizes pixel values.
- Builds the CNN architecture.
- Compiles the model with Adam and binary cross-entropy.
- Tracks accuracy, precision, and recall during training.
- Trains for the configured number of epochs.
- Saves the model to `artifacts/models/current/melanoma_detector.keras`.
- Evaluates on the test split.
- Appends summary metrics to `ml_pipeline/registry/progress_log.csv`.
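The final step, appending summary metrics to the progress log, might look like the sketch below. The column names are hypothetical because the schema of `progress_log.csv` is not documented here, and `append_metrics` is an illustrative name rather than a function from the training script.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical columns; the real progress_log.csv schema is not documented.
FIELDNAMES = ["timestamp", "epochs", "threshold", "accuracy", "precision", "recall"]


def append_metrics(log_path: Path, row: dict) -> None:
    """Append one summary row, writing the header only if the file is new or empty."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not log_path.exists() or log_path.stat().st_size == 0
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        # A UTC timestamp records when the run was logged.
        writer.writerow({"timestamp": datetime.now(timezone.utc).isoformat(), **row})
```

Opening the file in append mode keeps earlier runs intact, so the log can serve as a simple run history until a fuller registry exists.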
Evaluation is implemented in ml_pipeline/evaluation/evaluate_cnn.py.
The script:
- Loads model configuration.
- Loads the test split.
- Normalizes image tensors.
- Loads the packaged model artifact.
- Generates sigmoid-style prediction scores.
- Applies the configured threshold.
- Prints a classification report.
- Prints and displays a confusion matrix.
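The thresholding and confusion-matrix steps above can be sketched without any ML library. This is an illustration only, not the evaluation script's code. The function names, the `>=` comparison, and the encoding of malignant as `1` are assumptions.

```python
from typing import List, Sequence, Tuple


def apply_threshold(scores: Sequence[float], threshold: float) -> List[int]:
    """Label a score malignant (1) when score >= threshold, otherwise benign (0)."""
    return [1 if score >= threshold else 0 for score in scores]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp), treating malignant (1) as the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp
```

For example, `confusion_counts(labels, apply_threshold(scores, 0.35))` gives the four counts at the pipeline's configured threshold.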
The project tracks:
| Metric | Meaning |
|---|---|
| Accuracy | Overall fraction of correct predictions. |
| Precision | Fraction of predicted malignant cases that are actually malignant. |
| Recall | Fraction of actual malignant cases predicted as malignant. |
| F1 score | Harmonic mean of precision and recall. |
| Confusion matrix | Counts of true negatives, false positives, false negatives, and true positives. |
For this problem domain, recall and false-negative analysis are especially important. A false negative, meaning a malignant lesion labeled benign, could give unsafe reassurance.
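The metrics in the table follow directly from the four confusion-matrix counts. The sketch below treats malignant as the positive class and returns `0.0` when a denominator is zero. That convention and the name `classification_metrics` are choices made here for illustration.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero (chosen convention)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 with malignant as positive."""
    precision = _ratio(tp, tp + fp)  # predicted malignant that are malignant
    recall = _ratio(tp, tp + fn)     # actual malignant that were caught
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }
```

Because `fn` appears only in the recall denominator, recall is the metric that directly tracks missed malignant cases.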
The ML pipeline configuration uses threshold = 0.35. The Flask app currently uses 0.4 for runtime labeling.
This mismatch should be resolved before any formal model release. A future threshold policy should document:
- selected operating threshold
- validation dataset used
- sensitivity/recall tradeoff
- false-positive burden
- intended workflow and user population
- calibration status
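A threshold policy is easier to justify with a small sweep over candidate thresholds on a validation set. The sketch below reports recall and the false-positive count at each threshold. The function name, the `>=` comparison, and the encoding of malignant as `1` are assumptions, and any scores passed in would need to come from a documented validation split.

```python
from typing import Dict, List, Sequence


def sweep_thresholds(
    y_true: Sequence[int],
    scores: Sequence[float],
    thresholds: Sequence[float],
) -> List[Dict[str, float]]:
    """Report recall and false-positive count for each candidate threshold."""
    rows = []
    for threshold in thresholds:
        pairs = list(zip(y_true, scores))
        tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
        fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
        fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        rows.append({"threshold": threshold, "recall": recall, "false_positives": fp})
    return rows
```

The two thresholds currently in use can disagree on the same image: a score of 0.37 is labeled malignant at 0.35 but benign at 0.4. Passing both values to `sweep_thresholds` shows how much recall the higher threshold gives up in exchange for fewer false positives.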
The current prototype assumes:
- directory labels are correct
- images are appropriate skin-lesion inputs
- train/validation/test splits are representative
- resizing to `256x256` preserves sufficient signal
- a binary benign/malignant label space is adequate for the prototype task
These assumptions require validation before scientific or clinical claims.
Known limitations:
- No external validation set is documented.
- Dataset provenance is incomplete.
- No subgroup performance analysis is available.
- No image-acquisition-device analysis is available.
- No calibration analysis is available.
- No uncertainty quantification is implemented.
- No formal clinical workflow is defined.
- No model-card release record exists for the packaged artifact.
Recommended next steps:
- Rebuild the dataset with a documented split manifest.
- Establish a reproducible baseline using the current CNN.
- Compare against transfer-learning baselines.
- Evaluate calibration and threshold sensitivity.
- Report confusion matrices by relevant cohorts if metadata permits.
- Create a model card for each trained artifact.
- Preserve generated evaluation reports with the model release.
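For the calibration item, one common starting point is a binned expected calibration error computed on the positive-class score. The sketch below is one simple binary variant with equal-width bins. The bin count, the function name, and the choice of this particular variant are assumptions, not an established project method.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    y_true: Sequence[int],
    scores: Sequence[float],
    n_bins: int = 10,
) -> float:
    """Weighted mean of |observed malignant rate - mean score| over equal-width bins."""
    total = len(scores)
    if total == 0:
        return 0.0
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for label, score in zip(y_true, scores):
        # A score of exactly 1.0 falls into the last bin.
        index = min(int(score * n_bins), n_bins - 1)
        bins[index].append((label, score))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(label for label, _ in members) / len(members)
        predicted = sum(score for _, score in members) / len(members)
        ece += (len(members) / total) * abs(observed - predicted)
    return ece
```

A value near zero means the scores track observed malignant frequencies within each bin. The result is sensitive to the bin count and to sample size, so it is best reported alongside a reliability plot rather than on its own.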