This document describes the DermaScan prototype architecture, module boundaries, data flow, design decisions, naming conventions, and extension points.
DermaScan is organized as a small monorepo-style scientific prototype:
- dermascan_app/ contains the local Flask application and inference path.
- ml_pipeline/ contains training, evaluation, configuration, and registry metadata.
- artifacts/ contains model binaries and legacy sample inputs.
- data/ is the expected local location for datasets and future manifests.
- var/ contains runtime-generated files such as local uploads.
- docs/ contains active project documentation and archived historical notes.

| Component | Path | Responsibility |
|---|---|---|
| Web application | dermascan_app/app.py | Defines Flask routes, receives uploads, validates image readability, calls inference, returns JSON. |
| Inference helper | dermascan_app/inference.py | Loads the current Keras model artifact, applies image preprocessing, returns raw prediction score. |
| Templates | dermascan_app/templates/ | Provides the local browser interface and educational/disclaimer pages. |
| Training script | ml_pipeline/training/train_cnn.py | Builds and trains the CNN baseline, logs metrics, and saves the model artifact. |
| Evaluation script | ml_pipeline/evaluation/evaluate_cnn.py | Loads the current model and evaluates it on the test split. |
| Configuration | ml_pipeline/config/model_parameters.json | Stores model and training parameters. |
| Registry log | ml_pipeline/registry/progress_log.csv | Stores historical metric rows. |
| Model artifact | artifacts/models/current/melanoma_detector.keras | Packaged Keras model used by local inference. |
| Runtime uploads | var/uploads/ | Local development upload storage; ignored by Git. |

At inference time, a request follows these steps:
- User selects an image in the browser.
- Browser posts the file to /upload.
- Flask validates that a file is present and that its extension is allowed.
- Flask writes the file to var/uploads/.
- OpenCV checks whether the saved file is readable as an image.
- dermascan_app.inference lazily loads artifacts/models/current/melanoma_detector.keras.
- TensorFlow loads the image, resizes it to 256x256, converts it to an array, and normalizes pixels to [0, 1].
- The model returns a single sigmoid-style score.
- The Flask route thresholds the score and returns JSON with filename, prediction, and confidence.
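
The validation and thresholding steps above can be sketched without Flask or TensorFlow. This is a minimal sketch, not the code in dermascan_app/app.py: the allowed extensions, the 0.5 threshold, the helper names, and the assumption that higher scores map to malignant are all illustrative.

```python
from pathlib import Path

# Assumptions for illustration only; the real app may use other values.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
DECISION_THRESHOLD = 0.5


def has_allowed_extension(filename: str) -> bool:
    """Return True when the filename is non-empty and ends in an allowed extension."""
    return bool(filename) and Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def build_response(filename: str, score: float, threshold: float = DECISION_THRESHOLD) -> dict:
    """Threshold a raw sigmoid-style score and build a JSON-ready payload.

    Assumes scores at or above the threshold map to "malignant".
    """
    prediction = "malignant" if score >= threshold else "benign"
    # The "confidence" field carries the raw model score, as in the current app.
    return {"filename": filename, "prediction": prediction, "confidence": float(score)}
```

Keeping these two helpers free of framework imports means they can be unit-tested without loading the model.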

Training follows these steps:
- A local dataset is placed under data/melanoma_cancer_dataset/.
- TensorFlow loads images from train, validation, and test directories.
- Folder names supply binary labels: benign and malignant.
- Images are resized to 256x256 and normalized to [0, 1].
- The CNN is trained using parameters from ml_pipeline/config/model_parameters.json.
- The trained model is saved to artifacts/models/current/melanoma_detector.keras.
- A metrics row is appended to ml_pipeline/registry/progress_log.csv.

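
The configuration and registry steps above could look roughly like the following. This is a sketch under stated assumptions, not the contents of train_cnn.py: the function names, the timestamp column, and the metric keys are hypothetical, and it assumes every call passes the same metric keys so the CSV columns stay aligned.

```python
import csv
import json
from datetime import datetime, timezone
from pathlib import Path


def load_parameters(config_path: Path) -> dict:
    """Read model and training parameters from a JSON file."""
    with config_path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


def append_metrics_row(log_path: Path, metrics: dict) -> None:
    """Append one metrics row, writing a header first if the log is new or empty."""
    # The "timestamp" column is an assumption, not a documented registry field.
    row = {"timestamp": datetime.now(timezone.utc).isoformat(), **metrics}
    write_header = not log_path.exists() or log_path.stat().st_size == 0
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Appending rather than rewriting keeps earlier metric rows intact, which matches the log's role as a historical record.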
| Decision | Rationale | Future Revisit |
|---|---|---|
| Flask app | Simple local demo and low overhead for upload/predict workflows. | Reassess for production API, authentication, rate limiting, and observability. |
| Keras model artifact | Portable format that packages model architecture and weights. | Add model registry metadata and release versioning. |
| Directory-based dataset loading | Easy label inference and approachable for capstone work. | Add dataset manifests and immutable split files. |
| Local runtime uploads | Simple local debugging. | Replace with secure temporary storage for any public demo. |
| Separate artifacts/, data/, and var/ | Keeps source, datasets, binaries, and runtime state conceptually distinct. | Preserve this separation as the project matures. |

See DECISION_RATIONALE.md for a fuller discussion.
| Pattern | Meaning |
|---|---|
| dermascan_app | Runtime application code. |
| ml_pipeline | Model development and evaluation code. |
| artifacts/models/current | Current packaged model used by app inference. |
| artifacts/sample_inputs/legacy_uploads | Preserved historical uploads, not runtime state. |
| data/melanoma_cancer_dataset | Local training/evaluation dataset root. |
| var/uploads | Local runtime uploads generated by the Flask app. |

Future contributors should add new functionality in these locations:
| Goal | Preferred Location |
|---|---|
| New Flask route | dermascan_app/app.py or a future dermascan_app/routes/ package. |
| New preprocessing function | dermascan_app/inference.py initially; later extract shared preprocessing to ml_pipeline/preprocessing/. |
| New model architecture | ml_pipeline/training/ with a clearly named script or module. |
| New evaluation report | ml_pipeline/evaluation/ and ml_pipeline/reports/. |
| Dataset manifest | data/ or future data/manifests/. |
| Model release metadata | ml_pipeline/registry/ or future artifacts/models/<version>/. |
| Documentation | docs/, with historical documents archived under docs/archived_previous_documents/. |

The current prototype has the following known limitations:
- Training and inference preprocessing are implemented separately and should be unified.
- Runtime threshold and evaluation threshold are not fully aligned.
- The app returns a field named confidence, but the value is a raw model score.
- Upload handling is local-development only.
- Training scripts are executable scripts rather than importable, tested modules.
- Dependencies are not pinned.

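
One way to address the threshold and naming items above is a single decision policy shared by the runtime and evaluation paths. The sketch below is illustrative only: DecisionPolicy, format_result, accuracy, the raw_score field name, and the 0.5 default are hypothetical and do not exist in the current code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionPolicy:
    """Single source of truth for the score threshold (0.5 is an assumed default)."""

    threshold: float = 0.5
    positive_label: str = "malignant"
    negative_label: str = "benign"

    def label_for(self, score: float) -> str:
        """Map a raw score to a label; scores at or above the threshold are positive."""
        return self.positive_label if score >= self.threshold else self.negative_label


def format_result(score: float, policy: DecisionPolicy) -> dict:
    """Runtime path: report the value as a raw score rather than a confidence."""
    return {"prediction": policy.label_for(score), "raw_score": score}


def accuracy(scores: list[float], labels: list[str], policy: DecisionPolicy) -> float:
    """Evaluation path: reuse the same policy so both paths agree on the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    correct = sum(policy.label_for(s) == y for s, y in zip(scores, labels))
    return correct / len(scores)
```

Because both functions take the same policy object, changing the threshold in one place changes it for the app and for evaluation alike.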
A production-quality research or pilot system should move toward:

    Client
    -> API service
    -> validated temporary upload storage
    -> shared preprocessing package
    -> model inference service
    -> calibrated result formatter
    -> audit logs and monitoring
    -> retention/deletion policy

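
The "validated temporary upload storage" and "retention/deletion policy" stages might start from something like the following. It is a minimal sketch using only the standard library; the temporary_upload name and the 5 MiB size limit are assumptions, and a real service would also need content validation and access control.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # Assumed limit, not taken from the project.


@contextmanager
def temporary_upload(data: bytes, suffix: str = ".jpg") -> Iterator[Path]:
    """Write upload bytes to a temp file and delete it when the block exits."""
    if not data:
        raise ValueError("upload is empty")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("upload exceeds size limit")
    handle = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    path = Path(handle.name)
    try:
        with handle:
            handle.write(data)  # File is closed when this inner block ends.
        yield path
    finally:
        # Deletion runs even if inference raises, so no upload outlives the request.
        path.unlink(missing_ok=True)
```

A caller would wrap inference in `with temporary_upload(file_bytes) as path:` so the file never persists under var/uploads/.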
The ML pipeline should move toward:

    dataset manifest
    -> reproducible preprocessing
    -> training configuration
    -> model artifact
    -> evaluation report
    -> model card
    -> release approval
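
As a starting point for the dataset manifest stage, the sketch below records every file with its split, label, and SHA-256 digest. It assumes the split/label/file directory layout described for data/melanoma_cancer_dataset/; the function names and manifest fields are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(dataset_root: Path) -> list[dict]:
    """List every file under <root>/<split>/<label>/ with its SHA-256 digest."""
    entries = []
    for path in sorted(dataset_root.glob("*/*/*")):
        if not path.is_file():
            continue
        relative = path.relative_to(dataset_root)
        split, label = relative.parts[:2]
        entries.append({
            "path": relative.as_posix(),
            "split": split,
            "label": label,
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return entries


def write_manifest(entries: list[dict], manifest_path: Path) -> None:
    """Write the manifest as pretty-printed JSON."""
    manifest_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
```

Comparing digests between two manifests shows whether a split changed between training runs, which supports the move toward immutable split files.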