Architecture

This document describes the DermaScan prototype architecture, module boundaries, data flow, design decisions, naming conventions, and extension points.

System Summary

DermaScan is organized as a small monorepo-style scientific prototype:

`dermascan_app/` contains the local Flask application and inference path.
`ml_pipeline/` contains training, evaluation, configuration, and registry metadata.
`artifacts/` contains model binaries and legacy sample inputs.
`data/` is the expected local location for datasets and future manifests.
`var/` contains runtime-generated files such as local uploads.
`docs/` contains active project documentation and archived historical notes.

Major Components

Component	Path	Responsibility
Web application	`dermascan_app/app.py`	Defines Flask routes, receives uploads, validates image readability, calls inference, returns JSON.
Inference helper	`dermascan_app/inference.py`	Loads the current Keras model artifact, applies image preprocessing, returns raw prediction score.
Templates	`dermascan_app/templates/`	Provides the local browser interface and educational/disclaimer pages.
Training script	`ml_pipeline/training/train_cnn.py`	Builds and trains the CNN baseline, logs metrics, and saves the model artifact.
Evaluation script	`ml_pipeline/evaluation/evaluate_cnn.py`	Loads the current model and evaluates it on the test split.
Configuration	`ml_pipeline/config/model_parameters.json`	Stores model and training parameters.
Registry log	`ml_pipeline/registry/progress_log.csv`	Stores historical metric rows.
Model artifact	`artifacts/models/current/melanoma_detector.keras`	Packaged Keras model used by local inference.
Runtime uploads	`var/uploads/`	Local development upload storage; ignored by Git.

Runtime Data Flow

The user selects an image in the browser.
The browser posts the file to `/upload`.
Flask checks that a file is present and that its extension is allowed.
Flask writes the file to `var/uploads/`.
OpenCV checks whether the saved file is readable as an image.
`dermascan_app.inference` lazily loads `artifacts/models/current/melanoma_detector.keras`.
TensorFlow loads the image, resizes it to 256x256, converts it to an array, and normalizes pixels to [0, 1].
The model returns a single sigmoid-style score.
The Flask route thresholds the score and returns JSON with `filename`, `prediction`, and `confidence`.
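
The validation and thresholding steps above can be sketched in plain Python. This is a minimal illustration, not the code in `dermascan_app/app.py`: the allowed extensions, the 0.5 threshold, the mapping of high scores to `malignant`, and both helper names are assumptions made for the example.

```python
from pathlib import Path

# Assumptions for illustration: the real app may allow a different set of
# extensions and may use a different decision threshold.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
THRESHOLD = 0.5


def has_allowed_extension(filename: str) -> bool:
    """Return True when the filename ends in an allowed image extension."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def build_response(filename: str, score: float, threshold: float = THRESHOLD) -> dict:
    """Threshold a sigmoid-style score and build the JSON payload.

    Assumes that scores at or above the threshold mean "malignant".
    """
    label = "malignant" if score >= threshold else "benign"
    # The field is named "confidence" to match the documented response,
    # but it carries the raw model score.
    return {"filename": filename, "prediction": label, "confidence": float(score)}
```

Under these assumptions, `build_response("lesion_01.jpg", 0.82)` yields a payload whose `prediction` is `malignant` and whose `confidence` is the unmodified score `0.82`.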

Training Data Flow

A local dataset is placed under `data/melanoma_cancer_dataset/`.
TensorFlow loads images from the train, validation, and test directories.
Folder names supply the binary labels: benign and malignant.
Images are resized to 256x256 and normalized to [0, 1].
The CNN is trained using parameters from `ml_pipeline/config/model_parameters.json`.
The trained model is saved to `artifacts/models/current/melanoma_detector.keras`.
A metrics row is appended to `ml_pipeline/registry/progress_log.csv`.
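
The final step, appending a metrics row to the registry log, might look like the sketch below. It uses only the standard library. The column names are hypothetical, since the actual schema of `progress_log.csv` is not described here, and `append_metrics_row` is an illustrative helper rather than a function in the training script.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column names; the real progress_log.csv schema may differ.
FIELDNAMES = ["timestamp", "epochs", "val_accuracy", "val_loss"]


def append_metrics_row(log_path: Path, metrics: dict) -> None:
    """Append one metrics row, writing the header only for a new or empty file."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not log_path.exists() or log_path.stat().st_size == 0
    # Stamp each row with a UTC time so historical rows stay ordered.
    row = {"timestamp": datetime.now(timezone.utc).isoformat(), **metrics}
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Opening the file in append mode keeps earlier rows intact, which matches the log's role as a historical record.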

Design Decisions

Decision	Rationale	Future Revisit
Flask app	Simple local demo and low overhead for upload/predict workflows.	Reassess for production API, authentication, rate limiting, and observability.
Keras model artifact	Portable format that packages model architecture and weights.	Add model registry metadata and release versioning.
Directory-based dataset loading	Easy label inference and approachable for capstone work.	Add dataset manifests and immutable split files.
Local runtime uploads	Simple local debugging.	Replace with secure temporary storage for any public demo.
Separate `artifacts/`, `data/`, and `var/`	Keeps source, datasets, binaries, and runtime state conceptually distinct.	Preserve this separation as the project matures.

See DECISION_RATIONALE.md for a fuller discussion.

Naming Conventions

Pattern	Meaning
`dermascan_app`	Runtime application code.
`ml_pipeline`	Model development and evaluation code.
`artifacts/models/current`	Current packaged model used by app inference.
`artifacts/sample_inputs/legacy_uploads`	Preserved historical uploads, not runtime state.
`data/melanoma_cancer_dataset`	Local training/evaluation dataset root.
`var/uploads`	Local runtime uploads generated by the Flask app.

Extension Points

Future contributors should add new functionality in these locations:

Goal	Preferred Location
New Flask route	`dermascan_app/app.py` or a future `dermascan_app/routes/` package.
New preprocessing function	`dermascan_app/inference.py` initially; later extract shared preprocessing to `ml_pipeline/preprocessing/`.
New model architecture	`ml_pipeline/training/` with a clearly named script or module.
New evaluation report	`ml_pipeline/evaluation/` and `ml_pipeline/reports/`.
Dataset manifest	`data/` or future `data/manifests/`.
Model release metadata	`ml_pipeline/registry/` or future `artifacts/models/<version>/`.
Documentation	`docs/`, with historical documents archived under `docs/archived_previous_documents/`.

Architectural Risks

Training and inference preprocessing are implemented separately and should be unified.
The runtime threshold and the evaluation threshold are not fully aligned, so reported metrics may not describe the app's actual operating point.
The app returns a field named `confidence`, but the value is a raw model score, not a calibrated probability.
Upload handling is suitable for local development only.
Training scripts are executable scripts rather than importable, tested modules.
Dependencies are not pinned.
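
The first three risks share a cause: preprocessing constants and the decision threshold live in more than one place. One possible remedy is a small shared settings module that both the training script and the inference helper import. The sketch below is an assumption-laden illustration, not existing project code: `PipelineSettings`, `normalize_pixels`, and `classify` are hypothetical names, the 0.5 threshold is a placeholder, and dividing by 255 is assumed as the way pixels reach [0, 1].

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PipelineSettings:
    """Single source of truth for values training and inference must share."""

    image_size: tuple = (256, 256)  # documented input size
    pixel_scale: float = 255.0      # assumed divisor for reaching [0, 1]
    threshold: float = 0.5          # placeholder; not confirmed by the project


SETTINGS = PipelineSettings()


def normalize_pixels(pixels: Sequence[int], settings: PipelineSettings = SETTINGS) -> list:
    """Scale 8-bit pixel values into [0, 1]."""
    return [value / settings.pixel_scale for value in pixels]


def classify(score: float, settings: PipelineSettings = SETTINGS) -> dict:
    """Apply the shared threshold and label the value as a raw score."""
    label = "malignant" if score >= settings.threshold else "benign"
    return {"prediction": label, "score": score}
```

Freezing the dataclass prevents a script from silently changing a value at runtime, and naming the output field `score` avoids implying calibration.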

Target Architecture

A production-quality research or pilot system should move toward:

Client
  -> API service
  -> validated temporary upload storage
  -> shared preprocessing package
  -> model inference service
  -> calibrated result formatter
  -> audit logs and monitoring
  -> retention/deletion policy
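
As one illustration of the temporary upload storage step in the flow above, the sketch below writes an upload into a private temporary directory and removes it as soon as scoring finishes. It is a minimal sketch under stated assumptions: `score_upload` is a hypothetical helper, `predict` stands in for whatever inference call the service would expose, and the suffix is assumed to have been validated already.

```python
import tempfile
from pathlib import Path
from typing import Callable


def score_upload(data: bytes, suffix: str, predict: Callable[[Path], float]) -> float:
    """Write an upload to a private temporary directory, score it, then delete it."""
    with tempfile.TemporaryDirectory(prefix="dermascan_") as tmp_dir:
        # A fixed file name avoids trusting any client-supplied name.
        upload_path = Path(tmp_dir) / f"upload{suffix}"
        upload_path.write_bytes(data)
        # The directory and its contents are removed when the block exits,
        # even if predict() raises.
        return predict(upload_path)
```

Tying the file's lifetime to a `with` block gives a simple default retention policy: nothing persists beyond a single request.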

The ML pipeline should move toward:

dataset manifest
  -> reproducible preprocessing
  -> training configuration
  -> model artifact
  -> evaluation report
  -> model card
  -> release approval
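
The first stage of this pipeline, a dataset manifest, could be generated with the standard library alone. The sketch below assumes a `<split>/<label>/<file>` layout under the dataset root, which is consistent with the directory-based loading described earlier but is not confirmed in detail. `build_manifest` and `write_manifest` are hypothetical helpers, and the manifest fields are illustrative.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(dataset_root: Path) -> list:
    """List every file under dataset_root with its split, label, and SHA-256."""
    entries = []
    for path in sorted(dataset_root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(dataset_root)
        # Assumes a <split>/<label>/<file> layout; anything else is skipped.
        if len(relative.parts) != 3:
            continue
        split, label, _ = relative.parts
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries.append(
            {"path": relative.as_posix(), "split": split, "label": label, "sha256": digest}
        )
    return entries


def write_manifest(dataset_root: Path, manifest_path: Path) -> int:
    """Write the manifest as JSON and return the number of entries."""
    entries = build_manifest(dataset_root)
    manifest_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return len(entries)
```

Recording a content hash per file makes it possible to detect when a split has changed between training runs.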
