Architecture

This document describes the DermaScan prototype architecture, module boundaries, data flow, design decisions, naming conventions, and extension points.

System Summary

DermaScan is organized as a small monorepo-style scientific prototype:

  • dermascan_app/ contains the local Flask application and inference path.
  • ml_pipeline/ contains training, evaluation, configuration, and registry metadata.
  • artifacts/ contains model binaries and legacy sample inputs.
  • data/ is the expected local location for datasets and future manifests.
  • var/ contains runtime-generated files such as local uploads.
  • docs/ contains active project documentation and archived historical notes.

Major Components

| Component | Path | Responsibility |
| --- | --- | --- |
| Web application | dermascan_app/app.py | Defines Flask routes, receives uploads, validates image readability, calls inference, returns JSON. |
| Inference helper | dermascan_app/inference.py | Loads the current Keras model artifact, applies image preprocessing, returns raw prediction score. |
| Templates | dermascan_app/templates/ | Provides the local browser interface and educational/disclaimer pages. |
| Training script | ml_pipeline/training/train_cnn.py | Builds and trains the CNN baseline, logs metrics, and saves the model artifact. |
| Evaluation script | ml_pipeline/evaluation/evaluate_cnn.py | Loads the current model and evaluates it on the test split. |
| Configuration | ml_pipeline/config/model_parameters.json | Stores model and training parameters. |
| Registry log | ml_pipeline/registry/progress_log.csv | Stores historical metric rows. |
| Model artifact | artifacts/models/current/melanoma_detector.keras | Packaged Keras model used by local inference. |
| Runtime uploads | var/uploads/ | Local development upload storage; ignored by Git. |

Runtime Data Flow

  1. User selects an image in the browser.
  2. Browser posts the file to /upload.
  3. Flask validates that a file is present and that its extension is allowed.
  4. Flask writes the file to var/uploads/.
  5. OpenCV checks whether the saved file is readable as an image.
  6. dermascan_app.inference lazily loads artifacts/models/current/melanoma_detector.keras.
  7. TensorFlow loads the image, resizes it to 256x256, converts it to an array, and normalizes pixels to [0, 1].
  8. The model returns a single sigmoid-style score.
  9. The Flask route thresholds the score and returns JSON with filename, prediction, and confidence.
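
Steps 3 and 9 above can be sketched without Flask or TensorFlow. This is a minimal illustration, not the code in dermascan_app/app.py: the allowed extensions, the 0.5 threshold, the function names, and the mapping of high scores to "malignant" (alphabetical label order, benign=0 and malignant=1) are all assumptions.

```python
from pathlib import Path

# Assumption: illustrative values only; the real ones live in dermascan_app/app.py.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
THRESHOLD = 0.5


def has_allowed_extension(filename: str) -> bool:
    """Step 3: accept only non-empty names with an allowed extension."""
    return bool(filename) and Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def build_response(filename: str, score: float, threshold: float = THRESHOLD) -> dict:
    """Step 9: threshold the raw score and shape the JSON payload.

    Assumption: scores at or above the threshold map to "malignant".
    The "confidence" field carries the raw model score, as described above.
    """
    prediction = "malignant" if score >= threshold else "benign"
    return {"filename": filename, "prediction": prediction, "confidence": float(score)}
```

In a Flask route, the dictionary returned by `build_response` would typically be passed to `flask.jsonify` after the model produces its score.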

Training Data Flow

  1. A local dataset is placed under data/melanoma_cancer_dataset/.
  2. TensorFlow loads images from train, validation, and test directories.
  3. Folder names supply binary labels: benign and malignant.
  4. Images are resized to 256x256 and normalized to [0, 1].
  5. The CNN is trained using parameters from ml_pipeline/config/model_parameters.json.
  6. The trained model is saved to artifacts/models/current/melanoma_detector.keras.
  7. A metrics row is appended to ml_pipeline/registry/progress_log.csv.
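
The bookkeeping in steps 5 and 7 can be sketched with the standard library alone. The helper names and the CSV columns (`timestamp` plus whatever metric keys are passed in) are hypothetical; the actual schema of progress_log.csv and the keys in model_parameters.json are not described here.

```python
import csv
import json
from datetime import datetime, timezone
from pathlib import Path


def load_parameters(config_path: Path) -> dict:
    """Step 5: read model and training parameters from a JSON file."""
    with config_path.open(encoding="utf-8") as handle:
        return json.load(handle)


def append_metrics_row(log_path: Path, metrics: dict) -> None:
    """Step 7: append one metrics row, writing a header if the log is new.

    Assumption: every call passes the same metric keys in the same order.
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not log_path.exists() or log_path.stat().st_size == 0
    fieldnames = ["timestamp", *metrics]
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow({"timestamp": datetime.now(timezone.utc).isoformat(), **metrics})
```

Appending rather than rewriting keeps earlier rows untouched, which suits a log meant to hold historical metrics.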

Design Decisions

| Decision | Rationale | Future Revisit |
| --- | --- | --- |
| Flask app | Simple local demo and low overhead for upload/predict workflows. | Reassess for production API, authentication, rate limiting, and observability. |
| Keras model artifact | Portable format that packages model architecture and weights. | Add model registry metadata and release versioning. |
| Directory-based dataset loading | Easy label inference and approachable for capstone work. | Add dataset manifests and immutable split files. |
| Local runtime uploads | Simple local debugging. | Replace with secure temporary storage for any public demo. |
| Separate artifacts/, data/, and var/ | Keeps source, datasets, binaries, and runtime state conceptually distinct. | Preserve this separation as the project matures. |

See DECISION_RATIONALE.md for a fuller discussion.

Naming Conventions

| Pattern | Meaning |
| --- | --- |
| dermascan_app | Runtime application code. |
| ml_pipeline | Model development and evaluation code. |
| artifacts/models/current | Current packaged model used by app inference. |
| artifacts/sample_inputs/legacy_uploads | Preserved historical uploads, not runtime state. |
| data/melanoma_cancer_dataset | Local training/evaluation dataset root. |
| var/uploads | Local runtime uploads generated by the Flask app. |

Extension Points

Future contributors should add new functionality in these locations:

| Goal | Preferred Location |
| --- | --- |
| New Flask route | dermascan_app/app.py or a future dermascan_app/routes/ package. |
| New preprocessing function | dermascan_app/inference.py initially; later extract shared preprocessing to ml_pipeline/preprocessing/. |
| New model architecture | ml_pipeline/training/ with a clearly named script or module. |
| New evaluation report | ml_pipeline/evaluation/ and ml_pipeline/reports/. |
| Dataset manifest | data/ or future data/manifests/. |
| Model release metadata | ml_pipeline/registry/ or future artifacts/models/<version>/. |
| Documentation | docs/, with historical documents archived under docs/archived_previous_documents/. |

Architectural Risks

  • Training and inference preprocessing are implemented separately and should be unified.
  • Runtime threshold and evaluation threshold are not fully aligned.
  • The app returns a field named confidence, but the value is a raw model score.
  • Upload handling is local-development only.
  • Training scripts are executable scripts rather than importable, tested modules.
  • Dependencies are not pinned.
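
One way to address the threshold and confidence risks together is a single decision policy that both the app and the evaluation script import. The sketch below is hypothetical: `DecisionPolicy`, `format_result`, the 0.5 default, and the label mapping are illustrative names and values, not existing project code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionPolicy:
    """Hypothetical single source of truth for the app and evaluation."""

    threshold: float = 0.5  # assumption: illustrative default
    positive_label: str = "malignant"
    negative_label: str = "benign"


def format_result(raw_score: float, policy: DecisionPolicy = DecisionPolicy()) -> dict:
    """Report the score under an honest name alongside the threshold used."""
    is_positive = raw_score >= policy.threshold
    return {
        "prediction": policy.positive_label if is_positive else policy.negative_label,
        "raw_score": raw_score,
        "threshold": policy.threshold,
    }
```

Naming the field `raw_score` avoids implying a calibrated probability, and returning the threshold makes any mismatch between runtime and evaluation visible in the output.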

Target Architecture

A production-quality research or pilot system should move toward:

Client
  -> API service
  -> validated temporary upload storage
  -> shared preprocessing package
  -> model inference service
  -> calibrated result formatter
  -> audit logs and monitoring
  -> retention/deletion policy

The ML pipeline should move toward:

dataset manifest
  -> reproducible preprocessing
  -> training configuration
  -> model artifact
  -> evaluation report
  -> model card
  -> release approval
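
The first stage of the ML pipeline above, the dataset manifest, could start as a list of file records with content hashes. This is a rough sketch under assumptions: the `<root>/<split>/<label>/<file>` layout is inferred from the directory-based loading described earlier, and `build_manifest` and its record fields are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(dataset_root: Path) -> list[dict]:
    """Return one record per file under <root>/<split>/<label>/, sorted by path.

    Assumption: split and label are taken from the first two directory levels.
    """
    records = []
    for path in sorted(dataset_root.glob("*/*/*")):
        if not path.is_file():
            continue
        relative = path.relative_to(dataset_root)
        split, label = relative.parts[:2]
        records.append(
            {
                "path": relative.as_posix(),
                "split": split,
                "label": label,
                # The hash lets a later run detect changed or swapped files.
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            }
        )
    return records
```

Writing these records to a file under data/manifests/ and committing it would pin the split membership independently of the folder contents.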