---
icon: lucide/blocks
---

# Architecture

## STAC Catalog Structure

!!! info "Three collections; all items use the MLM and Version extensions"

    ```text
    Catalog: fair-models
    |
    +-- Collection: base-models
    |     Model blueprints contributed via PR.
    |     Each item = complete model card (weights, code, Docker, MLM spec).
    |     Versioned by contributors, registered via CLI utility.
    |     |
    |     +-- Item: unet-segmentation (v1)           category: semantic-segmentation
    |     +-- Item: resnet18-classification (v1)      category: classification
    |     +-- Item: yolo11n-detection (v1)            category: object-detection
    |
    +-- Collection: local-models
    |     Finetuned models produced by ZenML pipelines.
    |     Only promoted (production) versions appear here.
    |     |
    |     +-- Item: unet-segmentation-finetuned-banepa-v2   (production, latest-version)
    |     +-- Item: unet-segmentation-finetuned-banepa-v1   (deprecated: true)
    |     +-- Item: yolo11n-detection-finetuned-banepa-v1   (production)
    |
    +-- Collection: datasets
          Training data registered via fAIr UI/backend.
          |
          +-- Item: buildings-banepa-semantic-segmentation    category: semantic-segmentation
          +-- Item: buildings-banepa-object-detection         category: object-detection
    ```

??? note "What STAC Items Contain"

    Standard STAC/MLM fields are used wherever possible. A small set of
    `fair:*` fields fills the gaps where MLM has no equivalent:
    `fair:metrics_spec` (evaluation metrics vocabulary),
    `fair:split_spec` (train/val split strategy), and
    `fair:hyperparameters_spec` (hyperparameter types, ranges, and descriptions).

    ### Base model item

    See [`models/unet_segmentation/stac-item.json`](https://github.com/hotosm/fAIr-models/tree/master/models/unet_segmentation/stac-item.json) for a complete example.
    All three base models (`unet_segmentation`, `resnet18_classification`, `yolo11n_detection`) follow this structure.

    Key properties: `mlm:name`, `mlm:architecture`, `mlm:tasks`, `mlm:framework`,
    `mlm:input` (with `pre_processing_function`), `mlm:output` (with `post_processing_function`
    and `classification:classes`), `mlm:hyperparameters`, `keywords`,
    `fair:metrics_spec`, `fair:split_spec`, `fair:hyperparameters_spec`.

    Key assets: `checkpoint` (PyTorch weights, HTTPS URL), `model` (ONNX, optional for base models),
    `source-code` (with `mlm:entrypoint`), and `mlm:training` / `mlm:inference` (Docker image OCI references).

    The `mlm:entrypoint` tells the backend which Python function to call.
    `pre_processing_function` / `post_processing_function` are standard MLM
    Processing Expression fields.

    ### Local model item

    Same MLM fields as the base model item, plus:

    - `derived_from` link pointing to the base model item
    - `derived_from` link pointing to the dataset item used for training
    - `checkpoint` asset (PyTorch weights) and `model` asset (ONNX), both pointing to the finetuned artifacts on S3
    - Runtime assets that reference the same Docker image as the parent base model
    - Version Extension: `version`, `deprecated`, and `predecessor-version` / `successor-version` / `latest-version` links
    - `mlm:hyperparameters` reflecting the training parameters actually used

    ### Dataset item

    Uses the Label and File extensions. Properties: `label:type`, `label:tasks`, `label:classes`, `keywords`.
    Assets: `chips` (image directory), `labels` (GeoJSON).
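
Both `mlm:entrypoint` and Python-format processing expressions are references to code, so the backend has to resolve them to callables before dispatching. The sketch below does this with `importlib`, assuming the `package.module:function` reference style used in MLM's examples. `resolve_entrypoint` and `resolve_processing_function` are hypothetical helpers, not part of fAIr or pystac.

```python
import importlib
from typing import Any, Callable, Optional


def resolve_entrypoint(reference: str) -> Callable[..., Any]:
    """Resolve a "package.module:attribute" reference to a callable.

    Dotted attributes after the colon (e.g. "pkg.mod:Class.method") are followed.
    """
    module_name, sep, attr_path = reference.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {reference!r}")
    target: Any = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        target = getattr(target, attr)
    if not callable(target):
        raise TypeError(f"{reference!r} does not resolve to a callable")
    return target


def resolve_processing_function(
    expression: Optional[dict[str, str]],
) -> Optional[Callable[..., Any]]:
    """Resolve an MLM Processing Expression; only format "python" is supported here."""
    if expression is None:
        return None  # the input/output declares no pre/post-processing
    if expression.get("format") != "python":
        raise ValueError(f"unsupported processing format: {expression.get('format')!r}")
    return resolve_entrypoint(expression["expression"])
```

In a model repository the expression would point at the model's own module; standard-library references such as `json:dumps` are convenient for testing the resolver itself.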

## Tagging and Classification

| Concept | Standard field | Example values |
| --- | --- | --- |
| ML task | `mlm:tasks` | `semantic-segmentation`, `object-detection` |
| Feature type tags | `keywords` (STAC core) | `building`, `road`, `tree` |
| Output geometry | `keywords` (STAC core) | `polygon`, `line`, `point` |
| Output classes | `classification:classes` | `{name: "building", value: 1}` |
| Dataset label type | `label:type` (Label ext) | `vector`, `raster` |
| Dataset label task | `label:tasks` (Label ext) | `segmentation`, `detection` |
| Pre/post processing | `pre_processing_function` / `post_processing_function` (MLM) | Python entrypoint |

## Compatibility Validation

!!! warning

    The backend validates that a base model and dataset are compatible before
    triggering finetuning. Validation is based on matching `keywords` and
    `mlm:tasks` / `label:tasks` between the model and dataset STAC items.
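
Note that the two task vocabularies differ in the examples on this page (`semantic-segmentation` in `mlm:tasks`, `segmentation` in `label:tasks`), so the check needs some mapping between them. The sketch below is one possible reading of the rule: the `MLM_TO_LABEL_TASKS` table is hypothetical, and "at least one shared keyword" is an assumption; the backend's actual matching may be stricter.

```python
from typing import Any

# Hypothetical vocabulary bridge between mlm:tasks and label:tasks values.
MLM_TO_LABEL_TASKS: dict[str, set[str]] = {
    "semantic-segmentation": {"segmentation"},
    "object-detection": {"detection"},
    "classification": {"classification"},
}


def check_compatibility(model_item: dict[str, Any], dataset_item: dict[str, Any]) -> list[str]:
    """Return the reasons a model/dataset pair is incompatible (empty list = compatible)."""
    model_props = model_item.get("properties", {})
    dataset_props = dataset_item.get("properties", {})
    problems: list[str] = []

    # Tasks: translate each mlm:tasks value into the label:tasks values it accepts.
    model_tasks = set(model_props.get("mlm:tasks", []))
    label_tasks = set(dataset_props.get("label:tasks", []))
    accepted: set[str] = set()
    for task in model_tasks:
        accepted |= MLM_TO_LABEL_TASKS.get(task, {task})
    if not accepted & label_tasks:
        problems.append(
            f"no matching task: mlm:tasks={sorted(model_tasks)}, label:tasks={sorted(label_tasks)}"
        )

    # Keywords: require at least one shared feature-type or geometry tag.
    shared = set(model_props.get("keywords", [])) & set(dataset_props.get("keywords", []))
    if not shared:
        problems.append("no shared keywords between model and dataset")
    return problems
```

Returning a list of reasons rather than a boolean lets the backend show users why a pairing was rejected.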

## Flows

### 1. Base Model Registration (PR workflow)

```mermaid
flowchart TD
    A[Model Developer] -->|Prepares PR| B[fAIr-Models GitHub]
    B -->|CI: build, validate, test| C{Review}
    C -->|Merge| D[Post-merge CLI / CI]
    D --> E[Build + push Docker image]
    D --> F[Upload weights to S3]
    D --> G[Register STAC item in base-models]
    G --> H[STAC: base-models/model-name v1]
```
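
The "validate" part of the PR CI could include a structural check of each contributed `stac-item.json`. A minimal sketch, assuming the "key" properties and assets listed for base model items are treated as required; `validate_base_model_item` is a hypothetical helper, and it does not replace JSON Schema validation.

```python
import json
from pathlib import Path
from typing import Any

# Assumption: the "key" properties and assets of a base model item are required.
REQUIRED_PROPERTIES = (
    "mlm:name", "mlm:architecture", "mlm:tasks", "mlm:framework",
    "mlm:input", "mlm:output", "mlm:hyperparameters", "keywords",
    "fair:metrics_spec", "fair:split_spec", "fair:hyperparameters_spec",
)
REQUIRED_ASSETS = ("checkpoint", "source-code", "mlm:training", "mlm:inference")


def validate_base_model_item(item: dict[str, Any]) -> list[str]:
    """Return human-readable errors; an empty list means the item passes."""
    props = item.get("properties", {})
    assets = item.get("assets", {})
    errors = [f"missing property {key}" for key in REQUIRED_PROPERTIES if key not in props]
    errors += [f"missing asset {key}" for key in REQUIRED_ASSETS if key not in assets]
    # The backend needs mlm:entrypoint on the source-code asset to know what to call.
    if "source-code" in assets and "mlm:entrypoint" not in assets["source-code"]:
        errors.append("source-code asset has no mlm:entrypoint")
    return errors


def validate_file(path: Path) -> list[str]:
    """Load a stac-item.json file and validate it."""
    return validate_base_model_item(json.loads(path.read_text(encoding="utf-8")))
```

In CI this could run over every `models/*/stac-item.json`; a fuller check would also validate against the STAC and MLM JSON Schemas, for example with pystac's `Item.validate()`.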

### 2. Finetuning (ZenML pipeline)

```mermaid
flowchart TD
    A[User picks base model + dataset] --> B[fAIr Backend]
    B -->|Read STAC items| C[Validate compatibility]
    C --> D[Generate ZenML YAML config]
    D --> E[ZenML Pipeline in model Docker]
    E --> F[split_dataset]
    F --> G[train_model]
    G --> H[evaluate_model]
    G --> I[export_onnx]
    G --> J[ZenML Model Control Plane]
    H --> J
    I --> J
```
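
The "Generate ZenML YAML config" step could be sketched as below, assuming `mlm:hyperparameters` holds the base model's defaults and `fair:hyperparameters_spec` maps each hyperparameter name to an object with optional `min`/`max` bounds. ZenML's YAML run configuration accepts pipeline parameters under a top-level `parameters` key; the parameter names used here (`base_model_id`, `dataset_id`, `hyperparameters`) are hypothetical.

```python
import json
from typing import Any


def build_finetune_config(
    base_item: dict[str, Any],
    dataset_item: dict[str, Any],
    overrides: dict[str, Any],
) -> dict[str, Any]:
    """Merge user overrides into the base model's default hyperparameters."""
    props = base_item["properties"]
    spec: dict[str, dict[str, Any]] = props.get("fair:hyperparameters_spec", {})
    params = dict(props.get("mlm:hyperparameters", {}))  # copy; never mutate the STAC item
    for name, value in overrides.items():
        if name not in spec:
            raise KeyError(f"unknown hyperparameter: {name}")
        low, high = spec[name].get("min"), spec[name].get("max")
        if (low is not None and value < low) or (high is not None and value > high):
            raise ValueError(f"{name}={value!r} outside [{low}, {high}]")
        params[name] = value
    return {
        "parameters": {
            "base_model_id": base_item["id"],
            "dataset_id": dataset_item["id"],
            "hyperparameters": params,
        }
    }


def to_config_text(config: dict[str, Any]) -> str:
    """JSON is valid YAML 1.2, so stdlib json is enough for this sketch."""
    return json.dumps(config, indent=2, sort_keys=True)
```

Rejecting unknown or out-of-range overrides before the pipeline starts keeps bad values out of the config that ZenML logs as an artifact.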

### 3. Promotion to STAC

```mermaid
flowchart TD
    A[User picks best version] --> B[fAIr Backend]
    B --> C[ZenML: set stage = production]
    B --> D[StacCatalogManager]
    D --> E[Build STAC MLM item]
    D --> F[Deprecate previous version]
    D --> G[Add Version Extension links]
    E --> H[STAC: local-models/model-v3 production]
```

| ZenML action | STAC effect |
| --- | --- |
| Promote to production | Create item, deprecate previous |
| Archive version | Set `deprecated: true` on item |
| Delete version | Remove item from collection |
| Delete model | Remove all items + clean up |
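
The promotion flow and the action/effect table can be modelled as operations on an in-memory collection (a `dict` from item id to STAC item). This is a hypothetical sketch of the behaviour, not the actual `StacCatalogManager` API; it assumes the `<model-name>-v<N>` item ids used throughout this page and a `../<id>/<id>.json` layout for sibling links. The `version`, `deprecated`, `predecessor-version`, `successor-version`, and `latest-version` names come from the Version Extension.

```python
import copy
from typing import Any

Item = dict[str, Any]


def item_id(model_name: str, version: int) -> str:
    return f"{model_name}-v{version}"


def _href(target_id: str) -> str:
    # Assumed layout: sibling items stored as ../<id>/<id>.json
    return f"../{target_id}/{target_id}.json"


def _set_link(item: Item, rel: str, target_id: str) -> None:
    """Replace any existing link with this rel by one pointing at target_id."""
    links = [link for link in item.get("links", []) if link.get("rel") != rel]
    links.append({"rel": rel, "href": _href(target_id), "type": "application/json"})
    item["links"] = links


def _versions(collection: dict[str, Item], model_name: str) -> list[Item]:
    """All items of one model, oldest version first."""
    prefix = f"{model_name}-v"
    found = [i for key, i in collection.items()
             if key.startswith(prefix) and key[len(prefix):].isdigit()]
    return sorted(found, key=lambda i: int(i["properties"]["version"]))


def promote(collection: dict[str, Item], model_name: str, version: int, item: Item) -> Item:
    """Promote to production: add the item and deprecate the previous version."""
    new = copy.deepcopy(item)
    new["id"] = item_id(model_name, version)
    new.setdefault("properties", {}).update(version=str(version), deprecated=False)
    new.setdefault("links", [])
    older = [i for i in _versions(collection, model_name)
             if int(i["properties"]["version"]) < version]
    if older:
        previous = older[-1]
        previous["properties"]["deprecated"] = True
        _set_link(previous, "successor-version", new["id"])
        _set_link(new, "predecessor-version", previous["id"])
    for old in older:
        _set_link(old, "latest-version", new["id"])
    collection[new["id"]] = new
    return new


def archive(collection: dict[str, Item], model_name: str, version: int) -> None:
    collection[item_id(model_name, version)]["properties"]["deprecated"] = True


def delete_version(collection: dict[str, Item], model_name: str, version: int) -> None:
    removed = collection.pop(item_id(model_name, version))
    # Drop links on the remaining items that would now dangle.
    for other in collection.values():
        other["links"] = [link for link in other.get("links", [])
                          if link.get("href") != _href(removed["id"])]


def delete_model(collection: dict[str, Item], model_name: str) -> None:
    for old in _versions(collection, model_name):
        del collection[old["id"]]
```

Keeping each ZenML action mapped to one function makes the table above directly testable.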

### 4. Inference

Inference works for both base models and local models: the STAC item always carries enough information to run it (model weights, inference runtime, and input/output spec).
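
Because every item carries weights, runtime, and I/O spec, the backend can derive an inference plan from the item alone. A minimal sketch, assuming the `mlm:inference` asset's `href` holds the OCI image reference and that the ONNX `model` asset is preferred when present (it is optional for base models, so the `checkpoint` asset is the fallback). `InferencePlan` and `plan_inference` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class InferencePlan:
    weights_href: str
    weights_kind: str          # "onnx" or "checkpoint"
    runtime_image: str         # OCI reference taken from the mlm:inference asset
    entrypoint: Optional[str]  # from the source-code asset, if present
    inputs: tuple              # the item's mlm:input entries


def plan_inference(item: dict[str, Any]) -> InferencePlan:
    assets = item.get("assets", {})
    # Assumption: prefer ONNX weights, fall back to the PyTorch checkpoint.
    if "model" in assets:
        weights, kind = assets["model"], "onnx"
    elif "checkpoint" in assets:
        weights, kind = assets["checkpoint"], "checkpoint"
    else:
        raise ValueError(f"{item.get('id')}: no model or checkpoint asset")
    runtime = assets.get("mlm:inference")
    if runtime is None:
        raise ValueError(f"{item.get('id')}: no mlm:inference runtime asset")
    return InferencePlan(
        weights_href=weights["href"],
        weights_kind=kind,
        runtime_image=runtime["href"],
        entrypoint=assets.get("source-code", {}).get("mlm:entrypoint"),
        inputs=tuple(item.get("properties", {}).get("mlm:input", [])),
    )
```

The same function serves base and local model items, since local models reuse the parent's runtime image and only swap the weight assets.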

## Identity Model

| Concept | Example | ZenML | STAC |
| --- | --- | --- | --- |
| Base model | `unet-segmentation` | Not in ZenML MCP | Item in `base-models` |
| Finetuned model | `unet-segmentation-finetuned-banepa` | ZenML Model (many versions) | Item(s) in `local-models` |
| Specific version | `unet-segmentation-finetuned-banepa` v2 | ZenML Model Version 2 | Item `unet-segmentation-finetuned-banepa-v2` |
| Dataset | `buildings-banepa-semantic-segmentation` | Not in ZenML MCP | Item in `datasets` |
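
The naming in the table can be captured in a few helpers. The `-finetuned-<suffix>` and `-v<N>` patterns are read off the examples above rather than a documented rule, and the way the suffix (here `banepa`) is chosen is not specified, so it is a parameter. All function names are hypothetical.

```python
import re

# Patterns read off the examples above (assumed, not a documented rule).
_VERSIONED_ID = re.compile(r"^(?P<model>.+)-v(?P<version>\d+)$")


def finetuned_model_name(base_model_id: str, suffix: str) -> str:
    """ZenML model name for a finetune of a base model: '<base>-finetuned-<suffix>'."""
    return f"{base_model_id}-finetuned-{suffix}"


def stac_item_id(zenml_model_name: str, version: int) -> str:
    """STAC item id in local-models for one ZenML model version."""
    if version < 1:
        raise ValueError("version must be a positive integer")
    return f"{zenml_model_name}-v{version}"


def parse_stac_item_id(item_id: str) -> tuple[str, int]:
    """Inverse of stac_item_id: split '<name>-v<N>' back into (name, N)."""
    match = _VERSIONED_ID.match(item_id)
    if match is None:
        raise ValueError(f"not a versioned item id: {item_id!r}")
    return match["model"], int(match["version"])
```

The greedy `.+` in the pattern backtracks to the last `-v<digits>` suffix, so model names that themselves contain `-v` still parse correctly.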

## Infrastructure

| Component | Local | Production |
| --- | --- | --- |
| STAC Catalog | pystac JSON catalog | stac-fastapi + pgstac |
| ZenML | SQLite | ZenML Server (PostgreSQL) |
| Orchestrator | `local` | Kubernetes |
| Artifact Store | local filesystem | S3 |
| Experiment Tracker | MLflow | MLflow |
| Container Registry | local Docker | ghcr.io |
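
Since the STAC backend differs between environments, code that reads items can hide the difference behind one function. A sketch, assuming the local pystac catalog is saved with pystac's best-practices layout and that the production stac-fastapi deployment serves the standard STAC API item endpoint `/collections/{collectionId}/items/{itemId}`. `load_item` and its helpers are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Any, Optional


def local_item_path(catalog_root: Path, collection_id: str, item_id: str) -> Path:
    # Assumed pystac best-practices layout: <root>/<collection>/<item>/<item>.json
    return catalog_root / collection_id / item_id / f"{item_id}.json"


def _segment(value: str) -> str:
    return urllib.parse.quote(value, safe="")


def api_item_url(api_root: str, collection_id: str, item_id: str) -> str:
    # STAC API (OGC API Features) item endpoint
    return f"{api_root.rstrip('/')}/collections/{_segment(collection_id)}/items/{_segment(item_id)}"


def load_item(
    env: str,
    collection_id: str,
    item_id: str,
    *,
    catalog_root: Optional[Path] = None,
    api_root: Optional[str] = None,
) -> dict[str, Any]:
    """Fetch one STAC item from the local JSON catalog or the production STAC API."""
    if env == "local" and catalog_root is not None:
        path = local_item_path(Path(catalog_root), collection_id, item_id)
        return json.loads(path.read_text(encoding="utf-8"))
    if env == "production" and api_root is not None:
        url = api_item_url(api_root, collection_id, item_id)
        with urllib.request.urlopen(url, timeout=30) as response:
            return json.load(response)
    raise ValueError(f"cannot load item for env={env!r} with the given roots")
```

Callers such as the compatibility check or inference planning then work unchanged in both environments.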

??? abstract "Architecture Decisions"

    1. **STAC replaces the ZenML Model Registry**: STAC is a downstream publishing target, written to via `StacCatalogManager`, not a ZenML stack component.
    2. **STAC item = self-sufficient source of truth**: it contains everything needed to run training or inference.
    3. **Finetuned models share their parent's pipeline code**: only the weights differ between base and local models.
    4. **Standards first, `fair:*` only when needed**: prefer `mlm:tasks`, `keywords`, `classification:classes`, and other MLM/STAC fields; use `fair:*` only where MLM has no equivalent (metrics vocabulary, split strategy, hyperparameter spec).
    5. **YAML-based training and inference**: every run is driven by a generated config that is logged as a ZenML artifact.
    6. **MLM Processing Expressions for dispatch**: `pre_processing_function` / `post_processing_function` reference Python entrypoints.
    7. **Pipeline contract**: every model must export `training_pipeline` and `inference_pipeline` as `@pipeline`-decorated functions.
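
The pipeline contract lends itself to an automated check, for example in the PR CI. A minimal sketch, assuming the model code is importable as a Python module (the module path is up to the caller); it only checks that both names exist and are callable, and does not verify that they are ZenML `@pipeline` objects. `check_pipeline_contract` is a hypothetical helper.

```python
import importlib
from types import ModuleType
from typing import Union

# Names every model package must export, per the pipeline contract.
REQUIRED_PIPELINES = ("training_pipeline", "inference_pipeline")


def check_pipeline_contract(module: Union[str, ModuleType]) -> list[str]:
    """Return the required pipeline names that are missing or not callable."""
    mod = importlib.import_module(module) if isinstance(module, str) else module
    return [
        name for name in REQUIRED_PIPELINES
        if not callable(getattr(mod, name, None))
    ]
```

An empty result means the module satisfies the contract; anything else can fail the build with the missing names.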
