This document explains the current MIL path in GloViTa.
MIL is currently supported for:
- bags of precomputed features
- the `precomputed_features` dataset config
- the `precomputed` encoder
- the `clam` head
That means the active MIL workflow is feature-based, not raw-image-bag or raw-video-bag MIL.
The MIL path is built from:
Important design point:
- CLAM consumes raw bag features directly
- it does not use the standard pooled-token feature aggregation path
This is why the model factory distinguishes between:
- normal pooled heads
- raw-feature-consuming heads
The HDF5 loader supports three layouts.
features: (N, D)
labels: (N,)
This is not bag MIL. It is just standard feature-based classification or regression.
features: (B, N, D)
labels: (B,)
features: (M, D)
labels: (B,)
bag_ptr: (B + 1,)
or
features: (M, D)
labels: (B,)
bag_lengths: (B,)
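
The two ragged layouts appear to carry the same information in different forms. The sketch below assumes `bag_ptr` holds CSR-style cumulative offsets starting at 0, which the `(B + 1,)` shape suggests but this document does not state. The helper names `lengths_to_ptr` and `split_bags` are hypothetical and are not part of GloViTa.

```python
import numpy as np

def lengths_to_ptr(bag_lengths):
    """Convert per-bag instance counts (B,) into cumulative offsets (B + 1,)."""
    bag_lengths = np.asarray(bag_lengths, dtype=np.int64)
    return np.concatenate([[0], np.cumsum(bag_lengths)])

def split_bags(features, bag_ptr):
    """Slice flat (M, D) features into a list of B arrays of shape (N_i, D)."""
    return [features[bag_ptr[i]:bag_ptr[i + 1]] for i in range(len(bag_ptr) - 1)]

# Toy data: M = 6 instances with D = 4, grouped into B = 3 bags.
features = np.arange(24, dtype=np.float32).reshape(6, 4)
bag_lengths = np.array([2, 1, 3])
bag_ptr = lengths_to_ptr(bag_lengths)   # assumed offset convention
bags = split_bags(features, bag_ptr)    # one (N_i, D) array per bag
```

Under this assumption, bag `i` occupies rows `bag_ptr[i]` up to (but not including) `bag_ptr[i + 1]` of `features`, and the last offset equals `M`.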
At dataloader time, MIL bags are collated into:
- `features`: padded tensor `(B, N_max, D)`
- `mask`: boolean tensor `(B, N_max)`
This padded structure is what the CLAM head receives.
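
The padding step can be sketched as follows. This is a minimal NumPy illustration, not the actual GloViTa collate function: it assumes zero padding and a mask where `True` marks real instances, neither of which is confirmed here. The name `collate_bags` is hypothetical.

```python
import numpy as np

def collate_bags(bags):
    """Pad a list of (N_i, D) arrays into (B, N_max, D) plus a (B, N_max) mask."""
    n_max = max(bag.shape[0] for bag in bags)
    dim = bags[0].shape[1]
    features = np.zeros((len(bags), n_max, dim), dtype=bags[0].dtype)
    mask = np.zeros((len(bags), n_max), dtype=bool)
    for i, bag in enumerate(bags):
        n = bag.shape[0]
        features[i, :n] = bag   # copy the real instances
        mask[i, :n] = True      # assumed convention: True = real, False = padding
    return features, mask

# Two bags with 2 and 5 instances of dimension 4.
bags = [np.ones((2, 4), dtype=np.float32), np.ones((5, 4), dtype=np.float32)]
features, mask = collate_bags(bags)
```

Here `N_max` is the largest bag size in the batch, so the padded width can differ from one batch to the next.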
The user-facing config class is:
Important options:
- `variant`
  - `sb`
  - `mb`
- `gate`
- `size_arg`
- `dropout`
- `k_sample`
- `subtyping`
- `feature_prep`
- `l2_normalize_features`
- `cosine_head`
- `cosine_scale`
- `instance_eval`
- `instance_loss_weight`
- `attn_drop`
- `stochastic_topk`
- `topk_k`
- `topk_tau`
- `topk_noise_std`
- `topk_consistency_weight`
The main bag prediction is computed from dense attention over the whole bag.
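
To illustrate what dense attention over a padded bag involves, here is a simplified NumPy sketch. It swaps the CLAM attention network for a single linear scoring vector `w_attn` (a hypothetical stand-in), and it assumes the mask uses `True` for real instances and that every bag has at least one real instance. It is not the GloViTa implementation.

```python
import numpy as np

def masked_attention_pool(features, mask, w_attn):
    """Pool (B, N, D) features into (B, D) using softmax attention over valid instances."""
    scores = features @ w_attn                            # (B, N) raw scores
    scores = np.where(mask, scores, -np.inf)              # padding gets zero weight
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    pooled = (weights[..., None] * features).sum(axis=1)  # attention-weighted sum
    return pooled, weights

rng = np.random.default_rng(0)
features = rng.normal(size=(2, 5, 4))
mask = np.array([[True, True, False, False, False],
                 [True, True, True, True, True]])
w_attn = rng.normal(size=4)  # hypothetical linear scorer
pooled, weights = masked_attention_pool(features, mask, w_attn)
```

Every real instance in the bag contributes to `pooled`, which is the sense in which the attention is dense; a classifier applied to `pooled` would then produce the bag-level prediction.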
Optional auxiliary behavior:
- instance-level CLAM auxiliary loss:
  - enabled by `instance_eval=True`
- top-k perturbation consistency penalty:
  - enabled by `stochastic_topk=True`
Important detail:
- the top-k branch is auxiliary
- it does not replace the main bag prediction
glovita_train \
--data.dataset precomputed_features \
--data.data_root_dir . \
--data.num_classes 2 \
--data.train_features_file /path/to/train_bags.h5 \
--data.val_features_file /path/to/val_bags.h5 \
--model.encoder.encoder_type precomputed \
--model.encoder.feature_dim 1024 \
--model.head.head_type clam \
--model.head.variant mb \
--model.head.instance_eval \
--model.head.stochastic_topk \
--dataloading.batch_size 8

Notes:
- `data.num_classes` must be set explicitly
- `model.encoder.feature_dim` must match the feature tensor dimension
- augmentations are not used for `precomputed_features`
- `model.feature_aggregation_method` is ignored by `clam`
- `full_finetuning` is the right PEFT config for this path unless you are reconstructing a checkpoint with a different PEFT method
- active MIL support is feature-based, not raw-bag image/video MIL
- inference and evaluation tooling is primarily oriented around standard classification use cases
- there is no separate generic MIL dataset abstraction yet
- precomputed_features.md: HDF5 file formats and feature extraction
- ../README.md: overall config and CLI model