BertScorer multilabel fine-tuning collapses on sparse data (no pos_weight/focal, no warmup, brittle default LR) #324

Description

@voorhs

Summary

Fine-tuning the BERT scorer on sparse multilabel data collapses to the base rate (near-constant, degenerate outputs) unless the learning rate is tuned within a narrow band. Contributing factors: plain BCEWithLogitsLoss with no pos_weight or focal option (so the trivial solution, predicting the ~4% base rate everywhere, is an easy-to-reach loss plateau), no LR warmup in the TrainingArguments, and a brittle preset default LR.

Where

autointent/modules/scoring/_bert.py, BertScorer._train():

  • TrainingArguments(...) has no warmup_ratio / warmup_steps.
  • Loss is the HF default for problem_type="multi_label_classification" (BCEWithLogitsLoss, no pos_weight).
  • _presets/transformers-no-hpo.yaml ships learning_rate: [7.0e-5]; transformers-light.yaml searches 1e-5…1e-4.
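For context, here is a minimal sketch of what a warmup_ratio setting would change about the per-step learning rate. It assumes a linear warmup followed by linear decay; the function name and the 100-step run length are hypothetical and chosen only for illustration, and the scheduler actually used by BertScorer may differ.

```python
import math


def linear_warmup_multiplier(step: int, total_steps: int, warmup_ratio: float) -> float:
    """LR multiplier in [0, 1] for linear warmup followed by linear decay.

    Hypothetical helper for illustration; not part of AutoIntent.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp from 0 up to the peak LR over the warmup steps.
        return step / max(1, warmup_steps)
    # Decay linearly from the peak LR down to 0 at the last step.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


peak_lr = 2e-5      # the LR that learned in the evidence reported in this issue
total_steps = 100   # assumed run length, illustration only

for step in (0, 5, 10, 55, 100):
    no_warmup = peak_lr * linear_warmup_multiplier(step, total_steps, 0.0)
    with_warmup = peak_lr * linear_warmup_multiplier(step, total_steps, 0.1)
    print(f"step {step:3d}: no warmup {no_warmup:.2e}, warmup 0.1 {with_warmup:.2e}")
```

Without warmup the very first updates run at the full peak LR, which is when a freshly initialized classification head is most easily pushed toward near-constant outputs.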

Evidence (bert-base-uncased, GoEmotions 28-class, ~2.5k balanced rows, MPS)

Best macro-F1 at the optimal threshold, full epochs:

| LR | Result |
| --- | --- |
| 1e-5 | collapse (≈0.03; BCE plateaus at the base-rate floor ~0.17, near-constant outputs) |
| 3e-5 | collapse |
| 2e-5 | learns (≈0.22) |
| 2e-5 + warmup 0.1 | ≈0.18 (warmup alone didn't help here) |

So the stable LR band is narrow and the shipped defaults (3e-5 / 7e-5) land in the collapse region on this task.
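The ~0.17 BCE floor quoted for the collapsed runs can be sanity-checked with a short calculation. The sketch below assumes the label density reported under Environment (mean ~1.18 labels per example over 28 classes) and a model that outputs the same base-rate probability for every class and example; the function name is hypothetical.

```python
import math


def bce_at_constant_prediction(base_rate: float) -> float:
    """Mean binary cross-entropy (natural log) when every output equals the base rate."""
    p = base_rate
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


num_classes = 28
mean_labels_per_example = 1.18  # taken from the Environment section of this issue

# Fraction of (example, class) cells that are positive.
base_rate = mean_labels_per_example / num_classes
floor = bce_at_constant_prediction(base_rate)

print(f"base rate: {base_rate:.3f}")
print(f"BCE at constant base-rate prediction: {floor:.3f}")
```

This gives a base rate of roughly 4% and a loss a little above 0.17, which matches the plateau in the collapsed runs. A training loss that stalls near this value is a quick signal that the model has learned the label frequency and nothing else.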

Suggested fixes (in rough priority)

  1. Expose a class-imbalance loss option (pos_weight or focal loss) via a Trainer subclass with a custom compute_loss. This is the principled fix for sparse multilabel and should widen the stable region substantially.
  2. Add warmup_ratio/warmup_steps to TrainingArguments and expose it as a hyperparameter.
  3. More robust preset defaults for multilabel (e.g. lr ≈ 2e-5 + warmup) so out-of-the-box runs don't silently collapse.

LR sensitivity is partly inherent, but (1)–(2) make it far less knife-edged.
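A rough sketch of suggestion (1), assuming PyTorch and Hugging Face transformers. The names `pos_weight_from_labels`, `multilabel_loss`, and `ImbalanceAwareTrainer` are hypothetical and exist neither in AutoIntent nor in transformers; how such a subclass would be wired into BertScorer._train() is left open.

```python
import torch
import torch.nn.functional as F
from transformers import Trainer


def pos_weight_from_labels(labels: torch.Tensor) -> torch.Tensor:
    """Per-class negatives/positives ratio from a (num_examples, num_classes) 0/1 matrix."""
    positives = labels.sum(dim=0)
    negatives = labels.shape[0] - positives
    # Clamp so a class with no positives does not divide by zero.
    return negatives / positives.clamp(min=1.0)


def multilabel_loss(logits, labels, pos_weight=None, focal_gamma=0.0):
    """BCE-with-logits, optionally with pos_weight and a focal modulating factor.

    With pos_weight=None and focal_gamma=0 this is plain mean BCEWithLogitsLoss.
    """
    labels = labels.float()
    if pos_weight is not None:
        pos_weight = pos_weight.to(logits.device)
    loss = F.binary_cross_entropy_with_logits(
        logits, labels, pos_weight=pos_weight, reduction="none"
    )
    if focal_gamma > 0:
        probs = torch.sigmoid(logits)
        # Probability the model assigns to the true label of each cell.
        p_t = probs * labels + (1.0 - probs) * (1.0 - labels)
        # Down-weight cells that are already classified confidently.
        loss = (1.0 - p_t) ** focal_gamma * loss
    return loss.mean()


class ImbalanceAwareTrainer(Trainer):
    """Hypothetical Trainer subclass that swaps in the loss above."""

    def __init__(self, *args, pos_weight=None, focal_gamma=0.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.pos_weight = pos_weight
        self.focal_gamma = focal_gamma

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # **kwargs absorbs extra arguments that newer transformers versions pass.
        inputs = dict(inputs)
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss = multilabel_loss(
            outputs.logits, labels, self.pos_weight, self.focal_gamma
        )
        return (loss, outputs) if return_outputs else loss
```

In this sketch pos_weight would be computed once from the training label matrix and passed to the trainer's constructor. Combining pos_weight with a focal term is a design choice rather than a requirement; either one alone already raises the cost of the predict-the-base-rate solution relative to plain BCE.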

Environment

AutoIntent 0.3.1, MPS, transformers/transformers-no-hpo presets, GoEmotions multilabel (28 classes, mean ~1.18 labels/example).
