PP-DocLayoutV3 checkpoint/load-path mismatch under transformers 5.4.0 #179

@VooDisss

PP-DocLayoutV3 checkpoint/load-path mismatch in GLM-OCR

Summary

GLM-OCR loads PP-DocLayoutV3 through PPDocLayoutV3ForObjectDetection.from_pretrained(...).
With transformers==5.4.0, that load path reports the decoder detection heads as missing,
even though the published PaddlePaddle/PP-DocLayoutV3_safetensors checkpoint contains the
corresponding trained head weights under enc_* names.

As a result, the decoder heads are newly initialized instead of being populated from the tied trained weights.

Evidence

Runtime startup reported missing decoder head keys such as:

model.decoder.class_embed.weight
model.decoder.class_embed.bias
model.decoder.bbox_embed.layers.0.weight
model.decoder.bbox_embed.layers.0.bias
...

Checkpoint inspection showed these relevant keys:

model.denoising_class_embed.weight
model.enc_score_head.weight
model.enc_score_head.bias
model.enc_bbox_head.layers.0.weight
model.enc_bbox_head.layers.0.bias
model.enc_bbox_head.layers.1.weight
model.enc_bbox_head.layers.1.bias
model.enc_bbox_head.layers.2.weight
model.enc_bbox_head.layers.2.bias

But the model class expects:

model.denoising_class_embed.weight
model.decoder.class_embed.weight
model.decoder.class_embed.bias
model.decoder.bbox_embed.layers.0.weight
model.decoder.bbox_embed.layers.0.bias
model.decoder.bbox_embed.layers.1.weight
model.decoder.bbox_embed.layers.1.bias
model.decoder.bbox_embed.layers.2.weight
model.decoder.bbox_embed.layers.2.bias
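
The mismatch between the two listings can be made explicit with a plain set difference. This is a minimal sketch that uses only the key names quoted in this issue; the helper name `missing_and_unexpected` is hypothetical and not part of transformers or GLM-OCR.

```python
# Key names copied from the checkpoint inspection above.
checkpoint_keys = [
    "model.denoising_class_embed.weight",
    "model.enc_score_head.weight",
    "model.enc_score_head.bias",
] + [
    f"model.enc_bbox_head.layers.{i}.{part}"
    for i in range(3)
    for part in ("weight", "bias")
]

# Key names the model class expects, as listed above.
expected_keys = [
    "model.denoising_class_embed.weight",
    "model.decoder.class_embed.weight",
    "model.decoder.class_embed.bias",
] + [
    f"model.decoder.bbox_embed.layers.{i}.{part}"
    for i in range(3)
    for part in ("weight", "bias")
]


def missing_and_unexpected(expected, present):
    """Return (keys expected but absent, keys present but not expected)."""
    expected_set, present_set = set(expected), set(present)
    return sorted(expected_set - present_set), sorted(present_set - expected_set)


missing, unexpected = missing_and_unexpected(expected_keys, checkpoint_keys)
print("missing:", *missing, sep="\n  ")
print("unexpected:", *unexpected, sep="\n  ")
```

Only `model.denoising_class_embed.weight` appears on both sides; every decoder head key lands in `missing` and every `enc_*` head key lands in `unexpected`.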

The installed transformers implementation also declares these tied mappings:

_tied_weights_keys = {
    "decoder.class_embed": "enc_score_head",
    "decoder.bbox_embed": "enc_bbox_head",
}
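
One way to read this mapping is as target: source pairs, with both sides relative to the `model.` prefix seen in the checkpoint keys. Both of those readings are assumptions made for illustration, not confirmed behavior of transformers. Under them, the checkpoint-to-model prefix pairs can be derived as in this sketch (the helper `expected_alias_prefixes` is hypothetical):

```python
# Mapping copied from the issue; read here as {target: source} (assumption).
tied_weights_keys = {
    "decoder.class_embed": "enc_score_head",
    "decoder.bbox_embed": "enc_bbox_head",
}


def expected_alias_prefixes(tied_keys, root="model."):
    """Build {checkpoint prefix: model prefix} pairs from a tied-key mapping.

    `root` is assumed to be the prefix under which both modules live.
    """
    return {
        f"{root}{source}.": f"{root}{target}."
        for target, source in tied_keys.items()
    }


prefixes = expected_alias_prefixes(tied_weights_keys)
for src, dst in prefixes.items():
    print(f"{src}* -> {dst}*")
```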

Root cause

The checkpoint stores the trained detection heads under encoder-head names:

  • model.enc_score_head.*
  • model.enc_bbox_head.layers.*

But the object-detection wrapper expects decoder-head names:

  • model.decoder.class_embed.*
  • model.decoder.bbox_embed.layers.*

GLM-OCR's original load path did not alias the checkpoint keys before constructing
PPDocLayoutV3ForObjectDetection, so the decoder heads were treated as missing and
initialized from scratch.

Symptoms

  • startup warnings about missing decoder head weights
  • degraded or unstable layout detection in self-hosted OCR runs
  • deprecation warning for PPDocLayoutV3ImageProcessorFast

Minimal fix direction

  1. load the PP-DocLayoutV3 config separately
  2. load model.safetensors directly
  3. alias:
    • model.enc_score_head.* -> model.decoder.class_embed.*
    • model.enc_bbox_head.layers.* -> model.decoder.bbox_embed.layers.*
  4. construct the model with the prepared state dict

Separately, switch from PPDocLayoutV3ImageProcessorFast to PPDocLayoutV3ImageProcessor
to remove the transformers 5.4.0 deprecation warning.
