PP-DocLayoutV3 checkpoint/load-path mismatch under transformers 5.4.0

# PP-DocLayoutV3 checkpoint/load-path mismatch in GLM-OCR

## Summary

GLM-OCR loads PP-DocLayoutV3 through `PPDocLayoutV3ForObjectDetection.from_pretrained(...)`.
With `transformers==5.4.0`, that load path reports the decoder detection heads as missing,
even though the published `PaddlePaddle/PP-DocLayoutV3_safetensors` checkpoint contains the
corresponding trained head weights under `enc_*` names.

As a result, the decoder heads are freshly (randomly) initialized instead of being populated from the tied trained weights.

## Evidence

Runtime startup reported missing decoder head keys such as:

```text
model.decoder.class_embed.weight
model.decoder.class_embed.bias
model.decoder.bbox_embed.layers.0.weight
model.decoder.bbox_embed.layers.0.bias
...
```

Checkpoint inspection showed these relevant keys:

```text
model.denoising_class_embed.weight
model.enc_score_head.weight
model.enc_score_head.bias
model.enc_bbox_head.layers.0.weight
model.enc_bbox_head.layers.0.bias
model.enc_bbox_head.layers.1.weight
model.enc_bbox_head.layers.1.bias
model.enc_bbox_head.layers.2.weight
model.enc_bbox_head.layers.2.bias
```
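The key listing above can be reproduced with a short script. This is a minimal sketch, assuming a local copy of `model.safetensors`; the helper names `select_head_keys` and `list_checkpoint_head_keys` and the `HEAD_MARKERS` tuple are hypothetical and not part of GLM-OCR or `transformers`.

```python
from typing import Iterable, List

# Substrings that identify detection-head tensors (assumption: chosen to match
# the key names quoted in this issue).
HEAD_MARKERS = ("enc_score_head", "enc_bbox_head", "class_embed", "bbox_embed")


def select_head_keys(keys: Iterable[str]) -> List[str]:
    """Return the sorted subset of keys that look like detection-head tensors."""
    return sorted(k for k in keys if any(marker in k for marker in HEAD_MARKERS))


def list_checkpoint_head_keys(path: str) -> List[str]:
    """List head-related tensor names in a safetensors file without loading tensors."""
    from safetensors import safe_open  # imported lazily; needs safetensors and torch

    with safe_open(path, framework="pt") as f:
        return select_head_keys(f.keys())
```

Calling `list_checkpoint_head_keys("model.safetensors")` on the downloaded checkpoint should show whether any `model.decoder.*` head keys are present at all.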

But the model class expects:

```text
model.denoising_class_embed.weight
model.decoder.class_embed.weight
model.decoder.class_embed.bias
model.decoder.bbox_embed.layers.0.weight
model.decoder.bbox_embed.layers.0.bias
model.decoder.bbox_embed.layers.1.weight
model.decoder.bbox_embed.layers.1.bias
model.decoder.bbox_embed.layers.2.weight
model.decoder.bbox_embed.layers.2.bias
```

The installed `transformers` implementation also declares these tied mappings:

```python
_tied_weights_keys = {
    "decoder.class_embed": "enc_score_head",
    "decoder.bbox_embed": "enc_bbox_head",
}
```
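To check that every missing decoder key has a trained counterpart in the checkpoint, the tied mapping can be applied by hand. The sketch below is illustrative only: `tied_source_key` and `recoverable_missing_keys` are hypothetical helpers, and the `model.` prefix is an assumption based on the key names reported above.

```python
from typing import Dict, Iterable, Optional

# Tied mapping as quoted from the installed implementation (target -> source).
TIED = {
    "decoder.class_embed": "enc_score_head",
    "decoder.bbox_embed": "enc_bbox_head",
}


def tied_source_key(expected_key: str, tied: Dict[str, str],
                    prefix: str = "model.") -> Optional[str]:
    """Map an expected decoder key to the checkpoint key it is tied to, if any."""
    for target, source in tied.items():
        target_prefix = prefix + target + "."
        if expected_key.startswith(target_prefix):
            return prefix + source + "." + expected_key[len(target_prefix):]
    return None


def recoverable_missing_keys(missing: Iterable[str],
                             checkpoint_keys: Iterable[str],
                             tied: Dict[str, str] = TIED) -> Dict[str, str]:
    """Return {missing_key: checkpoint_key} for keys whose tied source is present."""
    present = set(checkpoint_keys)
    recoverable = {}
    for key in missing:
        source_key = tied_source_key(key, tied)
        if source_key in present:
            recoverable[key] = source_key
    return recoverable
```

If every key from the startup warning appears in the returned dict, the trained weights exist and the problem is purely one of naming at load time.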

## Root cause

The checkpoint stores the trained detection heads under encoder-head names:

- `model.enc_score_head.*`
- `model.enc_bbox_head.layers.*`

But the object-detection wrapper expects decoder-head names:

- `model.decoder.class_embed.*`
- `model.decoder.bbox_embed.layers.*`

GLM-OCR's original load path did not alias the checkpoint keys before constructing
`PPDocLayoutV3ForObjectDetection`, so the decoder heads were treated as missing and
initialized from scratch.

## Symptoms

- startup warnings about missing decoder head weights
- degraded or unstable layout detection in self-hosted OCR runs
- deprecation warning for `PPDocLayoutV3ImageProcessorFast` (a separate issue, unrelated to the head-key mismatch)

## Minimal fix direction

1. load the PP-DocLayoutV3 config separately
2. load `model.safetensors` directly
3. alias:
   - `model.enc_score_head.*` -> `model.decoder.class_embed.*`
   - `model.enc_bbox_head.layers.*` -> `model.decoder.bbox_embed.layers.*`
4. construct the model with the prepared state dict
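The steps above could be sketched roughly as follows. This is not GLM-OCR's actual patch: `alias_head_keys` and `load_layout_model` are hypothetical names, and building the model from the config and then calling `load_state_dict(..., strict=False)` is one assumed way to "construct the model with the prepared state dict". The original `enc_*` keys are kept alongside the aliases so that both sides of the tie are populated.

```python
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")

# Checkpoint prefix -> prefix expected by the object-detection wrapper.
ALIASES = {
    "model.enc_score_head.": "model.decoder.class_embed.",
    "model.enc_bbox_head.layers.": "model.decoder.bbox_embed.layers.",
}


def alias_head_keys(state_dict: Mapping[str, T]) -> Dict[str, T]:
    """Copy the state dict and add decoder-head aliases for the encoder-head keys."""
    prepared = dict(state_dict)
    for key, value in state_dict.items():
        for src, dst in ALIASES.items():
            if key.startswith(src):
                # Do not overwrite a decoder key the checkpoint already provides.
                prepared.setdefault(dst + key[len(src):], value)
    return prepared


def load_layout_model(repo_id: str = "PaddlePaddle/PP-DocLayoutV3_safetensors"):
    """Load config and weights separately, alias the head keys, then build the model."""
    # Lazy imports: these need huggingface_hub, safetensors, torch and transformers.
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file
    from transformers import AutoConfig, PPDocLayoutV3ForObjectDetection

    config = AutoConfig.from_pretrained(repo_id)                       # step 1
    weights_path = hf_hub_download(repo_id=repo_id, filename="model.safetensors")
    state_dict = alias_head_keys(load_file(weights_path))              # steps 2 and 3
    model = PPDocLayoutV3ForObjectDetection(config)                    # step 4
    result = model.load_state_dict(state_dict, strict=False)
    return model.eval(), result
```

After loading, `result.missing_keys` should no longer list any `model.decoder.class_embed.*` or `model.decoder.bbox_embed.*` entries; if it does, the aliasing did not match the actual key layout.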

Separately, switch from `PPDocLayoutV3ImageProcessorFast` to `PPDocLayoutV3ImageProcessor`
to remove the deprecation warning emitted under `transformers==5.4.0`.

