segmentation: tesseract5.3.0 vs ocrd/all:2022-08-15 #346

@jbarth-ubhd

I'm wondering why the recognition results differ between standalone tesseract5.3.0 and OCR-D with ocrd-olena-binarize && ocrd-tesserocr-segment.

Original TIF: https://digi.ub.uni-heidelberg.de/diglitData/v/heidelberg1592_-_04manual.tif

Result using tesseract5.3.0 -l Fraktur_GT4Hist... (right column = ground truth)
image

and using tesserocr-segment and calamari-recognize (fraktur_historical1.0) with OCR-D:
image

and using tesserocr-segment and tesserocr-recognize (Fraktur_GT4Hist...) with OCR-D:
image

It seems that the OCR-D "tesserocr" segmentation differs somewhat from the segmentation of standalone tesseract (perhaps because of olena-binarize?), but I can't find any big change to line/region segmentation etc. in the tesseract changelog over the last year.
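To put a number on how far each of the three outputs is from the ground truth, a character error rate (CER) could be computed per result. The following is only a minimal sketch: the function names `levenshtein` and `cer` are hypothetical helpers, not part of tesseract or OCR-D, and it assumes each OCR result and the ground truth are available as plain text strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def cer(ocr_text: str, ground_truth: str) -> float:
    """Character error rate: edit distance divided by ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)
```

Calling `cer(result_text, ground_truth_text)` once per pipeline would give three comparable values. Note that a segmentation difference (missing or reordered lines) shows up here as a large number of deletions or insertions, so the CER alone does not distinguish segmentation errors from recognition errors.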
