frak models in ocrd resmgr #404

@jbarth-ubhd

I compared these two frak2021 models:

ocrd: https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/frak2021/tessdata_best/frak2021-0.905.traineddata (as provided by ocrd resmgr)

ubma: https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/frak2021/tessdata_fast/frak2021_1.069.traineddata (as linked from https://ocr-bw.bib.uni-mannheim.de/faq/)

Size and md5sum:

-rw-rw-r-- 1 jb jb 3421140 Mär 27  2021 ocrd--frak2021-0.905.traineddata
234e8bb819042f615576bd01aa2419fd  ocrd--frak2021-0.905.traineddata
-rw-rw-r-- 1 jb jb 5060763 Dez  9  2021 ubma--frak2021_1.069.traineddata
9405b1603db21cb066e4e7614a405dd4  ubma--frak2021_1.069.traineddata
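
To check that a local copy matches the digests above, something like the following should work. It is a minimal sketch, assuming GNU md5sum is available and the files are saved under the local names shown in the listing; check_md5 is a helper name made up for illustration, not part of ocrd or tesseract.

```shell
# Sketch: verify a downloaded model against an expected MD5 digest.
# check_md5 is an illustrative helper name (assumption, not an existing tool).
check_md5() {
    # $1 = path to the file, $2 = expected MD5 hex digest
    actual=$(md5sum "$1" | cut -d' ' -f1)
    # Succeeds (exit status 0) only if the digests are identical.
    [ "$actual" = "$2" ]
}

# Usage with the digests listed above (file names as saved locally):
#   check_md5 ocrd--frak2021-0.905.traineddata 234e8bb819042f615576bd01aa2419fd && echo "ocrd ok"
#   check_md5 ubma--frak2021_1.069.traineddata 9405b1603db21cb066e4e7614a405dd4 && echo "ubma ok"
```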

Contents after unpacking each file with combine_tessdata -u x.traineddata aa:

jb@nuc:~/models$ LC_ALL=C ls -lh ocrd ubma
ocrd:
total 3.3M
-rw-rw-r-- 1 jb jb 3.3M Dec 21 12:18 aa.lstm
-rw-rw-r-- 1 jb jb 2.8K Dec 21 12:18 aa.lstm-recoder
-rw-rw-r-- 1 jb jb  22K Dec 21 12:18 aa.lstm-unicharset
-rw-rw-r-- 1 jb jb   30 Dec 21 12:18 aa.version
-rw-rw-r-- 1 jb jb  345 Dec 21 12:18 extr.log

ubma:
total 4.9M
-rw-rw-r-- 1 jb jb 432K Dec 21 12:18 aa.lstm
-rw-rw-r-- 1 jb jb 6.3K Dec 21 12:18 aa.lstm-number-dawg
-rw-rw-r-- 1 jb jb 4.5K Dec 21 12:18 aa.lstm-punc-dawg
-rw-rw-r-- 1 jb jb 2.8K Dec 21 12:18 aa.lstm-recoder
-rw-rw-r-- 1 jb jb  22K Dec 21 12:18 aa.lstm-unicharset
-rw-rw-r-- 1 jb jb 4.4M Dec 21 12:18 aa.lstm-word-dawg
-rw-rw-r-- 1 jb jb   30 Dec 21 12:18 aa.version
-rw-rw-r-- 1 jb jb  553 Dec 21 12:18 extr.log
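
The comparison of the two listings can also be scripted. The sketch below assumes combine_tessdata is on the PATH and that each model is unpacked into its own directory with the prefix aa, as above; the function names unpack_model and only_in_first are hypothetical helpers, not existing tools.

```shell
# Sketch: unpack traineddata files and list components found in only one of them.
# unpack_model and only_in_first are illustrative names (assumptions).

# Unpack a .traineddata file ($1) into directory $2 using the prefix "aa",
# so the components appear as aa.lstm, aa.lstm-word-dawg, and so on.
unpack_model() {
    mkdir -p "$2"
    combine_tessdata -u "$1" "$2/aa"
}

# Print the components (file name without the "aa." prefix) that exist
# in directory $1 but not in directory $2.
only_in_first() {
    for f in "$1"/aa.*; do
        [ -e "$f" ] || continue          # skip the literal pattern if nothing matched
        name=${f##*/}                    # strip the directory part
        [ -e "$2/$name" ] || printf '%s\n' "${name#aa.}"
    done
}

# Usage (file names as saved locally):
#   unpack_model ocrd--frak2021-0.905.traineddata ocrd
#   unpack_model ubma--frak2021_1.069.traineddata ubma
#   only_in_first ubma ocrd    # components present only in the ubma model
```

This only compares which components exist; it says nothing about their contents or about recognition quality.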

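The ubma model contains the dictionary components (.lstm-word-dawg, .lstm-number-dawg, .lstm-punc-dawg); the ocrd model contains none of them.

The LSTM component is 3.3M in the ocrd model but only 432K in the ubma model. Going by the URLs, the ocrd file comes from tessdata_best and the ubma file from tessdata_fast.

Shouldn't ocrd resmgr offer the ubma file for Fraktur/Gothic?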

Labels: question (further information is requested)
