REM ~ ###################################################################################################################
REM ~ Searchable Image PDF Creat-O-Mat
SET VERSION=1.2
REM ~ This script creates a searchable PDF out of a PDF with one or more scanned pages. You can drag and drop one or multiple PDF files onto this batch file to start the process.
REM ~ You can also use the command line: <script name> [pdf filename #1] [pdf filename #2] ... [pdf filename #n]
REM ~
REM ~ Author: TB / License: MIT / https://github.com/timberger/Searchable-Image-PDF-Creat-O-Mat/
REM ~
REM ~ Prerequisites:
REM ~ ImageMagick (7.0.8-27 and newer) https://imagemagick.org/ | License: https://imagemagick.org/script/license.php
REM ~ Ghostscript (9.x) https://www.ghostscript.com/
REM ~ Tesseract (4.0 and newer) https://github.com/tesseract-ocr/tesseract/wiki | License: http://www.apache.org/licenses/LICENSE-2.0
REM ~ OS: Microsoft Windows 7 (with PowerShell); 8; 8.1
REM ~
REM ~ Preferences:
REM ~ (leave no whitespace between the folder name and the '=' / do not use "):
REM ~ SRCLANG contains the abbreviations of the installed Tesseract languages to be recognized in the scanned files [default: eng]. Multiple languages are joined with '+', e.g. deu+eng - see https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
SET SRCLANG=deu
REM ~ The scanned page can optionally be deskewed before it is processed with Tesseract [default: true / alternative: false]. Deskewing is recommended because it increases the success rate of the OCR software, but it takes more time.
SET DESKEW=true
REM ~ RESULTFOLDER is the folder where the searchable PDF will be stored (%CD% is the current working directory) [default: %CD%\results]
SET RESULTFOLDER=%CD%\searchable_PDF
REM ~ TMPFOLDER is the folder where the extracted image files will be stored temporarily (the folder will be created and removed automatically during each run) [default: %CD%\temp]
SET TMPFOLDER=%CD%\temp
REM ~ After ImageMagick and Tesseract have created the new PDF file, it usually has a bigger file size. It can be re-packed with Ghostscript, which compresses the images to a certain resolution, e.g. screen (72 dpi), ebook (150 dpi), printer (300 dpi), prepress (300 dpi + color preserving)
SET REPACKPROFILE=printer
REM ~ ###################################################################################################################

REM ~ clear the screen (/ the command line window)
CLS
ECHO OFF

REM ~ starting the stop watch
SET StartPosition=%time:~0,8%

REM ~ command line window candy: blue background color / white font color (not in Windows 10)
COLOR 1F

ECHO ### Searchable Image PDF Creat-O-Mat %VERSION% ###

REM ~ Checking the preferences
REM ~ Does the ImageMagick location exist?
IF NOT EXIST "%IMAGEMAGIC%" (
    ECHO The ImageMagick location seems to be wrong. Please check the preferences.
    GOTO :SCRIPTEND
)
REM ~ Does the Ghostscript location exist?
IF NOT EXIST "%GHOSTSCRIPT%" (
    ECHO The Ghostscript location seems to be wrong. Please check the preferences.
    GOTO :SCRIPTEND
)
REM ~ Does the Tesseract location exist?
IF NOT EXIST "%TESSERACT%" (
    ECHO The Tesseract location seems to be wrong. Please check the preferences.
    GOTO :SCRIPTEND
)
REM ~ Does the Tesseract language package abbreviation match the expected pattern?
REM ~ `SHIFT` moves every argument down by one position: `%1` receives the content of the second argument (`%2`), `%2` the content of the third argument (`%3`), and so on
SHIFT

REM ~ If AMOUNT_OF_FILES is less than or equal to the total number of files/arguments (ARGCOUNT) and the next argument is not an empty string, repeat the last step. Otherwise continue to the end of the script.
IF %AMOUNT_OF_FILES% LEQ %ARGCOUNT% IF NOT "%~1"=="" (
    GOTO :LOOP
)
:LOOPEND

REM ~ remove the temp folder
RMDIR "%TMPFOLDER%"

REM ~ setting the colors back to default
COLOR

REM ~ determining the duration (with the help of https://stackoverflow.com/questions/42603119/arithmetic-operations-with-hhmmss-times-in-batch-file/42603985#42603985)
SET EndPosition=%time:~0,8%
SET /A "ss=(((1%EndPosition::=-100)*60+1%-100)-(((1%StartPosition::=-100)*60+1%-100)"
SET /A "hh=ss/3600+100,ss%%=3600,mm=ss/60+100,ss=ss%%60+100"
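REM ~ Worked example of the arithmetic above (illustrative values, not taken from a real run):
REM ~ With EndPosition set to 12:34:56, every ':' is replaced by '-100)*60+1', so the first half of the expression
REM ~ becomes (((112-100)*60+134-100)*60+156-100) = (12*60+34)*60+56 = 45296 seconds since midnight.
REM ~ The leading '1' and the '-100' keep values such as 08 and 09 from being parsed as invalid octal numbers.
REM ~ For a difference of ss=3725 the second line yields hh=101, mm=102 and ss=105. The leading '1' only serves as
REM ~ zero padding: dropping the first digit of each value gives the duration 01:02:05.
REM ~ Caveat (depends on the regional settings): before 10:00 the time variable may start with a space instead of a zero, which would break the expression.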