-Convert images and scans to searchable and selectable (and merged) PDFs! The core logic resides in a Python script that you could run yourself, if you really wanted to. It extracts all the files from `todo`, transforms them with Tesseract via [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF), and loads them into `done`. Files in subfolders will be merged in alphabetical order, but will still be available individually.
+Convert images and scans to searchable and selectable (and merged) PDFs! The core logic resides in a Python script that extracts all the files from `todo`, transforms them with Tesseract via [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF), and loads them into `done`.
+
+> [!NOTE]
+> Files in subfolders will be merged in alphabetical order, but will still be available individually.

 I recommend you use either:

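The `todo` to `done` flow described in the README text above can be sketched roughly as follows. This is only an illustration of the described behavior, not the repository's actual script: the function name `plan_jobs`, the `.pdf` output naming, and the choice to name each merged file after its subfolder are all assumptions.

```python
from pathlib import Path


def plan_jobs(todo: Path, done: Path):
    """Plan the OCR and merge work for every file under `todo`.

    Returns (ocr_jobs, merge_jobs):
      ocr_jobs   - list of (input_file, output_pdf) pairs, one per input file
      merge_jobs - dict mapping a merged PDF to the per-file PDFs it combines,
                   in alphabetical order (one entry per subfolder)

    Hypothetical helper: names and output layout are assumptions.
    """
    ocr_jobs = []
    merge_jobs = {}
    # Sorting the paths keeps files inside each subfolder in alphabetical order.
    for src in sorted(p for p in todo.rglob("*") if p.is_file()):
        rel = src.relative_to(todo)
        out = (done / rel).with_suffix(".pdf")
        # Every file gets its own output, so it stays available individually.
        ocr_jobs.append((src, out))
        # Files inside a subfolder are additionally queued for merging.
        if rel.parent != Path("."):
            merged = (done / rel.parent).with_name(rel.parent.name + ".pdf")
            merge_jobs.setdefault(merged, []).append(out)
    return ocr_jobs, merge_jobs
```

Each `(input_file, output_pdf)` pair could then be handed to OCRmyPDF, and each list in `merge_jobs` to a PDF merging tool of your choice; neither step is shown here.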
@@ -37,19 +39,19 @@ It's as easy as 1, 2, 3! Get up and going in no time with these options:

 Are you on mobile or simply want an easy and seamless experience?

-1. Open [Colab](https://colab.research.google.com/github/ipitio/ocr-pdf/blob/master/colab.ipynb) in your browser
-2. Follow the instructions in the notebook
+1. Run the [Colab](https://colab.research.google.com/github/ipitio/ocr-pdf/blob/master/colab.ipynb) cell in your browser
+2. Follow the prompts to upload your files
 3. Find the OCR'd files in your [Drive](https://drive.google.com/drive/my-drive)`/ocr-pdf`

-To add OCRmyPDF options, append them to the `run` command in the code cell.
+To add OCRmyPDF options, append them to the `run` command.

 ### Self-hosted: Prebuilt Docker Image

 If you want to skip building an image, just use mine:

 1. Install Docker, such as with Docker Desktop
 2. Make a new `pdf` folder and put your files in `pdf/todo`
-3. Run the following command from `pdf/..` to convert the files and move them into `pdf/done`
+3. Run the following command from the parent of `pdf` to convert the files and move them into `pdf/done`

 ```bash
 docker run --rm \
@@ -62,13 +64,13 @@ docker run --rm \

 It's still easy as 1, 2, 3! You'll find the OCR'd files in `pdf/done`.

-1. First (fork and) clone this repo
+1. Fork and clone this repo
 2. `cd` into it and put your files in `pdf/todo`
 3. Complete one of the following:

 ### Cloud: GitHub Actions Workflow

-If you made a fork and cloned it, Git is your best friend!
+Enable Actions and push your files:

 ```bash
 git add .
@@ -82,19 +84,19 @@ To add OCRmyPDF options, edit the command in the `predict.yml` file before commi

 ### Self-hosted

-#### Build Docker Image
+#### Docker Compose Service

-If you aren't on Linux, or want to avoid polluting your system, use Docker Compose (which is included with Docker Desktop):
+If you want to avoid polluting your system, use Docker Compose (which is included with Docker Desktop):

 ```bash
 docker compose up
 ```

 To add OCRmyPDF options, edit the command in the `compose.yml` file.

-#### Use Bare Metal
+#### Bash Install Script

-Are you on Linux and want to make the most out of it?
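Several sections in this README diff tell the reader to add OCRmyPDF options to a command. As a rough illustration of what "appending options" amounts to, the sketch below builds an `ocrmypdf` argument list from a user-supplied option string. The helper name `build_command` is hypothetical, and it is an assumption that the project's `run` command ultimately invokes the `ocrmypdf` command-line tool this way; `--deskew` and `-l` are existing OCRmyPDF options used only as examples.

```python
import shlex


def build_command(input_file: str, output_file: str, extra_options: str = "") -> list[str]:
    """Build an `ocrmypdf` argument list with extra options appended.

    Hypothetical helper, not part of the repository.
    """
    # shlex.split honors shell-style quoting, so a quoted value such as
    # --title "My Scan" stays a single argument.
    return ["ocrmypdf", input_file, output_file, *shlex.split(extra_options)]


if __name__ == "__main__":
    # Example: deskew pages and OCR in English and German.
    print(build_command("in.pdf", "out.pdf", "--deskew -l eng+deu"))
```

The resulting list can be passed to `subprocess.run` without a shell, which avoids quoting problems with file names that contain spaces.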
 " - The files in each zip will be merged in alphabetical order\n",
 "- If you'd like to add any options for [OCRmyPDF](https://ocrmypdf.readthedocs.io/en/latest), append them to line 23 in the cell below\n",
 "- The upload button will appear below the cell after running it\n",
-"- At the end, you'll be offered a zip of the converted (and merged) files to download locally, whether or not Drive was connected\n",
+"- Depending on your browser's settings, the resulting files will either be automatically downloaded or you will be prompted to save them, whether or not Drive was connected\n",