* Include exception when batch entry fails
* Clarify normalization when multiple drivers included in single BatchTask
* Circuit breaker for authentication and quota exceptions
* Parse command use batch mode, remove sync execution
* Always prefix file outputs when using parse command
---------
Co-authored-by: avvertix <5672748+avvertix@users.noreply.github.com>
docs/howto/batch_processing.md
+34 lines changed: 34 additions & 0 deletions
@@ -101,6 +101,40 @@ When `stop_on_error=True`:
 - Only completed results (including the failed one) are returned


+## Circuit Breaker
+
+Batch processing includes a built-in circuit breaker that detects systemic driver failures and short-circuits remaining tasks for the affected driver. This prevents wasting API calls and time when a driver is guaranteed to fail (e.g., invalid API key, exhausted quota).
+
+The circuit breaker trips immediately (after a single failure) for these exception types:
+
+| Exception | Meaning |
+|---|---|
+| `AuthenticationException` | API key or token is invalid |
+| `QuotaExceededException` | Account balance or credits exhausted |
+| `RateLimitException` | Rate limit hit |
+
+Per-file errors like `FileNotFoundException` or `ParsingException` do **not** trip the circuit, since they are specific to individual files and don't indicate a driver-wide problem.
+
+The circuit breaker is **per-driver**: if LlamaParse fails with an authentication error, PyMuPDF tasks continue unaffected. Short-circuited results carry the original tripping exception in `BatchResult.exception` and `BatchResult.error`.
+
+A new circuit breaker is created for each `batch()` / `batch_iter()` call, so previous failures do not carry over between calls.
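Note on the behavior described above: the per-driver tripping logic can be illustrated with a small, self-contained sketch. This is not Parxy's implementation. The exception classes are local stand-ins that only reuse the names from the table, and `CircuitBreaker`, `run_batch`, and `fake_parse` are hypothetical helpers invented for the illustration. The sketch also runs tasks sequentially, which may differ from how the library schedules work.

```python
# Local stand-ins for the exception names listed in the docs (not imported from Parxy).
class AuthenticationException(Exception): pass
class QuotaExceededException(Exception): pass
class RateLimitException(Exception): pass
class ParsingException(Exception): pass

# Exception types treated as driver-wide failures.
TRIPPING = (AuthenticationException, QuotaExceededException, RateLimitException)


class CircuitBreaker:
    """Hypothetical breaker that remembers the first tripping exception per driver."""

    def __init__(self):
        self._tripped = {}  # driver name -> exception that tripped the circuit

    def tripped_by(self, driver):
        return self._tripped.get(driver)

    def record(self, driver, exc):
        # Per-file errors (e.g. ParsingException) are ignored here.
        if isinstance(exc, TRIPPING):
            self._tripped.setdefault(driver, exc)


def run_batch(tasks, drivers, parse):
    """Run every (task, driver) pair; `parse` is a caller-supplied callable."""
    breaker = CircuitBreaker()  # fresh per call, so state never carries over
    results = []  # tuples of (task, driver, output, exception)
    for task in tasks:
        for driver in drivers:
            tripping = breaker.tripped_by(driver)
            if tripping is not None:
                # Short-circuit: reuse the original exception, make no call.
                results.append((task, driver, None, tripping))
                continue
            try:
                results.append((task, driver, parse(task, driver), None))
            except Exception as exc:
                breaker.record(driver, exc)
                results.append((task, driver, None, exc))
    return results


calls = []

def fake_parse(task, driver):
    """Fake driver call: 'llamaparse' always fails authentication."""
    calls.append((task, driver))
    if driver == "llamaparse":
        raise AuthenticationException("invalid API key")
    return f"{driver}:{task}"

results = run_batch(
    ["doc1.pdf", "doc2.pdf", "doc3.pdf"],
    ["llamaparse", "pymupdf"],
    fake_parse,
)
```

In this sketch the failing driver is called once, for `doc1.pdf`. The entries for `doc2.pdf` and `doc3.pdf` reuse the same exception object without another call, and every `pymupdf` task still runs.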
+
+```python
+results = Parxy.batch(
+    tasks=['doc1.pdf', 'doc2.pdf', 'doc3.pdf'],
+    drivers=['llamaparse', 'pymupdf'],
+)
+
+for result in results:
+    if result.failed:
+        # If llamaparse auth fails on doc1, doc2 and doc3 are
+        # short-circuited immediately, i.e. no additional API calls.
docs/tutorials/using_cli.md
+5 -15 lines changed: 5 additions & 15 deletions
@@ -26,13 +26,13 @@ The `parse` command is a powerful tool for extracting text from documents with e
 ### Basic Usage

-Parse a single document using the default settings (PyMuPDF driver, markdown output):
+Parse a single document using the default settings (PyMuPDF driver, JSON output):

 ```bash
 parxy parse document.pdf
 ```

-This creates a `document.md` file in the same directory as the source file.
+This creates a `pymupdf-document.json` file in the same directory as the source file. Parxy always prefixes the output file name with the driver name.

 ### Processing Multiple Files and Folders

@@ -103,29 +103,19 @@ Specify a driver with the `--driver` (`-d`) option:

 ```bash
 parxy parse --driver llamaparse document.pdf
+# output will be saved as llamaparse-document.json
 ```

 ### Using Multiple Drivers for Comparison

-Parse the same document(s) with multiple drivers by specifying `--driver` multiple times:
+Parse the same document(s) with multiple drivers by specifying `--driver` (or `-d` for short) multiple times:

 ```bash
 parxy parse document.pdf -d pymupdf -d llamaparse
 ```

-When using multiple drivers, Parxy automatically appends the driver name to the output filenames:
-- `document_pymupdf.md`
-- `document_llamaparse.md`
+When using multiple drivers, Parxy always prepends the driver name to the output filenames, e.g. `pymupdf-document.json`, `llamaparse-document.json`. This is particularly useful for comparing extraction quality across different parsers.

-This is particularly useful for comparing extraction quality across different parsers.
-
-### Showing Output in Console
-
-By default, output is only saved to files. To also display content in the console, use the `--show` (`-s`) flag: