
Commit add5925

Align markdown command to parse command (#58)
1 parent 64b32d3 commit add5925

3 files changed

Lines changed: 444 additions & 170 deletions


docs/tutorials/using_cli.md

Lines changed: 50 additions & 9 deletions
@@ -12,7 +12,7 @@ The Parxy CLI lets you:
 |------------------|-------------------------------------------------------------------------------------------------------------|
 | `parxy parse` | Extract text content from documents with customizable detail levels and output formats. Process files or folders with multiple drivers. |
 | `parxy preview` | Interactive document viewer with metadata, table of contents, and scrollable content preview |
-| `parxy markdown` | Convert parsed documents into Markdown format (optionally combine multiple files) |
+| `parxy markdown` | Convert documents to Markdown files, with support for multiple drivers and folder processing |
 | `parxy pdf:merge`| Merge multiple PDF files into one, with support for page ranges |
 | `parxy pdf:split`| Split a PDF file into individual pages |
 | `parxy drivers` | List available document processing drivers |
@@ -176,27 +176,68 @@ This is ideal for quick document inspection before running a full parsing operat
 
 ## Converting to Markdown
 
-The `markdown` command converts parsed documents into Markdown format, preserving structure such as headings and lists.
+The `markdown` command converts documents to Markdown format, preserving structure such as headings and lists. It follows the same conventions as the `parse` command: output files are prefixed with the driver name and saved next to the source file by default.
+
+### Basic Usage
 
 ```bash
 parxy markdown document.pdf
 ```
 
-Output is printed to the console by default. To save Markdown files to disk:
+This creates a `pymupdf-document.md` file in the same directory as the source file.
+
+### Processing Multiple Files and Folders
 
 ```bash
-parxy markdown -o output/ document1.pdf document2.pdf
+# Parse multiple files
+parxy markdown doc1.pdf doc2.pdf doc3.pdf
+
+# Parse all PDFs in a folder (non-recursive by default)
+parxy markdown /path/to/folder
+
+# Parse recursively
+parxy markdown /path/to/folder --recursive
+
+# Limit recursion depth
+parxy markdown /path/to/folder --recursive --max-depth 2
 ```
 
-Each document will be saved as a `.md` file.
+### Output Directory
+
+```bash
+parxy markdown document.pdf -o output/
+```
 
-To combine multiple documents into a single Markdown file:
+### Using Multiple Drivers
+
+Run the same documents through multiple drivers for comparison:
+
+```bash
+parxy markdown document.pdf -d pymupdf -d llamaparse
+```
+
+This produces `pymupdf-document.md` and `llamaparse-document.md`.
+
+### Inline Output
+
+Use `--inline` with a single file to print markdown directly to stdout with a YAML frontmatter header — useful for shell pipelines:
 
 ```bash
-parxy markdown --combine -o output/ doc1.pdf doc2.pdf doc3.pdf
+parxy markdown document.pdf --inline
+parxy markdown document.pdf --inline | your-tool
 ```
 
-This will generate a file named `combined_output.md` in the output directory.
+Output format:
+
+```markdown
+---
+file: "document.pdf"
+pages: 10
+---
+
+# Document heading
+...
+```
 
 
 ## Manipulating PDFs
@@ -317,7 +358,7 @@ With the CLI, you can use Parxy as a **standalone document parsing tool** — id
 |------------------|--------------------------------------------------------------|
 | `parxy parse` | Extract text from documents with multiple formats & drivers |
 | `parxy preview` | Interactive document viewer with metadata and TOC |
-| `parxy markdown` | Generate Markdown output |
+| `parxy markdown` | Generate Markdown files with driver prefix naming |
 | `parxy pdf:merge`| Merge multiple PDF files with page range support |
 | `parxy pdf:split`| Split PDF files into individual pages |
 | `parxy drivers` | List supported drivers |
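The updated tutorial says `parxy markdown` names each output `<driver>-<stem>.md` (for example `pymupdf-document.md`) and writes it next to the source file unless `-o` is given, with one file per `-d` driver. The shell sketch below predicts those paths, which may help when scripting around the command. `expected_markdown_paths` is a hypothetical helper, not part of Parxy. It assumes that only the last extension is dropped from the file name and that `-o output/` keeps the same file name inside `output/`. The diff does not state either point explicitly.

```bash
# Hypothetical helper (not part of Parxy): print the Markdown path(s) that
# `parxy markdown` is expected to write for one source file.
# Usage: expected_markdown_paths SOURCE OUT_DIR DRIVER...
# Pass "" as OUT_DIR to model the default (no -o option).
expected_markdown_paths() {
  local src=$1 out_dir=$2 dir base stem driver
  shift 2
  dir=$(dirname "$src")          # default: same folder as the source file
  if [ -n "$out_dir" ]; then
    # Assumption: with -o, the same file name is used inside OUT_DIR.
    dir=${out_dir%/}
  fi
  base=$(basename "$src")
  stem=${base%.*}                # drop the last extension: document.pdf -> document
  for driver in "$@"; do
    # One file per driver, named <driver>-<stem>.md
    printf '%s/%s-%s.md\n' "$dir" "$driver" "$stem"
  done
}

# Mirrors: parxy markdown docs/document.pdf -d pymupdf -d llamaparse
expected_markdown_paths docs/document.pdf "" pymupdf llamaparse
# docs/pymupdf-document.md
# docs/llamaparse-document.md
```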
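For folder inputs, the tutorial says only the top level of the folder is processed unless `--recursive` is passed. A `find` dry run like the following sketch can preview which files a folder run might pick up. The helper name `preview_pdf_inputs` is hypothetical. The sketch assumes Parxy collects only `*.pdf` files with case-sensitive matching, since the tutorial's comment mentions PDFs. `--max-depth` is not modeled because the diff does not define how depth is counted.

```bash
# Hypothetical helper (not part of Parxy): dry-run listing of the PDFs a folder
# run might process. Assumes only *.pdf files count (case-sensitive match).
# Usage: preview_pdf_inputs FOLDER [recursive]
preview_pdf_inputs() {
  local folder=$1 mode=${2:-}
  if [ "$mode" = recursive ]; then
    # Like --recursive: include PDFs in every subfolder.
    find "$folder" -type f -name '*.pdf' | sort
  else
    # Default: only PDFs directly inside FOLDER.
    find "$folder" -maxdepth 1 -type f -name '*.pdf' | sort
  fi
}

# Small throwaway tree to show the difference.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/a.pdf" "$demo/notes.txt" "$demo/sub/b.pdf"
preview_pdf_inputs "$demo"             # lists only a.pdf
preview_pdf_inputs "$demo" recursive   # lists a.pdf and sub/b.pdf
rm -rf "$demo"
```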
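The `--inline` output format documented in the diff consists of a `---`-delimited YAML header with `file` and `pages`, followed by the Markdown body. In a pipeline, plain `awk` can split the two parts. This is a minimal sketch. It assumes the header holds only flat `key: value` lines, as in the documented example. `frontmatter_value` and `markdown_body` are hypothetical helpers, and a real YAML parser is the safer choice for anything richer.

```bash
# Hypothetical helpers (not part of Parxy) for the documented --inline format.
# Assumes the header is a '---' line, flat `key: value` lines, then '---'.

# Print the value of one header key, with surrounding double quotes removed.
frontmatter_value() {
  awk -v key="$1" '
    $0 == "---" { fm++; if (fm == 2) exit; next }
    fm == 1 && index($0, key ": ") == 1 {
      value = substr($0, length(key) + 3)
      gsub(/^"|"$/, "", value)
      print value
      exit
    }'
}

# Print everything after the closing '---' (the Markdown body).
markdown_body() {
  awk 'fm < 2 && $0 == "---" { fm++; next } fm >= 2'
}

# Sample input copied from the documented output format.
sample='---
file: "document.pdf"
pages: 10
---

# Document heading'

printf '%s\n' "$sample" | frontmatter_value pages   # prints 10
printf '%s\n' "$sample" | markdown_body              # blank line, then "# Document heading"

# Real use (requires Parxy):
# parxy markdown document.pdf --inline | markdown_body > document.md
```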
