
Commit b6f6f10

feat(pdf-reader): add skill for PDF text extraction fallback
Submitted by: https://github.com/divitkashyap

## What

Added a skill that automatically detects when an agent cannot read PDFs and provides text extraction using command-line tools, with optional installation after user confirmation.

## Why

Many AI agents lack native PDF reading capability. When they encounter a PDF, they either:

- Fail to help the user
- Give generic responses about not being able to access PDF content

This skill intercepts that situation and provides a complete fallback workflow using standard tools (pdftotext, pdfplumber, pymupdf).

## How It Works

1. **Detection**: Monitors for agent statements like 'I cannot read PDFs', 'I don't have the ability to read PDFs', etc.
2. **Tool Detection**: Checks for available tools in priority order: pdftotext → pdfplumber → pymupdf
3. **Installation**: If no tool is found, asks the user for permission and offers platform-specific install commands
4. **Extraction**: Extracts PDF text to /tmp/pdf_extracted.txt
5. **Continuation**: Reads the extracted text and proceeds with the original user task

## Tool Priority

1. **pdftotext** (poppler-utils): preferred; the fastest option and a system-level tool
2. **pdfplumber** (Python): fallback if poppler is not available
3. **pymupdf** (Python): alternative Python fallback

## Platform Support

- **macOS**: Homebrew (`brew install poppler`) or pip
- **Linux (Ubuntu/Debian)**: `apt-get install poppler-utils` or pip
- **Linux (Fedora/RHEL)**: `dnf install poppler-utils` or pip
- **Windows**: winget/Chocolatey or pip

## Key Features

- Automatic detection of the agent's PDF limitation
- Multi-tool fallback strategy
- User confirmation before installation
- Platform-specific installation commands
- Layout preservation option (`-layout` flag)
- Page range extraction support (`-f`, `-l` flags)
- Error handling for encrypted/protected PDFs

## Example Triggers

- 'I cannot read PDFs'
- 'I don't have the ability to read PDFs'
- 'I can't access PDF content'
- 'PDF reading is not supported'

## Files

- `skills/pdf-reader/SKILL.md`: complete skill with workflow
- `README.md`: updated with the new skill entry

## Validation

All 15 skills pass: `python .claude/skills/pr-review/scripts/validate_skills.py` ✅
1 parent 1391b63 commit b6f6f10

2 files changed

Lines changed: 207 additions & 0 deletions


README.md

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ Development skills for AI coding agents. Plug into your favorite AI coding tool
| `minimax-xlsx` | Open, create, read, analyze, edit, or validate Excel/spreadsheet files (.xlsx, .xlsm, .csv, .tsv). Covers creating new xlsx from scratch via XML templates, reading and analyzing with pandas, editing existing files with zero format loss, formula recalculation, validation, and professional financial formatting. | Official |
| `minimax-docx` | Professional DOCX document creation, editing, and formatting using OpenXML SDK (.NET). Three pipelines: create new documents from scratch, fill/edit content in existing documents, or apply template formatting with XSD validation gate-check. | Official |
| `vision-analysis` | Analyze, describe, and extract information from images using vision AI models. Supports describe, OCR, UI mockup review, chart data extraction, and object detection. Powered by MiniMax VL API with OpenAI GPT-4V fallback. | Community |
| `pdf-reader` | Automatically detect when an agent cannot read PDFs and provide text extraction fallback using command-line tools (pdftotext/poppler-utils) with automatic detection, optional installation with user confirmation, and multi-platform support (macOS, Linux, Windows). | Community |
| `minimax-multimodal-toolkit` | Generate voice, music, video, and image content via MiniMax APIs — the unified entry for MiniMax multimodal use cases. Covers TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract) via FFmpeg. | Official |

## Installation

skills/pdf-reader/SKILL.md

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
---
name: pdf-reader
description: >
  Automatically handle PDF text extraction when an agent lacks native PDF reading capability.
  Use when: an agent responds with phrases like "I cannot read PDFs", "I don't have the ability to
  read PDFs", "I can't access PDF content", "PDF reading is not supported", or similar statements
  indicating the agent cannot process PDF files directly.
  This skill intercepts that situation and provides a fallback workflow to extract PDF text using
  command-line tools (pdftotext/poppler-utils), with automatic detection and optional installation.
  Triggers: any message where the agent states it cannot read PDFs or lacks PDF capability.
license: MIT
metadata:
  version: "1.0"
  category: document-processing
  sources:
    - poppler-utils (pdftotext)
    - pdfplumber (Python alternative)
  submitted_by: https://github.com/divitkashyap
---

# PDF Reader Skill

Automatically detect when an agent cannot read PDFs and provide a text-extraction fallback using command-line tools, asking the user for confirmation before installing anything.

## Workflow

### Step 1: Detect PDF Reading Limitation

When the agent states that it cannot read PDFs (phrases such as "I cannot read PDFs" or "I don't have the ability to read PDFs"), activate this skill automatically.
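
The trigger check in Step 1 can be approximated with a regular expression. This is a hypothetical sketch: the pattern set is an assumption derived only from the example phrases listed in the frontmatter, it is not exhaustive, and the function name `indicates_pdf_limitation` is illustrative. How the agent's reply reaches such a check depends on the harness.

```python
import re

# Hypothetical trigger patterns, derived from the example phrases in this skill.
# Both straight (') and curly (’) apostrophes are accepted, since model output uses either.
_LIMITATION = re.compile(
    r"""
    \bI\s+(?:cannot|can[’']t|can\s+not|am\s+unable\s+to)\s+(?:read|open|access)\s+(?:the\s+)?PDF
    | \bI\s+don[’']t\s+have\s+the\s+ability\s+to\s+read\s+PDF
    | \bPDF\s+reading\s+is\s+not\s+supported
    """,
    re.IGNORECASE | re.VERBOSE,
)


def indicates_pdf_limitation(message: str) -> bool:
    """Return True if an agent reply looks like a 'cannot read PDFs' statement."""
    return _LIMITATION.search(message) is not None


if __name__ == "__main__":
    print(indicates_pdf_limitation("Sorry, I can't access PDF content directly."))  # True
    print(indicates_pdf_limitation("Here is a summary of the PDF."))  # False
```

Matching is case-insensitive, so lowercase variants such as "i cannot read pdfs" also trigger; phrasings outside these patterns would need additional alternatives.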

### Step 2: Identify the Target PDF

Extract the PDF file path from the user's original request, then confirm that the file exists:

```bash
ls -la "/path/to/document.pdf"
```
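
Beyond `ls -la`, it can help to confirm that the path really points at a PDF before choosing an extraction tool. A minimal sketch, assuming the agent can run Python; `check_pdf_path` is a hypothetical helper, and it only looks for the `%PDF-` marker near the start of the file (a lenient sanity check, not full validation).

```python
from pathlib import Path


def check_pdf_path(path: str) -> Path:
    """Hypothetical helper: verify the file exists and starts like a PDF."""
    pdf = Path(path).expanduser()
    if not pdf.is_file():
        raise FileNotFoundError(f"No such file: {pdf}")
    # PDF files begin with a '%PDF-' header; search the first 1 KiB to tolerate
    # stray leading bytes.
    with pdf.open("rb") as fh:
        head = fh.read(1024)
    if b"%PDF-" not in head:
        raise ValueError(f"{pdf} does not look like a PDF (no %PDF- header)")
    return pdf
```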

### Step 3: Check for Available PDF Tools

Check which PDF text extraction tools are available on the system:

```bash
# Check for pdftotext (poppler-utils)
which pdftotext || echo "NOT_FOUND"

# Check for pdfplumber (Python)
python3 -c "import pdfplumber; print('FOUND')" 2>/dev/null || echo "NOT_FOUND"

# Check for pymupdf (imported as fitz)
python3 -c "import fitz; print('FOUND')" 2>/dev/null || echo "NOT_FOUND"
```
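
The same availability checks can be run from Python in one pass. This is a sketch, not part of the skill as submitted: it uses `shutil.which` for the binary and `importlib.util.find_spec` for the libraries. Note that `find_spec` only sees packages installed for the interpreter running the check, which may differ from whatever `python3` resolves to on `PATH`.

```python
import importlib.util
import shutil


def detect_pdf_tools() -> dict:
    """Report which extraction tools are usable, keyed by the names used in Step 4."""
    return {
        # pdftotext is a standalone binary shipped with poppler / poppler-utils
        "pdftotext": shutil.which("pdftotext") is not None,
        # pdfplumber and PyMuPDF are Python packages; PyMuPDF imports as "fitz"
        "pdfplumber": importlib.util.find_spec("pdfplumber") is not None,
        "pymupdf": importlib.util.find_spec("fitz") is not None,
    }


if __name__ == "__main__":
    print(detect_pdf_tools())
```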

### Step 4: Tool Selection Priority

Select the best available tool in this order:

1. **`pdftotext`** (poppler-utils) - Preferred, fastest, system-level tool
2. **`pdfplumber`** (Python) - Fallback if poppler is not available
3. **`pymupdf`** (Python) - Alternative Python fallback
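
Selecting a tool is then a first-match scan over the priority list. A minimal sketch; `select_tool` and the `available` mapping (tool name to bool, as a Step 3 check might produce) are illustrative names, not part of the skill.

```python
from typing import Mapping, Optional

# Priority order from Step 4
PRIORITY = ("pdftotext", "pdfplumber", "pymupdf")


def select_tool(available: Mapping[str, bool]) -> Optional[str]:
    """Return the highest-priority available tool, or None if Step 5 is needed."""
    for name in PRIORITY:
        if available.get(name):
            return name
    return None
```

For example, `select_tool({"pdftotext": False, "pdfplumber": True, "pymupdf": True})` returns `"pdfplumber"`, and an empty mapping returns `None`, which is the cue to move on to installation.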

### Step 5: Installation (If Needed)

If no tool is found, ask the user for permission to install one:

```
I need to install a PDF text extraction tool to read this PDF.

Available options:
1. pdftotext (poppler-utils) - Fast, system-level tool [Recommended]
2. pdfplumber - Python library alternative

Shall I proceed with installation? (y/n)
```

**Installation commands by platform:**

**macOS:**
```bash
brew install poppler  # Installs pdftotext
# OR
pip3 install pdfplumber
```

**Linux (Ubuntu/Debian):**
```bash
sudo apt-get install poppler-utils
# OR
pip3 install pdfplumber
```

**Linux (Fedora/RHEL):**
```bash
sudo dnf install poppler-utils
# OR
pip3 install pdfplumber
```

**Windows:**
```powershell
# winget package IDs for poppler vary; search for one before installing
winget search poppler
# OR
pip install pdfplumber
```
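
The platform choice and the confirmation step can be combined so that nothing is installed without an explicit yes. This sketch rests on several assumptions: it picks the poppler command from `platform.system()` and from which Linux package manager is on `PATH`, returns `None` on Windows (where no single package ID is given above), installs pdfplumber with `python -m pip` for the current interpreter, and all helper names are hypothetical. It only builds and confirms the command; running it (for example with `subprocess.run(cmd, check=True)`) is left to the caller.

```python
import platform
import shutil
import sys
from typing import Callable, List, Optional


def poppler_install_command(
    system: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return the platform's poppler install command, or None if unknown."""
    system = system or platform.system()
    if system == "Darwin":
        return ["brew", "install", "poppler"]
    if system == "Linux":
        if which("apt-get"):  # Ubuntu/Debian
            return ["sudo", "apt-get", "install", "poppler-utils"]
        if which("dnf"):  # Fedora/RHEL
            return ["sudo", "dnf", "install", "poppler-utils"]
    return None  # e.g. Windows: fall back to pip_install_command()


def pip_install_command(package: str = "pdfplumber") -> List[str]:
    """Install a Python fallback into the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def confirmed(command: List[str], ask: Callable[[str], str] = input) -> bool:
    """Ask the user before installing; only an explicit 'y' or 'yes' counts."""
    answer = ask(f"I need to run: {' '.join(command)}\nShall I proceed? (y/n) ")
    return answer.strip().lower() in {"y", "yes"}
```

A typical call sequence is `cmd = poppler_install_command() or pip_install_command()` followed by `if confirmed(cmd): ...`. The `ask` parameter defaults to `input` but can be swapped for whatever prompt mechanism the agent uses.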

### Step 6: Extract PDF Text

Once a tool is available, extract text from the PDF:

**Using pdftotext:**
```bash
pdftotext -layout "/path/to/document.pdf" /tmp/pdf_extracted.txt
```

**Using pdfplumber (Python):**
```python
import pdfplumber

with pdfplumber.open("/path/to/document.pdf") as pdf:
    text = ""
    for page in pdf.pages:
        # extract_text() may return None or an empty string for pages without a text layer
        page_text = page.extract_text()
        if page_text:
            text += page_text + "\n\n"

with open("/tmp/pdf_extracted.txt", "w", encoding="utf-8") as f:
    f.write(text)
```

**Using pymupdf (Python):**
```python
import fitz  # PyMuPDF's import name

doc = fitz.open("/path/to/document.pdf")
text = ""
for page in doc:
    text += page.get_text() + "\n\n"
doc.close()

with open("/tmp/pdf_extracted.txt", "w", encoding="utf-8") as f:
    f.write(text)
```
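
Step 6 presents each tool on its own. When the preferred tool fails on a particular file, the agent can fall through to the next one. A minimal sketch, assuming the priority order from Step 4: the three extractor functions wrap the same calls as the snippets above (with `pdftotext` writing to stdout by using `-` as its output file), while `extract_with_fallback` and the other names are hypothetical. An extractor that returns only whitespace is treated as a failure, which also catches PDFs that have no text layer.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Extractor = Callable[[str], str]


def via_pdftotext(pdf_path: str) -> str:
    # "-" as the output file makes pdftotext write the text to stdout
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout


def via_pdfplumber(pdf_path: str) -> str:
    import pdfplumber  # imported lazily so a missing package only affects this tool

    with pdfplumber.open(pdf_path) as pdf:
        return "\n\n".join(page.extract_text() or "" for page in pdf.pages)


def via_pymupdf(pdf_path: str) -> str:
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        return "\n\n".join(page.get_text() for page in doc)


def extract_with_fallback(
    pdf_path: str, extractors: Sequence[Tuple[str, Extractor]]
) -> Tuple[str, str]:
    """Try each (name, extractor) in order; return (name, text) from the first success."""
    errors: List[str] = []
    for name, extract in extractors:
        try:
            text = extract(pdf_path)
        except Exception as exc:  # missing binary or package, unreadable PDF, non-zero exit
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: no text extracted")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


DEFAULT_EXTRACTORS = [
    ("pdftotext", via_pdftotext),
    ("pdfplumber", via_pdfplumber),
    ("pymupdf", via_pymupdf),
]
```

Usage would be `name, text = extract_with_fallback("/path/to/document.pdf", DEFAULT_EXTRACTORS)`, after which `text` can be written to `/tmp/pdf_extracted.txt` as in the snippets above. The collected error strings give the agent something concrete to report if every tool fails.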

### Step 7: Read Extracted Text

Read the extracted text file and present it to the user:

```bash
cat /tmp/pdf_extracted.txt
```
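
For long documents, `cat` pushes the whole file into the agent's context at once. One option is to read the extracted text in bounded pieces. The sketch below is an assumption, not part of the submitted skill: it splits on the blank lines that the Step 6 snippets place between pages, hard-splits any single block longer than the limit, and the 20,000-character default is arbitrary. `chunk_text` and `read_extracted` are hypothetical names.

```python
from pathlib import Path
from typing import List


def chunk_text(text: str, max_chars: int = 20_000) -> List[str]:
    """Group blank-line-separated blocks into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for block in text.split("\n\n"):
        candidate = f"{current}\n\n{block}" if current else block
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single block longer than the limit is split at fixed offsets
        while len(block) > max_chars:
            chunks.append(block[:max_chars])
            block = block[max_chars:]
        current = block
    if current:
        chunks.append(current)
    return chunks


def read_extracted(path: str = "/tmp/pdf_extracted.txt", max_chars: int = 20_000) -> List[str]:
    """Read the Step 6 output file and return it as a list of chunks."""
    return chunk_text(Path(path).read_text(encoding="utf-8", errors="replace"), max_chars)
```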

### Step 8: Continue Original Task

After extracting and presenting the PDF content, proceed with the user's original request, using the extracted text as context.

## Platform-Specific Notes

### macOS

- pdftotext is provided by Homebrew's `poppler` formula: `brew install poppler`
- The Python libraries work with the system `python3` or a pyenv-managed Python

### Linux

- Most distributions ship `poppler-utils` in their package repositories
- pdfplumber and pymupdf are typically installed with pip

### Windows

- Upstream poppler publishes source releases only; Windows binaries come from third-party builds or package managers such as winget or Chocolatey
- The Python libraries are the simpler option on Windows: `pip install pdfplumber`

## Common Errors and Solutions

| Error | Cause | Solution |
|-------|-------|----------|
| `pdftotext: command not found` | poppler-utils is not installed | Install it via the package manager or use a Python alternative |
| `Permission denied` | Output directory is not writable | Write the output to `/tmp/` |
| `File not found` | Wrong PDF path | Verify the path with `ls -la` |
| `PDF extraction failed` | Encrypted or protected PDF | Inform the user and suggest manual extraction |
| `pdftotext: syntax error` | Malformed PDF | Retry with `-raw` instead of `-layout`, or fall back to pdfplumber/pymupdf |
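
When the agent drives pdftotext from a script, its exit status narrows down which row of the table applies. A sketch, assuming the exit codes documented in the pdftotext man page (0 no error, 1 error opening the PDF, 2 error opening the output file, 3 error related to PDF permissions, 99 other error); the hint wording and the function names are illustrative. A missing binary surfaces in Python as `FileNotFoundError` rather than as an exit code, so the wrapper handles that case separately.

```python
import subprocess
from typing import Optional

# Exit codes as listed in the pdftotext man page; hints map them to the table above.
EXIT_HINTS = {
    1: "Could not open the PDF: verify the path with ls -la; it may be damaged or encrypted.",
    2: "Could not open the output file: write to a writable directory such as /tmp/.",
    3: "PDF permissions error: inform the user and suggest manual extraction.",
    99: "Other error: retry with -raw instead of -layout, or use pdfplumber/pymupdf.",
}


def hint_for_exit_code(code: int) -> Optional[str]:
    """Return None on success, otherwise a short troubleshooting hint."""
    if code == 0:
        return None
    return EXIT_HINTS.get(code, f"Unexpected pdftotext exit code {code}.")


def run_pdftotext(pdf_path: str, out_path: str) -> Optional[str]:
    """Run pdftotext and return a hint if something went wrong (None on success)."""
    try:
        proc = subprocess.run(
            ["pdftotext", "-layout", pdf_path, out_path],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        return "pdftotext is not installed: install poppler or use a Python alternative."
    return hint_for_exit_code(proc.returncode)
```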

## Alternative Flags for pdftotext

```bash
# Basic extraction (default: text in reading order)
pdftotext input.pdf output.txt

# Preserve the physical page layout (not the default)
pdftotext -layout input.pdf output.txt

# Keep text in content-stream order (no layout reconstruction)
pdftotext -raw input.pdf output.txt

# Extract specific pages (first page 1, last page 5)
pdftotext -f 1 -l 5 input.pdf output.txt

# Extract to stdout ("-" as the output file)
pdftotext input.pdf -
```
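
When these flags are assembled programmatically, passing an argument list (rather than a shell string) avoids quoting problems with paths that contain spaces. A sketch; `build_pdftotext_cmd` and its `mode` parameter are hypothetical, and it covers only the flags shown above.

```python
from typing import List, Optional

# Output modes from the block above; "default" adds no flag
_MODE_FLAGS = {"default": [], "layout": ["-layout"], "raw": ["-raw"]}


def build_pdftotext_cmd(
    pdf: str,
    out: str = "-",  # "-" sends the text to stdout
    mode: str = "default",
    first: Optional[int] = None,
    last: Optional[int] = None,
) -> List[str]:
    """Build a pdftotext argument list from the options shown above."""
    if mode not in _MODE_FLAGS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(_MODE_FLAGS)}")
    cmd = ["pdftotext", *_MODE_FLAGS[mode]]
    if first is not None:
        cmd += ["-f", str(first)]  # first page to convert
    if last is not None:
        cmd += ["-l", str(last)]  # last page to convert
    return cmd + [pdf, out]
```

For example, `subprocess.run(build_pdftotext_cmd("report 2024.pdf", "/tmp/pdf_extracted.txt", mode="layout"), check=True)` passes the space-containing filename through unchanged.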

## File Size Limits

- For PDFs larger than 50MB, extract page ranges instead of the entire document
- Use the `-f` and `-l` flags to process the document in chunks if needed
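
Chunked extraction needs the page count and a list of page ranges. A sketch under two assumptions: the page count comes from poppler's `pdfinfo` (installed alongside pdftotext), whose output includes a `Pages:` line, and the 20-page chunk size is an arbitrary illustrative default. `pages_from_pdfinfo` and `page_ranges` are hypothetical names.

```python
import re
from typing import List, Tuple


def pages_from_pdfinfo(output: str) -> int:
    """Parse the 'Pages:' line from `pdfinfo file.pdf` output."""
    match = re.search(r"^Pages:\s+(\d+)", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line found in pdfinfo output")
    return int(match.group(1))


def page_ranges(total_pages: int, chunk: int = 20) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (first, last) ranges for -f/-l."""
    if chunk < 1:
        raise ValueError("chunk must be at least 1")
    return [
        (start, min(start + chunk - 1, total_pages))
        for start in range(1, total_pages + 1, chunk)
    ]
```

For a 45-page document, `page_ranges(45)` yields `[(1, 20), (21, 40), (41, 45)]`; each pair then becomes one `pdftotext -f FIRST -l LAST input.pdf chunk.txt` run.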
