Commit 1251cbe

chore: merge file_support information into docs on extracting_data
1 parent 2eba32f commit 1251cbe

2 files changed

Lines changed: 135 additions & 131 deletions

docs/10_extracting_data.md

Lines changed: 135 additions & 1 deletion
@@ -7,7 +7,7 @@ The `get()` method allows you to extract information from the screen. You can us
 - Get text or data from the screen
 - Check the state of UI elements
 - Make decisions based on screen content
-- Analyze static images and documents (see [11_file_support.md](11_file_support.md))
+- Analyze static images and documents
 
 ## Basic Usage
 
@@ -32,6 +32,140 @@ button_count = agent.get("How many buttons are visible on this page?")

Instead of taking a screenshot, you can also analyze specific images or documents. Please refer to [11_file_support.md](11_file_support.md) for detailed instructions.

## File Support

### Overview
The AskUI Python SDK can analyze documents in the file formats listed below.

**Supported File Formats**
- PDF Files (.pdf)
- Excel Files (.xlsx, .xls)
- Word Files (.docx, .doc)
- CSV Files (.csv), processed as plain text only

**Model Compatibility Matrix**

| File Format         | AskUI Gemini | Anthropic Claude | Google Gemini |
| ------------------- | ------------ | ---------------- | ------------- |
| PDF (.pdf)          | ✅           | ❌               | ✅            |
| Excel (.xlsx, .xls) | ✅           | ✅               | ✅            |
| Word (.docx, .doc)  | ✅           | ✅               | ✅            |
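
The matrix can be encoded as a small lookup table for a pre-flight check before calling `get()`. This is only a sketch: `COMPATIBILITY`, the provider keys, and `is_supported` are hypothetical names that are not part of the AskUI SDK, and the values simply mirror the table above.

```python
# Hypothetical lookup mirroring the compatibility matrix above.
# The keys and the helper are illustrative names, not part of the AskUI SDK.
COMPATIBILITY = {
    "pdf": {"askui_gemini": True, "anthropic_claude": False, "google_gemini": True},
    "excel": {"askui_gemini": True, "anthropic_claude": True, "google_gemini": True},
    "word": {"askui_gemini": True, "anthropic_claude": True, "google_gemini": True},
}


def is_supported(file_format: str, provider: str) -> bool:
    """Return True only if the matrix marks the format as supported by the provider."""
    return COMPATIBILITY.get(file_format, {}).get(provider, False)
```

Unknown formats or providers fall back to `False`, so a typo fails closed rather than letting an unsupported document through.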

**General Limitations**
- **Model Restrictions**: Not all models support all document formats (see the compatibility matrix above)
- **No Caching Mechanism**: Every document file is re-processed on each `get()` call
- **Performance Impact**: Each document adds its own processing operation to every run
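
Because nothing is cached between calls, repeated questions about the same document can be memoized on the caller's side. The wrapper below is a minimal sketch, not an SDK feature: `make_cached_get` is a hypothetical helper, and it assumes the wrapped callable accepts a query string and a `source` keyword as in the usage examples in this document.

```python
from typing import Any, Callable, Dict, Tuple


def make_cached_get(get: Callable[..., Any]) -> Callable[[str, str], Any]:
    """Wrap a get-like callable so repeated (query, source) pairs are served from memory."""
    cache: Dict[Tuple[str, str], Any] = {}

    def cached_get(query: str, source: str) -> Any:
        key = (query, source)
        if key not in cache:
            # First time this pair is seen: delegate to the real call.
            cache[key] = get(query, source=source)
        return cache[key]

    return cached_get
```

With an agent in scope this could be used as `cached_get = make_cached_get(agent.get)`. The cache is keyed by the path string only, so it will not notice when the file on disk changes.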


### 📄 PDF Files (.pdf)

- **MIME Types**: `application/pdf`
- **Maximum File Size**: 20MB
- **Processing Method**: **Depends on Usage Context**


**Processing Workflow for PDF Files:**
```mermaid
graph TD
    A[Call agent.get with PDF] --> B[Load as PdfSource]
    B --> C[Send directly as binary to Gemini]
    C --> D[Gemini processes content]
    D --> E[Return results directly]
    E --> F[No storage - process again for next call]
```

**PDF-Specific Limitations**

- **20MB file size limit** for PDF files
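
The size limit can be checked locally before a PDF is passed to `get()`. A minimal sketch: `pdf_within_limit` is a hypothetical helper, and reading "20MB" as 20 × 1024 × 1024 bytes is an assumption, since the document does not say whether decimal or binary megabytes are meant.

```python
import os

# Assumption: "20MB" is interpreted as 20 MiB; adjust if the limit is decimal.
MAX_PDF_BYTES = 20 * 1024 * 1024


def pdf_within_limit(path: str, max_bytes: int = MAX_PDF_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `max_bytes`."""
    return os.path.getsize(path) <= max_bytes
```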

### 📊 Excel Files (.xlsx, .xls)

- **MIME Types**:
  - `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (.xlsx)
  - `application/vnd.ms-excel` (.xls)

**Processing Workflow for Excel Files:**

```mermaid
graph TD
    A[Call agent.get with Excel] --> B[Load as OfficeDocumentSource]
    B --> C[Convert to Markdown using markitdown - NO AI]
    C --> D[Process with Gemini]
    D --> E[Return results directly]
    E --> F[No storage - convert again for next call]
```

**Features**:
- Sheet names are preserved in the markdown output
- Tables are converted to markdown table format
- Optimized for LLM token usage
- Deterministic conversion process (same input = same output)

**Excel-Specific Limitations**

- **File size limit** depends on the model used
- Conversion quality depends on the capabilities of the [`markitdown`](https://github.com/microsoft/markitdown) library
- Complex formatting may be simplified during markdown conversion
- Details of embedded objects (charts, complex tables) may be lost
- **No AI in conversion**: Conversion is deterministic and rule-based, not AI-powered
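
The conversion step can be reproduced outside the SDK to inspect what the model will see. The sketch below assumes the `markitdown` package is installed with the extras needed for the file type; `convert_to_markdown` and `is_deterministic` are hypothetical helper names, and how the SDK itself invokes the library is not shown in this document.

```python
from typing import Callable


def convert_to_markdown(path: str) -> str:
    """Convert an Office document to Markdown using markitdown (imported lazily)."""
    # Requires the markitdown package; imported here so the module loads without it.
    from markitdown import MarkItDown

    return MarkItDown().convert(path).text_content


def is_deterministic(convert: Callable[[str], str], path: str) -> bool:
    """Run the converter twice on the same input and compare the outputs."""
    return convert(path) == convert(path)
```

Calling `is_deterministic(convert_to_markdown, "sales_report.xlsx")` is one way to spot-check the "same input = same output" property on a concrete file.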

### 📝 Word Documents (.docx, .doc)

- **MIME Types**:
  - `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (.docx)
  - `application/msword` (.doc)

**Processing Workflow for Word Documents:**

```mermaid
graph TD
    A[Call agent.get with Word] --> B[Load as OfficeDocumentSource]
    B --> C[Convert to Markdown using markitdown - NO AI]
    C --> D[Process with Gemini]
    D --> E[Return results directly]
    E --> F[No storage - convert again for next call]
```

**Features**:
- Layout and formatting are preserved as far as possible
- Tables are converted to HTML tables within the markdown
- Deterministic conversion process (same input = same output)

**Word-Specific Limitations**

- **File size limit** depends on the model used
- Conversion quality depends on the capabilities of the [`markitdown`](https://github.com/microsoft/markitdown) library
- Complex formatting may be simplified during markdown conversion
- Details of embedded objects (charts, complex tables) may be lost
- **No AI in conversion**: Conversion is deterministic and rule-based, not AI-powered
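
The MIME types listed for PDF, Excel, and Word files can be collected into one extension lookup, for example to reject unsupported files early. This is a sketch only: `MIME_TYPES` and `document_mime_type` are hypothetical names, and the SDK's own detection logic may differ.

```python
from pathlib import Path
from typing import Optional

# Extension-to-MIME mapping copied from the format sections of this document.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".xls": "application/vnd.ms-excel",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".doc": "application/msword",
}


def document_mime_type(path: str) -> Optional[str]:
    """Return the documented MIME type for the file extension, or None if unlisted."""
    return MIME_TYPES.get(Path(path).suffix.lower())
```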

### 📈 CSV Files (.csv)

- **Status**: **Not directly supported**
- CSV files are passed to the LLM as regular text content

**CSV-Specific Limitations**

- **No specialized CSV parsing**: The row and column structure is not preserved
- **Text-only processing**: The file is treated as regular text content by the LLM
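
Since CSV structure is not preserved, one possible workaround is to make the structure explicit before the text reaches the model. The sketch below converts CSV text into a markdown table using only the standard library; `csv_to_markdown` is a hypothetical helper, not part of the SDK, and how the resulting text is handed to the model is left open here.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a markdown table, treating the first row as the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```

Cells containing a `|` character are not escaped in this sketch, so such data would need extra handling.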


### Usage Examples

```python
from askui import ComputerAgent

with ComputerAgent() as agent:
    # PDF: pass the file path as the source instead of taking a screenshot
    result = agent.get("Summarize the main points", source="document.pdf")

    # Excel: converted to markdown before the model sees it
    result = agent.get("Extract the quarterly sales data", source="sales_report.xlsx")

    # Word: converted to markdown before the model sees it
    result = agent.get("Extract all action items", source="meeting_notes.docx")
```

## Structured Data Extraction

### Overview

docs/11_file_support.md

Lines changed: 0 additions & 130 deletions
This file was deleted.
