Commit f770555

fsecada01 and claude committed
Enhance documentation landing page with better UX
- Reorganize main guide page with progressive disclosure pattern
- Add context-based navigation (tabs for different user paths)
- Improve visual hierarchy with clear sections and formatting
- Add emoji icons for visual scanning
- Link to all guide pages prominently
- Include quick-start code and CLI examples on landing page
- Inspired by SQLModel-CRUD-Utilities documentation approach

Improves user experience by guiding visitors based on their needs rather than forcing a linear read-through.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
1 parent 68ebc08 commit f770555

1 file changed

Lines changed: 122 additions & 24 deletions

TextSpitter/guide/__init__.py
@@ -1,49 +1,147 @@
 """
-# TextSpitter — User Guide
+# TextSpitter Documentation
 
-Welcome to the TextSpitter documentation.
-TextSpitter extracts plain text from documents and source-code files with a
-single call, normalising every input type (file path, `BytesIO`, `SpooledTemporaryFile`,
-raw `bytes`) into a `str`.
+## Welcome to TextSpitter
+
+**Transforming documents into insights, effortlessly and efficiently.**
+
+TextSpitter extracts plain text from documents and source-code files with a single call.
+It normalises every input type — file paths, `BytesIO` streams, `SpooledTemporaryFile` objects,
+and raw `bytes` — into plain strings, making it ideal for LLM pipelines, search engines,
+and data-processing workflows.
 
 ---
 
-## Pages in this guide
+## 📚 Start Here
+
+Choose your path based on what you want to do:
+
+<details open>
+<summary><strong>⚡ I want to extract text right now</strong></summary>
+
+Start with **[Quick Start](quickstart.html)** to install and run your first extraction in under 2 minutes.
+
+```python
+from TextSpitter import TextSpitter
+
+text = TextSpitter(filename="report.pdf")
+print(text[:500])
+```
+
+</details>
+
+<details>
+<summary><strong>🎯 I need to understand how TextSpitter works</strong></summary>
+
+Read the **[Technical Overview](overview.html)** for architecture, module design, and implementation details.
+
+Covers: three-layer design, input resolution, PDF fallback chains, encoding strategy, and logging.
+
+</details>
+
+<details>
+<summary><strong>🔍 I want to learn by example</strong></summary>
+
+Follow the **[Tutorial](tutorial.html)** for a format-by-format walkthrough covering:
+- PDF extraction (with PyMuPDF + pypdf fallback)
+- DOCX extraction via FastAPI
+- TXT & CSV with encoding handling
+- Source code files (50+ extensions)
+- Direct `FileExtractor` and `WordLoader` usage
+
+</details>
+
+<details>
+<summary><strong>💼 I'm building a real application</strong></summary>
+
+Check **[Common Use Cases](usecases.html)** for production patterns:
+- Web APIs (FastAPI, Django/DRF)
+- Cloud storage (AWS S3)
+- LLM pipelines (LangChain, OpenAI embeddings)
+- Batch processing (directory trees, parallel extraction)
+- Logging strategies
+
+</details>
 
-| Page | Description |
-|------|-------------|
-| `TextSpitter.guide.overview` | Architecture and design decisions |
-| `TextSpitter.guide.quickstart` | Install and run your first extraction |
-| `TextSpitter.guide.tutorial` | Format-by-format walkthrough |
-| `TextSpitter.guide.usecases` | FastAPI, S3, LangChain, batch processing … |
-| `TextSpitter.guide.recipes` | Copy-paste snippets |
+<details>
+<summary><strong>📋 I need a code snippet</strong></summary>
+
+Browse **[Recipes](recipes.html)** for copy-paste snippets covering:
+- Input handling (BytesIO, SpooledTemporaryFile, raw bytes)
+- Format-specific extraction
+- Error and encoding handling
+- Testing patterns
+
+</details>
 
 ---
 
-## Supported formats
+## Supported Formats
 
-| Format | Reader | Notes |
+| Format | Method | Notes |
 |--------|--------|-------|
-| PDF | `pdf_file_read` | PyMuPDF → pypdf fallback |
-| DOCX | `docx_file_read` | python-docx paragraph extraction |
-| TXT | `text_file_read` | UTF-8 → latin-1 → UTF-8-replace |
-| CSV | `csv_file_read` | Same encoding cascade as TXT |
-| Source code | `code_file_read` | 50 + extensions |
+| **PDF** | `pdf_file_read()` | PyMuPDF → pypdf fallback |
+| **DOCX** | `docx_file_read()` | python-docx paragraph extraction |
+| **TXT** | `text_file_read()` | UTF-8 → latin-1 → UTF-8-replace |
+| **CSV** | `csv_file_read()` | Same encoding cascade as TXT |
+| **Source code** | `code_file_read()` | 50+ extensions (py, js, ts, go, rs, java, …) |
 
 ---
 
-## Quick example
+## 🚀 Quick Start
+
+### Install
+
+```sh
+pip install textspitter
+
+# With optional loguru logging
+pip install "textspitter[logging]"
+```
+
+### Extract
 
 ```python
 from TextSpitter import TextSpitter
 
+# From a file
 text = TextSpitter(filename="report.pdf")
-print(text[:200])
+
+# From a stream
+from io import BytesIO
+text = TextSpitter(file_obj=BytesIO(pdf_bytes), filename="report.pdf")
+
+# From raw bytes
+text = TextSpitter(file_obj=docx_bytes, filename="contract.docx")
 ```
 
-Install with optional loguru logging:
+### CLI
 
 ```sh
-pip install "textspitter[logging]"
+# Single file to stdout
+textspitter report.pdf
+
+# Multiple files to combined output
+textspitter chapter1.pdf chapter2.pdf -o book.txt
 ```
+
+---
+
+## 🔗 Navigation
+
+| Page | Purpose | Best for |
+|------|---------|----------|
+| [Overview](overview.html) | Architecture & design | Understanding the internals |
+| [Quick Start](quickstart.html) | Installation & first extraction | Getting started fast |
+| [Tutorial](tutorial.html) | Format-by-format guide | Learning by example |
+| [Use Cases](usecases.html) | Production patterns | Building real applications |
+| [Recipes](recipes.html) | Code snippets | Copy-paste solutions |
+
+---
+
+## 📖 Full API Reference
+
+For complete API documentation including class definitions, method signatures, and parameters,
+see the **TextSpitter module reference** in the sidebar.
+
 """

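Note on the encoding cascade: the "Supported Formats" table in this diff lists the TXT/CSV strategy as UTF-8 → latin-1 → UTF-8-replace. The sketch below shows what such a cascade could look like using only the standard library. It is an illustration under assumptions, not TextSpitter's actual implementation: the function name `decode_with_cascade` is hypothetical and does not appear in the commit.

```python
def decode_with_cascade(data: bytes) -> str:
    """Decode bytes by trying UTF-8, then latin-1, then UTF-8 with replacement.

    Hypothetical helper for illustration; not part of TextSpitter's API.
    """
    for encoding in ("utf-8", "latin-1"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            # This encoding rejected the bytes; try the next one.
            continue
    # Safety net: undecodable bytes become U+FFFD instead of raising.
    # latin-1 maps every possible byte, so in this sketch the loop
    # always returns before this line is reached.
    return data.decode("utf-8", errors="replace")


utf8_text = decode_with_cascade("café".encode("utf-8"))
latin1_text = decode_with_cascade(b"caf\xe9")  # invalid as UTF-8, valid as latin-1
```

Because latin-1 accepts any byte sequence, the order of the steps matters: UTF-8 has to be tried first, otherwise valid UTF-8 input would be silently mis-decoded as latin-1.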