
Commit 405266a

fsecada01 and claude committed
Implement SQLModel-CRUD-Utilities documentation approach
Replicate the superior documentation strategy from SQLModel-CRUD-Utilities:

**New approach:**
- Custom HTML landing page with professional styling (hero section, feature cards, FAQ, quick-start examples)
- Build script (docs/make.py) that preserves custom HTML while generating API reference with pdoc
- Separation of concerns: custom pages (user guide) vs auto-generated API reference
- Full control over visual hierarchy and UX

**Key differences from previous approach:**
- Landing page has professional gradient design, feature cards, and interactive elements
- Progressive disclosure pattern guides users by use case
- Clear navigation between learning resources and API reference
- Better responsive design for mobile devices
- Manual control over styling and layout

**Configuration changes:**
- Exclude docs/ from pre-commit linting (generated files)
- Updated CI workflow to use docs/make.py build script

**Structure:**
- docs/index.html — Custom landing page with hero, features, FAQ
- docs/make.py — Build script that backs up custom HTML, runs pdoc, restores custom files
- Generated TextSpitter.html — Auto-generated API reference (via pdoc)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
1 parent f770555 commit 405266a

6 files changed

Lines changed: 569 additions & 164 deletions


.claude/settings.local.json

Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,8 @@
       "Bash(git -C:*)",
       "Bash(uv:*)",
       "Bash(.venv/Scripts/ty.exe check:*)",
-      "Bash(\".venv/Scripts/ty.exe\" --version)"
+      "Bash(\".venv/Scripts/ty.exe\" --version)",
+      "WebFetch(domain:fsecada01.github.io)"
     ]
   }
 }

.github/workflows/docs.yml

Lines changed: 5 additions & 16 deletions
@@ -30,28 +30,17 @@ jobs:
         run: uv python install 3.12
 
       - name: Install dependencies
-        # Editable install ensures TextSpitter.guide is importable by pdoc
         run: uv sync --all-extras --dev
 
-      # Clone the syn theme. Each theme dir (dracula, onedark, rust) contains
-      # custom.css, theme.css, and syntax-highlighting.css that override pdoc's
-      # defaults. The svelte theme is WIP upstream; switch when released.
-      - name: Download syn pdoc theme
-        run: git clone --depth=1 https://github.com/nxtlo/syn.git /tmp/syn
-
-      - name: Build docs with pdoc
-        # Run from project root so the editable install resolves correctly.
-        # TextSpitter.guide is a subpackage and discovered automatically.
-        run: |
-          uv run pdoc TextSpitter \
-            --output-dir _site/ \
-            --docformat google \
-            -t /tmp/syn/dracula
+      - name: Build documentation
+        # Runs docs/make.py which generates API reference with pdoc
+        # and preserves custom HTML pages
+        run: uv run python docs/make.py
 
       - name: Upload pages artifact
         uses: actions/upload-pages-artifact@v3
         with:
-          path: _site/
+          path: docs/
 
   deploy-docs:
     needs: build-docs
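
The workflow now delegates to docs/make.py, which the commit message describes as backing up custom HTML, running pdoc, and restoring the custom files. The script itself is not shown in this view, so the following is only a minimal sketch of that pattern under stated assumptions: the function name `build_docs`, its parameters, and the default `custom_files` tuple are hypothetical, and the real script may work differently.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_docs(docs_dir, generator_cmd, custom_files=("index.html",)):
    """Run a doc generator without losing hand-written pages.

    Hypothetical helper, not the project's actual docs/make.py.
    """
    docs_dir = Path(docs_dir)
    with tempfile.TemporaryDirectory() as tmp:
        backup_dir = Path(tmp)
        saved = []
        # 1. Back up the custom pages the generator might overwrite.
        for name in custom_files:
            src = docs_dir / name
            if src.exists():
                shutil.copy2(src, backup_dir / name)
                saved.append(name)
        try:
            # 2. Run the generator (for example a pdoc command line).
            subprocess.run(generator_cmd, check=True)
        finally:
            # 3. Restore the custom pages, even if the generator failed.
            for name in saved:
                shutil.copy2(backup_dir / name, docs_dir / name)
```

Under these assumptions, a call such as `build_docs("docs", ["pdoc", "TextSpitter", "--output-dir", "docs", "--docformat", "google"])` would reuse the pdoc flags from the removed workflow step while keeping docs/index.html intact. Restoring in a `finally` block is a design choice of this sketch so that a failed pdoc run cannot leave the landing page overwritten.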

.pre-commit-config.yaml

Lines changed: 7 additions & 4 deletions
@@ -3,7 +3,7 @@ repos:
     rev: v5.0.0
     hooks:
       - id: check-added-large-files
-        exclude: bin/
+        exclude: bin/|docs/
       - id: check-ast
       - id: check-builtin-literals
       - id: check-byte-order-marker
@@ -13,30 +13,33 @@ repos:
       - id: check-json
       - id: check-merge-conflict
       - id: check-shebang-scripts-are-executable
+        exclude: docs/
       - id: check-symlinks
       - id: check-toml
       - id: check-vcs-permalinks
       - id: check-xml
       - id: check-yaml
       - id: debug-statements
-        exclude: tests/
+        exclude: tests/|docs/
       - id: destroyed-symlinks
       - id: detect-aws-credentials
         args: [ --allow-missing-credentials ]
       - id: detect-private-key
       - id: end-of-file-fixer
-        exclude: tests/test_changes/
+        exclude: tests/test_changes/|docs/
         files: \.(py|sh|rst|yml|yaml)$
       - id: pretty-format-json
         args: [ --autofix ]
       - id: sort-simple-yaml
       - id: trailing-whitespace
+        exclude: docs/
 
   - repo: https://github.com/charliermarsh/ruff-pre-commit
     # Ruff version.
     rev: "v0.8.0"
     hooks:
       - id: ruff
+        exclude: docs/
 
   - repo: https://github.com/pycqa/isort
     rev: 5.13.2
@@ -48,7 +51,7 @@ repos:
     rev: 24.10.0
     hooks:
       - id: black
-        exclude: tests/
+        exclude: tests/|docs/
 
   - repo: local
     hooks:
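
pre-commit treats each `exclude` value as a Python regular expression and searches for it in the file path relative to the repository root. The sketch below mimics that check with `re.search` to show what the new patterns cover; `is_excluded` is a hypothetical helper for illustration, not part of pre-commit, and the sample paths are made up.

```python
import re


def is_excluded(path, exclude):
    """Return True if the exclude regex matches anywhere in the path.

    Hypothetical stand-in for pre-commit's exclude filtering.
    """
    return re.search(exclude, path) is not None


# Patterns taken from the updated .pre-commit-config.yaml
print(is_excluded("docs/index.html", "tests/|docs/"))            # True
print(is_excluded("TextSpitter/main.py", "tests/|docs/"))        # False
# The build script lives in docs/, so it is skipped as well.
print(is_excluded("docs/make.py", "docs/"))                      # True
# Unanchored patterns also match nested directories.
print(is_excluded("TextSpitter/docs/notes.py", "tests/|docs/"))  # True
# An anchored variant (an assumption, not in the commit) matches top level only.
print(is_excluded("TextSpitter/docs/notes.py", "^(tests|docs)/"))  # False
```

Because the patterns are unanchored, `docs/` excludes any path containing that segment, which is harmless here as long as the repository has no other directory named `docs`.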

TextSpitter/guide/__init__.py

Lines changed: 6 additions & 143 deletions
@@ -1,147 +1,10 @@
 """
-# TextSpitter Documentation
+API Reference and User Guides.
 
-## Welcome to TextSpitter
-
-**Transforming documents into insights, effortlessly and efficiently.**
-
-TextSpitter extracts plain text from documents and source-code files with a single call.
-It normalises every input type — file paths, `BytesIO` streams, `SpooledTemporaryFile` objects,
-and raw `bytes` — into plain strings, making it ideal for LLM pipelines, search engines,
-and data-processing workflows.
-
----
-
-## 📚 Start Here
-
-Choose your path based on what you want to do:
-
-<details open>
-<summary><strong>⚡ I want to extract text right now</strong></summary>
-
-Start with **[Quick Start](quickstart.html)** to install and run your first extraction in under 2 minutes.
-
-```python
-from TextSpitter import TextSpitter
-
-text = TextSpitter(filename="report.pdf")
-print(text[:500])
-```
-
-</details>
-
-<details>
-<summary><strong>🎯 I need to understand how TextSpitter works</strong></summary>
-
-Read the **[Technical Overview](overview.html)** for architecture, module design, and implementation details.
-
-Covers: three-layer design, input resolution, PDF fallback chains, encoding strategy, and logging.
-
-</details>
-
-<details>
-<summary><strong>🔍 I want to learn by example</strong></summary>
-
-Follow the **[Tutorial](tutorial.html)** for a format-by-format walkthrough covering:
-- PDF extraction (with PyMuPDF + pypdf fallback)
-- DOCX extraction via FastAPI
-- TXT & CSV with encoding handling
-- Source code files (50+ extensions)
-- Direct `FileExtractor` and `WordLoader` usage
-
-</details>
-
-<details>
-<summary><strong>💼 I'm building a real application</strong></summary>
-
-Check **[Common Use Cases](usecases.html)** for production patterns:
-- Web APIs (FastAPI, Django/DRF)
-- Cloud storage (AWS S3)
-- LLM pipelines (LangChain, OpenAI embeddings)
-- Batch processing (directory trees, parallel extraction)
-- Logging strategies
-
-</details>
-
-<details>
-<summary><strong>📋 I need a code snippet</strong></summary>
-
-Browse **[Recipes](recipes.html)** for copy-paste snippets covering:
-- Input handling (BytesIO, SpooledTemporaryFile, raw bytes)
-- Format-specific extraction
-- Error and encoding handling
-- Testing patterns
-
-</details>
-
----
-
-## ✨ Supported Formats
-
-| Format | Method | Notes |
-|--------|--------|-------|
-| **PDF** | `pdf_file_read()` | PyMuPDF → pypdf fallback |
-| **DOCX** | `docx_file_read()` | python-docx paragraph extraction |
-| **TXT** | `text_file_read()` | UTF-8 → latin-1 → UTF-8-replace |
-| **CSV** | `csv_file_read()` | Same encoding cascade as TXT |
-| **Source code** | `code_file_read()` | 50+ extensions (py, js, ts, go, rs, java, …) |
-
----
-
-## 🚀 Quick Start
-
-### Install
-
-```sh
-pip install textspitter
-
-# With optional loguru logging
-pip install "textspitter[logging]"
-```
-
-### Extract
-
-```python
-from TextSpitter import TextSpitter
-
-# From a file
-text = TextSpitter(filename="report.pdf")
-
-# From a stream
-from io import BytesIO
-text = TextSpitter(file_obj=BytesIO(pdf_bytes), filename="report.pdf")
-
-# From raw bytes
-text = TextSpitter(file_obj=docx_bytes, filename="contract.docx")
-```
-
-### CLI
-
-```sh
-# Single file to stdout
-textspitter report.pdf
-
-# Multiple files to combined output
-textspitter chapter1.pdf chapter2.pdf -o book.txt
-```
-
----
-
-## 🔗 Navigation
-
-| Page | Purpose | Best for |
-|------|---------|----------|
-| [Overview](overview.html) | Architecture & design | Understanding the internals |
-| [Quick Start](quickstart.html) | Installation & first extraction | Getting started fast |
-| [Tutorial](tutorial.html) | Format-by-format guide | Learning by example |
-| [Use Cases](usecases.html) | Production patterns | Building real applications |
-| [Recipes](recipes.html) | Code snippets | Copy-paste solutions |
-
----
-
-## 📖 Full API Reference
-
-For complete API documentation including class definitions, method signatures, and parameters,
-see the **TextSpitter module reference** in the sidebar.
+See the [main documentation](../index.html) for quick start, tutorials, use cases, and recipes.
 
+Module Overview:
+- `TextSpitter.main.WordLoader` — Format dispatcher
+- `TextSpitter.core.FileExtractor` — Low-level file reader
+- `TextSpitter.logger` — Optional loguru / stdlib logging shim
 """

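The removed docstring summarised TXT and CSV decoding as a "UTF-8 → latin-1 → UTF-8-replace" cascade. The sketch below shows what such a cascade could look like; `decode_with_fallback` is a hypothetical name, and this is an illustration of the general technique rather than TextSpitter's actual implementation, which is not part of this diff.

```python
def decode_with_fallback(data):
    """Decode bytes by trying progressively more permissive strategies.

    Hypothetical helper, not TextSpitter's real code.
    """
    for encoding in ("utf-8", "latin-1"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every possible byte, so with this exact encoding list the
    # line below is never reached; it is kept to mirror the documented cascade.
    return data.decode("utf-8", errors="replace")


print(decode_with_fallback("café".encode("utf-8")))  # café (decoded as UTF-8)
print(decode_with_fallback(b"caf\xe9"))              # café (decoded as latin-1)
```

In this sketch the order matters: strict UTF-8 goes first because latin-1 would accept any input, including valid UTF-8, and silently produce mojibake for multi-byte characters.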