Implement SQLModel-CRUD-Utilities documentation approach

fsecada01 · claude · fsecada01 · commit 405266ae87d5 · 2026-02-18T00:08:31.000-05:00
Replicate the superior documentation strategy from SQLModel-CRUD-Utilities:

**New approach:**
- Custom HTML landing page with professional styling (hero section,
  feature cards, FAQ, quick-start examples)
- Build script (docs/make.py) that preserves custom HTML while
  generating API reference with pdoc
- Separation of concerns: custom pages (user guide) vs auto-generated
  API reference
- Full control over visual hierarchy and UX

**Key differences from previous approach:**
- Landing page has professional gradient design, feature cards, and
  interactive elements
- Progressive disclosure pattern guides users by use case
- Clear navigation between learning resources and API reference
- Better responsive design for mobile devices
- Manual control over styling and layout

**Configuration changes:**
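- Exclude docs/ from the pre-commit hooks that would otherwise touch it
  (large-file, shebang, debug-statement, end-of-file, and
  trailing-whitespace checks, plus ruff and black), since the directory
  holds generated files
- Update the CI workflow to run the docs/make.py build script and upload
  docs/ instead of _site/; the syn theme clone step is removed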
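
A rough illustration of how the new exclude patterns behave, assuming pre-commit applies each `exclude` value as an unanchored Python regex search against the repo-relative path. The `is_excluded` helper is hypothetical and only mirrors that matching rule; the patterns are copied from the updated .pre-commit-config.yaml.

```python
import re

# Patterns copied from the updated .pre-commit-config.yaml in this commit.
EXCLUDES = {
    "check-added-large-files": r"bin/|docs/",
    "debug-statements": r"tests/|docs/",
    "end-of-file-fixer": r"tests/test_changes/|docs/",
    "ruff": r"docs/",
    "black": r"tests/|docs/",
}


def is_excluded(hook_id: str, path: str) -> bool:
    """Hypothetical helper: True if the hook's exclude pattern matches the path.

    Assumes an unanchored search, so the pattern may match anywhere in the path.
    """
    return re.search(EXCLUDES[hook_id], path) is not None


if __name__ == "__main__":
    for candidate in ("docs/make.py", "TextSpitter/main.py", "TextSpitter/docs/x.py"):
        print(candidate, is_excluded("black", candidate))
```

Because the patterns are unanchored, `docs/` would also match a nested path such as `TextSpitter/docs/x.py`; writing `^docs/` would restrict the exclusion to the top-level directory if that ever matters.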

**Structure:**
- docs/index.html — Custom landing page with hero, features, FAQ
- docs/make.py — Build script that backs up custom HTML, runs pdoc,
  restores custom files
- Generated TextSpitter.html — Auto-generated API reference (via pdoc)
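
The backup, generate, restore flow described for docs/make.py could look roughly like the sketch below. This is not the committed script: the names `CUSTOM_FILES`, `run_pdoc`, and `build` are hypothetical, the list of custom pages is assumed to contain only index.html, and the pdoc flags are taken from the CI step this commit removes.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable

# Assumption: index.html is the only hand-written page that must survive a build.
# Only flat file names are handled here (no subdirectories).
CUSTOM_FILES = ("index.html",)


def run_pdoc(output_dir: Path) -> None:
    """Generate the API reference with pdoc (flags mirror the previous CI step)."""
    subprocess.run(
        ["pdoc", "TextSpitter", "--output-dir", str(output_dir), "--docformat", "google"],
        check=True,
    )


def build(
    docs_dir: Path,
    custom_files: Iterable[str] = CUSTOM_FILES,
    generate: Callable[[Path], None] = run_pdoc,
) -> None:
    """Back up custom pages, run the generator, then restore the backups."""
    with tempfile.TemporaryDirectory() as tmp:
        backup_dir = Path(tmp)
        saved = []
        for name in custom_files:
            src = docs_dir / name
            if src.exists():
                shutil.copy2(src, backup_dir / name)
                saved.append(name)
        try:
            generate(docs_dir)
        finally:
            # Restore even if generation fails, so custom pages are never lost.
            for name in saved:
                shutil.copy2(backup_dir / name, docs_dir / name)
```

Calling `build(Path("docs"))` from the project root would regenerate the API reference in place. Passing a different `generate` callable makes the backup and restore logic testable without pdoc installed.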

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
diff --git a/.claude/settings.local.json b/.claude/settings.local.json
@@ -30,7 +30,8 @@
       "Bash(git -C:*)",
       "Bash(uv:*)",
       "Bash(.venv/Scripts/ty.exe check:*)",
-      "Bash(\".venv/Scripts/ty.exe\" --version)"
+      "Bash(\".venv/Scripts/ty.exe\" --version)",
+      "WebFetch(domain:fsecada01.github.io)"
     ]
   }
 }
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -30,28 +30,17 @@ jobs:
         run: uv python install 3.12
 
       - name: Install dependencies
-        # Editable install ensures TextSpitter.guide is importable by pdoc
         run: uv sync --all-extras --dev
 
-      # Clone the syn theme.  Each theme dir (dracula, onedark, rust) contains
-      # custom.css, theme.css, and syntax-highlighting.css that override pdoc's
-      # defaults.  The svelte theme is WIP upstream; switch when released.
-      - name: Download syn pdoc theme
-        run: git clone --depth=1 https://github.com/nxtlo/syn.git /tmp/syn
-
-      - name: Build docs with pdoc
-        # Run from project root so the editable install resolves correctly.
-        # TextSpitter.guide is a subpackage and discovered automatically.
-        run: |
-          uv run pdoc TextSpitter \
-            --output-dir _site/ \
-            --docformat google \
-            -t /tmp/syn/dracula
+      - name: Build documentation
+        # Runs docs/make.py which generates API reference with pdoc
+        # and preserves custom HTML pages
+        run: uv run python docs/make.py
 
       - name: Upload pages artifact
         uses: actions/upload-pages-artifact@v3
         with:
-          path: _site/
+          path: docs/
 
   deploy-docs:
     needs: build-docs
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -3,7 +3,7 @@ repos:
     rev: v5.0.0
     hooks:
       - id: check-added-large-files
-        exclude: bin/
+        exclude: bin/|docs/
       - id: check-ast
       - id: check-builtin-literals
       - id: check-byte-order-marker
@@ -13,30 +13,33 @@ repos:
       - id: check-json
       - id: check-merge-conflict
       - id: check-shebang-scripts-are-executable
+        exclude: docs/
       - id: check-symlinks
       - id: check-toml
       - id: check-vcs-permalinks
       - id: check-xml
       - id: check-yaml
       - id: debug-statements
-        exclude: tests/
+        exclude: tests/|docs/
       - id: destroyed-symlinks
       - id: detect-aws-credentials
         args: [ --allow-missing-credentials ]
       - id: detect-private-key
       - id: end-of-file-fixer
-        exclude: tests/test_changes/
+        exclude: tests/test_changes/|docs/
         files: \.(py|sh|rst|yml|yaml)$
       - id: pretty-format-json
         args: [ --autofix ]
       - id: sort-simple-yaml
       - id: trailing-whitespace
+        exclude: docs/
 
   - repo: https://github.com/charliermarsh/ruff-pre-commit
     # Ruff version.
     rev: "v0.8.0"
     hooks:
       - id: ruff
+        exclude: docs/
 
   - repo: https://github.com/pycqa/isort
     rev: 5.13.2
@@ -48,7 +51,7 @@ repos:
     rev: 24.10.0
     hooks:
       - id: black
-        exclude: tests/
+        exclude: tests/|docs/
 
   - repo: local
     hooks:
diff --git a/TextSpitter/guide/__init__.py b/TextSpitter/guide/__init__.py
@@ -1,147 +1,10 @@
 """
-# TextSpitter Documentation
+API Reference and User Guides.
 
-## Welcome to TextSpitter
-
-**Transforming documents into insights, effortlessly and efficiently.**
-
-TextSpitter extracts plain text from documents and source-code files with a single call.
-It normalises every input type — file paths, `BytesIO` streams, `SpooledTemporaryFile` objects,
-and raw `bytes` — into plain strings, making it ideal for LLM pipelines, search engines,
-and data-processing workflows.
-
----
-
-## 📚 Start Here
-
-Choose your path based on what you want to do:
-
-<details open>
-<summary><strong>⚡ I want to extract text right now</strong></summary>
-
-Start with **[Quick Start](quickstart.html)** to install and run your first extraction in under 2 minutes.
-
-```python
-from TextSpitter import TextSpitter
-
-text = TextSpitter(filename="report.pdf")
-print(text[:500])
-```
-
-</details>
-
-<details>
-<summary><strong>🎯 I need to understand how TextSpitter works</strong></summary>
-
-Read the **[Technical Overview](overview.html)** for architecture, module design, and implementation details.
-
-Covers: three-layer design, input resolution, PDF fallback chains, encoding strategy, and logging.
-
-</details>
-
-<details>
-<summary><strong>🔍 I want to learn by example</strong></summary>
-
-Follow the **[Tutorial](tutorial.html)** for a format-by-format walkthrough covering:
-- PDF extraction (with PyMuPDF + pypdf fallback)
-- DOCX extraction via FastAPI
-- TXT & CSV with encoding handling
-- Source code files (50+ extensions)
-- Direct `FileExtractor` and `WordLoader` usage
-
-</details>
-
-<details>
-<summary><strong>💼 I'm building a real application</strong></summary>
-
-Check **[Common Use Cases](usecases.html)** for production patterns:
-- Web APIs (FastAPI, Django/DRF)
-- Cloud storage (AWS S3)
-- LLM pipelines (LangChain, OpenAI embeddings)
-- Batch processing (directory trees, parallel extraction)
-- Logging strategies
-
-</details>
-
-<details>
-<summary><strong>📋 I need a code snippet</strong></summary>
-
-Browse **[Recipes](recipes.html)** for copy-paste snippets covering:
-- Input handling (BytesIO, SpooledTemporaryFile, raw bytes)
-- Format-specific extraction
-- Error and encoding handling
-- Testing patterns
-
-</details>
-
----
-
-## ✨ Supported Formats
-
-| Format | Method | Notes |
-|--------|--------|-------|
-| **PDF** | `pdf_file_read()` | PyMuPDF → pypdf fallback |
-| **DOCX** | `docx_file_read()` | python-docx paragraph extraction |
-| **TXT** | `text_file_read()` | UTF-8 → latin-1 → UTF-8-replace |
-| **CSV** | `csv_file_read()` | Same encoding cascade as TXT |
-| **Source code** | `code_file_read()` | 50+ extensions (py, js, ts, go, rs, java, …) |
-
----
-
-## 🚀 Quick Start
-
-### Install
-
-```sh
-pip install textspitter
-
-# With optional loguru logging
-pip install "textspitter[logging]"
-```
-
-### Extract
-
-```python
-from TextSpitter import TextSpitter
-
-# From a file
-text = TextSpitter(filename="report.pdf")
-
-# From a stream
-from io import BytesIO
-text = TextSpitter(file_obj=BytesIO(pdf_bytes), filename="report.pdf")
-
-# From raw bytes
-text = TextSpitter(file_obj=docx_bytes, filename="contract.docx")
-```
-
-### CLI
-
-```sh
-# Single file to stdout
-textspitter report.pdf
-
-# Multiple files to combined output
-textspitter chapter1.pdf chapter2.pdf -o book.txt
-```
-
----
-
-## 🔗 Navigation
-
-| Page | Purpose | Best for |
-|------|---------|----------|
-| [Overview](overview.html) | Architecture & design | Understanding the internals |
-| [Quick Start](quickstart.html) | Installation & first extraction | Getting started fast |
-| [Tutorial](tutorial.html) | Format-by-format guide | Learning by example |
-| [Use Cases](usecases.html) | Production patterns | Building real applications |
-| [Recipes](recipes.html) | Code snippets | Copy-paste solutions |
-
----
-
-## 📖 Full API Reference
-
-For complete API documentation including class definitions, method signatures, and parameters,
-see the **TextSpitter module reference** in the sidebar.
+See the [main documentation](../index.html) for quick start, tutorials, use cases, and recipes.
 
+Module Overview:
+- `TextSpitter.main.WordLoader` — Format dispatcher
+- `TextSpitter.core.FileExtractor` — Low-level file reader
+- `TextSpitter.logger` — Optional loguru / stdlib logging shim
 """
diff --git a/docs/index.html b/docs/index.html
diff --git a/docs/make.py b/docs/make.py
