docctl is a CLI-first local document retrieval tool optimized for agent and human operability.
The repository is designed for:
- explicit boundaries,
- predictable module responsibilities,
- index-first documentation,
- mechanically verified quality checks.
Primary runtime code lives in src/docctl/.
- cli.py
  - User entrypoint and command contract.
  - Responsible for argument parsing, output mode selection, and exit behavior.
- services.py
  - Stable façade entrypoint consumed by cli.py.
  - Preserves service-level public call signatures and monkeypatch seams.
- service_ingest.py, service_query.py, service_session.py, service_doctor.py
  - Internal orchestration modules split by workflow domain.
  - Own command execution logic for ingest/query/session/doctor flows.
- service_snapshot.py
  - Snapshot orchestration for index export/import workflows.
  - Owns zip archive validation, safe extraction, and restore policy enforcement.
- service_manifest.py
  - Manifest and catalog serialization helpers.
- service_types.py
  - Internal dataclasses/protocols for service request payloads and injected dependencies.
- document_extract.py
  - Multi-format extraction dispatcher for supported inputs (.pdf, .docx, .txt, .md).
- pdf_extract.py
  - PDF parser branch with fallback extraction and normalization.
- chunking.py
  - Sentence-aware chunk generation and shared metadata propagation.
- index_store.py
  - Chroma persistence adapter and collection operations.
- embeddings.py
  - Embedding model initialization and vector generation boundary.
- reranking.py
  - Optional second-stage cross-encoder reranking boundary used by search/session.
- models.py, errors.py, config.py, jsonio.py, ids.py
  - Shared data contracts, stable errors, configuration defaults, deterministic JSON, and IDs.
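The safe-extraction responsibility assigned to service_snapshot.py can be illustrated with a minimal sketch. This is not the module's actual code: the names safe_extract and UnsafeArchiveError are hypothetical, and the policy shown (reject any member whose resolved path leaves the destination directory, before writing anything) is an assumption about what archive validation could look like.

```python
import zipfile
from pathlib import Path


class UnsafeArchiveError(ValueError):
    """Hypothetical error: an archive member would escape the target directory."""


def safe_extract(archive_path: Path, dest_dir: Path) -> list[Path]:
    """Extract archive_path into dest_dir, rejecting path-traversal members."""
    dest_root = dest_dir.resolve()
    extracted: list[Path] = []
    with zipfile.ZipFile(archive_path) as archive:
        # Validate every member first so a bad archive writes nothing to disk.
        for info in archive.infolist():
            target = (dest_root / info.filename).resolve()
            if not target.is_relative_to(dest_root):
                raise UnsafeArchiveError(f"unsafe member path: {info.filename}")
        # All members are confined to dest_root; extract them one by one.
        for info in archive.infolist():
            extracted.append(Path(archive.extract(info, dest_root)))
    return extracted
```

Validating the whole archive before extracting keeps a rejected restore from leaving a half-written index behind. Python's zipfile already sanitizes member names during extraction, so the explicit check here turns a silent rewrite into a hard failure.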
Required dependency direction:
- Helper modules (models, errors, config, jsonio, ids) are foundational.
- Capability modules (document_extract, pdf_extract, chunking, index_store, embeddings) depend on helpers.
- Service orchestration modules (service_*) compose helpers and capability modules.
- services is a compatibility façade over service_* modules.
- cli depends on services and shared contracts only.
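The dependency direction above lends itself to a mechanical check. The sketch below is hypothetical and not part of the repository: layer_of and find_violations are illustrative names, the input is a hand-built map of module name to imported module names, and allowing imports within the same layer is an assumption. Modules that the list above does not place in a layer (for example reranking) are deliberately left unassigned.

```python
HELPERS = {"models", "errors", "config", "jsonio", "ids"}
CAPABILITIES = {"document_extract", "pdf_extract", "chunking", "index_store", "embeddings"}


def layer_of(module: str) -> int:
    """Return a module's layer rank; higher ranks sit closer to the CLI."""
    if module in HELPERS:
        return 0
    if module in CAPABILITIES:
        return 1
    if module.startswith("service_"):
        return 2
    if module == "services":
        return 3
    if module == "cli":
        return 4
    raise KeyError(f"module has no assigned layer: {module}")


def find_violations(imports: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (importer, imported) pairs that break the dependency direction."""
    violations: list[tuple[str, str]] = []
    for importer in sorted(imports):
        for imported in sorted(imports[importer]):
            # A module must never import from a layer above its own.
            upward = layer_of(imported) > layer_of(importer)
            # cli may reach only the services façade and the shared helpers.
            cli_bypass = (
                importer == "cli"
                and imported != "services"
                and imported not in HELPERS
            )
            if upward or cli_bypass:
                violations.append((importer, imported))
    return violations
```

For example, find_violations({"index_store": {"cli"}}) reports the pair ("index_store", "cli"), while a map in which cli imports only services and errors yields an empty list.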
Disallowed pattern examples:
- Low-level modules importing CLI code.
- Business flow logic split into CLI handlers instead of services.py.
- Ad-hoc JSON output bypassing the jsonio.py deterministic serializer.
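To show why routing output through a single serializer matters, here is a minimal sketch of deterministic JSON encoding. The function name dumps_deterministic and the specific settings (sorted keys, compact separators, unescaped non-ASCII) are assumptions for illustration; the actual options used by jsonio.py are not stated here.

```python
import json
from typing import Any


def dumps_deterministic(payload: Any) -> str:
    """Serialize payload so equal data always yields byte-identical text."""
    return json.dumps(
        payload,
        sort_keys=True,          # key order no longer depends on insertion order
        separators=(",", ":"),   # fixed, whitespace-free separators
        ensure_ascii=False,      # keep non-ASCII text readable and stable
    )
```

With a helper like this, two dictionaries holding the same data serialize to the same string regardless of how they were built, which keeps command output diffable and test snapshots stable. An ad-hoc json.dumps call in a handler would lose that guarantee.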
All changes must satisfy:
- make check-markdown-links
- uv run pytest tests/unit tests/integration tests/acceptance -q
Recommended behavior validation for workflow changes:
- uv run docctl --help
- real smoke loop: ingest -> search -> show
- AGENTS.md is the map, not the encyclopedia.
- Deep context belongs in docs/ and is index-first.
- Documentation updates must stay synchronized with behavior changes.