@@ -4,6 +4,14 @@ Last verified: 2026-03-12

 This guide documents the current CLI behavior for scripts in `scripts/rag/`.

+Use module-style invocation for active commands:
+
+```bash
+uv run python -m scripts.rag.<script_name> ...
+```
+
+Top-level wrappers in `scripts/*.py` still exist for compatibility, but module invocation is the preferred form for day-to-day RAG data management.
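
The module-style form above can be wrapped in a small shell function to cut down on typing. This is a hypothetical sketch, not something shipped in the repository: the function name `rag` and the `RAG_DRY_RUN` variable are assumptions introduced here for illustration.

```bash
# Hypothetical helper (not part of the repository). It expands
#   rag <script_name> <args...>
# into
#   uv run python -m scripts.rag.<script_name> <args...>
rag() {
  if [ "$#" -lt 1 ]; then
    echo "usage: rag <script_name> [args...]" >&2
    return 2
  fi
  local script_name="$1"
  shift
  # RAG_DRY_RUN is an assumed variable for this sketch: when set to 1,
  # the command is printed instead of executed.
  if [ "${RAG_DRY_RUN:-0}" = "1" ]; then
    echo "uv run python -m scripts.rag.${script_name} $*"
  else
    uv run python -m "scripts.rag.${script_name}" "$@"
  fi
}
```

With the function sourced into the current shell, `rag analyze_rag_text validate rag_data/shodan.json` would run the same command as the long form.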
+
 ## Docs Quick Links

 - RAG management docs hub: `docs/rag_management/00_README.md`
@@ -23,7 +31,42 @@ For detailed per-script documentation, see:
 4. `scripts/rag/old_prepare_rag.py` (legacy batch helper)
 5. `scripts/context/fetch_character_context.py`

-Top-level wrappers in `scripts/*.py` are kept for compatibility.
+## Canonical RAG Data Management Process
+
+The clearest routine workflow for one character or corpus is:
+
+1. Prepare the source text in `rag_data/<name>.txt`.
+2. Generate metadata with `analyze_rag_text`.
+3. Validate the metadata file.
+4. Optionally run quality gates:
+   - `coverage score` for source-to-metadata coverage
+   - `lint message-examples` when `*_message_examples.txt` exists
+5. Push lore and message examples with `push_rag_data`.
+6. Spot-check retrieval with `manage_collections test`.
+7. Run `evaluate-fixtures` when you need regression metrics.
+
+Example:
+
+```bash
+uv run python -m scripts.rag.analyze_rag_text analyze rag_data/shodan.txt \
+  -o rag_data/shodan.json \
+  --strict \
+  --review-report rag_data/shodan_review.json
+
+uv run python -m scripts.rag.analyze_rag_text validate rag_data/shodan.json
+
+uv run python -m scripts.rag.manage_collections coverage score \
+  --metadata-file rag_data/shodan.json \
+  --source-file rag_data/shodan.txt \
+  --threshold 0.75
+
+uv run python -m scripts.rag.manage_collections lint message-examples --fix
+
+uv run python -m scripts.rag.push_rag_data rag_data/shodan.txt -c shodan -w
+uv run python -m scripts.rag.push_rag_data rag_data/shodan_message_examples.txt -c shodan_mes -w
+
+uv run python -m scripts.rag.manage_collections test shodan -q "SHODAN origin" -k 5
+```
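
The steps above can be parameterized by name so the same sequence is reused for any character or corpus. The following is a minimal sketch under stated assumptions: `run_step`, `rag_pipeline`, and the `DRY_RUN` variable are hypothetical names introduced here, and the commands and flags are copied from the example above rather than from any additional CLI knowledge. The lint and message-example push steps run only when `rag_data/<name>_message_examples.txt` exists.

```bash
# Hypothetical wrapper: prints the command when DRY_RUN=1 (an assumed
# variable), otherwise executes it. Dry-run output does not preserve quoting.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical pipeline function: rag_pipeline <name> <test_query>
rag_pipeline() {
  if [ "$#" -ne 2 ]; then
    echo "usage: rag_pipeline <name> <test_query>" >&2
    return 2
  fi
  local name="$1" query="$2"
  local src="rag_data/${name}.txt"
  local meta="rag_data/${name}.json"
  local examples="rag_data/${name}_message_examples.txt"

  # Steps 2-3: generate and validate metadata. Stop at the first failure.
  run_step uv run python -m scripts.rag.analyze_rag_text analyze "$src" -o "$meta" --strict || return
  run_step uv run python -m scripts.rag.analyze_rag_text validate "$meta" || return

  # Step 4: quality gates (threshold value taken from the example above).
  run_step uv run python -m scripts.rag.manage_collections coverage score \
    --metadata-file "$meta" --source-file "$src" --threshold 0.75 || return
  if [ -f "$examples" ]; then
    run_step uv run python -m scripts.rag.manage_collections lint message-examples --fix || return
  fi

  # Step 5: push lore, then message examples if present.
  run_step uv run python -m scripts.rag.push_rag_data "$src" -c "$name" -w || return
  if [ -f "$examples" ]; then
    run_step uv run python -m scripts.rag.push_rag_data "$examples" -c "${name}_mes" -w || return
  fi

  # Step 6: spot-check retrieval.
  run_step uv run python -m scripts.rag.manage_collections test "$name" -q "$query" -k 5
}
```

Running `DRY_RUN=1 rag_pipeline shodan "SHODAN origin"` first is a cheap way to review the commands before anything is written to a collection.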

 ---

@@ -32,7 +75,7 @@ Top-level wrappers in `scripts/*.py` are kept for compatibility.
 ### Analyze

 ```bash
-uv run python scripts/rag/analyze_rag_text.py analyze rag_data/shodan.txt -v
+uv run python -m scripts.rag.analyze_rag_text analyze rag_data/shodan.txt -v
 ```

 Common options:
@@ -48,21 +91,21 @@ Common options:
 ### Validate metadata

 ```bash
-uv run python scripts/rag/analyze_rag_text.py validate rag_data/shodan.json
+uv run python -m scripts.rag.analyze_rag_text validate rag_data/shodan.json
 ```

 ### Scan directory

 ```bash
-uv run python scripts/rag/analyze_rag_text.py scan rag_data/ --auto-generate
+uv run python -m scripts.rag.analyze_rag_text scan rag_data/ --auto-generate
 ```

 ---

 ## 2) Push RAG Data to ChromaDB

 ```bash
-uv run python scripts/rag/push_rag_data.py rag_data/shodan.txt -c shodan
+uv run python -m scripts.rag.push_rag_data rag_data/shodan.txt -c shodan
 ```

 Common options:
@@ -82,8 +125,10 @@ Notes:

 - Leading HTML header comments are stripped before chunking.
 - Metadata file auto-detection maps `<name>.txt` (and `<name>_message_examples.txt`) to `<name>.json`.
+- If metadata exists, push runs the coverage quality gate before writing.
 - Metadata enrichment workers use `ProcessPoolExecutor` with `spawn` context to avoid Python 3.13 `fork()` deprecation warnings in multithreaded runs.
 - Collection writes stamp embedding fingerprint metadata and non-overwrite pushes block mixed-model writes.
+- Category threshold flags are logged for visibility, but category assignment itself happens when metadata is generated by `analyze_rag_text`.
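
The metadata auto-detection rule in the notes above can be illustrated with a short shell function. This is only a sketch of the documented mapping, not the actual logic inside `push_rag_data`; the function name `metadata_path_for` is hypothetical.

```bash
# Illustrative only: mirrors the documented mapping
#   <name>.txt                  -> <name>.json
#   <name>_message_examples.txt -> <name>.json
# It is not the real implementation used by push_rag_data.
metadata_path_for() {
  local source_file="$1"
  local base="${source_file%.txt}"   # strip the .txt extension
  base="${base%_message_examples}"   # strip the optional suffix, if present
  echo "${base}.json"
}
```

Under this sketch, both `rag_data/shodan.txt` and `rag_data/shodan_message_examples.txt` resolve to `rag_data/shodan.json`, which is why one metadata file can serve both pushes.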

 ---

@@ -92,25 +137,25 @@ Notes:
 ### List

 ```bash
-uv run python scripts/rag/manage_collections.py list-collections -v
+uv run python -m scripts.rag.manage_collections list-collections -v
 ```

 ### Delete one

 ```bash
-uv run python scripts/rag/manage_collections.py delete shodan_old -y
+uv run python -m scripts.rag.manage_collections delete shodan_old -y
 ```

 ### Delete multiple

 ```bash
-uv run python scripts/rag/manage_collections.py delete-multiple --pattern "test_*" -y
+uv run python -m scripts.rag.manage_collections delete-multiple --pattern "test_*" -y
 ```

 ### Test retrieval

 ```bash
-uv run python scripts/rag/manage_collections.py test shodan -q "SHODAN origin" -k 5
+uv run python -m scripts.rag.manage_collections test shodan -q "SHODAN origin" -k 5
 ```

 Optional embedding overrides:
@@ -121,13 +166,13 @@ Optional embedding overrides:
 ### Export

 ```bash
-uv run python scripts/rag/manage_collections.py export shodan -o backups/shodan.json
+uv run python -m scripts.rag.manage_collections export shodan -o backups/shodan.json
 ```

 ### Info

 ```bash
-uv run python scripts/rag/manage_collections.py info shodan
+uv run python -m scripts.rag.manage_collections info shodan
 ```

 ### Evaluate fixtures
@@ -158,7 +203,7 @@ Use this after upgrading to fingerprint enforcement to migrate legacy collection
 ## 4) Fetch and Clean Character Context From Web

 ```bash
-uv run python scripts/context/fetch_character_context.py "https://en.wikipedia.org/wiki/Leonardo_da_Vinci" -o rag_data/leonardo_da_vinci.txt
+uv run python -m scripts.context.fetch_character_context "https://en.wikipedia.org/wiki/Leonardo_da_Vinci" -o rag_data/leonardo_da_vinci.txt
 ```

 Features:
@@ -173,10 +218,10 @@ Features:
 ## Typical Workflow

 ```bash
-uv run python scripts/rag/analyze_rag_text.py analyze rag_data/new_char.txt -o rag_data/new_char.json --strict
-uv run python scripts/rag/analyze_rag_text.py validate rag_data/new_char.json
-uv run python scripts/rag/push_rag_data.py rag_data/new_char.txt -c new_char -w
-uv run python scripts/rag/manage_collections.py test new_char -q "intro prompt" -k 5
+uv run python -m scripts.rag.analyze_rag_text analyze rag_data/new_char.txt -o rag_data/new_char.json --strict
+uv run python -m scripts.rag.analyze_rag_text validate rag_data/new_char.json
+uv run python -m scripts.rag.push_rag_data rag_data/new_char.txt -c new_char -w
+uv run python -m scripts.rag.manage_collections test new_char -q "intro prompt" -k 5
 ```

 ## Related Files