Commit 1c876f0

Merge pull request #22 from ossirytk/vibing-rag-quality-work
Update docs and some rag work
2 parents 56cbbdb + 1d958f2 commit 1c876f0

50 files changed

Lines changed: 2662 additions & 294 deletions

.github/workflows/quality_gate.yml

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+name: Quality Gate
+
+on:
+  push:
+    branches: ["**"]
+  pull_request:
+    branches: ["**"]
+
+jobs:
+  quality-gate:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.13"
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+
+      - name: Install dependencies
+        run: uv sync --dev
+
+      - name: Lint (ruff)
+        run: uv run ruff check .
+
+      - name: Format check (ruff)
+        run: uv run ruff format --check .
+
+      - name: Unit tests
+        run: uv run pytest -q
+
+      - name: Capture baselines (idempotent)
+        run: uv run python -m scripts.conversation.capture_baselines
+
+      - name: Quality gate
+        run: >
+          uv run python -m scripts.quality_gate
+          --seed 42
+          --skip-retrieval
+          --baselines-dir logs/conversation_quality/baselines

README.md

Lines changed: 37 additions & 7 deletions
@@ -89,31 +89,54 @@ Python requirement is defined in `pyproject.toml` (`>=3.13`).

 ## Quick RAG Workflow

+Use module-style invocation for the active RAG scripts:
+
+```bash
+uv run python -m scripts.rag.<script_name> ...
+```
+
+This is the preferred form in the docs because it is more reliable for package imports than calling nested script paths directly.
+
 1. Analyze source text and generate metadata:

 ```bash
-uv run python scripts/rag/analyze_rag_text.py analyze rag_data/shodan.txt -o rag_data/shodan.json --strict
+uv run python -m scripts.rag.analyze_rag_text analyze rag_data/shodan.txt \
+  -o rag_data/shodan.json \
+  --strict \
+  --review-report rag_data/shodan_review.json
 ```

 2. Validate metadata:

 ```bash
-uv run python scripts/rag/analyze_rag_text.py validate rag_data/shodan.json
+uv run python -m scripts.rag.analyze_rag_text validate rag_data/shodan.json
+```
+
+3. Optional quality gates before push:
+
+```bash
+uv run python -m scripts.rag.manage_collections coverage score \
+  --metadata-file rag_data/shodan.json \
+  --source-file rag_data/shodan.txt \
+  --threshold 0.75
+
+uv run python -m scripts.rag.manage_collections lint message-examples --fix
 ```

-3. Push text into a collection:
+4. Push lore and message examples into collections:

 ```bash
-uv run python scripts/rag/push_rag_data.py rag_data/shodan.txt -c shodan -w
+uv run python -m scripts.rag.push_rag_data rag_data/shodan.txt -c shodan -w
+uv run python -m scripts.rag.push_rag_data rag_data/shodan_message_examples.txt -c shodan_mes -w
 ```

-4. Test retrieval quality:
+5. Spot-check retrieval quality:

 ```bash
-uv run python scripts/rag/manage_collections.py test shodan -q "SHODAN origin" -k 5
+uv run python -m scripts.rag.manage_collections test shodan -q "SHODAN origin" -k 5
 ```

-5. Evaluate retrieval fixtures with summary metrics:
+6. Evaluate retrieval fixtures with summary metrics:

 ```bash
 uv run python -m scripts.rag.manage_collections evaluate-fixtures --fixture-file tests/fixtures/retrieval_fixtures.json
@@ -162,6 +185,8 @@ Notes:

 - Leading HTML header comments are stripped before chunking.
 - Metadata auto-detection maps `<name>.txt` and `<name>_message_examples.txt` to `<name>.json`.
+- If metadata exists, push runs a source-coverage quality gate before writing.
+- Category threshold flags are informational at push time; change category assignment by regenerating metadata with `analyze_rag_text`.

 ### `scripts/rag/manage_collections.py`

@@ -173,6 +198,11 @@ Commands:
 - `test`
 - `export`
 - `info`
+- `evaluate-fixtures`
+- `benchmark-rerank`
+- `backfill-embedding-fingerprint`
+- `coverage score`
+- `lint message-examples`

 ### Compatibility wrappers
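
Note on the metadata auto-detection rule in the README notes above (`<name>.txt` and `<name>_message_examples.txt` both resolve to `<name>.json`): the mapping can be sketched in a few lines of Python. This is an illustration of the documented rule only. The function name `metadata_path_for` and the suffix constant are assumptions, and the actual lookup in `push_rag_data` may differ.

```python
from pathlib import Path

# Suffix taken from the README note; the constant name is hypothetical.
MESSAGE_EXAMPLES_SUFFIX = "_message_examples"


def metadata_path_for(source: Path) -> Path:
    """Map <name>.txt and <name>_message_examples.txt to <name>.json."""
    # Drop the message-examples suffix if present, then swap the extension.
    stem = source.stem.removesuffix(MESSAGE_EXAMPLES_SUFFIX)
    return source.with_name(f"{stem}.json")
```

With this rule, the lore file and its message-examples companion share one metadata file, which matches the two push commands for `shodan.txt` and `shodan_message_examples.txt`.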

core/conversation_manager.py

Lines changed: 1 addition & 0 deletions
@@ -124,6 +124,7 @@ def __init__(self) -> None:
         self._last_summary_topic_terms: set[str] = set()
         drift_window = max(1, self.runtime_config.persona_drift_history_window)
         self.persona_drift_history: deque[float] = deque(maxlen=drift_window)
+        self.persona_drift_trace: deque[dict[str, object]] = deque(maxlen=drift_window)
         self.last_persona_drift: dict[str, object] | None = None
         self.persona_drift_scorer = PersonaDriftScorer(
             PersonaAnchor(

core/conversation_response_mixin.py

Lines changed: 12 additions & 1 deletion
@@ -40,7 +40,7 @@ def _record_persona_drift(self, response: str) -> None:
         history.append(float(result.drift_score))
         summary = self._persona_drift_summary()
         turn_number = min(len(self.user_message_history), len(self.ai_message_history))
-        self.last_persona_drift = {
+        drift_record = {
             "turn": turn_number,
             "drift_score": float(result.drift_score),
             "persona_fidelity": float(result.persona_fidelity),
@@ -50,6 +50,10 @@ def _record_persona_drift(self, response: str) -> None:
             "has_user_turn_pattern": bool(result.has_user_turn_pattern),
             "rolling_avg": float(summary["avg"]),
         }
+        self.last_persona_drift = drift_record
+        trace = getattr(self, "persona_drift_trace", None)
+        if trace is not None:
+            trace.append(dict(drift_record))

         warning_threshold = float(getattr(self.runtime_config, "persona_drift_warning_threshold", 1.0))
         if result.drift_score >= warning_threshold:
@@ -200,6 +204,8 @@ def clear_conversation_state(self) -> None:
         self._last_summary_topic_terms = set()
         if hasattr(self, "persona_drift_history"):
             self.persona_drift_history.clear()
+        if hasattr(self, "persona_drift_trace"):
+            self.persona_drift_trace.clear()
         self.last_persona_drift = None

     def export_conversation_state(self) -> dict[str, object]:
@@ -211,6 +217,7 @@ def export_conversation_state(self) -> dict[str, object]:
             "history_summaries": list(self.history_summaries),
             "last_summary_topic_terms": sorted(self._last_summary_topic_terms),
             "persona_drift_history": list(getattr(self, "persona_drift_history", [])),
+            "persona_drift_trace": list(getattr(self, "persona_drift_trace", [])),
             "persona_drift_last": self.last_persona_drift,
             "persona_drift_avg": drift_summary["avg"],
         }
@@ -222,6 +229,7 @@ def import_conversation_state(self, state: dict[str, object]) -> None:
         history_summaries = state.get("history_summaries", [])
         summary_terms = state.get("last_summary_topic_terms", [])
         drift_history = state.get("persona_drift_history", [])
+        drift_trace = state.get("persona_drift_trace", [])
         drift_last = state.get("persona_drift_last")

         normalized_user_history = [item for item in user_history if isinstance(item, str)]
@@ -240,6 +248,9 @@ def import_conversation_state(self, state: dict[str, object]) -> None:
         if hasattr(self, "persona_drift_history"):
             normalized_drift = [float(value) for value in drift_history if isinstance(value, int | float)]
             self.persona_drift_history = deque(normalized_drift, maxlen=self.persona_drift_history.maxlen)
+        if hasattr(self, "persona_drift_trace"):
+            normalized_trace = [item for item in drift_trace if isinstance(item, dict)]
+            self.persona_drift_trace = deque(normalized_trace, maxlen=self.persona_drift_trace.maxlen)
         self.last_persona_drift = drift_last if isinstance(drift_last, dict) else None

 _STRAY_TOKENS: tuple[str, ...] = ("[/INST]", "<|im_end|>", "</s>", "<|eot_id|>", "<s>", "<|end|>")
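
Note on the drift-trace change above: the trace is a bounded `deque` that stores a copy of each per-turn record, is exported as a plain list, and on import drops non-dict items while keeping the existing `maxlen`. A standalone sketch of that behavior follows. `DriftTraceState` and its method names are hypothetical stand-ins, not the mixin itself, and the extra `isinstance(raw, list)` guard is an addition that the diff does not contain.

```python
from collections import deque


class DriftTraceState:
    """Hypothetical stand-in for the mixin's drift-trace handling."""

    def __init__(self, window: int) -> None:
        # Same bound as the diff: at least one slot, oldest entries evicted first.
        self.persona_drift_trace: deque[dict[str, object]] = deque(maxlen=max(1, window))

    def record(self, drift_record: dict[str, object]) -> None:
        # Store a shallow copy so later edits to the caller's dict
        # do not change what the trace holds.
        self.persona_drift_trace.append(dict(drift_record))

    def export_state(self) -> dict[str, object]:
        # A deque is not JSON-serializable; a list is.
        return {"persona_drift_trace": list(self.persona_drift_trace)}

    def import_state(self, state: dict[str, object]) -> None:
        raw = state.get("persona_drift_trace", [])
        items = raw if isinstance(raw, list) else []  # extra guard, not in the diff
        normalized = [item for item in items if isinstance(item, dict)]
        # Rebuild with the current maxlen so an oversized import is truncated
        # to the most recent entries.
        self.persona_drift_trace = deque(normalized, maxlen=self.persona_drift_trace.maxlen)
```

Because `deque(iterable, maxlen=n)` keeps only the last `n` items, importing a trace longer than the configured window keeps the most recent turns.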

core/conversation_retrieval_orchestration_mixin.py

Lines changed: 11 additions & 8 deletions
@@ -265,14 +265,17 @@ def _get_vector_context(self, query: str, k: int | None = None, *, include_mes:
                 k=k_mes,
             )
         else:
-            mes_chunks, mes_trace = [], {
-                "mode": "disabled",
-                "filter_path": "none",
-                "candidates": 0,
-                "returned": 0,
-                "queries": 0,
-                "rerank_applied": False,
-            }
+            mes_chunks, mes_trace = (
+                [],
+                {
+                    "mode": "disabled",
+                    "filter_path": "none",
+                    "candidates": 0,
+                    "returned": 0,
+                    "queries": 0,
+                    "rerank_applied": False,
+                },
+            )
         context_chunks = self._filter_context_chunks(context_chunks)
         mes_chunks = self._filter_context_chunks(mes_chunks)
         context_chunks, mes_chunks, cross_removed = self._dedupe_cross_collection_chunks(context_chunks, mes_chunks)

docs/RAG_SCRIPTS_GUIDE.md

Lines changed: 61 additions & 16 deletions
@@ -4,6 +4,14 @@ Last verified: 2026-03-12

 This guide documents the current CLI behavior for scripts in `scripts/rag/`.

+Use module-style invocation for active commands:
+
+```bash
+uv run python -m scripts.rag.<script_name> ...
+```
+
+Top-level wrappers in `scripts/*.py` still exist for compatibility, but module invocation is the preferred form for day-to-day RAG data management.
+
 ## Docs Quick Links

 - RAG management docs hub: `docs/rag_management/00_README.md`
@@ -23,7 +31,42 @@ For detailed per-script documentation, see:
 4. `scripts/rag/old_prepare_rag.py` (legacy batch helper)
 5. `scripts/context/fetch_character_context.py`

-Top-level wrappers in `scripts/*.py` are kept for compatibility.
+## Canonical RAG Data Management Process
+
+The clearest routine workflow for one character or corpus is:
+
+1. Prepare the source text in `rag_data/<name>.txt`.
+2. Generate metadata with `analyze_rag_text`.
+3. Validate the metadata file.
+4. Optionally run quality gates:
+   - `coverage score` for source-to-metadata coverage
+   - `lint message-examples` when `*_message_examples.txt` exists
+5. Push lore and message examples with `push_rag_data`.
+6. Spot-check retrieval with `manage_collections test`.
+7. Run `evaluate-fixtures` when you need regression metrics.
+
+Example:
+
+```bash
+uv run python -m scripts.rag.analyze_rag_text analyze rag_data/shodan.txt \
+  -o rag_data/shodan.json \
+  --strict \
+  --review-report rag_data/shodan_review.json
+
+uv run python -m scripts.rag.analyze_rag_text validate rag_data/shodan.json
+
+uv run python -m scripts.rag.manage_collections coverage score \
+  --metadata-file rag_data/shodan.json \
+  --source-file rag_data/shodan.txt \
+  --threshold 0.75
+
+uv run python -m scripts.rag.manage_collections lint message-examples --fix
+
+uv run python -m scripts.rag.push_rag_data rag_data/shodan.txt -c shodan -w
+uv run python -m scripts.rag.push_rag_data rag_data/shodan_message_examples.txt -c shodan_mes -w
+
+uv run python -m scripts.rag.manage_collections test shodan -q "SHODAN origin" -k 5
+```

 ---

@@ -32,7 +75,7 @@ Top-level wrappers in `scripts/*.py` are kept for compatibility.
 ### Analyze

 ```bash
-uv run python scripts/rag/analyze_rag_text.py analyze rag_data/shodan.txt -v
+uv run python -m scripts.rag.analyze_rag_text analyze rag_data/shodan.txt -v
 ```

 Common options:
@@ -48,21 +91,21 @@ Common options:
 ### Validate metadata

 ```bash
-uv run python scripts/rag/analyze_rag_text.py validate rag_data/shodan.json
+uv run python -m scripts.rag.analyze_rag_text validate rag_data/shodan.json
 ```

 ### Scan directory

 ```bash
-uv run python scripts/rag/analyze_rag_text.py scan rag_data/ --auto-generate
+uv run python -m scripts.rag.analyze_rag_text scan rag_data/ --auto-generate
 ```

 ---

 ## 2) Push RAG Data to ChromaDB

 ```bash
-uv run python scripts/rag/push_rag_data.py rag_data/shodan.txt -c shodan
+uv run python -m scripts.rag.push_rag_data rag_data/shodan.txt -c shodan
 ```

 Common options:
@@ -82,8 +125,10 @@ Notes:

 - Leading HTML header comments are stripped before chunking.
 - Metadata file auto-detection maps `<name>.txt` (and `<name>_message_examples.txt`) to `<name>.json`.
+- If metadata exists, push runs the coverage quality gate before writing.
 - Metadata enrichment workers use `ProcessPoolExecutor` with `spawn` context to avoid Python 3.13 `fork()` deprecation warnings in multithreaded runs.
 - Collection writes stamp embedding fingerprint metadata and non-overwrite pushes block mixed-model writes.
+- Category threshold flags are logged for visibility, but category assignment itself happens when metadata is generated by `analyze_rag_text`.

 ---

@@ -92,25 +137,25 @@ Notes:
 ### List

 ```bash
-uv run python scripts/rag/manage_collections.py list-collections -v
+uv run python -m scripts.rag.manage_collections list-collections -v
 ```

 ### Delete one

 ```bash
-uv run python scripts/rag/manage_collections.py delete shodan_old -y
+uv run python -m scripts.rag.manage_collections delete shodan_old -y
 ```

 ### Delete multiple

 ```bash
-uv run python scripts/rag/manage_collections.py delete-multiple --pattern "test_*" -y
+uv run python -m scripts.rag.manage_collections delete-multiple --pattern "test_*" -y
 ```

 ### Test retrieval

 ```bash
-uv run python scripts/rag/manage_collections.py test shodan -q "SHODAN origin" -k 5
+uv run python -m scripts.rag.manage_collections test shodan -q "SHODAN origin" -k 5
 ```

 Optional embedding overrides:
@@ -121,13 +166,13 @@ Optional embedding overrides:
 ### Export

 ```bash
-uv run python scripts/rag/manage_collections.py export shodan -o backups/shodan.json
+uv run python -m scripts.rag.manage_collections export shodan -o backups/shodan.json
 ```

 ### Info

 ```bash
-uv run python scripts/rag/manage_collections.py info shodan
+uv run python -m scripts.rag.manage_collections info shodan
 ```

 ### Evaluate fixtures
@@ -158,7 +203,7 @@ Use this after upgrading to fingerprint enforcement to migrate legacy collection
 ## 4) Fetch and Clean Character Context From Web

 ```bash
-uv run python scripts/context/fetch_character_context.py "https://en.wikipedia.org/wiki/Leonardo_da_Vinci" -o rag_data/leonardo_da_vinci.txt
+uv run python -m scripts.context.fetch_character_context "https://en.wikipedia.org/wiki/Leonardo_da_Vinci" -o rag_data/leonardo_da_vinci.txt
 ```

 Features:
@@ -173,10 +218,10 @@ Features:
 ## Typical Workflow

 ```bash
-uv run python scripts/rag/analyze_rag_text.py analyze rag_data/new_char.txt -o rag_data/new_char.json --strict
-uv run python scripts/rag/analyze_rag_text.py validate rag_data/new_char.json
-uv run python scripts/rag/push_rag_data.py rag_data/new_char.txt -c new_char -w
-uv run python scripts/rag/manage_collections.py test new_char -q "intro prompt" -k 5
+uv run python -m scripts.rag.analyze_rag_text analyze rag_data/new_char.txt -o rag_data/new_char.json --strict
+uv run python -m scripts.rag.analyze_rag_text validate rag_data/new_char.json
+uv run python -m scripts.rag.push_rag_data rag_data/new_char.txt -c new_char -w
+uv run python -m scripts.rag.manage_collections test new_char -q "intro prompt" -k 5
 ```

 ## Related Files

docs/configs/00_README.md

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ This section documents configuration files in `configs/`.
 ## Runtime Loading Behavior

 - Runtime requires `configs/config.v2.json`.
+- The repository currently tracks `configs/config.v2.json` directly; no `config.v2.example.json` is shipped.
 - `core/config.py` flattens nested v2 keys into legacy-style runtime keys for internal use.
 - `ConversationManager` and script CLIs consume typed values via `load_conversation_runtime_config` / `load_rag_script_config`.
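
Note on the flattening step mentioned in the config docs above: the diff does not show how `core/config.py` names the flattened keys, so the following is only a generic sketch of flattening a nested mapping. The function name `flatten_config`, the underscore separator, and the sample keys are all assumptions for illustration and may not match the real legacy key names.

```python
def flatten_config(nested: dict[str, object], prefix: str = "", sep: str = "_") -> dict[str, object]:
    """Flatten nested dicts into a single level, joining key paths with `sep`.

    Hypothetical helper: the real key naming in core/config.py may differ.
    """
    flat: dict[str, object] = {}
    for key, value in nested.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-sections, carrying the joined path as the prefix.
            flat.update(flatten_config(value, name, sep))
        else:
            flat[name] = value
    return flat
```

For example, under these assumptions a section such as `{"rag": {"k": 5}}` would flatten to a single `rag_k` key.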
