Commit 4ee2fd3

Author: root
chore: add py-scripts for offline eval and rocks meta repair
1 parent ea2955a commit 4ee2fd3

8 files changed

Lines changed: 2030 additions & 0 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -2,3 +2,7 @@
 /.env
 /configs/
 /.worktrees/
+/py-scripts/.venv/
+/py-scripts/__pycache__/
+/py-scripts/.pytest_cache/
+/py-scripts/*.egg-info/
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
# Profile Eval (Python scripts via running PrismGuard) - Design

**Date:** 2026-04-22

## Goal

Provide a fast-to-iterate evaluation tool that:

1. Accepts a moderation `profile`
2. Reports local model artifacts for that profile (runtime + training marker files)
3. Uses the **already running** PrismGuard Rust service to:
   - Open the profile database in read-only mode (server-side)
   - Run predictions/metrics for the profile's `local_model_type`
   - Produce F1 scores across multiple confidence thresholds

This intentionally avoids compiling or testing the Rust codebase, and only depends on the existing binary/service.

## Non-Goals

- Reimplement local inference in Python (hashlinear/bow/fasttext)
- Read/iterate `history.rocks` directly in Python
- Add new Rust endpoints or change existing service behavior

## Approach

### Python tool layout

- Directory: `py-scripts/`
- Dependency management: `uv` virtualenv in `py-scripts/.venv`
- Script: `py-scripts/eval_profile.py`

### Service discovery

The script determines the service base URL in this order:

1. `--base-url` CLI override
2. `PRISMGUARD_BASE_URL` or `BASE_URL` env var
3. Repository root `.env` values: `HOST` + `PORT`
4. Fallback: `http://127.0.0.1:8000`

If `.env` binds to `0.0.0.0`, the client uses `127.0.0.1`.
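
The resolution order above could be sketched roughly as follows. This is a minimal illustration, not the actual `eval_profile.py` code: the function names `parse_dotenv` and `resolve_base_url` are hypothetical, and the sketch assumes that `PRISMGUARD_BASE_URL` wins over `BASE_URL`, that both `HOST` and `PORT` must be present in `.env`, and that the scheme is plain `http`.

```python
import os


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_base_url(cli_base_url=None, environ=None, dotenv_text=""):
    """Return the service base URL using the documented precedence."""
    environ = os.environ if environ is None else environ
    # 1. Explicit CLI override.
    if cli_base_url:
        return cli_base_url.rstrip("/")
    # 2. Environment variables (assumed order: PRISMGUARD_BASE_URL first).
    for name in ("PRISMGUARD_BASE_URL", "BASE_URL"):
        if environ.get(name):
            return environ[name].rstrip("/")
    # 3. HOST + PORT from the repository-root .env (assumed: both required).
    dotenv = parse_dotenv(dotenv_text)
    host, port = dotenv.get("HOST"), dotenv.get("PORT")
    if host and port:
        if host == "0.0.0.0":
            host = "127.0.0.1"  # a wildcard bind address is not a client target
        return f"http://{host}:{port}"
    # 4. Fallback.
    return "http://127.0.0.1:8000"
```

Passing `environ` and `dotenv_text` as arguments keeps the precedence logic testable without touching the real environment or filesystem.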

### Data & metrics retrieval

The script calls:

- `GET /debug/profile/<profile>` to retrieve:
  - `live_sample_count` (used for full-dataset evaluation size)
  - `history_rocks_path` (reported)
  - `training_status` (reported)
  - plus other debug metadata

- `GET /debug/profile/<profile>/metrics?...` per threshold:
  - `sample_size=<live_sample_count>`
  - `sampling=latest_full|random_full|balanced`
  - `threshold=<t>`

The service opens the database in read-only mode internally; the script only reports the service's output as evidence.
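
As a rough illustration of the request pattern above, the per-threshold URLs could be assembled like this. The helper names `profile_url` and `metrics_urls` are hypothetical; only the routes and query parameter names come from this design.

```python
from urllib.parse import quote, urlencode

SAMPLING_MODES = ("latest_full", "random_full", "balanced")


def profile_url(base_url, profile):
    """URL of the debug endpoint for one profile."""
    return f"{base_url.rstrip('/')}/debug/profile/{quote(profile, safe='')}"


def metrics_urls(base_url, profile, live_sample_count, thresholds,
                 sampling="latest_full"):
    """Build one metrics URL per threshold, using the full sample count."""
    if sampling not in SAMPLING_MODES:
        raise ValueError(f"unknown sampling mode: {sampling!r}")
    base = profile_url(base_url, profile) + "/metrics"
    urls = []
    for threshold in thresholds:
        query = urlencode({
            "sample_size": live_sample_count,
            "sampling": sampling,
            "threshold": threshold,
        })
        urls.append(f"{base}?{query}")
    return urls
```

For example, `metrics_urls("http://127.0.0.1:8000", "4claudecode", 1234, [0.5, 0.9])` yields two URLs that differ only in `threshold`.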

### Model selection

Only the profile's configured `local_model_type` is evaluated.

The script still reports missing/present runtime + marker files for that model type to explain failures quickly.

### Output

- Human-readable artifact report (exists/size/mtime)
- Human-readable metrics table per threshold:
  - `precision`, `recall`, `f1`, `accuracy`
  - `tp`, `tn`, `fp`, `fn`, `evaluated`
  - elapsed seconds per request
- Optional `--out-json` for machine-readable output
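
For reference, the reported metrics relate to the confusion counts in the standard way. The sketch below recomputes them client-side, which can be handy as a sanity check on the service's numbers. It assumes `evaluated` equals `tp + tn + fp + fn` and that undefined ratios are reported as `0.0`; the service may handle those edge cases differently. `derive_metrics` is a hypothetical helper name.

```python
def derive_metrics(tp, tn, fp, fn):
    """Compute precision/recall/F1/accuracy from confusion-matrix counts."""
    evaluated = tp + tn + fp + fn  # assumption: every evaluated sample is counted once
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / evaluated if evaluated else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "evaluated": evaluated,
    }
```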

## Risks / Notes

- Full-dataset metrics may be slow (depends on sample count, model type, and server load).
- The `/debug/profile/<profile>/metrics` route can be feature-gated; the tool should error clearly on non-200 responses.
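
One way to make non-200 responses fail loudly is sketched below using only the standard library. `ServiceError`, `ensure_ok`, and `fetch_json` are hypothetical names, and the wording of the error message is illustrative.

```python
import json
import urllib.error
import urllib.request


class ServiceError(RuntimeError):
    """Raised when the service answers with anything other than HTTP 200."""


def ensure_ok(status, url, body_text):
    """Raise a descriptive error unless the status is exactly 200."""
    if status != 200:
        snippet = body_text[:200]
        raise ServiceError(
            f"GET {url} returned HTTP {status}; the route may be disabled "
            f"or feature-gated on this build. Response: {snippet!r}"
        )


def fetch_json(url, timeout=60.0):
    """GET a URL and decode JSON, surfacing HTTP errors with context."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:
        # urlopen raises for 4xx/5xx; keep the status and body for the message.
        status = exc.code
        body = exc.read().decode("utf-8", "replace")
    ensure_ok(status, url, body)
    return json.loads(body)
```

Keeping the status check in a separate function lets the error path be exercised without a running service.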
py-scripts/README.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
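# py-scripts

A collection of Python scripts (dependencies and the virtual environment are managed with `uv`) for diagnostics and evaluation that reuse the currently running PrismGuard service, **without recompiling the main Rust program**.

## Quick start

Run from the repository root:

```bash
cd py-scripts
uv venv
uv pip install -e .
```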
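
Run the offline evaluation script (example: opens `history.rocks` directly in read-only mode, runs inference locally, and computes F1 at multiple thresholds):

```bash
uv run python eval_profile.py --profile 4claudecode
```

Quick trial run (limits the number of samples):

```bash
uv run python eval_profile.py --profile 4claudecode --limit 2000
```

Notes:
- By default, the script uses a fast path based on `scikit-learn`'s `HashingVectorizer` (closer to the hashing behavior of Rust/training, and much faster)
- If the environment lacks that dependency, it automatically falls back to a slow pure-Python path (intended for debugging on small samples)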
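
The fast-path/fallback split described in the notes could look roughly like the sketch below. It is illustrative only: `hash_features` is a hypothetical name, the vectorizer settings (`n_features`, `alternate_sign=False`, `norm=None`) are assumptions rather than the values used in training, and the pure-Python branch uses `zlib.crc32`, so its feature indices do not match `HashingVectorizer`'s (which hashes with MurmurHash3).

```python
import re
import zlib

# Same shape as scikit-learn's default token pattern: words of 2+ characters.
TOKEN_RE = re.compile(r"\b\w\w+\b")

try:
    from sklearn.feature_extraction.text import HashingVectorizer
except ImportError:  # dependency missing: use the slow path below
    HashingVectorizer = None


def hash_features(texts, n_features=2**18):
    """Return one {feature_index: count} dict per input text."""
    texts = list(texts)
    if HashingVectorizer is not None:
        # Fast path: vectorize the whole batch at once.
        vectorizer = HashingVectorizer(
            n_features=n_features, alternate_sign=False, norm=None
        )
        coo = vectorizer.transform(texts).tocoo()
        rows = [dict() for _ in texts]
        for r, c, v in zip(coo.row, coo.col, coo.data):
            rows[int(r)][int(c)] = rows[int(r)].get(int(c), 0.0) + float(v)
        return rows
    # Slow path: pure Python, only suitable for small debugging samples.
    rows = []
    for text in texts:
        counts = {}
        for token in TOKEN_RE.findall(text.lower()):
            idx = zlib.crc32(token.encode("utf-8")) % n_features
            counts[idx] = counts.get(idx, 0.0) + 1.0
        rows.append(counts)
    return rows
```

Because the two branches hash differently, only the fast path is meaningful against weights trained with the real hashing scheme; the fallback shows the control flow, not a drop-in replacement.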

## Per-sample latency comparison (Python vs the online Rust service)

Benchmark script:

```bash
uv run python bench_single.py --profile 4claudecode
```

Notes:
- The Python side measures pure local inference time (no HTTP, no DB scan)
- The Rust side uses the existing `/debug/profile/<profile>/metrics?sample_size=1` as an approximation (it includes HTTP, server-side DB reads, and JSON encoding/decoding overhead), so it is a conservative upper bound
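
A per-call timing loop of the kind such a benchmark needs might look like this. It is a generic sketch, not the contents of `bench_single.py`; `time_calls` is a hypothetical helper, and the callable passed in would be either the local predict function or a function issuing the single-sample HTTP request.

```python
import statistics
import time


def time_calls(fn, repeats=100, warmup=5):
    """Call fn repeatedly and summarize per-call latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches and lazy imports before measuring
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

Reporting the median alongside min/max makes the comparison less sensitive to one-off stalls on either side.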
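
## Repairing RocksDB metadata (meta:count etc.)

When the program terminates abnormally, `meta:*` (count/next_id) can become inconsistent with the actual `sample:*` records, which in turn degrades sampling, metrics, and lookup performance.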
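
To make the kind of inconsistency concrete, here is a small dry-run style planner over an in-memory list of keys. The key layout is an assumption made for illustration (`sample:<decimal id>` keys, with `meta:count` and `meta:next_id` as the metadata entries); the real encoding used by PrismGuard may differ, and `plan_meta_repair` is a hypothetical name.

```python
SAMPLE_PREFIX = b"sample:"


def plan_meta_repair(keys, current_meta):
    """Return {meta_key: (current, expected)} for every value that is off.

    Nothing is written; the result is only a plan, as in a dry run.
    """
    count = 0
    max_id = -1
    for key in keys:
        if not key.startswith(SAMPLE_PREFIX):
            continue
        count += 1
        try:
            sample_id = int(key[len(SAMPLE_PREFIX):].decode("ascii"))
        except ValueError:
            continue  # non-numeric suffix: counted, but ignored for next_id
        max_id = max(max_id, sample_id)
    expected = {"meta:count": count, "meta:next_id": max_id + 1}
    return {
        name: (current_meta.get(name), value)
        for name, value in expected.items()
        if current_meta.get(name) != value
    }
```

With keys `sample:1`, `sample:2`, `sample:7` and stored metadata `{"meta:count": 5, "meta:next_id": 8}`, the plan contains only `meta:count`, moving from 5 to 3.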
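
Repair script: `repair_rocks_meta.py`

Features:
- First checks whether the RocksDB `LOCK` file can be locked (if it is held, the script reports an error and exits immediately)
- **dry-run** by default (only prints the plan, writes nothing)
- Optionally rebuilds the `text_latest:*` pointers (`--fix-pointers`)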
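
The lock probe in the first bullet could be approximated as below on Unix-like systems. This assumes the running service holds a POSIX `fcntl` advisory lock on the `LOCK` file, as RocksDB does on POSIX platforms; `can_lock` is a hypothetical helper and not necessarily how `repair_rocks_meta.py` implements the check.

```python
import fcntl
import os


def can_lock(lock_path):
    """Return True if an exclusive advisory lock on lock_path is obtainable.

    The lock is released again before returning; this only probes whether
    another process currently holds it.
    """
    fd = os.open(lock_path, os.O_RDWR)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False  # held by another process, e.g. the running service
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

Note that POSIX advisory locks are per process, so this check only detects a lock held by a different process, which is the case that matters when the service is still running.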

Examples:

```bash
cd py-scripts
uv run python repair_rocks_meta.py --profile 4claudecode
uv run python repair_rocks_meta.py --profile 4claudecode --apply
uv run python repair_rocks_meta.py --profile 4claudecode --fix-pointers --apply
```
