# Profile Eval (Python scripts via running PrismGuard) - Design

**Date:** 2026-04-22

## Goal

Provide a fast-to-iterate evaluation tool that:

1. Accepts a moderation `profile`
2. Reports local model artifacts for that profile (runtime + training marker files)
3. Uses the **already running** PrismGuard Rust service to:
   - Open the profile database in read-only mode (server-side)
   - Run predictions/metrics for the profile's `local_model_type`
   - Produce F1 scores across multiple confidence thresholds (derivation sketched below)

This intentionally avoids compiling or testing the Rust codebase and depends only on the existing binary/service.
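
The per-threshold scores are the standard derivations from confusion counts. A minimal sketch of the math the report relies on (count names follow the metrics endpoint fields described below):

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Standard precision/recall/F1 from confusion counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```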

## Non-Goals

- Reimplement local inference in Python (hashlinear/bow/fasttext)
- Read/iterate `history.rocks` directly in Python
- Add new Rust endpoints or change existing service behavior

## Approach

### Python tool layout

- Directory: `py-scripts/`
- Dependency management: `uv` virtualenv in `py-scripts/.venv`
- Script: `py-scripts/eval_profile.py` (CLI sketch below)
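
A possible CLI surface for the script. `--base-url` and `--out-json` are the flags named elsewhere in this document; `--thresholds` and its default are illustrative placeholders, while the `--sampling` choices mirror the metrics endpoint's sampling modes:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the eval_profile.py CLI; only --base-url and --out-json
    # are documented flag names, the rest are placeholders.
    p = argparse.ArgumentParser(
        description="Evaluate a moderation profile via a running PrismGuard service"
    )
    p.add_argument("profile", help="moderation profile name")
    p.add_argument("--base-url", help="service base URL override")
    p.add_argument("--thresholds", default="0.5,0.6,0.7,0.8,0.9",
                   help="comma-separated confidence thresholds (placeholder default)")
    p.add_argument("--sampling", default="latest_full",
                   choices=["latest_full", "random_full", "balanced"])
    p.add_argument("--out-json", help="optional path for machine-readable output")
    return p
```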

### Service discovery

The script determines the service base URL in this order:

1. `--base-url` CLI override
2. `PRISMGUARD_BASE_URL` or `BASE_URL` env var
3. Repository root `.env` values: `HOST` + `PORT`
4. Fallback: `http://127.0.0.1:8000`

If `.env` binds to `0.0.0.0`, the client uses `127.0.0.1`.
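
A stdlib-only sketch of this discovery order; the naive `.env` parser assumes simple `KEY=VALUE` lines:

```python
import os
from pathlib import Path

def resolve_base_url(cli_base_url: str | None, repo_root: Path) -> str:
    # 1. CLI override wins outright.
    if cli_base_url:
        return cli_base_url
    # 2. Environment variables, in documented precedence.
    for var in ("PRISMGUARD_BASE_URL", "BASE_URL"):
        value = os.environ.get(var)
        if value:
            return value
    # 3. HOST + PORT from the repository root .env (naive KEY=VALUE parse).
    env_file = repo_root / ".env"
    if env_file.exists():
        values = {}
        for line in env_file.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, val = line.partition("=")
                values[key.strip()] = val.strip().strip('"')
        host, port = values.get("HOST"), values.get("PORT")
        if host and port:
            if host == "0.0.0.0":  # bind address, not a connectable client address
                host = "127.0.0.1"
            return f"http://{host}:{port}"
    # 4. Documented fallback.
    return "http://127.0.0.1:8000"
```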

### Data & metrics retrieval

The script calls:

- `GET /debug/profile/<profile>` to retrieve:
  - `live_sample_count` (used for full-dataset evaluation size)
  - `history_rocks_path` (reported)
  - `training_status` (reported)
  - plus other debug metadata

- `GET /debug/profile/<profile>/metrics?...` per threshold:
  - `sample_size=<live_sample_count>`
  - `sampling=latest_full|random_full|balanced`
  - `threshold=<t>`

The service opens and reads the database read-only internally; the script reports the returned output as evidence.
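
A sketch of the two calls, assuming `requests` is installed in the venv; the endpoint paths and query parameters are the ones listed above, and nothing else about the responses is assumed:

```python
import time
import requests

def fetch_debug_info(base_url: str, profile: str) -> dict:
    # The server-side read-only DB open happens behind this endpoint.
    resp = requests.get(f"{base_url}/debug/profile/{profile}", timeout=60)
    resp.raise_for_status()
    return resp.json()

def fetch_metrics(base_url: str, profile: str, sample_size: int,
                  sampling: str, threshold: float) -> tuple[dict, float]:
    """One metrics request per threshold; returns (payload, elapsed seconds)."""
    params = {"sample_size": sample_size, "sampling": sampling, "threshold": threshold}
    start = time.monotonic()
    resp = requests.get(f"{base_url}/debug/profile/{profile}/metrics",
                        params=params, timeout=600)  # full-dataset runs can be slow
    resp.raise_for_status()
    return resp.json(), time.monotonic() - start
```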

### Model selection

Only the profile's configured `local_model_type` is evaluated.

The script still reports which runtime and training marker files are present or missing for that model type, so failures can be explained quickly.
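
A sketch of the artifact check feeding that report; the mapping from `local_model_type` to concrete file paths is not specified here, so the `paths` argument is a hypothetical input:

```python
import datetime
from pathlib import Path

def artifact_report(paths: list[Path]) -> list[dict]:
    """Stat each expected runtime/marker file; missing files stay in the report."""
    rows = []
    for path in paths:
        if path.exists():
            st = path.stat()
            rows.append({
                "path": str(path),
                "exists": True,
                "size": st.st_size,
                "mtime": datetime.datetime.fromtimestamp(st.st_mtime).isoformat(),
            })
        else:
            rows.append({"path": str(path), "exists": False})
    return rows
```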

### Output

- Human-readable artifact report (exists/size/mtime)
- Human-readable metrics table per threshold:
  - `precision`, `recall`, `f1`, `accuracy`
  - `tp`, `tn`, `fp`, `fn`, `evaluated`
  - elapsed seconds per request
- Optional `--out-json` for machine-readable output
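
A sketch of the reporting step; the field names are the ones listed above, and the fixed-width column layout is just one presentation choice:

```python
import json

COLUMNS = ["threshold", "precision", "recall", "f1", "accuracy",
           "tp", "tn", "fp", "fn", "evaluated", "elapsed_s"]

def print_metrics_table(rows: list[dict]) -> None:
    # One row per threshold, fixed-width columns for terminal readability.
    print("  ".join(f"{c:>10}" for c in COLUMNS))
    for row in rows:
        print("  ".join(f"{row.get(c, ''):>10}" for c in COLUMNS))

def write_json(rows: list[dict], out_path: str) -> None:
    # --out-json: dump the same rows unchanged for downstream tooling.
    with open(out_path, "w") as f:
        json.dump(rows, f, indent=2)
```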

## Risks / Notes

- Full-dataset metrics may be slow (depending on sample count, model type, and server load).
- The `/debug/profile/<profile>/metrics` route may be feature-gated; the tool should fail with a clear error on any non-200 response.
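
A sketch of that failure path, turning a non-200 into a readable message rather than a traceback; the 404 hint is an assumption about how a feature-gated route would present:

```python
import sys
import requests

def check_response(resp: requests.Response) -> None:
    # Surface the status and a body excerpt; a 404 here may simply mean the
    # debug/metrics routes were feature-gated out of this build.
    if resp.status_code != 200:
        sys.exit(f"request to {resp.url} failed: HTTP {resp.status_code}\n"
                 f"{resp.text[:500]}")
```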