The experience quality gate keeps ChatCrystal focused on reusable experience assets instead of raw conversation summaries.
ChatCrystal should preserve conversations that contain:
- problem-solving process
- multi-step reasoning or analysis
- decisions and tradeoffs
- verified outcomes
- reusable patterns
It should filter or downgrade:
- single-turn informational Q&A
- low-density approval messages
- raw logs without analysis
- abandoned brainstorms
- implementation notes without verification
- content with no durable conclusion
The gate combines:
- Lexical signals extracted from conversation messages.
- Prefilter rules that reject obvious low-signal cases before an LLM judge is needed.
- Structured judge dimensions for candidates that pass prefilter:
  - `problem_clarity`
  - `process_depth`
  - `decision_value`
  - `outcome_closure`
  - `reuse_potential`
- Core enforcement during both summarization and MCP writeback.
The gate is deliberately hybrid: deterministic rules catch simple cases, while structured scoring handles dense or nuanced experience candidates.
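The hybrid flow described above can be sketched roughly as follows. This is a minimal illustration, not ChatCrystal's actual implementation: the `Message` shape, the `prefilter` rules, the reason strings, the averaging of dimensions, and the `0.6` threshold are all assumptions made for the example. Only the five dimension names come from this document. A real judge would likely be an asynchronous LLM call; a synchronous function stands in for it here.

```typescript
// Hypothetical types: only the five dimension names are taken from the document.
type Message = { role: "user" | "assistant"; content: string };

type JudgeDimensions = {
  problem_clarity: number;
  process_depth: number;
  decision_value: number;
  outcome_closure: number;
  reuse_potential: number;
};

type GateDecision = {
  decision: "accept" | "reject";
  reason: string;
  score: number;
};

// Deterministic prefilter: returns a rejection reason, or null if the
// conversation should go on to the judge. Rules here are illustrative only.
function prefilter(messages: Message[]): string | null {
  if (messages.length <= 2) return "single_turn_qa";
  const text = messages.map((m) => m.content).join("\n");
  // Assumed lexical signal: wording that hints at a decision or verified outcome.
  const hasSignal = /\b(fixed|verified|root cause|decided|tradeoff)\b/i.test(text);
  return hasSignal ? null : "no_durable_conclusion";
}

// Assumed aggregation: an unweighted mean of the five dimensions (each in 0..1).
function scoreDimensions(d: JudgeDimensions): number {
  const sum =
    d.problem_clarity +
    d.process_depth +
    d.decision_value +
    d.outcome_closure +
    d.reuse_potential;
  return sum / 5;
}

// Rules first, structured scoring second.
function runGate(
  messages: Message[],
  judge: (messages: Message[]) => JudgeDimensions,
  threshold = 0.6, // assumed threshold
): GateDecision {
  const rejected = prefilter(messages);
  if (rejected !== null) {
    return { decision: "reject", reason: rejected, score: 0 };
  }
  const score = scoreDimensions(judge(messages));
  return score >= threshold
    ? { decision: "accept", reason: "judge_pass", score }
    : { decision: "reject", reason: "judge_below_threshold", score };
}
```

Because the prefilter runs first, the judge is never invoked for conversations the rules can already reject, which is the cost saving the hybrid design is aiming for.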
Rejected conversations do not create notes. Audit details are stored on the conversation row:
- `experience_score`
- `experience_gate_reason`
- `experience_gate_details`
- `status = filtered`
This makes filtering reviewable and keeps future retry workflows possible.
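As an illustration of what such an audit record might look like in code, here is a small sketch. The field names match the columns listed above, but the column types, the `buildRejectionAudit` helper, and the choice to serialize details as a JSON string are assumptions for the example rather than the actual schema.

```typescript
// Field names come from the document; the types are assumed.
type ConversationAudit = {
  experience_score: number;
  experience_gate_reason: string;
  experience_gate_details: string; // assumed to hold JSON-serialized details
  status: "filtered";
};

// Hypothetical helper: builds the values to write onto a rejected conversation row.
function buildRejectionAudit(
  score: number,
  reason: string,
  details: Record<string, unknown>,
): ConversationAudit {
  return {
    experience_score: score,
    experience_gate_reason: reason,
    experience_gate_details: JSON.stringify(details),
    status: "filtered",
  };
}
```

Keeping the reason as a short machine-readable string and the details as a separate structured payload would make it straightforward to group filtered conversations by reason during review.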
Run the calibration suite:
npm run eval:experience -w server
The default sample set lives at:
server/src/services/experience/eval-samples.json
The current default set contains 37 calibration cases and must pass with no false accepts or false rejects.
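The pass condition (no false accepts and no false rejects) can be expressed as a simple tally. The sketch below is not the actual eval script; `EvalResult` and `summarize` are hypothetical names used only to show how the two error kinds differ.

```typescript
type Decision = "accept" | "reject";

// Hypothetical result shape: one entry per calibration case.
type EvalResult = { id: string; expected: Decision; actual: Decision };

function summarize(results: EvalResult[]) {
  // False accept: the gate kept something the sample says should be filtered.
  const falseAccepts = results
    .filter((r) => r.expected === "reject" && r.actual === "accept")
    .map((r) => r.id);
  // False reject: the gate filtered something the sample says should be kept.
  const falseRejects = results
    .filter((r) => r.expected === "accept" && r.actual === "reject")
    .map((r) => r.id);
  return {
    total: results.length,
    falseAccepts,
    falseRejects,
    passed: falseAccepts.length === 0 && falseRejects.length === 0,
  };
}
```

Reporting the offending sample ids, rather than only a count, makes it easier to see which case regressed after a rule or threshold change.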
The default sample set is declared as:
{
  "origin": "synthetic_calibration_cases",
  "contains_real_user_data": false
}
These samples are hand-authored calibration cases. They are not copied from a local ChatCrystal database or raw private conversation logs.
Privacy tests reject common sensitive patterns, including:
- absolute local user paths
- personal user names
- email addresses
- private IP ranges and loopback literals
- secret-like tokens
- private key material
When adding real examples later, use desensitized samples and deliberately update the provenance metadata (`origin`, `contains_real_user_data`) to match.
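A rough sketch of how some of the privacy patterns listed above could be detected follows. The regular expressions and pattern names are illustrative assumptions, not the project's actual test rules. Personal user names are left out because they would need a project-specific denylist, and the token prefixes (`sk-`, `ghp_`) are only two common examples of secret-like formats.

```typescript
// Hypothetical pattern table: names and regexes are assumptions for illustration.
const SENSITIVE_PATTERNS: { name: string; pattern: RegExp }[] = [
  // Unix and Windows home-directory paths.
  { name: "absolute_user_path", pattern: /(?:\/Users\/|\/home\/|[A-Za-z]:\\Users\\)[^\s"']+/ },
  { name: "email_address", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
  // RFC 1918 private ranges plus the 127.0.0.0/8 loopback block.
  {
    name: "private_ip",
    pattern:
      /\b(?:10\.\d{1,3}\.\d{1,3}\.\d{1,3}|192\.168\.\d{1,3}\.\d{1,3}|172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}|127\.\d{1,3}\.\d{1,3}\.\d{1,3})\b/,
  },
  { name: "secret_like_token", pattern: /\b(?:sk-|ghp_)[A-Za-z0-9]{20,}/ },
  { name: "private_key", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
];

// Returns the names of every pattern that matches the given text.
function findSensitive(text: string): string[] {
  return SENSITIVE_PATTERNS.filter((p) => p.pattern.test(text)).map((p) => p.name);
}
```

A privacy test could then run `findSensitive` over every message in the sample file and fail if any sample returns a non-empty list.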
Add a sample when the gate makes a meaningful false accept or false reject.
Each useful sample should include:
- `id`: stable kebab-case identifier
- `label`: human-readable scenario
- `expected_decision`: `accept` or `reject`
- `messages`: minimal conversation evidence
- `judge_dimensions`: required when the sample should pass prefilter
- `notes`: why the case matters
Prefer small high-signal cases over large pasted transcripts.
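The sample fields described above could be modeled and sanity-checked along these lines. The field names come from this document, but the message shape, the value types, and the `validateSample` helper are assumptions. The check also simplifies the `judge_dimensions` rule: it only requires dimensions for samples expected to be accepted, since those must pass prefilter, and it does not try to detect rejected samples that pass prefilter and are then rejected by the judge.

```typescript
// Field names are from the document; the value types are assumed.
type CalibrationSample = {
  id: string;
  label: string;
  expected_decision: "accept" | "reject";
  messages: { role: string; content: string }[];
  judge_dimensions?: Record<string, number>;
  notes: string;
};

// Hypothetical validator: returns a list of problems, empty when the sample looks usable.
function validateSample(sample: CalibrationSample): string[] {
  const errors: string[] = [];
  if (!/^[a-z0-9]+(?:-[a-z0-9]+)*$/.test(sample.id)) {
    errors.push("id must be kebab-case");
  }
  if (sample.messages.length === 0) {
    errors.push("messages must not be empty");
  }
  // Simplification: an accepted sample always passes prefilter, so it needs dimensions.
  if (sample.expected_decision === "accept" && sample.judge_dimensions === undefined) {
    errors.push("judge_dimensions is required for samples expected to be accepted");
  }
  if (sample.notes.trim() === "") {
    errors.push("notes should explain why the case matters");
  }
  return errors;
}
```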
The next product step should make gate decisions reviewable:
- Surface filtered conversations and reasons in UI or CLI.
- Let a user mark a case as "should keep" or "correctly filtered".
- Feed false accepts and false rejects back into the calibration set.
- Re-run `npm run eval:experience -w server` before changing thresholds.
The quality gate should evolve from real review outcomes, not intuition alone.