docs/overviews/ethics.md
Lines changed: 11 additions & 8 deletions
@@ -11,28 +11,31 @@ One of our most discussed features is the **"Confrontational"** tone. Here is ho
* **The Burnout Safety Valve**: If the `ExcuseDetector` identifies signs of fatigue, the system **blocks** confrontational escalation and triggers a "Burnout Alert" for the manager instead.
* **Tone Drift & Cooling-off**: To prevent morale fatigue, the system implements **mathematical tone-damping**. If a user receives **3 consecutive "Firm" or "Confrontational" follow-ups**, the logic automatically locks the agent into a `NEUTRAL` or `SUPPORTIVE` state for 48 hours (configurable via `COOLING_OFF_PERIOD_HOURS`).

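Reviewer note: the cooling-off rule in the bullet above can be sketched roughly as follows. This is a minimal illustration, not CommitVigil's actual code. `ToneState`, `next_tone`, and `MAX_CONSECUTIVE_HARSH` are hypothetical names; only `COOLING_OFF_PERIOD_HOURS`, the tone labels, and the 3-strike / 48-hour figures come from the document. The sketch also assumes that during the lock a harsh request is downgraded to `NEUTRAL`, while a `SUPPORTIVE` request passes through unchanged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

COOLING_OFF_PERIOD_HOURS = 48  # setting name and default taken from the doc
MAX_CONSECUTIVE_HARSH = 3      # hypothetical constant for the "3 consecutive" rule
HARSH_TONES = {"FIRM", "CONFRONTATIONAL"}


@dataclass
class ToneState:
    """Hypothetical per-user state, for illustration only."""
    consecutive_harsh: int = 0
    locked_until: Optional[datetime] = None


def next_tone(state: ToneState, requested: str, now: datetime) -> str:
    """Return the tone to send, applying the cooling-off lock."""
    # While the lock is active, harsh tones are downgraded (assumed: to NEUTRAL).
    if state.locked_until is not None and now < state.locked_until:
        return "NEUTRAL" if requested in HARSH_TONES else requested
    state.locked_until = None
    if requested in HARSH_TONES:
        state.consecutive_harsh += 1
        # The third harsh follow-up in a row is still sent; the lock starts after it.
        if state.consecutive_harsh >= MAX_CONSECUTIVE_HARSH:
            state.locked_until = now + timedelta(hours=COOLING_OFF_PERIOD_HOURS)
            state.consecutive_harsh = 0
    else:
        # Any non-harsh follow-up breaks the streak.
        state.consecutive_harsh = 0
    return requested
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the rule easy to test against a fixed timeline.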
-## 2. Nuanced Hard-Blocking (The "Ethics Firewall")
-We explicitly distinguish between "Business Aggression" and "HR Violations." The **Safety Supervisor** enforces a **Semantic Firewall**:
-* **BLOCKED (HR Territory)**: Discussions involving `Salary`, `PIP` (Performance Improvement Plans), `Firing`, or `Legal Threats` are immediately blocked. This is a hard-coded safety guarantee.
-* **ALLOWED (Business Territory)**: Aggressive discussions about `Pricing Models`, `Budgeting`, or `Resource Allocation` are permitted as valid professional discourse.
+## 2. Industry-Specific Semantic Firewall 🧱
+CommitVigil provides hard-coded safety guarantees for regulated industries:
+* **Healthcare (HIPAA)**: The system hard-blocks unauthorized medical mandates or PII disclosure.
+* **Finance (SEC)**: Prevents the agent from accidentally facilitating market manipulation or providing unregulated financial advice.
+* **HR Territory**: Discussions involving `Salary`, `PIP` (Performance Improvement Plans), or `Firing` are immediately escalated to human review.
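Reviewer note: the added "HR Territory" rule routes certain topics to human review instead of blocking them. A rough sketch of that routing step is below. It uses plain keyword matching as a stand-in, which is an assumption: the document describes a semantic firewall, and a real implementation would presumably rely on a classifier rather than regular expressions. `Verdict`, `HR_PATTERNS`, and `review_draft` are hypothetical names, and the patterns are illustrative guesses that will produce false positives (for example, `pip` as a Python tool).

```python
import re
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ESCALATE_TO_HUMAN = "escalate_to_human"


# Topic labels come from the doc; the keyword patterns are illustrative guesses.
HR_PATTERNS = {
    "Salary": re.compile(r"\b(salary|salaries|compensation)\b", re.I),
    "PIP": re.compile(r"\b(PIP|performance improvement plan)\b", re.I),
    "Firing": re.compile(r"\b(fired|firing|termination)\b", re.I),
}


def review_draft(message: str) -> tuple[Verdict, list[str]]:
    """Return a verdict plus the HR topics that triggered it, if any."""
    hits = [topic for topic, pattern in HR_PATTERNS.items() if pattern.search(message)]
    verdict = Verdict.ESCALATE_TO_HUMAN if hits else Verdict.ALLOW
    return verdict, hits
```

Returning the matched topics alongside the verdict gives the human reviewer the reason for the escalation without re-scanning the draft.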
-## 2. Cultural & Contextual Sensitivity
-"Deflection" is relative. What is seen as blunt in one culture is polite in another:
+## 3. Continuous Learning & ROI Metrics 📈
+CommitVigil doesn't just act; it learns:
+* **Manager Feedback Loop**: Every intervention can be reviewed by a supervisor. Their "Accept/Modify/Reject" decisions are persisted.
+* **ROI Dashboard**: The system calculates the **Intervention Acceptance Rate** to quantify the AI's alignment with management intent.
* **Sensitivity Calibration**: CommitVigil supports **Cultural Tone Profiles**. Managers can calibrate the "Pressure Sensitivity" of the agents to match their specific team norms (e.g., High-Directness vs. High-Context locales).
* **Domain-Specific Jargon**: The NLP models are refined to recognize that certain industry vernacular (e.g., *"I'm swamped"*) may be a routine status update rather than an excuse in specific high-velocity teams.

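Reviewer note: the added section names an **Intervention Acceptance Rate** but does not define it. One plausible reading, sketched below, is the share of reviewed interventions that the supervisor accepted outright. Treating "Modify" as not accepted is an assumption, and `intervention_acceptance_rate` is a hypothetical helper, not a function from the codebase.

```python
from collections import Counter
from typing import Iterable

# Decision labels follow the "Accept/Modify/Reject" wording in the doc.
DECISIONS = {"accept", "modify", "reject"}


def intervention_acceptance_rate(decisions: Iterable[str]) -> float:
    """Fraction of reviewed interventions accepted as-is (assumed definition)."""
    counts = Counter(d.lower() for d in decisions)
    unknown = set(counts) - DECISIONS
    if unknown:
        raise ValueError(f"unknown decisions: {sorted(unknown)}")
    total = sum(counts.values())
    # With no reviews yet, report 0.0 rather than dividing by zero.
    return counts["accept"] / total if total else 0.0
```

If modified interventions should count as partial agreement, the numerator could weight them, but the document gives no basis for choosing a weight.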
-## 3. Privacy & Data Integrity
+## 4. Privacy & Data Integrity
Monitoring at the granularity of Slack threads and Git commits requires a strict privacy stance:
* **Scoped Monitoring**: CommitVigil is designed to monitor designated `#project` channels, not private DMs or unrelated chatter.
* **Source-Level Only**: Commit monitoring is restricted to commit messages and PR metadata, not the proprietary logic within the source code files themselves.
* **Identity Anonymization**: Internal IDs are used for analysis; real names can be masked in the database if necessary.

-## 3. Handling Ambiguity (The "100% Visibility" Claim) 🧠
+## 5. Handling Ambiguity (The "100% Visibility" Claim) 🧠
Ambiguity is the greatest challenge in Engineering NLP. Here is how we move toward high accuracy:

* **Confidence Scores**: Every extraction (Commitment, Risk, Excuse) is accompanied by a `confidence_score`.
detected_gap="The developer promised to refactor the API, fix CSS, and update docs, but only updated some typos in README. No major code changes detected.",
risk_to_system_stability=0.8,
-intervention_required=True
+intervention_required=True,
)
gap=TruthGapAnalysis(
gap_detected=True,
truth_score=0.1,
explanation="The user claims to be 90% done with the refactor, but the technical evidence only shows updates to typos in the README with no major code changes detected.",