Commit 2b475a9

Author: Dwi Fahni Denni
feat: Update profile agent for critical thinking and no hallucination. Add capabilities for cybersecurity defense and auto reporting RCA and Postmortem
1 parent 7630cf8 commit 2b475a9

61 files changed

Lines changed: 5814 additions & 832 deletions

.env.example

Lines changed: 23 additions & 2 deletions
@@ -43,6 +43,10 @@ TELEMETRYFLOW_KEY_SECRET=
 # Default: http://localhost:3000/api/v2
 TELEMETRYFLOW_API_URL=http://localhost:3000/api/v2
 
+# Database name (shared by PostgreSQL and ClickHouse)
+# Default: telemetryflow_db
+TELEMETRYFLOW_DB_NAME=telemetryflow_db
+
 # Organization ID (REQUIRED for all LLM endpoints)
 # Every LLM API call is scoped to an organization.
 # Find yours: TelemetryFlow UI > Settings > Organization > Copy ID
@@ -124,18 +128,35 @@ LLM_ENCRYPTION_KEY=
 # NOTE: Hermes tools query ClickHouse through the TelemetryFlow API
 # (/telemetry/query), NOT directly. These are only used by
 # direct query tools or debugging scripts.
+# CLICKHOUSE_DATABASE should match TELEMETRYFLOW_DB_NAME above.
 CLICKHOUSE_HOST=localhost
 CLICKHOUSE_PORT=9000
 CLICKHOUSE_HTTP_PORT=8123
 CLICKHOUSE_USER=hermes_readonly
 CLICKHOUSE_PASSWORD=
-CLICKHOUSE_DATABASE=telemetryflow
+CLICKHOUSE_DATABASE=telemetryflow_db
 
 # ================================================================
 # Kubernetes
 # ================================================================
 KUBECONFIG=~/.kube/config
 
+# ================================================================
+# Jira Integration (for RCA ticket submission)
+# ================================================================
+# JIRA_URL=https://your-domain.atlassian.net
+# JIRA_EMAIL=your-email@example.com
+# JIRA_API_TOKEN=your-jira-api-token
+# JIRA_PROJECT_KEY=OPS
+
+# ================================================================
+# Trello Integration (for RCA card submission)
+# ================================================================
+# TRELLO_API_KEY=your-trello-api-key
+# TRELLO_API_TOKEN=your-trello-api-token
+# TRELLO_BOARD_ID=your-board-id
+# TRELLO_LIST_ID_INCIDENTS=your-incidents-list-id
+
 # ================================================================
 # Telegram Gateway (per-profile bot tokens)
 # Each agent needs its own bot (Telegram allows 1 connection per token)
@@ -168,7 +189,7 @@ TELEGRAM_CHAT_ID_REMEDIATOR=
 # POSTGRES_PORT=5432
 # POSTGRES_USER=
 # POSTGRES_PASSWORD=
-# POSTGRES_DB=telemetryflow
+# POSTGRES_DB=${TELEMETRYFLOW_DB_NAME}
 # REDIS_HOST=localhost
 # REDIS_PORT=6379
 # PORT=3000
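
The `.env.example` changes make `TELEMETRYFLOW_DB_NAME` the shared database name and ask that `CLICKHOUSE_DATABASE` match it. A minimal sketch of how a startup check might verify that, assuming a hypothetical helper `resolve_db_names` (not part of this commit) and assuming an unset or empty `CLICKHOUSE_DATABASE` should fall back to the shared name:

```python
import os

DEFAULT_DB_NAME = "telemetryflow_db"  # default documented in .env.example


def resolve_db_names(env=None):
    """Return (shared_name, clickhouse_name, matches) from an env mapping.

    Hypothetical helper for illustration; the project may resolve these differently.
    """
    env = os.environ if env is None else env
    shared = env.get("TELEMETRYFLOW_DB_NAME") or DEFAULT_DB_NAME
    # Assumption: an unset or empty CLICKHOUSE_DATABASE inherits the shared name.
    clickhouse = env.get("CLICKHOUSE_DATABASE") or shared
    return shared, clickhouse, shared == clickhouse
```

A caller could log a warning when the third element is `False`, which would catch a stale `CLICKHOUSE_DATABASE=telemetryflow` left over from before this change.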

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ on:
 env:
   PYTHON_VERSION: "3.12"
   PRODUCT_NAME: TelemetryFlow Hermes
-  VERSION: "2.0.0"
+  VERSION: "1.2.0"
 
 permissions:
   contents: read

.github/workflows/docker.yml

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ on:
   workflow_dispatch:
     inputs:
       version:
-        description: 'Version tag (e.g., 2.0.0)'
+        description: 'Version tag (e.g., 1.2.0)'
         required: false
         default: ''
   push:

.github/workflows/release.yml

Lines changed: 2 additions & 2 deletions
@@ -24,9 +24,9 @@ on:
   workflow_dispatch:
     inputs:
       version:
-        description: "Version to release (e.g., 2.0.0)"
+        description: "Version to release (e.g., 1.2.0)"
         required: true
-        default: "2.0.0"
+        default: "1.2.0"
       prerelease:
         description: "Mark as pre-release"
         required: false

.gitlab-ci.yml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ stages:
 variables:
   PYTHON_VERSION: "3.12"
   PRODUCT_NAME: "TelemetryFlow Hermes"
-  VERSION: "2.0.0"
+  VERSION: "1.2.0"
   PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
 
 default:

CHANGELOG.md

Lines changed: 136 additions & 4 deletions
@@ -7,13 +7,13 @@
 
 <h3>TelemetryFlow Hermes — Self-Improving AI Agent for Observability Incident Response</h3>
 
-[![Version](https://img.shields.io/badge/Version-1.0.0-orange.svg)](CHANGELOG.md)
+[![Version](https://img.shields.io/badge/Version-1.2.0-orange.svg)](CHANGELOG.md)
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 [![Python](https://img.shields.io/badge/Python-3.8+-3776AB?logo=python)](https://www.python.org/)
 [![Hermes](https://img.shields.io/badge/Hermes-Agent-00d4aa)](https://github.com/NousResearch/hermes-agent)
-[![Tests](https://img.shields.io/badge/Tests-458%20passing-brightgreen.svg)](tests/)
-[![Coverage](https://img.shields.io/badge/Coverage-97%25-brightgreen.svg)](tests/)
-[![Tools](https://img.shields.io/badge/Tools-37%20Plugin-blueviolet)](plugins/telemetryflow/plugin.yaml)
+[![Tests](https://img.shields.io/badge/Tests-521%20passing-brightgreen.svg)](tests/)
+[![Coverage](https://img.shields.io/badge/Coverage-99%25-brightgreen.svg)](tests/)
+[![Tools](https://img.shields.io/badge/Tools-40%20Plugin-blueviolet)](plugins/telemetryflow/plugin.yaml)
 [![ContextTypes](https://img.shields.io/badge/ContextTypes-74-9cf)](docs/api/context-types.md)
 [![ClickHouse](https://img.shields.io/badge/ClickHouse-Readonly-FFCC00?logo=clickhouse)](security/clickhouse-readonly.sql)
 [![Docs](https://img.shields.io/badge/Docs-28%20Pages-informational)](docs/)
@@ -29,6 +29,138 @@ All notable changes to **TelemetryFlow Hermes** will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.2.0] - 2026-06-05
+
+### Summary
+
+**RCA reports, postmortem generation, cybersecurity defense, full ClickHouse access, and manual review templates.**
+
+Three new tools for automated incident reporting: `generate_rca_report` produces full Root Cause Analysis with 5W analysis, mermaid timeline diagrams, and Jira/Trello ticket summaries. `generate_postmortem` generates comprehensive postmortem reports with lessons learned and action items. `generate_rca_template` provides a blank template for manual human review. All four agent profiles now have cybersecurity defense postures and full access to all 20 ClickHouse read-only tables.
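
The summary mentions Mermaid timeline diagrams in the generated reports. The commit view does not show how the tools build them, so the following is only a sketch: it assumes Mermaid's `timeline` diagram type and a hypothetical helper `mermaid_timeline` that takes `(period, description)` pairs.

```python
def mermaid_timeline(title, events):
    """Render (period, description) pairs as Mermaid `timeline` source text.

    Hypothetical helper for illustration; the real report generator may use
    a different diagram type or layout.
    """

    def clean(text):
        # ':' separates a period from its events in Mermaid timelines,
        # so it is replaced inside labels to keep the diagram parseable.
        return str(text).replace(":", ".").strip()

    lines = ["timeline", f"    title {clean(title)}"]
    for period, description in events:
        lines.append(f"    {clean(period)} : {clean(description)}")
    return "\n".join(lines)
```

The returned string can be placed inside a `mermaid` code fence in the Markdown report.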
+
+### Added
+
+#### RCA & Postmortem Reports — 3 New Tools
+
+- **`generate_rca_report`** — Full Root Cause Analysis report with:
+  - 5W analysis (What, Where, When, Why, How)
+  - Impact assessment with before/during/after metrics
+  - Mermaid timeline diagram with all events
+  - Mermaid incident response flow diagram
+  - Blast radius analysis
+  - Contributing factors table
+  - Action items with owners and priorities
+  - Lessons learned
+  - Actions: `rca` (report only), `jira` (report + Jira ticket), `trello` (report + Trello card), `all` (report + both)
+  - Jira/Trello submission dual-gated by `JIRA_ENABLED`/`TRELLO_ENABLED` env vars + `--submit true` flag
+  - Force-submit actions: `jira-submit`, `trello-submit`, `submit` (bypass enabled flags)
+- **`generate_postmortem`** — Comprehensive postmortem report with:
+  - Detailed timeline with mermaid diagram
+  - 5W analysis
+  - Remediation flow diagram
+  - Lessons learned (what went well, what to improve, where lucky)
+  - Action items table
+  - Appendix with alert payload and metrics snapshot
+  - Actions: `postmortem` (full report), `template` (blank template)
+- **`generate_rca_template`** — Blank RCA template for manual review with:
+  - Document control section
+  - Impact assessment with blast radius mermaid diagram
+  - 5W analysis with placeholder fields
+  - Detailed timeline with mermaid
+  - Root cause deep dive with causal chain diagram
+  - Contributing factors table
+  - Lessons learned checklists
+  - Action items with Jira/Trello ticket references
+  - Approval signature section
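
The dual gating described for `generate_rca_report` can be read as a small decision function. This is a sketch under stated assumptions, not the tool's actual code: the function name `submission_targets` is hypothetical, an enabled flag is assumed to be the string `true`, `submit` is assumed to force both targets, and force-submit actions are assumed not to require `--submit true`.

```python
import os

# Actions named in the changelog, mapped to the trackers they may submit to.
GATED_ACTIONS = {
    "rca": set(),
    "jira": {"jira"},
    "trello": {"trello"},
    "all": {"jira", "trello"},
}
FORCE_ACTIONS = {
    "jira-submit": {"jira"},
    "trello-submit": {"trello"},
    "submit": {"jira", "trello"},  # assumption: forces both trackers
}
ENABLE_VARS = {"jira": "JIRA_ENABLED", "trello": "TRELLO_ENABLED"}


def submission_targets(action, submit_flag, env=None):
    """Return the set of trackers a run would submit to (hypothetical helper)."""
    env = os.environ if env is None else env
    if action in FORCE_ACTIONS:
        # Force-submit actions bypass the *_ENABLED flags.
        return set(FORCE_ACTIONS[action])
    if action not in GATED_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if not submit_flag:
        # Gate 1: without --submit true only the report is produced.
        return set()
    # Gate 2: each tracker must also be enabled through its env var.
    return {
        target
        for target in GATED_ACTIONS[action]
        if env.get(ENABLE_VARS[target], "").lower() == "true"
    }
```

For example, `submission_targets("all", True, {"JIRA_ENABLED": "true"})` yields only `{"jira"}`, because Trello fails the second gate.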
+
+#### Cybersecurity Defense — All 4 Agents
+
+- **Triage Agent** — Threat-informed triage with security classification override. Red flag patterns for credential stuffing, SQL injection, insider threats, cryptojacking, data exfiltration, lateral movement, privilege escalation. `SECURITY_FLAG` delegation context.
+- **Investigator Agent** — Security hypothesis generation alongside operational hypotheses. Mandatory security evidence queries (audit logs, auth patterns, network map, IAM, SSO). Attack pattern recognition table. Security escalation protocol.
+- **Reviewer Agent** — Security review checklist for every investigation. Cover-up detection for "accidental" data deletion, performance degradation masking exfiltration, deploy rollback that changes security configs. Security verdict override capability.
+- **Remediator Agent** — Security-aware remediation checks (forensic evidence destruction, access control weakening, attack surface creation). Containment-first protocol for security incidents. Post-action security verification (audit logs, RBAC, secrets, network policies).
+
+#### Full ClickHouse Access — All 4 Profiles
+
+- All 4 agent `config.yaml` files now include all 20 ClickHouse read-only tables (matching `security/clickhouse-readonly.sql`):
+  - Triage: 2 → 20 tables (removed non-existent `alert_rules`)
+  - Investigator: 6 → 20 tables
+  - Reviewer: 6 → 20 tables
+  - Remediator: 2 → 20 tables
+
+### Changed
+
+- Plugin version: 3.0.0 → 1.2.0 (aligned with project versioning)
+- `post-remediation.sh` hook now auto-generates RCA report after successful remediation, saves to `~/.hermes/reports/`
+- `pyproject.toml` — added `SIM105`, `SIM117` to ruff ignore list (try/except/pass and nested with patterns needed for graceful query failure handling)
+- Test suite: 458 → 521 tests (63 new), coverage: 97.38% → 99.08%, source files: 38 → 41
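
For readers unfamiliar with the two ruff rules now ignored: `SIM105` flags `try`/`except`/`pass` in favour of `contextlib.suppress`, and `SIM117` flags nested `with` statements that could be merged. A sketch of the two patterns follows, with hypothetical function names (`safe_row_count`, `capture_output`) that are not taken from the repository:

```python
import contextlib
import io


def safe_row_count(run_query, sql):
    """Return the number of rows for sql, or 0 if the query fails."""
    count = 0
    try:  # SIM105 would suggest `with contextlib.suppress(Exception):`
        count = len(run_query(sql))
    except Exception:
        pass
    return count


def capture_output(fn):
    """Run fn and return whatever it wrote to stdout and stderr."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out):  # SIM117 would merge these two
        with contextlib.redirect_stderr(err):
            fn()
    return out.getvalue(), err.getvalue()
```

Both forms behave the same as the rewrites ruff proposes; ignoring the rules is a style choice, which matches the changelog's stated reason of keeping failure handling explicit.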
+
+### Fixed
+
+- `generate_rca_report.py`: `_generate_impact_metrics` now handles list responses from ClickHouse queries
+- `generate_rca_report.py` / `generate_postmortem.py` — added `return` after `sys.exit(1)` for mocked test environments
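
The `return` after `sys.exit(1)` matters because a patched `sys.exit` no longer raises `SystemExit`, so execution continues past it. A minimal sketch of the effect, using a hypothetical `main` function rather than the tools' real entry points:

```python
import sys
from unittest import mock


def main(argv):
    """Hypothetical CLI entry point; the real tools' signatures may differ."""
    if not argv:
        print("error: incident id is required", file=sys.stderr)
        sys.exit(1)
        return None  # only reached when sys.exit is mocked and does not raise
    return f"report for {argv[0]}"


def run_with_mocked_exit():
    """Call main([]) with sys.exit patched, as a unit test might."""
    with mock.patch("sys.exit") as fake_exit:
        result = main([])
    return result, fake_exit.call_args
```

Without the explicit `return`, the mocked run would fall through to `argv[0]` and raise `IndexError`, failing the test for a reason unrelated to the behaviour under test.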
+
+## [1.1.0] - 2026-06-05
+
+### Summary
+
+**Agent personality overhaul, dynamic database configuration, simplified Makefile, and Docker deployment refinements.**
+
+Agent SOUL.md files rewritten with brutally honest, adversarial, debate-oriented personalities. Each agent now operates as a scientist who challenges other agents — no hallucination, no hedging, evidence-only reasoning. The Triage agent classifies with zero tolerance for uncertainty. The Investigator treats every hypothesis as guilty until proven innocent. The Reviewer is a hostile skeptic. The Remediator is a cautious pragmatist who refuses to act without proof.
+
+### Added
+
+#### Agent Personalities — Adversarial Debate Framework
+
+- **Triage Agent** — Paranoid gatekeeper. Assumes alerts lie until proven truthful. Zero hallucination policy with banned vocabulary ("I think", "probably"). New INCOMPLETE classification for ambiguous alerts. Issues challenges to Investigator: "Prove me right or prove me wrong."
+- **Investigator Agent** — Hostile scientist. Treats every hypothesis as guilty until proven innocent with data. Falsification-first protocol. Zero tolerance for narrative without numbers. Cross-examines own findings before submitting. Demands the Reviewer tear the hypothesis apart.
+- **Reviewer Agent** — Skeptical devil's advocate. Actively hunts for reasons the investigation is wrong. Falsification protocol: tries to break the hypothesis before accepting it. Flags unstated assumptions as speculation. Only verdicts: CONFIRMED, NEEDS_MORE_EVIDENCE, REJECTED — no "looks good to me."
+- **Remediator Agent** — Cautious pragmatist. Refuses to act without a confirmed verdict from Reviewer. Every action includes blast radius analysis. First question: "What breaks if I am wrong?" Post-action verification is mandatory, not optional.
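
The personalities above are defined in prose in the SOUL.md files, but two of the rules are mechanical enough to sketch in code: the Reviewer's three verdicts, and the Remediator's refusal to act without `CONFIRMED`. The names `Verdict`, `may_remediate`, and `find_banned_phrases` are hypothetical, and the banned list contains only the two phrases quoted in the changelog.

```python
from enum import Enum


class Verdict(Enum):
    """The only three verdicts the Reviewer is allowed to issue."""

    CONFIRMED = "CONFIRMED"
    NEEDS_MORE_EVIDENCE = "NEEDS_MORE_EVIDENCE"
    REJECTED = "REJECTED"


# Assumption: only the two examples from the changelog; the real list may be longer.
BANNED_PHRASES = ("i think", "probably")


def may_remediate(verdict):
    """The Remediator acts only on a confirmed verdict."""
    return verdict is Verdict.CONFIRMED


def find_banned_phrases(text):
    """Return the banned phrases that occur in text, case-insensitively."""
    lowered = text.lower()
    return [phrase for phrase in BANNED_PHRASES if phrase in lowered]
```

Modelling the verdict as an enum means an unexpected value such as "looks good to me" raises `ValueError` at `Verdict(...)` instead of passing through silently.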
+
+#### Dynamic Database Configuration
+
+- `TELEMETRYFLOW_DB_NAME` environment variable — single source of truth for database name (default: `telemetryflow_db`)
+- `docker-compose.yaml` — all PostgreSQL and ClickHouse references use `${TELEMETRYFLOW_DB_NAME:-telemetryflow_db}`
+- `security/clickhouse-readonly.sql` — uses `${TELEMETRYFLOW_DB_NAME}` placeholder, substituted by `setup-readonly-user.sh`
+- `security/setup-readonly-user.sh` — reads `TELEMETRYFLOW_DB_NAME` and performs runtime substitution into SQL
+- `.env.example` — new `TELEMETRYFLOW_DB_NAME=telemetryflow_db` in Platform Connection section
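
The placeholder substitution that `setup-readonly-user.sh` performs is done in shell, and that script is not shown in this view. As an illustration of the same idea, here is a Python sketch using `string.Template`, with a hypothetical `render_sql` helper and an identifier check that is an added assumption rather than documented behaviour:

```python
import os
import re
from string import Template

DEFAULT_DB_NAME = "telemetryflow_db"  # same default as docker-compose's ':-' fallback


def render_sql(sql_template, env=None):
    """Fill ${TELEMETRYFLOW_DB_NAME} in a SQL template (hypothetical helper)."""
    env = os.environ if env is None else env
    db_name = env.get("TELEMETRYFLOW_DB_NAME") or DEFAULT_DB_NAME
    # Assumption: reject anything that is not a plain identifier, since the
    # value is spliced into SQL text rather than passed as a bound parameter.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", db_name):
        raise ValueError(f"invalid database name: {db_name!r}")
    # safe_substitute leaves any other $placeholders untouched.
    return Template(sql_template).safe_substitute(TELEMETRYFLOW_DB_NAME=db_name)
```

With no variable set, `render_sql("GRANT SELECT ON ${TELEMETRYFLOW_DB_NAME}.* TO hermes_readonly")` produces a grant on `telemetryflow_db`.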
+
+#### Simplified Makefile
+
+- `make init` — one-command first-time setup (install hermes → configure → deploy)
+- `make configure` — copy .env, install config, profiles, skills, plugins, cron, hooks
+- `make env` — setup `.env` from `.env.example` + install `config.yaml` + `SOUL.md`
+- `make docker-build` / `make docker-up` / `make docker-down` — Docker shortcuts
+- `make stop` — stop all agent gateways
+- `make start` — install deps + configure
+- `make reset` — clean + re-configure
+
+#### Docker Deployment
+
+- `docker-compose.yaml` — 4 profiles: `core` (backend + frontend + postgres + clickhouse + redis + nats), `monitoring` (tfo-collector + tfo-agent + jaeger), `tools` (portainer), `all`
+- `Dockerfile` — single-stage, python:3.13-slim-trixie, CVE patching, non-root user
+- `run-container.sh` — build, tag, push, compose orchestration with `--up`/`--down`/`--profile` flags
+
+#### CI/CD
+
+- `.github/workflows/docker.yml` — multi-platform build (amd64/arm64), Docker Hub, SBOM, Trivy scan
+- `.github/workflows/ci.yml` — 7-job CI with split test-unit/test-integration, matrix 3.10-3.13
+- `.github/workflows/release.yml` — tag-triggered release with checksums
+
+### Changed
+
+- Version unified to `1.1.0` across all files (pyproject.toml, docker-compose.yaml, run-container.sh, CI workflows, GitLab CI)
+- `docs/architecture.md` — directory structure expanded with all 37 tools (6 categories), Docker/CI files, tests, 18 skill categories
+- `docs/security/clickhouse-readonly.md` — all SQL examples use `${TELEMETRYFLOW_DB_NAME}` placeholder, automated setup reads env var
+- `docs/operations/troubleshooting.md` — ClickHouse queries use `${TELEMETRYFLOW_DB_NAME:-telemetryflow_db}`
+- `CONTRIBUTING.md` — project structure updated with Docker/CI files, skill count corrected
+- Agent SOUL.md files completely rewritten (triage, investigator, reviewer, remediator) — from polite operators to adversarial scientists
+
+### Removed
+
+- Hardcoded `telemetryflow_db` / `telemetryflow` database name references (replaced by `TELEMETRYFLOW_DB_NAME` env var)
+- Old Makefile `setup` target (replaced by `init` + `configure`)
+- Old script-based `profiles` target (replaced by inline Makefile logic)
+
 ## [1.0.0] - 2026-06-04
 
 ### Summary
