Commit d668379

ChengaDev and claude committed
Move docs to docs/ folder and expand contributing guide
- WHY_OPSAGENT.md, EXAMPLES.md, RUNNING_LOCALLY.md → docs/
- Update README nav links to docs/ paths
- CONTRIBUTING.md: add "most wanted" section with notification channel wishlist (PagerDuty, Teams, Discord, Opsgenie, Datadog, Email, Telegram), step-by-step guide for adding a new channel, and gaps for log patterns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent b359d8d commit d668379

5 files changed

Lines changed: 367 additions & 7 deletions

CONTRIBUTING.md

Lines changed: 45 additions & 5 deletions
@@ -1,6 +1,6 @@
 # Contributing to OpsAgent

-Contributions are welcome. Please open an issue first for anything beyond a small bug fix so we can align on direction before you invest time in a PR.
+Contributions are very welcome — OpsAgent is a young project and there is a lot of room to grow. Please open an issue first for anything beyond a small bug fix so we can align on direction before you invest time in a PR.

 ## Requirements for every PR

@@ -18,11 +18,51 @@ pip install -e ".[all-providers]"
 pytest tests/ -v
 ```

-## Good first areas
+---

-- **New MCP servers** — Jira, PagerDuty, Datadog log fetcher, `kubectl` live pod state
-- **New log patterns** — add to `_PATTERNS` in `mcp_tools/log_analyzer.py` with a matching fixture and test
-- **Streaming output** — stream Claude's reasoning in real time
+## Most wanted contributions
+
+### 🔔 New notification channels
+
+This is the highest-impact area right now. OpsAgent currently supports Slack, generic webhooks, and GitHub PR comments. We'd love to add:
+
+| Channel | Notes |
+|---|---|
+| **PagerDuty** | Create an incident via the Events API v2 |
+| **Microsoft Teams** | Adaptive Card payload via Incoming Webhook |
+| **Discord** | Embed payload via Discord webhook |
+| **Opsgenie** | Create alert via Opsgenie REST API |
+| **Datadog** | Post event to Datadog Events API |
+| **Email** | SMTP or SendGrid for direct email delivery |
+| **Telegram** | Bot API message to a chat or channel |
+
+Each channel lives in `mcp_tools/notification_server.py` as a new MCP tool. Follow the pattern of `send_slack_notification` — accept a webhook URL or token via env var, build the payload, send it, return a success/error string.
+
+### 🔍 New log patterns
+
+Add to `_PATTERNS` in `mcp_tools/log_analyzer.py` with a matching fixture log and test. Common gaps:
+
+- Ruby / Bundler errors
+- Gradle / Maven build failures
+- Go module errors
+- Rust / Cargo compilation errors
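
Editor's sketch (not part of the commit): the log-pattern gaps listed above could be prototyped roughly as follows before wiring them into the analyzer. The commit does not show the real shape of `_PATTERNS`, so the `(label, regex)` layout, the names `CANDIDATE_PATTERNS` and `find_matches`, and the sample log are all assumptions made for illustration.

```python
import re

# Hypothetical entry shape: (label, compiled regex with a "detail" group).
# The real `_PATTERNS` structure in mcp_tools/log_analyzer.py may differ.
CANDIDATE_PATTERNS = [
    (
        "rust_compile_error",
        re.compile(r"^error(?:\[E\d{4}\])?: (?P<detail>.+)$", re.MULTILINE),
    ),
    (
        "go_missing_module",
        re.compile(r"no required module provides package (?P<detail>[^\s;]+)"),
    ),
]


def find_matches(log_text: str) -> list[tuple[str, str]]:
    """Return (label, detail) pairs for every pattern hit, in pattern order."""
    hits = []
    for label, pattern in CANDIDATE_PATTERNS:
        for match in pattern.finditer(log_text):
            hits.append((label, match.group("detail")))
    return hits


# A tiny hand-written log in the spirit of a fixture file.
SAMPLE_LOG = """\
   Compiling demo v0.1.0 (/work/demo)
error[E0308]: mismatched types
main.go:4:2: no required module provides package github.com/example/lib; to add it:
"""

print(find_matches(SAMPLE_LOG))
```

Keeping each pattern next to a small sample log like this makes it straightforward to turn the sample into the fixture and test that the guide asks for.
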
+
+### 🛠️ New MCP servers
+
+- **Jira** — create or update a ticket from the RCA
+- **Datadog** — fetch recent logs or metrics for a service
+- **`kubectl`** — live pod state, describe, events
+
+---
+
+## Adding a new notification channel
+
+1. Add a new `@mcp.tool()` function in `mcp_tools/notification_server.py`
+2. Accept the target URL / token as a parameter (callers pass it from env)
+3. Build the channel-specific payload and POST it with `httpx`
+4. Return a plain string: `"✓ Sent"` or `"✗ Error: <message>"`
+5. Add a test in `tests/test_notification_server.py` using `respx` to mock the HTTP call
+6. Document the new env var in the README CLI reference table
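
Editor's sketch (not part of the commit): steps 2 to 4 above might look roughly like this for a Discord channel. Several things here are assumptions: the function names are invented, the `@mcp.tool()` decorator is left out so the snippet runs on its own, and the standard library's `urllib` stands in for `httpx` to avoid a third-party dependency. The payload uses Discord's `embeds` field, trimmed to the documented title and description limits.

```python
import json
import urllib.request
from typing import Callable


def build_discord_payload(title: str, summary: str) -> dict:
    """Build a Discord webhook body with a single embed (title <= 256, description <= 4096 chars)."""
    return {"embeds": [{"title": title[:256], "description": summary[:4096]}]}


def post_json(url: str, payload: dict) -> int:
    """POST JSON with the standard library and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "User-Agent": "opsagent-sketch/0.1"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def send_discord_notification(
    webhook_url: str,
    title: str,
    summary: str,
    post: Callable[[str, dict], int] = post_json,
) -> str:
    """Hypothetical tool body; the real one would carry the @mcp.tool() decorator."""
    try:
        status = post(webhook_url, build_discord_payload(title, summary))
    except OSError as exc:  # urllib's URLError and HTTPError are OSError subclasses
        return f"✗ Error: {exc}"
    if 200 <= status < 300:
        return "✓ Sent"
    return f"✗ Error: unexpected HTTP status {status}"


# Usage with a fake sender, so nothing touches the network.
print(send_discord_notification(
    "https://example.invalid/webhook",
    "Build failed",
    "pytest exited with status 1",
    post=lambda url, payload: 204,
))
```

Passing the sender in as a parameter is one way to keep the tool testable without a mocking library; the guide's own suggestion of `respx` would serve the same purpose once the call goes through `httpx`.
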

 ## Adding a new log pattern

README.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
 [![Release](https://github.com/ChengaDev/opsagent/actions/workflows/release.yml/badge.svg)](https://github.com/ChengaDev/opsagent/actions/workflows/release.yml)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

-[Why OpsAgent?](WHY_OPSAGENT.md) · [Examples](EXAMPLES.md) · [Running locally](RUNNING_LOCALLY.md) · [Contributing](CONTRIBUTING.md)
+[Why OpsAgent?](docs/why-opsagent.md) · [Examples](docs/examples.md) · [Running locally](docs/running-locally.md) · [Contributing](CONTRIBUTING.md)

 ---

@@ -118,7 +118,7 @@ pip install -e .
 ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
 ```

-See [EXAMPLES.md](EXAMPLES.md) for Python, Node.js, Helm, Terraform, GitLab CI, Jenkins, and more.
+See [docs/examples.md](docs/examples.md) for Python, Node.js, Helm, Terraform, GitLab CI, Jenkins, and more.

 ---

docs/examples.md

Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
+# GitHub Actions Examples
+
+> **Tip:** Always use `set -o pipefail` before piping through `tee` — without it, the pipeline returns `tee`'s exit code (0) even when your command fails, so `if: failure()` never triggers.
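
Editor's sketch (not part of the commit): the tip above can be checked outside CI. The snippet below is written in Python to match the project's language and assumes `bash` and `tee` are available on the PATH; the helper name `exit_code` is made up for illustration.

```python
import shutil
import subprocess


def exit_code(script: str) -> int:
    """Run a script under bash and return its exit status."""
    return subprocess.run(["bash", "-c", script], capture_output=True).returncode


if shutil.which("bash"):
    # Without pipefail the pipeline reports tee's status, hiding the failure.
    print(exit_code("false | tee /dev/null"))  # prints 0
    # With pipefail the failing command's status propagates.
    print(exit_code("set -o pipefail; false | tee /dev/null"))  # prints 1
```

The same behaviour is why a failing test command piped through `tee` would leave the step green unless `pipefail` is set first.
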
+
+## Python / pytest
+
+```yaml
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Install dependencies
+        run: pip install -r requirements.txt
+
+      - name: Run tests
+        run: |
+          set -o pipefail
+          pytest tests/ -v 2>&1 | tee "${{ runner.temp }}/pytest.log"
+
+      - name: Run OpsAgent RCA
+        if: failure()
+        uses: ChengaDev/opsagent@v1
+        with:
+          log-path: ${{ runner.temp }}/pytest.log
+          workspace: ${{ github.workspace }}
+          slack-webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+```
+
+## Node.js / npm
+
+```yaml
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-node@v4
+        with:
+          node-version: '20'
+
+      - name: Install and build
+        run: |
+          npm ci
+          set -o pipefail
+          npm run build 2>&1 | tee "${{ runner.temp }}/build.log"
+
+      - name: Run tests
+        run: |
+          set -o pipefail
+          npm test 2>&1 | tee "${{ runner.temp }}/test.log"
+
+      - name: Run OpsAgent RCA
+        if: failure()
+        uses: ChengaDev/opsagent@v1
+        with:
+          log-path: ${{ runner.temp }}/test.log
+          workspace: ${{ github.workspace }}
+          slack-webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+```
+
+## Post RCA as a PR comment
+
+```yaml
+- name: Run OpsAgent RCA
+  if: failure()
+  uses: ChengaDev/opsagent@v1
+  with:
+    log-path: ${{ runner.temp }}/test.log
+    workspace: ${{ github.workspace }}
+    github-token: ${{ secrets.GITHUB_TOKEN }}
+  env:
+    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+```
+
+OpsAgent posts the full RCA as a comment on the pull request that triggered the failure — no webhook configuration needed.
+
+## Save the RCA to a file
+
+```yaml
+- name: Run OpsAgent RCA
+  if: failure()
+  uses: ChengaDev/opsagent@v1
+  with:
+    log-path: ${{ runner.temp }}/test.log
+    workspace: ${{ github.workspace }}
+    output: ${{ runner.temp }}/rca.md
+  env:
+    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+
+- name: Upload RCA report
+  if: failure()
+  uses: actions/upload-artifact@v4
+  with:
+    name: rca-report
+    path: ${{ runner.temp }}/rca.md
+```
+
+## Use a custom model
+
+```yaml
+- name: Run OpsAgent RCA
+  if: failure()
+  uses: ChengaDev/opsagent@v1
+  with:
+    log-path: ${{ runner.temp }}/test.log
+    workspace: ${{ github.workspace }}
+    model: claude-opus-4-6
+    investigate-model: claude-haiku-4-5-20251001
+  env:
+    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+```
+
+## Use a different provider
+
+```yaml
+# Google Gemini
+- name: Run OpsAgent RCA
+  if: failure()
+  uses: ChengaDev/opsagent@v1
+  with:
+    log-path: ${{ runner.temp }}/test.log
+    workspace: ${{ github.workspace }}
+    provider: google
+  env:
+    GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
+```
+
+```yaml
+# OpenAI
+- name: Run OpsAgent RCA
+  if: failure()
+  uses: ChengaDev/opsagent@v1
+  with:
+    log-path: ${{ runner.temp }}/test.log
+    workspace: ${{ github.workspace }}
+    provider: openai
+  env:
+    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+```
+
+## CD pipeline — Helm deploy
+
+```yaml
+- name: Deploy
+  run: |
+    set -o pipefail
+    helm upgrade --install my-service ./charts/my-service \
+      --namespace production \
+      --set image.tag=${{ github.sha }} \
+      --wait --timeout 5m 2>&1 | tee "${{ runner.temp }}/deploy.log"
+
+- name: Run OpsAgent RCA
+  if: failure()
+  uses: ChengaDev/opsagent@v1
+  with:
+    log-path: ${{ runner.temp }}/deploy.log
+    workspace: ${{ github.workspace }}
+    slack-webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
+    webhook-url: ${{ secrets.WEBHOOK_URL }}
+  env:
+    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+```
+
+## CD pipeline — Terraform
+
+```yaml
+- name: Terraform apply
+  run: |
+    set -o pipefail
+    terraform apply -auto-approve 2>&1 | tee "${{ runner.temp }}/tf.log"
+
+- name: Run OpsAgent RCA
+  if: failure()
+  uses: ChengaDev/opsagent@v1
+  with:
+    log-path: ${{ runner.temp }}/tf.log
+    workspace: ${{ github.workspace }}
+    slack-webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
+    webhook-url: ${{ secrets.WEBHOOK_URL }}
+  env:
+    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+```
+
+## GitLab CI
+
+```yaml
+rca:
+  stage: .post
+  when: on_failure
+  script:
+    - pip install "git+https://github.com/ChengaDev/opsagent.git[all-providers]"
+    - opsagent --log-path build.log --workspace .
+  variables:
+    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
+```
+
+## Jenkins
+
+```groovy
+post {
+    failure {
+        sh '''
+            pip install "git+https://github.com/ChengaDev/opsagent.git[all-providers]"
+            opsagent --log-path build.log --workspace .
+        '''
+    }
+}
+```

docs/running-locally.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+# Running OpsAgent Locally
+
+## Setup
+
+```bash
+git clone https://github.com/ChengaDev/opsagent.git
+cd opsagent
+python3 -m venv .venv && source .venv/bin/activate
+pip install -e ".[all-providers]"
+cp .env.example .env
+# Add your API key to .env
+```
+
+## Mock mode — no API key needed
+
+The `demo.py` script runs the **full LangGraph pipeline** with a mock LLM against realistic fixture logs. Real MCP servers start, real tools execute, only the LLM response is mocked.
+
+```bash
+python demo.py # default: python import error
+python demo.py --fixture oom_killed.log # OOM killed container
+python demo.py --fixture test_failure.log # pytest failures
+python demo.py --fixture k8s_crash_loop.log # Kubernetes CrashLoopBackOff
+python demo.py --fixture helm_upgrade_failed.log # Helm upgrade timeout
+python demo.py --fixture terraform_error.log # Terraform apply error
+python demo.py --fixture registry_auth_error.log # Docker registry auth failure
+python demo.py --fixture health_check_failed.log # readiness probe failure
+python demo.py --list # show all available fixtures
+```
+
+## Production mode — real LLM
+
+```bash
+# Anthropic (default)
+python cli.py \
+  --log-path tests/fixtures/k8s_crash_loop.log \
+  --workspace .
+
+# OpenAI
+python cli.py --provider openai \
+  --log-path tests/fixtures/k8s_crash_loop.log \
+  --workspace .
+
+# Google Gemini
+python cli.py --provider google \
+  --log-path tests/fixtures/k8s_crash_loop.log \
+  --workspace .
+
+# Custom models
+python cli.py --provider anthropic \
+  --model claude-opus-4-6 \
+  --investigate-model claude-haiku-4-5-20251001 \
+  --log-path tests/fixtures/k8s_crash_loop.log \
+  --workspace .
+
+# With Slack notification and saved report
+python cli.py \
+  --log-path tests/fixtures/helm_upgrade_failed.log \
+  --workspace . \
+  --slack-url "$SLACK_WEBHOOK_URL" \
+  --output rca_report.md
+
+# With a generic webhook (Discord, Teams, PagerDuty)
+python cli.py \
+  --log-path tests/fixtures/helm_upgrade_failed.log \
+  --workspace . \
+  --webhook-url "$WEBHOOK_URL"
+```
+
+## Running tests
+
+```bash
+pytest tests/ -v
+```
+
+## Build executable locally
+
+```bash
+pip install -e ".[build]"
+pyinstaller opsagent.spec
+./dist/opsagent --log-path tests/fixtures/terraform_error.log --workspace .
+```
