Commit b90a50e

Enhance Slack integration and database backup workflows (#21)
* Update documentation and workflows for improved Slack integration and database backups
* Enhance database backup workflow and documentation with unique object keys and improved restore instructions
* Improve database backup workflow by adding an EXIT trap to remove temporary dump files on failure; update documentation for clarity on backup processes.
1 parent 8da43bf commit b90a50e

9 files changed

Lines changed: 139 additions & 50 deletions


.github/workflows/db-backup.yml

Lines changed: 12 additions & 4 deletions
@@ -7,8 +7,13 @@ on:

 jobs:
   backup:
-    name: PostgreSQL Backup to GCS
+    name: PostgreSQL Backup (${{ matrix.environment }})
     runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        environment: [staging, production]
+    environment: ${{ matrix.environment }}
     steps:
       - name: Dump and upload
         uses: appleboy/ssh-action@v1
@@ -19,9 +24,12 @@ jobs:
           port: ${{ secrets.SERVER_PORT || 22 }}
           script: |
             set -euo pipefail
-            STAMP="$(date +%Y%m%d)"
-            DUMP="/tmp/paperscout-${STAMP}.dump"
+            STAMP="$(date -u +%Y%m%dT%H%M%SZ)"
+            RUN_KEY="${{ github.run_id }}-${{ github.run_attempt }}"
+            DUMP="/tmp/paperscout-${{ matrix.environment }}-${STAMP}-${RUN_KEY}.dump"
+            DEST="gs://insights-db-backups/paperscout/${{ matrix.environment }}/paperscout-${STAMP}-${RUN_KEY}.dump"
+            trap 'rm -f "$DUMP"' EXIT

             sudo -u postgres pg_dump -Fc paperscout > "$DUMP"
-            gsutil cp "$DUMP" "gs://paperscout-backups/paperscout-${STAMP}.dump"
+            gsutil cp "$DUMP" "$DEST"
             rm -f "$DUMP"
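
Review note: the new object key combines a UTC timestamp with `run_id` and `run_attempt`, so a rerun on the same day gets a distinct name. The sketch below mirrors that naming scheme in Python so it can be checked outside the workflow. The function name `backup_object_uri` and the constant `BUCKET_PREFIX` are illustrative assumptions; they do not appear in the repository.

```python
from datetime import datetime, timezone

# Bucket prefix as written in the workflow's DEST variable.
BUCKET_PREFIX = "gs://insights-db-backups/paperscout"


def backup_object_uri(environment: str, run_id: int, run_attempt: int, now: datetime) -> str:
    """Build a URI shaped like the workflow's DEST (hypothetical helper).

    `now` is expected to be timezone-aware; it is converted to UTC before
    formatting, matching `date -u +%Y%m%dT%H%M%SZ` in the script.
    """
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_key = f"{run_id}-{run_attempt}"
    return f"{BUCKET_PREFIX}/{environment}/paperscout-{stamp}-{run_key}.dump"
```

Two attempts of the same run at the same second still differ in the trailing `run_attempt` component, which is the property the commit message describes as "unique object keys".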

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -37,3 +37,4 @@ build/
 Icon?
 .com.apple.timemachine.donotpresent
 .VolumeIcon.icns
+.cursor

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -9,8 +9,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Added

+- Post the same Slack **status** summary as the interactive command to `NOTIFICATION_CHANNEL` once when the process starts (when that channel is configured).
 - Open-source hygiene: contributing guide, security policy, code of conduct, onboarding and handoff docs, pre-commit (Ruff), GitHub issue templates, Dependabot, CodeQL, CODEOWNERS template, and `.gitattributes`.

+### Changed
+
+- Documentation: deployment URLs (Slack Request URL behind nginx `/paperscout/`), clone URL in server setup, staging-style placeholders.
+- `db-backup.yml`: matrix parallel backups for `staging` / `production` using environment-level SSH secrets; uploads under `gs://insights-db-backups/paperscout/<environment>/` with unique temp files and object keys (UTC timestamp + `run_id` + `run_attempt` + environment); `EXIT` trap removes temp dump on failure. `SERVER_SETUP` restore examples updated (`--no-owner`, listing/copy by object name).
+
 ## [0.1.0] - 2026-05-05

 ### Added

README.md

Lines changed: 12 additions & 10 deletions
@@ -113,15 +113,17 @@ python -m paperscout
 Once the scout is running and reachable at a public URL:

 1. Go back to **Event Subscriptions** in the Slack app config
-2. Set **Request URL** to `https://your-server.com/slack/events`
-3. Slack will send a challenge request -- the scout responds automatically
+2. Set **Request URL** depending on how traffic reaches Bolt:
+   - **Reverse proxy (recommended for production/staging):** If nginx terminates TLS and proxies under a path prefix (see [`deploy/paperscout.conf`](deploy/paperscout.conf)), Slack must use that prefix. Example: `https://your-domain.example.org/paperscout/slack/events` — not `https://your-domain.example.org/slack/events`.
+   - **Direct to the app (local dev or ngrok without nginx):** Bolt serves `/slack/events` at the container root. Example: `https://staging.example.org/slack/events` or `https://abc123.ngrok-free.app/slack/events`.
+3. Slack will send a challenge request — the scout responds automatically
 4. Click **Save Changes**

-For local testing with ngrok:
+For local testing with ngrok (traffic straight to `PORT`, no path prefix):

 ```bash
 ngrok http 3000
-# Use the ngrok URL: https://abc123.ngrok.io/slack/events
+# Use: https://<ngrok-host>/slack/events
 ```

 ### 8. Invite the Scout
@@ -191,7 +193,7 @@ curl -sf http://localhost:9102/health

 See [`deploy/SERVER_SETUP.md`](deploy/SERVER_SETUP.md) for the full Ubuntu 22.04 provisioning guide, and [`.github/workflows/cd.yml`](.github/workflows/cd.yml) for the CD pipeline.

-Database backups run daily via [`.github/workflows/db-backup.yml`](.github/workflows/db-backup.yml), uploading `pg_dump` snapshots to Google Cloud Storage.
+Database backups run daily via [`.github/workflows/db-backup.yml`](.github/workflows/db-backup.yml): **matrix jobs** for **`staging`** and **`production`** run **in parallel**, each using that **GitHub Environment’s** SSH secrets (same names as CD: `SERVER_HOST`, `SERVER_USER`, `SERVER_SSH_KEY`, optional `SERVER_PORT`). Dumps are uploaded to **`gs://insights-db-backups/paperscout/<environment>/`** so staging and production stay under the shared **`paperscout`** prefix in the bucket.

 ## Scout Commands

@@ -331,7 +333,7 @@ paperscout/
 .github/workflows/
 ci.yml Test matrix on push/PR to main
 cd.yml SSH deploy (git pull + build) on push to main
-db-backup.yml Daily pg_dump to Google Cloud Storage
+db-backup.yml Matrix pg_dump (staging + production) to GCS insights-db-backups/paperscout/<env>/
 ```

 ### PostgreSQL Schema
@@ -453,8 +455,8 @@ A `concurrency` group keyed by branch prevents overlapping deploys to the same e

 The `.github/workflows/db-backup.yml` workflow runs daily at 3 AM UTC (and supports manual dispatch):

-1. SSHes into the server and runs `pg_dump` on the host's PostgreSQL
-2. Uploads the dump to Google Cloud Storage (`gs://paperscout-backups/`)
-3. Old backups are auto-pruned by a GCS lifecycle rule (30 days)
+1. Runs **two jobs in parallel** (matrix: `staging`, `production`), each bound to the matching **GitHub Environment** so SSH secrets match that tier’s server (same secret names as CD).
+2. On each host, runs `pg_dump` and uploads to **`gs://insights-db-backups/paperscout/<environment>/`**, using object keys that include UTC time plus the GitHub Actions run id so backups do not collide on reruns.
+3. Configure lifecycle rules on the bucket/prefixes as needed (for example, pruning objects older than 30 days).

-CD secrets and variables are configured per **GitHub Environment** (`production` and `staging`); see the table in [Deployment](#deployment). Other secrets (e.g. database backups) are documented in [`deploy/SERVER_SETUP.md`](deploy/SERVER_SETUP.md#9-github-secrets-checklist).
+SSH credentials for backups live under **each environment** (`staging`, `production`), not at the repository level — parallel to [Deployment](#deployment). See [`deploy/SERVER_SETUP.md`](deploy/SERVER_SETUP.md#9-github-secrets-and-environments).
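
Review note: the README change distinguishes two Request URL shapes, one with the nginx path prefix and one served directly by Bolt. A minimal sketch of that rule follows, assuming a hypothetical helper `slack_request_url` that is not part of the repository; only the `/slack/events` path and the `/paperscout/` prefix come from the diff.

```python
def slack_request_url(base_url: str, path_prefix: str = "") -> str:
    """Join a public base URL, an optional proxy prefix, and Bolt's events path.

    Hypothetical helper for illustration: with a prefix (nginx proxying under
    /paperscout/), the prefix must precede /slack/events; without one, the
    path is served at the root.
    """
    base = base_url.rstrip("/")
    prefix = path_prefix.strip("/")
    if prefix:
        return f"{base}/{prefix}/slack/events"
    return f"{base}/slack/events"
```

Calling it with the README's example host and the prefix `"/paperscout/"` yields the proxied form; omitting the prefix yields the direct form used with ngrok.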

deploy/SERVER_SETUP.md

Lines changed: 38 additions & 20 deletions
@@ -100,10 +100,14 @@ rm /tmp/paperscout.dump
 ```

 If the dump is stored in GCS (from the daily backup workflow),
-download it directly on the new server instead:
+download it directly on the new server instead — use the prefix that matches
+the environment you are restoring (**`staging`** or **`production`**). Object
+names include UTC time and the workflow run id (see §8); pick the file you need,
+for example:

 ```bash
-gsutil cp gs://paperscout-backups/paperscout-<YYYYMMDD>.dump /tmp/paperscout.dump
+gsutil ls gs://insights-db-backups/paperscout/<environment>/
+gsutil cp gs://insights-db-backups/paperscout/<environment>/paperscout-<object-name>.dump /tmp/paperscout.dump
 pg_restore -U paperscout -h localhost -d paperscout --no-owner /tmp/paperscout.dump
 rm /tmp/paperscout.dump
 ```
@@ -180,7 +184,7 @@ Clone the repo into `/opt/paperscout`:

 ```bash
 sudo mkdir -p /opt
-sudo git clone https://github.com/<org>/<repo>.git /opt/paperscout
+sudo git clone https://github.com/cppalliance/paperscout.git /opt/paperscout
 sudo chown -R <deploy-user>:<deploy-user> /opt/paperscout
 ```

@@ -222,42 +226,56 @@ curl -sf http://localhost:9101/health | python3 -m json.tool
 docker compose logs -f paperscout
 ```

+### Example: staging-style host
+
+If you use a **separate** staging deployment (second clone path and GitHub Environment `staging`), typical placeholders are:
+
+- TLS / DNS: `sudo certbot --nginx -d staging.example.org` (replace with your real staging hostname when provisioning).
+- Health check on the staging machine after mapping ports (see README CD table): `curl -sf http://localhost:9102/health` — use whatever port your staging compose publishes for health instead of `9102` if different.
+- Slack **Request URL** when nginx proxies under `/paperscout/`: `https://staging.example.org/paperscout/slack/events`.
+
 ---

 ## 7. Restoring from a GCS backup (optional)

 If migrating from another server with an existing database:

 ```bash
-gsutil cp gs://paperscout-backups/paperscout-<YYYYMMDD>.dump /tmp/paperscout.dump
-pg_restore -U paperscout -h localhost -d paperscout -c /tmp/paperscout.dump
+gsutil ls gs://insights-db-backups/paperscout/<environment>/
+gsutil cp gs://insights-db-backups/paperscout/<environment>/paperscout-<object-name>.dump /tmp/paperscout.dump
+pg_restore -U paperscout -h localhost -d paperscout -c --no-owner /tmp/paperscout.dump
 rm /tmp/paperscout.dump
 ```

 ---

 ## 8. Database backups

-The `db-backup.yml` GitHub Actions workflow SSHes into the server daily
-and runs `pg_dump` + `gsutil cp` to upload to GCS. The VM's service
-account handles authentication automatically — no credentials needed.
+The `db-backup.yml` workflow runs **two parallel matrix jobs** (`staging` and
+`production`). Each job uses the **GitHub Environment** with the same name, so
+SSH secrets (`SERVER_HOST`, etc.) resolve per tier — matching CD. Each run uploads to:

-The GCS bucket `paperscout-backups` should have a lifecycle rule to
-auto-delete objects older than 30 days (configured in the Cloud Console
-under the bucket's **Lifecycle** tab).
+```text
+gs://insights-db-backups/paperscout/<environment>/paperscout-<UTC-timestamp>-<run-id>-<run-attempt>.dump
+```
+
+Object keys include the workflow run id so same-day reruns do not overwrite objects; each matrix job uses its own temp file on the host.

 ---

-## 9. GitHub Secrets checklist
+## 9. GitHub secrets and environments
+
+**Continuous deployment** (`cd.yml`) and **database backups** (`db-backup.yml`)
+both use the **`staging`** and **`production`** GitHub Environments. Configure the **same SSH secret names** in each environment (values differ per server):

-Configure these in the repo under **Settings → Secrets and variables → Actions**:
+| Secret | Purpose |
+| ---------------- | -------------------------------------------------------- |
+| `SERVER_HOST` | SSH target host for that environment’s VM |
+| `SERVER_USER` | SSH username (e.g. `<deploy-user>`) |
+| `SERVER_SSH_KEY` | Private SSH key for the deploy user |
+| `SERVER_PORT` | SSH port (optional; default `22`) |

-| Secret | Purpose |
-| ---------------- | ----------------------------------- |
-| `SERVER_HOST` | Server IP or hostname |
-| `SERVER_USER` | SSH username (e.g. `<deploy-user>`) |
-| `SERVER_SSH_KEY` | Private SSH key for the deploy user |
-| `SERVER_PORT` | SSH port (optional, defaults to 22) |
+CD also uses **environment Variables** (`DEPLOY_PATH`, `DEPLOY_BRANCH`, `HEALTH_PORT`) — see the README Deployment table. Backup jobs only need the secrets above.

 `GITHUB_TOKEN` is provided automatically by GitHub Actions.
-GCS authentication uses the VM's service account — no extra secrets needed.
+GCS uploads use the VM's service account (`gsutil`) — ensure each server can write to `gs://insights-db-backups/paperscout/<environment>/`.
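
Review note: the restore instructions now ask the operator to list the environment prefix and pick an object by name. Because the timestamp is fixed-width UTC, names can be ordered mechanically. The sketch below shows one way to pick the newest dump from a listing; `latest_backup` and the regular expression are assumptions for illustration, and the input is assumed to be the lines printed by `gsutil ls`.

```python
import re
from typing import Iterable, Optional

# Matches names shaped like paperscout-<UTC-timestamp>-<run-id>-<run-attempt>.dump
_NAME_RE = re.compile(r"paperscout-(\d{8}T\d{6}Z)-(\d+)-(\d+)\.dump$")


def latest_backup(object_names: Iterable[str]) -> Optional[str]:
    """Return the name with the newest timestamp (ties broken by run id, then attempt).

    Names that do not match the expected pattern are skipped. Returns None
    when nothing matches.
    """
    best_name = None
    best_key = None
    for name in object_names:
        match = _NAME_RE.search(name.strip())
        if match is None:
            continue
        # Fixed-width timestamps compare correctly as strings.
        key = (match.group(1), int(match.group(2)), int(match.group(3)))
        if best_key is None or key > best_key:
            best_name, best_key = name.strip(), key
    return best_name
```

The selected name can then be passed to `gsutil cp` as in the restore examples above; older dumps under the previous date-only naming are ignored by this pattern.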

docs/onboarding.md

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ python -m paperscout
 - **Slack HTTP app** listens on `PORT` (default **3000**).
 - **Health** endpoint listens on `health_port` from settings (default **8080**) — `GET /health`.

-For Slack Event Subscriptions you need a public URL (e.g. ngrok); see [README](../README.md#7-set-the-request-url).
+For Slack Event Subscriptions you need a public URL (e.g. ngrok). With nginx and a `/paperscout/` prefix, the Request URL must include that path; see [README — Set the Request URL](../README.md#7-set-the-request-url).

 ## Deployment (summary)

src/paperscout/__main__.py

Lines changed: 10 additions & 1 deletion
@@ -14,7 +14,14 @@
 from .db import init_db, init_pool
 from .health import start_health_server
 from .monitor import Scheduler
-from .scout import MessageQueue, create_app, notify_channel, notify_users, register_handlers
+from .scout import (
+    MessageQueue,
+    create_app,
+    enqueue_startup_status,
+    notify_channel,
+    notify_users,
+    register_handlers,
+)
 from .sources import ISOProber, WG21Index
 from .storage import ProbeState, UserWatchlist

@@ -131,6 +138,8 @@ def _on_poll_result(result):
     )
     bolt_thread.start()

+    enqueue_startup_status(mq, state, paper_count_fn)
+
     await scheduler.run_forever()


src/paperscout/scout.py

Lines changed: 28 additions & 14 deletions
@@ -440,30 +440,44 @@ def _show_watchlist(
     )


-def _handle_status(state: ProbeState, paper_count_fn, say, reply_opts: dict) -> None:
-    """Post loaded paper count, last poll, probe settings."""
+def format_status_message(state: ProbeState, paper_count_fn) -> str:
+    """Mrkdwn body for the interactive ``status`` command and startup channel post."""
     from datetime import datetime as _dt
     from datetime import timezone as _tz

     last = state.last_poll
     last_str = (
         _dt.fromtimestamp(last, tz=_tz.utc).strftime("%Y-%m-%d %H:%M:%S UTC") if last else "never"
     )
-    say(
-        text=(
-            f"*Paperscout Status*\n"
-            f"• Papers loaded: {paper_count_fn():,}\n"
-            f"• Last poll: {last_str}\n"
-            f"• Poll interval: {settings.poll_interval_minutes} min\n"
-            f"• Discovered via probe: {len(state.get_all_discovered())}\n"
-            f"• ISO probing: {'enabled' if settings.enable_iso_probe else 'disabled'}\n"
-            f"• Alert window: {settings.alert_modified_hours}h\n"
-            f"• Cold cycle: 1/{settings.cold_cycle_divisor}"
-        ),
-        **reply_opts,
+    return (
+        f"*Paperscout Status*\n"
+        f"• Papers loaded: {paper_count_fn():,}\n"
+        f"• Last poll: {last_str}\n"
+        f"• Poll interval: {settings.poll_interval_minutes} min\n"
+        f"• Discovered via probe: {len(state.get_all_discovered())}\n"
+        f"• ISO probing: {'enabled' if settings.enable_iso_probe else 'disabled'}\n"
+        f"• Alert window: {settings.alert_modified_hours}h\n"
+        f"• Cold cycle: 1/{settings.cold_cycle_divisor}"
     )


+def _handle_status(state: ProbeState, paper_count_fn, say, reply_opts: dict) -> None:
+    """Post loaded paper count, last poll, probe settings."""
+    say(text=format_status_message(state, paper_count_fn), **reply_opts)
+
+
+def enqueue_startup_status(
+    mq: MessageQueue,
+    state: ProbeState,
+    paper_count_fn,
+) -> None:
+    """Post *status* summary to ``NOTIFICATION_CHANNEL`` once at process start."""
+    channel = settings.notification_channel
+    if not channel:
+        return
+    mq.enqueue(channel, format_status_message(state, paper_count_fn))
+
+
 def _handle_version(say, reply_opts: dict) -> None:
     """Post package version string."""
     from . import __version__
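
Review note: the refactor splits formatting from delivery, so the interactive command and the startup post share one message body. The pattern can be sketched in isolation as below. `FakeQueue`, `format_status`, and `enqueue_startup` are simplified stand-ins written for this note, not the project's `MessageQueue`, `format_status_message`, or `enqueue_startup_status`; the real functions also read `settings` and `ProbeState`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class FakeQueue:
    """Hypothetical stand-in for MessageQueue; records (channel, text) pairs."""

    sent: List[Tuple[str, str]] = field(default_factory=list)

    def enqueue(self, channel: str, text: str) -> None:
        self.sent.append((channel, text))


def format_status(paper_count_fn: Callable[[], int]) -> str:
    """Pure formatter: builds the message text and performs no I/O."""
    return f"*Paperscout Status*\n• Papers loaded: {paper_count_fn():,}"


def enqueue_startup(mq: FakeQueue, channel: str, paper_count_fn: Callable[[], int]) -> None:
    """Enqueue the formatted status once, or do nothing if no channel is set."""
    if not channel:
        return
    mq.enqueue(channel, format_status(paper_count_fn))
```

Keeping the formatter free of side effects is what lets the tests in this commit compare the `status` reply and the startup post against the same string.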

tests/test_scout.py

Lines changed: 31 additions & 0 deletions
@@ -19,6 +19,8 @@
     _paper_link,
     _reply_opts,
     _show_watchlist,
+    enqueue_startup_status,
+    format_status_message,
     notify_channel,
     notify_users,
     register_handlers,
@@ -483,6 +485,35 @@ def test_status_after_poll(self, fake_pool):
         assert "100" in text and "never" not in text


+class TestFormatStatusMessage:
+    def test_matches_handle_status_output(self, fake_pool):
+        state = ProbeState(fake_pool)
+        say = MagicMock()
+        with patch("paperscout.scout.settings", _make_settings()):
+            expected = format_status_message(state, lambda: 42)
+            _handle_status(state, lambda: 42, say, {})
+        assert say.call_args[1]["text"] == expected
+
+
+class TestEnqueueStartupStatus:
+    def test_enqueues_when_channel_configured(self, fake_pool):
+        mq = MagicMock()
+        state = ProbeState(fake_pool)
+        with patch("paperscout.scout.settings", _make_settings(channel="C-alerts")):
+            enqueue_startup_status(mq, state, lambda: 7)
+        mq.enqueue.assert_called_once()
+        assert mq.enqueue.call_args[0][0] == "C-alerts"
+        assert "Paperscout Status" in mq.enqueue.call_args[0][1]
+        assert "7" in mq.enqueue.call_args[0][1]
+
+    def test_skips_when_no_channel(self, fake_pool):
+        mq = MagicMock()
+        state = ProbeState(fake_pool)
+        with patch("paperscout.scout.settings", _make_settings(channel="")):
+            enqueue_startup_status(mq, state, lambda: 1)
+        mq.enqueue.assert_not_called()
+
+
 # ── register_handlers ─────────────────────────────────────────────────────────
