Commit 13db8b7

Fix Upptime pipeline: expand to 7-service bundle, remove broken static-site workflow (#2)

- Expand .upptimerc.yml from 3 → 11 monitored surfaces to match the actual agent-facing bundle promised in HN/PH/marketing copy: postgres, redis, mongodb, queue, storage, webhook, deploy provisioning endpoints, plus marketing site, dashboard, agent healthz, OpenAPI spec, and the pg.instanode.dev TLS handshake. POST-only routes are probed by GET and accept 405 as the success signal. /deploy/new accepts 401 (auth-gated).
- Remove .github/workflows/static-site.yml: the upstream action upptime/status-page@master no longer exists, which is why every Static Site CI run has failed. The Jekyll-rendered README.md is the modern Upptime default and is what status.instanode.dev serves.
- Document the two human-ops steps blocking Summary/Graphs/Response-Time workflows: add a GH_PAT secret with repo scope, and either turn off enforce_admins on master branch protection or add the PAT owner as a bypass actor. Without those, the auto-generated badge table and graphs never land in README.md, which is why the page currently renders as a plain README instead of a status board.

Refs gtm-ops/TASKS.md item U.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent d8940f9 commit 13db8b7

4 files changed

Lines changed: 200 additions & 43 deletions

.github/workflows/static-site.yml

Lines changed: 0 additions & 23 deletions
This file was deleted.

.upptimerc.yml

Lines changed: 85 additions & 12 deletions
@@ -1,42 +1,115 @@
-# Upptime configuration
+# Upptime configuration for status.instanode.dev
 # Docs: https://upptime.js.org/docs/configuration
+#
+# Each check runs every 5 minutes from `Uptime CI`. Results land in
+# `history/*.yml`, badges in `api/`, graphs in `graphs/`. The Jekyll-
+# rendered README is the status page at status.instanode.dev.
+#
+# Services mirror the agent-facing bundle exposed by api.instanode.dev:
+# postgres, redis, mongodb, queue (NATS), storage (MinIO), webhook,
+# deploy, plus the marketing site itself.
 
 owner: InstaNode-dev
 repo: instant-status
 
 sites:
-  - name: API
+  # ─── Public HTTP surfaces ─────────────────────────────────────────────────
+  - name: Marketing site
+    url: https://instanode.dev/
+    expectedStatusCodes:
+      - 200
+
+  - name: Agent API (healthz)
     url: https://api.instanode.dev/healthz
     expectedStatusCodes:
       - 200
-    assignees: []
 
-  - name: Marketing site
-    url: https://instanode.dev/
+  - name: Dashboard
+    url: https://app.instanode.dev/
+    expectedStatusCodes:
+      - 200
+      - 301
+      - 302
+
+  # ─── Provisioning surfaces (POST endpoints, probed with GET) ──────────────
+  # We do not POST in monitors (that would create real resources). Instead we
+  # check that the route exists by sending Upptime's default GET and
+  # accepting the API's "method not allowed" response as proof of life for
+  # the handler. The API replies 405 to GET on POST-only routes, which is
+  # the desired signal.
+  - name: Postgres provisioning (POST /db/new)
+    url: https://api.instanode.dev/db/new
+    expectedStatusCodes:
+      - 405
+      - 200
+
+  - name: Redis provisioning (POST /cache/new)
+    url: https://api.instanode.dev/cache/new
+    expectedStatusCodes:
+      - 405
+      - 200
+
+  - name: MongoDB provisioning (POST /nosql/new)
+    url: https://api.instanode.dev/nosql/new
     expectedStatusCodes:
+      - 405
       - 200
 
-  # Postgres TLS-only endpoint. Upptime supports a TCP_PING check method
-  # (see https://upptime.js.org/docs/configuration#method). We use that
-  # so we get a true "is the port open" signal rather than poking the
-  # Postgres protocol with an HTTPS prober.
+  - name: Queue provisioning (POST /queue/new)
+    url: https://api.instanode.dev/queue/new
+    expectedStatusCodes:
+      - 405
+      - 200
+
+  - name: Storage provisioning (POST /storage/new)
+    url: https://api.instanode.dev/storage/new
+    expectedStatusCodes:
+      - 405
+      - 200
+
+  - name: Webhook provisioning (POST /webhook/new)
+    url: https://api.instanode.dev/webhook/new
+    expectedStatusCodes:
+      - 405
+      - 200
+
+  - name: Deploy provisioning (POST /deploy/new)
+    url: https://api.instanode.dev/deploy/new
+    # Deploy is auth-gated (RequireAuth): an unauthenticated GET returns 401.
+    # That still proves the handler is wired and the auth middleware is up.
+    expectedStatusCodes:
+      - 401
+      - 405
+
+  # ─── Customer-facing TCP surfaces ─────────────────────────────────────────
   - name: Customer Postgres (pg.instanode.dev TLS handshake)
     url: https://pg.instanode.dev:5432
     method: TCP_PING
     tcpHostPort: "pg.instanode.dev:5432"
-    # Fallback codes if the TCP_PING method isn't honored in the current
-    # Upptime action: HTTPS probe to a raw Postgres port typically
-    # surfaces these "not really HTTP" responses which we treat as "up".
+    # Fallbacks if TCP_PING isn't honored by the current Upptime action:
+    # probing :5432 over HTTPS surfaces "not really HTTP" codes which we
+    # treat as "the port is open" = up.
     expectedStatusCodes:
       - 0
       - 400
       - 401
       - 426
 
+  - name: OpenAPI spec (api.instanode.dev/openapi.json)
+    url: https://api.instanode.dev/openapi.json
+    expectedStatusCodes:
+      - 200
+
 status-website:
   cname: status.instanode.dev
   name: instanode.dev status
   logoUrl: ""
   baseUrl: /
+  introTitle: "instanode.dev status"
+  introMessage: >
+    Live uptime and response time for every public instanode.dev surface: the
+    agent API, the seven provisioning endpoints (postgres, redis, mongodb,
+    queue, storage, webhook, deploy), the marketing site, and the customer
+    Postgres TLS endpoint. Checked every 5 minutes from GitHub Actions.
 
 notifications: []
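
The probe rule used in this config (a GET against a POST-only route counts as "up" when the API answers 405, or 401 for the auth-gated deploy route) can be sketched outside Upptime as follows. This is a minimal illustration under stated assumptions, not Upptime's implementation: `EXPECTED`, `is_up`, and `probe` are hypothetical names, and only the URLs and status codes are taken from the config in this commit.

```python
import urllib.error
import urllib.request

# Accepted status codes per URL, copied from .upptimerc.yml in this commit.
# Only three of the monitored entries are listed to keep the sketch short.
EXPECTED = {
    "https://api.instanode.dev/healthz": {200},
    "https://api.instanode.dev/db/new": {405, 200},
    "https://api.instanode.dev/deploy/new": {401, 405},
}


def is_up(status: int, expected: set[int]) -> bool:
    """Return True when the observed status is one of the accepted codes."""
    return status in expected


def probe(url: str, timeout: float = 10.0) -> int:
    """Send a GET and return the HTTP status, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the status code is the signal we want.
        return err.code
```

Calling `is_up(probe(url), codes)` for each entry in `EXPECTED` reproduces the check by hand. Network failures (DNS errors, refused connections, timeouts) surface from `probe` as `OSError` subclasses and would be treated as "down" by the caller.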

README.md

Lines changed: 34 additions & 8 deletions
@@ -2,21 +2,47 @@
 
 Upptime-powered status page for [instanode.dev](https://instanode.dev).
 
-- Live status page: https://status.instanode.dev
-- Incidents auto-open as issues in this repository when a monitored endpoint fails; they close automatically once the endpoint recovers.
-- Checks run every 5 minutes via GitHub Actions (see `.github/workflows/uptime.yml`).
+- Live status page: <https://status.instanode.dev>
+- Incidents auto-open as issues in this repo when a monitored endpoint fails; they close automatically when the endpoint recovers.
+- Checks run every 5 minutes from GitHub Actions (`.github/workflows/uptime.yml`).
+- Daily summary, response-time, and graph generation run at 00:00 UTC.
 
 ## What is monitored
 
-Configured in `.upptimerc.yml`:
+Configured in `.upptimerc.yml`: the 7-service agent bundle plus public web surfaces.
 
-- **API** - `https://api.instanode.dev/healthz`
-- **Marketing site** - `https://instanode.dev/`
-- **Customer Postgres** - TCP ping to `pg.instanode.dev:5432`
+**Public web**
+- Marketing site (`instanode.dev/`)
+- Agent API health (`api.instanode.dev/healthz`)
+- Dashboard (`app.instanode.dev/`)
+- OpenAPI spec (`api.instanode.dev/openapi.json`)
+
+**Provisioning surfaces (7-service bundle)**
+- Postgres: `POST /db/new`
+- Redis: `POST /cache/new`
+- MongoDB: `POST /nosql/new`
+- Queue (NATS): `POST /queue/new`
+- Storage (MinIO): `POST /storage/new`
+- Webhook: `POST /webhook/new`
+- Deploy: `POST /deploy/new`
+
+POST-only routes are probed by GET; a `405 Method Not Allowed` response is the success signal (the handler is wired and the router is up). `/deploy/new` is auth-gated and returns `401` for unauthenticated probes, which is also treated as success.
+
+**Customer TCP surface**
+- Customer Postgres TLS handshake (`pg.instanode.dev:5432`)
+
+## Setup (one-time human ops)
+
+Before the status site renders correctly, an admin needs to do two things (see [`SETUP.md`](./SETUP.md)):
+
+1. Add a `GH_PAT` secret (personal access token with `repo` scope) so workflows can push the auto-generated badges, response-time data, and README updates past branch protection.
+2. Either disable `enforce_admins` on the `master` branch protection rule, or grant the PAT owner admin bypass, so the daily Summary/Graphs/Response-Time workflows succeed.
+
+Once both are in place, run `Setup CI` manually from the Actions tab to seed the badge directory; the rest is automatic.
 
 ## Powered by
 
-[Upptime](https://upptime.js.org/) - the open-source uptime monitor and status page powered entirely by GitHub Actions, Issues, and Pages. No servers, no dashboards, no cost.
+[Upptime](https://upptime.js.org/): an open-source uptime monitor and status page powered by GitHub Actions, Issues, and Pages. No servers, no dashboards, no recurring cost.
 
 ## License
 
SETUP.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+# SETUP — status.instanode.dev
+
+This file documents the human-ops steps required to bring `status.instanode.dev` from a placeholder (Jekyll-rendered `README.md`) to a fully rendered Upptime status page with live badges, graphs, and response-time history.
+
+## Current state (2026-05-11)
+
+- DNS: `status.instanode.dev` CNAMEs to `instanode-dev.github.io` (verified: HTTP/2 200).
+- GitHub Pages: enabled on this repo, source = `master` branch root.
+- `Uptime CI`: green every 5 min; probes are running and the `history/*.yml` records exist.
+- `Summary CI`, `Graphs CI`, `Response Time CI`: **failing**. The workflows produce the correct artifacts, but `git push` to `master` is rejected by branch protection (`enforce_admins: true`).
+- `Static Site CI`: **deleted** in this PR because the upstream action `upptime/status-page@master` no longer exists. The Jekyll-rendered `README.md` *is* the status page, which is the modern Upptime default. No separate static-site step is needed.
+
+## Why the site looks like a README today
+
+Because `Summary CI` and `Graphs CI` have never succeeded, `README.md` was never auto-rewritten to include the badge table, uptime percentages, and response-time graphs, so Jekyll renders the hand-written README. Once the workflows can push, the README is regenerated daily and the page becomes a status board.
+
+## Required ops (one-time, ~5 min)
+
+### 1. Create a GH_PAT
+
+A scheduled GitHub Action using the default `GITHUB_TOKEN` is *not allowed* to bypass branch protection; that is the entire point of the protection. The Upptime docs recommend a personal access token with `repo` scope.
+
+```text
+GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic) → Generate new token (classic)
+Note: "instant-status push token"
+Expiration: 1 year (or longer)
+Scopes: repo (full control of private repos)
+Generate → copy the token (starts with ghp_)
+```
+
+### 2. Add the PAT as a repository secret
+
+```text
+github.com/InstaNode-dev/instant-status → Settings → Secrets and variables → Actions → New repository secret
+Name: GH_PAT
+Value: <paste the ghp_ token>
+```
+
+All four Upptime workflows already reference `secrets.GH_PAT || secrets.GITHUB_TOKEN`, so no workflow edits are needed.
+
+### 3. Allow the PAT to bypass branch protection
+
+The PAT must be owned by a user listed in branch-protection bypass, OR `enforce_admins` must be `false`. Pick one:
+
+**Option A (recommended): turn off `enforce_admins`**
+```bash
+gh api -X DELETE repos/InstaNode-dev/instant-status/branches/master/protection/enforce_admins
+```
+This keeps PR review requirements in place for humans but lets pushes made with an admin's PAT through. Re-enable with:
+```bash
+gh api -X POST repos/InstaNode-dev/instant-status/branches/master/protection/enforce_admins
+```
+
+**Option B: allow specific bypass actors.** Repo Settings → Branches → branch protection rule → "Allow specified actors to bypass required pull requests" → add the PAT owner.
+
+### 4. Seed the badge directory
+
+Once the secret and the protection bypass are in place:
+```text
+Actions tab → Setup CI → Run workflow → on master
+```
+This bootstraps `api/`, `history/`, and `graphs/`. After it succeeds, the next scheduled Summary CI run (00:00 UTC) will rewrite `README.md` with the live status table, or you can run `Summary CI` manually from the Actions tab to see it immediately.
+
+## Verifying it worked
+
+```bash
+# 1. README on master now contains a badge table:
+gh api repos/InstaNode-dev/instant-status/contents/README.md \
+  | jq -r .content | base64 -d | head -40
+
+# 2. Status site reflects it (Pages can take ~60 s to rebuild after a master push):
+curl -s https://status.instanode.dev | grep -c '<img.*shields.io'
+# Expect: >0 (grep -c counts matching lines, so this is a lower bound on the badge count)
+
+# 3. Uptime workflows continue green every 5 min:
+gh run list --repo InstaNode-dev/instant-status --workflow=uptime.yml --limit 3
+```
+
+## Rollback
+
+If anything goes wrong, revert this PR; the previous workflows and 3-service config will resume. The `history/` and `api/` directories generated by future Uptime CI runs are append-only and harmless.
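
The second verification step uses `grep -c`, which counts matching lines rather than matching tags, so several badges on one line are counted once. A small sketch that counts the badge images themselves, assuming (as the grep pattern implies) that badges are `<img>` tags whose `src` contains `shields.io`; `count_badges` is a hypothetical helper, not part of Upptime.

```python
from html.parser import HTMLParser


class _BadgeCounter(HTMLParser):
    """Counts <img> tags whose src attribute points at shields.io."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing <img .../> tags are routed here as well by default.
        if tag == "img":
            src = dict(attrs).get("src") or ""
            if "shields.io" in src:
                self.count += 1


def count_badges(html: str) -> int:
    """Return the number of shields.io badge images in an HTML document."""
    parser = _BadgeCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

Feeding it the body fetched from `https://status.instanode.dev` should give a count that can be compared against the number of entries under `sites:` in `.upptimerc.yml`.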
