Commit afd950e

epipav, themarolt, mbani01, and joanagmaia authored
feat: track packages db contract and scaffold (#4146)
Signed-off-by: Uroš Marolt <uros@marolt.me>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Signed-off-by: anilb <epipav@gmail.com>
Co-authored-by: Uroš Marolt <uros@marolt.me>
Co-authored-by: Mouad BANI <mouad-mb@outlook.com>
Co-authored-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
1 parent eba296c commit afd950e

40 files changed

Lines changed: 2014 additions & 11 deletions

.claude/rules/skill-guidance.md

Lines changed: 12 additions & 0 deletions
@@ -16,6 +16,8 @@ This project has guided skills for common workflows. **Proactively suggest the r
 | `/review-pr` | Review a PR, audit code changes, check PR quality, validate a PR against standards |
 | `/adr` | Record an architecture decision, choose between frameworks/libraries/patterns, query past decisions |
 | `/scaffold-snowflake-connector` | Add a new Snowflake-connector data source or integration |
+| `/packages-worker-setup` | First-time setup of packages-db and github-repos-enricher for a new engineer |
+| `/packages-worker-add-entrypoint` | Scaffold a new sibling worker inside packages_worker (npm, OSV, scorecard, etc.) |
 
 ## Trigger Phrases
 
@@ -45,3 +47,13 @@ This project has guided skills for common workflows. **Proactively suggest the r
 **`/scaffold-snowflake-connector`** — match any of these intents:
 - "Add a new Snowflake connector", "New integration for [platform]"
 - "Scaffold a new data source", anything about adding a platform to `snowflake_connectors`
+
+**`/packages-worker-setup`** — match any of these intents:
+- "Set up packages worker", "how do I run the enricher", "first time on this branch"
+- "Get packages-db running", "packages-db won't start", "ENRICHER_GITHUB_TOKENS"
+- Any first-time setup question specific to `packages_worker` or `packages-db`
+
+**`/packages-worker-add-entrypoint`** — match any of these intents:
+- "Add a new packages worker", "scaffold a sibling worker", "new entry point in packages_worker"
+- "Add npm ingestion", "add OSV worker", "add scorecard runner"
+- Any request to create a new `src/bin/*.ts` worker inside `packages_worker`
Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
+---
+name: packages-worker-add-entrypoint
+description: >
+  Scaffold a new sub-worker inside packages_worker (npm, deps.dev, osv, scorecard,
+  etc.) following the single-service multi-entry-point structure. Use when: "add a
+  new packages worker", "scaffold a sub-worker in packages_worker", "new worker for
+  packages-db", "add npm worker", "add OSV worker", "add deps.dev worker".
+allowed-tools: Read, Write, Edit, Bash, AskUserQuestion, Glob
+---
+
+# packages-worker — Add a New Sub-worker
+
+You are adding a new data-ingestion worker to `services/apps/packages_worker/`.
+The structure follows the same pattern as `backend/` (where `api.ts` and
+`job-generator.ts` share one Dockerfile): one npm package, one Docker image,
+each worker in its own `src/{worker}/` directory with its own entry point.
+
+```
+services/apps/packages_worker/
+  src/
+    bin/
+      packages-worker.ts         ← parent stub
+      github-repos-enricher.ts   ← existing worker
+      <name>.ts                  ← entry point you will create
+    github/                      ← existing worker logic
+    <worker>/                    ← directory you will create
+      index.ts                   ← main logic for this worker
+      types.ts
+    config.ts                    ← shared — add your config getter here
+    db.ts                        ← shared — do not modify
+```
+
+## Step 1 — Gather requirements
+
+Ask the engineer for:
+
+1. **Worker name** (kebab-case) — e.g. `npm-sync`, `osv-sync`, `scorecard-runner`. Used as the entry point filename (`src/bin/<name>.ts`) and docker-compose service name.
+2. **Worker directory name** (short, lowercase) — e.g. `npm`, `osv`, `scorecard`. Becomes `src/<worker>/`.
+3. **What it does** — what data it fetches/writes, what table(s) in packages-db it reads from and writes to.
+4. **External API or data source** (if any) — URL, auth method, rate-limit characteristics.
+5. **Required env vars** beyond the shared DB vars — e.g. `NPM_API_URL`, `OSV_API_KEY`.
+
+Do not proceed until you have answers to 1–3.
+
+## Step 2 — Read existing files first
+
+```bash
+cat services/apps/packages_worker/src/bin/github-repos-enricher.ts
+cat services/apps/packages_worker/src/config.ts
+cat services/apps/packages_worker/package.json
+cat scripts/services/github-repos-enricher.yaml
+```
+
+These are the canonical references. Do not deviate from the patterns you see there.
+
+## Step 3 — Scaffold the files
+
+### 3a. Worker directory — `services/apps/packages_worker/src/<worker>/`
+
+Create the directory with at minimum:
+
+**`types.ts`** — types specific to this worker (input/output shapes, error kinds if calling an external API).
+
+**`index.ts`** — the main logic function(s) this worker runs. What goes here depends entirely on what the worker does — do not force a loop shape if it does not fit. Discuss with the engineer what the execution model should be (continuous loop, one-shot batch, event-driven, etc.) and implement accordingly.
+
+Add any additional files the worker needs (e.g. an API client, a DB query helper). All DB access uses inline pg-promise SQL via `qx.select` / `qx.result` / `qx.none` — do not add files to `services/libs/data-access-layer`.
+
+### 3b. Entry point — `services/apps/packages_worker/src/bin/<name>.ts`
+
+Follow the structure of `github-repos-enricher.ts`:
+- Import `getServiceLogger` from `@crowd/logging`
+- Import your worker's config getter from `../config` and `getPackagesDb` from `../db`
+- Import your worker's main function from `../<worker>/index`
+- Set `liveFilePath` / `readyFilePath` to `../tmp/<name>-live.tmp` / `../tmp/<name>-ready.tmp`
+- Handle SIGINT / SIGTERM with a `shuttingDown` flag
+- In `main()`: call config getter → validate any required tokens/keys → `await getPackagesDb()` → `await qx.selectOne('SELECT 1')` → `fs.mkdirSync` for the tmp dir → `setInterval` writing probe files every 5000ms → call your worker's main function → `clearInterval` → `process.exit(0)`
+- Fatal handler: `main().catch(err => { log.error({ err }, '<name> fatal error'); process.exit(1) })`
+
+### 3c. Config additions — `services/apps/packages_worker/src/config.ts`
+
+Read the file first, then add a `get<Worker>Config()` function:
+- Use `requireEnv(name)` for string vars, `requireEnvInt(name)` for integers
+- No defaults, no `?? undefined` — the process must refuse to start on missing config
+
+### 3d. Docker-compose service — `scripts/services/<name>.yaml`
+
+Copy `scripts/services/github-repos-enricher.yaml` and adapt:
+- Service names: `<name>` (prod) and `<name>-dev` (dev)
+- `command` (prod): `pnpm run start:<name>`
+- `command` (dev): `pnpm run dev:<name>`
+- `env_file`: keep the same four files (`backend/.env.dist.local`, `backend/.env.dist.composed`, `backend/.env.override.local`, `backend/.env.override.composed`)
+- `environment`: set any tuning var defaults inline (avoids requiring them in `.env.override.local` for local dev)
+- `volumes` (dev only): bind-mount `./services/apps/packages_worker/src` plus every `services/libs/*/src` directory (copy the full list from the enricher yaml for hot reload)
+
+### 3e. package.json scripts — `services/apps/packages_worker/package.json`
+
+Read the file first, then add:
+```json
+"start:<name>": "tsx src/bin/<name>.ts",
+"dev:<name>": "tsx watch src/bin/<name>.ts"
+```
+
+### 3f. Env var files — `backend/.env.dist.local` and `backend/.env.dist.composed`
+
+Append new required vars with empty-string defaults (or sensible local values for non-secrets):
+```
+NEW_WORKER_API_KEY=
+```
+
+## Step 4 — TypeScript check
+
+```bash
+cd services/apps/packages_worker && pnpm tsc --noEmit
+```
+
+Fix any errors before proceeding.
+
+## Checklist before committing
+
+- [ ] `src/<worker>/` directory created with `types.ts` and `index.ts`
+- [ ] `src/bin/<name>.ts` — probe files, SIGINT/SIGTERM handler, fail-fast config check, `SELECT 1` on startup
+- [ ] `config.ts` — new `get<Worker>Config()` using `requireEnv`/`requireEnvInt`, no defaults
+- [ ] `scripts/services/<name>.yaml` — prod + dev services with bind mounts
+- [ ] `package.json` — `start:<name>` and `dev:<name>` scripts added
+- [ ] `backend/.env.dist.local` and `.env.dist.composed` — new vars documented
+- [ ] No new files in `services/libs/data-access-layer` (packages-db uses inline SQL)
+- [ ] `pnpm tsc --noEmit` passes
+
+Use `/preflight` before opening a PR and `/commit` to sign off.
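
The entry-point shape described in step 3b of the skill above (probe files refreshed on a timer, a `shuttingDown` flag set by SIGINT/SIGTERM, and the timer cleared once the worker returns) could look roughly like the sketch below. It is not taken from `github-repos-enricher.ts`, which this page does not show. The names `writeProbeFiles`, `createShutdownFlag`, `runEntryPoint`, and `WorkerMain` are hypothetical, the logger, config getter, and `getPackagesDb` calls are left out so the block runs on its own, and writing the probe files once before the first timer tick is an assumption of the sketch.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: (re)writes the liveness and readiness probe files
// and returns their paths.
export function writeProbeFiles(tmpDir: string, name: string): string[] {
  fs.mkdirSync(tmpDir, { recursive: true });
  const files = [
    path.join(tmpDir, `${name}-live.tmp`),
    path.join(tmpDir, `${name}-ready.tmp`),
  ];
  const stamp = new Date().toISOString();
  for (const file of files) {
    fs.writeFileSync(file, stamp);
  }
  return files;
}

// Hypothetical wrapper around the `shuttingDown` flag from step 3b.
export interface ShutdownFlag {
  isSet: () => boolean;
  set: () => void;
}

export function createShutdownFlag(): ShutdownFlag {
  let shuttingDown = false;
  return {
    isSet: () => shuttingDown,
    set: () => {
      shuttingDown = true;
    },
  };
}

// Assumed worker signature: the worker polls shouldStop() and returns once it is true.
export type WorkerMain = (shouldStop: () => boolean) => Promise<void>;

export async function runEntryPoint(
  name: string,
  tmpDir: string,
  workerMain: WorkerMain,
): Promise<void> {
  const flag = createShutdownFlag();
  process.on("SIGINT", flag.set);
  process.on("SIGTERM", flag.set);

  // The real entry point would load config, validate tokens, and run
  // `SELECT 1` against packages-db before reaching this point.
  writeProbeFiles(tmpDir, name); // assumption: write once before the first tick
  const timer = setInterval(() => writeProbeFiles(tmpDir, name), 5000);
  try {
    await workerMain(flag.isSet);
  } finally {
    clearInterval(timer); // stop refreshing probes once the worker returns or throws
  }
}
```

A `src/bin/<name>.ts` built on this sketch would call `runEntryPoint` from its `main()`, exit with code 0 on success, and log then exit with code 1 in the fatal handler, as step 3b lists.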
Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,99 @@
1+
---
2+
name: packages-worker-setup
3+
description: >
4+
Get packages_worker running locally — first time or resuming after a break.
5+
Spins up packages-db if not running, applies any pending migrations, and starts
6+
the worker. All steps are safe to re-run.
7+
Use when: "set up packages worker", "start packages worker", "resume packages worker",
8+
"get packages-db running", "packages-db stopped", "restart the worker".
9+
allowed-tools: Read, Bash, Edit, AskUserQuestion
10+
---
11+
12+
# packages-worker
13+
14+
Get `packages_worker` running locally. All steps are idempotent — safe to run
15+
whether this is your first time or you're resuming after a break.
16+
17+
## Prerequisites check
18+
19+
```bash
20+
git branch --show-current # should be feat/track-packages
21+
docker info --format '{{.ServerVersion}}'
22+
pnpm --version
23+
```
24+
25+
If the branch is wrong: `git checkout feat/track-packages && pnpm i`.
26+
27+
## Step 1 — Start packages-db
28+
29+
No-op if already running.
30+
31+
```bash
32+
docker compose -f scripts/scaffold.yaml up -d packages
33+
until docker compose -f scripts/scaffold.yaml exec packages pg_isready -U postgres; do sleep 1; done
34+
echo "packages-db is ready"
35+
```
36+
+## Step 2 — Apply pending migrations
+
+Flyway skips already-applied migrations, so this is safe to re-run.
+
+```bash
+arch=$(uname -m)
+[ "$arch" = "arm64" ] && PLATFORM="--platform=linux/arm64/v8" || PLATFORM="--platform=linux/amd64"
+docker build $PLATFORM -t packages_flyway \
+  -f backend/src/osspckgs/Dockerfile.flyway backend/src/osspckgs --load
+
+docker run --rm --network crowd-bridge \
+  -e PGHOST=packages \
+  -e PGPORT=5432 \
+  -e PGUSER=postgres \
+  -e PGPASSWORD=example \
+  -e PGDATABASE=packages-db \
+  packages_flyway
+```
+
+To create a new migration:
+
+```bash
+./scripts/cli scaffold create-packages-migration <descriptive_name>
+```
+
+## Step 3 — Start the worker
+
+```bash
+DEV=1 ./scripts/cli service packages-worker up
+```
+
+Dev mode uses hot reload — edits to `services/apps/packages_worker/src/` and
+`services/libs/*/src/` are picked up immediately without restarting.
+
+## Day-to-day commands
+
+```bash
+# Follow logs
+./scripts/cli service packages-worker logs
+
+# Stop
+./scripts/cli service packages-worker down
+
+# Restart
+./scripts/cli service packages-worker restart
+
+# Check status
+./scripts/cli service packages-worker status
+```
+
+## Going further
+
+- Add a new sub-worker (npm-sync, osv-sync, etc.): `/packages-worker-add-entrypoint`
+- Record an architecture decision: `/adr`
+- Before opening a PR: `/preflight`
+- Commit with DCO sign-off: `/commit`
+
+## Troubleshooting
+
+| Symptom | Likely cause | Fix |
+|---|---|---|
+| `Connection refused` on packages-db | Docker not running | `docker compose -f scripts/scaffold.yaml up -d packages` |
+| `permission denied: scripts/cli` | CLI not executable | `chmod +x scripts/cli` |
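
Step 2 of the setup skill above picks the Docker build platform from `uname -m`. The same decision can be written as a small function, shown here as a sketch only. The name `dockerPlatformFor` is hypothetical. The shell line in the commit matches only `arm64`, which is what macOS on Apple Silicon reports; Linux on 64-bit ARM reports `aarch64`, and treating that value as ARM as well is an addition of this sketch, not something the commit does.

```typescript
// Maps the output of `uname -m` to a value for `docker build --platform`.
// Hypothetical helper, not part of the commit.
export function dockerPlatformFor(arch: string): string {
  const normalized = arch.trim().toLowerCase();
  // "arm64" is what the shell snippet checks for. "aarch64" is an extra
  // case added in this sketch for Linux ARM hosts.
  if (normalized === "arm64" || normalized === "aarch64") {
    return "linux/arm64/v8";
  }
  // Everything else falls back to amd64, as in the shell snippet.
  return "linux/amd64";
}
```

With the shell line as committed, an `aarch64` host would fall through to `linux/amd64` and build the Flyway image for a non-native platform.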

backend/.env.dist.composed

Lines changed: 8 additions & 1 deletion
@@ -27,4 +27,11 @@ CROWD_OPENSEARCH_NODE=http://open-search:9200
 CROWD_TEMPORAL_SERVER_URL=temporal:7233
 
 # Seach sync api
-CROWD_SEARCH_SYNC_API_URL=http://search-sync-api:8083
+CROWD_SEARCH_SYNC_API_URL=http://search-sync-api:8083
+# packages DB (osspckgs)
+CROWD_PACKAGES_DB_READ_HOST=packages
+CROWD_PACKAGES_DB_WRITE_HOST=packages
+CROWD_PACKAGES_DB_PORT=5432
+CROWD_PACKAGES_DB_USERNAME=postgres
+CROWD_PACKAGES_DB_PASSWORD=example
+CROWD_PACKAGES_DB_DATABASE=packages-db

backend/.env.dist.local

Lines changed: 17 additions & 1 deletion
@@ -150,6 +150,9 @@ CROWD_TEMPORAL_NAMESPACE=default
 CROWD_TEMPORAL_ENCRYPTION_KEY_ID=local
 CROWD_TEMPORAL_ENCRYPTION_KEY=FweBMRnGCLshER8FlSvNusQA6G3MRUKt
 
+# Temporal — packages namespace
+CROWD_PACKAGES_TEMPORAL_NAMESPACE=default
+
 # Seach sync api
 CROWD_SEARCH_SYNC_API_URL=http://localhost:8083
 
@@ -166,4 +169,17 @@ CROWD_TINYBIRD_BASE_URL=http://localhost:7181/
 
 # Auth0
 CROWD_AUTH0_ISSUER_BASE_URLS=
-CROWD_AUTH0_AUDIENCE=
+CROWD_AUTH0_AUDIENCE=
+# packages DB (osspckgs)
+CROWD_PACKAGES_DB_READ_HOST=localhost
+CROWD_PACKAGES_DB_WRITE_HOST=localhost
+CROWD_PACKAGES_DB_PORT=5434
+CROWD_PACKAGES_DB_USERNAME=postgres
+CROWD_PACKAGES_DB_PASSWORD=example
+CROWD_PACKAGES_DB_DATABASE=packages-db
+
+# github-repos-enricher
+ENRICHER_GITHUB_TOKENS=
+ENRICHER_BATCH_SIZE=100
+ENRICHER_REPO_UPDATE_INTERVAL_HOURS=24
+ENRICHER_IDLE_SLEEP_SEC=60
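
The `ENRICHER_*` variables above are the kind of input the add-entrypoint skill's "no defaults, refuse to start on missing config" rule applies to. The sketch below shows one way such a fail-fast getter could be written. It is not the contents of `src/config.ts`, which this page does not show. The skill names the helpers `requireEnv(name)` and `requireEnvInt(name)`; here they take the environment as an explicit first argument so the block can run without touching `process.env`. `EnricherConfig` and `getEnricherConfig` are hypothetical names, treating an empty string as missing is an assumption, and the token value is kept as a raw string because its format is not stated in the commit.

```typescript
type Env = Record<string, string | undefined>;

// Returns the variable's value, or throws when it is unset or blank.
// Assumption: an empty string counts as missing.
export function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required env var: ${name}`);
  }
  return value;
}

// Same as requireEnv, but also insists the value is a base-10 integer.
export function requireEnvInt(env: Env, name: string): number {
  const raw = requireEnv(env, name).trim();
  if (!/^-?\d+$/.test(raw)) {
    throw new Error(`Env var ${name} must be an integer, got "${raw}"`);
  }
  return Number(raw);
}

// Hypothetical config shape built from the ENRICHER_* variables.
export interface EnricherConfig {
  githubTokens: string; // raw value; the token format is not specified in the commit
  batchSize: number;
  repoUpdateIntervalHours: number;
  idleSleepSec: number;
}

export function getEnricherConfig(env: Env): EnricherConfig {
  return {
    githubTokens: requireEnv(env, "ENRICHER_GITHUB_TOKENS"),
    batchSize: requireEnvInt(env, "ENRICHER_BATCH_SIZE"),
    repoUpdateIntervalHours: requireEnvInt(env, "ENRICHER_REPO_UPDATE_INTERVAL_HOURS"),
    idleSleepSec: requireEnvInt(env, "ENRICHER_IDLE_SLEEP_SEC"),
  };
}
```

Under these assumptions, the blank `ENRICHER_GITHUB_TOKENS=` shipped in `.env.dist.local` makes the getter throw until a real value is supplied in an override file.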
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+FROM flyway/flyway:7.8.1-alpine
+
+USER root
+
+# Install envsubst from gettext used for templating.
+RUN apk update \
+  && apk add --no-cache gettext
+
+USER flyway
+
+COPY ./flyway_migrate.sh /migrate.sh
+
+# Override default `flyway` entrypoint.
+ENTRYPOINT ["/migrate.sh"]
+
+# Copy migrations.
+COPY ./migrations /tmp/migrations
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+
+set -e
+echo "Migrating jdbc:postgresql://${PGHOST}:${PGPORT}/${PGDATABASE}"
+
+flyway \
+  -locations="filesystem:/tmp/migrations" \
+  -url="jdbc:postgresql://${PGHOST}:${PGPORT}/${PGDATABASE}" \
+  -user="$PGUSER" \
+  -password="$PGPASSWORD" \
+  -connectRetries=60 \
+  -outOfOrder=true \
+  -mixed=true \
+  -placeholderReplacement=false \
+  -schemas=public \
+  -X \
+  migrate
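
The migrate script above assembles its JDBC URL from three of the `PG*` variables that the `docker run` command in the setup skill passes in. As a quick check of what that interpolation produces, the same template can be sketched as a function. The names `PgEnv` and `flywayJdbcUrl` are hypothetical and exist only for illustration.

```typescript
// The three variables the script interpolates into the JDBC URL.
export interface PgEnv {
  PGHOST: string;
  PGPORT: string;
  PGDATABASE: string;
}

// Mirrors the template used in the script:
// jdbc:postgresql://${PGHOST}:${PGPORT}/${PGDATABASE}
export function flywayJdbcUrl(env: PgEnv): string {
  return `jdbc:postgresql://${env.PGHOST}:${env.PGPORT}/${env.PGDATABASE}`;
}
```

With the values from the setup skill (`PGHOST=packages`, `PGPORT=5432`, `PGDATABASE=packages-db`) this yields `jdbc:postgresql://packages:5432/packages-db`.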
