feat: add project onboarding script (LFXV2-1373) #82
Adds `scripts/onboard_project.py` and accompanying `docs/onboard_new_projects_script.md` to automate steps 2-5 of the new project onboarding workflow.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Issue: LFXV2-1373
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
Pull request overview
Adds a standalone onboarding automation script plus documentation to consolidate the “new project onboarding” operational steps into a repeatable workflow.
Changes:
- Introduces `scripts/onboard_project.py` to resolve a project tree, replay NATS KV entries, verify mappings, reindex committees, and reindex DynamoDB-backed resources.
- Adds `docs/onboard_new_projects_script.md` with prerequisites, usage, options, and a recommended operator workflow.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `scripts/onboard_project.py` | New CLI script that orchestrates Phases 1–5 (API lookup, NATS KV replay/verification, committee/member reindex, DynamoDB query + KV replay). |
| `docs/onboard_new_projects_script.md` | New operator documentation for running the onboarding script via uv, including options and workflow guidance. |
| if cfg["parent_key_index"] is None: | ||
| # Parent key field is the primary key — use batch_get_item | ||
| keys_list = list(parent_keys) | ||
| for i in range(0, len(keys_list), 100): | ||
| batch = [{cfg["primary_key"]: k} for k in keys_list[i:i + 100]] | ||
| resp = self.dynamodb.batch_get_item( | ||
| RequestItems={cfg["name"]: {"Keys": batch}} | ||
| ) | ||
| items.extend(resp.get("Responses", {}).get(cfg["name"], [])) | ||
| while resp.get("UnprocessedKeys"): | ||
| resp = self.dynamodb.batch_get_item( | ||
| RequestItems=resp["UnprocessedKeys"] | ||
| ) | ||
| items.extend(resp.get("Responses", {}).get(cfg["name"], [])) |
self.dynamodb is a DynamoDB resource (boto3.resource('dynamodb')), but this code calls self.dynamodb.batch_get_item(...), which is a low-level client operation and will raise at runtime. Use boto3.client('dynamodb') (or self.dynamodb.meta.client) for batch_get_item, and ensure keys are marshalled to the expected wire format when using the client API.
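The chunk-and-retry pattern in the hunk above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the script's code: `batch_get_all` and `FakeDynamo` are hypothetical names, and the fake stands in for whichever boto3 object ends up supplying `batch_get_item`, serving only a few keys per call so that the `UnprocessedKeys` path is exercised. The table name `meetings` and key name `id` are placeholders.

```python
from typing import Any, Callable, Dict, Iterable, List


def batch_get_all(
    batch_get: Callable[..., Dict[str, Any]],
    table: str,
    key_name: str,
    key_values: Iterable[Any],
    chunk_size: int = 100,
) -> List[Dict[str, Any]]:
    """Fetch items in chunks, re-requesting any keys reported as unprocessed."""
    keys = list(key_values)
    items: List[Dict[str, Any]] = []
    for start in range(0, len(keys), chunk_size):
        chunk = keys[start:start + chunk_size]
        request = {table: {"Keys": [{key_name: k} for k in chunk]}}
        while request:
            resp = batch_get(RequestItems=request)
            items.extend(resp.get("Responses", {}).get(table, []))
            # An empty or missing UnprocessedKeys ends the retry loop.
            request = resp.get("UnprocessedKeys") or {}
    return items


class FakeDynamo:
    """Hypothetical stand-in that serves at most `max_per_call` keys per request."""

    def __init__(self, max_per_call: int = 2) -> None:
        self.max_per_call = max_per_call
        self.calls = 0

    def batch_get_item(self, RequestItems: Dict[str, Any]) -> Dict[str, Any]:
        self.calls += 1
        responses: Dict[str, Any] = {}
        unprocessed: Dict[str, Any] = {}
        for table, spec in RequestItems.items():
            served = spec["Keys"][: self.max_per_call]
            rest = spec["Keys"][self.max_per_call:]
            responses[table] = [dict(key) for key in served]
            if rest:
                unprocessed[table] = {"Keys": rest}
        return {"Responses": responses, "UnprocessedKeys": unprocessed}


fake = FakeDynamo(max_per_call=2)
found = batch_get_all(
    fake.batch_get_item, "meetings", "id", ["a", "b", "c", "d", "e"], chunk_size=3
)
```

Passing the `batch_get_item` callable in, rather than reaching for `self.dynamodb` directly, keeps the retry logic testable without AWS credentials. A production version would likely add a backoff delay between retries.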
| parser.add_argument( | ||
| "--fix-mappings", | ||
| action="store_true", | ||
| help="Attempt to create missing v1-mappings entries (limited support)", |
The --fix-mappings flag help text says it will “Attempt to create missing v1-mappings entries”, but the implementation only prints a note and does not create mappings. Update the CLI help text to reflect actual behavior (e.g., “print guidance when mappings are missing”).
| help="Attempt to create missing v1-mappings entries (limited support)", | |
| help="Print guidance when v1-mappings entries are missing", |
| 5. Reindex DynamoDB resources (meetings, polls, etc.) | ||
|
|
||
| Usage: | ||
| python scripts/onboard_project.py <slug> [dev|staging|prod] [options] |
The module docstring usage suggests a positional environment argument (<slug> [dev|staging|prod]), but the CLI actually uses --env and only accepts a single positional slug. Update the usage text to avoid misleading users.
| python scripts/onboard_project.py <slug> [dev|staging|prod] [options] | |
| python scripts/onboard_project.py <slug> [--env {dev,staging,prod}] [options] |
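For reference, a CLI shaped like the corrected usage line could be declared as follows. This is a sketch only: the `build_parser` helper, the help strings, and the `dev` default are assumptions for illustration and are not taken from the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one positional slug and an optional --env flag."""
    parser = argparse.ArgumentParser(prog="onboard_project.py")
    parser.add_argument("slug", help="Project slug to onboard")
    parser.add_argument(
        "--env",
        choices=["dev", "staging", "prod"],
        default="dev",  # assumed default; the script's real default is not shown here
        help="Target environment",
    )
    return parser


parser = build_parser()
args = parser.parse_args(["my-project", "--env", "staging"])
usage = parser.format_usage()
```

With this shape, a second positional argument such as `staging` is rejected by argparse, which is why the docstring should advertise `--env` rather than a positional environment.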
|
|
||
| Consolidates the manual onboarding steps 2-5 into a single command: | ||
| 2. Replay project KV entries in v1-objects to trigger reprocessing | ||
| 3. Verify (and optionally create) v1-mappings entries |
The docstring claims Phase 3 can “optionally create” v1-mappings entries, but the current implementation explicitly does not create mappings (it only reports guidance). Align the docstring wording with the actual behavior.
| 3. Verify (and optionally create) v1-mappings entries | |
| 3. Verify v1-mappings entries and report guidance for missing mappings |
| def _fetch_all_pages(self, client: httpx.Client, url: str, params: dict) -> List[dict]: | ||
| """Fetch all pages from a paginated endpoint.""" | ||
| items: List[dict] = [] | ||
| offset = 0 | ||
| while True: | ||
| p = {**params, "pageSize": PAGE_SIZE, "offset": offset} | ||
| resp = client.get(url, params=p, headers=self.headers) | ||
| resp.raise_for_status() | ||
| data = resp.json() | ||
| page = data.get("Data") or data.get("data") or [] | ||
| if not page: | ||
| break | ||
| items.extend(page) | ||
| if len(page) < PAGE_SIZE: | ||
| break | ||
| offset += PAGE_SIZE | ||
| return items | ||
|
|
LFXAPIClient._fetch_all_pages is defined but never used in this script. Consider removing it to reduce maintenance surface area, or refactor existing pagination loops to use it so there’s a single pagination implementation.
| def _fetch_all_pages(self, client: httpx.Client, url: str, params: dict) -> List[dict]: | |
| """Fetch all pages from a paginated endpoint.""" | |
| items: List[dict] = [] | |
| offset = 0 | |
| while True: | |
| p = {**params, "pageSize": PAGE_SIZE, "offset": offset} | |
| resp = client.get(url, params=p, headers=self.headers) | |
| resp.raise_for_status() | |
| data = resp.json() | |
| page = data.get("Data") or data.get("data") or [] | |
| if not page: | |
| break | |
| items.extend(page) | |
| if len(page) < PAGE_SIZE: | |
| break | |
| offset += PAGE_SIZE | |
| return items |
| @@ -0,0 +1,161 @@ | |||
| # Onboarding Script — `scripts/onboard_project.py` | |||
|
|
|||
| Automates steps 2–5 of [onboarding a new project](./onboarding-new-project.md) into a | |||
This doc links to ./onboarding-new-project.md, but that file doesn’t exist in the repository (so the link is broken). Update the link target to the correct location/file name, or add the referenced manual guide file.
| Automates steps 2–5 of [onboarding a new project](./onboarding-new-project.md) into a | |
| Automates steps 2–5 of onboarding a new project into a |
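Broken relative links of this kind can be caught mechanically. The following is a rough sketch, assuming plain inline Markdown links of the form `[text](target)`; `broken_relative_links` is a hypothetical helper, and the temporary directory only simulates a docs folder for the demonstration.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Matches inline Markdown links and captures the target.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Targets with a URL scheme (https:, mailto:, ...) are not local files.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(doc: Path) -> List[str]:
    """Return relative link targets in `doc` that do not exist on disk."""
    broken: List[str] = []
    for target in LINK_RE.findall(doc.read_text(encoding="utf-8")):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue
        path = target.split("#", 1)[0]  # ignore any fragment
        if not (doc.parent / path).exists():
            broken.append(target)
    return broken


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "existing.md").write_text("# ok\n", encoding="utf-8")
    page = docs / "onboard_new_projects_script.md"
    page.write_text(
        "See the [manual guide](./onboarding-new-project.md), "
        "[another page](./existing.md) and [a site](https://example.com).\n",
        encoding="utf-8",
    )
    result = broken_relative_links(page)
```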
| # Only re-run committee and DynamoDB reindex (skip project replay and mapping check) | ||
| uv run scripts/onboard_project.py <slug> --reindex --skip-committees | ||
|
|
||
| # Only check mappings |
The “Skipping phases” example is internally inconsistent: the comment says it will “skip project replay and mapping check” and “re-run committee and DynamoDB reindex”, but the command shown only uses --skip-committees (which skips committees) and the script has no flags to skip phases 2 or 3. Please correct the explanation and/or the example command to match what the script can actually do.
| # Only re-run committee and DynamoDB reindex (skip project replay and mapping check) | |
| uv run scripts/onboard_project.py <slug> --reindex --skip-committees | |
| # Only check mappings | |
| # Re-run project replay, mapping check, and DynamoDB reindex (skip committees) | |
| uv run scripts/onboard_project.py <slug> --reindex --skip-committees | |
| # Re-run project replay and mapping check only (skip committees and DynamoDB) |
|
Side note: I expect that, in the future, "fan-out reindexing" will not be a v1-sync-helper activity. v1-sync-helper's reindexing capability should be limited to handling iterations in the v2 API contracts of projects and committees (for example, backfilling new v2 fields that correspond to v1 fields); "touch each KV entry in the replica to re-sync" makes sense for that. I do recognize the value of a centralized "fan-out" reindex as we iteratively remove project filters, but I propose we remove it afterwards (project filters were a hack to begin with, and the architecture was never really intended for them). Wrapper services should implement their own backfill/reindexing routines in code rather than in scripts. Let's discuss more in the coming weeks.
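As a point of reference for that discussion, "touch each KV entry to re-sync" amounts to reading a key and writing the same value back so that watchers see a new revision. The sketch below is illustrative and makes assumptions: `replay_keys` is a hypothetical helper that expects an object with async `get` and `put` methods (modeled on the nats-py KeyValue interface), and `FakeKV` is an in-memory stand-in so the example runs without a NATS server. The key names are placeholders.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, Iterable, List


async def replay_keys(kv, keys: Iterable[str]) -> List[str]:
    """Rewrite each key with its current value to produce a new revision."""
    replayed: List[str] = []
    for key in keys:
        entry = await kv.get(key)
        await kv.put(key, entry.value)
        replayed.append(key)
    return replayed


@dataclass
class FakeEntry:
    value: bytes
    revision: int


class FakeKV:
    """In-memory stand-in for a KV bucket; tracks a per-key revision counter."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self._data = {k: FakeEntry(v, 1) for k, v in data.items()}

    async def get(self, key: str) -> FakeEntry:
        return self._data[key]

    async def put(self, key: str, value: bytes) -> int:
        revision = self._data[key].revision + 1 if key in self._data else 1
        self._data[key] = FakeEntry(value, revision)
        return revision


kv = FakeKV({"project.a": b"{}", "project.b": b"{}"})
replayed = asyncio.run(replay_keys(kv, ["project.a", "project.b"]))
entry_a = asyncio.run(kv.get("project.a"))
```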
Summary
- `scripts/onboard_project.py` to automate steps 2–5 of the new project onboarding workflow (resolve project tree, check/replay KV entries, trigger reindex, query DynamoDB)
- `docs/onboard_new_projects_script.md` documenting prerequisites, quick-start usage, and per-phase behaviour
- `uv` dependencies (`boto3`, `httpx`, `nats-py`): no manual install required

Jira: LFXV2-1373
🤖 Generated with Claude Code