Add script to automatically make new meetings pages #2401
JFWooten4 wants to merge 13 commits into stellar:main
Pull request overview
Adds a Python helper to generate new Stellar developer meeting pages from a YouTube URL/ID by downloading captions, producing a cleaned transcript, and scaffolding an MDX page (with optional summary/resources extraction).
Changes:
- Add `meetings/new-meeting.py` to fetch YouTube captions (via `yt_dlp`), clean/punctuate them, and generate an MDX meeting page.
- Add `meetings/README.md` documenting setup and usage of the script.
- Update site/build hygiene: exclude `README.md` from meetings blog ingestion and ignore local Python/cookies artifacts.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `meetings/new-meeting.py` | New generator script for meeting MDX + transcript/summary/resources extraction. |
| `meetings/README.md` | Usage and setup instructions for the generator script. |
| `docusaurus.config.ts` | Excludes `README.md` files from the meetings blog content glob. |
| `.gitignore` | Ignores `*.pyc` and `**/cookies.txt` to avoid committing local artifacts/secrets. |
```python
if args.createPage and not args.keepVtt:
    for vttFile in vttFiles:
        try:
            vttFile.unlink()
        except OSError:
            pass
    try:
        if outDir.exists() and not any(outDir.iterdir()):
            outDir.rmdir()
    except OSError:
```
VTT cleanup is only performed when `--create-page` is enabled. When running with `--no-create-page`, the script always writes the `.txt` transcript but will still leave the downloaded `.vtt` files behind even if `--keep-vtt` is not set, which contradicts the meaning of `--keep-vtt`. Consider deleting the VTT files whenever `not args.keepVtt` (independent of `createPage`) once the `.txt` or `.mdx` output has been produced.
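One possible shape for that change, as a minimal sketch rather than the script's actual code: the helper name `cleanup_vtt` and its parameters are hypothetical, and it assumes the caller invokes it only after the `.txt` or `.mdx` output has been written.

```python
from pathlib import Path


def cleanup_vtt(vtt_files, out_dir, keep_vtt):
    """Delete downloaded .vtt files unless keep_vtt is set.

    Hypothetical helper: the decision depends only on keep_vtt,
    not on whether a page was created. Returns the number of files removed.
    """
    if keep_vtt:
        return 0
    removed = 0
    for vtt_file in vtt_files:
        try:
            vtt_file.unlink()
            removed += 1
        except OSError:
            # A missing or locked file should not abort the run.
            pass
    try:
        # Remove the download directory only if nothing else is left in it.
        if out_dir.exists() and not any(out_dir.iterdir()):
            out_dir.rmdir()
    except OSError:
        pass
    return removed
```

Called as `cleanup_vtt(vttFiles, outDir, args.keepVtt)` at the end of both code paths, this would make `--keep-vtt` behave the same with and without `--no-create-page`.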
```python
parser.add_argument("--meetings-dir", default="meetings", dest="meetingsDir")
parser.add_argument("--title")
```
The default `--meetings-dir` is `meetings`, but the current Docusaurus blog config in this branch still points to `path: 'meeting-notes'`. Until the dependent PR that renames the directory is merged, this default will generate MDX files in a directory that won’t be published by the site. Consider defaulting to the current configured blog path or documenting that `--meetings-dir meeting-notes` is required until the rename lands.
```markdown
What it doesn't do:

- Draft a perfect description.
- Add a helpful resources section.
```
README says the script “doesn’t” add a helpful resources section, but the script actually generates a `### Resources` section and auto-extracts CAP/SEP links. Update this section so the documented behavior matches the script’s output.
Suggested change:

```diff
- - Add a helpful resources section.
+ - Guarantee a complete resources list beyond the generated `### Resources`
+   section and the auto-extracted CAP/SEP links.
```
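To illustrate what auto-extracting CAP/SEP references from a transcript could look like, here is a rough sketch. It is not the script's implementation: the function name, the regular expression, and the four-digit zero-padded output format are all assumptions made for this example.

```python
import re

# Assumed pattern: "CAP" or "SEP", an optional hyphen or space, then 1-4 digits.
REF_PATTERN = re.compile(r"\b(CAP|SEP)[-\s]?(\d{1,4})\b", re.IGNORECASE)


def extract_proposal_refs(text):
    """Return unique CAP/SEP identifiers in order of first mention."""
    seen = []
    for kind, number in REF_PATTERN.findall(text):
        # Normalize to an upper-case, zero-padded form such as CAP-0046.
        ref = f"{kind.upper()}-{int(number):04d}"
        if ref not in seen:
            seen.append(ref)
    return seen


print(extract_proposal_refs("We discussed CAP-0046 and SEP 10, then cap-46 again."))
```

A pattern this loose will produce false positives on spoken phrases such as "cap 5 percent", which is one reason the README should not promise a complete resources list.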
```markdown
<!-- removable -->
```
There’s a leftover `<!-- removable -->` marker in the README. If this file is meant to be user-facing docs, remove this internal note.
Suggested change:

```diff
- <!-- removable -->
```
```python
parser.add_argument("--create-page", action="store_true", default=True, dest="createPage", help="Create meetings mdx page with transcript")
parser.add_argument("--no-create-page", action="store_false", dest="createPage", help="Only export captions to text")
parser.add_argument("--save-txt", action="store_true", dest="saveTxt", help="Also save a plain text transcript")
parser.add_argument("--keep-vtt", action="store_true", dest="keepVtt", help="Keep downloaded VTT files")
parser.add_argument("--meetings-dir", default="meetings", dest="meetingsDir")
parser.add_argument("--title")
parser.add_argument("--description")
parser.add_argument("--authors", help="Comma-separated speaker slugs from meetings/authors.yml")
parser.add_argument("--tags", help="Comma-separated tags (default: developer)")
parser.add_argument("--date", help="YYYY-MM-DD; defaults to upload date")
parser.add_argument("--overwrite", action="store_true")
parser.add_argument("--block-seconds", type=int, default=60, dest="blockSeconds")
parser.add_argument("--punctuate", action="store_true", default=True)
parser.add_argument("--no-punctuate", action="store_false", dest="punctuate")
parser.add_argument("--spellcheck", action="store_true", default=True)
parser.add_argument("--no-spellcheck", action="store_false", dest="spellcheck")
```
These argparse options are configured with `default=True` + `action="store_true"`, so passing `--create-page` / `--punctuate` / `--spellcheck` has no effect (they’re already true). This is confusing UX and makes `--no-*` flags the only meaningful toggles; consider dropping the redundant `--create-page`/`--punctuate`/`--spellcheck` flags or switching to the more typical `default=False` + `--enable-*` pattern.
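A third option, not named in the comment above, is `argparse.BooleanOptionalAction` (available since Python 3.9), which registers each `--flag` / `--no-flag` pair in a single call and documents both in `--help`. The sketch below assumes the same flag names and `dest` values as the snippet; `build_parser` is a hypothetical wrapper used only for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser showing paired on/off flags via BooleanOptionalAction."""
    parser = argparse.ArgumentParser(description="Generate a meetings page (sketch)")
    # Each call below creates both --name and --no-name, sharing one dest.
    parser.add_argument(
        "--create-page",
        action=argparse.BooleanOptionalAction,
        default=True,
        dest="createPage",
        help="Create meetings mdx page with transcript",
    )
    parser.add_argument("--punctuate", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--spellcheck", action=argparse.BooleanOptionalAction, default=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--no-create-page", "--no-spellcheck"])
    print(args.createPage, args.punctuate, args.spellcheck)
```

The positive flags remain no-ops while the defaults are `True`, but the pairing is explicit and the six `add_argument` calls shrink to three.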
I really like the new style of having the meeting author commit the page updates with extensive notes based on their immediate memory. Each section of news in Kaan's #2388 could easily be an independent page reference for significant network developments. The styling encoded in the script currently best adapts to the historic CAP discussions like how
> [!WARNING]
> This relies on #2390 and #2361, which are blocking.
This simple Python script creates a new meetings page with just a URL / video ID from YouTube. All new meetings use YouTube, so I didn't add support for older methods in #2362 or #2363.
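As a rough illustration of accepting either a URL or a bare video ID, the sketch below normalizes both to an ID. It is an assumption-laden example, not the script's code: `video_id_from_input` is a hypothetical name, and the 11-character ID shape is a widely observed convention rather than a documented guarantee.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed ID shape: 11 characters from the URL-safe base64 alphabet.
ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{11}$")


def video_id_from_input(value):
    """Return the video ID from a bare ID, a youtu.be link, or a watch URL."""
    value = value.strip()
    if ID_PATTERN.match(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.endswith("youtu.be"):
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host.endswith("youtube.com"):
        # Watch URLs carry the ID in the "v" query parameter.
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        candidate = ""
    if ID_PATTERN.match(candidate):
        return candidate
    raise ValueError(f"Could not find a YouTube video ID in {value!r}")
```

In practice `yt_dlp` accepts full URLs directly, so a helper like this would mainly matter for naming the output files.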
It has quite a few extra bells and whistles, which I think would be good to add into the repo on `main` before trimming down. One of them is the option to output the transcript into a text file. I've added a flag to allow that if others want to test presentation ideas on the source materials.

I also set up more structure with a `Key Points` and `Resources` section by default. All the past meetings follow this flow, where the notes highlight main topics and then refer to resources. Usually, those resources are links to CAPs; more to come on integrating those into tags later.

In the past, pages would repeat links or ideas redundantly. This new flow encourages a clear list of where viewers can learn more.
As for the transcript logic, I believe this functionality will be invaluable for SEO. Developer chats will be much easier to search once they follow the cleaned format, which presents topics in writing. Think quoted searches or even intra-docs CAP queries, which can now easily reference source materials from Core Devs and the Community.