Story 2096: Implement Wagtail Integration#2100
Conversation
gregjkal
left a comment
There was a problem hiding this comment.
Per our conversation in standup: please test the homepage to make sure it won't break without manually migrating the existing testimonial and/or news items. If it does break, we need to either make these changes backwards-compatible (my strong preference!), with a follow-up change that cleans any bridge code up, or add a data migration so it can be fixed during the deploy instead of crashing until manual fixes.
julhoang
left a comment
There was a problem hiding this comment.
Hi @jlchilders11 ! This is awesome work! I just left a few of clarifying questions below.
Besides them, it looks like the HTML templates are now outdated. Do you have a next-step recommendation for us to update those templates to V3 versions? Also, do we need to update the V3 Create Post page and various other existing queries to query from Wagtail's PostPage instead of Entry?
Thanks so much for your work here!
a984bd7 to
bd274ee
Compare
For updating these to the V3 and existing things, I believe that should be handled in the data integration phase of those pages, which I believe are still outstanding tickets. If not, I do think that that work should be handled as part of a separate ticket, as it somewhat escapes the "migration" scope of this ticket. |
herzog0
left a comment
There was a problem hiding this comment.
Hey @jlchilders11 this is looking great!
I just have a couple of questions that I'll write in a Slack thread to keep it as a better reference for the future
julhoang
left a comment
There was a problem hiding this comment.
Hi @jlchilders11 , I've left a couple of suggestions below – please let me know your thoughts!
A few more things:
1/ I noticed that a few video entries are looking like this when they have a empty summary – can we make sure to not render that block in that case?
2/ The filters on the http://localhost:8000/pages/news/ are not redirecting to the new /pages path – I understand if this makes more sense to tackle in this ticket instead: #2378. If that's the case, would you mind adding a note for Katty so that she's aware?
| ctx["is_paginated"] = True | ||
| ctx["entry_list"] = pag.get_page(request.GET.get("page", 1)) | ||
| ctx["page_obj"] = ctx["entry_list"] | ||
| ctx["filters"] = POST_CONTENT_TYPES |
There was a problem hiding this comment.
Is the UI consuming this ctx["filters"]? It seems like it's only using ctx["filter_terms"]
There was a problem hiding this comment.
Good call. The filter terms should eventually be generated from the POST_CONTENT_TYPES, but that can be addressed once all content types have been established. Added a TODO to address that.
|
|
||
| def get_or_create_page(entry: Entry, index_page: PostIndexPage) -> PostPage: | ||
| try: | ||
| page = index_page.get_children().get(title=entry.title).specific |
There was a problem hiding this comment.
Nit: Hmm, I'm curious if there's a different key that can be more robust than title here – just a thought, no real action needed!
|
julhoang
left a comment
There was a problem hiding this comment.
Hi @jlchilders11 , I have 1 questions regarding the summary_dispatcher, others are just optional nits 🙏
|
|
||
| if not self.summary: | ||
| logger.info(f"Passing {self.pk=} to dispatcher") | ||
| summary_dispatcher.delay(self.pk) |
There was a problem hiding this comment.
I think this summary_dispatcher and the helper function that it calls set_summary_for_event_entry both are writing to the Entry model instead of our new Page model – should we handle issue in this PR?
| {% with body=object.get_content|default:object.visible_content %} | ||
| {% if body %}{{ body|markdown|urlize }}{% endif %} | ||
| {% endwith %} |
There was a problem hiding this comment.
Tiny nit since this will not be a likely issue in Prod:
While testing after the Wagtail migration script, I noticed a rendering issue with a sample post I'd previously created via the V3 Create Post page. Using WYSIWYG, the post body was originally saved as markdown, but after migration it's now wrapped in HTML, so the UI no longer renders it correctly (screenshots below). While this might not be an issue in Prod, I think it'd be nice if we can update the script to retain the markdown format just so that any other dev's markdown posts will still render correctly – totally up to you 😄
| "Entry" V3 page | Wagtail V3 (/pages) | Wagtail Admin |
|---|---|---|
![]() |
![]() |
![]() |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a Wagtail "pages" app (models, blocks, migrations, app config), a migration-based PostPage index, conversion tooling, and integrates PostPage with news tasks, utils, templates, routing, and settings while gating v3 pages via a waffle flag. ChangesPages App and Dual-Model Content System
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
news/utils.py (2)
27-35:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd timeouts to both outbound thumbnail requests.
Lines 28 and 35 call external URLs without a timeout. These run in Celery/backfill workers, so one slow upstream can pin a worker indefinitely and stall the queue.
🛡️ Proposed fix
- response = requests.get(url) + response = requests.get(url, timeout=10) @@ - thumbnail_file = File( - BytesIO(requests.get(json.get("thumbnail_url")).content), + thumbnail_response = requests.get(json.get("thumbnail_url"), timeout=10) + thumbnail_response.raise_for_status() + thumbnail_file = File( + BytesIO(thumbnail_response.content), name={f"{video.slug} Thumbnail"}, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@news/utils.py` around lines 27 - 35, The outbound requests to the OEmbed endpoint and to fetch the thumbnail are missing timeouts and can hang workers; update both requests.get calls (the one that uses YOUTUBE_OEMBED_ENDPOINT + f"?url={video.external_url}" and the one inside the thumbnail_file = File(BytesIO(requests.get(json.get("thumbnail_url")).content), ...)) to pass a timeout parameter (e.g. timeout=10 or a module-level/configurable constant like REQUEST_TIMEOUT), and optionally wrap them in a try/except for requests.Timeout/requests.RequestException to raise or log a clear error if the call times out.
34-37:⚠️ Potential issue | 🔴 CriticalPass a string filename to
File(...).Line 36 uses
name={f"{video.slug} Thumbnail"}, which creates aset; Django file handling expectsnameto be a string (so saving to aFileFieldcan error).🐛 Proposed fix
- thumbnail_file = File( - BytesIO(requests.get(json.get("thumbnail_url")).content), - name={f"{video.slug} Thumbnail"}, - ) + thumbnail_file = File( + BytesIO(requests.get(json.get("thumbnail_url")).content), + name=f"{video.slug} Thumbnail", + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@news/utils.py` around lines 34 - 37, The thumbnail_file creation passes a set to File(...).name by using name={f"{video.slug} Thumbnail"}; change this to pass a string instead (e.g. name=f"{video.slug} Thumbnail") so Django receives a proper filename string; update the File(...) invocation in the thumbnail_file creation (the call that wraps BytesIO(requests.get(json.get("thumbnail_url")).content)) to use a plain formatted string for name and ensure the BytesIO/requests/get/json.get/video.slug references remain unchanged.
🧹 Nitpick comments (4)
marketing/models.py (1)
164-164: ⚡ Quick winRemove the leftover request-path print.
This writes every outreach request path to stdout and bypasses the structured logging configured in settings. If you still need rollout visibility here, switch it to a logger debug call instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@marketing/models.py` at line 164, Remove the stray print(path_components) call that emits request paths to stdout; replace it with a structured logger debug call instead (e.g., use the module-level logger or logging.getLogger(__name__)). Locate the print(path_components) usage (referenced symbol: path_components / print(path_components)) and change it to logger.debug with a clear message and the path_components as structured data so it uses the configured logging backend rather than stdout.pages/models.py (2)
338-343: 💤 Low valueOptional: Use iterable unpacking for content_panels.
Consider using iterable unpacking instead of concatenation for consistency with modern Python style and to match the suggestion in
mixins.py.♻️ Proposed refactor
- content_panels = BasePage.content_panels + [ + content_panels = [ + *BasePage.content_panels, "content", "image", "summary", "video_thumbnail", ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pages/models.py` around lines 338 - 343, The content_panels assignment currently concatenates BasePage.content_panels with a list; change it to use iterable unpacking so it mirrors the style used in mixins.py. Update the content_panels definition on the Page model to produce a single list by unpacking BasePage.content_panels (e.g., using the * operator) followed by the added panel names ("content", "image", "summary", "video_thumbnail"), ensuring the symbol names content_panels and BasePage.content_panels are used so the intent and ordering remain identical.Source: Linters/SAST tools
144-185: ⚡ Quick winReduce duplication by generating filter_terms from POST_CONTENT_TYPES.
The filter terms are hard-coded here and duplicated in
news/views.py. The TODO comment on line 143 notes this should be generated fromPOST_CONTENT_TYPES. Consider implementing this now to reduce duplication and ensure consistency across the codebase.♻️ Proposed refactor to generate filter_terms
+ # Generate filter terms from POST_CONTENT_TYPES + filter_terms = [] + for content_type in POST_CONTENT_TYPES: + if content_type.filter_name: + filter_terms.append({ + "label": content_type.content_type + "s" if content_type.content_type != "News" else content_type.content_type, + "value": content_type.filter_name, + "url": self.get_url() + f"?type={content_type.filter_name}", + }) + else: + # "All" filter + filter_terms.insert(0, { + "label": "All", + "value": "all", + "url": self.get_url(), + }) + + # Add placeholder filters (Discussions, Achievements, Issues) + # TODO: Implement these content types when ready + filter_terms.extend([ { "label": "Discussions", "value": "discussions", "url": self.get_url(), }, { "label": "Achievements", "value": "achievements", "url": self.get_url(), }, { "label": "Issues", "value": "issues", "url": self.get_url(), }, - ] + ]) + ctx["filter_terms"] = filter_terms - ctx["filter_terms"] = [ - { - "label": "All", - "value": "all", - "url": self.get_url(), - }, - { - "label": "News", - "value": "news", - "url": self.get_url() + "?type=news", - }, - { - "label": "Blogs", - "value": "blogpost", - "url": self.get_url() + "?type=blogpost", - }, - { - "label": "Links", - "value": "link", - "url": self.get_url() + "?type=link", - }, - { - "label": "Videos", - "value": "video", - "url": self.get_url() + "?type=video", - }, - { - "label": "Discussions", - "value": "discussions", - "url": self.get_url(), - }, - { - "label": "Achievements", - "value": "achievements", - "url": self.get_url(), - }, - { - "label": "Issues", - "value": "issues", - "url": self.get_url(), - }, - ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pages/models.py` around lines 144 - 185, The filter_terms list in the pages model is hard-coded and duplicates logic in news/views.py; instead iterate over the POST_CONTENT_TYPES mapping to build each term so they stay consistent. Update the code that sets ctx["filter_terms"] (and any related method using get_url()) to loop through POST_CONTENT_TYPES items, creating dicts with "label" from the human-readable name, "value" from the content type key, and "url" using self.get_url() with "?type={value}" conditionally appended (ensure the "all" entry still maps to self.get_url() without a query). Replace the static list with this generated list so filter_terms is derived solely from POST_CONTENT_TYPES.pages/mixins.py (1)
41-41: 💤 Low valueOptional: Use iterable unpacking for content_panels.
Consider using iterable unpacking instead of concatenation for better readability and consistency with modern Python style.
♻️ Proposed refactor
- content_panels = Page.content_panels + ["tags"] + content_panels = [*Page.content_panels, "tags"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pages/mixins.py` at line 41, Replace the list concatenation used to extend content_panels with iterable unpacking for clarity: change the expression that sets content_panels (currently using Page.content_panels + ["tags"]) to use unpacking of Page.content_panels and the "tags" item so the result is built via a single list literal with *Page.content_panels and "tags"; update the assignment to the content_panels attribute in the pages/mixins.py mixin where content_panels and Page.content_panels are referenced.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@config/urls.py`:
- Around line 410-412: The new Wagtail mount at path("outreach/",
include(wagtail_urls)) is being shadowed by an earlier route handled by
WhitePaperView (and the existing outreach/<slug>/<slug> pattern) so requests
never reach marketing.OutreachHomePage.route(); fix by making the routing
non-overlapping — either move the Wagtail include for "outreach/" so it appears
before the WhitePaperView route in urls.py, or tighten the WhitePaperView
pattern (or its view logic) to exclude the Wagtail path/shape (e.g., require a
different prefix or more specific regex) so marketing.OutreachHomePage.route()
can receive outreach topic/detail requests.
In `@news/tasks.py`:
- Around line 191-200: set_summary_for_event_page currently grabs
page.content[0] (a StreamChild) and passes it to summarize_content, which can
crash because PostPage.content is a StreamField; instead iterate the StreamField
(PostPage.content), extract plain text from text-like blocks (e.g.
RichTextBlock, MarkdownBlock, TextBlock) by reading each StreamChild's
value/text, skip non-text blocks (e.g. PollBlock, URLBlock), join those pieces
into a single str, then use that string for the length check and as the first
arg to summarize_content; update usage sites in set_summary_for_event_page (and
keep save_page_summary_value as the callback) so summarize_content.apply_async
receives the cleaned plain text.
In `@news/utils.py`:
- Around line 22-25: The PostPage guard in set_thumbnail_for_video_page is
inverted: change the condition for PostPage from "not video.post_content_type !=
'video'" to check inequality directly so only non-video pages raise;
specifically, locate the PostPage branch that currently reads "elif
isinstance(video, PostPage) and not video.post_content_type != 'video'" and
replace it with a direct inequality check ("and video.post_content_type !=
'video'") so valid video pages proceed and only pages whose post_content_type is
not "video" trigger the Exception.
In `@pages/blocks.py`:
- Around line 10-14: CustomVideoBlock is missing the thumbnail child so the
StreamField StructBlock schema no longer matches existing migrations and stored
data; restore schema-compatibility by adding a thumbnail field to
CustomVideoBlock (e.g. thumbnail = ImageChooserBlock(required=False)) and import
ImageChooserBlock at the top, ensuring the class definition (CustomVideoBlock)
and templates remain compatible with PostPage.video_thumbnail; alternatively, if
you intend to move thumbnails off the block, create a migration that removes
thumbnail from the content StreamField and include a data migration to strip
thumbnail entries from existing stored JSON, but the simplest fix is to re-add
the thumbnail ImageChooserBlock to CustomVideoBlock.
In `@pages/management/commands/convert_news_entries.py`:
- Around line 71-73: The code currently raises a bare Exception ("No Post Index
Page found...") which should be replaced with a command-native exception to
produce cleaner CLI output; update the raise in convert_news_entries.py (the
raise Exception(...) call) to raise a click.ClickException (or Django's
CommandError if this is a Django management command) and add the appropriate
import (click or CommandError from django.core.management.base) so the command
fails with a concise, user-friendly error message.
- Around line 49-52: Replace title-based dedupe with a stable file identity:
change the Image.objects.get_or_create call that currently uses
get_or_create(title=image.name, ...) to use the image file as the lookup key
(e.g., get_or_create(file=image.name) or
get_or_create(file=<file-field-or-filename>)) so you match existing Image
records by their file, not title; keep the same defaults (width, height, file)
and assign the result to wagtail_image as before (reference:
Image.objects.get_or_create, wagtail_image, image).
In `@pages/tests.py`:
- Line 1: pages/tests.py is empty so conversion logic in convert_news_entries
lacks coverage; add tests that exercise idempotent upsert behavior, content
block mapping, and image/video transfer. Write unit tests targeting the
convert_news_entries function: create fixture source entries with varied content
blocks and media, run convert_news_entries once and assert created targets and
correct mapped blocks/media, run it again and assert no duplicates (idempotent
upsert), and include edge cases (missing media, empty blocks) to validate
conversions and error handling. Ensure tests use the same symbols
convert_news_entries and any helper mappers/uploaders used in the diff so they
fail fast if mapping logic changes.
In `@templates/blocks/custom_video_block.html`:
- Line 1: The template currently bypasses escaping by outputting the EmbedBlock
with "{{ self.video|safe }}"; instead render the EmbedBlock with Wagtail's
include_block so the block's own safe rendering is used (use the {%
include_block %} tag targeting self.video), removing the "|safe" usage and
ensuring the EmbedBlock template is invoked (look for occurrences of
"self.video" and the template "templates/blocks/custom_video_block.html" to
update).
---
Outside diff comments:
In `@news/utils.py`:
- Around line 27-35: The outbound requests to the OEmbed endpoint and to fetch
the thumbnail are missing timeouts and can hang workers; update both
requests.get calls (the one that uses YOUTUBE_OEMBED_ENDPOINT +
f"?url={video.external_url}" and the one inside the thumbnail_file =
File(BytesIO(requests.get(json.get("thumbnail_url")).content), ...)) to pass a
timeout parameter (e.g. timeout=10 or a module-level/configurable constant like
REQUEST_TIMEOUT), and optionally wrap them in a try/except for
requests.Timeout/requests.RequestException to raise or log a clear error if the
call times out.
- Around line 34-37: The thumbnail_file creation passes a set to File(...).name
by using name={f"{video.slug} Thumbnail"}; change this to pass a string instead
(e.g. name=f"{video.slug} Thumbnail") so Django receives a proper filename
string; update the File(...) invocation in the thumbnail_file creation (the call
that wraps BytesIO(requests.get(json.get("thumbnail_url")).content)) to use a
plain formatted string for name and ensure the
BytesIO/requests/get/json.get/video.slug references remain unchanged.
---
Nitpick comments:
In `@marketing/models.py`:
- Line 164: Remove the stray print(path_components) call that emits request
paths to stdout; replace it with a structured logger debug call instead (e.g.,
use the module-level logger or logging.getLogger(__name__)). Locate the
print(path_components) usage (referenced symbol: path_components /
print(path_components)) and change it to logger.debug with a clear message and
the path_components as structured data so it uses the configured logging backend
rather than stdout.
In `@pages/mixins.py`:
- Line 41: Replace the list concatenation used to extend content_panels with
iterable unpacking for clarity: change the expression that sets content_panels
(currently using Page.content_panels + ["tags"]) to use unpacking of
Page.content_panels and the "tags" item so the result is built via a single list
literal with *Page.content_panels and "tags"; update the assignment to the
content_panels attribute in the pages/mixins.py mixin where content_panels and
Page.content_panels are referenced.
In `@pages/models.py`:
- Around line 338-343: The content_panels assignment currently concatenates
BasePage.content_panels with a list; change it to use iterable unpacking so it
mirrors the style used in mixins.py. Update the content_panels definition on the
Page model to produce a single list by unpacking BasePage.content_panels (e.g.,
using the * operator) followed by the added panel names ("content", "image",
"summary", "video_thumbnail"), ensuring the symbol names content_panels and
BasePage.content_panels are used so the intent and ordering remain identical.
- Around line 144-185: The filter_terms list in the pages model is hard-coded
and duplicates logic in news/views.py; instead iterate over the
POST_CONTENT_TYPES mapping to build each term so they stay consistent. Update
the code that sets ctx["filter_terms"] (and any related method using get_url())
to loop through POST_CONTENT_TYPES items, creating dicts with "label" from the
human-readable name, "value" from the content type key, and "url" using
self.get_url() with "?type={value}" conditionally appended (ensure the "all"
entry still maps to self.get_url() without a query). Replace the static list
with this generated list so filter_terms is derived solely from
POST_CONTENT_TYPES.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 73ccd1cf-0c21-49fd-839a-61d4d5efbef2
📒 Files selected for processing (30)
config/settings.pyconfig/urls.pycore/mixins.pymarketing/models.pynews/management/commands/backpopulate_thumbnails.pynews/models.pynews/tasks.pynews/utils.pynews/views.pypages/__init__.pypages/admin.pypages/apps.pypages/blocks.pypages/management/commands/convert_news_entries.pypages/migrations/0001_initial.pypages/migrations/0002_postpage_video_thumbnail.pypages/migrations/0003_alter_postpage_content.pypages/migrations/__init__.pypages/mixins.pypages/models.pypages/tests.pypages/views.pypages/wagtail_hooks.pystatic/css/v3/post-detail.cssstatic/css/v3/posts-list.csstemplates/blocks/custom_video_block.htmltemplates/news/v3/detail.htmltemplates/pages/routable_home_page.htmltemplates/v3/includes/_post_list_card.htmltemplates/v3/posts_list.html
| + [ | ||
| path("outreach/", include(wagtail_urls)), | ||
| path("pages/", include(wagtail_urls)), |
There was a problem hiding this comment.
The new /outreach/... Wagtail mount is still shadowed by WhitePaperView.
Line 133 already matches outreach/<slug>/<slug>, which is the same URL shape marketing.OutreachHomePage.route() now expects for topic/detail pages. Those requests will never reach this Wagtail include block, so the new outreach detail routing stays unreachable until the URL shapes stop overlapping.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@config/urls.py` around lines 410 - 412, The new Wagtail mount at
path("outreach/", include(wagtail_urls)) is being shadowed by an earlier route
handled by WhitePaperView (and the existing outreach/<slug>/<slug> pattern) so
requests never reach marketing.OutreachHomePage.route(); fix by making the
routing non-overlapping — either move the Wagtail include for "outreach/" so it
appears before the WhitePaperView route in urls.py, or tighten the
WhitePaperView pattern (or its view logic) to exclude the Wagtail path/shape
(e.g., require a different prefix or more specific regex) so
marketing.OutreachHomePage.route() can receive outreach topic/detail requests.
| page = PostPage.objects.get(pk=pk) | ||
| content = page.content[0] | ||
| logger.info(f"dispatching summarize task for {pk=} with {content[:40]=}...") | ||
| if content and len(content) < CONTENT_SUMMARIZATION_THRESHOLD: | ||
| logger.warning(f"Content too short to summarize for {pk=}, skipping.") | ||
| return | ||
| logger.info(f"handing off {pk=} to summarize_content task") | ||
| summarize_content.apply_async( | ||
| (content, page.title, "gpt-oss-120b"), | ||
| link=save_page_summary_value.s(pk), |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and print relevant parts of the file around the reported lines
sed -n '150,240p' news/tasks.py | nl -ba
# Find PostPage model definition and inspect the 'content' field type/semantics
rg -n "class PostPage|PostPage" -S . -g'*.py' --hidden
# Try common locations
ls -la news || true
fd -g'*.py' PostPage . 2>/dev/null | head
# Search for 'content' field definition related to PostPage
rg -n "content\s*=|StreamField|page\.content" news -S --hidden -g'*.py'
rg -n "StreamField" -S . -g'*.py' --hidden | head -n 50
# Search for any accessor that converts content to text
rg -n "get_.*text|to_text|stream.*text|content.*text|content_text|body_text" -S news . --hidden -g'*.py'Repository: boostorg/website-v2
Length of output: 106
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
from pathlib import Path
p = Path("news/tasks.py")
start, end = 160, 220
lines = p.read_text().splitlines()
for i in range(start, min(end, len(lines)) + 1):
print(f"{i:4d}: {lines[i-1]}")
PY
echo "---- set_summary_for_event_page context ----"
rg -n "def set_summary_for_event_page|set_summary_for_event_page" news/tasks.py
echo "---- search for PostPage definition ----"
rg -n "class PostPage\b" -S . -g'*.py' --hidden
echo "---- find PostPage.content field definition ----"
rg -n "PostPage.*content\s*=|content\s*=\s*StreamField|content\s*=" -S . -g'*.py' --hidden
echo "---- find direct indexing usage of page.content ----"
rg -n "page\.content\[[0-9]+\]" -S . -g'*.py' --hidden
echo "---- locate StreamField usage and any text conversion helpers ----"
rg -n "StreamField" -S . -g'*.py' --hidden | head -n 200
rg -n "to_text|get_.*text|stream.*text|content.*text|content_text|body_text|plain.*text" -S . -g'*.py' --hidden | head -n 200
echo "---- locate summarize_content task signature/implementation ----"
rg -n "def summarize_content|summarize_content" -S news -g'*.py' --hiddenRepository: boostorg/website-v2
Length of output: 29717
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
from pathlib import Path
p = Path("pages/models.py")
lines = p.read_text().splitlines()
# Show around POST_BLOCKS and PostPage.content definition
for start,end in [(1,140),(170,260),(260,360)]:
for i in range(start, min(end, len(lines))+1):
if i-1 < len(lines):
print(f"{i:4d}: {lines[i-1]}")
print("----")
PY
echo "---- search for PostPage text/accessor helpers ----"
rg -n "class PostPage\b|def get_.*(text|content)|content_.*text|to_text|stream_.*text|plain.*text" pages/models.py -S
echo "---- search for POST_BLOCKS definition usage ----"
rg -n "POST_BLOCKS" pages/models.py -S
echo "---- inspect summarize_content implementation for input expectations ----"
sed -n '1,260p' news/tasks.py | nl -ba | head -n 220 2>/dev/null || true
# nl isn't available; print with python if needed
python3 - <<'PY'
from pathlib import Path
p = Path("news/tasks.py")
lines = p.read_text().splitlines()
for i in range(1,120):
if i-1 < len(lines):
print(f"{i:4d}: {lines[i-1]}")
PYRepository: boostorg/website-v2
Length of output: 16714
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- inspect pages/blocks.py (POST_BLOCKS) ----"
python3 - <<'PY'
from pathlib import Path
p=Path("pages/blocks.py")
lines=p.read_text().splitlines()
print(f"total lines: {len(lines)}")
# print first 220 lines; if smaller fine
for i in range(1, min(260, len(lines))+1):
print(f"{i:4d}: {lines[i-1]}")
PY
echo "---- find any conversion helpers to plain text in pages/models.py / blocks / news/helpers ----"
rg -n "extract_content|get_text|plain\s*text|to_text|stream_.*text|as_text" -S pages news core -g'*.py' --hidden | head -n 200
echo "---- inspect news/helpers.py extract_content ----"
sed -n '1,220p' news/helpers.py
echo "---- inspect news/tasks.py around set_summary_for_event_page for how it expects content ----"
python3 - <<'PY'
from pathlib import Path
p=Path("news/tasks.py")
lines=p.read_text().splitlines()
for i in range(170, 230):
if 0<=i-1<len(lines):
print(f"{i:4d}: {lines[i-1]}")
PY
echo "---- inspect any PostPage template usage of content/text ----"
rg -n "visible_content|get_content|content\[0\]|stream_content_type" -S pages/templates news/templates -g'*.html' --hidden | head -n 200Repository: boostorg/website-v2
Length of output: 6524
Fix StreamField handling in set_summary_for_event_page (pass plain text to the summarizer)
set_summary_for_event_page uses content = page.content[0], but PostPage.content is a StreamField over POST_BLOCKS (e.g., RichTextBlock, MarkdownBlock, URLBlock, PollBlock). The code then slices/len()s content and forwards it to summarize_content, which expects a str (and itself does content[:100]), so this can crash or send the wrong data. Extract the underlying text from the StreamChild (and/or skip non-text block types) before the length check and apply_async.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@news/tasks.py` around lines 191 - 200, set_summary_for_event_page currently
grabs page.content[0] (a StreamChild) and passes it to summarize_content, which
can crash because PostPage.content is a StreamField; instead iterate the
StreamField (PostPage.content), extract plain text from text-like blocks (e.g.
RichTextBlock, MarkdownBlock, TextBlock) by reading each StreamChild's
value/text, skip non-text blocks (e.g. PollBlock, URLBlock), join those pieces
into a single str, then use that string for the length check and as the first
arg to summarize_content; update usage sites in set_summary_for_event_page (and
keep save_page_summary_value as the callback) so summarize_content.apply_async
receives the cleaned plain text.
| if isinstance(video, Video) and not video.is_video: | ||
| raise Exception(f"{video} is not a video, cannot set thumbnail.") | ||
| elif isinstance(video, PostPage) and not video.post_content_type != "video": | ||
| raise Exception(f"{video}'s content is not a video, cannot set thumbnail.") |
There was a problem hiding this comment.
Fix the PostPage video-type guard.
not video.post_content_type != "video" is equivalent to video.post_content_type == "video", so valid video pages raise here and set_thumbnail_for_video_page never succeeds.
🐛 Proposed fix
- elif isinstance(video, PostPage) and not video.post_content_type != "video":
+ elif isinstance(video, PostPage) and video.post_content_type != "video":
raise Exception(f"{video}'s content is not a video, cannot set thumbnail.")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@news/utils.py` around lines 22 - 25, The PostPage guard in
set_thumbnail_for_video_page is inverted: change the condition for PostPage from
"not video.post_content_type != 'video'" to check inequality directly so only
non-video pages raise; specifically, locate the PostPage branch that currently
reads "elif isinstance(video, PostPage) and not video.post_content_type !=
'video'" and replace it with a direct inequality check ("and
video.post_content_type != 'video'") so valid video pages proceed and only pages
whose post_content_type is not "video" trigger the Exception.
| class CustomVideoBlock(StructBlock): | ||
| video = EmbedBlock() | ||
|
|
||
| class Meta: | ||
| template = "blocks/custom_video_block.html" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the serialized block structure in migration 0003
rg -A 10 "StructBlock.*video.*thumbnail" pages/migrations/Repository: boostorg/website-v2
Length of output: 45
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== List migrations in pages/migrations =="
ls -la pages/migrations | sed -n '1,200p'
echo
echo "== Show migration 0003 file =="
MIG_FILE="$(ls pages/migrations | rg '^0003_.*\.py$' || true)"
if [ -z "${MIG_FILE}" ]; then
echo "Could not find 0003_*.py in pages/migrations"
exit 0
fi
echo "Found: ${MIG_FILE}"
sed -n '1,200p' "pages/migrations/${MIG_FILE}"
echo
echo "== Show lines around where the review comment claims line 34 is =="
sed -n '20,80p' "pages/migrations/${MIG_FILE}"
echo
echo "== Inspect CustomVideoBlock in pages/blocks.py =="
sed -n '1,120p' pages/blocks.py
echo
echo "== Search for 'CustomVideoBlock' usage across repo =="
rg -n "CustomVideoBlock" -S .
echo
echo "== Search for thumbnail references in migration files =="
rg -n "thumbnail" pages/migrations -S
echo
echo "== Search for StructBlock definition of CustomVideoBlock in blocks.py =="
rg -n "class CustomVideoBlock|EmbedBlock\(|thumbnail|ImageChooserBlock" pages/blocks.py -nRepository: boostorg/website-v2
Length of output: 4590
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Search for PostPage.video_thumbnail usage =="
rg -n "video_thumbnail" pages -S
echo
echo "== Inspect CustomVideoBlock template =="
fd -a "custom_video_block\.html" -S . | head -n 20 || true
# If found, print it (bounded)
TEMPLATE_PATH="$(fd -a "custom_video_block\.html" -S . | head -n 1 || true)"
if [ -n "${TEMPLATE_PATH}" ]; then
echo "Using: ${TEMPLATE_PATH}"
sed -n '1,200p' "${TEMPLATE_PATH}"
fi
echo
echo "== Inspect PostPage model field definitions =="
# Try likely files first
for f in pages/models.py pages/models/*.py pages/models/__init__.py; do
if [ -f "$f" ]; then
echo "== File: $f =="
rg -n "class PostPage|video_thumbnail|content\s*=" "$f" -S || true
fi
done
# Also search for "class PostPage" across pages/ safely
echo
echo "== Locate class PostPage definition =="
rg -n "class PostPage" pages -S
POSTPAGE_FILE="$(rg -l "class PostPage" pages -S | head -n 1 || true)"
if [ -n "${POSTPAGE_FILE}" ]; then
echo "Using: ${POSTPAGE_FILE}"
sed -n '1,240p' "${POSTPAGE_FILE}"
fiRepository: boostorg/website-v2
Length of output: 8618
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Locate custom_video_block.html in repo =="
fd -t f "custom_video_block\.html$" . -a || true
echo
echo "== Search for custom_video_block.html references =="
rg -n "custom_video_block\.html" . -S || true
echo
echo "== Search for 'video_thumbnail' usage in templates/static =="
rg -n "video_thumbnail" . -S --glob '!**/migrations/**' || true
echo
echo "== Inspect management/commands/convert_news_entries.py (thumbnail handling) =="
sed -n '80,160p' pages/management/commands/convert_news_entries.py || true
# (If path differs, list likely command files)
if [ ! -f pages/management/commands/convert_news_entries.py ]; then
fd -t f "convert_news_entries\.py$" . -a || true
fi
echo
echo "== Inspect news/tasks/set_thumbnail_for_video_page =="
rg -n "set_thumbnail_for_video_page" -S . || true
TASK_FILE="$(rg -l "set_thumbnail_for_video_page" -S . | head -n 1 || true)"
if [ -n "${TASK_FILE}" ]; then
sed -n '1,200p' "${TASK_FILE}"
fi
echo
echo "== Inspect migration 0002 and 0001 for video/thumbnail block schema =="
sed -n '1,120p' pages/migrations/0002_postpage_video_thumbnail.py
sed -n '120,220p' pages/migrations/0001_initial.py
echo
echo "== Inspect migration 0003 (full block_lookup snippet) =="
sed -n '1,120p' pages/migrations/0003_alter_postpage_content.pyRepository: boostorg/website-v2
Length of output: 15302
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Inspect templates/blocks/custom_video_block.html =="
sed -n '1,200p' templates/blocks/custom_video_block.html
echo
echo "== Inspect pages/models.py around video thumbnail logic =="
sed -n '220,320p' pages/models.py
echo
echo "== Inspect news/utils.py set_video_thumbnail =="
sed -n '1,220p' news/utils.py
echo
echo "== Search templates for '.thumbnail' inside the video block include =="
rg -n "thumbnail" templates/blocks templates/v3 -S || true
echo
echo "== Search for StreamField access to video block thumbnail key =="
rg -n "content.*thumbnail|video.*thumbnail|thumbnail.*video" pages news templates -S || trueRepository: boostorg/website-v2
Length of output: 8576
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Search for any usage of stream-block thumbnail key in templates/python =="
rg -n "content.*thumbnail|thumbnail.*content|block\.thumbnail|\.\s*thumbnail\b" pages news templates -S || true
echo
echo "== Search for video block field access (self.video / value.video) =="
rg -n "content.*\bvideo\b|\.video\b" templates/blocks templates/v3 pages news -S || true
echo
echo "== Search for StreamField access to thumbnail specifically within video struct =="
rg -n "content__0__.*thumbnail|content\[['\"].*thumbnail|content__.*thumbnail" pages news -S || trueRepository: boostorg/website-v2
Length of output: 1871
Critical: CustomVideoBlock missing thumbnail so the PostPage.content StreamField video StructBlock schema won’t match existing stored data/migrations.
File: pages/blocks.py (Lines 10-14)
class CustomVideoBlock(StructBlock):
video = EmbedBlock()
class Meta:
template = "blocks/custom_video_block.html"The video block schema in pages/migrations/0001_initial.py and pages/migrations/0003_alter_postpage_content.py defines a StructBlock with both ("video", ...) and ("thumbnail", ...). The current CustomVideoBlock defines only video, while the templates currently render the thumbnail from PostPage.video_thumbnail (model-level FK), not from the struct block.
🔧 Proposed fix: Align code vs StreamField schema
Option 1 (schema-compat / simplest): add the missing thumbnail child block back to CustomVideoBlock (and import it), e.g. thumbnail = ImageChooserBlock(required=False).
Option 2 (if you intend to store thumbnails only on PostPage.video_thumbnail): create a migration that removes thumbnail from the content StreamField video StructBlock and includes a data migration to strip thumbnail from existing content JSON values (editing 0003 alone won’t migrate existing rows).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pages/blocks.py` around lines 10 - 14, CustomVideoBlock is missing the
thumbnail child so the StreamField StructBlock schema no longer matches existing
migrations and stored data; restore schema-compatibility by adding a thumbnail
field to CustomVideoBlock (e.g. thumbnail = ImageChooserBlock(required=False))
and import ImageChooserBlock at the top, ensuring the class definition
(CustomVideoBlock) and templates remain compatible with
PostPage.video_thumbnail; alternatively, if you intend to move thumbnails off
the block, create a migration that removes thumbnail from the content
StreamField and include a data migration to strip thumbnail entries from
existing stored JSON, but the simplest fix is to re-add the thumbnail
ImageChooserBlock to CustomVideoBlock.
| wagtail_image, _ = Image.objects.get_or_create( | ||
| title=image.name, | ||
| defaults={"width": image.width, "height": image.height, "file": image}, | ||
| ) |
There was a problem hiding this comment.
Use a stable image identity; title-based dedupe can corrupt mappings.
At Line 49, get_or_create(title=image.name, ...) can bind the wrong existing Image when different files share the same title/name, causing incorrect media on migrated posts.
Proposed fix
- wagtail_image, _ = Image.objects.get_or_create(
- title=image.name,
- defaults={"width": image.width, "height": image.height, "file": image},
- )
+ # Avoid title-only matching; prefer file identity.
+ wagtail_image = Image.objects.filter(file=image.name).first()
+ if not wagtail_image:
+ wagtail_image = Image.objects.create(
+ title=image.name,
+ width=image.width,
+ height=image.height,
+ file=image,
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pages/management/commands/convert_news_entries.py` around lines 49 - 52,
Replace title-based dedupe with a stable file identity: change the
Image.objects.get_or_create call that currently uses
get_or_create(title=image.name, ...) to use the image file as the lookup key
(e.g., get_or_create(file=image.name) or
get_or_create(file=<file-field-or-filename>)) so you match existing Image
records by their file, not title; keep the same defaults (width, height, file)
and assign the result to wagtail_image as before (reference:
Image.objects.get_or_create, wagtail_image, image).
| @@ -0,0 +1 @@ | |||
| # Create your tests here. | |||
There was a problem hiding this comment.
Add migration/conversion test coverage before merge.
pages/tests.py is still a placeholder (Line 1), so critical conversion paths in convert_news_entries are untested (idempotent upsert, content block mapping, image/video transfer). This is a high-risk gap for a data migration feature.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pages/tests.py` at line 1, pages/tests.py is empty so conversion logic in
convert_news_entries lacks coverage; add tests that exercise idempotent upsert
behavior, content block mapping, and image/video transfer. Write unit tests
targeting the convert_news_entries function: create fixture source entries with
varied content blocks and media, run convert_news_entries once and assert
created targets and correct mapped blocks/media, run it again and assert no
duplicates (idempotent upsert), and include edge cases (missing media, empty
blocks) to validate conversions and error handling. Ensure tests use the same
symbols convert_news_entries and any helper mappers/uploaders used in the diff
so they fail fast if mapping logic changes.
| @@ -0,0 +1 @@ | |||
| {{ self.video|safe }} | |||
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In current Wagtail docs, what is the recommended way to render an EmbedBlock value inside a StructBlock template: {{ value.video }}, {{ value.video|safe }}, or {% include_block value.video %}?
💡 Result:
Recommended: use {% include_block value.video %} (not {{ value.video }} and not {{ value.video|safe }}). In Wagtail templates, {{ value.video }} outputs the plain Python value; for StreamField/StructBlock child access, Wagtail documents that children of a StructBlock (e.g. value.heading) return a plain value, so {% include_block value.heading %} would render as plain text unless you explicitly include the block’s HTML rendering via the correct include mechanism [1]. The Wagtail docs describe using {% include_block %} to include a block’s HTML representation in templates [2]. Therefore, inside your StructBlock template, render the EmbedBlock with {% include_block value.video %} so Wagtail can use the EmbedBlock’s own template/HTML output [2].
Citations:
- 1: https://docs.wagtail.org/en/7.3/advanced_topics/embeds.html
- 2: https://docs.wagtail.org/en/v7.4/topics/streamfield.html
🌐 Web query:
Wagtail EmbedBlock template rendering include_block safe filter discouraged
💡 Result:
In Wagtail, the use of the safe filter with {% include_block %} is generally discouraged because it bypasses Django's automatic HTML escaping, which is a critical security measure to prevent Cross-Site Scripting (XSS) attacks [1]. Wagtail has implemented strict HTML escaping for the {% include_block %} tag to ensure that content authored in the Wagtail admin—particularly in blocks like CharBlock or TextBlock that do not have custom templates—is safely rendered [1]. If you explicitly mark content as safe or use {% autoescape off %}, you risk allowing arbitrary HTML or script injection if those blocks are editable by users [1]. Regarding the rendering of blocks: 1. Recommended Approach: Use the {% include_block %} tag directly on your block instances [2][3]. This respects Django's autoescaping rules [4][5]. If a block has a custom template defined, Wagtail will use it; otherwise, it will use appropriate default rendering [2][3]. 2. EmbedBlock Specifics: You should use {% include_block %} for EmbedBlock instances within a StreamField [2][3]. Do not attempt to use the {% embed %} tag on an EmbedBlock directly, as the block already holds the rendered HTML fragment, not just the URL [6]. 3. Security Context: The need for escaping became particularly relevant after a vulnerability (GHSA-xfrw-hxr5-ghqf) where improper escaping in basic field blocks could be exploited by authors [1]. Site implementors are advised to keep autoescaping enabled [1]. If you have a legitimate need to render raw HTML (e.g., from a RawHTMLBlock), Wagtail's block system handles this automatically because such blocks are designed to return 'safe' marked strings [4][5]. If you find yourself reaching for the safe filter, it is likely an indication that you should instead use a block type specifically designed to hold safe/raw HTML or define a custom template for your block.
Citations:
- 1: GHSA-xfrw-hxr5-ghqf
- 2: https://docs.wagtail.org/en/7.2/topics/streamfield.html
- 3: https://docs.wagtail.org/en/v7.4/topics/streamfield.html
- 4: wagtail/wagtail@11b6b34
- 5: wagtail/wagtail@c564c29
- 6: embed template tag seems to be broken wagtail/wagtail#7894
Avoid bypassing escaping for EmbedBlock output
Line 1 uses |safe, which defeats Django/Wagtail’s HTML escaping protections. Render the EmbedBlock via Wagtail’s {% include_block %} so the EmbedBlock’s own template/HTML rendering is used safely.
Suggested fix
-{{ self.video|safe }}
+{% load wagtailcore_tags %}
+{% include_block self.video %}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {{ self.video|safe }} | |
| {% load wagtailcore_tags %} | |
| {% include_block self.video %} |
🧰 Tools
🪛 HTMLHint (1.9.2)
[error] 1-1: Doctype must be declared before any non-comment content.
(doctype-first)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@templates/blocks/custom_video_block.html` at line 1, The template currently
bypasses escaping by outputting the EmbedBlock with "{{ self.video|safe }}";
instead render the EmbedBlock with Wagtail's include_block so the block's own
safe rendering is used (use the {% include_block %} tag targeting self.video),
removing the "|safe" usage and ensuring the EmbedBlock template is invoked (look
for occurrences of "self.video" and the template
"templates/blocks/custom_video_block.html" to update).
javiercoronadonarvaez
left a comment
There was a problem hiding this comment.
LGTM with the latest PR comments. Just a matter of solving the merge conflicts.
herzog0
left a comment
There was a problem hiding this comment.
Heya, sorry for the delay, still looking through some things. It's a bit difficult to test all cases as this integration is a bit large. But I found some things that, unfortunately, can't be postponed to a follow-up ticket for improvements.




Implement Wagtail Integrations:
Mixins
TaggableMixin- Adds the ability to tag pages for categorizationRouting
RoutableHomePage- New home page for all wagtail interactions that allows for forwarding routing to child pages, e.g. if you get to wagtail through/outreach/, you will be routed through theOutreachHomePagePosts
This is the bulk of the work, and is a porting of the existing
EntryModel to WagtailPostIndexPage- Index Page for Posts, implements most of the logic in formatting found in thenews.views.EntryListView. NOTE: this uses queryparams for filtering the child pages, rather than seperate URLsPostPage- Functional translation ofEntry, but uses a stream field rather than having separate models for different types of content. These can be filtered by the content of the stream field blockNOTES:
DEPLOYMENT NOTES:
When deploying this, a
RoutableHomePageshould be created and the existingOutreachHomePageshould be moved under it to respect the new routing methods.Summary by CodeRabbit
New Features
Improvements