Greptile Summary
This PR updates the
Confidence Score: 4/5
Safe to merge; all changes are non-breaking SEO/structured-data improvements with two minor P2 concerns worth addressing.
The code changes are additive (new optional parameters, H1 fix), no runtime errors are introduced, and the blog content updates are editorial. Two P2 concerns remain: (1) raw markdown word count inflation could misreport schema.org wordCount, and (2) the H1 change affects all blog posts with title_tag. Neither blocks the happy path but both are worth a deliberate sign-off.
pcweb/pages/blog/page.py: word_count calculation and H1 title logic change affecting all blog posts.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["blog/*.md\n(frontmatter: title, title_tag,\ndescription, meta keywords, faq)"] --> B["pcweb/pages/blog/page.py\npage()"]
B --> C["Extract keywords_list\nfrom meta[name=keywords]"]
B --> D["Compute word_count\nvia document.content.split()"]
B --> E["Use meta[title] for H1\n(not title_tag)"]
C --> F["blog_jsonld()\npcweb/meta/meta.py"]
D --> F
F --> G["JSON-LD BlogPosting node\n+ url, wordCount, keywords"]
F --> H["JSON-LD FAQPage node\n(if faq present)"]
B --> I["blog.py route setup"]
I --> J["seo_title = title_tag or title\nused for title + OG/Twitter meta"]
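The flow in the chart can be sketched roughly as follows. This is only an illustration under stated assumptions: `build_blog_jsonld` and `seo_title` are hypothetical stand-ins (not the actual `blog_jsonld()` in pcweb/meta/meta.py), and the frontmatter shape (a `meta` list of `name`/`content` entries, a `faq` list of `question`/`answer` items) is assumed rather than taken from the PR.

```python
from typing import Any, Dict, List, Optional


def seo_title(meta: Dict[str, Any]) -> str:
    # title_tag feeds <title> / OG / Twitter; the H1 uses meta["title"] directly.
    return meta.get("title_tag") or meta["title"]


def build_blog_jsonld(
    meta: Dict[str, Any], url: str, content: Optional[str]
) -> Dict[str, Any]:
    """Hypothetical sketch of assembling the JSON-LD graph for one post."""
    # Extract keywords from the (assumed) meta[name=keywords] entry.
    keywords_list: List[str] = []
    for entry in meta.get("meta", []):
        if entry.get("name") == "keywords":
            raw = entry.get("content", "")
            keywords_list = [k.strip() for k in raw.split(",") if k.strip()]

    # Same naive count as the PR: whitespace-split the raw content.
    word_count = len(content.split()) if content else None

    posting: Dict[str, Any] = {
        "@type": "BlogPosting",
        "headline": meta["title"],
        "url": url,
    }
    # Optional fields are only emitted when a value is available.
    if word_count is not None:
        posting["wordCount"] = word_count
    if keywords_list:
        posting["keywords"] = keywords_list

    graph: List[Dict[str, Any]] = [posting]

    # FAQPage node is added only if the frontmatter carries FAQ items.
    faq = meta.get("faq")
    if faq:
        graph.append(
            {
                "@type": "FAQPage",
                "mainEntity": [
                    {
                        "@type": "Question",
                        "name": item["question"],
                        "acceptedAnswer": {"@type": "Answer", "text": item["answer"]},
                    }
                    for item in faq
                ],
            }
        )
    return {"@context": "https://schema.org", "@graph": graph}
```

Keeping `wordCount` and `keywords` conditional matches the "new optional parameters" described in the summary: posts without those values still produce a valid BlogPosting node.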
Reviews (1): Last reviewed commit: "updated enterprise ready ai app builder"
# Compute word count from the document content.
word_count = (
    len(document.content.split())
    if hasattr(document, "content") and document.content
    else None
)
word_count includes non-prose tokens from raw markdown
document.content.split() operates on the raw markdown source, so the resulting word count will include code fence contents, markdown syntax tokens (e.g. ##, **, -), inline backtick spans, and full URLs. For posts with multiple code samples the inflated count could be substantial, and schema.org's wordCount is defined as "the number of words in the text of the Article."
A lightweight improvement is to strip code blocks and markdown syntax before counting:
import re

def _prose_word_count(content: str) -> int:
    # Remove fenced code blocks
    text = re.sub(r"```.*?```", "", content, flags=re.DOTALL)
    # Remove inline code
    text = re.sub(r"`[^`]+`", "", text)
    # Remove markdown link syntax, keeping the label
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)
    # Strip remaining markdown symbols
    text = re.sub(r"[#*_>~|`\[\]()!]", " ", text)
    return len(text.split())

This wouldn't need to be perfect, but would give a much closer approximation to actual prose word count.
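To make the size of the gap concrete, here is a small self-contained comparison of the naive count against the stripped count. It is a sketch, not a proposed final implementation: `prose_word_count` mirrors the suggestion above but additionally returns `None` for empty content (an assumption, chosen to match the existing `else None` branch), and the sample markdown is made up.

```python
import re
from typing import Optional


def prose_word_count(content: Optional[str]) -> Optional[int]:
    """Approximate prose word count; None when there is no content."""
    if not content:
        return None
    # Remove fenced code blocks (three backticks ... three backticks).
    text = re.sub(r"`{3}.*?`{3}", "", content, flags=re.DOTALL)
    # Remove inline code spans.
    text = re.sub(r"`[^`]+`", "", text)
    # Replace markdown links with their label text.
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)
    # Replace leftover markdown symbols with spaces.
    text = re.sub(r"[#*_>~|`\[\]()!]", " ", text)
    return len(text.split())


# Made-up sample: a heading, one sentence with a link, and a code block.
fence = "`" * 3
sample = (
    "## Intro\n\n"
    "Use **tables** with [the docs](https://example.com/docs).\n\n"
    + fence + "python\nprint('hello world')\n" + fence + "\n"
)

naive = len(sample.split())       # counts "##", the URL, and code tokens
prose = prose_word_count(sample)  # counts only the readable words
print(naive, prose)               # prints: 11 6
```

Even on this tiny sample the naive count is nearly double the prose count, and the ratio grows with each additional code sample in a post.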
  rx.el.h1(
-     meta.get("title_tag") or meta["title"],
+     meta["title"],
      class_name="lg:text-5xl text-3xl text-m-slate-12 dark:text-m-slate-3 font-[575] mb-6 text-center text-balance",
H1 change silently impacts all blog posts that defined title_tag
The diff removes the fallback:
- meta.get("title_tag") or meta["title"]
+ meta["title"]
There are currently 20+ published blog posts (e.g. using-table-component.md, top-7-enterprise-ai-app-builders.md, structuring-a-large-app.md) where title_tag was chosen specifically to be a shorter or more keyword-rich variant of title. For those posts the H1 displayed to visitors will now change, in some cases from a longer SEO-optimised string to a shorter display string (or vice versa).
The comment in blog.py clarifies that title_tag is intended for <title> / OG / Twitter only, so this change does align with that documented intent. But it is worth confirming the switch is deliberate for all existing posts, not just this new one, since Google uses H1 content as a ranking signal and mismatching H1 with prior crawl history can temporarily affect rankings.
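One way to make that sign-off concrete is to list every post whose H1 actually changes. The sketch below assumes the frontmatter of each post has already been parsed into a dict keyed by filename; `h1_changes` is a hypothetical helper, and the filenames and titles in the usage example are invented for illustration.

```python
from typing import Any, Dict, List, Tuple


def h1_changes(posts: Dict[str, Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Return (filename, old_h1, new_h1) for posts whose H1 differs after the change."""
    changes = []
    for filename, meta in sorted(posts.items()):
        old_h1 = meta.get("title_tag") or meta["title"]  # behaviour before the PR
        new_h1 = meta["title"]                           # behaviour after the PR
        if old_h1 != new_h1:
            changes.append((filename, old_h1, new_h1))
    return changes


# Invented frontmatter for three posts.
posts = {
    "post-a.md": {"title": "A Long Display Title", "title_tag": "Short SEO Title"},
    "post-b.md": {"title": "No Title Tag Here"},
    "post-c.md": {"title": "Same Either Way", "title_tag": "Same Either Way"},
}

for filename, old_h1, new_h1 in h1_changes(posts):
    print(f"{filename}: {old_h1!r} -> {new_h1!r}")
```

Only posts with a title_tag that differs from title show up, so the output is exactly the set of pages whose visible H1 changes on deploy.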
No description provided.