ENG-8154: update check_dead_links script and fix broken links #1682
Merged
Contributor
Greptile Overview
Greptile Summary
Updated the dead link checker script to skip image URLs and added data: URI filtering. Fixed multiple broken documentation links by correcting template syntax, adding missing protocols, and updating internal paths.
- Critical issue: logic bug in `scripts/check_dead_links.py` where external links are no longer checked at all (lines 160-167)
- Fixed broken link in `case-studies/ansa.md` by adding the missing `https://` protocol
- Corrected template syntax links in multiple documentation files to use direct URL paths
- Updated `pcweb/pages/gallery/apps.py` to filter out `reflex_build_templates/` from display and redirect to `open_source_templates`
- Added URL redirects in `pcweb/pcweb.py` for legacy documentation paths
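The image-URL skip and `data:` URI filtering mentioned in the summary could look roughly like the sketch below. This is not the script's actual code: the extension list and the helper name `is_data_uri` are assumptions made for illustration, and only the name `is_image_url` appears in the review.

```python
from urllib.parse import urlparse

# Hypothetical extension list; the real script may use a different set.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico")


def is_image_url(url: str) -> bool:
    """Return True when the URL path ends in a known image extension."""
    # Parse first so query strings and fragments do not hide the extension.
    path = urlparse(url).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)


def is_data_uri(url: str) -> bool:
    """Return True for inline data: URIs, which have nothing to fetch."""
    return url.strip().lower().startswith("data:")
```

Matching on the parsed path rather than the raw string keeps a URL such as `logo.PNG?v=2` classified as an image.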
Confidence Score: 1/5
- Critical logic bug prevents external link validation
- The indentation change in `scripts/check_dead_links.py` causes external links to be completely skipped during validation. The original code checked all links (internal and external) but only queued internal links for crawling. Now only internal, non-image links are checked, breaking the tool's core functionality. All other changes are safe link fixes.
- `scripts/check_dead_links.py` requires immediate fix before merge
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| scripts/check_dead_links.py | 1/5 | Added image URL detection and data: URI filtering, but introduced critical logic bug that skips checking external links entirely |
| pcweb/pages/gallery/apps.py | 5/5 | Updated import, filtered reflex_build_templates from display, and fixed gallery link to open_source_templates path |
| pcweb/pcweb.py | 5/5 | Added redirects for legacy documentation paths to new locations |
Sequence Diagram
sequenceDiagram
    participant User
    participant Script as check_dead_links.py
    participant Sitemap
    participant Pages as Internal Pages
    participant Checker as Link Checker
    User->>Script: Run script with base URL
    Script->>Sitemap: Request sitemap.xml
    Sitemap-->>Script: Return list of URLs
    loop For each URL in sitemap
        Script->>Pages: Crawl page
        Pages-->>Script: Return HTML with links
        Script->>Script: Extract all links
        Script->>Script: Filter out special URIs
        loop For each extracted link
            alt Is image URL
                Script->>Script: Skip checking (NEW)
            else Not image URL
                alt Is internal link
                    Script->>Checker: check_link(url)
                    Script->>Script: Add to pages_to_visit
                else Is external link
                    Note right of Script: BUG: External links<br/>no longer checked!
                end
            end
        end
    end
    Script-->>User: Report dead links found
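The "Extract all links" and "Filter out special URIs" steps in the diagram can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the script's implementation: the class name `LinkExtractor`, the function `extract_links`, and the list of skipped schemes are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of schemes that cannot be checked with an HTTP request.
SKIPPED_SCHEMES = ("mailto:", "tel:", "javascript:", "data:")


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip missing hrefs and same-page fragment links.
        if not href or href.startswith("#"):
            return
        # Skip special URIs that have no fetchable target.
        if href.strip().lower().startswith(SKIPPED_SCHEMES):
            return
        # Resolve relative links against the page they were found on.
        self.links.append(urljoin(self.page_url, href))


def extract_links(html: str, page_url: str) -> list[str]:
    parser = LinkExtractor(page_url)
    parser.feed(html)
    return parser.links
```

Resolving each `href` against the page URL means the later internal/external decision can compare absolute URLs only.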
10 files reviewed, 1 comment
Comment on lines 160 to 167
      for link in links:
    -     self.check_link(link, url)
    +     # Only check internal links and skip images
    +     if self.is_internal_url(link) and not self.is_image_url(link):
    +         self.check_link(link, url)

          if self.is_internal_url(link):
              normalized = self.normalize_url(link)
              if normalized not in self.visited_pages:
                  self.pages_to_visit.append(normalized)
Contributor
logic: external links are no longer being checked. Original code checked ALL links but only queued internal ones for crawling. Now external links are completely skipped.
Suggested change
    - for link in links:
    -     self.check_link(link, url)
    -     # Only check internal links and skip images
    -     if self.is_internal_url(link) and not self.is_image_url(link):
    -         self.check_link(link, url)
    -     if self.is_internal_url(link):
    -         normalized = self.normalize_url(link)
    -         if normalized not in self.visited_pages:
    -             self.pages_to_visit.append(normalized)
    + for link in links:
    +     # Skip images for link checking
    +     if not self.is_image_url(link):
    +         self.check_link(link, url)
    +     # Only queue internal, non-image links for crawling
    +     if self.is_internal_url(link) and not self.is_image_url(link):
    +         normalized = self.normalize_url(link)
    +         if normalized not in self.visited_pages:
    +             self.pages_to_visit.append(normalized)
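To make the regression concrete, here is a small self-contained sketch that runs both loop variants against the same three links and records which ones get checked. The method names come from the review comment; the class `RecordingChecker`, the method bodies, and the example URLs are assumptions made for illustration and do not reflect the real script.

```python
class RecordingChecker:
    """Stub checker: records calls instead of making HTTP requests."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url
        self.checked: list[str] = []
        self.visited_pages: set[str] = set()
        self.pages_to_visit: list[str] = []

    # The helper bodies below are simplified assumptions.
    def is_internal_url(self, link: str) -> bool:
        return link.startswith(self.base_url)

    def is_image_url(self, link: str) -> bool:
        return link.lower().endswith((".png", ".jpg", ".svg"))

    def normalize_url(self, link: str) -> str:
        return link.rstrip("/")

    def check_link(self, link: str, source: str) -> None:
        self.checked.append(link)

    def _queue(self, link: str) -> None:
        normalized = self.normalize_url(link)
        if normalized not in self.visited_pages:
            self.pages_to_visit.append(normalized)

    def process_as_merged(self, links: list[str], url: str) -> None:
        """Loop as flagged in the review: external links never reach check_link."""
        for link in links:
            if self.is_internal_url(link) and not self.is_image_url(link):
                self.check_link(link, url)
            if self.is_internal_url(link):
                self._queue(link)

    def process_as_suggested(self, links: list[str], url: str) -> None:
        """Loop from the suggestion: check every non-image link, queue internal ones."""
        for link in links:
            if not self.is_image_url(link):
                self.check_link(link, url)
            if self.is_internal_url(link) and not self.is_image_url(link):
                self._queue(link)


links = [
    "https://site.example/docs/",
    "https://site.example/logo.png",
    "https://other.example/page",
]
flagged = RecordingChecker("https://site.example")
flagged.process_as_merged(links, "https://site.example/")
suggested = RecordingChecker("https://site.example")
suggested.process_as_suggested(links, "https://site.example/")
```

In this sketch the flagged loop checks only the internal docs link, while the suggested loop also checks the external link. The suggested version additionally keeps the image URL out of the crawl queue.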
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/check_dead_links.py
Line: 160:167
Comment:
**logic:** external links are no longer being checked. Original code checked ALL links but only queued internal ones for crawling. Now external links are completely skipped.
```suggestion
for link in links:
    # Skip images for link checking
    if not self.is_image_url(link):
        self.check_link(link, url)
    # Only queue internal, non-image links for crawling
    if self.is_internal_url(link) and not self.is_image_url(link):
        normalized = self.normalize_url(link)
        if normalized not in self.visited_pages:
            self.pages_to_visit.append(normalized)
```
How can I resolve this? If you propose a fix, please make it concise.
Alek99 approved these changes on Oct 30, 2025
No description provided.