
ENG-8154: update check_dead_links script and fix broken links#1682

Merged
Alek99 merged 1 commit into main from carlos/dont-check-external-links on Oct 30, 2025


@carlosabadia (Collaborator)

No description provided.

linear bot commented Oct 30, 2025

greptile-apps bot (Contributor) left a comment

Greptile Overview

Greptile Summary

Updated the dead link checker script to skip image URLs and added data: URI filtering. Fixed multiple broken documentation links by correcting template syntax, adding missing protocols, and updating internal paths.

  • Critical Issue: Logic bug in scripts/check_dead_links.py where external links are no longer checked at all (lines 160-167)
  • Fixed a broken link in case-studies/ansa.md by adding the missing https:// protocol
  • Corrected template syntax links in multiple documentation files to use direct URL paths
  • Updated pcweb/pages/gallery/apps.py to filter out reflex_build_templates/ from display and redirect to open_source_templates
  • Added URL redirects in pcweb/pcweb.py for legacy documentation paths
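The summary says the script now skips image URLs and filters out `data:` URIs, but the PR page does not show how `is_image_url` or the URI filter are implemented. A minimal sketch, assuming images are detected by file extension on the URL path; the extension list, `SKIPPED_PREFIXES`, and `filter_special_uris` are hypothetical names for illustration, not the script's actual code:

```python
from urllib.parse import urlparse

# Hypothetical extension list; the real script's criteria are not shown in this PR.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico", ".avif")

# Only "data:" is mentioned in the PR; "mailto:" and "javascript:" are assumptions.
SKIPPED_PREFIXES = ("data:", "mailto:", "javascript:")


def is_image_url(url: str) -> bool:
    """Return True if the URL path (ignoring query and fragment) ends in an image extension."""
    path = urlparse(url).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)


def filter_special_uris(links: list[str]) -> list[str]:
    """Drop non-fetchable URIs such as data: before any link checking happens."""
    return [link for link in links if not link.lower().startswith(SKIPPED_PREFIXES)]


if __name__ == "__main__":
    links = ["https://example.com/docs/", "https://example.com/logo.PNG?v=2", "data:image/png;base64,AA"]
    kept = filter_special_uris(links)
    print(kept)  # ['https://example.com/docs/', 'https://example.com/logo.PNG?v=2']
    print([link for link in kept if not is_image_url(link)])  # ['https://example.com/docs/']
```

Matching on the parsed path rather than the raw string keeps query strings such as `?v=2` from hiding the extension.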

Confidence Score: 1/5

  • Critical logic bug prevents external link validation
  • The indentation change in scripts/check_dead_links.py causes external links to be completely skipped during validation. The original code checked all links (internal and external) but only queued internal links for crawling. Now only internal, non-image links are checked, breaking the tool's core functionality. All other changes are safe link fixes.
  • scripts/check_dead_links.py requires immediate fix before merge
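To make the reported bug concrete, here is a small, self-contained model of the two loop bodies as pure functions. The helpers `is_internal_url` and `is_image_url` below are hypothetical stand-ins (internal means a relative link or one on an assumed `example.com` host; images are detected by extension), and `merged_behavior` / `suggested_behavior` are illustrative names that return which links each version would check and which it would queue for crawling:

```python
from urllib.parse import urlparse

BASE_NETLOC = "example.com"  # hypothetical site under test


def is_internal_url(url: str) -> bool:
    # Assumption: relative links and links on the base host count as internal.
    return urlparse(url).netloc in ("", BASE_NETLOC)


def is_image_url(url: str) -> bool:
    # Assumption: extension-based image detection.
    return urlparse(url).path.lower().endswith((".png", ".jpg", ".svg"))


def merged_behavior(links):
    """Merged loop: check only internal non-image links; queue every internal link."""
    checked = [link for link in links if is_internal_url(link) and not is_image_url(link)]
    queued = [link for link in links if is_internal_url(link)]
    return checked, queued


def suggested_behavior(links):
    """Suggested fix: check every non-image link; queue only internal non-image links."""
    checked = [link for link in links if not is_image_url(link)]
    queued = [link for link in links if is_internal_url(link) and not is_image_url(link)]
    return checked, queued


if __name__ == "__main__":
    links = ["/docs/getting-started/", "https://external.example.org/page", "/img/logo.png"]
    print(merged_behavior(links))  # (['/docs/getting-started/'], ['/docs/getting-started/', '/img/logo.png'])
    print(suggested_behavior(links))  # (['/docs/getting-started/', 'https://external.example.org/page'], ['/docs/getting-started/'])
```

With these inputs the merged version never checks the external link, while the suggested version does. The model also shows that the merged loop still queues internal image URLs for crawling, because its queueing condition is only `is_internal_url(link)`; the suggestion adds `not self.is_image_url(link)` there as well.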

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| scripts/check_dead_links.py | 1/5 | Added image URL detection and data: URI filtering, but introduced critical logic bug that skips checking external links entirely |
| pcweb/pages/gallery/apps.py | 5/5 | Updated import, filtered reflex_build_templates from display, and fixed gallery link to open_source_templates path |
| pcweb/pcweb.py | 5/5 | Added redirects for legacy documentation paths to new locations |

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Script as check_dead_links.py
    participant Sitemap
    participant Pages as Internal Pages
    participant Checker as Link Checker

    User->>Script: Run script with base URL
    Script->>Sitemap: Request sitemap.xml
    Sitemap-->>Script: Return list of URLs

    loop For each URL in sitemap
        Script->>Pages: Crawl page
        Pages-->>Script: Return HTML with links
        Script->>Script: Extract all links
        Script->>Script: Filter out special URIs

        loop For each extracted link
            alt Is image URL
                Script->>Script: Skip checking (NEW)
            else Not image URL
                alt Is internal link
                    Script->>Checker: check_link(url)
                    Script->>Script: Add to pages_to_visit
                else Is external link
                    Note right of Script: BUG: External links<br/>no longer checked!
                end
            end
        end
    end

    Script-->>User: Report dead links found
```
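The flow in the diagram can be sketched as a breadth-first crawl seeded from the sitemap. This is an illustrative reconstruction, not the script's code: network access is replaced by an injected `fetch_links` callable so the loop runs offline, the `data:` check stands in for the script's special-URI filtering, and the per-link logic follows the suggested fix (check every non-image link, queue only internal non-image pages). `parse_sitemap` and `crawl` are hypothetical names; `check_link`, `pages_to_visit`, and `visited_pages` mirror identifiers in the reviewed code.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Iterable

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def crawl(
    start_urls: Iterable[str],
    fetch_links: Callable[[str], list[str]],
    check_link: Callable[[str, str], None],
    is_internal: Callable[[str], bool],
    is_image: Callable[[str], bool],
) -> set[str]:
    """Breadth-first crawl: check each non-image link, queue unvisited internal pages."""
    pages_to_visit = deque(start_urls)
    visited_pages: set[str] = set()
    while pages_to_visit:
        url = pages_to_visit.popleft()
        if url in visited_pages:
            continue  # the same page may have been queued more than once
        visited_pages.add(url)
        for link in fetch_links(url):
            if link.startswith("data:"):
                continue  # special URIs are filtered out before checking
            if not is_image(link):
                check_link(link, url)  # internal AND external links are checked
            if is_internal(link) and not is_image(link) and link not in visited_pages:
                pages_to_visit.append(link)
    return visited_pages
```

Passing a dictionary-backed `fetch_links` makes the crawl deterministic, which is convenient for testing the check/queue split without hitting the network.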

10 files reviewed, 1 comment


Comment on lines 160 to 167
```diff
             for link in links:
-                self.check_link(link, url)
+                # Only check internal links and skip images
+                if self.is_internal_url(link) and not self.is_image_url(link):
+                    self.check_link(link, url)

                 if self.is_internal_url(link):
                     normalized = self.normalize_url(link)
                     if normalized not in self.visited_pages:
                         self.pages_to_visit.append(normalized)
```

**logic:** External links are no longer being checked. The original code checked ALL links but only queued internal ones for crawling; now external links are skipped entirely.

Suggested change

```suggestion
            for link in links:
                # Skip images for link checking
                if not self.is_image_url(link):
                    self.check_link(link, url)

                # Only queue internal, non-image links for crawling
                if self.is_internal_url(link) and not self.is_image_url(link):
                    normalized = self.normalize_url(link)
                    if normalized not in self.visited_pages:
                        self.pages_to_visit.append(normalized)
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/check_dead_links.py
Line: 160:167

Comment:
**logic:** external links are no longer being checked. Original code checked ALL links but only queued internal ones for crawling. Now external links are completely skipped.

```suggestion
            for link in links:
                # Skip images for link checking
                if not self.is_image_url(link):
                    self.check_link(link, url)

                # Only queue internal, non-image links for crawling
                if self.is_internal_url(link) and not self.is_image_url(link):
                    normalized = self.normalize_url(link)
                    if normalized not in self.visited_pages:
                        self.pages_to_visit.append(normalized)
```

How can I resolve this? If you propose a fix, please make it concise.
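The queueing step in the suggestion relies on `self.normalize_url(link)` so that the `visited_pages` check treats equivalent URLs as one page. Its implementation is not shown in this PR. A minimal sketch, assuming absolute input URLs and assuming that case in the scheme and host, fragments, query strings, and trailing slashes should not distinguish pages (this `normalize_url` is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(link: str) -> str:
    """Hypothetical normalization of an absolute URL into a crawl-dedup key."""
    parts = urlsplit(link)
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"  # "/docs/" -> "/docs", root stays "/"
    # Drop query and fragment; lowercase scheme and host.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


if __name__ == "__main__":
    variants = [
        "HTTPS://Example.com/docs/#intro",
        "https://example.com/docs?ref=nav",
        "https://example.com/docs",
    ]
    print({normalize_url(v) for v in variants})  # {'https://example.com/docs'}
```

Dropping query strings is a judgment call; a site whose pages vary by query parameter would need to keep them in the key.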

Alek99 merged commit ad5e14a into main on Oct 30, 2025
10 checks passed
Alek99 deleted the carlos/dont-check-external-links branch on October 30, 2025 23:07

2 participants