Fetcher writes HTML 404 errors instead of skipping broken URLs #941

@claude-yolo

Problem

When platform.claude.com returns a 404, the fetch script writes the HTML error page into the target markdown file instead of skipping the URL.

Evidence

PR #940 contained 250+ broken files (PHP/Terraform API docs) with HTML error content instead of markdown.

Impact

  • Pollutes documentation with broken content
  • Makes the affected sections of the docs repo unusable
  • Search results become meaningless

Solution Needed

The fetcher should:

  1. Distinguish HTML error responses from markdown content
  2. Skip writing a file when upstream returns a 404 or another error
  3. Log which URLs failed, for debugging
  4. Optionally retry transient failures
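
The four points above could be sketched roughly as follows. This is a minimal illustration, not the repo's actual fetch script. It assumes the fetcher is (or can call into) Python, and every name here (`looks_like_html`, `http_get`, `fetch_markdown`, `save_page`, `TRANSIENT_STATUSES`) is hypothetical. The HTML check is a simple heuristic on the start of the body and may need tuning if legitimate pages can begin with raw HTML.

```python
import logging
import time
import urllib.error
import urllib.request
from pathlib import Path
from typing import Callable, Optional, Tuple

log = logging.getLogger("fetcher")  # hypothetical logger name

# Assumption: statuses worth retrying; 0 stands for a connection-level failure.
TRANSIENT_STATUSES = {0, 429, 500, 502, 503, 504}


def looks_like_html(body: str) -> bool:
    """Heuristic: a body that opens with a doctype or <html> tag is an error page."""
    head = body.lstrip()[:256].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def http_get(url: str) -> Tuple[int, str]:
    """Return (status, body) using only the standard library."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a body (often the HTML error page).
        return err.code, err.read().decode("utf-8", errors="replace")
    except urllib.error.URLError:
        return 0, ""


def fetch_markdown(
    url: str,
    get: Callable[[str], Tuple[int, str]] = http_get,
    retries: int = 3,
    backoff: float = 1.0,
) -> Optional[str]:
    """Return the markdown body, or None if the URL should be skipped."""
    for attempt in range(1, retries + 1):
        status, body = get(url)
        if status in TRANSIENT_STATUSES and attempt < retries:
            time.sleep(backoff * attempt)  # linear backoff between retries
            continue
        if status != 200:
            log.warning("skipping %s: HTTP status %s", url, status)
            return None
        if looks_like_html(body):
            log.warning("skipping %s: HTML body in a 200 response", url)
            return None
        return body
    return None


def save_page(url: str, dest: Path, get: Callable[[str], Tuple[int, str]] = http_get) -> bool:
    """Write dest only when valid markdown was fetched; report whether it was written."""
    body = fetch_markdown(url, get=get)
    if body is None:
        return False  # nothing is written, so no error page reaches the repo
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(body, encoding="utf-8")
    return True
```

Checking both the status code and the body covers the case where upstream serves an HTML "not found" page with a 200 status. Passing `get` as a parameter keeps the skip logic testable without network access.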

Files Affected

  • content/en/api/php/* (125 files)
  • content/en/api/terraform/* (125 files)
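
To confirm which files under these directories already contain error pages, a scan along these lines might help. It is a sketch only: the function name `find_html_polluted` is hypothetical, and it assumes the fetched docs are stored with a `.md` extension and that an error page can be recognised by a leading doctype or `<html>` tag.

```python
from pathlib import Path
from typing import List


def find_html_polluted(root: Path) -> List[Path]:
    """List .md files under root whose content starts like an HTML page."""
    polluted = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        head = text.lstrip()[:256].lower()
        if head.startswith("<!doctype html") or head.startswith("<html"):
            polluted.append(path)
    return polluted


if __name__ == "__main__":
    # Paths taken from the list above; run from the repo root.
    for section in ("content/en/api/php", "content/en/api/terraform"):
        for bad in find_html_polluted(Path(section)):
            print(bad)
```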

cc @lroolle - this blocks clean doc updates until fixed
