
Add opt-in search indexing middleware #75

Open
crack-kitty wants to merge 1 commit into main from allow-indexing-middleware

@crack-kitty
Contributor

Summary

This replaces the original kitty-test approach with a safer router-level opt-in model for search indexing.

The kitty-test branch tried to make indexing configurable by moving X-Robots-Tag out of default-headers into a separate noindex middleware, then adding per-service middleware env vars. That inverted OnRamp's existing safety model: routes that still used only default-headers, especially file-provider/external routes and other unconverted edge paths, could silently lose the noindex header and become indexable by default. It also made examples risky because replacing a middleware chain with only default-headers could drop functional middleware such as redirects, compression, buffering limits, or other route-specific behavior.

This PR keeps the existing architecture intact:

  • default-headers continues to send the current noindex X-Robots-Tag
  • Traefik entrypoint defaults still protect routes that do not define router-level middleware
  • allow-indexing-headers clears X-Robots-Tag only when explicitly appended after default-headers
  • Docker service routers get middleware env overrides whose defaults preserve current behavior
  • existing functional middleware defaults are retained, including MinIO gzip, Pi-hole redirects, and upload-size limits
  • scaffolded services default to default-headers@file and can opt in by appending allow-indexing-headers@file
  • docs cover the ordering requirement and warn against removing functional middleware
  • validation checks ensure services do not allow indexing by default and that websecure routers expose middleware overrides
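
The opt-in model listed above could look roughly like the following. This is a minimal sketch, not the PR's actual files: the service name `myapp`, the env var `MYAPP_TRAEFIK_MIDDLEWARES`, and the exact body of `allow-indexing-headers` are assumptions made for illustration. It relies only on Traefik's documented behavior that an empty value in `customResponseHeaders` removes that header, and on Compose's `${VAR:-default}` interpolation.

```yaml
# Traefik file-provider dynamic config (hypothetical definition)
http:
  middlewares:
    allow-indexing-headers:
      headers:
        customResponseHeaders:
          # An empty value asks Traefik's headers middleware to remove the header.
          X-Robots-Tag: ""
---
# services-available/myapp.yml (hypothetical service, labels only)
services:
  myapp:
    labels:
      - traefik.http.routers.myapp.entrypoints=websecure
      # Default preserves current behavior: default-headers only, so noindex stays.
      - traefik.http.routers.myapp.middlewares=${MYAPP_TRAEFIK_MIDDLEWARES:-default-headers@file}
```

Under these assumptions, a public service would opt in by setting `MYAPP_TRAEFIK_MIDDLEWARES=default-headers@file,allow-indexing-headers@file` in its environment file. If the service's default chain already includes functional middleware (redirects, compression, upload limits), the override should repeat that whole chain and append `allow-indexing-headers@file` at the end instead of replacing it.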

Validation

  • python3 make.d/scripts/check-search-indexing.py
  • git diff --check
  • targeted yamllint on changed core YAML and representative service YAML

Notes

Full yamllint -c .yamllint services-available is currently blocked by a pre-existing duplicate healthcheck key in services-available/jellyfin.yml. This PR intentionally leaves Jellyfin unchanged so that can be handled as a separate main-branch cleanup.

Replace the original kitty-test approach with a safer router-level opt-in model for search indexing.

The kitty-test branch tried to make indexing configurable by moving the X-Robots-Tag header out of default-headers into a separate noindex middleware, then adding per-service middleware env vars. That inverted the existing safety model: any route that still used only default-headers, especially file-provider/external routes and other unconverted edge paths, would silently lose the noindex header and become indexable by default. It also made service examples risky because setting a middleware env var to only default-headers could drop functional middleware such as redirects, compression, buffering limits, or other route-specific behavior.

This change keeps the existing architecture intact: default-headers continues to send the current noindex X-Robots-Tag; Traefik entrypoint defaults still protect routes that do not define router-level middleware; a new allow-indexing-headers file-provider middleware clears X-Robots-Tag only when explicitly appended after default-headers; and Docker service routers get env-var middleware overrides whose defaults preserve their current behavior.

Existing functional middleware defaults are retained, including MinIO gzip, Pi-hole redirects, and upload-size limit middleware. New scaffolded services default to default-headers@file and can opt in by appending allow-indexing-headers@file. Documentation covers the ordering requirement and warns against removing existing functional middleware, while validation checks ensure services do not allow indexing by default and that websecure routers expose middleware overrides.

This preserves OnRamp's default private/noindex posture while giving users an explicit, per-service escape hatch for public services that should be searchable.