feat(analytics): filter bots and non-content endpoints from page-view tracking (#77) by x3ek · Pull Request #82 · xeek-dev/squishmark

x3ek · 2026-05-16T14:11:53Z

Summary

Stops the analytics middleware from recording bot traffic and non-content endpoints. Two filters added; existing path-prefix exclusions preserved.

Filters

Content-Type: must start with text/html. Cleanly excludes /robots.txt, /sitemap.xml, /feed.xml, /favicon.ico, /pygments.css without needing per-path allowlists.
User-Agent: matched case-insensitively against the pattern bot|crawler|spider|slurp|facebookexternalhit|curl|wget|python-requests|httpx. A missing UA is treated as a bot (real browsers always send one).
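
The two filters above can be sketched roughly as follows. This is a minimal illustration, not the code in src/squishmark/main.py: the pattern alternatives are copied from this description, the signature of is_bot_user_agent is assumed, and is_html_response is a hypothetical helper name introduced only for this example.

```python
import re
from typing import Optional

# Alternatives copied from the PR description; the regex actually used in
# src/squishmark/main.py may differ in detail.
BOT_UA_PATTERN = re.compile(
    r"bot|crawler|spider|slurp|facebookexternalhit|curl|wget|python-requests|httpx",
    re.IGNORECASE,
)


def is_bot_user_agent(user_agent: Optional[str]) -> bool:
    """Return True for bot-like User-Agent strings (signature is assumed)."""
    # A missing or empty User-Agent is treated as a bot.
    if not user_agent:
        return True
    return BOT_UA_PATTERN.search(user_agent) is not None


def is_html_response(content_type: Optional[str]) -> bool:
    """Hypothetical helper: True when Content-Type starts with text/html."""
    # Lowercasing is an assumption of this sketch; the PR only says
    # "must start with text/html".
    return (content_type or "").lower().startswith("text/html")
```

Because the pattern is searched anywhere in the string, "Googlebot", "Bingbot" and "Twitterbot" are all caught by the single "bot" alternative.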

Middleware refactored to use early returns so the filter order is obvious: status → Content-Type → path prefix → User-Agent → track.
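
The early-return ordering might look like the sketch below. It models only the tracking decision as a pure function; should_track is a hypothetical name, the real middleware works on request/response objects, and the status check is assumed to be "exactly 200" based on the reviewer's file summary. The excluded prefixes are the ones listed in the commit message.

```python
import re
from typing import Optional

# Prefixes listed in the commit message as existing exclusions.
EXCLUDED_PREFIXES = ("/static", "/admin", "/health", "/auth", "/webhooks")

# Pattern alternatives copied from the PR description.
BOT_UA = re.compile(
    r"bot|crawler|spider|slurp|facebookexternalhit|curl|wget|python-requests|httpx",
    re.IGNORECASE,
)


def should_track(
    status_code: int,
    content_type: Optional[str],
    path: str,
    user_agent: Optional[str],
) -> bool:
    """Hypothetical decision function mirroring the described filter order."""
    # 1. status: only successful responses count (assumed to mean 200).
    if status_code != 200:
        return False
    # 2. Content-Type: only HTML pages count as page views.
    if not (content_type or "").startswith("text/html"):
        return False
    # 3. path prefix: existing exclusions are preserved.
    if path.startswith(EXCLUDED_PREFIXES):
        return False
    # 4. User-Agent: missing or bot-like agents are skipped.
    if not user_agent or BOT_UA.search(user_agent):
        return False
    # 5. track
    return True
```

Each gate returns as soon as it rejects the request, so the order of the checks reads top to bottom.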

Test plan

python scripts/run-checks.py — 4/4 (format, lint, 209 tests, pyright)
is_bot_user_agent unit-tested against 12 known bots + 4 real browsers + missing/empty
Integration tests confirm /robots.txt, /health, and bot UAs don't trigger track_page_view

Scope note

The issue's Implementation Notes suggest staging bot filtering as a follow-up. It is bundled here because the title says "bot traffic AND non-content endpoints" and the Possible Approaches list "Combination of the above." See #77 comment.

Closes #77

🤖 Generated with Claude Code

… tracking

The analytics middleware was recording every successful request, so /robots.txt, /sitemap.xml, /feed.xml, /favicon.ico, /pygments.css, and crawler hits all inflated view counts.

Add two filters:

1. Content-Type must start with text/html. This excludes XML, JSON, CSS, plain text, and image responses without needing per-path allowlists.
2. User-Agent must not look like a bot/crawler. The pattern covers Googlebot, Bingbot, Baidu/Yandex, social card fetchers (Twitterbot, facebookexternalhit, Slackbot), and scripted clients (curl, wget, python-requests, httpx). Missing UA is treated as a bot since real browsers always send one.

Refactor the middleware to use early returns so the order is obvious: status_code -> Content-Type -> path prefix -> User-Agent -> track. Existing path-prefix exclusions (/static, /admin, /health, /auth, /webhooks) are preserved.

Tests cover is_bot_user_agent across known bots and real browsers, plus integration tests that the existing excluded paths still don't get tracked.

Closes #77

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR updates SquishMark’s analytics middleware to avoid inflating page-view metrics by skipping bot traffic and non-content responses, aligning tracked “page views” more closely with real human HTML page loads.

Changes:

Added a bot User-Agent detection helper (regex-based) and used it in the analytics middleware.
Added a Content-Type gate (text/html only) and refactored middleware logic to use early returns for clearer filter ordering.
Added unit + integration-style tests intended to validate bot/non-HTML/non-content filtering behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`src/squishmark/main.py`	Adds bot UA detection + updates analytics middleware to early-return on non-200, non-HTML, excluded paths, and bot UAs before tracking.
`tests/test_analytics_filtering.py`	Adds tests for bot UA detection and middleware tracking suppression for selected endpoints/headers.

Copilot review on PR #82 flagged that test_bot_request_to_html_page_not_tracked used /health, which is filtered earlier by both Content-Type (JSON) and path prefix, so the test would pass even if the UA gate were removed.

Register a stub /_test/html route on the test app (path chosen to avoid the /{slug} catch-all in pages.py) and assert the browser-UA hit IS tracked while the bot-UA hit is NOT. The bot UA filter now has a test that genuinely exercises it.

Refs #77, #82

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
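
To illustrate why the original test could not catch a regression, here is a small sketch that models the filter chain as a pure function with a switch for the UA gate. The function should_track and its ua_gate_enabled flag are hypothetical and exist only for this example; the actual fix registers a stub route on the test app instead.

```python
import re
from typing import Optional

EXCLUDED_PREFIXES = ("/static", "/admin", "/health", "/auth", "/webhooks")
BOT_UA = re.compile(
    r"bot|crawler|spider|slurp|facebookexternalhit|curl|wget|python-requests|httpx",
    re.IGNORECASE,
)


def should_track(
    status_code: int,
    content_type: Optional[str],
    path: str,
    user_agent: Optional[str],
    ua_gate_enabled: bool = True,  # hypothetical switch simulating a removed gate
) -> bool:
    if status_code != 200:
        return False
    if not (content_type or "").startswith("text/html"):
        return False
    if path.startswith(EXCLUDED_PREFIXES):
        return False
    if ua_gate_enabled and (not user_agent or BOT_UA.search(user_agent)):
        return False
    return True


BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER = "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"

# /health is rejected by the Content-Type and path gates, so the outcome is
# the same whether or not the UA gate exists: the old test proves nothing.
health_with_gate = should_track(200, "application/json", "/health", BOT)
health_without_gate = should_track(
    200, "application/json", "/health", BOT, ua_gate_enabled=False
)

# An HTML route outside the excluded prefixes reaches the UA gate, so the
# outcome changes when the gate is removed: this test can fail.
html_with_gate = should_track(200, "text/html", "/_test/html", BOT)
html_without_gate = should_track(
    200, "text/html", "/_test/html", BOT, ua_gate_enabled=False
)
browser_tracked = should_track(200, "text/html", "/_test/html", BROWSER)
```

Pairing the bot-UA assertion with a browser-UA assertion on the same route also guards against the opposite failure, where nothing on that route is tracked at all.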

x3ek requested a review from Copilot May 16, 2026 14:12

Copilot started reviewing on behalf of x3ek May 16, 2026 14:12

Copilot AI reviewed May 16, 2026

Comment thread tests/test_analytics_filtering.py

x3ek merged commit 821ee65 into main May 16, 2026
5 checks passed

x3ek deleted the feat/77-exclude-bot-traffic-analytics branch May 16, 2026 14:20

x3ek merged 2 commits into main from feat/77-exclude-bot-traffic-analytics

2 participants
