Insights: mentions monitoring + alternate analytics sources #655

@lbedner

Summary

Expand the insights service with two new external data tracks:

  1. Mentions monitoring — discover where the project is being talked about across the developer web (blog posts, forums, Q&A sites, social).
  2. Alternate analytics sources — let Pulse pull docs traffic from GA4 in addition to Plausible, so users on either platform get the same dashboard.

The mentions feature alone would have surfaced the johnny233 CSDN article weeks earlier — that's the selling point.

Mentions monitoring — API reality

| Platform | API | Auth | Notes |
|---|---|---|---|
| Google Alerts | No official API | RSS via google-alerts PyPI (cookie scrape) | Hacky but works. Set up alerts for keywords, poll the RSS feed. |
| Reddit | Official | OAuth2 | Collector already exists. Search /r/all for keywords periodically. |
| Hacker News | Algolia search | None | https://hn.algolia.com/api/v1/search?query=<kw>; free, no auth. |
| Dev.to | Official | API key | Search articles endpoint. |
| Hashnode | GraphQL | None for public | Query posts by tag/keyword. |
| Stack Overflow | Official | API key (free) | Find questions mentioning the package. |
| GitHub | Already have | PAT | Cross-repo code/discussion/issue search. |
| X/Twitter | Official | OAuth2, paid tiers | Recent-tweet search. |
| Medium | No public search API | Scraping only | Skip for v1; not reliable. |
| CSDN, Tencent Cloud, Juejin, Zhihu | No official API | Scraping required | Defer to v2 (Chinese platforms). |
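The Hacker News row is the cheapest to prototype. A minimal sketch using only the standard library and the public Algolia endpoint from the table; HNMention, build_hn_query, parse_hn_hits and search_hn are hypothetical names, and the tags / numericFilters parameters are taken from Algolia's HN Search API documentation:

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from urllib.parse import urlencode
from urllib.request import urlopen

HN_SEARCH_URL = "https://hn.algolia.com/api/v1/search"


@dataclass
class HNMention:  # hypothetical record shape for the mentions insight
    title: str
    url: str
    created_at: str


def build_hn_query(keyword: str, since_ts: int | None = None) -> str:
    """Search stories OR comments; optionally only items newer than a Unix timestamp."""
    params = {"query": keyword, "tags": "(story,comment)"}
    if since_ts is not None:
        params["numericFilters"] = f"created_at_i>{since_ts}"
    return f"{HN_SEARCH_URL}?{urlencode(params)}"


def parse_hn_hits(payload: dict) -> list[HNMention]:
    mentions = []
    for hit in payload.get("hits", []):
        # Comments have no title/url of their own: fall back to the parent
        # story title and to the HN item page.
        title = hit.get("title") or hit.get("story_title") or ""
        url = hit.get("url") or f"https://news.ycombinator.com/item?id={hit['objectID']}"
        mentions.append(HNMention(title=title, url=url, created_at=hit.get("created_at", "")))
    return mentions


def search_hn(keyword: str, since_ts: int | None = None) -> list[HNMention]:
    """One unauthenticated HTTP call, as the table notes."""
    with urlopen(build_hn_query(keyword, since_ts), timeout=10) as resp:
        return parse_hn_hits(json.load(resp))
```

The /search endpoint ranks by relevance; for periodic polling, Algolia's sibling /search_by_date endpoint (same parameters, newest first) may fit better.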

Priority for v1

  1. Google Alerts RSS (covers most of what Google indexes, including CSDN articles).
  2. Hacker News Algolia (one HTTP call, no auth).
  3. Stack Overflow search.
  4. GitHub cross-repo code/discussions search.
  5. Reddit (extend existing collector with mentions search).
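For priority 1, once an alert exists the collector only has to poll its feed URL. A minimal sketch, assuming the alert was created with "Deliver to: RSS feed" (Google serves these feeds as Atom) and that each entry link is wrapped in a google.com/url redirect whose url query parameter holds the real target; Mention, unwrap_google_redirect, parse_alerts_feed and poll_alerts_feed are hypothetical names, and creating the alert itself is left to the google-alerts package:

```python
from __future__ import annotations

import html
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen

ATOM = "{http://www.w3.org/2005/Atom}"
TAG_RE = re.compile(r"<[^>]+>")


@dataclass
class Mention:  # hypothetical record shape for the mentions insight
    source: str
    title: str
    url: str
    published: str


def unwrap_google_redirect(link: str) -> str:
    """Assumed link shape: google.com/url?...&url=<target>. Return <target> when present."""
    parsed = urlparse(link)
    if parsed.netloc.endswith("google.com") and parsed.path == "/url":
        target = parse_qs(parsed.query).get("url")
        if target:
            return target[0]
    return link


def parse_alerts_feed(feed: bytes | str) -> list[Mention]:
    """Parse an Atom-formatted alert feed into Mention records."""
    root = ET.fromstring(feed)
    mentions = []
    for entry in root.iter(f"{ATOM}entry"):
        # Titles are HTML with <b> around the keyword: strip tags, then unescape entities.
        raw_title = entry.findtext(f"{ATOM}title", default="")
        title = html.unescape(TAG_RE.sub("", raw_title))
        link_el = entry.find(f"{ATOM}link")
        link = link_el.get("href", "") if link_el is not None else ""
        mentions.append(Mention(
            source="google_alerts",
            title=title,
            url=unwrap_google_redirect(link),
            published=entry.findtext(f"{ATOM}published", default=""),
        ))
    return mentions


def poll_alerts_feed(feed_url: str) -> list[Mention]:
    """Fetch one alert feed URL and parse it."""
    with urlopen(feed_url, timeout=10) as resp:
        return parse_alerts_feed(resp.read())
```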

By rough estimate, that covers ~80% of English-language mentions with minimal effort.

Alternate analytics — GA4 collector

GA4 has a real API: the Google Analytics Data API (v1beta), available in Python as the google-analytics-data package. Auth is a GCP service-account JSON key.
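A minimal sketch of one runReport call, assuming the service-account JSON is passed as the *contents* of an environment variable and the numeric GA4 property ID is known. The variable names GA4_SERVICE_ACCOUNT_JSON and GA4_PROPERTY_ID, and both helper functions, are hypothetical. The client classes come from the package's google.analytics.data_v1beta module; the import is deferred so the rest of the code still loads when the package is not installed:

```python
import json
import os

# Hypothetical env var names; the issue only specifies "service-account JSON env var".
CREDS_ENV = "GA4_SERVICE_ACCOUNT_JSON"
PROPERTY_ENV = "GA4_PROPERTY_ID"


def report_to_dicts(response) -> list[dict[str, str]]:
    """Flatten a RunReportResponse into {field_name: value} dicts (GA4 returns strings)."""
    dim_names = [h.name for h in response.dimension_headers]
    met_names = [h.name for h in response.metric_headers]
    records = []
    for row in response.rows:
        record = {n: v.value for n, v in zip(dim_names, row.dimension_values)}
        record.update({n: v.value for n, v in zip(met_names, row.metric_values)})
        records.append(record)
    return records


def fetch_ga4_report(dimensions: list[str], metrics: list[str],
                     start_date: str = "7daysAgo", end_date: str = "today") -> list[dict[str, str]]:
    # Deferred import: only needed when GA4 is the configured source.
    from google.analytics.data_v1beta import BetaAnalyticsDataClient
    from google.analytics.data_v1beta.types import DateRange, Dimension, Metric, RunReportRequest

    info = json.loads(os.environ[CREDS_ENV])
    client = BetaAnalyticsDataClient.from_service_account_info(info)
    request = RunReportRequest(
        property=f"properties/{os.environ[PROPERTY_ENV]}",
        dimensions=[Dimension(name=d) for d in dimensions],
        metrics=[Metric(name=m) for m in metrics],
        date_ranges=[DateRange(start_date=start_date, end_date=end_date)],
    )
    return report_to_dicts(client.run_report(request))
```

For example, fetch_ga4_report(["country"], ["activeUsers"]) would back the Countries panel.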

Mapping to the metrics already displayed:

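| Pulse view | GA4 metric / dimension |
|---|---|
| Visitors | activeUsers (metric) |
| Pageviews | screenPageViews (metric) |
| Pages | pagePath (dimension) |
| Countries | country (dimension) |
| Sources | sessionSource (dimension) |
| Avg duration | averageSessionDuration (metric) |
| Bounce rate | bounceRate (metric) |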
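GA4 returns every value as a string and reports bounceRate as a fraction (e.g. 0.43) where Plausible reports a percentage, so a thin translation layer keeps the dashboard unchanged. A sketch under the assumption that Pulse stores Plausible-style keys (visitors, pageviews, visit_duration, bounce_rate); GA4_FIELDS, ga4_request_fields and ga4_totals_to_pulse are hypothetical names:

```python
from __future__ import annotations

# Pulse view -> (GA4 API field, kind), following the mapping table above.
GA4_FIELDS: dict[str, tuple[str, str]] = {
    "Visitors": ("activeUsers", "metric"),
    "Pageviews": ("screenPageViews", "metric"),
    "Pages": ("pagePath", "dimension"),
    "Countries": ("country", "dimension"),
    "Sources": ("sessionSource", "dimension"),
    "Avg duration": ("averageSessionDuration", "metric"),
    "Bounce rate": ("bounceRate", "metric"),
}


def ga4_request_fields(views: list[str]) -> tuple[list[str], list[str]]:
    """Split the Pulse views a panel needs into GA4 dimension and metric names."""
    dims = [GA4_FIELDS[v][0] for v in views if GA4_FIELDS[v][1] == "dimension"]
    mets = [GA4_FIELDS[v][0] for v in views if GA4_FIELDS[v][1] == "metric"]
    return dims, mets


def ga4_totals_to_pulse(row: dict[str, str]) -> dict[str, float]:
    """Convert one GA4 totals row (string values) into Plausible-style numbers."""
    return {
        "visitors": float(row["activeUsers"]),
        "pageviews": float(row["screenPageViews"]),
        "visit_duration": float(row["averageSessionDuration"]),  # seconds in both APIs
        "bounce_rate": round(float(row["bounceRate"]) * 100, 1),  # GA4 fraction -> percent
    }
```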

Setup is heavier than Plausible (a GCP project plus a service account, which must also be granted Viewer access on the GA4 property), but it's free and matches what most users already have. Pulse would treat GA4 as a sibling collector: same insights tables, same dashboard, different source under the hood.
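One way to express "sibling collector" is a small structural interface plus a source switch. This is a sketch, not Pulse's actual code: the AnalyticsCollector protocol, the PULSE_ANALYTICS_SOURCE variable, and the (source, metric, value) row shape for the shared insights table are all hypothetical:

```python
from __future__ import annotations

from typing import Mapping, Protocol


class AnalyticsCollector(Protocol):
    source: str  # e.g. "plausible" or "ga4"

    def collect(self) -> dict[str, float]:
        """Return a snapshot of metric name -> value."""
        ...


def select_collector(env: Mapping[str, str],
                     collectors: Mapping[str, AnalyticsCollector]) -> AnalyticsCollector:
    """Pick the configured source; Plausible stays the default."""
    name = env.get("PULSE_ANALYTICS_SOURCE", "plausible")
    try:
        return collectors[name]
    except KeyError:
        raise ValueError(
            f"unknown analytics source {name!r}; expected one of {sorted(collectors)}"
        ) from None


def to_insight_rows(collector: AnalyticsCollector) -> list[tuple[str, str, float]]:
    """Flatten a snapshot into (source, metric, value) rows for the shared table."""
    return [(collector.source, metric, value)
            for metric, value in sorted(collector.collect().items())]
```

Recording the source on every row means switching a project from Plausible to GA4 does not silently blend two differently-defined series.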

Acceptance criteria

  • Mentions collector(s) added, each platform behind its own feature toggle.
  • New mentions insight surface in the Overseer (or Pulse) — list view + per-platform filter.
  • GA4 collector implemented as an alternative to Plausible, configured via a service-account JSON env var.
  • Existing analytics dashboard renders correctly when GA4 is the source.
  • Tests for each new collector with mocked API responses.
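For the "mocked API responses" criterion, unittest.mock can stand in for the HTTP call. A self-contained sketch: the MENTIONS_HN_ENABLED toggle and collect_hn_mentions are hypothetical, and urllib.request.urlopen is patched at the module attribute because the collector calls it through urllib.request:

```python
import io
import json
import os
import unittest
import urllib.request
from unittest import mock
from urllib.parse import urlencode


def hn_mentions_enabled(env=os.environ) -> bool:
    # Hypothetical per-platform toggle name.
    return env.get("MENTIONS_HN_ENABLED", "false").lower() == "true"


def collect_hn_mentions(keyword, env=os.environ):
    """Return mention URLs from Hacker News, or [] when the toggle is off."""
    if not hn_mentions_enabled(env):
        return []
    url = "https://hn.algolia.com/api/v1/search?" + urlencode({"query": keyword})
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return [
        hit.get("url") or f"https://news.ycombinator.com/item?id={hit['objectID']}"
        for hit in payload.get("hits", [])
    ]


class CollectHNMentionsTest(unittest.TestCase):
    @mock.patch("urllib.request.urlopen")
    def test_parses_mocked_response(self, fake_urlopen):
        body = json.dumps({"hits": [
            {"objectID": "1", "url": "https://blog.example.com/post"},
            {"objectID": "42", "url": None},  # a comment: no URL of its own
        ]}).encode()
        # urlopen is used as a context manager, so mock what __enter__ returns.
        fake_urlopen.return_value.__enter__.return_value = io.BytesIO(body)
        result = collect_hn_mentions("mypkg", env={"MENTIONS_HN_ENABLED": "true"})
        self.assertEqual(result, ["https://blog.example.com/post",
                                  "https://news.ycombinator.com/item?id=42"])

    @mock.patch("urllib.request.urlopen")
    def test_toggle_off_makes_no_request(self, fake_urlopen):
        self.assertEqual(collect_hn_mentions("mypkg", env={}), [])
        fake_urlopen.assert_not_called()
```

Run with python -m unittest; no network access is needed.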

Out of scope

  • Chinese platforms (CSDN, Juejin, Zhihu, Tencent Cloud) — v2.
  • Medium scraping — not reliable enough.
  • Sentiment analysis on mentions — v2.

Metadata

Assignees: No one assigned
Labels: backend, insight (Insight service: adoption metrics and analytics), service
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
