
feat: add devto article scraping PoC snippet #377

Open
ewbenigno wants to merge 2 commits into he4rt:4.x from ewbenigno:feat/article-scraping-poc


@ewbenigno ewbenigno commented Jun 28, 2026


Closes #181

Implements a DOM scraping snippet that extracts the Reactions History from the /stats page of a dev.to article, as specified in the issue.

The script extracts type, username, userProfileUrl, userAvatarUrl, and date for each reaction, builds structured JSON, and copies it to the clipboard.
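As an illustration of the output described above, here is a minimal sketch of how the extracted reactions could be assembled into structured JSON. The helper name `buildReport`, the sample values, and the choice to make `summary` a count per reaction type are assumptions for illustration; only the five reaction field names come from the description.

```javascript
// Hypothetical helper (not taken from the PR): assembles the report object
// from reactions that were already extracted from the DOM.
// Assumption: `summary` maps each reaction type to the number of times it occurs.
function buildReport(articleUrl, reactions) {
  const summary = {};
  for (const { type } of reactions) {
    summary[type] = (summary[type] || 0) + 1;
  }
  return { articleUrl, reactions, summary };
}

// Sample data is made up; the field names follow the PR description.
const sampleReport = buildReport('https://dev.to/example-user/example-post', [
  {
    type: 'Like',
    username: 'alice',
    userProfileUrl: 'https://dev.to/alice',
    userAvatarUrl: 'https://example.com/alice.png',
    date: '2026-06-27',
  },
  {
    type: 'Unicorn',
    username: 'bob',
    userProfileUrl: 'https://dev.to/bob',
    userAvatarUrl: 'https://example.com/bob.png',
    date: '2026-06-27',
  },
]);

console.log(JSON.stringify(sampleReport, null, 2));
```

With an empty `reactions` array, this sketch still returns an object with `reactions: []` and `summary: {}`.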

Documented in docs/superpowers/specs/, following the project's convention.

Edge cases handled:

  • Article with no reactions / no "Reactions History" section → still produces structured JSON (reactions: [], summary: {})
  • Automatic copy fails (focus is outside the page) → fallback via execCommand('copy'), with the JSON printed to the console as a last resort
  • Script run outside the /stats page → explanatory error, without breaking
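The copy fallback chain in the list above could look roughly like the sketch below. It is not the PR's code: `copyWithFallback`, `browserEnv`, and the `env` injection object are hypothetical names, introduced so the ordering logic can be exercised outside a browser. The browser calls used (`navigator.clipboard.writeText` and the deprecated `document.execCommand('copy')`) are standard Web APIs.

```javascript
// Hypothetical sketch: try the async Clipboard API first, then execCommand,
// then print the text so it can be copied by hand.
async function copyWithFallback(text, env) {
  try {
    await env.writeText(text);
    return 'clipboard';
  } catch (_) {
    // The Clipboard API can reject when the document is not focused,
    // for example while the DevTools console has focus.
  }
  try {
    if (env.execCopy(text)) return 'execCommand';
  } catch (_) {
    // Fall through to the console as the last resort.
  }
  env.log(text);
  return 'console';
}

// One possible browser wiring for `env` (only meaningful in a page context).
function browserEnv() {
  return {
    writeText: (t) => navigator.clipboard.writeText(t),
    execCopy: (t) => {
      const ta = document.createElement('textarea');
      ta.value = t;
      document.body.appendChild(ta);
      ta.select();
      const ok = document.execCommand('copy');
      ta.remove();
      return ok;
    },
    log: (t) => console.log(t),
  };
}
```

In a console session this would be called as `copyWithFallback(json, browserEnv())`; the returned string tells which path succeeded.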

Manually tested:

  • Article with 10 reactions from multiple users: extraction and clipboard OK
  • Newly created article with no reactions: edge case validated, structured JSON correct
  • Clipboard fallback works even with focus in DevTools

@BrunaDomingues BrunaDomingues left a comment


[Image] It just isn't copying the result. I'm not sure whether I was supposed to review this PR, but here are my two cents.

…ack de clipboard

- articleUrl now strips /stats, as per issue he4rt#181
- articleSlug uses filter(Boolean).pop() after removing /stats
- reactionType is extracted via split('\n')[0] instead of querySelector('strong')
- the edge case of an article with no reactions now returns structured JSON
- added a clipboard fallback via execCommand for when focus is in DevTools
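The first three commit notes can be sketched as two small pure functions. This is an approximation under assumptions, not the committed code: the function names are hypothetical, the URL shape `https://dev.to/<user>/<slug>/stats` is assumed, and the reaction text is assumed to put the reaction type on its first line.

```javascript
// Hypothetical helper: derives articleUrl and articleSlug from a /stats URL.
function deriveArticleInfo(href) {
  const url = new URL(href);
  // Drop a trailing "/stats" (with or without a final slash).
  const path = url.pathname.replace(/\/stats\/?$/, '');
  // filter(Boolean) discards empty segments produced by leading slashes.
  const articleSlug = path.split('/').filter(Boolean).pop();
  return { articleUrl: url.origin + path, articleSlug };
}

// Hypothetical helper: takes the first line of an item's text as the type.
function reactionTypeFromText(text) {
  return text.trim().split('\n')[0].trim();
}

console.log(deriveArticleInfo('https://dev.to/alice/my-post-1a2b/stats'));
console.log(reactionTypeFromText('Like\nalice\nJun 27'));
```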
@ewbenigno ewbenigno marked this pull request as ready for review June 30, 2026 15:08
@ewbenigno ewbenigno requested a review from a team June 30, 2026 15:08

coderabbitai Bot commented Jun 30, 2026


📝 Walkthrough

A new PoC spec file is added documenting a browser-console JavaScript snippet for extracting "Reactions History" data from a dev.to article's /stats page. The doc covers the execution context, the full extraction script (DOM traversal, clipboard copy with fallback, structured JSON output), edge cases (missing reactions section, clipboard focus errors, wrong page), and an example output JSON.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main change: a dev.to article scraping PoC snippet. |
| Description check | ✅ Passed | The description matches the documented dev.to Reactions History scraping snippet and its edge cases. |
| Linked Issues check | ✅ Passed | The PoC aligns with #181 by running on /stats, extracting reaction data, producing JSON, and handling missing section and copy fallback. |
| Out of Scope Changes check | ✅ Passed | The change appears limited to the requested documentation PoC snippet with no unrelated additions. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/superpowers/specs/2026-06-27-devto-article-scraping-poc.md (1)

73-75: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Use .includes() for heading match.

Exact equality breaks if dev.to adds whitespace or icons. h.textContent.trim().includes('Reactions History') is more resilient.
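A rough sketch of the suggested match, written against any list of objects with a `textContent` property so it can be tried outside the browser. The function name `findReactionsHeading` and the heading selector in the usage note are assumptions; only the `includes('Reactions History')` check comes from the comment above.

```javascript
// Hypothetical helper: returns the first heading whose trimmed text contains
// "Reactions History", or null when no heading matches.
function findReactionsHeading(headings) {
  return (
    headings.find((h) => h.textContent.trim().includes('Reactions History')) || null
  );
}

// Plain objects stand in for DOM heading elements here.
const sampleHeadings = [
  { textContent: 'Summary' },
  { textContent: '  📊 Reactions History \n' },
];
console.log(findReactionsHeading(sampleHeadings) === sampleHeadings[1]);
```

In the page itself, the input could be something like `[...document.querySelectorAll('h1, h2, h3')]`, depending on which heading levels the /stats page actually uses.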

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/specs/2026-06-27-devto-article-scraping-poc.md` around lines
73 - 75, The heading lookup in the DOM scan is too strict and can miss the
target when dev.to adds extra text, whitespace, or icons. Update the logic in
the header iteration that assigns rhHeader to use a resilient match with
includes() on h.textContent.trim() instead of exact equality, while keeping the
existing Reactions History check in the same heading-selection flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/superpowers/specs/2026-06-27-devto-article-scraping-poc.md`:
- Around line 19-24: The page check in the self-invoking script only validates
the pathname, so it can run on non-dev.to domains. Update the validation in the
top-level IIFE to also check window.location.hostname includes dev.to before
proceeding, keeping the existing /stats path guard in place.

---

Nitpick comments:
In `@docs/superpowers/specs/2026-06-27-devto-article-scraping-poc.md`:
- Around line 73-75: The heading lookup in the DOM scan is too strict and can
miss the target when dev.to adds extra text, whitespace, or icons. Update the
logic in the header iteration that assigns rhHeader to use a resilient match
with includes() on h.textContent.trim() instead of exact equality, while keeping
the existing Reactions History check in the same heading-selection flow.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 9c32960e-dfab-4da9-8ec6-7a9c42db9b0f

📥 Commits

Reviewing files that changed from the base of the PR and between 0aa31c6 and 9eb8c51.

📒 Files selected for processing (1)
  • docs/superpowers/specs/2026-06-27-devto-article-scraping-poc.md

Comment on lines +19 to +24
(() => {
  // Validation: are we on the right page?
  if (!window.location.pathname.endsWith('/stats')) {
    console.error('❌ Run this script on the /stats page of a dev.to article');
    return;
  }


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Incomplete page validation.

endsWith('/stats') allows any domain. Add window.location.hostname.includes('dev.to') to ensure this runs only on dev.to.
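One way to express the combined guard is sketched below. Note that it swaps in a stricter hostname test than the `includes('dev.to')` suggested above, because `includes` would also accept a host such as `dev.to.example.com`. The function name `isDevtoStatsPage` is hypothetical, and the argument is any object shaped like `window.location`.

```javascript
// Hypothetical guard: true only for dev.to (or a subdomain of it) on a /stats path.
// Assumption: exact match or ".dev.to" suffix is the intended hostname rule.
function isDevtoStatsPage(loc) {
  const host = loc.hostname;
  const onDevto = host === 'dev.to' || host.endsWith('.dev.to');
  return onDevto && loc.pathname.endsWith('/stats');
}

console.log(isDevtoStatsPage({ hostname: 'dev.to', pathname: '/alice/my-post/stats' }));
console.log(isDevtoStatsPage({ hostname: 'dev.to.example.com', pathname: '/x/stats' }));
```

In the snippet's IIFE, this would be called as `isDevtoStatsPage(window.location)` before any DOM traversal.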

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/specs/2026-06-27-devto-article-scraping-poc.md` around lines
19 - 24, The page check in the self-invoking script only validates the pathname,
so it can run on non-dev.to domains. Update the validation in the top-level IIFE
to also check window.location.hostname includes dev.to before proceeding,
keeping the existing /stats path guard in place.

@BrunaDomingues BrunaDomingues left a comment


Image LGTM!


Development

Successfully merging this pull request may close these issues.

feat(integration-devto): article scraping PoC

2 participants