
Story 2451: AI-Assisted Description for Link Posts type#2477

Open
javiercoronadonarvaez wants to merge 110 commits into develop from javiercoronarv/2451-ai-assisted-description-link-post


javiercoronadonarvaez (Collaborator) commented Jun 2, 2026

Issue: #2451

Summary & Context

Adds an Auto-Generate Description flow to the Link post type for any public web URL: the backend fetches the page, extracts the main article text (title + body) with trafilatura, and runs it through the same summarizer used by Blog/News so the user gets an editable draft instead of a blank Description field.

  • Figma link: N/A
  • Link to components/page: localhost:8000/news/add (select Link as the post type)

⚠️ Built on top of #2473.
You'll need to update your local .env so the summarization endpoint can reach OpenRouter:

  • OPENROUTER_API_KEY=<key> (required)
  • SUMMARIZATION_MODEL=<model> (optional, defaults to gpt-oss-120b — see config/settings.py:658)
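For reference, here is a minimal sketch of how the two variables above could be read, assuming plain environment-variable lookups. The helper name `load_summarization_settings` is hypothetical. The actual wiring in config/settings.py may use a different mechanism.

```python
import os
from collections.abc import Mapping

# Default model named in this PR description; the real default lives in config/settings.py.
DEFAULT_SUMMARIZATION_MODEL = "gpt-oss-120b"


def load_summarization_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: read the two OpenRouter-related variables from the environment."""
    api_key = env.get("OPENROUTER_API_KEY", "").strip()
    if not api_key:
        # Required: without a key the summarization endpoint cannot reach OpenRouter.
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    # Optional: an unset or blank value falls back to the default model.
    model = env.get("SUMMARIZATION_MODEL", "").strip() or DEFAULT_SUMMARIZATION_MODEL
    return {"api_key": api_key, "model": model}
```

In this sketch, failing fast on a missing key keeps a misconfigured .env from surfacing later as a generic summarization error.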

Changes

  • New endpoint POST /v3/news/generate-link-description/ that accepts any public http(s) URL, fetches the page (10s timeout) through an SSRF-guarded helper, extracts title + body, and returns a summarized description as JSON. Three distinct failure modes (invalid/non-public URL → 400, fetch/extraction failure → 502, summarization failure → 502) map to separate inline messages.
  • New helper extract_article in news/helpers.py: trafilatura isolates the main article and strips boilerplate (navigation, footers, comments, ads), keeping the summarizer focused and cheaper than feeding it the whole page. Falls back to a naive visible-text dump (extract_content) when trafilatura can't parse a page, and returns ("", "") only when even the fallback is empty. (Replaces the previous cppalliance.org-only extract_cppalliance_post CSS selectors.)
  • New SSRF-safe fetch safe_get in news/helpers.py, used by both the endpoint and the background set_summary_for_link_entry task: rejects non-http(s) schemes and hosts resolving to loopback/private/link-local/reserved IPs (e.g. 127.0.0.1, 10.x, 169.254.169.254), and follows redirects manually so each hop's host is re-validated.
  • Background set_summary_for_link_entry task now uses trafilatura extraction + the SSRF guard, and skips gracefully (logs and returns) when no readable text can be extracted.
  • The Link post type mirrors the Blog/News Description UX: an Auto-Generate button (enabled once the link field holds a valid http(s) URL), a 1000-character counter, a Saving/Saved indicator, a per-post-type localStorage draft, inline help text (hidden once the URL is valid) and error states, and the same spinner and "generating…" placeholder shown while the request is in flight.
  • description field is persisted to Link.summary on submit (same path Blog/News already use), so the generated text is available immediately on the post page without waiting for the summary_dispatcher task.
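The new endpoint has three failure modes. The sketch below shows how they map to status codes without tying the logic to any web framework. It is a hypothetical illustration, not the code in news/views.py. The following are all assumptions:

- `describe_link` and its injected callables
- the error codes and the JSON keys
- every message except "We couldn't read that link"

```python
from typing import Callable

# Hypothetical error codes and messages; only "We couldn't read that link" comes from the PR.
ERROR_RESPONSES = {
    "invalid_url": (400, "Please enter a valid public http(s) URL."),
    "fetch_failed": (502, "We couldn't read that link."),
    "summarize_failed": (502, "We couldn't generate a description right now."),
}


def _error(code: str) -> tuple[int, dict]:
    status, message = ERROR_RESPONSES[code]
    return status, {"error": code, "message": message}


def describe_link(
    url: str,
    *,
    is_public_url: Callable[[str], bool],
    fetch_and_extract: Callable[[str], tuple[str, str]],
    summarize: Callable[[str, str], str],
) -> tuple[int, dict]:
    """Return (HTTP status, JSON-ready payload) for a link-description request."""
    # 400: the URL is malformed, not http(s), or points at a non-public host.
    if not is_public_url(url):
        return _error("invalid_url")
    # 502: the upstream page could not be fetched or yielded no readable text.
    try:
        title, body = fetch_and_extract(url)
    except Exception:  # real code would catch narrower network/parse errors
        return _error("fetch_failed")
    if not body:
        return _error("fetch_failed")
    # 502: the page was readable but the summarizer call failed.
    try:
        description = summarize(title, body)
    except Exception:  # real code would catch the summarizer client's errors
        return _error("summarize_failed")
    return 200, {"description": description}
```

Because each failure mode returns a distinct error code, the frontend can pick a separate inline message for each, even though two of them share status 502.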
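Here is a minimal sketch of the extraction-with-fallback behaviour described for extract_article. It makes three assumptions:

- trafilatura's `extract()` (with `include_comments=False`) is the primary extractor when the package is installed.
- A stdlib `html.parser` visible-text dump stands in for extract_content.
- The title comes from `<title>`, whereas the real helper takes it from trafilatura.

The `primary` parameter and `naive_visible_text` are hypothetical. They exist so the fallback path can be exercised without trafilatura.

```python
from html.parser import HTMLParser
from typing import Callable, Optional

try:
    import trafilatura  # third-party main-content extractor used by the PR
except ImportError:  # keep the sketch runnable without it
    trafilatura = None


class _VisibleTextParser(HTMLParser):
    """Collect the <title> and visible text, skipping script/style/noscript/template."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: list[str] = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def naive_visible_text(html: str) -> tuple[str, str]:
    """Fallback: (title, every visible text node joined by spaces)."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.title, " ".join(parser.chunks)


def extract_article(
    html: str, primary: Optional[Callable[[str], Optional[str]]] = None
) -> tuple[str, str]:
    """Return (title, body); ("", "") only when even the fallback finds no text."""
    if primary is None and trafilatura is not None:
        # Drop comment sections, as described in the PR.
        primary = lambda h: trafilatura.extract(h, include_comments=False)
    title, fallback_body = naive_visible_text(html)
    body = (primary(html) or "").strip() if primary else ""
    if not body:
        body = fallback_body  # main-content extraction failed: use the naive dump
    if not body:
        return "", ""
    return title, body
```

The naive fallback keeps navigation and footer text, so it is noisier and costlier to summarize. That trade-off is why it only runs when the main-content extractor returns nothing.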
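The SSRF guard described for safe_get can be approximated with the standard library. This is a sketch under stated assumptions:

- `is_public_url` uses `ipaddress.is_global` (plus a multicast check) as a stand-in for the PR's loopback/private/link-local/reserved checks.
- `safe_get_sketch` is hypothetical. It takes a hypothetical `fetch_once(url)` callable that must not follow redirects itself and returns `(status, headers, body)`.

Because `fetch_once` resolves the host again, the DNS-rebinding gap noted under Risks remains open in this sketch too.

```python
import ipaddress
import socket
from typing import Callable
from urllib.parse import urljoin, urlsplit

ALLOWED_SCHEMES = {"http", "https"}
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def is_public_url(url: str) -> bool:
    """True only for http(s) URLs whose host resolves exclusively to public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, parts.port, proto=socket.IPPROTO_TCP)
        addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    except (OSError, UnicodeError, ValueError):
        # Unresolvable host, invalid port, or unparseable address: treat as unsafe.
        return False
    for ip in addresses:
        if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped:
            ip = ip.ipv4_mapped  # e.g. ::ffff:127.0.0.1 is really loopback
        if not ip.is_global or ip.is_multicast:
            return False  # loopback, private, link-local (169.254.x.x), reserved, ...
    return bool(addresses)


def safe_get_sketch(
    url: str,
    fetch_once: Callable[[str], tuple[int, dict, str]],
    max_redirects: int = 5,
) -> tuple[int, str]:
    """Follow redirects by hand, re-validating every hop before it is fetched."""
    for _ in range(max_redirects + 1):
        if not is_public_url(url):
            raise ValueError(f"refusing non-public URL: {url}")
        status, headers, body = fetch_once(url)
        location = headers.get("Location")
        if status not in REDIRECT_STATUSES or not location:
            return status, body
        url = urljoin(url, location)  # Location may be relative to the current hop
    raise ValueError("too many redirects")
```

Validating inside the loop, rather than once up front, is what stops a public URL from redirecting the server to an internal address such as 169.254.169.254.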
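The skip behaviour of the background set_summary_for_link_entry task can be sketched as follows. The sketch assumes the task receives an entry with `url` and `summary` attributes and that a user-supplied summary should never be overwritten. `LinkEntry` and `set_summary_for_link_entry_sketch` are hypothetical stand-ins for the real Link model and task, which may look different.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger(__name__)


@dataclass
class LinkEntry:
    """Hypothetical stand-in for the Link model: only the fields the task reads."""

    url: str
    summary: str = ""


def set_summary_for_link_entry_sketch(
    entry: LinkEntry,
    fetch_and_extract: Callable[[str], tuple[str, str]],
    summarize: Callable[[str, str], str],
) -> bool:
    """Fill entry.summary in the background; return True only if it was set here."""
    if entry.summary.strip():
        # A description submitted with the form wins; skip background regeneration.
        logger.info("Summary already set for %s; skipping", entry.url)
        return False
    title, body = fetch_and_extract(entry.url)
    if not body:
        # Nothing readable (SPA, paywall, PDF, ...): log and return instead of failing.
        logger.warning("No readable text at %s; skipping summary", entry.url)
        return False
    entry.summary = summarize(title, body)
    return True
```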

‼️ Risks & Considerations ‼️

  • Extraction is not universal: trafilatura is a solid, widely-used main-content extractor, but no extractor handles every page. JS-rendered SPAs (content not in the fetched HTML), paywalled pages, anti-bot blocks, and non-article URLs (PDFs, etc.) won't yield usable text. When extraction comes back empty the user sees the inline "We couldn't read that link" error and can still write the description manually — failure is graceful, not a crash.
  • SSRF surface: removing the cppalliance.org allowlist means the server now fetches arbitrary user-supplied URLs. This is mitigated by safe_get, which blocks loopback/private/link-local/reserved targets (including the cloud-metadata address 169.254.169.254) and re-validates each redirect hop. Known residual gap: DNS rebinding (a host that resolves to a public IP at check time and a private one at connect time) is not closed — documented in safe_get; fully closing it would require pinning the validated IP into the actual connection.
  • The endpoint is not login-gated and not rate-limited yet (this matches the existing generate_description endpoint and is flagged in a NOTE: comment in news/views.py). Both should get @login_required and throttling before this is exposed externally. That matters more now that the fetch target is fully user-controlled.
  • Synchronous outbound fetch from a web worker with a 10s timeout — a hung upstream still ties up a worker for up to 10s.
  • Stacked on #2473 (Story 2428: AI-Assisted Description for Blog and News Posts): the diff against develop includes that PR's work; review against the parent branch (javiercoronarv/2428-ai-assisted-desc-blog-news) for a clean view of the changes unique to this PR.

Screenshots

Persistence and Functionality

Functionality.Persistence.mov

Dimensionality and Dark/Light Modes

Dimensionality.DarkLightMode.mov

Self-review Checklist

  • Tag at least one team member from each team to review this PR
  • Link this PR to the related GitHub Project ticket

Frontend

  • UI implementation matches Figma design
  • Tested in light and dark mode
  • Responsive / mobile verified
  • Accessibility checked (keyboard navigation, etc.)
  • Ensure design tokens are used for colors, spacing, typography, etc. – No hardcoded values
  • Test without JavaScript (if applicable)
  • No console errors or warnings

Summary by CodeRabbit

  • New Features

    • AI-powered description endpoints for posts and external links
    • Per-post-type draft persistence in the create-post form
    • WYSIWYG editor bridge emitting live updates and accepting programmatic content
  • Improvements

    • Summary field added to post forms with live character counters and generate/save-state UI
    • Safer external link fetching and more robust main-content extraction
    • Configurable AI model settings and input-size guardrails
  • Chores

    • Tooling and dependency updates; formatting hook bumped

javiercoronadonarvaez force-pushed the javiercoronarv/2451-ai-assisted-description-link-post branch 2 times, most recently from f72233e to e57a705 on June 3, 2026 15:23
javiercoronadonarvaez changed the title from "Javiercoronarv/2451 ai assisted description link post" to "Story 2451: AI-Assisted Description for Link Posts type" on Jun 3, 2026
javiercoronadonarvaez linked an issue on Jun 3, 2026 that may be closed by this pull request
javiercoronadonarvaez force-pushed the javiercoronarv/2451-ai-assisted-description-link-post branch from eda4415 to ebd76ef on June 4, 2026 19:38
- Add description box alongside 'Saving' + Saving Icon section
- Does not display anything if the user hasn't started typing
- Display Saving text + icon while the user types, and Saved text + icon once they have finished typing
- Verify that auto-generate description works with the WYSIWYG component
- Fix in accordance with GitHub Actions failure
…ated

- Add 'Hold on! We are generating a description for your content, it may take a few seconds' as placeholder in description box
…ry.summary

- Persist user-typed summary
- Skip background regeneration when set
- Auto-Generate button disabled unless a properly composed URL is input
javiercoronadonarvaez and others added 19 commits June 22, 2026 10:44
…ry.summary

- Persist user-typed summary
- Skip background regeneration when set
- Truncation to comply with 1000 character limit from Design templates
- Optimize prompt
- Update pre commit config file to match black's version in requirements (26.1.0)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
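One way to enforce the 1000-character limit mentioned in the commit above is sketched below. It assumes truncation happens on the backend and prefers to end on a word boundary. `truncate_description` is hypothetical, and the PR may truncate differently.

```python
def truncate_description(text: str, limit: int = 1000) -> str:
    """Hypothetical: trim a generated description to at most `limit` characters."""
    text = text.strip()
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Prefer a word boundary, but only if it keeps at least half of the allowed length.
    space = cut.rfind(" ")
    if space > limit // 2:
        cut = cut[:space]
    return cut.rstrip()
```

The half-length threshold is an arbitrary choice. It keeps a single very long token from collapsing the description to almost nothing.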
javiercoronadonarvaez force-pushed the javiercoronarv/2428-ai-assisted-desc-blog-news branch from 5809a96 to c1728bf on June 22, 2026 16:44

herzog0 (Collaborator) left a comment

Looking and working pretty good!! Thanks a ton @javiercoronadonarvaez

javiercoronadonarvaez force-pushed the javiercoronarv/2428-ai-assisted-desc-blog-news branch from 3567de4 to 8143846 on June 23, 2026 14:01
javiercoronadonarvaez changed the base branch from javiercoronarv/2428-ai-assisted-desc-blog-news to develop on June 23, 2026 15:04

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Task: AI-Assisted Description for Link Posts type

3 participants