Enhance extraction feature and improve login flow #7

Open
corovcam wants to merge 2 commits into leonid-shevtsov:main from corovcam:fixes-and-collection-extract-feature

corovcam commented May 30, 2026

  • Fixed login flow.
  • Added extract command to extract JSON and Markdown pairs by collection title.
  • Introduced extractByCollectionTitle function for handling extraction logic.
  • Updated CLI to support extraction command with input and output options.
  • Improved login flow to handle session checks more robustly.
  • Updated README with usage instructions for the new extraction feature.
  • Modified .gitignore to include JSON and Markdown files.

Credits: #6 for fixing bugs
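
The more robust session check mentioned above is described in the commit notes as a poll of /api/auth/session. A minimal sketch of that idea follows. The payload shape (an empty object when logged out, a non-empty `user` object when logged in) is an assumption and not confirmed by this PR, and `isLoggedIn`, `waitForLogin`, and `fetchSession` are hypothetical names used only for illustration.

```typescript
// Assumption: a logged-out session endpoint returns {} (or null), and a
// logged-in one returns an object with a non-empty `user` field.
function isLoggedIn(payload: unknown): boolean {
  if (typeof payload !== "object" || payload === null) return false;
  const user = (payload as { user?: unknown }).user;
  return typeof user === "object" && user !== null && Object.keys(user).length > 0;
}

// Polls the injected `fetchSession` callback until it reports a logged-in
// session, or throws once the timeout elapses.
async function waitForLogin(
  fetchSession: () => Promise<unknown>,
  { intervalMs = 2000, timeoutMs = 300_000 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      if (isLoggedIn(await fetchSession())) return;
    } catch {
      // Network hiccup or non-JSON response: treat as "not yet" and retry.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for login");
}
```

With Puppeteer, `fetchSession` could plausibly be `() => page.evaluate(() => fetch("/api/auth/session").then((r) => r.json()))`, so the request runs inside the controlled browser and carries its cookies.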

osedlacek and others added 2 commits May 3, 2026 03:05
The upstream stopped working sometime after July 2025 because Perplexity
changed several DOM selectors, the login flow, and the per-thread API
pagination defaults.

Changes:

- login.ts: multi-selector cookie banner (EN + CS variants); verify login
  via /api/auth/session poll instead of waiting for #ask-input (which is
  rendered to logged-out users too); explicit instructions to use the
  6-digit code and not the magic link in the email (the magic link logs
  in the user's regular browser, not the Puppeteer-controlled one).

- listConversations.ts: replaced data-testid based DOM scrape (which only
  saw the ~20 sidebar threads) with observe-and-replay of the
  /rest/thread/list_ask_threads POST. Walks offset to enumerate the full
  archive.

- ConversationSaver.ts: replaced response-listener-based capture with a
  direct API call to /rest/thread/<uuid>?limit=1000. Paginates via
  has_next_page + offset. The original captured the SPA's natural
  request, which used limit=10 and silently truncated any thread with
  more than 10 turns. The direct call is ~10x faster (no per-thread page
  navigation) and avoids detached-frame errors.

- exportLibrary.ts: try/catch per conversation with page-recreation
  recovery on detached-Frame / Target-closed / Session-closed errors.
  Cookies persist on the browser context so no re-login during recovery.
  Render errors are caught per-thread (JSON saved even if MD rendering
  fails).

- README: fork notice up top describing what changed.
- Added command to extract JSON and Markdown pairs by collection title.
- Implemented `extractByCollectionTitle` function for file handling.
- Updated CLI to support extraction command.
- Improved login process to handle session checks more robustly.
- Added support for additional cookie banner acceptance options.
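
The has_next_page + offset pagination described for ConversationSaver.ts above can be sketched roughly as follows. Only `has_next_page`, `offset`, and `limit` come from the commit message. The `entries` field name, the `ThreadPage` type, and the `fetchAllEntries` helper are hypothetical, and the actual request is left to an injected callback rather than guessed.

```typescript
// Assumed page shape: `has_next_page` is named in the commit message;
// `entries` is a hypothetical field name for the turns in one page.
interface ThreadPage<T> {
  entries: T[];
  has_next_page: boolean;
}

// Hypothetical callback that performs the real request for one page.
type FetchPage<T> = (offset: number, limit: number) => Promise<ThreadPage<T>>;

// Walks the offset forward until the server reports no further pages.
async function fetchAllEntries<T>(fetchPage: FetchPage<T>, limit = 1000): Promise<T[]> {
  const all: T[] = [];
  let offset = 0;
  while (true) {
    const page = await fetchPage(offset, limit);
    all.push(...page.entries);
    // Stop on the last page, or on an empty page so a misbehaving
    // server cannot cause an infinite loop.
    if (!page.has_next_page || page.entries.length === 0) break;
    offset += page.entries.length;
  }
  return all;
}
```

Advancing by the number of entries actually returned, rather than by `limit`, keeps the walk correct if the server caps the page size below the requested limit.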