Skip to content

Final submission#42

Merged
julian-praesent merged 66 commits into
mainfrom
dev
May 31, 2026
Merged

Final submission#42
julian-praesent merged 66 commits into
mainfrom
dev

Conversation

@julian-praesent

Copy link
Copy Markdown
Collaborator

Implemented, reviewed and tested everything. It's time to submit this project

julian-praesent and others added 30 commits May 20, 2026 17:35
Replace the QDateEdit-based date inputs on screen 1 with a fully custom
DatePicker widget (QPushButton display + floating QCalendarWidget popup).
The new widget opens the calendar on any click, enforces optional min/max
bounds, emits dateChanged, and matches the app's dark minimal theme.

screen_search.py wires start ≤ end constraints in both directions so
neither picker can be set to an invalid range.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Display the package version (from importlib.metadata) in the title bar
with accent colour; increase step-indicator top margin and title bar
height for better visual breathing room.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add SCROLLBAR_STYLE to theme.py — thin 8px track, rounded 4px handle,
orange accent tinted at rest / brighter on hover / full on press, arrow
buttons hidden. Applied globally via app.setStyleSheet() in main.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Output folder is set on screen 2 (Download settings), so the redundant
folder card on ScreenSearch has been removed. Drops the associated
_browse_folder slot, unused imports (Path, QFileDialog, QSizePolicy),
_FOLDER_INPUT_STYLE, _BROWSE_BTN_STYLE constants, and the output_folder
assignment in get_params(). Removes the now-dead prefill block in
ScreenPreview.load() and the five tests that covered the removed behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add _make_param_line() with setWordWrap(True) + stretch factor 1 so long
values reflow as the window is resized. License is placed inline on row 1
when its joined text is ≤ 40 chars (typical: "CC-BY"); falls back to its
own wrapping row when many licenses are selected. Publication types always
get their own wrapping row. Author ORCIDs stay on their own row too.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
set_progress() was stopping the animation timer, so the 4 thread dots
were always static. Now the timer stays alive in progress mode too.
Thread dots use the same _DOT_BRIGHT/_DOT_DIM pair as loading dots;
_tick_dots() routes each tick to whichever row is currently visible.
Removed the unused _THREAD_DOT_STYLES static-opacity list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Percentage now reflects downloaded vs requested target (self._total)
instead of downloaded vs processed. Added processed: int | None param
to ProgressWidget.set_progress() so the count label can show
"downloaded X out of Y  –  processed Z results". Both _on_progress
and _on_finished pass self._total as the denominator.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
estimated_downloadable is the raw count from the ~10 preview papers, not the
full result set. Displaying it as a raw number ("892") was misleading. Now
shown as a percentage of the previewed batch (e.g. "~80%") with updated
tile subtitle "of previewed results".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thread count was hardcoded to MAX_WORKERS (4). Now _download_batch tracks
active threads with an atomic counter, stamps the live count onto each
DownloadResult, and screen 3 reads result.active_threads so the dot
indicators reflect the true number of concurrently running threads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dots next to "X threads running" are now a fixed 3-dot animation purely
as a visual indicator. Previously they showed 1–4 dots matching the live
thread count, which was confusing. The label still reflects the true count.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Validate ORCID format (regex + ISO 7064 MOD 11-2 checksum) and
  existence (async pub.orcid.org registry check) on Screen 1.
  Pills show three states: pending (grey), valid (orange), invalid (red).
  Fail-open on network errors; URL-prefixed ORCIDs are normalised
  automatically. New: OrcidClient, OrcidValidationService, orcid_utils.

- Fix pub-type query clause: prepend Europe PMC catch-all
  'HAS_BOOK:Y OR (SRC:(MED OR PMC OR AGR OR CBA) NOT PUB_TYPE:(Review))'
  so regular journal articles are not silently excluded. PUB_TYPE values
  are now lowercase with correct API spellings (e.g. 'observational
  study, veterinary').

- Fix tag_input: hide pills before setParent(None) to prevent Qt
  promoting visible widgets to top-level windows on status update.

- Update docs: README (F2 spec, Python API), FUTURE_TODOS, __init__.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Action bar hint label (Screens 1 & 2):
- Add a single QLabel to the action bar of ScreenSearch and ScreenPreview
  that explains why the primary button is disabled. Exactly one message is
  shown via a priority cascade even when multiple conditions are violated
  simultaneously. The label is hidden as soon as the button becomes enabled.

ORCID query filtering:
- Add TagInput.get_tags_by_status(statuses) — returns tags in insertion
  order whose status is in the provided list. Allows callers to exclude
  tags by validation state without duplicating status logic.
- ScreenSearch.get_params() now passes get_tags_by_status(['valid', 'pending'])
  for author_orcids, so format-invalid and registry-rejected (red) ORCIDs
  are silently dropped from the query. Pending (grey, network-unreachable)
  ORCIDs are still included to preserve fail-open behaviour.
- 8 new unit tests covering set_tag_status and get_tags_by_status.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Validation:
- License field now gated: at least one license required to enable
  'Continue to preview' (spec compliance — was not enforced before)
- Output folder writability checked via os.access after Browse dialog;
  non-writable folder blocks 'Start download' with an explanatory hint
- Invalid ORCIDs (red pills) excluded from search query; pending (grey)
  ORCIDs still included (fail-open)
- SearchParams.output_folder changed to Path | None = None; DownloadService
  raises ValueError if called without an explicit folder

UI:
- _validate() on Screen 1 refactored to 3-condition list-join hint:
  'Enter a search keyword  ·  Select a publication type  ·  Select a license'
- _validate() on Screen 2 extended with writability condition and matching hint
- 'Count' label renamed to 'Number of papers' on Screen 2
- Query field max length set to 500 characters
- Date From minimum changed from 2000-01-01 to 1900-01-01

Docs:
- README F1/F2/F3/F5 updated to reflect all validation changes
- Vignette Screen 1/2 callouts and Maria's scenario updated accordingly

Tests: 17 new tests covering all new validation paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace all QMessageBox popups with toast overlay (elevated card,
  coloured left/right stripe, icon badge, 10 s hold)
- Export dialogs default to 'results.xlsx' / 'results.pdf'
- Download settings card moved above results preview on screen 2
- Results preview, live log, and skipped papers sections now fill
  remaining screen height (shared EXPANDABLE_MIN_HEIGHT in theme)
- Search progress card mirrors download screen layout and sizing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Critical: guard resizeEvent in ScreenDownload and ScreenSummary against
  early access to self._toast before _build_ui() sets it; initialize to None
- Moderate: rename Toast._reposition() to public reposition() and update all callers
- Moderate: pass progress widget explicitly to _make_loading_card() instead of
  relying on implicit self._progress closure
- Moderate: truncate export path to filename in toast to prevent overflow
- Minor: replace QWidget.__new__ antipattern in test_reposition_noop_without_parent
  with proper setParent(None) approach
- Minor: document DEFAULT_WIDTH/DEFAULT_HEIGHT change rationale in app.py
- Minor: expand EXPANDABLE_MIN_HEIGHT comment in theme.py
- Minor: add test for loading card visible while search is in progress
- Minor: add intent comment to test_hold_durations_are_both_10_seconds

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three bugs caused the UI to freeze at '2 threads running' after cancel:

1. client.py: download_pdf now accepts cancel_event; checks it before each
   retry, replaces time.sleep with cancel_event.wait so the sleep is
   interruptible; also catches requests.exceptions.Timeout (previously
   unhandled, causing it to bubble up as an unrecoverable worker error)

2. download_service.py: cancel_event is threaded through _download_one and
   forwarded to download_pdf; ConnectionError raised while the event is set
   becomes STATUS_SKIPPED 'Cancelled' instead of STATUS_FAILED

3. screen_download.py: _on_error now calls set_progress before showing the
   toast so the thread row is cleared when the worker exits via exception

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
QVBoxLayout with no trailing stretch distributes items evenly when
QScrollArea expands the container widget to fill the viewport height.
This caused the first few download rows to appear spread out instead
of stacking from the top.

- Add trailing addStretch() to the log layout on construction so rows
  always stack from the top regardless of container height
- Change _add_log_row to insertWidget(count-1) to insert rows before
  the stretch rather than after it
- Restore the stretch in _clear_log after removing all row widgets

Added _log_row_count() test helper that counts only QWidget items in
the layout, correctly ignoring the non-widget stretch spacer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rate-limiting (HTTP 429)
- download_pdf() now treats 429 as retryable (was incorrectly grouped
  with non-retryable 4xx errors); retries up to _PDF_MAX_RETRIES times
  using the Retry-After header value when present, falling back to the
  same 1 s / 2 s exponential schedule used for 5xx errors
- search() gains a 3-attempt retry loop handling 429 and transient
  ConnectionError / Timeout with exponential backoff
- _parse_retry_after() helper extracts the Retry-After delay from a
  response, returning None when the header is absent or non-numeric
- MAX_WORKERS reduced from 4 to 2 to halve the concurrent PDF request
  rate and reduce the likelihood of triggering rate limits
- Exhausted 429 errors surface to the user as 'Server rate limiting'
  rather than the raw status code '429'

PDF content validation
- Added InvalidPdfContentError exception to client.py; raised by
  download_pdf() whenever a 200 response body does not start with the
  %PDF magic bytes (e.g. an HTML bot-challenge page)
- _download_one() catches InvalidPdfContentError and returns
  STATUS_SKIPPED with reason 'Not a valid PDF', preventing corrupted
  HTML files from being written to disk and reported as downloaded

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Integration tests previously made live HTTP calls to the Europe PMC API
on every CI run, causing non-deterministic failures from rate limits and
network timeouts.

pytest-recording (vcrpy wrapper) records real HTTP conversations once as
YAML cassettes (tests/integration/cassettes/) and replays them in CI
without any network access. --block-network is added to the CI workflow
so accidental live calls fail fast rather than timing out silently.

- pyproject.toml: add pytest-recording to dev dependencies
- tests/integration/conftest.py: configure cassette directory
- tests/integration/test_*.py: replace per-class @pytest.mark.integration
  decorators with module-level pytestmark combining vcr + integration;
  update docstrings to reflect cassette workflow
- tests/integration/test_client_integration.py: also catch
  InvalidPdfContentError in the PDF download loop (added after tests
  were written) so the graceful-skip path still works when URLs return
  HTML challenge pages
- tests/integration/cassettes/: 32 recorded cassettes covering every
  integration test; search responses are small JSON, download responses
  include PDF bytes
- .github/workflows/ci.yml: remove -m "not integration" filter and add
  --block-network; all 592 tests now run in ~7 s with no live API calls
- README.md: update testing section with new commands and an explanation
  of the cassette recording/replay approach

To re-record cassettes after an API change:
  pytest -m integration --record-mode=all

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the CSV-style landscape table with a structured dark-themed
portrait PDF matching the app's visual style:

Layout sections (portrait A4, APP_BG page background):
- Header: 'epmcminer' in accent orange + generated timestamp
- Stat row: three side-by-side cards — Downloaded (orange), Skipped
  (red), Total results (orange) — on dark card backgrounds
- Search parameters card: Query, Sort, Date always shown; License,
  Publication types, and Author ORCIDs appended when non-empty
- Downloaded papers: one card per successfully downloaded paper showing
  title, authors/journal/year, and filename; omitted when empty
- Skipped papers: one card per non-downloaded paper showing title,
  meta, and reason in red; omitted when all papers downloaded

Add total_found: int = 0 parameter to export_pdf() so the PDF can
display the full API result count. Store _total_found in ScreenSummary
and pass it through from load() to the export worker lambda.

Add 10 new TestExportPdf tests covering: all-downloaded, all-skipped,
mixed results, total_found variations, mixed filters, empty metadata,
both sections present/absent, and downloaded paper without file_path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Consolidate retry constants:
- Merge _PDF_MAX_RETRIES/_SEARCH_MAX_RETRIES into _MAX_RETRIES (both were 3)
- Merge _PDF_RETRY_BACKOFF_BASE/_SEARCH_RETRY_BACKOFF_BASE into
  _RETRY_BACKOFF_BASE (both were 1); prevents silent divergence

Simplify download_service page-result accumulation:
- Replace for-loop with extend() + sum() comprehension

Fix report_service trailing spacer:
- Use enumerate() to skip Spacer(1, 3) after the last paper card in
  both downloaded and skipped sections
- Fix _build_pdf_story return type annotation: list -> list[Any]

Fix screen_download type safety and silent CSV failure:
- Assert output_folder is not None before passing to save_csv()
- Show 'Could not save report.csv' toast when save_csv raises so the
  failure is visible to the user instead of only logged

Add missing test coverage:
- test_search_429_non_numeric_retry_after_falls_back_to_backoff:
  covers the except ValueError branch in _parse_retry_after (lines 44-45)
- test_5xx_cancel_during_backoff_aborts_retries: covers the
  cancel_event.wait path inside the 5xx retry branch (lines 274-275)
- TestOrcidExistenceWorker: 3 tests for run() (validation_done true/false
  and network_error); covers lines 122-132
- TestOrcidTagValidation: 7 tests for _on_orcid_tags_changed logic
  (URL normalisation, format-invalid, pending, valid, invalid, network
  error, no-service early-return); covers lines 413-446 and 455-469

Coverage: 94% -> 96% overall; screen_search 84% -> 99%;
client.py 96% -> 100%

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ooling

Fixes three high-severity bugs: assert → raise ValueError for output_folder
validation (disabled under -O), OrcidClient now catches Timeout alongside
ConnectionError (fail-open design), and bare except in _on_finished now names
the exception variable and logs the full traceback.

Extracts paper_from_raw() helper to eliminate three copy-pasted Paper
constructions across search and download services. Removes the duplicate
FREE_FULL_TEXT_FILTER that build_query was appending (the client already
enforces it on every request); integration cassettes re-recorded against the
corrected single-filter URLs.

Centralises status colours (SUCCESS, DANGER, SKIPPED_BG) in theme.py, moves
the QFrame inline import to module level in screen_preview.py, and guards
_InputSlot._confirm() against emitting on empty text.

Adds ruff rules B/C4/UP/SIM, mypy configuration, and --block-network as the
default pytest addopt to prevent accidental live API calls in tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…add date validation

- Relocate DownloadResult from api/ to services/ (it is a service output, not an HTTP model)
- Delete api/models.py (redundant re-export module)
- Fix active_threads counter: capture thread count before decrement so it includes the finishing thread
- Add ISO-8601 date validation and date ordering check to SearchParams.__post_init__
- Fix PreviewWorker cancellation: replace quit()/wait() with threading.Event cancel_event so the main thread is not blocked waiting for an in-flight HTTP request
- Replace 8-tuple return from _make_stats_row with _StatsWidgets NamedTuple
- Update all import paths across services, tests, and integration tests
- Re-record 32 integration cassettes after query filter deduplication

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…trings

Adds Google-style Examples blocks to preview(), download(), save_csv(),
export_excel(), export_pdf(), search(), and download_pdf() to satisfy the
rubric requirement that all functions be documented with useful examples.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Service/API layer (real type safety improvements):
- Narrow DownloadResult.STATUS_* ClassVars to Literal types — lets mypy
  verify callers pass valid status strings
- Add dict[str, Any] type arguments throughout client, search_service,
  and download_service (raw API result dicts)
- Add list[Any] type arguments to report_service private helpers
  (reportlab flowables have no common base type)
- Cast response.json() and response.content to concrete types so mypy
  can verify return types without losing information
- Wrap OrcidClient bool comparison in bool() to resolve no-any-return
- Annotate app._on_download_complete with list[DownloadResult]

GUI layer (PyQt6 stub compatibility):
- Change resizeEvent/paintEvent signatures to accept event | None to
  match PyQt6 stubs and eliminate override errors in date_picker,
  screen_summary, screen_download, and toast
- Add type: ignore[override] on mouse event handlers (body uses event
  directly and cannot safely accept None)
- Annotate FlowLayout.addItem/itemAt/takeAt to match QLayout stubs;
  add None guard in addItem since Qt never passes None in practice
- Add type: ignore[union-attr] for Qt methods that return Optional
  types but are guaranteed non-None at call sites

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Auto-fix all D413 (blank line after last docstring section) and COM812
  (trailing comma) violations across the entire codebase via ruff
- Add noqa: DTZ005 comment to logger.py explaining intentional local time use
- Remove dead ensure_output_structure() from file_utils.py (inline mkdir
  in DownloadService covers this); drop its four unit tests
- Fix cancellation latency in _download_page: submit futures one at a time
  with a cancel_event check before each, so a pending cancellation stops
  queuing new work immediately rather than after a full batch is submitted
- Show a toast notification in screen_search.py when ORCID registry check
  fails due to a network error (was silently leaving the pill in pending)
- Add docs/architecture.md: comprehensive architecture overview for the project

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
julian-praesent and others added 29 commits May 31, 2026 09:27
Insert three pre-executed code cells (query builder, filename construction,
ORCID validation) that demonstrate the Python API without requiring network
access. Also correct factual errors in the prose cells (Python version,
screen descriptions).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Re-recorded all 32 integration test cassettes with --record-mode=all.
All tests pass against the updated responses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…response

The Europe PMC core response nests the journal name under
journalInfo.journal.title, not a top-level journalTitle field, so the
journal column was always empty. Updated paper_from_raw() and the test
fixtures accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ling

Non-429 HTTP 4xx/5xx failures now report the reason as 'PDF unavailable'
instead of the raw numeric status code. Results carrying that reason are
rendered with red ✗ styling in the download log (screen 3) and the skipped
papers list (screen 4), consistent with other failure types. True skips
(already downloaded, cancelled) keep the orange – styling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tation

Add a dedicated subsection for each service (SearchService, DownloadService,
ReportService, OrcidValidationService) with method signatures, parameter
descriptions, and runnable examples. Add a SearchParams field table and a
data model reference (Paper, SearchResult, DownloadResult) including all
status constants.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vignette: correct flowchart so preview results flow through PreviewWorker
rather than directly from the API to Screen 2; replace raw CSV block with
a readable markdown table; move Python API examples before the example run.

architecture: correct Paper model field list and test count.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ne arrays

GitHub's notebook renderer fails when a cell source is stored as individual
characters. Cell 0 had 310 single-character elements; joined into proper lines.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GitHub's renderer only resolves attachment: URLs in markdown image syntax.
Converts <img src="attachment:..."> HTML tags to ![alt](attachment:...).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Screenshots compressed from ~300KB PNG to ~50KB JPEG (quality 60, max 900px
wide), reducing the notebook from 2MB to 353KB to stay within GitHub's
rendering limits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Jupyter stores base64 attachment data line-wrapped at 76 characters.
Our script produced a single ~44KB-per-image string which may exceed
GitHub's notebook renderer per-line limits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ments

GitHub's renderer reliably handles display_data image outputs (same format
as matplotlib plots) but not the attachment: URL scheme. Restructures
screenshot cells to use pre-executed display_data outputs with embedded
JPEG data, keeping the notebook fully self-contained.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dency

GitHub's renderer may attempt to execute or validate code cells, failing
when epmcminer is not installed. Converts the three API example cells to
fenced code blocks in markdown with pre-rendered output shown below.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Wraps all screenshots in <div align="center"> for horizontal centering
- Adds a fourth code cell demonstrating ReportService.save_csv() with
  mock DownloadResult objects and pre-rendered CSV output

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a self-contained setup block so examples run without depending on
prior DownloadService context. Correct export_excel and export_pdf call
sites to pass a full file path instead of a folder, and remove the
incorrect Path return-type claims for those two methods.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@julian-praesent julian-praesent merged commit 091fb14 into main May 31, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant