Final submission#42
Merged
Merged
Conversation
Replace the QDateEdit-based date inputs on screen 1 with a fully custom DatePicker widget (QPushButton display + floating QCalendarWidget popup). The new widget opens the calendar on any click, enforces optional min/max bounds, emits dateChanged, and matches the app's dark minimal theme. screen_search.py wires start ≤ end constraints in both directions so neither picker can be set to an invalid range. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Display the package version (from importlib.metadata) in the title bar with accent colour; increase step-indicator top margin and title bar height for better visual breathing room. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add SCROLLBAR_STYLE to theme.py — thin 8px track, rounded 4px handle, orange accent tinted at rest / brighter on hover / full on press, arrow buttons hidden. Applied globally via app.setStyleSheet() in main.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Output folder is set on screen 2 (Download settings), so the redundant folder card on ScreenSearch has been removed. Drops the associated _browse_folder slot, unused imports (Path, QFileDialog, QSizePolicy), _FOLDER_INPUT_STYLE, _BROWSE_BTN_STYLE constants, and the output_folder assignment in get_params(). Removes the now-dead prefill block in ScreenPreview.load() and the five tests that covered the removed behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add _make_param_line() with setWordWrap(True) + stretch factor 1 so long values reflow as the window is resized. License is placed inline on row 1 when its joined text is ≤ 40 chars (typical: "CC-BY"); falls back to its own wrapping row when many licenses are selected. Publication types always get their own wrapping row. Author ORCIDs stay on their own row too. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
set_progress() was stopping the animation timer, so the 4 thread dots were always static. Now the timer stays alive in progress mode too. Thread dots use the same _DOT_BRIGHT/_DOT_DIM pair as loading dots; _tick_dots() routes each tick to whichever row is currently visible. Removed the unused _THREAD_DOT_STYLES static-opacity list. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Percentage now reflects downloaded vs requested target (self._total) instead of downloaded vs processed. Added processed: int | None param to ProgressWidget.set_progress() so the count label can show "downloaded X out of Y – processed Z results". Both _on_progress and _on_finished pass self._total as the denominator. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
estimated_downloadable is the raw count from the ~10 preview papers, not the
full result set. Displaying it as a raw number ("892") was misleading. Now
shown as a percentage of the previewed batch (e.g. "~80%") with updated
tile subtitle "of previewed results".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thread count was hardcoded to MAX_WORKERS (4). Now _download_batch tracks active threads with an atomic counter, stamps the live count onto each DownloadResult, and screen 3 reads result.active_threads so the dot indicators reflect the true number of concurrently running threads. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dots next to "X threads running" are now a fixed 3-dot animation purely as a visual indicator. Previously they showed 1–4 dots matching the live thread count, which was confusing. The label still reflects the true count. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Validate ORCID format (regex + ISO 7064 MOD 11-2 checksum) and existence (async pub.orcid.org registry check) on Screen 1. Pills show three states: pending (grey), valid (orange), invalid (red). Fail-open on network errors; URL-prefixed ORCIDs are normalised automatically. New: OrcidClient, OrcidValidationService, orcid_utils. - Fix pub-type query clause: prepend Europe PMC catch-all 'HAS_BOOK:Y OR (SRC:(MED OR PMC OR AGR OR CBA) NOT PUB_TYPE:(Review))' so regular journal articles are not silently excluded. PUB_TYPE values are now lowercase with correct API spellings (e.g. 'observational study, veterinary'). - Fix tag_input: hide pills before setParent(None) to prevent Qt promoting visible widgets to top-level windows on status update. - Update docs: README (F2 spec, Python API), FUTURE_TODOS, __init__.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Action bar hint label (Screens 1 & 2): - Add a single QLabel to the action bar of ScreenSearch and ScreenPreview that explains why the primary button is disabled. Exactly one message is shown via a priority cascade even when multiple conditions are violated simultaneously. The label is hidden as soon as the button becomes enabled. ORCID query filtering: - Add TagInput.get_tags_by_status(statuses) — returns tags in insertion order whose status is in the provided list. Allows callers to exclude tags by validation state without duplicating status logic. - ScreenSearch.get_params() now passes get_tags_by_status(['valid', 'pending']) for author_orcids, so format-invalid and registry-rejected (red) ORCIDs are silently dropped from the query. Pending (grey, network-unreachable) ORCIDs are still included to preserve fail-open behaviour. - 8 new unit tests covering set_tag_status and get_tags_by_status. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Validation: - License field now gated: at least one license required to enable 'Continue to preview' (spec compliance — was not enforced before) - Output folder writability checked via os.access after Browse dialog; non-writable folder blocks 'Start download' with an explanatory hint - Invalid ORCIDs (red pills) excluded from search query; pending (grey) ORCIDs still included (fail-open) - SearchParams.output_folder changed to Path | None = None; DownloadService raises ValueError if called without an explicit folder UI: - _validate() on Screen 1 refactored to 3-condition list-join hint: 'Enter a search keyword · Select a publication type · Select a license' - _validate() on Screen 2 extended with writability condition and matching hint - 'Count' label renamed to 'Number of papers' on Screen 2 - Query field max length set to 500 characters - Date From minimum changed from 2000-01-01 to 1900-01-01 Docs: - README F1/F2/F3/F5 updated to reflect all validation changes - Vignette Screen 1/2 callouts and Maria's scenario updated accordingly Tests: 17 new tests covering all new validation paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace all QMessageBox popups with toast overlay (elevated card, coloured left/right stripe, icon badge, 10 s hold) - Export dialogs default to 'results.xlsx' / 'results.pdf' - Download settings card moved above results preview on screen 2 - Results preview, live log, and skipped papers sections now fill remaining screen height (shared EXPANDABLE_MIN_HEIGHT in theme) - Search progress card mirrors download screen layout and sizing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Critical: guard resizeEvent in ScreenDownload and ScreenSummary against early access to self._toast before _build_ui() sets it; initialize to None - Moderate: rename Toast._reposition() to public reposition() and update all callers - Moderate: pass progress widget explicitly to _make_loading_card() instead of relying on implicit self._progress closure - Moderate: truncate export path to filename in toast to prevent overflow - Minor: replace QWidget.__new__ antipattern in test_reposition_noop_without_parent with proper setParent(None) approach - Minor: document DEFAULT_WIDTH/DEFAULT_HEIGHT change rationale in app.py - Minor: expand EXPANDABLE_MIN_HEIGHT comment in theme.py - Minor: add test for loading card visible while search is in progress - Minor: add intent comment to test_hold_durations_are_both_10_seconds Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three bugs caused the UI to freeze at '2 threads running' after cancel: 1. client.py: download_pdf now accepts cancel_event; checks it before each retry, replaces time.sleep with cancel_event.wait so the sleep is interruptible; also catches requests.exceptions.Timeout (previously unhandled, causing it to bubble up as an unrecoverable worker error) 2. download_service.py: cancel_event is threaded through _download_one and forwarded to download_pdf; ConnectionError raised while the event is set becomes STATUS_SKIPPED 'Cancelled' instead of STATUS_FAILED 3. screen_download.py: _on_error now calls set_progress before showing the toast so the thread row is cleared when the worker exits via exception Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
QVBoxLayout with no trailing stretch distributes items evenly when QScrollArea expands the container widget to fill the viewport height. This caused the first few download rows to appear spread out instead of stacking from the top. - Add trailing addStretch() to the log layout on construction so rows always stack from the top regardless of container height - Change _add_log_row to insertWidget(count-1) to insert rows before the stretch rather than after it - Restore the stretch in _clear_log after removing all row widgets Added _log_row_count() test helper that counts only QWidget items in the layout, correctly ignoring the non-widget stretch spacer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rate-limiting (HTTP 429) - download_pdf() now treats 429 as retryable (was incorrectly grouped with non-retryable 4xx errors); retries up to _PDF_MAX_RETRIES times using the Retry-After header value when present, falling back to the same 1 s / 2 s exponential schedule used for 5xx errors - search() gains a 3-attempt retry loop handling 429 and transient ConnectionError / Timeout with exponential backoff - _parse_retry_after() helper extracts the Retry-After delay from a response, returning None when the header is absent or non-numeric - MAX_WORKERS reduced from 4 to 2 to halve the concurrent PDF request rate and reduce the likelihood of triggering rate limits - Exhausted 429 errors surface to the user as 'Server rate limiting' rather than the raw status code '429' PDF content validation - Added InvalidPdfContentError exception to client.py; raised by download_pdf() whenever a 200 response body does not start with the %PDF magic bytes (e.g. an HTML bot-challenge page) - _download_one() catches InvalidPdfContentError and returns STATUS_SKIPPED with reason 'Not a valid PDF', preventing corrupted HTML files from being written to disk and reported as downloaded Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Integration tests previously made live HTTP calls to the Europe PMC API on every CI run, causing non-deterministic failures from rate limits and network timeouts. pytest-recording (vcrpy wrapper) records real HTTP conversations once as YAML cassettes (tests/integration/cassettes/) and replays them in CI without any network access. --block-network is added to the CI workflow so accidental live calls fail fast rather than timing out silently. - pyproject.toml: add pytest-recording to dev dependencies - tests/integration/conftest.py: configure cassette directory - tests/integration/test_*.py: replace per-class @pytest.mark.integration decorators with module-level pytestmark combining vcr + integration; update docstrings to reflect cassette workflow - tests/integration/test_client_integration.py: also catch InvalidPdfContentError in the PDF download loop (added after tests were written) so the graceful-skip path still works when URLs return HTML challenge pages - tests/integration/cassettes/: 32 recorded cassettes covering every integration test; search responses are small JSON, download responses include PDF bytes - .github/workflows/ci.yml: remove -m "not integration" filter and add --block-network; all 592 tests now run in ~7 s with no live API calls - README.md: update testing section with new commands and an explanation of the cassette recording/replay approach To re-record cassettes after an API change: pytest -m integration --record-mode=all Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the CSV-style landscape table with a structured dark-themed portrait PDF matching the app's visual style: Layout sections (portrait A4, APP_BG page background): - Header: 'epmcminer' in accent orange + generated timestamp - Stat row: three side-by-side cards — Downloaded (orange), Skipped (red), Total results (orange) — on dark card backgrounds - Search parameters card: Query, Sort, Date always shown; License, Publication types, and Author ORCIDs appended when non-empty - Downloaded papers: one card per successfully downloaded paper showing title, authors/journal/year, and filename; omitted when empty - Skipped papers: one card per non-downloaded paper showing title, meta, and reason in red; omitted when all papers downloaded Add total_found: int = 0 parameter to export_pdf() so the PDF can display the full API result count. Store _total_found in ScreenSummary and pass it through from load() to the export worker lambda. Add 10 new TestExportPdf tests covering: all-downloaded, all-skipped, mixed results, total_found variations, mixed filters, empty metadata, both sections present/absent, and downloaded paper without file_path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Consolidate retry constants: - Merge _PDF_MAX_RETRIES/_SEARCH_MAX_RETRIES into _MAX_RETRIES (both were 3) - Merge _PDF_RETRY_BACKOFF_BASE/_SEARCH_RETRY_BACKOFF_BASE into _RETRY_BACKOFF_BASE (both were 1); prevents silent divergence Simplify download_service page-result accumulation: - Replace for-loop with extend() + sum() comprehension Fix report_service trailing spacer: - Use enumerate() to skip Spacer(1, 3) after the last paper card in both downloaded and skipped sections - Fix _build_pdf_story return type annotation: list -> list[Any] Fix screen_download type safety and silent CSV failure: - Assert output_folder is not None before passing to save_csv() - Show 'Could not save report.csv' toast when save_csv raises so the failure is visible to the user instead of only logged Add missing test coverage: - test_search_429_non_numeric_retry_after_falls_back_to_backoff: covers the except ValueError branch in _parse_retry_after (lines 44-45) - test_5xx_cancel_during_backoff_aborts_retries: covers the cancel_event.wait path inside the 5xx retry branch (lines 274-275) - TestOrcidExistenceWorker: 3 tests for run() (validation_done true/false and network_error); covers lines 122-132 - TestOrcidTagValidation: 7 tests for _on_orcid_tags_changed logic (URL normalisation, format-invalid, pending, valid, invalid, network error, no-service early-return); covers lines 413-446 and 455-469 Coverage: 94% -> 96% overall; screen_search 84% -> 99%; client.py 96% -> 100% Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ooling Fixes three high-severity bugs: assert → raise ValueError for output_folder validation (disabled under -O), OrcidClient now catches Timeout alongside ConnectionError (fail-open design), and bare except in _on_finished now names the exception variable and logs the full traceback. Extracts paper_from_raw() helper to eliminate three copy-pasted Paper constructions across search and download services. Removes the duplicate FREE_FULL_TEXT_FILTER that build_query was appending (the client already enforces it on every request); integration cassettes re-recorded against the corrected single-filter URLs. Centralises status colours (SUCCESS, DANGER, SKIPPED_BG) in theme.py, moves the QFrame inline import to module level in screen_preview.py, and guards _InputSlot._confirm() against emitting on empty text. Adds ruff rules B/C4/UP/SIM, mypy configuration, and --block-network as the default pytest addopt to prevent accidental live API calls in tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…add date validation - Relocate DownloadResult from api/ to services/ (it is a service output, not an HTTP model) - Delete api/models.py (redundant re-export module) - Fix active_threads counter: capture thread count before decrement so it includes the finishing thread - Add ISO-8601 date validation and date ordering check to SearchParams.__post_init__ - Fix PreviewWorker cancellation: replace quit()/wait() with threading.Event cancel_event so the main thread is not blocked waiting for an in-flight HTTP request - Replace 8-tuple return from _make_stats_row with _StatsWidgets NamedTuple - Update all import paths across services, tests, and integration tests - Re-record 32 integration cassettes after query filter deduplication Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…trings Adds Google-style Examples blocks to preview(), download(), save_csv(), export_excel(), export_pdf(), search(), and download_pdf() to satisfy the rubric requirement that all functions be documented with useful examples. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Service/API layer (real type safety improvements): - Narrow DownloadResult.STATUS_* ClassVars to Literal types — lets mypy verify callers pass valid status strings - Add dict[str, Any] type arguments throughout client, search_service, and download_service (raw API result dicts) - Add list[Any] type arguments to report_service private helpers (reportlab flowables have no common base type) - Cast response.json() and response.content to concrete types so mypy can verify return types without losing information - Wrap OrcidClient bool comparison in bool() to resolve no-any-return - Annotate app._on_download_complete with list[DownloadResult] GUI layer (PyQt6 stub compatibility): - Change resizeEvent/paintEvent signatures to accept event | None to match PyQt6 stubs and eliminate override errors in date_picker, screen_summary, screen_download, and toast - Add type: ignore[override] on mouse event handlers (body uses event directly and cannot safely accept None) - Annotate FlowLayout.addItem/itemAt/takeAt to match QLayout stubs; add None guard in addItem since Qt never passes None in practice - Add type: ignore[union-attr] for Qt methods that return Optional types but are guaranteed non-None at call sites Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Auto-fix all D413 (blank line after last docstring section) and COM812 (trailing comma) violations across the entire codebase via ruff - Add noqa: DTZ005 comment to logger.py explaining intentional local time use - Remove dead ensure_output_structure() from file_utils.py (inline mkdir in DownloadService covers this); drop its four unit tests - Fix cancellation latency in _download_page: submit futures one at a time with a cancel_event check before each, so a pending cancellation stops queuing new work immediately rather than after a full batch is submitted - Show a toast notification in screen_search.py when ORCID registry check fails due to a network error (was silently leaving the pill in pending) - Add docs/architecture.md: comprehensive architecture overview for the project Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Insert three pre-executed code cells (query builder, filename construction, ORCID validation) that demonstrate the Python API without requiring network access. Also correct factual errors in the prose cells (Python version, screen descriptions). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Re-recorded all 32 integration test cassettes with --record-mode=all. All tests pass against the updated responses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…response The Europe PMC core response nests the journal name under journalInfo.journal.title, not a top-level journalTitle field, so the journal column was always empty. Updated paper_from_raw() and the test fixtures accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ling Non-429 HTTP 4xx/5xx failures now report the reason as 'PDF unavailable' instead of the raw numeric status code. Results carrying that reason are rendered with red ✗ styling in the download log (screen 3) and the skipped papers list (screen 4), consistent with other failure types. True skips (already downloaded, cancelled) keep the orange – styling. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tation Add a dedicated subsection for each service (SearchService, DownloadService, ReportService, OrcidValidationService) with method signatures, parameter descriptions, and runnable examples. Add a SearchParams field table and a data model reference (Paper, SearchResult, DownloadResult) including all status constants. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vignette: correct flowchart so preview results flow through PreviewWorker rather than directly from the API to Screen 2; replace raw CSV block with a readable markdown table; move Python API examples before the example run. architecture: correct Paper model field list and test count. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ne arrays GitHub's notebook renderer fails when a cell source is stored as individual characters. Cell 0 had 310 single-character elements; joined into proper lines. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GitHub's renderer only resolves attachment: URLs in markdown image syntax. Converts <img src="attachment:..."> HTML tags to . Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Screenshots compressed from ~300KB PNG to ~50KB JPEG (quality 60, max 900px wide), reducing the notebook from 2MB to 353KB to stay within GitHub's rendering limits. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Jupyter stores base64 attachment data line-wrapped at 76 characters. Our script produced a single ~44KB-per-image string which may exceed GitHub's notebook renderer per-line limits. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ments GitHub's renderer reliably handles display_data image outputs (same format as matplotlib plots) but not the attachment: URL scheme. Restructures screenshot cells to use pre-executed display_data outputs with embedded JPEG data, keeping the notebook fully self-contained. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dency GitHub's renderer may attempt to execute or validate code cells, failing when epmcminer is not installed. Converts the three API example cells to fenced code blocks in markdown with pre-rendered output shown below. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Wraps all screenshots in <div align="center"> for horizontal centering - Adds a fourth code cell demonstrating ReportService.save_csv() with mock DownloadResult objects and pre-rendered CSV output Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a self-contained setup block so examples run without depending on prior DownloadService context. Correct export_excel and export_pdf call sites to pass a full file path instead of a folder, and remove the incorrect Path return-type claims for those two methods. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Implemented, reviewed and tested everything. It's time to submit this project