fix(processors): log actual page count being processed, not document total#513
Conversation
…total Objective: When users pass --pages with values that are out of range (e.g. --pages 99999 on a 15-page PDF), warnings correctly report that no pages will be processed, but the same run also logs "Processing 15 pages with 1 threads". Users reading the log cannot tell whether processing actually happened or not, and the contradiction between WARN and INFO lines undermines trust in every other log message. Approach: In DocumentProcessor.processDocument, switch the INFO log to report the size of pagesToProcess (the validated set) instead of the document's total page count. When pagesToProcess is null (no --pages filter), fall back to totalPages so full-document runs still report correctly. This is the smallest change that resolves the contradiction; the surrounding behavior (exit code, empty-output handling, range auto-clamp asymmetry) is left alone — those belong to separate discussions about CLI validation policy, not log accuracy. Evidence: Built the CLI and ran 5 scenarios against a 15-page PDF (samples/pdf/1901.03003.pdf). | Scenario | Before | After | |------------------------|-----------------|-----------------| | no --pages | "Processing 15" | "Processing 15" | | --pages 99999 | "Processing 15" | "Processing 0" | | --pages 1,99999 | "Processing 15" | "Processing 1" | | --pages 1-5 | "Processing 15" | "Processing 5" | | --pages 22-30 | "Processing 15" | "Processing 0" | The log now matches the WARN message and the actual JSON output content. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThe ChangesPage Count Logging
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Objective
When users pass
--pageswith values that are out of range (e.g.--pages 99999on a 15-page PDF), warnings correctly report that no pages will be processed, but the same run also logsProcessing 15 pages with 1 threads. Users reading the log cannot tell whether processing actually happened or not, and the contradiction between WARN and INFO lines undermines trust in every other log message.Example before this patch:
The same contradiction occurs with
--pages 1,99999,--pages 22-30, and any other input that filters out some or all pages — the INFO line always echoes the document's total page count rather than what is actually being processed.Approach
In
DocumentProcessor.processDocument, switch the INFO log to report the size ofpagesToProcess(the validated set) instead of the document's total page count. WhenpagesToProcessisnull(no--pagesfilter), fall back tototalPagesso full-document runs still report correctly.This is the smallest change that resolves the contradiction. Surrounding behavior — exit code on empty results, empty-output file generation, the range auto-clamp asymmetry between
1-99999and22-30— is intentionally left alone. Those belong to a separate discussion about CLI validation policy, not log accuracy.Evidence
Built the CLI and ran five scenarios against a 15-page PDF (
samples/pdf/1901.03003.pdf), grepping for theProcessing N pages with 1 threadsline.--pages(full doc)Processing 15 pagesProcessing 15 pages--pages 99999(all invalid)Processing 15 pagesProcessing 0 pages--pages 1,99999(1 valid + 1 invalid)Processing 15 pagesProcessing 1 pages--pages 1-5(5 valid)Processing 15 pagesProcessing 5 pages--pages 22-30(all out of range)Processing 15 pagesProcessing 0 pagesAfter the patch, the INFO line matches both the WARN messages emitted by
getValidPageNumbersand the actual JSON output content.Test plan
mvn clean install -DskipTests)--pagesreports 0--pagesreports only the valid countSummary by CodeRabbit