Commit 2e585ac

bundolee and claude committed
fix(processors): log actual page count being processed, not document total
Objective: When users pass --pages with values that are out of range (e.g. --pages 99999 on a 15-page PDF), warnings correctly report that no pages will be processed, but the same run also logs "Processing 15 pages with 1 threads". Users reading the log cannot tell whether processing actually happened or not, and the contradiction between WARN and INFO lines undermines trust in every other log message.

Approach: In DocumentProcessor.processDocument, switch the INFO log to report the size of pagesToProcess (the validated set) instead of the document's total page count. When pagesToProcess is null (no --pages filter), fall back to totalPages so full-document runs still report correctly. This is the smallest change that resolves the contradiction; the surrounding behavior (exit code, empty-output handling, range auto-clamp asymmetry) is left alone, since those belong to separate discussions about CLI validation policy, not log accuracy.

Evidence: Built the CLI and ran 5 scenarios against a 15-page PDF (samples/pdf/1901.03003.pdf).

| Scenario        | Before          | After           |
|-----------------|-----------------|-----------------|
| no --pages      | "Processing 15" | "Processing 15" |
| --pages 99999   | "Processing 15" | "Processing 0"  |
| --pages 1,99999 | "Processing 15" | "Processing 1"  |
| --pages 1-5     | "Processing 15" | "Processing 5"  |
| --pages 22-30   | "Processing 15" | "Processing 0"  |

The log now matches the WARN message and the actual JSON output content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 56e5543 commit 2e585ac

1 file changed

Lines changed: 2 additions & 1 deletion

java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java

@@ -283,7 +283,8 @@ private static List<List<IObject>> processDocument(String inputPdfName, Config c
 
         int parallelism = config.getThreads();
         ForkJoinPool pool = new ForkJoinPool(parallelism);
-        LOGGER.log(Level.INFO, "Processing {0} pages with {1} threads", new Object[]{totalPages, parallelism});
+        int pagesToProcessCount = (pagesToProcess != null) ? pagesToProcess.size() : totalPages;
+        LOGGER.log(Level.INFO, "Processing {0} pages with {1} threads", new Object[]{pagesToProcessCount, parallelism});
 
         try {
             // Loop 1: ContentFilter per-page (largest bottleneck)
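
The fallback described in the commit message can be exercised in isolation. The following is a minimal sketch, not the project's code: the class `PageCountSketch`, the method `pagesToLog`, and the `validate` helper are hypothetical, and `pagesToProcess` is assumed to be a `Set<Integer>` of 1-based page numbers (the diff shows only that it has a `size()` method and may be null). The `validate` helper stands in for whatever range checking the CLI actually performs.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Standalone sketch; every name except pagesToProcess and totalPages is hypothetical. */
final class PageCountSketch {

    /** Same expression as the added line in the diff: size of the validated set, else the total. */
    static int pagesToLog(Set<Integer> pagesToProcess, int totalPages) {
        return (pagesToProcess != null) ? pagesToProcess.size() : totalPages;
    }

    /**
     * Assumed stand-in for the CLI's validation step (not taken from the project):
     * keeps only 1-based page numbers that fall inside [1, totalPages].
     */
    static Set<Integer> validate(int[] requested, int totalPages) {
        Set<Integer> valid = new LinkedHashSet<>();
        for (int page : requested) {
            if (page >= 1 && page <= totalPages) {
                valid.add(page);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        int totalPages = 15;
        // No --pages filter: the set is null, so the total is reported.
        System.out.println(pagesToLog(null, totalPages)); // prints 15
        // --pages 99999: nothing survives validation.
        System.out.println(pagesToLog(validate(new int[]{99999}, totalPages), totalPages)); // prints 0
        // --pages 1,99999: only page 1 survives validation.
        System.out.println(pagesToLog(validate(new int[]{1, 99999}, totalPages), totalPages)); // prints 1
    }
}
```

The null check matters because an empty set and a missing filter mean opposite things here: an empty set reports 0 pages, while null reports the whole document.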
