Bug
When --to-stdout is combined with --format json or --format html, the CLI exits successfully (exit code 0) but writes nothing to stdout. There is no warning on stderr, no error message, and no validation at argument-parsing time. From the user's perspective the conversion silently fails — the only signal that something went wrong is an empty stdout.
text and markdown work as expected; the issue affects only json and html.
The relevant code path explicitly comments that these formats are not implemented, and the help text for --to-stdout does not document which formats are supported:
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java:451-465
    if (config.isOutputStdout()) {
        java.io.Writer stdoutWriter = new java.io.BufferedWriter(
            new java.io.OutputStreamWriter(System.out, java.nio.charset.StandardCharsets.UTF_8));
        if (config.isGenerateText()) {
            TextGenerator textGenerator = new TextGenerator(stdoutWriter, config);
            textGenerator.writeToText(contents);
            stdoutWriter.flush();
        } else if (config.isGenerateMarkdown()) {
            MarkdownGenerator markdownGenerator = new MarkdownGenerator(stdoutWriter, config);
            markdownGenerator.writeToMarkdown(contents);
            stdoutWriter.flush();
        }
        // JSON and HTML stdout not yet supported
        return;
    }
java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/cli/CLIOptions.java:201-202
private static final String TO_STDOUT_LONG_OPTION = "to-stdout";
private static final String TO_STDOUT_DESC = "Write output to stdout instead of file (single format only)";
The original commit (7aaae2e, "perf: add --to-stdout option and refine text-only fast path") describes the feature as supporting "text and markdown formats", so JSON/HTML appears to be a known gap rather than a regression — but there is no user-facing signal that they are unsupported.
Related: #287 (closed) addressed a similar silent-failure-with-exit-0 pattern for hybrid-backend errors. This issue is a different code path (DocumentProcessor.generateOutputs rather than HybridDocumentProcessor) but the user-visible symptom — silent empty output, exit 0, no diagnostic on stderr — is the same class of UX problem.
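The fall-through in the DocumentProcessor excerpt above can be reproduced in isolation. The sketch below is not the project's code: StdoutDispatchSketch, its Format enum, and both method names are hypothetical stand-ins, and the "markdown" output is a placeholder. It contrasts an if/else-if chain with no final branch (the reported behavior) with a variant that raises for formats it cannot write, which is one possible way to make the gap visible at the dispatch site.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical stand-in for the stdout branch; not the project's actual code. */
final class StdoutDispatchSketch {

    /** Illustrative format names; the real code checks config flags instead. */
    enum Format { TEXT, MARKDOWN, JSON, HTML }

    /** Mirrors the reported behavior: unsupported formats fall through silently. */
    static void writeSilently(Format format, String contents, PrintWriter out) {
        if (format == Format.TEXT) {
            out.print(contents);
        } else if (format == Format.MARKDOWN) {
            out.print("# " + contents); // placeholder for real markdown generation
        }
        // No else branch: JSON and HTML write nothing and the caller sees success.
        out.flush();
    }

    /** Same dispatch, but an unsupported format raises instead of returning. */
    static void writeOrFail(Format format, String contents, PrintWriter out) {
        switch (format) {
            case TEXT:
                out.print(contents);
                break;
            case MARKDOWN:
                out.print("# " + contents); // placeholder for real markdown generation
                break;
            default:
                throw new UnsupportedOperationException(
                    "stdout output is not supported for format " + format);
        }
        out.flush();
    }

    public static void main(String[] args) {
        StringWriter silent = new StringWriter();
        writeSilently(Format.JSON, "lorem", new PrintWriter(silent));
        System.out.println("silent length = " + silent.toString().length());

        try {
            writeOrFail(Format.JSON, "lorem", new PrintWriter(new StringWriter()));
        } catch (UnsupportedOperationException e) {
            System.out.println("failed loudly: " + e.getMessage());
        }
    }
}
```

With the first method, a caller cannot tell an empty document from an unsupported format; with the second, the failure surfaces as an exception that the CLI entry point could turn into a stderr message and a non-zero exit code.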
Steps to reproduce
# Build the shaded CLI jar
./scripts/build-java.sh
JAR=java/opendataloader-pdf-cli/target/opendataloader-pdf-cli-*.jar
# Affected: --to-stdout + json (silent empty output, exit 0)
java -jar $JAR --to-stdout -f json samples/pdf/lorem.pdf > out.json 2>/dev/null
echo "exit=$? size=$(wc -c < out.json)"
# → exit=0 size=0 ❌ no warning, no error
# Affected: --to-stdout + html (same behavior)
java -jar $JAR --to-stdout -f html samples/pdf/lorem.pdf > out.html 2>/dev/null
echo "exit=$? size=$(wc -c < out.html)"
# → exit=0 size=0 ❌
# Working: text and markdown
java -jar $JAR --to-stdout -f markdown samples/pdf/lorem.pdf 2>/dev/null | wc -c
# → 462 ✅
java -jar $JAR --to-stdout -f text samples/pdf/lorem.pdf 2>/dev/null | wc -c
# → 459 ✅
Reproduced on both Linux (WSL) and Windows PowerShell with OpenJDK 21.
Expected behavior
The user should not be left guessing whether the conversion succeeded. Any of the following would be acceptable, in increasing order of effort:
1. Reject at argument-parse time (preferred minimal fix). In CLIOptions.applyAllOptionsTo (or a dedicated validator), throw IllegalArgumentException when --to-stdout is combined with --format json/html/pdf/tagged-pdf:
       if (config.isOutputStdout() && (config.isGenerateJSON() || config.isGenerateHtml()
               || config.isGeneratePDF() || config.isGenerateTaggedPDF())) {
           throw new IllegalArgumentException(
               "--to-stdout currently supports only --format text or markdown");
       }
   This produces a clear error and a non-zero exit code.
2. Update the help text. Change TO_STDOUT_DESC from "Write output to stdout instead of file (single format only)" to "Write output to stdout instead of file. Supported formats: text, markdown." so the constraint is discoverable without trial and error.
3. Implement JSON/HTML stdout support (full fix). JsonWriter and HtmlGenerator would need overloads that accept a Writer, mirroring what TextGenerator and MarkdownGenerator already do, and DocumentProcessor.generateOutputs would route them through that path.
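As a rough illustration of the overload pattern in option 3, the sketch below pairs a file-based entry point with a Writer-based one. It assumes nothing about the real JsonWriter or HtmlGenerator signatures: WriterOverloadSketch, writeJson, and the list-of-strings input are all hypothetical, and the JSON produced is a bare array of strings rather than the project's actual output schema.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical generator; the project's JsonWriter and HtmlGenerator APIs may differ. */
final class WriterOverloadSketch {

    /** File-based entry point: opens the file and delegates to the Writer overload. */
    static void writeJson(List<String> paragraphs, Path target) {
        try (Writer fileWriter = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writeJson(paragraphs, fileWriter);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writer-based overload: any Writer works, including one wrapped around System.out. */
    static void writeJson(List<String> paragraphs, Writer out) {
        try {
            out.write("[");
            for (int i = 0; i < paragraphs.size(); i++) {
                if (i > 0) {
                    out.write(",");
                }
                out.write("\"" + escape(paragraphs.get(i)) + "\"");
            }
            out.write("]");
            out.flush(); // flush but do not close: the caller owns the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Minimal escaping for the demo: backslash, double quote, and control characters. */
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringWriter captured = new StringWriter();
        writeJson(List.of("lorem", "ipsum \"dolor\""), captured);
        System.out.println(captured);
    }
}
```

Keeping the serialization logic in the Writer overload means the file path and the stdout path share one implementation, so the two outputs cannot drift apart. The Writer overload flushes without closing, which matters when the underlying stream is System.out.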
A minimal, defensive fix is (1) + (2); (3) can follow if the feature is desired.
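Option (1) can also be sketched as a standalone validator, which may be easier to unit-test than a check embedded in option parsing. Everything here is an assumption for illustration: StdoutFormatValidatorSketch, its Format enum, and the validate signature do not exist in the project, which tracks formats through config flags such as isGenerateJSON().

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical validator; enum and method names are illustrative, not the project's API. */
final class StdoutFormatValidatorSketch {

    enum Format { TEXT, MARKDOWN, JSON, HTML, PDF, TAGGED_PDF }

    /** Formats the issue reports as working with --to-stdout. */
    private static final Set<Format> STDOUT_FORMATS = EnumSet.of(Format.TEXT, Format.MARKDOWN);

    /** Throws when stdout output is requested for a format that cannot be streamed. */
    static void validate(boolean toStdout, Set<Format> requested) {
        if (!toStdout) {
            return; // file output: nothing to check here
        }
        for (Format format : requested) {
            if (!STDOUT_FORMATS.contains(format)) {
                String name = format.name().toLowerCase(Locale.ROOT).replace('_', '-');
                throw new IllegalArgumentException(
                    "--to-stdout currently supports only --format text or markdown (got "
                        + name + ")");
            }
        }
    }

    public static void main(String[] args) {
        validate(true, EnumSet.of(Format.MARKDOWN)); // accepted, returns normally
        try {
            validate(true, EnumSet.of(Format.JSON));
        } catch (IllegalArgumentException e) {
            // A real CLI would also exit with a non-zero status at this point.
            System.err.println(e.getMessage());
        }
    }
}
```

Using an allow-list of supported formats rather than a deny-list means any format added later is rejected for stdout by default until someone explicitly enables it.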
Version
main @ 1bb3e71 (current HEAD as of report). Reproduced against a freshly built opendataloader-pdf-cli-0.0.0.jar.
Java version
openjdk version "21.0.10" 2026-01-20 LTS
OpenJDK Runtime Environment Microsoft-13106404 (build 21.0.10+7-LTS)
OpenJDK 64-Bit Server VM Microsoft-13106404 (build 21.0.10+7-LTS, mixed mode, sharing)