bug(cli): --to-stdout silently produces empty output for --format json or html #492

@hnc-jglee

Bug

When --to-stdout is combined with --format json or --format html, the CLI exits successfully (exit code 0) but writes nothing to stdout. There is no warning on stderr, no error message, and no validation at argument-parsing time. From the user's perspective the conversion silently fails — the only signal that something went wrong is an empty stdout.

text and markdown work as expected; the issue affects only json and html.

The relevant code path explicitly comments that these formats are not implemented, and the help text for --to-stdout does not document which formats are supported:

java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java:451-465

if (config.isOutputStdout()) {
    java.io.Writer stdoutWriter = new java.io.BufferedWriter(
        new java.io.OutputStreamWriter(System.out, java.nio.charset.StandardCharsets.UTF_8));
    if (config.isGenerateText()) {
        TextGenerator textGenerator = new TextGenerator(stdoutWriter, config);
        textGenerator.writeToText(contents);
        stdoutWriter.flush();
    } else if (config.isGenerateMarkdown()) {
        MarkdownGenerator markdownGenerator = new MarkdownGenerator(stdoutWriter, config);
        markdownGenerator.writeToMarkdown(contents);
        stdoutWriter.flush();
    }
    // JSON and HTML stdout not yet supported
    return;
}

java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/cli/CLIOptions.java:201-202

private static final String TO_STDOUT_LONG_OPTION = "to-stdout";
private static final String TO_STDOUT_DESC = "Write output to stdout instead of file (single format only)";

The original commit (7aaae2e, "perf: add --to-stdout option and refine text-only fast path") describes the feature as supporting "text and markdown formats", so the missing JSON/HTML support appears to be a known gap rather than a regression. Even so, there is no user-facing signal that these formats are unsupported.

Related: #287 (closed) addressed a similar silent-failure-with-exit-0 pattern for hybrid-backend errors. This issue involves a different code path (DocumentProcessor.generateOutputs rather than HybridDocumentProcessor), but the user-visible symptom (silent empty output, exit 0, no diagnostic on stderr) is the same class of UX problem.

Steps to reproduce

# Build the shaded CLI jar
./scripts/build-java.sh
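# The glob is expanded when $JAR is used unquoted below (bash/sh); assumes exactly one matching jar
JAR=java/opendataloader-pdf-cli/target/opendataloader-pdf-cli-*.jar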

# Affected: --to-stdout + json (silent empty output, exit 0)
java -jar $JAR --to-stdout -f json samples/pdf/lorem.pdf > out.json 2>/dev/null
echo "exit=$?  size=$(wc -c < out.json)"
# → exit=0  size=0          ❌ no warning, no error

# Affected: --to-stdout + html (same behavior)
java -jar $JAR --to-stdout -f html samples/pdf/lorem.pdf > out.html 2>/dev/null
echo "exit=$?  size=$(wc -c < out.html)"
# → exit=0  size=0          ❌

# Working: text and markdown
java -jar $JAR --to-stdout -f markdown samples/pdf/lorem.pdf 2>/dev/null | wc -c
# → 462                     ✅
java -jar $JAR --to-stdout -f text samples/pdf/lorem.pdf 2>/dev/null | wc -c
# → 459                     ✅

Reproduced on both Linux (WSL) and Windows PowerShell with OpenJDK 21.

Expected behavior

The user should not be left guessing whether the conversion succeeded. Any of the following would be acceptable, in increasing order of effort:

  1. Reject at argument-parse time (preferred minimal fix). In CLIOptions.applyAllOptionsTo (or a dedicated validator), throw IllegalArgumentException when --to-stdout is combined with --format json/html/pdf/tagged-pdf:

    if (config.isOutputStdout() && (config.isGenerateJSON() || config.isGenerateHtml()
            || config.isGeneratePDF() || config.isGenerateTaggedPDF())) {
        throw new IllegalArgumentException(
            "--to-stdout currently supports only --format text or markdown");
    }

    This produces a clear error and a non-zero exit code.

  2. Update the help text. Change TO_STDOUT_DESC from "Write output to stdout instead of file (single format only)" to "Write output to stdout instead of file. Supported formats: text, markdown." so the constraint is discoverable without trial and error.

  3. Implement JSON/HTML stdout support (full fix). JsonWriter and HtmlGenerator would need overloads that accept a Writer, mirroring what TextGenerator and MarkdownGenerator already do, and DocumentProcessor.generateOutputs would route them through that path.
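To illustrate the shape of option (3), here is a minimal, self-contained sketch of a generator that writes to a caller-supplied Writer. It is an assumption-laden illustration only: WriterBackedJsonSketch, writeJson, and the flat list-of-paragraphs input are hypothetical and do not reflect the real JsonWriter or HtmlGenerator signatures, which are not shown in this issue.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical generator; NOT the project's JsonWriter. It only shows the
// "accept a Writer" pattern that the text and markdown generators already use.
final class WriterBackedJsonSketch {

    // Writes the paragraphs as a JSON array of strings to any Writer.
    static void writeJson(List<String> paragraphs, Writer out) throws IOException {
        out.write("[");
        for (int i = 0; i < paragraphs.size(); i++) {
            if (i > 0) {
                out.write(",");
            }
            out.write("\"" + escape(paragraphs.get(i)) + "\"");
        }
        out.write("]\n");
        // Flush but do not close: when the Writer wraps System.out,
        // closing it would close the process's stdout as well.
        out.flush();
    }

    // Minimal JSON string escaping for quotes, backslashes and control characters.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Same UTF-8 stdout wrapping as the snippet quoted from DocumentProcessor.
        Writer stdout = new BufferedWriter(
            new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        writeJson(List.of("Lorem ipsum", "dolor \"sit\" amet"), stdout);
    }
}
```

Because the generator depends only on Writer, the same method could serve both the file path (a FileWriter or similar) and the stdout path, which is presumably why the existing text and markdown generators take that approach.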

A minimal, defensive fix is (1) + (2); (3) can follow if the feature is desired.
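As a standalone illustration of the check in (1), the sketch below models the validation with a small enum instead of the project's config object. StdoutFormatCheck, Format, and validate are hypothetical names introduced for this example, and the format list is an assumption based on the formats named in this issue.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the CLI's option validation; not the project's API.
final class StdoutFormatCheck {

    // Illustrative format list; the real CLI may define more or different formats.
    enum Format { TEXT, MARKDOWN, JSON, HTML, PDF, TAGGED_PDF }

    // Formats the stdout path handles today, according to this issue.
    private static final Set<Format> STDOUT_SUPPORTED =
        EnumSet.of(Format.TEXT, Format.MARKDOWN);

    // Throws if --to-stdout is combined with a format the stdout path cannot emit.
    static void validate(boolean toStdout, Set<Format> requested) {
        if (!toStdout) {
            return; // file output: nothing to check here
        }
        for (Format f : requested) {
            if (!STDOUT_SUPPORTED.contains(f)) {
                String name = f.name().toLowerCase(Locale.ROOT).replace('_', '-');
                throw new IllegalArgumentException(
                    "--to-stdout currently supports only --format text or markdown (got "
                        + name + ")");
            }
        }
    }

    public static void main(String[] args) {
        validate(true, EnumSet.of(Format.MARKDOWN)); // accepted silently
        try {
            validate(true, EnumSet.of(Format.JSON)); // rejected
        } catch (IllegalArgumentException e) {
            // A real CLI would print this and exit with a non-zero code.
            System.err.println("error: " + e.getMessage());
        }
    }
}
```

Using an allow-list of supported formats, rather than enumerating the unsupported ones as in the snippet under (1), means any format added later is rejected by default until stdout support for it exists.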

Version

main @ 1bb3e71 (current HEAD as of report). Reproduced against a freshly built opendataloader-pdf-cli-0.0.0.jar.

Java version

openjdk version "21.0.10" 2026-01-20 LTS
OpenJDK Runtime Environment Microsoft-13106404 (build 21.0.10+7-LTS)
OpenJDK 64-Bit Server VM Microsoft-13106404 (build 21.0.10+7-LTS, mixed mode, sharing)
