plan: replace clap with a library specifically designed for the needs of parallel-disk-usage #400

@KSXGitHub

The current limitations of clap

No easy way to define a flag alias for an option with a value

We want to define the following aliases:

  • --len → --quantity=apparent-size
  • --blksize → --quantity=block-size
  • --blocks → --quantity=block-count
  • --align-left → --align=left
  • --align-right → --align=right
  • --top-down → --direction=top-down
  • --bottom-up → --direction=bottom-up

Currently, clap provides no built-in way to do it.

This project chose not to implement them with custom logic.
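
For reference, the kind of custom logic the project decided against could look like the sketch below. It is not code from pdu: the `SHORTHANDS` table and `expand_shorthands` are hypothetical names, and the approach (rewriting argv before the real parser runs) is only one of several possible ones.

```rust
/// Hypothetical table mapping each shorthand flag to the option it stands for.
const SHORTHANDS: &[(&str, &str)] = &[
    ("--len", "--quantity=apparent-size"),
    ("--blksize", "--quantity=block-size"),
    ("--blocks", "--quantity=block-count"),
    ("--align-left", "--align=left"),
    ("--align-right", "--align=right"),
    ("--top-down", "--direction=top-down"),
    ("--bottom-up", "--direction=bottom-up"),
];

/// Rewrite every shorthand into its long form before the real parser sees it.
/// Arguments after a bare `--` are positional and are left untouched.
pub fn expand_shorthands<I>(args: I) -> Vec<String>
where
    I: IntoIterator<Item = String>,
{
    let mut positional_only = false;
    args.into_iter()
        .map(|arg| {
            if positional_only {
                return arg;
            }
            if arg == "--" {
                positional_only = true;
                return arg;
            }
            match SHORTHANDS.iter().find(|(short, _)| *short == arg) {
                Some((_, expansion)) => expansion.to_string(),
                None => arg,
            }
        })
        .collect()
}
```

A pre-pass like this knows nothing about the real grammar: it would also rewrite `--len` when that string is meant as the value of a preceding option, and the shorthands would not show up in the generated help. That is part of why built-in support is preferable.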

Impossible states as a result of conflicts_with

Currently, this project has to use conflicts_with and its family to enforce invariants.

#[clap(
    long,
    conflicts_with_all = ["quantity", "deduplicate_hardlinks", "one_file_system"]
)]
pub json_input: bool,

#[clap(
    long,
    short = 'w',
    conflicts_with = "column_width",
    visible_alias = "width"
)]
pub total_width: Option<usize>,

This is error-prone.
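
To make the problem concrete, here is a minimal sketch with a simplified stand-in for the two width fields. `WidthFlags` and `is_contradictory` are hypothetical names, not part of pdu. The point is that the type itself accepts the combination that conflicts_with is supposed to forbid, so nothing stops a test, a refactor, or a typo in the attribute string from producing it.

```rust
/// Simplified stand-in for the two mutually exclusive width fields.
#[derive(Debug, Default)]
pub struct WidthFlags {
    pub total_width: Option<usize>,
    pub column_width: Option<Vec<usize>>,
}

impl WidthFlags {
    /// True for the state that `conflicts_with` is meant to rule out.
    /// The compiler cannot rule it out, so downstream code has to handle it.
    pub fn is_contradictory(&self) -> bool {
        self.total_width.is_some() && self.column_width.is_some()
    }
}
```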

Hacky example section

This project currently relies on after_help and its family to simulate the example section.

after_help = text_block! {
    "Examples:"
    " $ pdu"
    " $ pdu path/to/file/or/directory"
    " $ pdu file.txt dir/"
    " $ pdu --quantity=apparent-size"
    " $ pdu --deduplicate-hardlinks"
    " $ pdu --bytes-format=plain"
    " $ pdu --bytes-format=binary"
    " $ pdu --min-ratio=0"
    " $ pdu --min-ratio=0.05"
    " $ pdu --min-ratio=0 --max-depth=inf --json-output | jq"
    " $ pdu --json-input < disk-usage.json"
},

after_long_help = text_block! {
    "Examples:"
    " Show disk usage chart of current working directory"
    " $ pdu"
    ""
    " Show disk usage chart of a single file or directory"
    " $ pdu path/to/file/or/directory"
    ""
    " Compare disk usages of multiple files and/or directories"
    " $ pdu file.txt dir/"
    ""
    " Show chart in apparent sizes instead of block sizes"
    " $ pdu --quantity=apparent-size"
    ""
    " Detect and subtract the sizes of hardlinks from their parent nodes"
    " $ pdu --deduplicate-hardlinks"
    ""
    " Show sizes in plain numbers instead of metric units"
    " $ pdu --bytes-format=plain"
    ""
    " Show sizes in base 2¹⁰ units (binary) instead of base 10³ units (metric)"
    " $ pdu --bytes-format=binary"
    ""
    " Show disk usage chart of all entries regardless of size"
    " $ pdu --min-ratio=0"
    ""
    " Only show disk usage chart of entries whose size is at least 5% of total"
    " $ pdu --min-ratio=0.05"
    ""
    " Show disk usage data as JSON instead of chart"
    " $ pdu --min-ratio=0 --max-depth=inf --json-output | jq"
    ""
    " Visualize existing JSON representation of disk usage data"
    " $ pdu --json-input < disk-usage.json"
},

The problems:

  1. Since the text is colorless, pdu is forced to remove all color from --help to make it consistent.
  2. The script that generates example sections for USAGE.md and the man page has to parse this text.
  3. Duplication of information: every command is written twice, once for the short help and once for the long help.
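
Problems 2 and 3 go away once the examples live in one structured table and both help variants are rendered from it. The sketch below only illustrates that idea in plain Rust: `Example`, `EXAMPLES`, `render_short`, and `render_long` are hypothetical names, the table is abbreviated, and the four-space indentation is an assumption.

```rust
/// One example: a description plus the command line it describes.
pub struct Example {
    pub about: &'static str,
    pub command: &'static str,
}

/// Single source of truth (abbreviated to three entries).
pub const EXAMPLES: &[Example] = &[
    Example {
        about: "Show disk usage chart of current working directory",
        command: "pdu",
    },
    Example {
        about: "Show chart in apparent sizes instead of block sizes",
        command: "pdu --quantity=apparent-size",
    },
    Example {
        about: "Visualize existing JSON representation of disk usage data",
        command: "pdu --json-input < disk-usage.json",
    },
];

/// Short help: commands only.
pub fn render_short(examples: &[Example]) -> String {
    let mut out = String::from("Examples:\n");
    for example in examples {
        out.push_str(&format!("    $ {}\n", example.command));
    }
    out
}

/// Long help: each command preceded by its description, blank line between entries.
pub fn render_long(examples: &[Example]) -> String {
    let blocks: Vec<String> = examples
        .iter()
        .map(|e| format!("    {}\n    $ {}\n", e.about, e.command))
        .collect();
    format!("Examples:\n{}", blocks.join("\n"))
}
```

A generator for USAGE.md or the man page could iterate over the same table instead of parsing rendered text.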

Option flag with more than a single value (for tuples or fixed-size arrays)

We wanted a tuple of 2 for the column width layout, but clap only offers number_of_values, so we were forced to use a Vec.

/// Maximum widths of the tree column and width of the bar column.
#[clap(long, number_of_values = 2, value_names = &["TREE_WIDTH", "BAR_WIDTH"])]
pub column_width: Option<Vec<usize>>,

This opens the door for validation errors and impossible states.
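
As an illustration of the extra validation this forces on the program, the following sketch converts the Vec back into the pair that was wanted in the first place. `column_width_pair` is a hypothetical helper, not code from pdu.

```rust
/// Convert the list that the parser produces into the pair the program wants.
/// The error branch exists only because the type is wider than the real domain.
pub fn column_width_pair(values: &[usize]) -> Result<(usize, usize), String> {
    match values {
        &[tree_width, bar_width] => Ok((tree_width, bar_width)),
        other => Err(format!("expected exactly 2 widths, got {}", other.len())),
    }
}
```

With a real tuple or `[usize; 2]` field, the error branch would not need to exist.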

Conflation of Rust documentation and CLI documentation

clap, by default, takes #[doc] attributes as documentation and description of the CLI.

This causes conceptual confusion.

Not to mention, the specialized syntax for Rust documentation (which is a flavor of Markdown) looks completely nonsensical in the CLI help message.

This problem can be worked around, though: just define your own #[clap(about)].

Syntax drafts

Container-level attributes to define flag aliases

#[derive(ParseCli)]
#[parse_cli(shorthand("len", ["--quantity", "apparent-size"], about = "Measure apparent sizes"))]
#[parse_cli(shorthand("blksize", ["--quantity", "block-size"], about = "Measure block sizes (block-count * 512B)"))]
#[parse_cli(shorthand("blocks", ["--quantity", "block-count"], about = "Count numbers of blocks"))]
#[parse_cli(test(
    module = args_shorthand_tests, // this creates a `#[cfg(test)] mod args_shorthand_tests` that tests the correctness of the shorthand expansions
    scopes = ["shorthands"], // the scopes must be known string literals
))]
struct Args { /* ... */ }

#[derive(ParseCli)]
#[parse_cli(shorthand("len", "--quantity=apparent-size"))]
#[parse_cli(shorthand("blksize", "--quantity=block-size"))]
#[parse_cli(shorthand("blocks", "--quantity=block-count"))]
#[parse_cli(test(
    module = args_shorthand_tests, // this creates a `#[cfg(test)] mod args_shorthand_tests` that tests the correctness of the shorthand expansions
    scopes = ["shorthands"], // the scopes must be known string literals
))]
struct Args { /* ... */ }

#[derive(ParseCli)]
#[parse_cli(
    shorthand("len", "--quantity=apparent-size"),
    shorthand("blksize", "--quantity=block-size"),
    shorthand("blocks", "--quantity=block-count"),
)]
#[parse_cli(test(
    module = args_shorthand_tests, // this creates a `#[cfg(test)] mod args_shorthand_tests` that tests the correctness of the shorthand expansions
    scopes = ["shorthands"], // the scopes must be known string literals
))]
struct Args { /* ... */ }

#[derive(ParseCli)]
#[parse_cli(shorthands(
    ("len", "--quantity=apparent-size"),
    ("blksize", "--quantity=block-size"),
    ("blocks", "--quantity=block-count"),
))]
#[parse_cli(test(
    module = args_shorthand_tests, // this creates a `#[cfg(test)] mod args_shorthand_tests` that tests the correctness of the shorthand expansions
    scopes = ["shorthands"], // the scopes must be known string literals
))]
struct Args { /* ... */ }

Field-level attributes to define complex flag aliases

#[derive(ParseCli)]
struct Args {
  /* ... */

  #[parse_cli(
    long,
    short = 'q',
    value_enum, // `value_enum` implies `option`, conflicts with `flag`
    default_value = Quantity::DEFAULT, // alternatively, `default_value_raw = "block-size"`
    shorthands(
        ("len", "--{long}=apparent-size"),
        ("blksize", "--{long}=block-size"),
        ("blocks", "--{long}=block-count"),
    ),
    test(
        module = args_quantity_shorthand_tests,
        scopes = ["shorthands"],
    ),
  )]
  quantity: Quantity,

  /* ... */
}

Discriminated flags to eliminate impossible states

// accepts `--total-width=100`, `--total-width 100`, `-w 100`, `--column-width 30 70`
// rejects `--column-width 30`, `--column-width=30 70`, `--column-width=30 --column-width=70`
#[derive(ParseCli)]
enum ColumnWidthDistribution {
    // accepts `--total-width=100`, `--total-width 100`, `-w 100`
    #[parse_cli(
        long,
        short = 'w',
        option,
        about = "Width of the visualization",
    )]
    TotalWidth(usize),

    // accepts `--column-width 30 70`
    // rejects `--column-width 30`, `--column-width=30 70`, `--column-width=30 --column-width=70`
    #[parse_cli(
        long,
        value_tuple = 2, // `value_tuple` implies `option`, conflicts with `flag` and `value_enum`
        value_names = ["TREE_WIDTH", "BAR_WIDTH"],
        about = "Maximum widths of the tree column and width of the bar column",
    )]
    ColumnWidth((usize, usize)), // could also be `[usize; 2]`
}
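
To pin down the accept/reject rules in the comments above, here is a hand-written sketch of the behavior the derive would be expected to produce for this enum alone. It is an assumption about the intended semantics, not the planned implementation; `parse_width` and `number` are hypothetical names, and the function only looks at an argument list that contains nothing but the width flag.

```rust
#[derive(Debug, PartialEq)]
pub enum ColumnWidthDistribution {
    TotalWidth(usize),
    ColumnWidth((usize, usize)),
}

/// Parse one numeric value, turning the parse failure into a message.
fn number(text: &str) -> Result<usize, String> {
    text.parse().map_err(|_| format!("not a number: {text:?}"))
}

/// Parse an argument list that should consist of exactly one width flag.
pub fn parse_width(args: &[&str]) -> Result<ColumnWidthDistribution, String> {
    use ColumnWidthDistribution::{ColumnWidth, TotalWidth};
    match args {
        // `--total-width 100` and `-w 100`
        ["--total-width", value] | ["-w", value] => number(value).map(TotalWidth),
        // `--column-width 30 70`: both values must follow as separate arguments
        ["--column-width", tree, bar] => Ok(ColumnWidth((number(tree)?, number(bar)?))),
        // `--total-width=100`; `--column-width=...` is deliberately not accepted
        [single] => match single.strip_prefix("--total-width=") {
            Some(value) => number(value).map(TotalWidth),
            None => Err(format!("unexpected argument: {single}")),
        },
        _ => Err(format!("unexpected arguments: {args:?}")),
    }
}
```

Because the result is an enum, "both flags given" and "one column width given" have no representation after parsing.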

The use of ColumnWidthDistribution within Args is demonstrated right below.

Field-level flatten attribute to embed sub-CLI

#[derive(ParseCli)]
struct Args {
    /* ... */

    #[parse_cli(
        flatten, // `flatten` conflicts with `long`, `short`, `flag`, `option`, and `value_enum`
        optional, // this is required, the new library is stricter than `clap`, it won't try to be "smart"
    )]
    column_width_distribution: Option<ColumnWidthDistribution>,

    /* ... */
}

Container-level attribute to define examples

#[derive(ParseCli)]
#[parse_cli(program = "pdu")]
#[parse_cli(examples(
    /* ... */
    (
        about = "Show sizes in base 2¹⁰ units (binary) instead of base 10³ units (metric)",
        sh = "{program} --bytes-format=binary",
    ),
    /* ... */
    (
        about = "Show disk usage data as JSON instead of chart",
        sh = "pdu --min-ratio=0 --max-depth=inf --json-output | jq", // `pdu` is also fine, as `#[parse_cli(program = "pdu")]` should detect it
    ),
    (
        about = "Visualize existing JSON representation of disk usage data",
        sh = "pdu --json-input < disk-usage.json",
    ),
    /* ... */
))]
#[parse_cli(test(
    module = args_example_tests, // this creates a `#[cfg(test)] mod args_example_tests` that tests the correctness of the examples
    scopes = ["examples"], // the scopes must be known string literals
))]
struct Args { /* ... */ }

Container-level attribute to define early exit flags or value options

Since the new library is going to be stricter than clap, it shall not provide --help by default.

Each early-exit flag or option is attached to a function that replaces the whole parsing procedure. The function can either actually exit early (with std::process::exit) or return a value of the parsed argument type itself.

// When the user passes `--help`, `-h`, `--version`, or `-v`, the program is going to terminate.
// The call site of `Args::parse()` still thinks it returns `Args`.
// And the code that follows `Args::parse()` continues as if nothing special happened.
#[derive(ParseCli)]
#[parse_cli(early_exits(
    (
        flag,
        long = "help",
        short = 'h',
        action = print_help,
    ),
    (
        flag,
        long = "version",
        short = 'v',
        action = print_version,
    ),
))]
struct Args { /* ... */ }

fn print_help() -> Args {
    println!("{}", Args::help().ansi(AnsiMode::Auto));
    std::process::exit(0)
}

fn print_version() -> Args {
    println!("{VERSION}");
    std::process::exit(0)
}
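
The dispatch behind early_exits can be sketched in plain Rust as follows. Everything here is hypothetical scaffolding: the `Args` stand-in, the `EarlyExit` table, `find_early_exit`, and the placeholder `VERSION` are assumptions made for illustration. The detail worth noting is that `std::process::exit` returns `!`, which is why an action can be typed as `fn() -> Args` without ever constructing an `Args`.

```rust
/// Stand-in for the real argument struct.
pub struct Args {
    pub paths: Vec<String>,
}

/// One early-exit flag and the function that takes over parsing.
pub struct EarlyExit {
    pub long: &'static str,
    pub short: char,
    pub action: fn() -> Args,
}

const VERSION: &str = "0.0.0"; // placeholder value

fn print_help() -> Args {
    println!("usage: pdu [PATHS]...");
    std::process::exit(0) // `!` coerces to `Args`
}

fn print_version() -> Args {
    println!("{VERSION}");
    std::process::exit(0)
}

pub const EARLY_EXITS: &[EarlyExit] = &[
    EarlyExit { long: "help", short: 'h', action: print_help },
    EarlyExit { long: "version", short: 'v', action: print_version },
];

/// Find the first early-exit flag that appears before a bare `--`.
pub fn find_early_exit(args: &[String]) -> Option<&'static EarlyExit> {
    args.iter()
        .take_while(|arg| arg.as_str() != "--")
        .find_map(|arg| {
            EARLY_EXITS.iter().find(|exit| {
                *arg == format!("--{}", exit.long) || *arg == format!("-{}", exit.short)
            })
        })
}

/// The caller always gets `Args`; an early exit simply never returns.
pub fn parse(args: &[String]) -> Args {
    if let Some(exit) = find_early_exit(args) {
        return (exit.action)();
    }
    Args { paths: args.to_vec() }
}
```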

Attributes to generate tests to check the correctness of string literals

Partially demonstrated in other sections.

Additions:

#[derive(ParseCli)]
#[parse_cli(test(
    module = args_tests,
    all, // this generates tests for everything
))]
struct Args { /* ... */ }

#[derive(ParseCli)]
#[parse_cli(test(
    function = args_tests, // put everything in a single `#[test] fn args_tests()` instead of a module
    all,
))]
struct Args { /* ... */ }

#[derive(ParseCli)]
#[parse_cli(test(
    module = args_tests,

    // planned scopes so far
    scopes = [
        "documentation", // ensures all CLI documentations are correct
        "examples", // ensures all examples are correct
        "input-value-raws", // ensures all values of `default_value_raw` are correct
        "name-collisions", // ensures none of the flag names and option names collide
        "shorthands", // ensures all shorthands are correct
    ],
))]
struct Args { /* ... */ }
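
As a rough idea of what one generated check could boil down to, here is a sketch for the "name-collisions" scope. `name_collisions` is a hypothetical helper; a generated `#[test]` would presumably collect all flag and option names (including shorthands) and assert that this returns an empty list.

```rust
use std::collections::HashSet;

/// Return every name that appears more than once, each reported a single time,
/// in the order in which the duplicates are first encountered.
pub fn name_collisions<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut reported = HashSet::new();
    let mut collisions = Vec::new();
    for &name in names {
        // `insert` returns false when the name was already present.
        if !seen.insert(name) && reported.insert(name) {
            collisions.push(name);
        }
    }
    collisions
}
```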

Do not conflate Rust documentation with CLI documentation

Enough said.

Specialized markup languages to create descriptions in CLI help texts, man pages, and shell completions

They are defined in #[parse_cli(about)] attributes and verified in #[parse_cli(test(scopes = ["documentation"]))].

Potential special markups:

  1. Auto hyperlink: #[parse_cli(about = "Visit <https://example.com/> for more information")].
  2. Labelled hyperlink: #[parse_cli(about = "Visit [this page](https://example.com/) for more information")].
  3. Intra-link: #[parse_cli(about = "Recommended to use with [`--the-other-flag`]")].
  4. Cross manual reference: #[parse_cli(about = "See [pdu(7)] to get started")].
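
A small recognizer for these four forms might look like the sketch below. The `Span` type, `element`, and `parse_about` are hypothetical, the syntax is only what the list above suggests, and edge cases (nested brackets, escapes, a man reference directly followed by a parenthesis) are ignored.

```rust
#[derive(Debug, PartialEq)]
pub enum Span<'a> {
    Text(&'a str),
    AutoLink(&'a str),
    LabelledLink { label: &'a str, url: &'a str },
    FlagRef(&'a str),
    ManRef { page: &'a str, section: &'a str },
}

/// Try to read one markup element at the start of `input`.
/// On success, return the element and the unconsumed rest.
fn element(input: &str) -> Option<(Span<'_>, &str)> {
    if let Some(rest) = input.strip_prefix('<') {
        let (url, rest) = rest.split_once('>')?;
        return url.contains("://").then_some((Span::AutoLink(url), rest));
    }
    let (inner, rest) = input.strip_prefix('[')?.split_once(']')?;
    if let Some(rest) = rest.strip_prefix('(') {
        let (url, rest) = rest.split_once(')')?;
        return Some((Span::LabelledLink { label: inner, url }, rest));
    }
    if let Some(flag) = inner.strip_prefix('`').and_then(|s| s.strip_suffix('`')) {
        return Some((Span::FlagRef(flag), rest));
    }
    let (page, section) = inner.strip_suffix(')')?.split_once('(')?;
    Some((Span::ManRef { page, section }, rest))
}

/// Split an `about` string into plain text and markup elements.
pub fn parse_about(input: &str) -> Vec<Span<'_>> {
    let mut spans = Vec::new();
    let mut text_start = 0; // start of the pending plain-text run
    let mut pos = 0;
    while pos < input.len() {
        match element(&input[pos..]) {
            Some((span, rest)) => {
                if text_start < pos {
                    spans.push(Span::Text(&input[text_start..pos]));
                }
                spans.push(span);
                pos = input.len() - rest.len();
                text_start = pos;
            }
            // Advance by one whole character to stay on a UTF-8 boundary.
            None => pos += input[pos..].chars().next().map_or(1, char::len_utf8),
        }
    }
    if text_start < input.len() {
        spans.push(Span::Text(&input[text_start..]));
    }
    spans
}
```

Each output backend (ANSI help, roff, Markdown) would then render the same `Span` list in its own syntax.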

Design principles

The derive macro is the only correct way to use this library.

Be as declarative as possible.

Planned features

Documentation

--help text, ANSI colored and non-ANSI colored.

Man page (roff syntax).

USAGE.md.

Auto-completion

Supported shells

Priorities:

  • Bash and ZSH: Highest. I personally use them.
  • Fish and PowerShell: Medium. There are users for these.
  • Nushell: Below medium. Depends on how easy it is to create completion for this shell.
  • Elvish: Lowest. Provided by clap. But who uses this?

Additional requirements

Prevent shell injections, markup injections, and other similar injections with appropriate escapes. It's not about security (you're in control of the program anyway), it's about convenience and confidence that your text isn't going to produce corrupted outputs.
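
For example, description text embedded in a Bash or Zsh completion script would need quoting along these lines. This is a minimal sketch with a hypothetical `sh_quote` helper that covers POSIX-style single quoting only; Fish, PowerShell, and roff each have different escaping rules and would need their own functions.

```rust
/// Quote `text` so that a POSIX-like shell reads it back as one literal word.
/// Inside single quotes nothing is special except the single quote itself,
/// which is emitted as `'\''` (close quote, escaped quote, reopen quote).
pub fn sh_quote(text: &str) -> String {
    let mut quoted = String::with_capacity(text.len() + 2);
    quoted.push('\'');
    for ch in text.chars() {
        if ch == '\'' {
            quoted.push_str("'\\''");
        } else {
            quoted.push(ch);
        }
    }
    quoted.push('\'');
    quoted
}
```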

Implementation hints, suggestions, and directions

Use the parser combinator pattern to parse strings.
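
As a reminder of what the pattern looks like without any dependency, here is a minimal sketch that parses a shorthand expansion such as `--quantity=apparent-size`. The combinators `tag`, `take_while1`, and `pair` are written from scratch for this example and `long_option` is a hypothetical name; an existing combinator crate could be used instead.

```rust
/// A parser returns the parsed value plus the unconsumed rest, or `None`.
type Parsed<'a, T> = Option<(T, &'a str)>;

/// Match a fixed prefix.
fn tag<'a>(expected: &'static str) -> impl Fn(&'a str) -> Parsed<'a, &'a str> {
    move |input| input.strip_prefix(expected).map(|rest| (expected, rest))
}

/// Take the longest non-empty prefix whose characters satisfy `keep`.
fn take_while1<'a>(keep: fn(char) -> bool) -> impl Fn(&'a str) -> Parsed<'a, &'a str> {
    move |input| {
        let end = input.find(|c: char| !keep(c)).unwrap_or(input.len());
        (end > 0).then(|| input.split_at(end))
    }
}

/// Run two parsers in sequence and keep both results.
fn pair<'a, A, B>(
    first: impl Fn(&'a str) -> Parsed<'a, A>,
    second: impl Fn(&'a str) -> Parsed<'a, B>,
) -> impl Fn(&'a str) -> Parsed<'a, (A, B)> {
    move |input| {
        let (a, rest) = first(input)?;
        let (b, rest) = second(rest)?;
        Some(((a, b), rest))
    }
}

/// Parse `--name=value` into `(name, value)`; the value is everything after `=`.
pub fn long_option(input: &str) -> Option<(&str, &str)> {
    let name = take_while1(|c: char| c.is_ascii_alphanumeric() || c == '-');
    let parser = pair(pair(tag("--"), name), tag("="));
    let (((_, name), _), value) = parser(input)?;
    Some((name, value))
}
```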

Implement only what is necessary. But leave room for future features.

Test extensively. Test carefully. Prevent errors as a result of AI hallucinations.

The names and identifiers above are not final. Implementers and AI agents may bike-shed and choose the best names.
