Skip to content

refactor(clp-s)!: Rework command line argument parsing for output destinations to improve extensibility.#2229

Open
gibber9809 wants to merge 5 commits intoy-scope:mainfrom
gibber9809:refactor-output-handler-options
Open

refactor(clp-s)!: Rework command line argument parsing for output destinations to improve extensibility.#2229
gibber9809 wants to merge 5 commits intoy-scope:mainfrom
gibber9809:refactor-output-handler-options

Conversation

@gibber9809
Copy link
Copy Markdown
Contributor

@gibber9809 gibber9809 commented Apr 28, 2026

Description

This PR reworks how we parse arguments for output destinations during search in order to make it easy to support having multiple output destinations in a follow-up PR.

Each output destination is treated like a subcommand, e.g. <output-destination-type> <output-destination-arguments>. Previously, when supporting only one output destination, we would simply collect the unrecognized options after parsing the main search options and hand those options to an options_description specific to the relevant output destination. With multiple output destinations though we need to do a little bit of extra work to figure out which unrecognized options are related to each subcommand.

This PR solves that problem by parsing the unrecognized options left to right, using both the known output destination names and the options_description object for each subcommand to attempt to find the options for each output destination. After this step the options can be parsed with each destination-specific options_description like normal. We can't simply iteratively use each options_description with allow_unregistered to parse the full list of options, because different output destinations share options with the same name (e.g. --host is used both by reducer and network), so some combinations of output destinations could result in throwing errors because of repeated options. As well, we can't simply rely on splitting up the options for each subcommand based solely on the ouptut destination names, because the output destination names can also appear as arguments (e.g. in the package, because of the way our DNS is set up, a reducer query will look like clp-s s <archive> <query> reducer --host reducer --port <port> --job-id <id> --count.

This PR also refactors the CommandLineArguments class to group together options for each output destination into structs.

This PR includes a small functional change, in that aggregation options are now part of the reducer output destination options. This changes how aggregation options must be specified on the command line (i.e. they explicitly need to be part of the reducer output destination arguments), but doesn't represent any change in functionality since aggregation options are only allowed for the reducer output destination anyway.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Confirmed that package search through webui and cli work as expected
  • Manually validated that options are split as expected when passing multiple output destinations (before the command is rejected for having multiple output destinations)

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved validation of output handler configuration and aggregation options.
  • Refactor

    • Reorganized command-line argument handling for output handlers and aggregation options to provide clearer structure and consistency.
    • Aggregation flag positioning adjusted in command-line construction for reducer operations.

@gibber9809 gibber9809 requested a review from a team as a code owner April 28, 2026 21:21
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 28, 2026

Walkthrough

Output handler configuration is refactored from individual getter methods to structured option objects. Aggregation flags move from search-level options to reducer-specific options. Command-line parsing now treats output handlers as subcommands with specialized argument groups, and handler dispatch requires exactly one output handler with validated per-handler arguments.

Changes

Cohort / File(s) Summary
Build Configuration
components/core/src/clp_s/CMakeLists.txt
Removes cli_utils.cpp and cli_utils.hpp from build sources.
Option Structures & API
components/core/src/clp_s/CommandLineArguments.hpp
Introduces four option structs (ResultsCacheOutputHandlerOptions, FileOutputHandlerOptions, NetworkOutputHandlerOptions, ReducerOutputHandlerOptions), adds AggregationType enum, replaces individual configuration getters with structured option getters, updates helper function signatures to accept std::vector<std::string> instead of boost::program_options::option.
Parsing Logic
components/core/src/clp_s/CommandLineArguments.cpp
Refactors output handler parsing to collect unrecognized trailing tokens and split them into per-handler subcommand argument groups; moves aggregation flags (--count, --count-by-time) into reducer-specific parsing with validation; implements per-handler parse_*_output_handler_options functions.
Application Integration
components/core/src/clp_s/clp-s.cpp
Refactors output handler creation to use structured option objects; branches aggregation selection logic on AggregationType enum; wraps case bodies in braces for localized option variables.
Search Implementation
components/core/src/clp_s/kv_ir_search.cpp
Changes aggregation support check to reference output handler type (Reducer) instead of aggregation flag methods.
Build Command Assembly
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
Reorders aggregation flags (--count, --count-by-time) to appear after reducer arguments in command-line construction.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.83% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: refactoring command-line argument parsing for output destinations to improve extensibility.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.15.12)
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py

Unexpected Ruff issue shape at index 6


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/CommandLineArguments.cpp`:
- Around line 184-211: The loop currently defaults
get_max_tokens_for_option(...).value_or(0) to 0 which lets an option with
unknown arity cause the next token (which might match a subcommand name) to be
misinterpreted as a subcommand; change the logic in the branch that sets
remaining_tokens_in_current_argument so that when
get_max_tokens_for_option(current_subcommand_options_description, option)
returns std::nullopt you handle it explicitly (e.g., throw
std::invalid_argument("unrecognized option \"...\"") or otherwise treat it as an
error) instead of silently defaulting to 0; update code around
remaining_tokens_in_current_argument, current_subcommand,
current_subcommand_arguments and collect_subcommand to rely on that explicit
handling.
- Around line 993-995: The thrown std::invalid_argument in
CommandLineArguments.cpp currently says "clp-s only supports one output handler
at a time"; change this to include the actual detected handlers from
output_options_map so users see which handlers were passed. Locate the check
using output_options_map (the one that throws when output_options_map.size() >
1) and build a descriptive message by joining the map keys / handler names
(e.g., iterate output_options_map and collect the keys into a comma-separated
string) and include that string in the exception text so the thrown message
shows the list of detected handlers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c84e61f7-19bf-4f9e-90b9-3b0851f0bfab

📥 Commits

Reviewing files that changed from the base of the PR and between aa28a0c and 359ab95.

📒 Files selected for processing (6)
  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp_s/CommandLineArguments.cpp
  • components/core/src/clp_s/CommandLineArguments.hpp
  • components/core/src/clp_s/clp-s.cpp
  • components/core/src/clp_s/kv_ir_search.cpp
  • components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
💤 Files with no reviewable changes (1)
  • components/core/src/clp_s/CMakeLists.txt

Comment thread components/core/src/clp_s/CommandLineArguments.cpp
Comment on lines +993 to 995
if (output_options_map.size() > 1) {
throw std::invalid_argument("clp-s only supports one output handler at a time");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider clarifying the single-handler constraint.

The error message could be more helpful by listing which handlers were detected. This would aid debugging when users accidentally specify multiple handlers.

💡 Optional enhancement for better error diagnostics
             if (output_options_map.size() > 1) {
-                throw std::invalid_argument("clp-s only supports one output handler at a time");
+                std::string handlers;
+                for (auto const& [name, _] : output_options_map) {
+                    if (false == handlers.empty()) {
+                        handlers += ", ";
+                    }
+                    handlers += name;
+                }
+                throw std::invalid_argument(fmt::format(
+                        "clp-s only supports one output handler at a time, but found: {}",
+                        handlers
+                ));
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp_s/CommandLineArguments.cpp` around lines 993 - 995,
The thrown std::invalid_argument in CommandLineArguments.cpp currently says
"clp-s only supports one output handler at a time"; change this to include the
actual detected handlers from output_options_map so users see which handlers
were passed. Locate the check using output_options_map (the one that throws when
output_options_map.size() > 1) and build a descriptive message by joining the
map keys / handler names (e.g., iterate output_options_map and collect the keys
into a comma-separated string) and include that string in the exception text so
the thrown message shows the list of detected handlers.

@gibber9809 gibber9809 requested a review from LinZhihao-723 May 3, 2026 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant