refactor(clp-s)!: Rework command line argument parsing for output destinations to improve extensibility.#2229
Conversation
…multiple output handlers.
…parate options related to a given subcommand.
WalkthroughOutput handler configuration is refactored from individual getter methods to structured option objects. Aggregation flags move from search-level options to reducer-specific options. Command-line parsing now treats output handlers as subcommands with specialized argument groups, and handler dispatch requires exactly one output handler with validated per-handler arguments. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Ruff (0.15.12)components/job-orchestration/job_orchestration/executor/query/fs_search_task.pyUnexpected Ruff issue shape at index 6 Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/core/src/clp_s/CommandLineArguments.cpp`:
- Around line 184-211: The loop currently defaults
get_max_tokens_for_option(...).value_or(0) to 0 which lets an option with
unknown arity cause the next token (which might match a subcommand name) to be
misinterpreted as a subcommand; change the logic in the branch that sets
remaining_tokens_in_current_argument so that when
get_max_tokens_for_option(current_subcommand_options_description, option)
returns std::nullopt you handle it explicitly (e.g., throw
std::invalid_argument("unrecognized option \"...\"") or otherwise treat it as an
error) instead of silently defaulting to 0; update code around
remaining_tokens_in_current_argument, current_subcommand,
current_subcommand_arguments and collect_subcommand to rely on that explicit
handling.
- Around line 993-995: The thrown std::invalid_argument in
CommandLineArguments.cpp currently says "clp-s only supports one output handler
at a time"; change this to include the actual detected handlers from
output_options_map so users see which handlers were passed. Locate the check
using output_options_map (the one that throws when output_options_map.size() >
1) and build a descriptive message by joining the map keys / handler names
(e.g., iterate output_options_map and collect the keys into a comma-separated
string) and include that string in the exception text so the thrown message
shows the list of detected handlers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c84e61f7-19bf-4f9e-90b9-3b0851f0bfab
📒 Files selected for processing (6)
components/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp_s/CommandLineArguments.cppcomponents/core/src/clp_s/CommandLineArguments.hppcomponents/core/src/clp_s/clp-s.cppcomponents/core/src/clp_s/kv_ir_search.cppcomponents/job-orchestration/job_orchestration/executor/query/fs_search_task.py
💤 Files with no reviewable changes (1)
- components/core/src/clp_s/CMakeLists.txt
| if (output_options_map.size() > 1) { | ||
| throw std::invalid_argument("clp-s only supports one output handler at a time"); | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider clarifying the single-handler constraint.
The error message could be more helpful by listing which handlers were detected. This would aid debugging when users accidentally specify multiple handlers.
💡 Optional enhancement for better error diagnostics
if (output_options_map.size() > 1) {
- throw std::invalid_argument("clp-s only supports one output handler at a time");
+ std::string handlers;
+ for (auto const& [name, _] : output_options_map) {
+ if (false == handlers.empty()) {
+ handlers += ", ";
+ }
+ handlers += name;
+ }
+ throw std::invalid_argument(fmt::format(
+ "clp-s only supports one output handler at a time, but found: {}",
+ handlers
+ ));
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/core/src/clp_s/CommandLineArguments.cpp` around lines 993 - 995,
The thrown std::invalid_argument in CommandLineArguments.cpp currently says
"clp-s only supports one output handler at a time"; change this to include the
actual detected handlers from output_options_map so users see which handlers
were passed. Locate the check using output_options_map (the one that throws when
output_options_map.size() > 1) and build a descriptive message by joining the
map keys / handler names (e.g., iterate output_options_map and collect the keys
into a comma-separated string) and include that string in the exception text so
the thrown message shows the list of detected handlers.
Description
This PR reworks how we parse arguments for output destinations during search in order to make it easy to support having multiple output destinations in a follow-up PR.
Each output destination is treated like a subcommand, e.g.
<output-destination-type> <output-destination-arguments>. Previously, when supporting only one output destination, we would simply collect the unrecognized options after parsing the main search options and hand those options to an options_description specific to the relevant output destination. With multiple output destinations though we need to do a little bit of extra work to figure out which unrecognized options are related to each subcommand.This PR solves that problem by parsing the unrecognized options left to right, using both the known output destination names and the
options_descriptionobject for each subcommand to attempt to find the options for each output destination. After this step the options can be parsed with each destination-specificoptions_descriptionlike normal. We can't simply iteratively use eachoptions_descriptionwithallow_unregisteredto parse the full list of options, because different output destinations share options with the same name (e.g.--hostis used both byreducerandnetwork), so some combinations of output destinations could result in throwing errors because of repeated options. As well, we can't simply rely on splitting up the options for each subcommand based solely on the ouptut destination names, because the output destination names can also appear as arguments (e.g. in the package, because of the way our DNS is set up, a reducer query will look likeclp-s s <archive> <query> reducer --host reducer --port <port> --job-id <id> --count.This PR also refactors the
CommandLineArgumentsclass to group together options for each output destination into structs.This PR includes a small functional change, in that aggregation options are now part of the reducer output destination options. This changes how aggregation options must be specified on the command line (i.e. they explicitly need to be part of the reducer output destination arguments), but doesn't represent any change in functionality since aggregation options are only allowed for the reducer output destination anyway.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Release Notes
Bug Fixes
Refactor