Commit 701f8f8

dedup command
1 parent 2e77b31 commit 701f8f8

2 files changed

Lines changed: 30 additions & 1 deletion

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 - built-in REPL setting `check_badness` to render a badness panel after each parse, available regardless of the active parser plugin.
 - parser plugin settings have an optional flag to indicate that no parser reload is needed, REPL adjusted to accommodate this mechanism.
 - functional pattern `deep`.
-- CLI `stats` command.
+- CLI `stats` and `dedup` commands.
 
 ### Changed
 
src/hyperbase/cli/__init__.py

Lines changed: 29 additions & 0 deletions
@@ -188,6 +188,29 @@ def main() -> None:
         help="Number of histogram bins (default: auto)",
     )
 
+    # --- dedup subcommand --------------------------------------------------
+    dedup_parser = subparsers.add_parser(
+        "dedup",
+        help="Remove duplicate sentences from a JSONL parse-results file",
+    )
+    dedup_parser.add_argument(
+        "file",
+        type=str,
+        help="Path to a .jsonl parse-results file",
+    )
+    dedup_parser.add_argument(
+        "-o",
+        "--output",
+        type=str,
+        default=None,
+        help="Output .jsonl path (required unless --in-place)",
+    )
+    dedup_parser.add_argument(
+        "--in-place",
+        action="store_true",
+        help="Overwrite the input file in place",
+    )
+
     # Dynamically inject parser-specific args, derived from the active
     # parser's ``accepted_params()``. We do this in two passes so that
     # plugin packages stay the source of truth for their CLI surface.
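Review note: the help text for `--output` says it is required unless `--in-place` is given, but the argparse declarations added in this hunk do not enforce that by themselves (`--output` simply defaults to `None`). Presumably `run_dedup` performs the check; that code is not part of this diff. The following is only a minimal sketch of how such a check could look. `build_parser` and `resolve_output_path` are hypothetical names, and rejecting the combination of both options is an assumption, not something the commit states.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Standalone copy of the three dedup arguments added in this hunk."""
    parser = argparse.ArgumentParser(prog="dedup-sketch")
    parser.add_argument("file", type=str)
    parser.add_argument("-o", "--output", type=str, default=None)
    parser.add_argument("--in-place", action="store_true")
    return parser


def resolve_output_path(args: argparse.Namespace) -> str:
    """Hypothetical helper: return the destination path or raise ValueError."""
    # ASSUMPTION: giving both options is treated as a usage error.
    if args.in_place and args.output is not None:
        raise ValueError("use either --output or --in-place, not both")
    if args.in_place:
        # Overwrite the input file.
        return args.file
    if args.output is None:
        raise ValueError("--output is required unless --in-place is given")
    return args.output
```

Doing the check after parsing keeps the argparse declarations exactly as committed. A mutually exclusive group created with `required=True` would be another way to express the same rule at parse time.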
@@ -225,6 +248,12 @@ def main() -> None:
         run_stats(args)
         sys.exit(0)
 
+    if args.command == "dedup":
+        from hyperbase.cli.dedup import run_dedup
+
+        run_dedup(args)
+        sys.exit(0)
+
     if args.command == "repl":
         from hyperbase.cli.repl import run_repl
 
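Review note: the new branch imports `run_dedup` from `hyperbase.cli.dedup`, but that module is not one of the two files changed in this commit, so its behavior cannot be read from this diff. As a rough illustration of what "remove duplicate sentences from a JSONL parse-results file" might involve, here is a sketch that keeps the first occurrence of each sentence. The function names `dedup_records` and `dedup_file` and the `"text"` field are assumptions made for the example, not the project's actual API or schema.

```python
import json
import os
import tempfile
from typing import Iterable, Iterator


def dedup_records(lines: Iterable[str], key: str = "text") -> Iterator[str]:
    """Yield JSONL lines whose `key` value has not been seen before.

    ASSUMPTION: every record is a JSON object holding the sentence as a
    string under `key`; the real field name is not shown in this commit.
    """
    seen: set[str] = set()
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        sentence = json.loads(stripped)[key]
        if sentence in seen:
            continue  # duplicate: keep only the first occurrence
        seen.add(sentence)
        yield stripped


def dedup_file(src: str, dst: str) -> int:
    """Write the deduplicated records of `src` to `dst`; return the count kept.

    Output goes to a temporary file that is renamed over `dst` at the end,
    so `dst` may be the same path as `src` (the --in-place case).
    """
    kept = 0
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".jsonl.tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out, \
                open(src, encoding="utf-8") as inp:
            for record in dedup_records(inp):
                out.write(record + "\n")
                kept += 1
        os.replace(tmp_path, dst)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return kept
```

Under these assumptions, a `run_dedup(args)` wrapper would only need to choose `dst` (either `args.output` or `args.file` when `--in-place` is set) and call `dedup_file(args.file, dst)`. Writing through a temporary file avoids truncating the input while it is still being read.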
