Commit b79994f

Add --radix flag, tracing, strict parsing, and broader test coverage
This change establishes a strong correctness contract: **a clean run with no warnings means every input value was successfully parsed and counted**. It introduces a more flexible `--radix` flag, replaces the old `-x` shortcut, and migrates internal logging to `tracing`.

## Breaking changes

- **Removed `-x, --hex` flag.** Replaced by `--radix <auto|hex|decimal>` (default: `auto`). To preserve old behavior, use `--radix=hex`.
- **`--radix=hex` is strict.** With this flag (or a `0x` prefix), values that fail to parse as hex no longer silently fall back to float -- they warn and contribute `0`.
- **New `--radix=decimal`** mode rejects `0x`-prefixed values entirely, useful when input may legitimately contain `0x`-prefixed strings that are not numbers.
- **Integer overflow now panics in release builds**, not just debug. Uses `checked_add` to make the behavior consistent across profiles.

## Behavior changes (warnings)

Every silently-skipped value now emits a warning, so users can trust that the absence of warnings means a correct sum:

- Failed parses (any non-numeric value) now warn.
- Field index out of range now warns and skips the line.
- Comma stripping (`"1,000"` -> `"1000"`) now warns, surfacing potential locale mismatches (`"1,5"` -> `"15"` is silent no longer).
- Large integers that overflow `i128` and fall back to `f64` now warn about possible precision loss.

## Internal changes

- Replaced `log` + `env_logger` with `tracing` + `tracing-subscriber` (structured logging, better field formatting). `RUST_LOG` continues to work.
- Replaced `std::fs` with `fs-err` for clearer file-open error messages (now includes the path).
- Extracted a self-contained `parse_value(&str, Radix)` function with strict-hex semantics.
- Various readability cleanups: iterator-based reader setup, flattened reader/line loops, `let-else` for field extraction, simplified verbose output.

## Tests

- Added 11 unit tests for `parse_value` covering decimal/hex, integer/float, negatives, scientific notation, hex strictness, invalid input, empty string, and overflow.
- Added integration tests for: single-file/multi-file input, nonexistent files, field 0, out-of-range field, negative integers/floats/mixed, comma-formatted numbers, invalid `0x` prefix, large-integer overflow warning, strict hex mode, `--radix=decimal` mode, empty input.

## Verbose output (`-v`) changes

- `radix=` now shows `Hex`/`Decimal` instead of `16`/`10`.
- Removed `cnt=` field (was unused outside verbose output).
- `err=` now contains the warning message rather than the raw `ParseFloatError` debug string.

## Documentation

- README updated for the new `--radix` flag, accurate warning examples, and the strict-hex contract.
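
As a reading aid for the strict-hex contract described under "Breaking changes", here is a minimal Rust sketch of what a `parse_value(&str, Radix)` function with those semantics could look like. Only the function name and argument types come from the commit message. The `Result<Sum, String>` return type, the error strings, and the local `Radix` and `Sum` definitions are assumptions made for illustration, not the code in this commit. Comma stripping and the precision-loss warning for integers that overflow `i128` are left out to keep it short.

```rust
/// How numeric input is interpreted (variant names mirror the `--radix` values).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Radix {
    Auto,
    Hex,
    Decimal,
}

/// A parsed value: integers stay exact, anything else becomes a float.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Sum {
    Integer(i128),
    Float(f64),
}

/// Hypothetical strict parser; `Err` carries the text a caller would warn with.
fn parse_value(raw: &str, radix: Radix) -> Result<Sum, String> {
    let s = raw.trim();
    let prefixed = s.strip_prefix("0x").or_else(|| s.strip_prefix("0X"));

    match (radix, prefixed) {
        // Decimal mode: a `0x` prefix is never accepted as a number.
        (Radix::Decimal, Some(_)) => Err(format!("rejected 0x-prefixed value {s:?}")),
        // Hex mode, or auto mode with a `0x` prefix: strict hex, no float fallback.
        (Radix::Hex, _) | (Radix::Auto, Some(_)) => {
            let digits = prefixed.unwrap_or(s);
            i128::from_str_radix(digits, 16)
                .map(Sum::Integer)
                .map_err(|_| format!("failed to parse {s:?} as hex"))
        }
        // Everything else: try an exact integer first, then a float.
        _ => s
            .parse::<i128>()
            .map(Sum::Integer)
            .or_else(|_| s.parse::<f64>().map(Sum::Float))
            .map_err(|_| format!("failed to parse {s:?}")),
    }
}

fn main() {
    let cases = [
        ("0x10", Radix::Auto),
        ("0000abcd", Radix::Hex),
        ("2.5", Radix::Auto),
        ("0xZZ", Radix::Auto),
        ("0x10", Radix::Decimal),
    ];
    for (raw, radix) in cases {
        println!("{raw:?} with {radix:?} -> {:?}", parse_value(raw, radix));
    }
}
```

Under this sketch a caller would log the `Err` text as a warning and add `0` to the running sum, which matches the "warn and contribute `0`" behavior the message describes.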
1 parent 8afa30a commit b79994f

5 files changed

Lines changed: 452 additions & 143 deletions

Cargo.toml

Lines changed: 4 additions & 2 deletions
@@ -13,11 +13,13 @@ documentation = "https://docs.rs/crate/sumcol/latest"

 [dependencies]
 clap = { version = "4.4.7", features = ["derive"] }
+fs-err = "2"
 colored = "2.1.0"
-env_logger = "0.10.1"
-log = "0.4.20"
+tracing = "0.1"
+tracing-subscriber = { version = "0.3", features = ["env-filter"] }
 regex = "1.10.2"

 [dev-dependencies]
 assert_cmd = "1"
 predicates = "3"
+tempfile = "3"

README.md

Lines changed: 39 additions & 42 deletions
@@ -6,11 +6,13 @@
 `sumcol` is a simple unix-style command-line tool for summing numbers from a
 column of text. It's a replacement for the tried and true Unix-isms, like `awk
 '{s += $3} END {print s}'` (prints the sum of the numbers in the third
-whitespace delimited column), without all the verbosity.
+whitespace delimited column), without all the verbosity. `sumcol` tries to be
+smart and interpret hex, float, and decimal values automatically, though you
+can force the radix with the `--radix` flag.

 ## Quick Install
 ```console
-$ cargo install sumcol
+$ cargo install --locked sumcol
 ```

 ## Examples
@@ -32,10 +34,10 @@ Arguments:

 Options:
   -f, --field <FIELD> The field to sum. If not specified, uses the full line [default: 0]
-  -x, --hex Treat all numbers as hex, not just those with a leading 0x
+      --radix <RADIX> How to interpret numeric input [default: auto] [possible values: auto, hex, decimal]
   -d, --delimiter <DELIMITER> The regex on which to split fields [default: \s+]
   -v, --verbose Print each number that's being summed, along with some metadata
-  -h, --help Print help
+  -h, --help Print help (see more with '--help')
   -V, --version Print version
 ```

@@ -56,17 +58,20 @@ The size is shown in column -- or field -- number 5 (starting from 1), so we can

 ```console
 $ ls -l | sumcol -f5
+WARN sumcol: Field index out of range, skipping field=5 line="total 48"
 17469
 ```
-Which is equivalent to (but shorter than) the classic awk incantation:
+The warning is from the `total 48` summary line which doesn't have a fifth
+field; it's safely skipped and the sum is still correct. Equivalent to (but
+shorter than) the classic awk incantation:
 ```console
 $ ls -l | awk '{s += $5} END {print s}'
 17469
 ```

 ### Sum all input

-Sometimes you use other tools to extact a column of numbers, in which case you
+Sometimes you use other tools to extract a column of numbers, in which case you
 can still use sumcol with no arguments to simply sum all of the input. Using
 the file listing from above, we could do the following:

@@ -78,10 +83,10 @@ $ ls -l | awk '{print $5}' | sumcol
 ### Summing hex numbers

 Programmers are often dealing with numbers written in hex. Typically in forms
-like `0x123abc` or even simply `0000abcd`. When `sumcol` sees a number starting
-with `0x` it always assumes it's written in hex and parses it accordingly.
-However, a hex number written without that prefix requires that we tell sumcol
-to use hex.
+like `0x123abc` or even simply `0000abcd`. By default, when `sumcol` sees a
+number starting with `0x` it assumes it's written in hex and parses it
+accordingly. However, a hex number written without that prefix requires that we
+tell sumcol to use hex via `--radix=hex`.

 For this example we'll sum the sizes of each section in the compiled `sumcol`
 binary. We can see this information with the `objdump` command.
@@ -161,47 +166,40 @@ LOAD,
 00000148
 ```

-Yuck. That has numbers, and non-numbers. Luckily, `sumcol` will easily handle
-this! It quietly ignores non-numbers treating them as if they're a `0`. So
-let's see what answer we get:
+Yuck. That has numbers, and non-numbers. The numeric values are hex without a
+`0x` prefix, so we need to pass `--radix=hex` to tell `sumcol` to parse them as
+hex. Non-numeric tokens (table headers, comma-separated description tags) will
+emit warnings and be treated as `0`:

 ```console
-$ objdump -h target/release/sumcol | sumcol -f3
-[2023-11-10T21:02:06Z WARN sumcol] Failed to parse "0014c350". Consider using -x
-[2023-11-10T21:02:06Z WARN sumcol] Failed to parse "000003b4". Consider using -x
-[2023-11-10T21:02:06Z WARN sumcol] Failed to parse "0004f458". Consider using -x
-[2023-11-10T21:02:06Z WARN sumcol] Failed to parse "0000cae8". Consider using -x
-[2023-11-10T21:02:06Z WARN sumcol] Failed to parse "000087c8". Consider using -x
-[2023-11-10T21:02:06Z WARN sumcol] Failed to parse "0002e5e0". Consider using -x
-[2023-11-10T21:02:06Z WARN sumcol] Failed to parse "0002c9c0". Consider using -x
-732
+$ objdump -h target/release/sumcol | sumcol -f3 --radix=hex
+WARN sumcol: Failed to parse as hex, treating as 0 clean_str="format"
+WARN sumcol: Field index out of range, skipping field=3 line="Sections:"
+WARN sumcol: Failed to parse as hex, treating as 0 clean_str="Size"
+WARN sumcol: Stripped commas from value original="LOAD," clean="LOAD"
+WARN sumcol: Failed to parse as hex, treating as 0 clean_str="LOAD"
+... (similar warnings for each header and description line) ...
+0x20C3AC
 ```

-Interesting. Sumcol quietly ignores non-numbers like `LOAD` in the above
-example, but here it's warning us that it's seeing strings that _look like_ hex
-numbers but we didn't tell it to parse the numbers as hex. Let's try again
-following the recommendation to use `-x`.
+The warnings here are expected and benign -- `format`, `Size`, `LOAD,`, etc. are
+not hex values and contribute `0` to the sum, so the final answer is correct.

-```console
-$ objdump -h target/release/sumcol | sumcol -f3 -x
-0x20C3AC
-```
-NOTE: If the hex numbers started with a leading `'0x`, `sumcol` would have
-silently parsed them correctly and omitted the warning.
+If the values had been written with a `0x` prefix, `sumcol` would have
+auto-detected them as hex with no flag needed.

 ## Debugging

 If `sumcol` doesn't seem to be working right, feel free to look at the code on
 github (it's pretty straight forward), or run it with the `-v` or `--verbose`
-flag, or even enable the `RUST_LOG=debug` environment variable set. For
-example:
+flag, or run with the `RUST_LOG=debug` environment variable set. For example:

-```console:
+```console
 $ printf "1\n2.5\nOOPS\n3" | sumcol -v
-1 # n=Integer(1) sum=Integer(1) cnt=1 radix=10 raw_str="1"
-2.5 # n=Float(2.5) sum=Float(3.5) cnt=2 radix=10 raw_str="2.5"
-0 # n=Integer(0) sum=Float(3.5) cnt=2 radix=10 raw_str="OOPS" err="ParseFloatError { kind: Invalid }"
-3 # n=Integer(3) sum=Float(6.5) cnt=3 radix=10 raw_str="3"
+1 # n=Integer(1) sum=Integer(1) radix=Decimal raw_str="1"
+2.5 # n=Float(2.5) sum=Float(3.5) radix=Decimal raw_str="2.5"
+0 # n=Integer(0) sum=Float(3.5) radix=Decimal raw_str="OOPS" err="Failed to parse (use --radix=hex if hex), treating as 0"
+3 # n=Integer(3) sum=Float(6.5) radix=Decimal raw_str="3"
 ==
 6.5
 ```
@@ -212,10 +210,9 @@ The metadata that's displayed on each line is
 |------|-------------|
 | `n` | The parsed numeric value |
 | `sum` | The running sum up to and including the current `n` |
-| `cnt` | The running count of _successfully_ parsed numbers. If a number fails to parse and 0 is used instead, it will not be included in `cnt` |
-| `radix` | The radix used when trying to parse the number as an integer |
+| `radix` | The effective radix used when parsing the value (`Hex` or `Decimal`) |
 | `raw_str` | The raw string data that was parsed |
-| `err` | If present, this shows the error from trying to parse the string into a number |
+| `err` | If present, the warning message from a failed parse |

 This should be enough to help you debug the problem you're seeing. However, if
 that's not enough, give it a try with `RUST_LOG=debug`.
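
The field-selection behavior the README describes (field `0` is the whole line, fields count from 1, and a line with too few fields is warned about and skipped) can be sketched with `let-else`. This is an illustration only, not code from the commit: `extract_field` and `sum_column` are hypothetical names, `split_whitespace` stands in for the tool's regex delimiter, warnings are collected in a `Vec` instead of being emitted through `tracing`, and the parse-failure wording is approximate.

```rust
/// Returns the 1-based `field` of `line`, or the whole line when `field` is 0.
fn extract_field(line: &str, field: usize) -> Option<&str> {
    if field == 0 {
        return Some(line.trim());
    }
    // Assumption: whitespace splitting approximates the default `\s+` delimiter.
    line.split_whitespace().nth(field - 1)
}

/// Sums one column of integers and returns the sum plus any warnings raised.
fn sum_column(input: &str, field: usize) -> (i128, Vec<String>) {
    let mut sum = 0i128;
    let mut warnings = Vec::new();

    for line in input.lines() {
        // `let-else` keeps the happy path unindented and skips short lines.
        let Some(raw) = extract_field(line, field) else {
            warnings.push(format!(
                "Field index out of range, skipping field={field} line={line:?}"
            ));
            continue;
        };

        match raw.parse::<i128>() {
            Ok(n) => sum = sum.checked_add(n).expect("integer overflow"),
            // A value that fails to parse contributes 0 but is never silent.
            Err(_) => warnings.push(format!("Failed to parse, treating as 0 raw_str={raw:?}")),
        }
    }

    (sum, warnings)
}

fn main() {
    let listing = "total 48\n-rw-r--r-- 1 u g 100 Jan 1 a\n-rw-r--r-- 1 u g 23 Jan 1 b\n";
    let (sum, warnings) = sum_column(listing, 5);
    for warning in &warnings {
        eprintln!("WARN {warning}");
    }
    println!("{sum}");
}
```

With the sample listing, the `total 48` line has only two fields, so it produces the single out-of-range warning while the two file sizes are still summed.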

src/lib.rs

Lines changed: 21 additions & 1 deletion
@@ -16,7 +16,9 @@ impl Add for Sum {
     /// Adds two Sums. If either is a Float, the result will be a Float.
     fn add(self, other: Self) -> Self {
         match (self, other) {
-            (Sum::Integer(a), Sum::Integer(b)) => Sum::Integer(a + b),
+            (Sum::Integer(a), Sum::Integer(b)) => {
+                Sum::Integer(a.checked_add(b).expect("integer overflow"))
+            }
             (Sum::Float(a), Sum::Float(b)) => Sum::Float(a + b),
             (Sum::Integer(a), Sum::Float(b)) => Sum::Float(a as f64 + b),
             (Sum::Float(a), Sum::Integer(b)) => Sum::Float(a + b as f64),
@@ -78,4 +80,22 @@ mod tests {
         assert_eq!(a + b, Sum::Float(1.2));
         assert_eq!(b + a, Sum::Float(1.2));
     }
+
+    #[test]
+    fn sum_mixed_add_assign_works() {
+        let mut a = Sum::Integer(1);
+        a += Sum::Float(0.2);
+        assert_eq!(a, Sum::Float(1.2));
+
+        let mut b = Sum::Float(0.2);
+        b += Sum::Integer(1);
+        assert_eq!(b, Sum::Float(1.2));
+    }
+
+    #[test]
+    #[should_panic]
+    fn sum_integer_overflow_panics() {
+        let mut a = Sum::Integer(i128::MAX);
+        a += Sum::Integer(1);
+    }
 }
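
For context on why the `+` was replaced: plain integer `+` panics on overflow only when overflow checks are compiled in, which by default is the case for debug builds but not for release builds, where the value wraps. `checked_add` returns `None` on overflow in every profile, so `.expect(...)` panics consistently. A small standalone sketch of the difference (the helper name `add_or_panic` is made up for this example):

```rust
/// Adds two integers, panicking on overflow regardless of build profile.
fn add_or_panic(a: i128, b: i128) -> i128 {
    a.checked_add(b).expect("integer overflow")
}

fn main() {
    // `wrapping_add` shows the value a wrapped addition produces, which is
    // what plain `+` yields when overflow checks are disabled.
    assert_eq!(i128::MAX.wrapping_add(1), i128::MIN);

    // `checked_add` reports overflow as `None` instead of wrapping.
    assert_eq!(i128::MAX.checked_add(1), None);
    assert_eq!(add_or_panic(40, 2), 42);

    // The overflowing call panics; `catch_unwind` lets us observe that here.
    let result = std::panic::catch_unwind(|| add_or_panic(i128::MAX, 1));
    assert!(result.is_err());

    println!("overflow is detected in every build profile");
}
```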
