Commit 2a7a7a9

userFRM and claude authored
fix(flatfiles): reject zero-column and over-width header drift in block decode (#1101)
decode_block silently emitted zero rows for a header declaring zero columns over a non-empty DATA block, and silently clipped rows carrying more fields than the header column count. Both now return a typed decode error, matching the FPSS delta path width guard and the existing mid-row truncation guard so a drifted header fails loud.

Co-authored-by: preview <noreply@anthropic.com>
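
The order of the two guards described above can be sketched on a much simpler format. This is only an illustration under stated assumptions: rows are newline-separated, comma-separated integers rather than the FIT encoding, and `DecodeError` and `decode_rows` are hypothetical names that do not exist in the crate. The existing mid-row truncation guard is not modeled.

```rust
/// Hypothetical error type standing in for the crate's typed decode error.
#[derive(Debug, PartialEq)]
enum DecodeError {
    /// The header declares zero columns but the block still carries data.
    ZeroColumnHeader,
    /// A row carries more fields than the header declares.
    RowWiderThanHeader { expected: usize, found: usize },
    /// A field did not parse as an integer (specific to this text sketch).
    BadField,
}

/// Decode newline-separated rows of comma-separated integers.
/// The text format is an assumption for illustration, not the FIT encoding.
fn decode_rows(
    block: &str,
    n_columns: usize,
    out: &mut Vec<Vec<i32>>,
) -> Result<(), DecodeError> {
    out.clear();
    // An empty block is still a valid "no rows" result, whatever the header says.
    if block.is_empty() {
        return Ok(());
    }
    // Non-empty data under a zero-column header is treated as header drift.
    if n_columns == 0 {
        return Err(DecodeError::ZeroColumnHeader);
    }
    for line in block.lines() {
        let fields: Vec<&str> = line.split(',').collect();
        // Reject a wider row instead of clipping the surplus fields.
        if fields.len() > n_columns {
            return Err(DecodeError::RowWiderThanHeader {
                expected: n_columns,
                found: fields.len(),
            });
        }
        let mut row = Vec::with_capacity(n_columns);
        for field in fields {
            let value = field.trim().parse::<i32>().map_err(|_| DecodeError::BadField)?;
            row.push(value);
        }
        out.push(row);
    }
    Ok(())
}
```

In this sketch, `decode_rows("", 0, &mut out)` still succeeds with no rows, while `decode_rows("12,34", 1, &mut out)` returns the over-width error rather than a one-field row.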
1 parent 47c57ab commit 2a7a7a9

3 files changed

Lines changed: 47 additions & 1 deletion

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - **Projected historical frames keep the trading `date`.** A response whose wire sends one `Timestamp` header split into a time-of-day field and `date` (every EOD and trade/quote/greeks history endpoint) no longer drops `date` from the Arrow / Polars frame, so rows spanning multiple days are distinguishable instead of collapsing to a near-constant time-of-day.
 - **Projected snapshot frames keep the per-row `symbol`.** A multi-symbol snapshot response no longer labels every row with the first row's symbol; the broadcast symbol column is emitted only when the response's `symbol` is provably constant across all rows.
+- **Flat-files block decode fails loud on two drifted-header shapes.** A header declaring zero columns over a non-empty DATA block, and a row carrying more fields than the header's column count, now return a typed decode error instead of silently emitting zero rows or clipping the surplus field. This matches the FPSS delta path's width guard and the existing mid-row truncation guard.
 
 ## [13.0.0-rc.13] - 2026-07-02
 

docs-site/docs/changelog.md

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - **Projected historical frames keep the trading `date`.** A response whose wire sends one `Timestamp` header split into a time-of-day field and `date` (every EOD and trade/quote/greeks history endpoint) no longer drops `date` from the Arrow / Polars frame, so rows spanning multiple days are distinguishable instead of collapsing to a near-constant time-of-day.
 - **Projected snapshot frames keep the per-row `symbol`.** A multi-symbol snapshot response no longer labels every row with the first row's symbol; the broadcast symbol column is emitted only when the response's `symbol` is provably constant across all rows.
+- **Flat-files block decode fails loud on two drifted-header shapes.** A header declaring zero columns over a non-empty DATA block, and a row carrying more fields than the header's column count, now return a typed decode error instead of silently emitting zero rows or clipping the surplus field. This matches the FPSS delta path's width guard and the existing mid-row truncation guard.
 
 ## [13.0.0-rc.13] - 2026-07-02
 

thetadatadx-rs/src/flatfiles/decode.rs

Lines changed: 45 additions & 1 deletion
@@ -32,9 +32,17 @@ pub(crate) fn decode_block(
     out: &mut Vec<Vec<i32>>,
 ) -> Result<(), Error> {
     out.clear();
-    if block.is_empty() || n_columns == 0 {
+    if block.is_empty() {
         return Ok(());
     }
+    if n_columns == 0 {
+        // A zero-column schema over non-empty DATA is a drifted header: every
+        // row would decode to nothing while the block still carries bytes.
+        // Fail loud rather than emit zero rows, matching the truncation guard.
+        return Err(Error::decode_codec(
+            "flatfiles: zero-column header with non-empty data",
+        ));
+    }
 
     let mut reader = FitReader::new(block);
     let mut prev: Vec<i32> = vec![0; n_columns];
@@ -55,6 +63,15 @@ pub(crate) fn decode_block(
                 "flatfiles: FIT block truncated mid-row",
             ));
         }
+        if n > n_columns {
+            // Row carries more fields than the blob's column schema, so the
+            // FIT reader already dropped the surplus into a silent clip. A
+            // wider row is a drifted header; reject it rather than emit a
+            // truncated row, matching the FPSS delta path's width guard.
+            return Err(Error::decode_codec(
+                "flatfiles: FIT row wider than header column count",
+            ));
+        }
         if reader.is_date {
             // DATE marker row — no user-visible data. Vendor's writer
             // skips DATE rows before they reach `toCSV2`, so we do too.
@@ -165,6 +182,33 @@ mod tests {
         );
     }
 
+    #[test]
+    fn zero_column_header_with_data_is_rejected() {
+        // A header claiming zero columns over a non-empty DATA block is a
+        // drifted schema: silently decoding to zero rows would hide the drift.
+        let buf = vec![pack(1, END)];
+        let mut out = Vec::new();
+        let err = decode_block(&buf, 0, &mut out).unwrap_err();
+        assert!(
+            err.to_string().contains("zero-column"),
+            "expected a zero-column decode error, got: {err}"
+        );
+    }
+
+    #[test]
+    fn over_width_row_is_rejected() {
+        // "12,34<END>" carries 2 fields against a 1-column schema: the extra
+        // field would be silently clipped. The FPSS delta path rejects the
+        // same width drift; decode_block must too.
+        let buf = vec![pack(1, 2), pack(FIELD_SEP, 3), pack(4, END)];
+        let mut out = Vec::new();
+        let err = decode_block(&buf, 1, &mut out).unwrap_err();
+        assert!(
+            err.to_string().contains("wider"),
+            "expected an over-width decode error, got: {err}"
+        );
+    }
+
     #[test]
     fn date_marker_is_skipped() {
         // DATE marker row, then absolute "7"
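
The new tests assert on a substring of the error's `Display` output. A minimal sketch of why that works is given here, assuming an error type whose `Display` impl embeds the message passed to its constructor. `SketchError` and `reject_zero_columns` are hypothetical stand-ins, and the `"decode error: "` prefix is invented for the sketch; the crate's real `Error` type and its formatting are not shown in this commit.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's `Error`; only a decode variant is modeled.
#[derive(Debug)]
enum SketchError {
    DecodeCodec(&'static str),
}

impl SketchError {
    /// Mirrors the shape of the `Error::decode_codec(msg)` calls in the diff.
    fn decode_codec(msg: &'static str) -> Self {
        SketchError::DecodeCodec(msg)
    }
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The prefix is an assumption; the message itself is passed through.
            SketchError::DecodeCodec(msg) => write!(f, "decode error: {msg}"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Only the zero-column guard, isolated from the rest of the decoder.
fn reject_zero_columns(block: &[u8], n_columns: usize) -> Result<(), SketchError> {
    if !block.is_empty() && n_columns == 0 {
        return Err(SketchError::decode_codec(
            "flatfiles: zero-column header with non-empty data",
        ));
    }
    Ok(())
}
```

Matching on a message substring keeps a test independent of any prefix the error type adds, at the cost of coupling the test to the message wording.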
