Commit 2a526c1

thisisnic and jonkeane authored
GH-35806: [R] Improve error message for null type inference with sparse CSV data (#49338)
### Rationale for this change

When reading a CSV with sparse data (many missing values followed by actual values), Arrow can infer a column type as `null` based on the first block of data. When non-null values appear later, the error message incorrectly suggests using `skip = 1` for header rows, which is misleading.

### What changes are included in this PR?

Adds a specific check for "conversion error to null" that provides a helpful message explaining the cause (type inference from sparse data) and the solution (change the block size to use for inference).

### Are these changes tested?

Yes, added a test in `test-dataset-csv.R`.

### Are there any user-facing changes?

Yes, improved error message when CSV type inference fails due to sparse data.

---

This PR was authored by Claude (Opus 4.5) and reviewed by @thisisnic.

🤖 Generated with [Claude Code](https://claude.ai/code)

* GitHub Issue: #35806

Lead-authored-by: Nic Crane <thisisnic@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
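The two remedies named in the new message can be sketched from the user's side as follows. This is a minimal illustration, not part of the commit: the file layout mirrors the added test, the block size of 1 MiB is an arbitrary assumption chosen only to be larger than the file, and the column types in the explicit schema are assumed from the sample data.

```r
library(arrow)
library(dplyr) # for collect()

# Sparse CSV: column y is empty for the first 100 rows, then has a value.
tf <- tempfile()
writeLines(c("x,y", paste0(1:100, ","), "101,foo"), tf)

# Remedy 1: use a block size large enough that type inference
# sees the non-missing value (1 MiB here is an assumed, arbitrary size).
res_block <- open_dataset(
  tf,
  format = "csv",
  read_options = csv_read_options(block_size = 1048576L)
) |>
  collect()

# Remedy 2: skip inference by supplying the schema explicitly
# (types assumed from the sample data). Because the file has a
# header row, `skip = 1` keeps it from being read as data.
res_schema <- open_dataset(
  tf,
  format = "csv",
  schema = schema(x = int64(), y = string()),
  skip = 1
) |>
  collect()

unlink(tf)
```

Supplying a schema is the more robust choice when the types are known, since it does not depend on where the first non-missing value happens to fall in the file.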
1 parent 80db102 commit 2a526c1

2 files changed

Lines changed: 36 additions & 0 deletions


r/R/util.R

Lines changed: 18 additions & 0 deletions
@@ -196,6 +196,20 @@ repeat_value_as_array <- function(object, n) {
 }

 handle_csv_read_error <- function(msg, call, schema) {
+  # Dataset collection passes empty schema() when no explicit
+  # CSV schema from the original call is available in this error path.
+  if (grepl("conversion error to null", msg) && is_empty_schema(schema)) {
+    msg <- c(
+      msg,
+      i = paste(
+        "If you have not specified the schema, this error may be due to the column type being",
+        "inferred as `null` because the first block of data contained only missing values.",
+        "See `?csv_read_options` for how to set a larger value or specify a schema if you know the correct types."
+      )
+    )
+    abort(msg, call = call)
+  }
+
   if (grepl("conversion error", msg) && inherits(schema, "Schema")) {
     msg <- c(
       msg,
@@ -290,3 +304,7 @@ col_type_from_compact <- function(x, y) {
     abort(paste0("Unsupported compact specification: '", x, "' for column '", y, "'"))
   )
 }
+
+is_empty_schema <- function(x) {
+  x == schema()
+}

r/tests/testthat/test-dataset-csv.R

Lines changed: 18 additions & 0 deletions
@@ -711,3 +711,21 @@ test_that("open_dataset() with `decimal_point` argument", {
     tibble(x = 1.2, y = "c")
   )
 })
+
+test_that("more informative error when column inferred as null due to sparse data (GH-35806)", {
+  tf <- tempfile()
+  on.exit(unlink(tf))
+
+  writeLines(c("x,y", paste0(1:100, ",")), tf)
+  write("101,foo", tf, append = TRUE)
+
+  expect_error(
+    open_dataset(
+      tf,
+      format = "csv",
+      read_options = csv_read_options(block_size = 100L)
+    ) |>
+      collect(),
+    "column type being inferred as"
+  )
+})
