Commit 3ba62bb

Authored by github-actions[bot], Repo Assist, Copilot, and dsyme
[Repo Assist] Fix #1439: InferRows counts CSV rows, not text lines (multiline quoted fields) (#1625)
* Update test/build dependencies: FAKE 6.1.4, NUnit 3.13.3, FsUnit 4.2.0, FsCheck 2.16.6

  - FAKE packages: 6.1.3 → 6.1.4 (patch)
  - NUnit: 3.13.1 → 3.13.3 (patch)
  - FsUnit: 4.0.4 → 4.2.0 (minor)
  - FsCheck: 2.15.1 → 2.16.6 (minor)

  Build: passes (0 errors)
  Tests: all offline tests pass; network tests skip due to sandbox

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix #1439: InferRows now counts CSV rows not text lines for multiline quoted fields

  The maxNumberOfRows text-truncation in parseTextAtDesignTime counted text
  lines using reader.ReadLine(), which broke CSV files where a single data row
  spans multiple text lines due to quoted fields (e.g. "multi-\nline",2).

  Fix: pass None as maxNumberOfRows so the raw text is never pre-truncated.
  Row-count limiting is already handled correctly by InferColumnTypes via
  Seq.truncate inferRows - this has always been the authoritative row limit.

  The performance cost is reading the full sample file as a string; this is
  the same cost as all other providers (XmlProvider, JsonProvider) which also
  pass None here.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Repo Assist <repo-assist@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Don Syme <dsyme@users.noreply.github.com>
Co-authored-by: Don Syme <dsyme@github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent e32127a commit 3ba62bb

3 files changed

Lines changed: 25 additions & 4 deletions

src/FSharp.Data.DesignTime/Csv/CsvProvider.fs

Lines changed: 6 additions & 4 deletions
@@ -202,10 +202,12 @@ type public CsvProvider(cfg: TypeProviderConfig) as this =
       CreateFromTextReaderForSampleList = fun _ -> failwith "Not Applicable"
       CreateFromValue = None }
 
-let maxNumberOfRows = if inferRows > 0 then Some inferRows else None
-
 // On the CsvProvider the schema might be partial and we will still infer from the sample
-// So we handle it in a custom way
+// So we handle it in a custom way.
+// Note: we pass None for maxNumberOfRows so that the raw text is never truncated by
+// line count. Truncating by line is incorrect for CSV because a single data row can span
+// multiple text lines when fields are quoted (see issue #1439). Row limiting during
+// inference is handled correctly by InferColumnTypes via its own inferRows parameter.
 generateType
     "CSV"
     (if sample <> "" then Sample sample else Schema schema)
@@ -216,7 +218,7 @@ type public CsvProvider(cfg: TypeProviderConfig) as this =
     resolutionFolder
     resource
     typeName
-    maxNumberOfRows
+    None
 
 // Add static parameter that specifies the API we want to get (compile-time)
 let parameters =

tests/FSharp.Data.Tests/CsvProvider.fs

Lines changed: 15 additions & 0 deletions
@@ -705,3 +705,18 @@ let ``Can infer from a multiline schema`` () =
     csv.NumberOfColumns |> should equal 16
     firstRow.OrderCreated |> should equal "2022-01-01 10:00:00"
     firstRow.FioFull |> should equal "John Smith"
+
+// Regression test for issue #1439: InferRows must count CSV rows, not text lines.
+// A multiline quoted field occupies 2 text lines but is only 1 data row.
+// With InferRows=2, both data rows should be accessible (the first spans 2 lines).
+type MultilineFieldsCsv = CsvProvider<"Data/MultilineFields.csv", InferRows=2>
+
+[<Test>]
+let ``InferRows counts CSV rows not text lines for multiline quoted fields`` () =
+    let csv = MultilineFieldsCsv.GetSample()
+    let rows = csv.Rows |> Seq.toArray
+    rows.Length |> should equal 2
+    rows.[0].F1 |> should equal "multi-\nline field"
+    rows.[0].F2 |> should equal 2
+    rows.[1].F1 |> should equal "normal"
+    rows.[1].F2 |> should equal 3
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+f1,f2
+"multi-
+line field",2
+normal,3
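
The mismatch described in the commit message can be reproduced without the type provider. The sketch below is illustrative only and is not code from FSharp.Data: countTextLines and countCsvRecords are hypothetical helpers, and the record counter is a deliberately minimal quote-aware scan (it assumes double-quote quoting and ignores other CSV dialect options). It runs the sample file's contents through both counters to show that a ReadLine-based count and a record count disagree.

```fsharp
open System.IO

// Contents of the 4-line sample file added in this commit:
// one header row and two data rows, the first of which spans two text lines.
let sample = "f1,f2\n\"multi-\nline field\",2\nnormal,3\n"

// Hypothetical helper: counts physical text lines, which is what any
// ReadLine-based truncation sees.
let countTextLines (text: string) =
    use reader = new StringReader(text)
    let mutable n = 0
    while not (isNull (reader.ReadLine())) do
        n <- n + 1
    n

// Hypothetical helper: counts CSV records by tracking whether the scan is
// inside a quoted field. A newline inside quotes does not end a record.
// An escaped quote ("") toggles the flag twice, so the state stays correct.
let countCsvRecords (text: string) =
    let mutable inQuotes = false
    let mutable pending = false // the current record has at least one character
    let mutable records = 0
    for c in text do
        match c with
        | '"' ->
            inQuotes <- not inQuotes
            pending <- true
        | '\n' when not inQuotes ->
            if pending then records <- records + 1
            pending <- false
        | '\r' when not inQuotes -> ()
        | _ -> pending <- true
    // Count a final record that is not terminated by a newline.
    if pending then records <- records + 1
    records

printfn "%d text lines, %d CSV records" (countTextLines sample) (countCsvRecords sample)
// prints "4 text lines, 3 CSV records"
```

The three records are the header plus the two data rows the regression test expects, whereas any limit expressed in text lines counts the quoted field twice.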
