Commit e9bc521

github-actions[bot], Repo Assist, and Copilot authored
[Repo Assist] Add TypeInference documentation page for type inference and missing values (#1669)
* Add TypeInference.fsx documentation page for type inference and missing values

  - Add comprehensive docs/library/TypeInference.fsx covering:
    - Numeric type hierarchy (int -> int64 -> decimal -> float)
    - Boolean inference in CSV
    - Date/time type inference rules
    - Missing values in CSV (NaN, Nullable, option) with comparison table
    - null/missing properties in JSON -> option<T>
    - Missing attributes/elements in XML -> option<T>
    - Heterogeneous types
    - Design-time vs runtime behaviour
    - Summary table of inference-control parameters
  - Add link from docs/index.md to new TypeInference page

  Closes #347

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ci: trigger CI checks

---------

Co-authored-by: Repo Assist <repo-assist@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent f019838 commit e9bc521

2 files changed

Lines changed: 279 additions & 0 deletions

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ to provide easy to use type-safe access to documents that follow the same struct
 * [HTML Type Provider](library/HtmlProvider.html) - discusses the `HtmlProvider<...>` type
 * [JSON Type Provider](library/JsonProvider.html) - discusses the `JsonProvider<..>` type
 * [XML Type Provider](library/XmlProvider.html) - discusses the `XmlProvider<..>` type
+* [Type Inference and Missing Values](library/TypeInference.html) - explains type inference rules, how missing/null values map to F# types, and how to control inference behaviour
 
 The package also contains a type provider for accessing data from
 [the WorldBank](library/WorldBank.html).

docs/library/TypeInference.fsx

Lines changed: 278 additions & 0 deletions
@@ -0,0 +1,278 @@
(**
---
category: Type Providers
categoryindex: 1
index: 6
---
*)
(*** condition: prepare ***)
#r "../../src/FSharp.Data/bin/Release/netstandard2.0/FSharp.Data.Runtime.Utilities.dll"
#r "../../src/FSharp.Data/bin/Release/netstandard2.0/FSharp.Data.Csv.Core.dll"
#r "../../src/FSharp.Data/bin/Release/netstandard2.0/FSharp.Data.Json.Core.dll"
#r "../../src/FSharp.Data/bin/Release/netstandard2.0/FSharp.Data.Http.dll"
#r "../../src/FSharp.Data/bin/Release/netstandard2.0/FSharp.Data.dll"
(*** condition: fsx ***)
#if FSX
#r "nuget: FSharp.Data,{{fsdocs-package-version}}"
#endif
(*** condition: ipynb ***)
#if IPYNB
#r "nuget: FSharp.Data,{{fsdocs-package-version}}"

Formatter.SetPreferredMimeTypesFor(typeof<obj>, "text/plain")
Formatter.Register(fun (x: obj) (writer: TextWriter) -> fprintfn writer "%120A" x)
#endif
(**

# Type Inference and Missing Values

This page describes the **type inference rules** used by the FSharp.Data type providers
([CSV](CsvProvider.html), [JSON](JsonProvider.html), [XML](XmlProvider.html) and [HTML](HtmlProvider.html)).
Understanding these rules helps you know what F# types to expect for each property,
and how to handle missing, null, or optional values at runtime.

## Overview

All FSharp.Data type providers infer types from a **sample document** (or a list of samples)
at compile time (design time). The generated F# types reflect the structure of the sample.
At runtime, any document with a compatible structure can be read, but the generated types
are fixed by the sample.

A key principle: **the sample should be representative.** If a property is present in the
sample but absent from runtime data, it can raise a `KeyNotFoundException`. Conversely,
if runtime data contains new properties not in the sample, they are not accessible via the
generated type (though they may still be reachable through the underlying `JsonValue`,
`XElement`, etc.).

## Numeric Type Inference

When inferring numeric types, the providers prefer the narrowest type that can represent
all values. The preference order (most preferred first) is:

1. `int` – 32-bit signed integer
2. `int64` – 64-bit signed integer
3. `decimal` – exact decimal arithmetic (preferred for financial/monetary values)
4. `float` – 64-bit floating point (used when `decimal` cannot represent the value,
   or when missing values appear in a CSV column that would otherwise be `decimal`)

If values in a column or array mix two types, the provider automatically promotes to the
wider type. For example, a JSON array `[1, 2, 3.14]` will produce `decimal` values.
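
The same ordering applies to CSV columns. The following is a minimal sketch, assuming an
inline CSV sample; the type name `Widths` and the column names are hypothetical and exist
only for illustration. The type annotations compile only if inference picked the stated type.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical sample: one column per step of the numeric hierarchy.
type Widths = CsvProvider<"Small,Big,Price\n1,3000000000,1.5\n2,4000000000,2.25">

let widthsRow = Widths.GetSample().Rows |> Seq.head

// Each annotation is checked by the compiler against the inferred column type.
let small: int = widthsRow.Small      // fits in 32 bits
let big: int64 = widthsRow.Big        // too large for int, so int64
let price: decimal = widthsRow.Price  // has a fractional part, so decimal

printfn "%d %d %M" small big price
```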
*)

open FSharp.Data

// int is inferred when all values are integers
type IntsOnly = JsonProvider<""" [1, 2, 3] """>

// decimal is inferred when any value has a fractional part
type WithDecimal = JsonProvider<""" [1, 2, 3.14] """>

(*** include-fsi-merged-output ***)

(**
## Boolean Inference (CSV)

In CSV files, columns whose values are exclusively drawn from the set
`0`, `1`, `Yes`, `No`, `True`, `False` (case-insensitive) are inferred as `bool`.
Any other values in the column cause it to be treated as a string.
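
As a rough illustration of the rule above, the sketch below uses a hypothetical inline
sample in which one column stays inside the boolean set and another does not. The type
name `Flags` and the column names are assumptions made for this example.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical sample: Active only uses Yes/No, Status also contains "Maybe".
type Flags = CsvProvider<"Name,Active,Status\nAlice,Yes,Yes\nBob,No,Maybe">

let flagRows = Flags.GetSample().Rows |> Seq.toArray

for row in flagRows do
    let active: bool = row.Active      // every value is in the boolean set
    let status: string = row.Status    // "Maybe" forces the column to string
    printfn "%s active=%b status=%s" row.Name active status
```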

## Date and Time Inference

The providers recognise date and time strings in standard ISO 8601 formats:

| Inferred Type | When Used | Example Value |
|---|---|---|
| `DateTime` | Date + time strings (default) | `"2023-06-15T12:00:00"` |
| `DateTimeOffset` | Date + time + timezone offset | `"2023-06-15T12:00:00+02:00"` |
| `DateOnly` (.NET 6+) | Date-only strings when `PreferDateOnly=true` | `"2023-06-15"` |
| `TimeOnly` (.NET 6+) | Time-only strings when `PreferDateOnly=true` | `"12:00:00"` |

By default (`PreferDateOnly = false`), date-only strings such as `"2023-06-15"` are
inferred as `DateTime` for backward compatibility. Set `PreferDateOnly = true` on
.NET 6 and later to infer them as `DateOnly` instead.

If a column mixes `DateOnly` and `DateTime` values, they are unified to `DateTime`.
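
The default behaviour (without `PreferDateOnly`) can be sketched with a small JSON sample.
The type name `Event` and the property names are hypothetical; the example values are the
ones from the table above.

```fsharp
#r "nuget: FSharp.Data"
open System
open FSharp.Data

// Hypothetical sample with one value per date/time shape.
type Event =
    JsonProvider<""" { "local": "2023-06-15T12:00:00",
                       "zoned": "2023-06-15T12:00:00+02:00",
                       "day":   "2023-06-15" } """>

let ev = Event.GetSample()

// The annotations compile only if inference picked these types.
let local: DateTime = ev.Local
let zoned: DateTimeOffset = ev.Zoned
let day: DateTime = ev.Day   // date-only string, DateTime by default

printfn "%d %O %d" local.Hour zoned.Offset day.Day
```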

## Missing Values and Optionals

This is the most important topic for understanding how the providers behave at runtime.
The rules differ slightly across providers.

### JSON Provider

In JSON, a property can be **absent** from an object, or its value can be **null** (`null` literal).
Both cases are handled the same way by the JSON type provider:

- If a property is **missing in some samples**, it is inferred as `option<T>`.
- If a property has a **null value** in some samples, it is inferred as `option<T>`.

This means `None` represents either a missing key or a `null` value at runtime.
*)

// 'age' is missing from the second record → inferred as option<int>
type People =
    JsonProvider<"""
      [ { "name":"Alice", "age":30 },
        { "name":"Bob" } ] """>

for person in People.GetSamples() do
    printf "%s" person.Name

    match person.Age with
    | Some age -> printfn " (age %d)" age
    | None -> printfn " (age unknown)"

(*** include-fsi-merged-output ***)

(**
> **Important runtime note:** If a property is present and non-null in *all* samples, it will be
> inferred as a non-optional type. If such a property is then absent or null in runtime data,
> accessing it will throw a runtime exception. Use multiple samples (or `SampleIsList=true`)
> to ensure optional properties are correctly modelled.
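
One way to follow that advice is sketched below, assuming the runtime documents are single
objects rather than arrays. The type name `Person` and the runtime document are hypothetical.
With `SampleIsList=true`, each array element is treated as a separate sample, so `Parse`
accepts one object.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Two samples: one with "age", one without, so Age is modelled as int option.
type Person =
    JsonProvider<""" [ { "name": "Alice", "age": 30 },
                       { "name": "Bob" } ] """, SampleIsList = true>

// Hypothetical runtime document that omits "age".
let carol = Person.Parse(""" { "name": "Carol" } """)

match carol.Age with
| Some age -> printfn "%s is %d" carol.Name age
| None -> printfn "%s has no recorded age" carol.Name
```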

#### Null values in JSON

A JSON `null` value that appears as the value of a typed property is treated as `None`.
A `null` value in a heterogeneous context (e.g. an array of numbers and nulls) is
represented via the `option` mechanism on the generated accessor.
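
A short sketch of the first case, using hypothetical sensor data: an explicit `null` and a
missing key both surface as `None` on the same accessor. The type name `Reading` and the
property names are assumptions made for this example.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// "value" is a number in one sample and null in the other.
type Reading =
    JsonProvider<""" [ { "sensor": "a", "value": 1.5 },
                       { "sensor": "b", "value": null } ] """, SampleIsList = true>

let explicitNull = Reading.Parse(""" { "sensor": "c", "value": null } """)
let missingKey = Reading.Parse(""" { "sensor": "d" } """)
let present = Reading.Parse(""" { "sensor": "e", "value": 2.5 } """)

// The annotation compiles only if the accessor is a decimal option.
let fromNull: decimal option = explicitNull.Value
let fromMissing: decimal option = missingKey.Value

printfn "%A %A %A" fromNull fromMissing present.Value
```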

### CSV Provider

CSV files do not have a native null/missing concept. Instead, certain string values are
treated as missing. By default, the following strings (case-insensitive) are recognised
as missing: `NaN`, `NA`, `N/A`, `#N/A`, `:`, `-`, `TBA`, `TBD` (and empty string `""`).

You can override this list with the `MissingValues` static parameter.
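
A minimal sketch of such an override, assuming a data source that writes `unknown` for
absent scores. The type name `Scores` and the marker string are hypothetical, and
`PreferOptionals=true` is added here only so that the missing cell shows up as `None`.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// "unknown" is declared as the marker for a missing value.
type Scores =
    CsvProvider<"Id,Score\n1,10\n2,unknown",
                MissingValues = "unknown",
                PreferOptionals = true>

let scoreRows = Scores.GetSample().Rows |> Seq.toArray

for row in scoreRows do
    let score: int option = row.Score
    match score with
    | Some s -> printfn "%d -> %d" row.Id s
    | None -> printfn "%d -> missing" row.Id
```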
146+
147+
When a column has at least one missing value, the inferred type changes as follows:
148+
149+
| Base type | With missing values (default) | With `PreferOptionals=true` |
150+
|---|---|---|
151+
| `int` | `Nullable<int>` (`int?`) | `int option` |
152+
| `int64` | `Nullable<int64>` (`int64?`) | `int64 option` |
153+
| `decimal` | `float` (using `Double.NaN`) | `float option` |
154+
| `float` | `float` (using `Double.NaN`) | `float option` |
155+
| `bool` | `bool option` | `bool option` |
156+
| `DateTime` | `DateTime option` | `DateTime option` |
157+
| `DateTimeOffset` | `DateTimeOffset option` | `DateTimeOffset option` |
158+
| `DateOnly` | `Nullable<DateOnly>` | `DateOnly option` |
159+
| `Guid` | `Guid option` | `Guid option` |
160+
| `string` | `string` (empty string `""` for missing) | `string option` |
161+
162+
The key differences between the default and `PreferOptionals=true`:
163+
- In the default mode, integers use `Nullable<T>` and decimals are widened to `float` with `Double.NaN`.
164+
- With `PreferOptionals=true`, **all** types use `T option` and you never get `Double.NaN` or `Nullable<T>`.
165+
- Strings are never made into `string option` by default (empty string represents missing); use
166+
`PreferOptionals=true` to get `string option`.
167+
168+
**Design-time safety:** If your sample file contains no missing values in a column, but you know
169+
that production data may have missing values, set `AssumeMissingValues=true` to force the provider
170+
to treat all columns as nullable/optional.
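
The first rows of the table can be sketched with a sample that really contains missing
cells. The literal `MissingSample` and the type names `Defaults` and `Optionals` are
hypothetical. Only the `int` and `decimal` rows are exercised here.

```fsharp
#r "nuget: FSharp.Data"
open System
open FSharp.Data

// Second row: Score is empty and Price is NaN, so both columns have a missing value.
[<Literal>]
let MissingSample = "Id,Score,Price\n1,10,1.5\n2,,NaN"

type Defaults = CsvProvider<MissingSample>
type Optionals = CsvProvider<MissingSample, PreferOptionals = true>

// Default mode: Nullable<int> for the int column, float with NaN for the decimal column.
let d = Defaults.GetSample().Rows |> Seq.last
let score: Nullable<int> = d.Score
let price: float = d.Price
printfn "has score: %b, price is NaN: %b" score.HasValue (Double.IsNaN price)

// PreferOptionals mode: the int column becomes int option.
let o = Optionals.GetSample().Rows |> Seq.last
let optScore: int option = o.Score
printfn "%A" optScore
```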
*)

// With AssumeMissingValues=true, all columns become nullable/optional
// even if the sample has no missing values
type SafeCsv = CsvProvider<"A,B\n1,2\n3,4", AssumeMissingValues=true>

// With PreferOptionals=true, columns that have missing values use 'option'
// instead of Nullable or NaN (this particular sample has no missing values)
type OptionalsCsv = CsvProvider<"A,B\n1,2\n3,4", PreferOptionals=true>

(*** include-fsi-merged-output ***)

(**

### XML Provider

In XML, values can be missing at the attribute or element level:

- If an **attribute** is present in some sample elements but absent in others, it is
  inferred as `option<T>`.
- If a **child element** is present in some samples but not all, it is inferred as optional.
- If an attribute or element is **never present** in the sample, it cannot be accessed via the
  generated type at all (use `XElement.Attribute(...)` dynamically in that case).

*)

// 'born' attribute missing from one author → option<int>
type Authors =
    XmlProvider<"""
    <authors>
      <author name="Karl Popper" born="1902" />
      <author name="Thomas Kuhn" />
    </authors>
    """>

let sample = Authors.GetSample()

for author in sample.Authors do
    printf "%s" author.Name

    match author.Born with
    | Some year -> printfn " (born %d)" year
    | None -> printfn ""

(*** include-fsi-merged-output ***)

(**
> **Note:** If an attribute or element is absent from *all* sample data but present at
> runtime, it cannot be accessed through the generated type. You must include at least
> one occurrence (possibly with a dummy value) in the sample to have the provider
> generate an optional property.
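
When changing the sample is not an option, the underlying `XElement` can still be read
dynamically. The sketch below assumes a hypothetical `died` attribute that appears only in
runtime data; the type name `Catalog` is also an assumption made for this example.

```fsharp
#r "nuget: FSharp.Data"
open System.Xml.Linq
open FSharp.Data

type Catalog =
    XmlProvider<"""
    <authors>
      <author name="Karl Popper" born="1902" />
      <author name="Thomas Kuhn" />
    </authors>
    """>

// Runtime document with an attribute the sample never mentions.
let runtimeDoc =
    Catalog.Parse("""<authors><author name="Karl Popper" born="1902" died="1994" /></authors>""")

let diedValues =
    [ for author in runtimeDoc.Authors do
        // There is no generated Died property, so go through the XElement.
        let died = author.XElement.Attribute(XName.Get "died")
        if not (isNull died) then
            yield author.Name, died.Value ]

for (name, died) in diedValues do
    printfn "%s (died %s)" name died
```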

## Heterogeneous Types

Sometimes a property can hold values of different types. The JSON type provider handles
this by generating a type with multiple optional accessors, one per observed type.
*)

// Value can be int or string → generates .Number and .String accessors
type HetValues = JsonProvider<""" [{"value":94}, {"value":"hello"}] """>

for item in HetValues.GetSamples() do
    match item.Value.Number, item.Value.String with
    | Some n, _ -> printfn "Number: %d" n
    | _, Some s -> printfn "String: %s" s
    | _ -> ()

(*** include-fsi-merged-output ***)

(**
## Design-Time vs Runtime Behaviour

The type providers perform inference **at compile time** using the sample document.
At runtime, the actual data is parsed against the inferred schema. This has a few
important implications:

1. **Properties that are required at design-time may be missing at runtime.** If a
   property is always present and non-null in your sample, the provider generates a
   non-optional accessor. If runtime data omits that property, a `KeyNotFoundException`
   is thrown when you access it.

2. **New properties in runtime data are ignored.** If runtime JSON has extra keys that
   are not in the sample, those keys are simply not accessible via the generated type.

3. **The sample should cover the full range of variability.** Include examples of all
   optional properties and heterogeneous value types in your sample. Use `SampleIsList=true`
   for JSON/XML when the root is an array of samples.

4. **Runtime errors are lazy.** The providers do not validate the entire document on load.
   A missing or mistyped field only causes an error when that specific property is accessed.
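
Points 1, 2 and 4 can be sketched together. The type name `Strict`, the runtime document and
the `nickname` key are hypothetical. The handler catches any exception rather than assuming
a specific exception type.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// "age" is present in the only sample, so Age is a non-optional int.
type Strict = JsonProvider<""" { "name": "Alice", "age": 30 } """>

// Loading succeeds even though "age" is missing and "nickname" is new.
let strictDoc = Strict.Parse(""" { "name": "Bob", "nickname": "Bobby" } """)
printfn "%s" strictDoc.Name

// The problem only surfaces when the missing property is read.
let ageResult =
    try
        Ok strictDoc.Age
    with ex ->
        Error ex.Message

printfn "%A" ageResult

// Keys outside the sample stay reachable through the underlying JsonValue.
let nickname =
    match strictDoc.JsonValue with
    | JsonValue.Record props ->
        props
        |> Array.tryPick (fun (key, value) ->
            match key, value with
            | "nickname", JsonValue.String s -> Some s
            | _ -> None)
    | _ -> None

printfn "%A" nickname
```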
260+
261+
## Summary of Inference-Control Parameters
262+
263+
The following static parameters let you override the default inference behaviour:
264+
265+
| Parameter | Providers | Effect |
266+
|---|---|---|
267+
| `PreferOptionals` | CSV, JSON, XML | Use `T option` for all missing/null values instead of `Nullable<T>` or `Double.NaN` |
268+
| `AssumeMissingValues` | CSV | Treat every column as nullable/optional even if the sample has no missing values |
269+
| `MissingValues` | CSV | Comma-separated list of strings to recognise as missing (replaces defaults) |
270+
| `InferRows` | CSV | Number of rows to use for type inference (default 1000; 0 = all rows) |
271+
| `SampleIsList` | JSON, XML | Treat the top-level array as a list of sample objects, not a single sample |
272+
| `PreferDateOnly` | CSV, JSON, XML | Infer date-only strings as `DateOnly` on .NET 6+ (default `false`) |
273+
| `InferenceMode` | JSON, XML | Enable inline schema annotations (`ValuesAndInlineSchemasHints` or `ValuesAndInlineSchemasOverrides`) |
274+
| `Schema` | CSV | Override column names and/or types directly |
275+
276+
For full details on each parameter, see the individual provider documentation:
277+
[CSV](CsvProvider.html) · [JSON](JsonProvider.html) · [XML](XmlProvider.html) · [HTML](HtmlProvider.html)
278+
*)
