|
| 1 | +(** |
| 2 | +--- |
| 3 | +category: Type Providers |
| 4 | +categoryindex: 1 |
| 5 | +index: 6 |
| 6 | +--- |
| 7 | +*) |
| 8 | +(*** condition: prepare ***) |
| 9 | +#r "../../src/FSharp.Data/bin/Release/netstandard2.0/FSharp.Data.Runtime.Utilities.dll" |
| 10 | +#r "../../src/FSharp.Data/bin/Release/netstandard2.0/FSharp.Data.Csv.Core.dll" |
| 11 | +#r "../../src/FSharp.Data/bin/Release/netstandard2.0/FSharp.Data.Json.Core.dll" |
| 12 | +#r "../../src/FSharp.Data/bin/Release/netstandard2.0/FSharp.Data.Http.dll" |
| 13 | +#r "../../src/FSharp.Data/bin/Release/netstandard2.0/FSharp.Data.dll" |
| 14 | +(*** condition: fsx ***) |
| 15 | +#if FSX |
| 16 | +#r "nuget: FSharp.Data,{{fsdocs-package-version}}" |
| 17 | +#endif |
| 18 | +(*** condition: ipynb ***) |
| 19 | +#if IPYNB |
| 20 | +#r "nuget: FSharp.Data,{{fsdocs-package-version}}" |
| 21 | + |
| 22 | +Formatter.SetPreferredMimeTypesFor(typeof<obj>, "text/plain") |
| 23 | +Formatter.Register(fun (x: obj) (writer: TextWriter) -> fprintfn writer "%120A" x) |
| 24 | +#endif |
| 25 | +(** |
| 26 | +
|
| 27 | +# Type Inference and Missing Values |
| 28 | +
|
| 29 | +This page describes the **type inference rules** used by the FSharp.Data type providers |
| 30 | +([CSV](CsvProvider.html), [JSON](JsonProvider.html), [XML](XmlProvider.html) and [HTML](HtmlProvider.html)). |
| 31 | +Understanding these rules helps you know what F# types to expect for each property, |
| 32 | +and how to handle missing, null, or optional values at runtime. |
| 33 | +
|
| 34 | +## Overview |
| 35 | +
|
| 36 | +All FSharp.Data type providers infer types from a **sample document** (or a list of samples) |
| 37 | +at compile time (design time). The generated F# types reflect the structure of the sample. |
| 38 | +At runtime, any document with a compatible structure can be read — but the generated types |
| 39 | +are fixed by the sample. |
| 40 | +
|
| 41 | +A key principle: **the sample should be representative.** If a property is present in the |
| 42 | +sample but absent from runtime data, accessing it can raise a `KeyNotFoundException`. Conversely, |
| 43 | +if runtime data contains new properties not in the sample, they are not accessible via the |
| 44 | +generated type (though they may still be reachable through the underlying `JsonValue`, |
| 45 | +`XElement`, etc.). |
| 46 | +
|
| 47 | +## Numeric Type Inference |
| 48 | +
|
| 49 | +When inferring numeric types, the providers prefer the narrowest type that can represent |
| 50 | +all values. The preference order (most preferred first) is: |
| 51 | +
|
| 52 | +1. `int` – 32-bit signed integer |
| 53 | +2. `int64` – 64-bit signed integer |
| 54 | +3. `decimal` – exact decimal arithmetic (preferred for financial/monetary values) |
| 55 | +4. `float` – 64-bit floating point (used when `decimal` cannot represent the value, |
| 56 | + or when missing values appear in a CSV column that would otherwise be `decimal`) |
| 57 | +
|
| 58 | +If values in a column or array mix two types, the provider automatically promotes to the |
| 59 | +wider type. For example, a JSON array `[1, 2, 3.14]` will produce `decimal` values. |
| 60 | +*) |
| 61 | + |
| 62 | +open FSharp.Data |
| 63 | + |
| 64 | +// int is inferred when all values are integers |
| 65 | +type IntsOnly = JsonProvider<""" [1, 2, 3] """> |
| 66 | + |
| 67 | +// decimal is inferred when any value has a fractional part |
| 68 | +type WithDecimal = JsonProvider<""" [1, 2, 3.14] """> |
| 69 | + |
| 70 | +(*** include-fsi-merged-output ***) |
| 71 | + |
| 72 | +(** |
| 73 | +## Boolean Inference (CSV) |
| 74 | +
|
| 75 | +In CSV files, columns whose values are exclusively drawn from the set |
| 76 | +`0`, `1`, `Yes`, `No`, `True`, `False` (case-insensitive) are inferred as `bool`. |
| 77 | +A column containing any other value is not inferred as `bool`; it falls back to the other inference rules (typically `string`). |
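A minimal sketch of the boolean rule, assuming a small inline sample. The type name `Flags` and the column names are made up for illustration:

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// "Active" holds only Yes/No, so it should be inferred as bool;
// "Name" holds arbitrary text and stays a string.
type Flags = CsvProvider<"Name,Active\nAlice,Yes\nBob,No">

for row in Flags.GetSample().Rows do
    // The %b format only compiles if row.Active really is a bool
    printfn "%s active: %b" row.Name row.Active
```

Using a typed format specifier such as `%b` is a cheap way to have the compiler confirm the inferred type.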
| 78 | +
|
| 79 | +## Date and Time Inference |
| 80 | +
|
| 81 | +The providers recognise date and time strings in standard ISO 8601 formats: |
| 82 | +
|
| 83 | +| Inferred Type | When Used | Example Value | |
| 84 | +|---|---|---| |
| 85 | +| `DateTime` | Date + time strings (default) | `"2023-06-15T12:00:00"` | |
| 86 | +| `DateTimeOffset` | Date + time + timezone offset | `"2023-06-15T12:00:00+02:00"` | |
| 87 | +| `DateOnly` (.NET 6+) | Date-only strings when `PreferDateOnly=true` | `"2023-06-15"` | |
| 88 | +| `TimeOnly` (.NET 6+) | Time-only strings when `PreferDateOnly=true` | `"12:00:00"` | |
| 89 | +
|
| 90 | +By default (`PreferDateOnly=false`), date-only strings such as `"2023-06-15"` are |
| 91 | +inferred as `DateTime` for backward compatibility. Set `PreferDateOnly=true` on |
| 92 | +.NET 6 and later to infer them as `DateOnly` instead. |
| 93 | +
|
| 94 | +If a column mixes `DateOnly` and `DateTime` values, they are unified to `DateTime`. |
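The first two rows of the table can be checked with a short sketch like the one below. The `Event` type and its property names are hypothetical; the type annotations simply restate what the table predicts, so compilation would fail if inference produced something else:

```fsharp
#r "nuget: FSharp.Data"
open System
open FSharp.Data

// "created" has no offset; "published" carries +02:00
type Event =
    JsonProvider<""" { "created": "2023-06-15T12:00:00",
                       "published": "2023-06-15T12:00:00+02:00" } """>

let ev = Event.GetSample()

// Annotated bindings act as compile-time checks of the inferred types
let created: DateTime = ev.Created
let published: DateTimeOffset = ev.Published

printfn "%d %O" created.Year published.Offset
```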
| 95 | +
|
| 96 | +## Missing Values and Optionals |
| 97 | +
|
| 98 | +This is the most important topic for understanding how the providers behave at runtime. |
| 99 | +The rules differ slightly across providers. |
| 100 | +
|
| 101 | +### JSON Provider |
| 102 | +
|
| 103 | +In JSON, a property can be **absent** from an object, or its value can be **null** (`null` literal). |
| 104 | +Both cases are handled the same way by the JSON type provider: |
| 105 | +
|
| 106 | +- If a property is **missing in some samples**, it is inferred as `option<T>`. |
| 107 | +- If a property has a **null value** in some samples, it is inferred as `option<T>`. |
| 108 | +
|
| 109 | +This means `None` represents either a missing key or a `null` value at runtime. |
| 110 | +*) |
| 111 | + |
| 112 | +// 'age' is missing from the second record → inferred as option<int> |
| 113 | +type People = |
| 114 | + JsonProvider<""" |
| 115 | + [ { "name":"Alice", "age":30 }, |
| 116 | + { "name":"Bob" } ] """> |
| 117 | + |
| 118 | +for person in People.GetSamples() do |
| 119 | + printf "%s" person.Name |
| 120 | + |
| 121 | + match person.Age with |
| 122 | + | Some age -> printfn " (age %d)" age |
| 123 | + | None -> printfn " (age unknown)" |
| 124 | + |
| 125 | +(*** include-fsi-merged-output ***) |
| 126 | + |
| 127 | +(** |
| 128 | +> **Important runtime note:** If a property is present and non-null in *all* samples, it will be |
| 129 | +> inferred as a non-optional type. If such a property is then absent or null in runtime data, |
| 130 | +> accessing it will throw a runtime exception. Use multiple samples (or `SampleIsList=true`) |
| 131 | +> to ensure optional properties are correctly modelled. |
| 132 | +
|
| 133 | +#### Null values in JSON |
| 134 | +
|
| 135 | +A JSON `null` value that appears as the value of a typed property is treated as `None`. |
| 136 | +A `null` value in a heterogeneous context (e.g. an array of numbers and nulls) is |
| 137 | +represented via the `option` mechanism on the generated accessor. |
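As an illustration of the `null` case (as opposed to an absent key), here is a small sketch with an invented `Members` sample in which `age` is always present but sometimes `null`:

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// "age" is present in both records but null in the second one
type Members =
    JsonProvider<""" [ { "name": "Alice", "age": 30 },
                       { "name": "Bob", "age": null } ] """>

for m in Members.GetSamples() do
    // Age is an int option: None stands for the null value
    match m.Age with
    | Some age -> printfn "%s is %d" m.Name age
    | None -> printfn "%s has no age" m.Name
```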
| 138 | +
|
| 139 | +### CSV Provider |
| 140 | +
|
| 141 | +CSV files do not have a native null/missing concept. Instead, certain string values are |
| 142 | +treated as missing. By default, the following strings (case-insensitive) are recognised |
| 143 | +as missing: `NaN`, `NA`, `N/A`, `#N/A`, `:`, `-`, `TBA`, `TBD` (and empty string `""`). |
| 144 | +
|
| 145 | +You can override this list with the `MissingValues` static parameter. |
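A sketch of a custom marker, assuming a data source that writes `unknown` for absent readings. The marker, the `Readings` type, and the column names are all invented for this example:

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Without MissingValues, "unknown" would force the Level column to string
type Readings = CsvProvider<"Sensor,Level\na,10\nb,unknown", MissingValues="unknown">

for row in Readings.GetSample().Rows do
    // Level is expected to be Nullable<int> in the default mode
    if row.Level.HasValue then
        printfn "%s = %d" row.Sensor row.Level.Value
    else
        printfn "%s = missing" row.Sensor
```

Because the parameter replaces the defaults, any default marker you still rely on (for example `N/A`) has to be listed again.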
| 146 | +
|
| 147 | +When a column has at least one missing value, the inferred type changes as follows: |
| 148 | +
|
| 149 | +| Base type | With missing values (default) | With `PreferOptionals=true` | |
| 150 | +|---|---|---| |
| 151 | +| `int` | `Nullable<int>` (`int?`) | `int option` | |
| 152 | +| `int64` | `Nullable<int64>` (`int64?`) | `int64 option` | |
| 153 | +| `decimal` | `float` (using `Double.NaN`) | `float option` | |
| 154 | +| `float` | `float` (using `Double.NaN`) | `float option` | |
| 155 | +| `bool` | `bool option` | `bool option` | |
| 156 | +| `DateTime` | `DateTime option` | `DateTime option` | |
| 157 | +| `DateTimeOffset` | `DateTimeOffset option` | `DateTimeOffset option` | |
| 158 | +| `DateOnly` | `Nullable<DateOnly>` | `DateOnly option` | |
| 159 | +| `Guid` | `Guid option` | `Guid option` | |
| 160 | +| `string` | `string` (empty string `""` for missing) | `string option` | |
| 161 | +
|
| 162 | +The key differences between the default and `PreferOptionals=true`: |
| 163 | +- In the default mode, integers use `Nullable<T>` and decimals are widened to `float` with `Double.NaN`. |
| 164 | +- With `PreferOptionals=true`, **all** types use `T option` and you never get `Double.NaN` or `Nullable<T>`. |
| 165 | +- Strings are never made into `string option` by default (empty string represents missing); use |
| 166 | + `PreferOptionals=true` to get `string option`. |
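The two modes lead to different consuming code. The sketch below uses one made-up sample with an empty `Score` cell and reads it both ways; the type names are illustrative only:

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Same sample, two modes; Bob's Score cell is empty
type DefaultCsv = CsvProvider<"Name,Score\nAlice,10\nBob,">
type OptionCsv = CsvProvider<"Name,Score\nAlice,10\nBob,", PreferOptionals=true>

// Default mode: Score is Nullable<int>
for row in DefaultCsv.GetSample().Rows do
    if row.Score.HasValue then
        printfn "%s: %d" row.Name row.Score.Value
    else
        printfn "%s: no score" row.Name

// PreferOptionals=true: Score is int option
for row in OptionCsv.GetSample().Rows do
    match row.Score with
    | Some score -> printfn "%s: %d" row.Name score
    | None -> printfn "%s: no score" row.Name
```

The `option` form composes with pattern matching and the `Option` module, which is usually the more idiomatic choice in F# code.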
| 167 | +
|
| 168 | +**Design-time safety:** If your sample file contains no missing values in a column, but you know |
| 169 | +that production data may have missing values, set `AssumeMissingValues=true` to force the provider |
| 170 | +to treat all columns as nullable/optional. |
| 171 | +*) |
| 172 | + |
| 173 | +// With AssumeMissingValues=true, all columns become nullable/optional |
| 174 | +// even if the sample has no missing values |
| 175 | +type SafeCsv = CsvProvider<"A,B\n1,2\n3,4", AssumeMissingValues=true> |
| 176 | + |
| 177 | +// With PreferOptionals=true, all columns use 'option' instead of Nullable or NaN |
| 178 | +type OptionalsCsv = CsvProvider<"A,B\n1,2\n3,4", PreferOptionals=true> |
| 179 | + |
| 180 | +(*** include-fsi-merged-output ***) |
| 181 | + |
| 182 | +(** |
| 183 | +
|
| 184 | +### XML Provider |
| 185 | +
|
| 186 | +In XML, values can be missing at the attribute or element level: |
| 187 | +
|
| 188 | +- If an **attribute** is present in some sample elements but absent in others, it is |
| 189 | + inferred as `option<T>`. |
| 190 | +- If a **child element** is present in some samples but not all, it is inferred as optional. |
| 191 | +- If an attribute or element is **never present** in the sample, it cannot be accessed via the |
| 192 | + generated type at all (use `XElement.Attribute(...)` dynamically in that case). |
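A minimal sketch of the child-element rule, assuming a hypothetical `<books>` document in which only the first `<book>` has a `<subtitle>`:

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// <subtitle> appears under the first <book> only
type Library =
    XmlProvider<"""
      <books>
        <book title="A"><subtitle>First</subtitle></book>
        <book title="B" />
      </books> """>

for book in Library.GetSample().Books do
    // Subtitle is expected to be a string option
    match book.Subtitle with
    | Some subtitle -> printfn "%s: %s" book.Title subtitle
    | None -> printfn "%s" book.Title
```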
| 193 | +
|
| 194 | +*) |
| 195 | + |
| 196 | +// 'born' attribute missing from one author → option<int> |
| 197 | +type Authors = |
| 198 | + XmlProvider<""" |
| 199 | + <authors> |
| 200 | + <author name="Karl Popper" born="1902" /> |
| 201 | + <author name="Thomas Kuhn" /> |
| 202 | + </authors> |
| 203 | + """> |
| 204 | + |
| 205 | +let sample = Authors.GetSample() |
| 206 | + |
| 207 | +for author in sample.Authors do |
| 208 | + printf "%s" author.Name |
| 209 | + |
| 210 | + match author.Born with |
| 211 | + | Some year -> printfn " (born %d)" year |
| 212 | + | None -> printfn "" |
| 213 | + |
| 214 | +(*** include-fsi-merged-output ***) |
| 215 | + |
| 216 | +(** |
| 217 | +> **Note:** If an attribute or element is absent from *all* sample data but present at |
| 218 | +> runtime, it cannot be accessed through the generated type. You must include at least |
| 219 | +> one occurrence (possibly with a dummy value) in the sample to have the provider |
| 220 | +> generate an optional property. |
| 221 | +
|
| 222 | +## Heterogeneous Types |
| 223 | +
|
| 224 | +Sometimes a property can hold values of different types. The JSON type provider handles |
| 225 | +this by generating a type with multiple optional accessors — one per observed type. |
| 226 | +*) |
| 227 | + |
| 228 | +// Value can be int or string → generates .Number and .String accessors |
| 229 | +type HetValues = JsonProvider<""" [{"value":94}, {"value":"hello"}] """> |
| 230 | + |
| 231 | +for item in HetValues.GetSamples() do |
| 232 | + match item.Value.Number, item.Value.String with |
| 233 | + | Some n, _ -> printfn "Number: %d" n |
| 234 | + | _, Some s -> printfn "String: %s" s |
| 235 | + | _ -> () |
| 236 | + |
| 237 | +(*** include-fsi-merged-output ***) |
| 238 | + |
| 239 | +(** |
| 240 | +## Design-Time vs Runtime Behaviour |
| 241 | +
|
| 242 | +The type providers perform inference **at compile time** using the sample document. |
| 243 | +At runtime, the actual data is parsed against the inferred schema. This has a few |
| 244 | +important implications: |
| 245 | +
|
| 246 | +1. **Properties that are required at design time may be missing at runtime.** If a |
| 247 | + property is always present and non-null in your sample, the provider generates a |
| 248 | + non-optional accessor. If runtime data omits that property, a `KeyNotFoundException` |
| 249 | + is thrown when you access it. |
| 250 | +
|
| 251 | +2. **New properties in runtime data are ignored.** If runtime JSON has extra keys that |
| 252 | + are not in the sample, those keys are simply not accessible via the generated type. |
| 253 | +
|
| 254 | +3. **The sample should cover the full range of variability.** Include examples of all |
| 255 | + optional properties and heterogeneous value types in your sample. Use `SampleIsList=true` |
| 256 | + for JSON/XML when the root is an array of samples. |
| 257 | +
|
| 258 | +4. **Runtime errors are lazy.** The providers do not validate the entire document on load. |
| 259 | + A missing or mistyped field only causes an error when that specific property is accessed. |
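The points above can be seen together in a short sketch. The `Person` type and the `nickname` key are invented for illustration, and the handler catches a general exception rather than assuming a specific exception type:

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Both properties are present in the sample, so neither accessor is optional
type Person = JsonProvider<""" { "name": "Alice", "age": 30 } """>

// The runtime document omits "age" and adds a key the sample never had
let p = Person.Parse(""" { "name": "Bob", "nickname": "Bobby" } """)

printfn "%s" p.Name // fine: "name" is present

// The failure surfaces only when Age is read, not when Parse is called
try
    printfn "%d" p.Age
with ex ->
    printfn "Reading Age failed: %s" ex.Message

// Extra keys stay reachable through the underlying JsonValue
match p.JsonValue with
| JsonValue.Record props ->
    for (key, value) in props do
        printfn "%s = %O" key value
| _ -> ()
```

If a property is likely to be absent in production data, adding a second sample record without it turns the accessor into an `option` and removes the need for the `try ... with`.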
| 260 | +
|
| 261 | +## Summary of Inference-Control Parameters |
| 262 | +
|
| 263 | +The following static parameters let you override the default inference behaviour: |
| 264 | +
|
| 265 | +| Parameter | Providers | Effect | |
| 266 | +|---|---|---| |
| 267 | +| `PreferOptionals` | CSV, JSON, XML | Use `T option` for all missing/null values instead of `Nullable<T>` or `Double.NaN` | |
| 268 | +| `AssumeMissingValues` | CSV | Treat every column as nullable/optional even if the sample has no missing values | |
| 269 | +| `MissingValues` | CSV | Comma-separated list of strings to recognise as missing (replaces defaults) | |
| 270 | +| `InferRows` | CSV | Number of rows to use for type inference (default 1000; 0 = all rows) | |
| 271 | +| `SampleIsList` | JSON, XML | Treat the top-level array as a list of sample objects, not a single sample | |
| 272 | +| `PreferDateOnly` | CSV, JSON, XML | Infer date-only strings as `DateOnly` on .NET 6+ (default `false`) | |
| 273 | +| `InferenceMode` | JSON, XML | Enable inline schema annotations (`ValuesAndInlineSchemasHints` or `ValuesAndInlineSchemasOverrides`) | |
| 274 | +| `Schema` | CSV | Override column names and/or types directly | |
| 275 | +
|
| 276 | +For full details on each parameter, see the individual provider documentation: |
| 277 | +[CSV](CsvProvider.html) · [JSON](JsonProvider.html) · [XML](XmlProvider.html) · [HTML](HtmlProvider.html) |
| 278 | +*) |