Skip to content

Commit dcdcd61

Browse files
perf: use HashSet<char> and String.exists for adorner detection in TextConversions
Two small improvements to TextConversions.RemoveAdorners: 1. Switch DefaultRemovableAdornerCharacters from F# Set<char> (O(log n) lookup) to System.Collections.Generic.HashSet<char> (O(1) lookup). The set has ~26 characters; while individual operations are fast, this function is called on every cell when parsing CSV/JSON/XML for AsInteger, AsDecimal, AsInteger64. 2. Replace the manual mutable-flag loop with String.exists, which short-circuits on the first adorner match. The previous loop always iterated to the end even after finding an adorner. For values like ',234' the adorner is at index 0, so early exit skips 5 unnecessary iterations. No API change — DefaultRemovableAdornerCharacters is private. The public DefaultNonCurrencyAdorners and DefaultCurrencyAdorners members remain Set<char>. All 2896 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent ef12364 commit dcdcd61

1 file changed

Lines changed: 10 additions & 9 deletions

File tree

src/FSharp.Data.Runtime.Utilities/TextConversions.fs

Lines changed: 10 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
namespace FSharp.Data
66

77
open System
8+
open System.Collections.Generic
89
open System.Globalization
910
open System.Text.RegularExpressions
1011

@@ -85,21 +86,21 @@ type TextConversions private () =
8586
|> Set.ofArray
8687

8788
static member val private DefaultRemovableAdornerCharacters =
88-
Set.union TextConversions.DefaultNonCurrencyAdorners TextConversions.DefaultCurrencyAdorners
89+
// HashSet<char> for O(1) membership testing (vs O(log n) for F# Set)
90+
HashSet<char>(Set.union TextConversions.DefaultNonCurrencyAdorners TextConversions.DefaultCurrencyAdorners)
8991

9092
//This removes any adorners that might otherwise cause the inference to infer string. A notable a change is
9193
//Currency Symbols are now treated as an Adorner like a '%' sign thus are now independent
9294
//of the culture. Which is probably better since we have lots of scenarios where we want to
9395
//consume values prefixed with € or $ but in a different culture.
9496
static member private RemoveAdorners(value: string) =
95-
// Fast path: check if any adorners exist before doing expensive operations
96-
let mutable hasAdorners = false
97-
98-
for i = 0 to value.Length - 1 do
99-
if TextConversions.DefaultRemovableAdornerCharacters.Contains(value.[i]) then
100-
hasAdorners <- true
101-
102-
if not hasAdorners then
97+
// Fast path: short-circuit on first adorner found (String.exists stops early)
98+
if
99+
not (
100+
value
101+
|> String.exists (fun c -> TextConversions.DefaultRemovableAdornerCharacters.Contains(c))
102+
)
103+
then
103104
// No adorners found, return original string to avoid allocation
104105
value
105106
else

0 commit comments

Comments
 (0)