CHANGELOG: 55 additions, 0 deletions
@@ -1,3 +1,58 @@
1.23.1 --> 1.23.2
=================
Performance: lexer/parser internals are now Text-native (Phase 1).
The Syntax AST stays String-valued, so this is a non-breaking change
for AST consumers. Phase 2 (Text-valued AST + lazy String compat
shim) is a separate later change.

What changed:
* Lexer input tape, all Token payloads, the keyword/operator/pragma
lookup tables, and the numeric/identifier/escape/raw-pragma lex
workers operate on Data.Text directly.
* Strict 'discard' and a single-pass 'lexWhileT' (using T.span +
foldl' for line/column) replace the previous lazy formulations,
removing a per-character thunk chain in the tokenizer hot loop.
* Token payload fields are strict (!Text), so the [Char]-cons
accumulator inside lexString is materialized to a strict Text and
freed at token yield rather than being kept alive by the token
stream.
* lexString uses a scan-and-splice strategy: each plain run is
sliced from the input Text via T.span, and only the parsed
escape characters allocate new Text records. Per-token
allocation is O(escapes), not O(chars).
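The hot-loop techniques described above can be sketched in isolation:

* a single-pass span-and-fold for line/column tracking;
* strict Text token payloads;
* scan-and-splice string lexing.

This is a minimal illustration under simplifying assumptions, not the library's code. `Pos`, `step`, `Token`, `lexStringBody` and `decodeEscape` are hypothetical names, and the `lexWhileT` signature shown is assumed. Tabs count as one column, and only four escapes are decoded.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Hypothetical sketch: names and signatures are illustrative only and do
-- not reproduce haskell-src-exts internals.
import qualified Data.Text as T
import Data.Text (Text)

-- Source position (line, column), 1-based; strict fields avoid thunks.
data Pos = Pos !Int !Int deriving (Eq, Show)

-- Strict payloads: each Text is evaluated when the token is built,
-- instead of the token holding on to a suspended computation.
data Token = StringTok !Text | VarIdTok !Text deriving (Eq, Show)

-- Advance the position over one character (tabs count as one column here).
step :: Pos -> Char -> Pos
step (Pos l _) '\n' = Pos (l + 1) 1
step (Pos l c) _    = Pos l (c + 1)

-- Single pass: T.span splits off the longest prefix satisfying p, then a
-- strict left fold over that prefix computes the new position.
lexWhileT :: (Char -> Bool) -> Pos -> Text -> (Text, Pos, Text)
lexWhileT p pos input =
  let (tok, rest) = T.span p input
      !pos'       = T.foldl' step pos tok
  in (tok, pos', rest)

-- Decode the character after a backslash (assumption: only four escapes).
decodeEscape :: Char -> Maybe Char
decodeEscape 'n'  = Just '\n'
decodeEscape 't'  = Just '\t'
decodeEscape '\\' = Just '\\'
decodeEscape '"'  = Just '"'
decodeEscape _    = Nothing

-- Scan-and-splice: lex a string body (input starts just after the opening
-- quote). Plain runs are slices of the input produced by T.span; only
-- decoded escapes create new Text values, and the pieces are joined once.
-- Returns the literal's contents and the input after the closing quote.
lexStringBody :: Text -> Maybe (Text, Text)
lexStringBody = go []
  where
    plain c = c /= '"' && c /= '\\'
    go acc s =
      let (run, rest) = T.span plain s
          acc'        = if T.null run then acc else run : acc
      in case T.uncons rest of
           Just ('"', after) -> Just (T.concat (reverse acc'), after)
           Just ('\\', after) -> do
             (e, after') <- T.uncons after
             ch          <- decodeEscape e
             go (T.singleton ch : acc') after'
           _ -> Nothing  -- end of input: unterminated literal
```

For example, `lexStringBody (T.pack "a\\nb\" rest")` evaluates to `Just ("a\nb"," rest")`. The plain runs `a` and `b` are slices of the input. Only the decoded newline is a new Text before the final `T.concat`.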

What's new:
* Language.Haskell.Exts.Parser.Text exposing Text-input parser
entry points (parseModuleText, parseExpText, ...) that skip the
eager Data.Text.pack at the String boundary.
* lexTokenStreamText / lexTokenStreamTextWithMode in
Language.Haskell.Exts.Lexer, the lexer-only counterparts.

Measurements (15-trial bench-mutex 2σ-gated, GHC 9.10, -O), Text
API vs master parseModule:

Issue #478 stress (1 x 3 MB string literal):
master: 234 MB residency, 1156 MB allocated, 1.0 s
Text API: 52 KB residency, 7 MB allocated, 13 ms
delta: -99.98% / -99.40% / -98.71%

Multi-literal stress (200 x 50 kB literals, 9.6 MB):
master: 628 MB residency, 3858 MB allocated, 2.9 s
Text API: 6 MB residency, 30 MB allocated, 47 ms
delta: -99.05% / -99.22% / -98.39%

Identifier-heavy stress (5.1 MB, 1M ids / 100 unique):
no measurable change (the AST scaffolding dominates).
An atom table at lex time would address this; it is not part of this PR.
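Here, an atom table means interning: each distinct identifier is stored once, and every later occurrence reuses that copy. The sketch below is hypothetical; neither this PR nor haskell-src-exts provides it. `AtomTable`, `intern` and `internAll` are illustrative names, and a `Data.Map.Strict` keyed on the identifier text is one assumed choice of table.

```haskell
-- Hypothetical sketch of lex-time identifier interning ("atom table");
-- not part of this PR or of haskell-src-exts.
import qualified Data.Map.Strict as M
import qualified Data.Text as T
import Data.Text (Text)
import Data.List (mapAccumL)

-- Maps an identifier's text to its canonical (first-seen) copy.
type AtomTable = M.Map Text Text

-- Return the canonical copy of an identifier, adding it on first sight.
-- T.copy detaches the stored atom from the (possibly large) input buffer.
intern :: AtomTable -> Text -> (AtomTable, Text)
intern tbl t = case M.lookup t tbl of
  Just canon -> (tbl, canon)
  Nothing    -> let canon = T.copy t
                in (M.insert canon canon tbl, canon)

-- Intern a whole identifier stream, threading the table left to right.
internAll :: [Text] -> (AtomTable, [Text])
internAll = mapAccumL intern M.empty
```

With 1M identifiers drawn from 100 unique names, such a table would hold 100 entries, and every token would share one of them.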

Real haskell-src-exts library files via the Text API:
InternalLexer.hs (58 kB): resid -74%, alloc -27%, time -51%
ParseSyntax.hs (17 kB): resid -54%, alloc +13%
Build.hs, Comments.hs, SrcLoc.hs (small): resid neutral to -38%, alloc +11-13%
The allocation increases on the smaller files come from T.unpack
overhead at AST construction, which Phase 2 is expected to eliminate.

1.23.0 --> 1.23.1
=================
* show instance for SrcLoc and SrcSpan renders "(-1)" instead of "-1"
haskell-src-exts.cabal: 3 additions, 1 deletion
@@ -1,5 +1,5 @@
Name: haskell-src-exts
-Version: 1.23.1
+Version: 1.23.2
License: BSD3
License-File: LICENSE
Build-Type: Simple
@@ -50,6 +50,7 @@ Library
Build-Tools: happy >= 1.19
Build-Depends: array >= 0.1, pretty >= 1.0,
base >= 4.5 && < 5,
+text >= 1.2,
-- this is needed to access GHC.Generics on GHC 7.4
ghc-prim
-- this is needed to access Data.Semigroup and Control.Monad.Fail on GHCs
@@ -70,6 +71,7 @@ Library
Language.Haskell.Exts.Fixity,
Language.Haskell.Exts.ExactPrint,
Language.Haskell.Exts.Parser,
+Language.Haskell.Exts.Parser.Text,
Language.Haskell.Exts.Comments

Other-modules: Language.Haskell.Exts.ExtScheme,