{-# LANGUAGE CPP #-}
-- |
-- Module : Streamly.Data.Parser
-- Copyright : (c) 2020 Composewell Technologies
-- License : BSD-3-Clause
-- Maintainer : streamly@composewell.com
-- Stability : pre-release
-- Portability : GHC
--
-- Parsers are more powerful but less general than 'Streamly.Data.Fold.Fold's:
--
-- * folds cannot fail but parsers can fail and backtrack.
-- * folds can be composed as a Tee but parsers cannot.
-- * folds can be converted to parsers.
--
-- Streamly parsers support all operations offered by popular Haskell parser
-- libraries. Unlike other parser libraries, streamly parsers (1) can operate
-- on any Haskell type as input, not just bytes, (2) natively support
-- streaming, and (3) are faster.
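--
-- For example, a parser can consume a stream of 'Int' elements directly
-- (assuming the imports shown in the Setup section below):
--
-- >>> Stream.parse (Parser.takeWhile (< 3) Fold.sum) $ Stream.fromList [1, 2, 3, 4 :: Int]
-- Right 3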
--
-- == High Performance by Static Parser Fusion
--
-- Like folds, parsers are designed to utilize stream fusion, compiling to
-- efficient low-level code comparable to the speed of C. Parsers are suitable
-- for high-performance parsing of streams.
--
-- Operations in this module are designed to be composed statically rather than
-- dynamically. They are inlined to enable static fusion. More importantly,
-- they are not designed to be used recursively. Recursive use will break
-- fusion and lead to a quadratic performance slowdown. For dynamic and
-- recursive compositions use the continuation passing style (CPS) operations
-- from the "Streamly.Data.ParserK" module. 'Parser' and
-- 'Streamly.Data.ParserK.ParserK' types are interconvertible.
--
-- == How to parse a stream?
--
-- Parser combinators can be used to create a pipeline of parsers such
-- that the next parser consumes the result of the previous parser.
-- Such a composed pipeline of parsers can then be driven by one of many parser
-- drivers available in the Stream and Array modules.
--
-- Use Streamly.Data.Stream.'Streamly.Data.Stream.parse' or
-- Streamly.Data.Stream.'Streamly.Data.Stream.parseBreak' to run a parser on an
-- input stream and return the parsed result.
--
-- Use Streamly.Data.Stream.'Streamly.Data.Stream.parseMany' or
-- Streamly.Data.Stream.'Streamly.Data.Stream.parseIterate' to transform an
-- input data stream to an output stream of parsed data elements using a
-- parser.
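--
-- For example, assuming the imports shown in the Setup section below,
-- 'Streamly.Data.Stream.parse' runs a single parser on a stream:
--
-- >>> Stream.parse (Parser.takeWhile (/= ',') Fold.toList) $ Stream.fromList "ab,cd"
-- Right "ab"
--
-- and 'Streamly.Data.Stream.parseMany' applies the parser repeatedly,
-- producing a stream of results:
--
-- >>> Stream.fold Fold.toList $ Stream.parseMany (Parser.takeEQ 2 Fold.toList) (Stream.fromList "abcd")
-- [Right "ab",Right "cd"]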
--
-- == Parser vs ParserK
--
-- There are two functionally equivalent parsing modules,
-- "Streamly.Data.Parser" (this module) and "Streamly.Data.ParserK". The latter
-- is a CPS based wrapper over the former, and can be used for parsing in
-- general. "Streamly.Data.Parser" enables stream fusion and, where possible,
-- should be preferred over "Streamly.Data.ParserK" for high performance
-- stream parsing use cases. However, there are a few cases where this module
-- is not suitable and ParserK should be used instead. As a rule of thumb, use
-- ParserK when recursion or heavy nesting is needed.
--
-- === Parser: suitable for non-recursive static fusion
--
-- The 'Parser' type is suitable only for non-recursive static fusion. It can
-- be problematic for recursive definitions. To enable static fusion, parser
-- combinators use strict pattern matching on arguments of type 'Parser'. This
-- leads to an infinite loop when a parser is defined recursively, due to
-- strict evaluation of the recursive call. For example, the following
-- implementation loops infinitely because of the recursive use of parser @p@
-- in the @*>@ combinator:
--
-- >>> import Streamly.Data.Parser (Parser)
-- >>> import qualified Streamly.Data.Fold as Fold
-- >>> import qualified Streamly.Data.Parser as Parser
-- >>> import qualified Streamly.Data.Stream as Stream
-- >>> import Control.Applicative ((<|>))
--
-- >>> :{
-- >>> p, p1, p2 :: Monad m => Parser Char m String
-- >>> p1 = Parser.satisfy (== '(') *> p
-- >>> p2 = Parser.fromFold Fold.toList
-- >>> p = p1 <|> p2
-- >>> :}
--
-- Another limitation of the 'Parser' type is a quadratic performance slowdown
-- when too many nested compositions are used. In particular, the Applicative,
-- Monad, and Alternative instances, and sequenced parsing operations (e.g.
-- nested 'one' and 'splitWith'), exhibit quadratic (O(n^2)) complexity when
-- combined @n@ times; roughly eight or fewer sequenced parsers usually work
-- fine. See the documentation of the Applicative, Monad and Alternative
-- instances for details.
--
-- === ParserK: suitable for recursive definitions
--
-- ParserK is suitable for recursive definitions:
--
-- >>> import Streamly.Data.ParserK (ParserK)
-- >>> import Streamly.Data.StreamK (toParserK)
-- >>> import qualified Streamly.Data.StreamK as StreamK
--
-- >>> :{
-- >>> p, p1, p2 :: Monad m => ParserK Char m String
-- >>> p1 = toParserK (Parser.satisfy (== '(')) *> p
-- >>> p2 = toParserK (Parser.fromFold Fold.toList)
-- >>> p = p1 <|> p2
-- >>> :}
--
-- >>> StreamK.parse p $ StreamK.fromStream $ Stream.fromList "hello"
-- Right "hello"
--
-- For this reason, Applicative, Alternative or Monad compositions that use
-- recursion cannot be used with the 'Parser' type. Alternative type class
-- based operations like 'asum', and generic Alternative based parser
-- combinators, use recursion. Similarly, Applicative type class based
-- operations like 'Prelude.sequence' use recursion. Custom implementations of
-- many such operations are provided in this module (e.g. 'some', 'many') and
-- should be used instead.
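--
-- For example, 'many' collects repeated matches into a fold instead of
-- relying on type class recursion (assuming the imports shown in the Setup
-- section below):
--
-- >>> Stream.parse (Parser.many (Parser.satisfy (== 'a')) Fold.toList) $ Stream.fromList "aab"
-- Right "aa"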
--
-- == Parsers Galore!
--
-- Streamly provides all the parsing functionality offered by popular parsing
-- libraries, and much more, with higher performance.
-- This module provides most of the elementary parsers and parser combinators.
-- Additionally,
--
-- * all the folds from the "Streamly.Data.Fold" module can be converted to
-- parsers using 'fromFold'.
-- * "Streamly.Unicode.Parser" module provides Char stream parsers.
-- * all the combinators from the
-- <https://hackage.haskell.org/package/parser-combinators parser-combinators>
-- package can be used with streamly ParserK.
-- * See "Streamly.Internal.Data.Parser" for many more unreleased but useful APIs.
--
-- == Generic Parser Combinators
--
-- With 'Streamly.Data.ParserK.ParserK' you can use the 'Applicative' and
-- 'Control.Applicative.Alternative' type class based generic parser
-- combinators from the
-- <https://hackage.haskell.org/package/parser-combinators parser-combinators>
-- library or similar. However, if available, we recommend using the
-- equivalent functionality from this module where performance and streaming
-- behavior matter.
-- Firstly, the combinators in this module are faster due to stream fusion.
-- Secondly, they are streaming in nature, as the results can be passed
-- directly to other stream consumers (folds or parsers). The Alternative type
-- class based parsers end up buffering all the results in lists before they
-- can be consumed.
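--
-- For example, the results of a repeated parse can be counted by a fold
-- without ever building an intermediate list (assuming the imports shown in
-- the Setup section below):
--
-- >>> Stream.parse (Parser.many (Parser.satisfy (== 'a')) Fold.length) $ Stream.fromList "aaab"
-- Right 3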
--
-- == Error Reporting
--
-- There are two types of parser drivers available: the @parse@ and
-- @parseBreak@ drivers do not track the stream position, whereas the
-- @parsePos@ and @parseBreakPos@ drivers track and report stream position
-- information, with slightly more performance overhead.
--
-- When an error occurs, the stream position is reported. For byte streams or
-- unboxed array streams this is the byte position; for generic element
-- parsers or generic array parsers it is the element position in the stream.
--
-- These parsers do not report a case specific error context (e.g. line number
-- or column). If you need line number or column information, you can read the
-- stream again (if it is immutable) and count the lines to translate the
-- reported byte position into a line number and column. More elaborate
-- support for building arbitrary and custom error context information is
-- planned for the future.
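--
-- For example, a failing parse returns a 'Left' value carrying a
-- 'ParseError' describing the failure (the exact message text may vary
-- across releases):
--
-- @
-- Stream.parse (Parser.satisfy (> 10)) $ Stream.fromList [1 :: Int]
-- @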
--
-- == Monad Transformer Stack
--
-- A 'MonadTrans' instance is not provided. If the 'Parser' type is the
-- topmost layer (which should almost always be the case) you can just use
-- 'fromEffect' to execute the lower layer monad effects.
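--
-- For example, an effect in the underlying monad (here IO) can be run inside
-- a parse (assuming the imports shown in the Setup section below):
--
-- >>> Stream.parse (Parser.fromEffect (putStrLn "effect") *> Parser.one) $ Stream.fromList [1 :: Int]
-- effect
-- Right 1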
--
-- == Experimental APIs
--
-- Please refer to "Streamly.Internal.Data.Parser" for functions that have not
-- yet been released.
--
module Streamly.Data.Parser
(
-- * Setup
-- | To execute the code examples provided in this module in ghci, please
-- run the following commands first.
--
-- $setup
-- * Parser Type
Parser
, ParseError(..)
, ParseErrorPos(..)
-- -- * Downgrade to Fold
-- , toFold
-- * Elementary Parsers
-- ** From Folds
, fromFold
-- ** Without Input
-- , fromFoldMaybe
, fromPure
, fromEffect
, die
-- , dieM
, peek
, eof
-- ** Single Elements
-- All of these can be expressed in terms of either
, one
-- , oneEq
-- , oneNotEq
, oneOf
, noneOf
, satisfy
-- , maybe
-- , either
-- ** Sequences
, streamEqBy
, listEqBy
, listEq
-- * Transformations
-- Mapping on output
-- , rmapM
-- ** Map on input
, lmap
, lmapM
-- ** Map on output
, rmapM
-- ** Filtering
, filter
-- ** Look Ahead
, lookAhead
-- * Tokenizing Combinators
-- ** Tokenize by length
-- , takeBetween
, takeEQ
-- , takeGE
-- , takeP
-- ** Tokenize by predicate
-- , takeWhileP
, takeWhile
, takeWhile1
, dropWhile
-- , takeEndBy
-- , takeEndByEsc
-- , takeStartBy
, wordBy
-- ** Grouping
, groupBy
, groupByRolling
, groupByRollingEither
-- ** Framing
-- , wordFramedBy
, wordWithQuotes
-- , wordProcessQuotes
-- , wordKeepQuotes
-- -- * Alternative
-- , alt
-- * Splitting
, many
, some
, manyTill
-- * De-interleaving
, deintercalate
)
where
import Streamly.Internal.Data.Parser
import Prelude hiding (dropWhile, takeWhile, filter)
#include "DocTestDataParser.hs"