So I want to do some bounded space parsing. I'm collecting a bunch of tokens (in my case Word8, but it doesn't really matter) and combining them into a new token.
I wanted to just write my parser logic independently, and then "wrap" the parser in a way that transforms it into a new parser that fails if the input length is too large before generating a token. I don't want this "limit" logic to be part of every parser I write, I want to separate these concerns.
So I came up with something like this (excuse the verboseness, I've been quite explicit about the types):
import qualified Streamly.Internal.Data.Parser as StreamlyParser
-- Little optimisation so the state remains strict and getting the count doesn't need following a pointer.
-- This is not exposed in the interface of the function
data StrictIntPair a = StrictIntPair {-# UNPACK #-} !Int !a
limitParserInput :: forall m a b. (Applicative m) => Int -> StreamlyParser.Parser a m b -> StreamlyParser.Parser (Int, a) m (Int, b)
limitParserInput maxInputSize (StreamlyParser.Parser (stepFIn :: s -> a -> m (StreamlyParser.Step s b)) (initialIn :: m (StreamlyParser.Initial s b)) (extractFIn :: s -> m (StreamlyParser.Step s b))) =
StreamlyParser.Parser stepF initial extractF where
stepF :: StrictIntPair s -> (Int, a) -> m (StreamlyParser.Step (StrictIntPair s) (Int, b))
stepF (StrictIntPair currentSize state) (inputSize, input) = let nextSize = currentSize + inputSize in case nextSize <= maxInputSize of
True -> BiFunctor.bimap (StrictIntPair nextSize) (nextSize,) <$> stepFIn state input
False -> pure $ StreamlyParser.Error "Hit parser size limit" -- this should be more informative
initial :: m (StreamlyParser.Initial (StrictIntPair s) (Int, b))
initial = BiFunctor.bimap (StrictIntPair 0) (0,) <$> initialIn
extractF :: StrictIntPair s -> m (StreamlyParser.Step (StrictIntPair s) (Int, b))
extractF (StrictIntPair currentSize state) = BiFunctor.bimap (StrictIntPair currentSize) (currentSize,) <$> extractFIn state
So, basically, given a Parser a m b, I can generate a Parser (Int, a) m (Int, b) which means I can give a "size" to all my input elements, and set a "max size" for the output.
So far so good. Well almost. The issue is that if Parser (Int, a) m (Int, b) fails, I just get back a string, but if the underlying Parser a m b fails, I also just get a string. Short of hacky pattern matching on the string, I don't know where each one came from.
So then I tried this, instead writing a Parser wrapper that never fails, but instead throws it's error in ExceptT. Now I can type the error:
data StreamSizeLimitError = StreamSizeLimitError -- this should probably have some parameters detailing the error
limitParserInputTypedErr :: forall m a b. (Monad m) => Int -> StreamlyParser.Parser a m b -> StreamlyParser.Parser (Int, a) (ExceptT StreamSizeLimitError m) (Int, b)
limitParserInputTypedErr maxInputSize (StreamlyParser.Parser (stepFIn :: s -> a -> m (StreamlyParser.Step s b)) (initialIn :: m (StreamlyParser.Initial s b)) (extractFIn :: s -> m (StreamlyParser.Step s b))) =
StreamlyParser.Parser stepF initial extractF where
stepF :: StrictIntPair s -> (Int, a) -> ExceptT StreamSizeLimitError m (StreamlyParser.Step (StrictIntPair s) (Int, b))
stepF (StrictIntPair currentSize state) (inputSize, input) = let nextSize = currentSize + inputSize in case nextSize <= maxInputSize of
True -> lift $ BiFunctor.bimap (StrictIntPair nextSize) (nextSize,) <$> stepFIn state input
False -> throwE StreamSizeLimitError -- this should be more informative
initial :: ExceptT StreamSizeLimitError m (StreamlyParser.Initial (StrictIntPair s) (Int, b))
initial = lift $ BiFunctor.bimap (StrictIntPair 0) (0,) <$> initialIn
extractF :: StrictIntPair s -> ExceptT StreamSizeLimitError m (StreamlyParser.Step (StrictIntPair s) (Int, b))
extractF (StrictIntPair currentSize state) = lift $ BiFunctor.bimap (StrictIntPair currentSize) (currentSize,) <$> extractFIn state
But now I've got a parser transformer that only fails if the underlying parser fails. But because the transformer itself doesn't introduce failures, maybe I could write this as a fold transformer. And I can:
import qualified Streamly.Internal.Data.Fold as StreamlyFold
limitFoldInput :: forall m a b. (Monad m) => Int -> StreamlyFold.Fold m a b -> StreamlyFold.Fold (ExceptT StreamSizeLimitError m) (Int, a) (Int, b)
limitFoldInput maxInputSize (StreamlyFold.Fold (stepIn :: s -> a -> m (StreamlyFold.Step s b)) (initialIn :: m (StreamlyFold.Step s b)) (extractIn :: s -> m b) (extractFinal :: s -> m b)) =
StreamlyFold.Fold step initial extract finish where
step :: StrictIntPair s -> (Int, a) -> ExceptT StreamSizeLimitError m (StreamlyFold.Step (StrictIntPair s) (Int, b))
step (StrictIntPair currentSize state) (inputSize, input) = let nextSize = currentSize + inputSize in case nextSize <= maxInputSize of
True -> lift $ BiFunctor.bimap (StrictIntPair nextSize) (nextSize,) <$> stepIn state input
False -> throwE StreamSizeLimitError -- this should be more informative
initial :: ExceptT StreamSizeLimitError m (StreamlyFold.Step (StrictIntPair s) (Int, b))
initial = lift $ BiFunctor.bimap (StrictIntPair 0) (0,) <$> initialIn
extract :: StrictIntPair s -> ExceptT StreamSizeLimitError m (Int, b)
extract (StrictIntPair size state) = lift $ (size,) <$> extractFinal state
finish :: StrictIntPair s -> ExceptT StreamSizeLimitError m (Int, b)
finish (StrictIntPair size state) = lift $ (size,) <$> extractFinal state
But now I can only use this to transform folds. Although I feel like there should be a function like:
mapFoldToParser :: (Fold m a b -> Fold m c d) -> Parser a m b -> Parser c m d
Although I haven't written one myself to prove it exists, and I couldn't find it in the docs.
So I think I've just got a bunch of vague questions, and would appreciate some guidance:
- Am I totally going about this in the wrong way for what I want to achieve?
- Am I completely destroying my efficiency by moving to
ExceptT, and are there any sneaky space leak issues I need to watch out for?
- Should perhaps
Parser a m b be instead Parser e a m b, and the constructor Error String be instead Error e? This would allow one to write (I think):
limitParserInput :: (Applicative m) => Int -> Parser e a m b -> Parser (StreamSizeLimitError, e) (Int, a) m (Int, b) and perhaps this is better for efficiency/ergonomics?
My apologies for the somewhat verbose question, I just thought the best way to not completely go down the wrong rabbit hole was just to put it all out there the way I've been going.
So I want to do some bounded space parsing. I'm collecting a bunch of tokens (in my case
Word8, but it doesn't really matter) and combining them into a new token.I wanted to just write my parser logic independently, and then "wrap" the parser in a way that transforms it into a new parser that fails if the input length is too large before generating a token. I don't want this "limit" logic to be part of every parser I write, I want to separate these concerns.
So I came up with something like this (excuse the verboseness, I've been quite explicit about the types):
So, basically, given a
Parser a m b, I can generate aParser (Int, a) m (Int, b)which means I can give a "size" to all my input elements, and set a "max size" for the output.So far so good. Well almost. The issue is that if
Parser (Int, a) m (Int, b)fails, I just get back a string, but if the underlyingParser a m bfails, I also just get a string. Short of hacky pattern matching on the string, I don't know where each one came from.So then I tried this, instead writing a
Parserwrapper that never fails, but instead throws it's error inExceptT. Now I can type the error:But now I've got a parser transformer that only fails if the underlying parser fails. But because the transformer itself doesn't introduce failures, maybe I could write this as a fold transformer. And I can:
But now I can only use this to transform folds. Although I feel like there should be a function like:
Although I haven't written one myself to prove it exists, and I couldn't find it in the docs.
So I think I've just got a bunch of vague questions, and would appreciate some guidance:
ExceptT, and are there any sneaky space leak issues I need to watch out for?Parser a m bbe insteadParser e a m b, and the constructorError Stringbe insteadError e? This would allow one to write (I think):limitParserInput :: (Applicative m) => Int -> Parser e a m b -> Parser (StreamSizeLimitError, e) (Int, a) m (Int, b)and perhaps this is better for efficiency/ergonomics?My apologies for the somewhat verbose question, I just thought the best way to not completely go down the wrong rabbit hole was just to put it all out there the way I've been going.