Skip to content

Question (and perhaps suggestion): transforming parsers to limit input size, the "proper way"? #2895

@clintonmead

Description

@clintonmead

So I want to do some bounded space parsing. I'm collecting a bunch of tokens (in my case Word8, but it doesn't really matter) and combining them into a new token.

I wanted to just write my parser logic independently, and then "wrap" the parser in a way that transforms it into a new parser that fails if the input length is too large before generating a token. I don't want this "limit" logic to be part of every parser I write, I want to separate these concerns.

So I came up with something like this (excuse the verboseness, I've been quite explicit about the types):

import qualified Streamly.Internal.Data.Parser as StreamlyParser

-- Little optimisation so the state remains strict and getting the count doesn't need following a pointer.
-- This is not exposed in the interface of the function
data StrictIntPair a = StrictIntPair {-# UNPACK #-} !Int !a

limitParserInput :: forall m a b. (Applicative m) => Int -> StreamlyParser.Parser a m b -> StreamlyParser.Parser (Int, a) m (Int, b)
limitParserInput maxInputSize (StreamlyParser.Parser (stepFIn :: s -> a -> m (StreamlyParser.Step s b)) (initialIn :: m (StreamlyParser.Initial s b)) (extractFIn :: s -> m (StreamlyParser.Step s b))) = 
  StreamlyParser.Parser stepF initial extractF where
    stepF :: StrictIntPair s -> (Int, a) -> m (StreamlyParser.Step (StrictIntPair s) (Int, b))
    stepF (StrictIntPair currentSize state) (inputSize, input) = let nextSize = currentSize + inputSize in case nextSize <= maxInputSize of 
      True -> BiFunctor.bimap (StrictIntPair nextSize) (nextSize,) <$> stepFIn state input
      False -> pure $ StreamlyParser.Error "Hit parser size limit" -- this should be more informative
    initial :: m (StreamlyParser.Initial (StrictIntPair s) (Int, b))
    initial = BiFunctor.bimap (StrictIntPair 0) (0,) <$> initialIn
    extractF :: StrictIntPair s -> m (StreamlyParser.Step (StrictIntPair s) (Int, b))
    extractF (StrictIntPair currentSize state) = BiFunctor.bimap (StrictIntPair currentSize) (currentSize,) <$> extractFIn state

So, basically, given a Parser a m b, I can generate a Parser (Int, a) m (Int, b) which means I can give a "size" to all my input elements, and set a "max size" for the output.

So far so good. Well almost. The issue is that if Parser (Int, a) m (Int, b) fails, I just get back a string, but if the underlying Parser a m b fails, I also just get a string. Short of hacky pattern matching on the string, I don't know where each one came from.

So then I tried this, instead writing a Parser wrapper that never fails, but instead throws it's error in ExceptT. Now I can type the error:

data StreamSizeLimitError = StreamSizeLimitError -- this should probably have some parameters detailing the error

limitParserInputTypedErr :: forall m a b. (Monad m) => Int -> StreamlyParser.Parser a m b -> StreamlyParser.Parser (Int, a) (ExceptT StreamSizeLimitError m) (Int, b)
limitParserInputTypedErr maxInputSize (StreamlyParser.Parser (stepFIn :: s -> a -> m (StreamlyParser.Step s b)) (initialIn :: m (StreamlyParser.Initial s b)) (extractFIn :: s -> m (StreamlyParser.Step s b))) = 
  StreamlyParser.Parser stepF initial extractF where
    stepF :: StrictIntPair s -> (Int, a) -> ExceptT StreamSizeLimitError m (StreamlyParser.Step (StrictIntPair s) (Int, b))
    stepF (StrictIntPair currentSize state) (inputSize, input) = let nextSize = currentSize + inputSize in case nextSize <= maxInputSize of 
      True -> lift $ BiFunctor.bimap (StrictIntPair nextSize) (nextSize,) <$> stepFIn state input
      False -> throwE StreamSizeLimitError -- this should be more informative
    initial :: ExceptT StreamSizeLimitError m (StreamlyParser.Initial (StrictIntPair s) (Int, b))
    initial = lift $ BiFunctor.bimap (StrictIntPair 0) (0,) <$> initialIn
    extractF :: StrictIntPair s -> ExceptT StreamSizeLimitError m (StreamlyParser.Step (StrictIntPair s) (Int, b))
    extractF (StrictIntPair currentSize state) = lift $ BiFunctor.bimap (StrictIntPair currentSize) (currentSize,) <$> extractFIn state

But now I've got a parser transformer that only fails if the underlying parser fails. But because the transformer itself doesn't introduce failures, maybe I could write this as a fold transformer. And I can:

import qualified Streamly.Internal.Data.Fold as StreamlyFold

limitFoldInput :: forall m a b. (Monad m) => Int -> StreamlyFold.Fold m a b -> StreamlyFold.Fold (ExceptT StreamSizeLimitError m) (Int, a) (Int, b)
limitFoldInput maxInputSize (StreamlyFold.Fold (stepIn :: s -> a -> m (StreamlyFold.Step s b)) (initialIn :: m (StreamlyFold.Step s b)) (extractIn :: s -> m b) (extractFinal :: s -> m b)) = 
  StreamlyFold.Fold step initial extract finish where
    step :: StrictIntPair s -> (Int, a) -> ExceptT StreamSizeLimitError m (StreamlyFold.Step (StrictIntPair s) (Int, b))
    step (StrictIntPair currentSize state) (inputSize, input) = let nextSize = currentSize + inputSize in case nextSize <= maxInputSize of 
      True -> lift $ BiFunctor.bimap (StrictIntPair nextSize) (nextSize,) <$> stepIn state input
      False -> throwE StreamSizeLimitError -- this should be more informative
    initial :: ExceptT StreamSizeLimitError m (StreamlyFold.Step (StrictIntPair s) (Int, b))
    initial = lift $ BiFunctor.bimap (StrictIntPair 0) (0,) <$> initialIn
    extract :: StrictIntPair s -> ExceptT StreamSizeLimitError m (Int, b)
    extract (StrictIntPair size state) = lift $ (size,) <$> extractFinal state
    finish :: StrictIntPair s -> ExceptT StreamSizeLimitError m (Int, b)
    finish (StrictIntPair size state) = lift $ (size,) <$> extractFinal state 

But now I can only use this to transform folds. Although I feel like there should be a function like:

mapFoldToParser :: (Fold m a b -> Fold m c d) -> Parser a m b -> Parser c m d

Although I haven't written one myself to prove it exists, and I couldn't find it in the docs.

So I think I've just got a bunch of vague questions, and would appreciate some guidance:

  1. Am I totally going about this in the wrong way for what I want to achieve?
  2. Am I completely destroying my efficiency by moving to ExceptT, and are there any sneaky space leak issues I need to watch out for?
  3. Should perhaps Parser a m b be instead Parser e a m b, and the constructor Error String be instead Error e? This would allow one to write (I think):
    limitParserInput :: (Applicative m) => Int -> Parser e a m b -> Parser (StreamSizeLimitError, e) (Int, a) m (Int, b) and perhaps this is better for efficiency/ergonomics?

My apologies for the somewhat verbose question, I just thought the best way to not completely go down the wrong rabbit hole was just to put it all out there the way I've been going.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions