
splitWhen & splitWhenM for Foldable #4855

Open
TheBugYouCantFix wants to merge 5 commits into typelevel:main from
TheBugYouCantFix:foldable-split-when

Conversation


@TheBugYouCantFix TheBugYouCantFix commented Apr 22, 2026

This PR implements the splitWhen method requested in #4543 for Foldable.
Additionally, it implements its monadic version, splitWhenM.
The behaviour is intended to be identical to that of Haskell's splitWhen.

scala> Foldable[List].splitWhen(List(1, 2, 3, 1, 4, 5))(_ == 1)
val res0: cats.data.NonEmptyList[List[Int]] = NonEmptyList(List(), List(2, 3), List(4, 5))

scala> Foldable[List].splitWhen(List(1,1))(_ == 1)
val res1: cats.data.NonEmptyList[List[Int]] = NonEmptyList(List(), List(), List())
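
For reference, the swallowing behaviour shown above can be sketched on a plain List with only the standard library. This is a hypothetical helper (`splitWhenSketch`, returning `List[List[A]]` instead of `NonEmptyList`), not the code in this PR:

```scala
object SwallowSketch {
  // Hypothetical illustration: split on every element matching `p`
  // and drop the matching element itself.
  def splitWhenSketch[A](xs: List[A])(p: A => Boolean): List[List[A]] =
    // The accumulator always holds at least one chunk, so head/tail are safe.
    xs.foldRight(List(List.empty[A])) { (a, acc) =>
      if (p(a)) Nil :: acc                  // delimiter: open a new, empty chunk
      else (a :: acc.head) :: acc.tail      // otherwise prepend to the current chunk
    }
}

// SwallowSketch.splitWhenSketch(List(1, 2, 3, 1, 4, 5))(_ == 1)
//   == List(List(), List(2, 3), List(4, 5))
```

A leading, trailing, or repeated delimiter yields empty chunks, which is why the result can never be empty.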

@TheBugYouCantFix TheBugYouCantFix force-pushed the foldable-split-when branch 3 times, most recently from 898c9a7 to c50d1d5 Compare April 22, 2026 10:02
Author

TheBugYouCantFix commented Apr 22, 2026

I guess the methods could return NonEmptyList[List[A]] for more type safety. I noticed that no method of Foldable returns NonEmptyList so I decided to make it a usual List too but it'd be no problem for me to patch it to NonEmptyList

@TheBugYouCantFix TheBugYouCantFix force-pushed the foldable-split-when branch 2 times, most recently from 36fae12 to 319cd2d Compare April 22, 2026 10:43
Contributor

@johnynek johnynek left a comment


This seems like a useful function, but it seems to me it belongs on List, not on Foldable.

A major motivation for type classes is that the functions can be optimized for the different instances, but here we just call toList and then execute the List implementation.

There is no override for efficiency on any other types.

Comment thread core/src/main/scala/cats/Foldable.scala Outdated
@TheBugYouCantFix
Author

A major motivation for type classes is that the functions can be optimized for the different instances, but here we just call toList and then execute the List implementation.

Makes sense. I could rewrite it to be in List then

@TheBugYouCantFix TheBugYouCantFix changed the title splitWhen & splitWhenM for Foldable Draft: splitWhen & splitWhenM for Foldable Apr 22, 2026
@TheBugYouCantFix TheBugYouCantFix changed the title Draft: splitWhen & splitWhenM for Foldable splitWhen & splitWhenM for List Apr 22, 2026
@TheBugYouCantFix TheBugYouCantFix force-pushed the foldable-split-when branch 6 times, most recently from 1dd205f to 5ee3767 Compare April 22, 2026 13:09
@TheBugYouCantFix
Author

@johnynek, I have rewritten it to List, please have another look at the changes

Comment thread core/src/main/scala/cats/syntax/list.scala Outdated
Comment thread tests/shared/src/test/scala/cats/tests/ListSuite.scala Outdated
@TheBugYouCantFix TheBugYouCantFix force-pushed the foldable-split-when branch 2 times, most recently from c3f8951 to 4654690 Compare April 22, 2026 23:39
Contributor

satorg commented Apr 23, 2026

Thanks @TheBugYouCantFix, @johnynek!

I have a couple of questions regarding this PR:

  1. Is it possible to make this method generic?

    For example, this POC code seems to work:

    implicit final class FOps[F[_], A](private val self: F[A]) extends AnyVal {
      def splitWhen(f: A => Boolean)(implicit
        FF: Foldable[F],
        FA: Alternative[F]
      ): NonEmptyList[F[A]] = {
        self.foldRight(Eval.now(NonEmptyList.one(FA.empty[A]))) {
          case (a, acc) if f(a) => acc.map(FA.empty[A] :: _)
          case (a, acc) => acc.map(nel => NonEmptyList(nel.head.prependK(a), nel.tail))
        }
      }.value
    }

    It could be a method of Foldable with Alternative constraint. Then it could be further optimized and specialized for List, Vector, etc.

  2. Does splitWhen have to "swallow" the matched item?

    If the predicate is just == then it probably doesn't matter. But if it's something more sophisticated, then it can be convenient in some cases to "observe" every matched item at the head of every sub-sequence. For example, it could work somewhat like this:

    scala> List(1, 2, 3, 4, 5).splitWhen(_ % 2 == 0)
    val res1: cats.data.NonEmptyList[List[Int]] = NonEmptyList(List(1), List(2, 3), List(4, 5))
    
    scala> List(1, 2, 3, 4, 5).splitWhen(_ % 2 != 0)
    val res2: cats.data.NonEmptyList[List[Int]] = NonEmptyList(List(), List(1, 2), List(3, 4), List(5))
    

    i.e. the idea here is that every inner F[X] except the very first one starts with the matching item, so we can always know what it is.

    Besides, the existing splitAt doesn't swallow the item at the given index.
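
The keep-the-match variant from question 2 can be sketched the same way. This is a hypothetical standard-library helper (`splitWhenKeep`, returning `List[List[A]]`), meant only to pin down the semantics of the two examples above:

```scala
object KeepSketch {
  // Hypothetical illustration: split before every element matching `p`,
  // keeping that element as the head of the chunk it starts.
  def splitWhenKeep[A](xs: List[A])(p: A => Boolean): List[List[A]] =
    // The accumulator always holds at least one chunk, so head/tail are safe.
    xs.foldRight(List(List.empty[A])) { (a, acc) =>
      val chunk = a :: acc.head
      if (p(a)) Nil :: chunk :: acc.tail    // close the chunk at `a`, open a new one before it
      else chunk :: acc.tail                // keep growing the current chunk
    }
}

// KeepSketch.splitWhenKeep(List(1, 2, 3, 4, 5))(_ % 2 == 0)
//   == List(List(1), List(2, 3), List(4, 5))
// KeepSketch.splitWhenKeep(List(1, 2, 3, 4, 5))(_ % 2 != 0)
//   == List(List(), List(1, 2), List(3, 4), List(5))
```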

Author

TheBugYouCantFix commented Apr 23, 2026

@satorg

  1. Sounds like a good idea, I'll try adding it to Foldable then
  2. I guess there is no right answer to this question. As I mentioned, I referenced Haskell's splitWhen behaviour. But I think the best solution in this case would be to add a flag with a default value, e.g. swallow: Boolean = true

P.S. I'm not sure if splitAt is a good example here since it always splits the collection in 2 halves and an obvious counterexample is string's splitBy which is the same as splitWhen but only limited to a == predicate. Now I'm leaning towards just leaving it "swallowing"

@TheBugYouCantFix TheBugYouCantFix changed the title splitWhen & splitWhenM for List splitWhen & splitWhenM for Foldable Apr 23, 2026
Comment thread core/src/main/scala/cats/Foldable.scala
): G[NonEmptyList[F[A]]] = {
foldRight(fa, Eval.now(M.pure(NonEmptyList.one(FA.empty[A])))) { (a, evalGnel) =>
evalGnel.map { gnel =>
M.flatMap(f(a)) { isDelimiter =>
Contributor


the loss of tailRecM here means this isn't stack safe and will likely cause pain for users when they try to use it with Reader/Writer/State/Kleisli/etc...

Author


I'm not sure a truly stack-safe implementation is possible inside Foldable, though I may be wrong. If it turns out to be impossible, we'd probably have to either remove that method or move it all back to List...
@satorg what do you think?

Contributor

@satorg satorg Apr 26, 2026


I might be getting it wrong, but I couldn't come up with an example where we'd blow the stack here.

  1. The method uses Foldable.foldRight on F[_], which engages Eval, which, in turn, is a StackSafeMonad itself:
    implicit val catsBimonadForEval: Bimonad[Eval] & CommutativeMonad[Eval] =
    new FlatMap.AbstractFlatMap[Eval] with Bimonad[Eval] with StackSafeMonad[Eval] with CommutativeMonad[Eval] {
    According to the docs, if it uses map or flatMap over Eval only, then it's probably safe.
  2. As for the type G[_], it doesn't have to be a Monad at all – Applicative should be enough here:
    def splitWhenM[G[_], A](fa: F[A])(f: A => G[Boolean])(implicit
      FA: Alternative[F],
      G: Applicative[G]
    ): G[NonEmptyList[F[A]]] = {
      foldRight(fa, Eval.now(G.pure(NonEmptyList.one(FA.empty[A])))) { (a, evalGnel) =>
        evalGnel.map { gnel =>
          G.map2(f(a), gnel) { case (isDelimiter, nel) =>
            if (isDelimiter) FA.empty[A] :: nel
            else NonEmptyList(FA.prependK(a, nel.head), nel.tail)
          }
        }
      }.value
    }
    Therefore, the transformation is flat – it should be safe as long as the user-provided f is safe.
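
To try the Applicative-only variant outside of Foldable, it can be restated as a standalone function. This is a sketch under assumptions: the name `splitWhenA` and the wrapper object are hypothetical, and it assumes a cats-core version whose Alternative provides `prependK`:

```scala
import cats.{Alternative, Applicative, Eval, Foldable}
import cats.data.NonEmptyList

object SplitWhenApplicativeSketch {
  // Hypothetical standalone form of the Applicative-based splitWhenM shown above.
  def splitWhenA[F[_], G[_], A](fa: F[A])(f: A => G[Boolean])(implicit
    F: Foldable[F],
    FA: Alternative[F],
    G: Applicative[G]
  ): G[NonEmptyList[F[A]]] =
    F.foldRight(fa, Eval.now(G.pure(NonEmptyList.one(FA.empty[A])))) { (a, evalGnel) =>
      evalGnel.map { gnel =>
        // Combine the effectful predicate result with the chunks built so far.
        G.map2(f(a), gnel) { (isDelimiter, nel) =>
          if (isDelimiter) FA.empty[A] :: nel
          else NonEmptyList(FA.prependK(a, nel.head), nel.tail)
        }
      }
    }.value
}

// SplitWhenApplicativeSketch.splitWhenA(List(1, 2, 3, 1, 4))(a => Option(a == 1))
//   == Some(NonEmptyList.of(List(), List(2, 3), List(4)))
// SplitWhenApplicativeSketch.splitWhenA(Vector(1, 2, 3))(a => Option(a == 2))
//   == Some(NonEmptyList.of(Vector(1), Vector(3)))
```

The sketch only demonstrates the result shape for a small input; it makes no claim about stack safety for an arbitrary G.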

Author


As @johnynek mentioned, it'd blow the stack if G is Kleisli, Writer or Reader. I tried running it with Kleisli on my machine, and a list of just 10K elements was enough to get a StackOverflowError.
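
One plausible reading of this result is that the fold over Eval stays flat while the accumulated G value does not: with a function-like G, each map2 wraps the previous accumulator in another closure, and the nesting is only unwound when the final value is run. The sketch below illustrates that mechanism with a hypothetical, minimal `Reader` defined locally; it is not cats' Kleisli and is not taken from the PR:

```scala
object ReaderOverflowSketch {
  // Hypothetical stand-in for a reader-like G; `run` is a plain function.
  final case class Reader[R, A](run: R => A) {
    def map2[B, C](that: Reader[R, B])(f: (A, B) => C): Reader[R, C] =
      Reader(r => f(run(r), that.run(r)))
  }
  object Reader {
    def pure[R, A](a: A): Reader[R, A] = Reader(_ => a)
  }

  // Builds n nested map2 calls (construction is an iterative foldLeft),
  // then runs the result and reports whether running overflowed the stack.
  def overflows(n: Int): Boolean = {
    val nested = (1 to n).foldLeft(Reader.pure[Unit, Int](0)) { (acc, i) =>
      Reader.pure[Unit, Int](i).map2(acc)(_ + _)
    }
    try { nested.run(()); false }
    catch { case _: StackOverflowError => true }
  }
}
```

With a default JVM thread stack, `ReaderOverflowSketch.overflows(1000000)` is expected to return true, while small inputs run fine; the exact threshold depends on the JVM and its stack size.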

Contributor

satorg commented Apr 24, 2026

I'm not sure if splitAt is a good example here since it always splits the collection in 2 halves and an obvious counterexample is string's splitBy which is the same as splitWhen but only limited to a == predicate. Now I'm leaning towards just leaving it "swallowing".

Apparently, splitAt splits into 2 parts only because it takes a specific index as a parameter.

String's split, on the other hand, is a regex-based operation optimized for extracting substrings by some delimiter. There's def split(ch: Char) as well, but it is just a wrapper around the regex version. Therefore, splitting and keeping the matching part doesn't make a lot of sense, because the latter can have any length and there is no simple way to separate it further.

On the contrary, splitWhen would only keep 1 matching item for each sub-sequence and it should be fairly simple to get rid of it, if necessary.
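
As a small illustration of that last point, here is a standard-library sketch that recovers the swallowing result from a keep-the-match result. The input value is written out by hand as the assumed output of a keep-the-match split of List(1, 2, 3, 4, 5) on even numbers:

```scala
object DropMatchSketch {
  // Assumed output of a keep-the-match split on even numbers.
  val kept: List[List[Int]] = List(List(1), List(2, 3), List(4, 5))

  // Every chunk except the first starts with the matched item,
  // so dropping one element from each of those chunks removes it.
  val swallowed: List[List[Int]] = kept.head :: kept.tail.map(_.drop(1))
}

// DropMatchSketch.swallowed == List(List(1), List(3), List(5))
```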

That said, I don't have a strict opinion on that – just a feeling that the keep-the-match version would be more generic and re-usable. But it's not a "deal-breaker" for me.


3 participants