Combining folds using semigroups
In today's post, I will show how to combine many recursion-scheme folds using semigroups. For seasoned Haskellers using the library, this trick might be well known or obvious, but I decided to write it down anyway for the rest of us mere mortals 😃. I will assume a little familiarity with recursion-schemes though. If this isn't the case, be sure to check out the (recently revised) recursion-schemes README for an introduction to the library.
The naive or straightforward approach
For our running example, let's take the idea of "static analysis" in a compiler. Here, the compiler executes many checks to make sure a program is semantically valid. Implemented in a straightforward way, each check is an (effectful) function that does a full traversal of the program AST to collect the analysis results. If we simplify this idea, we end up with the following:
import Data.Functor.Foldable import Data.Functor.Foldable.TH -- Our AST data type: data Expr = Constant Int | Add Expr Expr makeBaseFunctor ''Expr -- 1 + 2 + 3 expr :: Expr expr = Constant 1 `Add` Constant 2 `Add` Constant 3 -- An analysis pass is an effectful function (here IO is used -- to print to the terminal): -- Two effectful functions that we want to combine: function1, function2 :: ExprF (IO ()) -> IO () function1 = function "function1:" function2 = function "function2:" -- An effectful function, prints out a prefix + a textual description -- of each node it encounters. function :: String -> ExprF (IO ()) -> IO () function prefix = \case ConstantF x -> putStrLn $ unwords [prefix, "Constant", show x] AddF action1 action2 -> do action1 putStrLn $ unwords [prefix, "Add"] action2 main :: IO () main = do putStrLn "Running function1:" results1 <- cata function1 expr putStrLn "Running function2:" results2 <- cata function2 expr -- ... do something with results
Running the snippet above gives us the following output:
You can see that each traversal happens one after the other. Ideally though, we would like to collect all analysis results using a single traversal/fold. Let's try and figure out how we can achieve that.
Attempt 1: Using function composition
Others have already thought of ways to compose folds using recursion schemes. Tim Williams has a great presentation about recursion schemes where he shows techniques for composing them. First of, there's sequential composition:
-- Note: not valid Haskell, '=' is mathematical equality cata f . cata g = cata (f `comp` g) where -- 'comp' first applies the function 'y', unwraps a single layer of -- the Fix datastructure, and then finally applies the function 'x' comp :: (f (Fix f) -> a) -> (g (Fix f) -> Fix f) -> g (Fix f) -> a comp x y = x . unfix . y
However, this approach is not applicable for our static analysis running example due to the types not lining up. The result of the first function (
y) needs to return a datastructure wrapped in
Fix (but we return a
IO () value).
The second approach Tim mentions (using the
Control.Arrow) is applicable:
For "simple" values, this works great. However, this approach has the same problem as before when you return a function or monadic result instead. To illustrate this, let's run the snippet below:
This gives us the following output:
This is the same behavior as we had before. We perform only one fold now, but then we still have to run the actions separately. That's not what we want! Back to the drawing board!
Attempt 2: Using semigroups
We tried some forms of composition so far that didn't work out, but luckily Haskell has many more ways of composing expressions together. One that is often used is
mappend from the
Semigroup typeclass. Here's what that looks like:
Effectively, this calls the 2 functions in a row, and then combines the results. Running this code shows all effectful actions are interleaved as we'd expect:
Great! This is exactly what we want. We do only one fold over the datastructure and all actions are interleaved!
Other monads than IO
While the approach using semigroups is short and sweet, it is not always possible to use it directly. This is because the monad that is being used needs to have a Semigroup instance (and IO just so happens to have an implementation for it). With a little more code, we can work around this limitation:
compose :: (Monad m, Semigroup b) => (a -> m b) -> (a -> m b) -> a -> m b compose f g a = do result1 <- f a result2 <- g a pure $ result1 <> result2 -- Or using applicative notation: compose' :: (Applicative f, Semigroup b) => (a -> f b) -> (a -> f b) -> a -> f b compose' f g a = (<>) <$> f a <*> g a -- If you only care about side effects, you could use (*>) -- instead of (<>) as well. main :: IO () main = cata alg expr where alg = compose function1 function2
With the "compose" function, you can now also use the same approach for monads that don't implement Semigroup (such as Reader/State/...). Nice!
In this article I showed 3 approaches to combining many smaller folds into one fused fold. Though not all approaches may be applicable for a specific situation, there are always multiple options to choose from when combining functions in this way.
I need to point out that while I used the recursion-schemes library, the same approach is possible if you use "normal" recursive functions (but maybe with a little more work).
If you are interested in more content like this, follow me on Twitter. Feel free to contact me if you have any questions or comments about this topic.