Controlling Fusion In Haskell

Last updated on February 29, 2016 by Jean-Philippe Bernardy
Tags: Haskell, continuations


In this post I explain how to write Haskell code in a way that lets the programmer control when fusion happens. The key idea is to polarize the types one wishes to deforest.

Explicit Fusion

From a sufficiently high-level point of view, the idea behind short-cut fusion is to:

  1. Convert functions on lists to a representation which does not involve explicit (general) recursion. Let’s call this representation List'.
  2. Let the inliner simplify expressions. The absence of general recursion means that it can do a reliable job, only using well-known and predictable means. This simplification process will fuse some functions together.
  3. Convert any remaining occurrences of List' back to lists.

The above modus operandi effectively means that the programmer works with one representation of lists while the optimizer uses another. Consequently, it can be hard to predict when fusion occurs in any given program. One way to circumvent this issue is to work directly with a non-recursive representation.

Mu and Nu Lists

But which representation should be chosen? There are two obvious choices: either the least fixpoint or the greatest fixpoint, called respectively Mu (μ) and Nu (ν).

newtype Mu f = Mu {fold :: forall x. (f x -> x) -> x}

data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

The representation chosen by Gill et al. is Mu lists. This choice yields the shortcut fusion system used in GHC, also known as the ‘foldr/build’ rule. Nu lists are used by Svenningsson and yield the ‘destroy/unfoldr’ rule. Both these representations use the same, standard functor for lists:

data F a x = Stop | More a x
  deriving Functor

(The ‘stream fusion’ of Coutts et al. uses another functor; I’m not going to get into why here.)

The choice of either Mu or Nu lists influences the set of functions which can be fused. When the choice is automatic (as in all the user-facing shortcut fusion facilities currently available), the programmer is at the mercy of the choice made by the implementer. On the other hand, if the programmer controls the choice, they can pick the best representation for the job at hand. However, to make a sensible choice one must have a good grasp of which representation is good for what. Let me try to help by giving an intuitive description of Mu and Nu types.

Least fixpoints (Mu) are computations that can consume the structure, in one go. That is, they want to control when to consume each level of the structure. They are naturally masters. We say that they are negative types.

Greatest fixpoints (Nu) are computations that can generate the structure, on demand. That is, they can respond to requests to produce the next level of the structure. They are naturally slaves. We say that they are positive types.
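To make this intuition concrete, here is a minimal sketch (repeating the definitions above): a Mu list drives its own consumption in one go, while a Nu list, even an infinite one, yields elements only on demand. The names muToList, nuTake and nats are mine, chosen for illustration.

```haskell
{-# LANGUAGE RankNTypes, GADTs #-}

-- Definitions repeated from the post, so the example stands alone.
newtype Mu f = Mu {fold :: forall x. (f x -> x) -> x}

data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

data F a x = Stop | More a x

-- A Mu list drives its own consumption: `fold` runs it to the end.
-- Note that muToList itself contains no recursion.
muToList :: Mu (F a) -> [a]
muToList (Mu phi) = phi alg
  where
    alg Stop = []
    alg (More a as) = a : as

-- A Nu list only answers requests: we can demand just a prefix,
-- so even an infinite Nu list is usable. (This consumer is an
-- ordinary recursive loop, used only for demonstration.)
nuTake :: Int -> Nu (F a) -> [a]
nuTake n (Unfold psi s0) = go n s0
  where
    go 0 _ = []
    go k s = case psi s of
      Stop -> []
      More a s' -> a : go (k - 1) s'

-- The infinite list of naturals: a seed plus a step function.
nats :: Nu (F Int)
nats = Unfold (\i -> More i (i + 1)) 0
```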

Good citizens

We can now characterize the functions which ‘behave well’ given a selection of polarity for their arguments and results. Intuitively, there must be a master which drives the order of the computation, and some slaves which follow that order.

More precisely, a computation can fuse with its context if and only if the context contains exactly one master. This means that to be fusible with its context, any given function must have exactly one negative argument — and for this purpose, one should count results as arguments of the opposite polarity. (This rule is a generalization of the ‘good consumer/good producer’ rules of thumb explained in the GHC manual.)

Let’s take a look at a few simple examples. First, a map.

mapMu :: (a -> b) -> Mu (F a) -> Mu (F b)

In this case we have one negative argument (master) and one negative result (slave). Thus, we can write a fusible implementation, as follows:

mapMu :: (a -> b) -> Mu (F a) -> Mu (F b)
mapMu f (Mu phi0) = Mu $ \phi1 -> phi0 (phi1 . mapF f)
  where
    mapF _ Stop = Stop
    mapF g (More a x) = More (g a) x

(Note the lack of recursion.)
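For comparison, a Nu version of map is equally fusible: its argument is positive (a slave), and its positive result counts as the single master. A sketch, repeating the needed definitions:

```haskell
{-# LANGUAGE GADTs #-}

-- Definitions repeated from the post, so the example stands alone.
data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

data F a x = Stop | More a x

-- Again no recursion: we only post-process the step function.
mapNu :: (a -> b) -> Nu (F a) -> Nu (F b)
mapNu f (Unfold psi s0) = Unfold (mapF f . psi) s0
  where
    mapF _ Stop = Stop
    mapF g (More a x) = More (g a) x
```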

Let’s consider a zip-like function:

zipNu :: Nu (F a) -> Nu (F b) -> Nu (F (a,b))

We have two positive arguments (slaves) and one positive result (master). We can thus write:

zipNu (Unfold psi0 s0) (Unfold psi1 s1)
  = Unfold psi (s0,s1) where
    psi (s,t) = case (psi0 s, psi1 t) of
      (More a s', More b t') -> More (a,b) (s',t')
      _ -> Stop

(You may want to try your hand at writing other zip-like functions, mixing several Nu’s and Mu’s.)
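As one possible answer to that exercise, here is a sketch of a mixed zip (zipMuNu is my name for it, not a standard function). The Mu argument is the single master; the Nu argument and the Mu result are slaves. The trick is to fold at a function type, threading the Nu seed through each level:

```haskell
{-# LANGUAGE RankNTypes, GADTs #-}

-- Definitions repeated from the post, so the example stands alone.
newtype Mu f = Mu {fold :: forall x. (f x -> x) -> x}

data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

data F a x = Stop | More a x

-- Fold the Mu list at carrier type `s -> result`: each level of the
-- fold consumes one step of the Nu list's seed. Still no recursion.
zipMuNu :: Mu (F a) -> Nu (F b) -> Mu (F (a, b))
zipMuNu (Mu phi0) (Unfold psi s0) = Mu $ \phi ->
  phi0 (\level -> case level of
         Stop -> \_ -> phi Stop
         More a k -> \s -> case psi s of
           Stop -> phi Stop
           More b s' -> phi (More (a, b) (k s')))
       s0
```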

Not so good citizens

Not all programs have a single master. One way to proceed is to provide conversion functions from Mu to Nu and vice-versa. These functions won’t be fusible, but the programmer will know exactly when fusion does not happen, since they’ll call those functions explicitly.

Nu to Mu

What happens when we convert a slave to a master? In this case, we effectively have to conjure up an evaluation order. That is, write a loop. The only way to do this in Haskell is to use recursion:

loop :: Functor f => Nu f -> Mu f
loop (Unfold psi s0) = Mu $ \phi -> let go = phi . fmap go . psi
                                    in go s0

When loop is composed with a fusible function f, the transformations performed by f will be executed one bit at a time, as the lists are being consumed or produced by the loop.
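As a usage sketch (sumMu and sumNu are illustrative names, not from the post): a sum is naturally a Mu consumer, since it is the master, so summing a Nu list goes through one explicit loop; the fold itself stays non-recursive.

```haskell
{-# LANGUAGE RankNTypes, GADTs, DeriveFunctor #-}

-- Definitions repeated from the post, so the example stands alone.
newtype Mu f = Mu {fold :: forall x. (f x -> x) -> x}

data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

data F a x = Stop | More a x
  deriving Functor

loop :: Functor f => Nu f -> Mu f
loop (Unfold psi s0) = Mu $ \phi ->
  let go = phi . fmap go . psi
  in go s0

-- A non-recursive consumer: the Mu list drives the addition.
sumMu :: Mu (F Int) -> Int
sumMu (Mu phi) = phi alg
  where
    alg Stop = 0
    alg (More a r) = a + r

-- The only recursion involved is the one hidden inside `loop`.
sumNu :: Nu (F Int) -> Int
sumNu = sumMu . loop
```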

Mu to Nu

What happens when we convert a master to a slave? Is such a function impossible to write? No! It just means that we have to allocate a buffer so that the master can proceed at its own pace. Allocating this intermediate data structure means, in turn, that fusion can’t happen.

alloc :: Mu f -> Nu f

The only means to do this in Haskell is also recursion; this time in the definition of a data type:

newtype Fix f = In {out :: f (Fix f)}

Allocating a structure of type Fix f from a Mu f is immediate:

muAlloc :: Mu f -> Fix f
muAlloc m = fold m In

Traversing the structure to get a Nu type is not harder:

nuWalk :: Fix f -> Nu f
nuWalk = Unfold out

We can thus define alloc as the composition of the above two functions:

alloc = nuWalk . muAlloc
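As an illustration (zipMu is a hypothetical helper): zipping two Mu lists gives two masters, so no fusible, single-master implementation exists. Buffering with alloc resolves the conflict, at the cost of building the intermediate structures.

```haskell
{-# LANGUAGE RankNTypes, GADTs #-}

-- Definitions repeated from the post, so the example stands alone.
newtype Mu f = Mu {fold :: forall x. (f x -> x) -> x}

data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

data F a x = Stop | More a x

newtype Fix f = In {out :: f (Fix f)}

muAlloc :: Mu f -> Fix f
muAlloc m = fold m In

nuWalk :: Fix f -> Nu f
nuWalk = Unfold out

alloc :: Mu f -> Nu f
alloc = nuWalk . muAlloc

zipNu :: Nu (F a) -> Nu (F b) -> Nu (F (a, b))
zipNu (Unfold psi0 s0) (Unfold psi1 s1) = Unfold psi (s0, s1)
  where
    psi (s, t) = case (psi0 s, psi1 t) of
      (More a s', More b t') -> More (a, b) (s', t')
      _ -> Stop

-- Two masters cannot fuse: buffer both sides, then zip lazily.
zipMu :: Mu (F a) -> Mu (F b) -> Nu (F (a, b))
zipMu xs ys = zipNu (alloc xs) (alloc ys)
```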

For completeness, let’s write the two missing conversions between Fix and Mu/Nu. They correspond to folding and unfolding the Fix structure. Those are standard generic programming functions that have the following implementations:

fixFold :: Functor f => (f a -> a) -> Fix f -> a
fixFold f = f . fmap (fixFold f) . out

fixUnfold :: Functor f => (a -> f a) -> a -> Fix f
fixUnfold f = In . fmap (fixUnfold f) . f

And the conversions simply wrap those:

nuAlloc :: Functor f => Nu f -> Fix f
nuAlloc (Unfold psi s0) = fixUnfold psi s0

muWalk :: Functor f => Fix f -> Mu f
muWalk s = Mu $ \phi -> fixFold phi s

Because no Haskell blog post is complete without a little proof, we can finish by showing that loop = muWalk . nuAlloc (but fused). We start by inlining muWalk and nuAlloc in the equality:

loop (Unfold psi s0) = Mu $ \phi -> (fixFold phi . fixUnfold psi) s0

Then we give a name to fixFold phi . fixUnfold psi and calculate:

go = fixFold phi . fixUnfold psi
   = phi . fmap (fixFold phi) . out . In . fmap (fixUnfold psi) . psi
   = phi . fmap (fixFold phi) . fmap (fixUnfold psi) . psi
   = phi . fmap (fixFold phi . fixUnfold psi) . psi
   = phi . fmap go . psi
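The calculation can also be sanity-checked by running both sides of the equality on a small example (repeating the definitions above; sumVia is a helper of my own for the check):

```haskell
{-# LANGUAGE RankNTypes, GADTs, DeriveFunctor #-}

-- Definitions repeated from the post, so the check stands alone.
newtype Mu f = Mu {fold :: forall x. (f x -> x) -> x}

data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

data F a x = Stop | More a x
  deriving Functor

newtype Fix f = In {out :: f (Fix f)}

loop :: Functor f => Nu f -> Mu f
loop (Unfold psi s0) = Mu $ \phi ->
  let go = phi . fmap go . psi
  in go s0

fixFold :: Functor f => (f a -> a) -> Fix f -> a
fixFold f = f . fmap (fixFold f) . out

fixUnfold :: Functor f => (a -> f a) -> a -> Fix f
fixUnfold f = In . fmap (fixUnfold f) . f

nuAlloc :: Functor f => Nu f -> Fix f
nuAlloc (Unfold psi s0) = fixUnfold psi s0

muWalk :: Functor f => Fix f -> Mu f
muWalk s = Mu $ \phi -> fixFold phi s

-- Both conversion paths should agree on any consumer, e.g. a sum.
sumVia :: (Nu (F Int) -> Mu (F Int)) -> Nu (F Int) -> Int
sumVia conv n = fold (conv n) alg
  where
    alg Stop = 0
    alg (More a r) = a + r
```

Applying sumVia loop and sumVia (muWalk . nuAlloc) to the same finite Nu list yields the same result, as the calculation predicts.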


In summary, my recommendations for getting reliable and flexible fusion are to:

  1. Roll your own non-recursive types, or use a library which internally uses non-recursive types.

  2. Select representations according to the behavior they should have (master/slave). Library writers should expose both versions, for the user to select.

  3. Add non-fusible conversion functions at the appropriate places. Library writers should provide such conversion functions.

Further Reading

If you’re interested in digging into the concepts of polarity and fusion, I’ve developed them in two papers:

  1. In On the Duality of Streams, we investigate the role of polarity in defining effectful streams (pipes or conduits-like).
  2. In Composable Efficient Array Computations Using Linear Types, we see how we can take short-cut fusion to its limits, using linear types. In particular, we investigate a self-dual kind of list, which sits between Mu and Nu lists.