Last updated on February 29, 2016 by Jean-Philippe Bernardy

# Abstract

In this post I explain how to write Haskell code in a way that lets the programmer control when fusion happens. The key idea is to polarize the types that one wants to deforest.

# Explicit Fusion

From a sufficiently high-level point of view, the idea behind short-cut fusion is to:

1. Convert functions on Lists to a representation which does not involve explicit (general) recursion. Let’s call this representation List'.
2. Let the inliner simplify expressions. The absence of general recursion means that it can do a reliable job, only using well-known and predictable means. This simplification process will fuse some functions together.
3. Convert any remaining occurrences of List' back to List.

The above modus operandi effectively means that the programmer works with a representation of lists which is not that being used by the optimizer. Consequently, it can be hard to predict when fusion occurs in any given program. One way to circumvent this issue is to work directly with a non-recursive representation.
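To make the three steps concrete, here is a minimal sketch in the foldr/build style (build' is a local stand-in for GHC's real build from GHC.Exts; the rewrite shown in the comment is the actual foldr/build rule):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A local stand-in for GHC's 'build' (GHC.Exts.build).
build' :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build' g = g (:) []

-- Step 1: a producer written without explicit recursion over lists.
-- (Recursion on the Int counter is fine; only list recursion is avoided.)
upto :: Int -> [Int]
upto n = build' $ \cons nil ->
  let go i | i > n     = nil
           | otherwise = i `cons` go (i + 1)
  in go 1

-- Step 2: the rule  foldr k z (build' g) = g k z  lets the inliner
-- eliminate the intermediate list in compositions like this one:
sumUpto :: Int -> Int
sumUpto n = foldr (+) 0 (upto n)
```

Whether the rewrite actually fires in any given program is exactly the unpredictability that the rest of this post works around.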

# Mu and Nu Lists

But which representation should be chosen? There are two obvious choices: either the least fixpoint or the greatest fixpoint; called respectively Mu (μ) and Nu (ν).

newtype Mu f = Mu {fold :: forall x. (f x -> x) -> x}

data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

The representation chosen by Gill et al. is Mu lists. This choice yields the shortcut fusion system used in GHC, also known as the ‘foldr/build’ rule. Nu lists are used by Svenningsson and yield the ‘destroy/unfoldr’ rule. Both these representations use the same, standard functor for lists:

data F a x = Stop | More a x
  deriving Functor

(The ‘stream fusion’ of Coutts et al. uses another functor; I’m not going to get into why here.)
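To get a feel for the two representations, here is a sketch converting ordinary lists to and from Mu and Nu lists (the helper names toMu, fromMu, toNu, fromNu are mine, not part of the post's API). Note that only fromNu needs explicit recursion; it belongs at the boundary of the fused code:

```haskell
{-# LANGUAGE RankNTypes, GADTs, DeriveFunctor #-}

newtype Mu f = Mu {fold :: forall x. (f x -> x) -> x}

data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

data F a x = Stop | More a x
  deriving Functor

-- A list as a Mu: we pre-package the fold over the list.
toMu :: [a] -> Mu (F a)
toMu xs = Mu (\phi -> foldr (\a r -> phi (More a r)) (phi Stop) xs)

-- Running the packaged fold with the list-building algebra.
fromMu :: Mu (F a) -> [a]
fromMu m = fold m alg
  where alg Stop       = []
        alg (More a r) = a : r

-- A list as a Nu: the list itself is the seed, peeled one step per request.
toNu :: [a] -> Nu (F a)
toNu = Unfold psi
  where psi []     = Stop
        psi (a:as) = More a as

-- Observing a Nu requires an explicitly recursive driver.
fromNu :: Nu (F a) -> [a]
fromNu (Unfold psi s0) = go s0
  where go s = case psi s of
          Stop      -> []
          More a s' -> a : go s'
```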

The choice of either Mu or Nu lists influences the set of functions which can be fused. When the choice is automatic (as in all the user-facing shortcut fusion facilities currently available), the programmer is at the mercy of the choice made by the implementer. On the other hand, if the programmer controls the choice, they can pick the best representation for the job at hand. However, to make a sensible choice one must have a good grasp of which representation is good for what. Let me try to help by giving an intuitive description of Mu and Nu types.

Least fixpoints (Mu) are computations that can consume the structure, in one go. That is, they want to control when to consume each level of the structure. They are naturally masters. We say that they are negative types.

Greatest fixpoints (Nu) are computations that can generate the structure, on demand. That is, they can respond to requests to produce the next level of the structure. They are naturally slaves. We say that they are positive types.
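As a concrete illustration (the function names are mine), a summing consumer naturally lives at type Mu, where the algebra it supplies dictates the evaluation order, while an enumerator naturally lives at type Nu, producing one step per request:

```haskell
{-# LANGUAGE RankNTypes, GADTs, DeriveFunctor #-}

newtype Mu f = Mu {fold :: forall x. (f x -> x) -> x}

data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

data F a x = Stop | More a x
  deriving Functor

-- A master: sumMu decides, via the algebra it supplies,
-- what happens at each level of the structure it consumes.
sumMu :: Mu (F Int) -> Int
sumMu m = fold m alg
  where alg Stop       = 0
        alg (More a s) = a + s

-- A slave: enumNu sits passively, producing one More per request.
enumNu :: Int -> Int -> Nu (F Int)
enumNu lo hi = Unfold psi lo
  where psi i | i > hi    = Stop
              | otherwise = More i (i + 1)

-- A one-step observation on a Nu, for illustration.
headNu :: Nu (F a) -> Maybe a
headNu (Unfold psi s) = case psi s of
  Stop     -> Nothing
  More a _ -> Just a
```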

# Good citizens

We can now characterize the functions which ‘behave well’ given a selection of polarity for their arguments and results. Intuitively, there must be a master which drives the order of the computation, and some slaves which follow that order.

More precisely, a computation can fuse with its context if and only if the context contains exactly one master. This means that to be fusible with its context, any given function must have exactly one negative argument — and for this purpose, one should count results as arguments of the opposite polarity. (This rule is a generalization of the ‘good consumer/good producer’ rules of thumb explained in the GHC manual.)

Let’s take a look at a few simple examples. First, a map.

mapMu :: (a -> b) -> Mu (F a) -> Mu (F b)

In this case we have one negative argument (the master) and a negative result (which counts as a positive argument, and is thus a slave). Thus, we can write a fusible implementation, as follows:

mapMu :: (a -> b) -> Mu (F a) -> Mu (F b)
mapMu f (Mu phi0) = Mu $ \phi1 -> phi0 (phi1 . mapF f)
  where mapF _ Stop       = Stop
        mapF g (More a x) = More (g a) x

(Note the lack of recursion.)

Let’s consider a zip-like function:

zipNu :: Nu (F a) -> Nu (F b) -> Nu (F (a,b))

We have two positive arguments (slaves) and one positive result (master). We can thus write:

zipNu (Unfold psi0 s0) (Unfold psi1 s1) = Unfold psi (s0,s1)
  where psi (s,t) = case (psi0 s, psi1 t) of
          (More a s', More b t') -> More (a,b) (s',t')
          _                      -> Stop

(You may want to try your hand at writing other zip-like functions, mixing several Nu’s and Mu’s.)

# Not so good citizens

Not all programs have a single master. One way to proceed is to provide conversion functions from Mu to Nu and vice versa. These functions won’t be fusible, but the programmer will know exactly when fusion does not happen, since they’ll call those functions explicitly.

## Nu to Mu

What happens when we convert a slave to a master? In this case, we effectively have to conjure up an evaluation order. That is, write a loop. The only way to do this in Haskell is to use recursion:

loop :: Functor f => Nu f -> Mu f
loop (Unfold psi s0) = Mu $ \phi -> let go = phi . fmap go . psi
                                    in go s0

When loop is composed with a fusible function f, the transformations performed by f will be executed one bit at a time, as the lists are being consumed or produced by the loop.
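For instance, here is a runnable sketch (sumMu and enumNu are illustrative helpers, not from the post) where a Nu producer is driven by a Mu consumer through loop, one element at a time, with no intermediate list ever materialized:

```haskell
{-# LANGUAGE RankNTypes, GADTs, DeriveFunctor #-}

newtype Mu f = Mu {fold :: forall x. (f x -> x) -> x}

data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

data F a x = Stop | More a x
  deriving Functor

-- The conversion from the post: recursion supplies the missing driver.
loop :: Functor f => Nu f -> Mu f
loop (Unfold psi s0) = Mu (\phi -> let go = phi . fmap go . psi
                                   in go s0)

-- Illustrative slave: produce lo..hi on demand.
enumNu :: Int -> Int -> Nu (F Int)
enumNu lo hi = Unfold psi lo
  where psi i | i > hi    = Stop
              | otherwise = More i (i + 1)

-- Illustrative master: consume by summing.
sumMu :: Mu (F Int) -> Int
sumMu m = fold m alg
  where alg Stop       = 0
        alg (More a s) = a + s

-- Connecting them through loop.
total :: Int
total = sumMu (loop (enumNu 1 10))
```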

## Mu to Nu

What happens when we convert a master to a slave? Is such a function impossible to write? No! It just means that we have to allocate a buffer so that the master can proceed at its own pace. This in turn means that we have to allocate an intermediate data structure, and thus that fusion can’t happen.

alloc :: Mu f -> Nu f

The only means to do this in Haskell is also recursion; this time in the definition of a data type:

newtype Fix f = In {out :: f (Fix f)}

Allocating a structure of type Fix f from a Mu f is immediate:

muAlloc :: Mu f -> Fix f
muAlloc m = fold m In

Traversing the structure to get a Nu type is not harder:

nuWalk :: Fix f -> Nu f
nuWalk = Unfold out

We can thus define alloc as the composition of the above two functions:

alloc = nuWalk . muAlloc
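Putting the pieces together, here is a runnable sketch of alloc (the list helpers toMu and fromNu are mine, for observation only); the Fix f value in the middle is precisely the buffer that prevents fusion:

```haskell
{-# LANGUAGE RankNTypes, GADTs, DeriveFunctor #-}

newtype Mu f = Mu {fold :: forall x. (f x -> x) -> x}

data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

data F a x = Stop | More a x
  deriving Functor

newtype Fix f = In {out :: f (Fix f)}

-- The two halves of alloc, as in the post.
muAlloc :: Mu f -> Fix f
muAlloc m = fold m In

nuWalk :: Fix f -> Nu f
nuWalk = Unfold out

alloc :: Mu f -> Nu f
alloc = nuWalk . muAlloc

-- Observation helpers (mine), to check the round trip on lists.
toMu :: [a] -> Mu (F a)
toMu xs = Mu (\phi -> foldr (\a r -> phi (More a r)) (phi Stop) xs)

fromNu :: Nu (F a) -> [a]
fromNu (Unfold psi s0) = go s0
  where go s = case psi s of
          Stop      -> []
          More a s' -> a : go s'
```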

For completeness, let’s write the two missing conversions between Fix and Mu/Nu. They correspond to folding and unfolding the Fix structure. Those are standard generic programming functions that have the following implementations:

fixFold :: Functor f => (f a -> a) -> Fix f -> a
fixFold f = f . fmap (fixFold f) . out

fixUnfold :: Functor f => (a -> f a) -> a -> Fix f
fixUnfold f = In . fmap (fixUnfold f) . f
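As a quick sanity check (the names countdown and total are illustrative), fixUnfold can build a Fix (F Int) structure from a seed, and fixFold can then consume it:

```haskell
{-# LANGUAGE DeriveFunctor #-}

data F a x = Stop | More a x
  deriving Functor

newtype Fix f = In {out :: f (Fix f)}

fixFold :: Functor f => (f a -> a) -> Fix f -> a
fixFold f = f . fmap (fixFold f) . out

fixUnfold :: Functor f => (a -> f a) -> a -> Fix f
fixUnfold f = In . fmap (fixUnfold f) . f

-- Build the structure corresponding to [3,2,1] from the seed 3 ...
countdown :: Int -> Fix (F Int)
countdown = fixUnfold psi
  where psi 0 = Stop
        psi n = More n (n - 1)

-- ... and sum it back down.
total :: Int
total = fixFold alg (countdown 3)
  where alg Stop       = 0
        alg (More a s) = a + s
```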

And the conversions simply wrap those:

nuAlloc :: Functor f => Nu f -> Fix f
nuAlloc (Unfold psi s0) = fixUnfold psi s0

muWalk :: Functor f => Fix f -> Mu f
muWalk s = Mu $ \phi -> fixFold phi s

Because no Haskell blog post is complete without a little proof, we can finish by showing that loop = muWalk . nuAlloc (but fused). We start by unfolding the equality:

loop (Unfold psi s0) = Mu $ \phi -> (fixFold phi . fixUnfold psi) s0

Then we give a name to fixFold phi . fixUnfold psi and calculate:

go = fixFold phi . fixUnfold psi
   = phi . fmap (fixFold phi) . out . In . fmap (fixUnfold psi) . psi
   = phi . fmap (fixFold phi) . fmap (fixUnfold psi) . psi    -- out . In = id
   = phi . fmap (fixFold phi . fixUnfold psi) . psi           -- functor law
   = phi . fmap go . psi
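The calculation above can also be checked on a sample (the producer and the summing algebra below are illustrative): both sides of loop = muWalk . nuAlloc give the same fold, here a sum over 1..10; the fused version merely skips the intermediate Fix structure:

```haskell
{-# LANGUAGE RankNTypes, GADTs, DeriveFunctor #-}

newtype Mu f = Mu {fold :: forall x. (f x -> x) -> x}

data Nu f where
  Unfold :: (x -> f x) -> x -> Nu f

data F a x = Stop | More a x
  deriving Functor

newtype Fix f = In {out :: f (Fix f)}

fixFold :: Functor f => (f a -> a) -> Fix f -> a
fixFold f = f . fmap (fixFold f) . out

fixUnfold :: Functor f => (a -> f a) -> a -> Fix f
fixUnfold f = In . fmap (fixUnfold f) . f

-- Left-hand side: the fused conversion.
loop :: Functor f => Nu f -> Mu f
loop (Unfold psi s0) = Mu (\phi -> let go = phi . fmap go . psi
                                   in go s0)

-- Right-hand side: through the intermediate Fix buffer.
nuAlloc :: Functor f => Nu f -> Fix f
nuAlloc (Unfold psi s0) = fixUnfold psi s0

muWalk :: Functor f => Fix f -> Mu f
muWalk s = Mu (\phi -> fixFold phi s)

-- Sample producer and consumer (illustrative).
produce :: Nu (F Int)
produce = Unfold psi 1
  where psi i | i > 10    = Stop
              | otherwise = More i (i + 1)

sumOf :: Mu (F Int) -> Int
sumOf m = fold m alg
  where alg Stop       = 0
        alg (More a s) = a + s

fused, unfused :: Int
fused   = sumOf (loop produce)
unfused = sumOf ((muWalk . nuAlloc) produce)
```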

# Conclusion

In summary, my recommendations for getting reliable and flexible fusion are to:

1. Roll your own non-recursive types, or use a library which internally uses non-recursive types;

2. Select representations according to the behavior that they should have (master/slave). Library writers should expose both versions, for the user to select;

3. Add non-fusible conversion functions at the appropriate places. Library writers should provide such conversion functions.