A dual view of testing of (polymorphic) programs

Last updated on January 3, 2017
Tags: Haskell


One can define testing as the activity of generating an environment ρ for a program, and then verifying that running the program in ρ yields the expected results.

In a typed programming language, testing can be automated in the following way. Assuming that we have a function f taking an argument of type A, we can mechanically generate elements of the type A, feed them to f, and check that it behaves well. Even though in general mechanical testing will not cover all the possible values of A, in practice it is often useful to test programs on a few small example environments.

For simplicity, let us assume that we are only interested in testing that f terminates without crashing. In this context we can further assume that f does not return any meaningful result (besides termination). Writing the type of terminating programs as ⊥, we can then write this fact as f : A → ⊥. As we have seen, in such a case, generating the environment is as simple as finding inhabitants of A.
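As a minimal Haskell sketch of this kind of mechanical testing (the function f, its body, and testF are invented for illustration, and ⊥ is modelled by the unit type):

```haskell
-- A hypothetical program under test: it should terminate without
-- crashing on any Int. Its body here is a made-up example.
f :: Int -> ()
f x = if even x || odd x then () else error "crash"

-- Mechanical testing: enumerate a few small environments (here, small
-- Ints) and check that f terminates, i.e. reduces to (), on each.
testF :: Bool
testF = all (\x -> f x == ()) [-10 .. 10]

main :: IO ()
main = print testF
```

A property-based testing library such as QuickCheck automates the generation step; the explicit enumeration above only illustrates the principle.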

Propositions as types, and the generalization of environments

In classical logic, every proposition T has a dual, which we write  ∼ T. As logicians have shown, one can interpret propositions as types, and proofs as programs. For classical logics, this interpretation may be achieved via Girard’s linear logic. One can also use double-negation embeddings. Given such an interpretation, if we have both a program p of type T and an environment e of type  ∼ T then e is a valid environment for p. That is, p should terminate when run in environment e, if both p and e are bug-free.

We define the dual of a type by structural induction, as follows:

∼(A → B)  = A × ∼B
∼(A × B)  = A → ∼B
∼(∀α.A)  = ∃α.∼A
∼(∃α.A)  = ∀α.∼A
∼(⊥)  = 1
∼(1)  = ⊥

Example: assuming f : Int → ⊥, then the corresponding environment has type Int × 1. In testing terms, it means that if we generate an integer and a unit value, f should terminate on that input. (The unit value is only there to check for termination.) If it turns out that f is intended to accept only a subset of integers, then this contract should be reflected in its type.
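This example environment can be sketched in Haskell as follows (a minimal model, in which ⊥ is represented by the unit type; the names Env and runIn are invented for illustration):

```haskell
-- The environment for f : Int -> ⊥ has type Int × 1: an Int to feed
-- to f, plus a unit value that merely witnesses termination.
type Env = (Int, ())

-- Running the program in an environment: apply the program to the Int
-- component, force the result, and return the unit witness.
runIn :: (Int -> ()) -> Env -> ()
runIn g (x, u) = g x `seq` u

-- A made-up program under test.
f :: Int -> ()
f x = x `seq` ()

main :: IO ()
main = print (runIn f (42, ()) == ())
```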

Remark: for higher-order languages the types of programs and environments are the same.

(Rank-1) Polymorphism

For languages without polymorphism (even higher-order ones), the problem of type inhabitation is decidable: one can simply try all possibilities. The same goes for the generation of environments: one can use a similar enumeration strategy to generate environments for testing.

Yet, one often wants to test programs whose types contain type variables. In many cases, these variables are universally quantified at the top level only, and so the type-inhabitation problem remains the same. Unfortunately, by duality, the type variables of the environment are not quantified universally, but existentially. Thus, we face another problem: the (automatic) generation of types.

Fortunately, the type-generation problem is also an opportunity. Indeed, the type to generate is abstract for the program under test. This abstraction property means that the program can never examine the generated values; it can only feed them back to the environment itself for further processing. The opportunity is to make the generated type contain as little information as possible, in order to cover as many concrete test cases as possible in one go1.

The task is thus to find a type which can contain all the possible environments. In the following I propose a new method to do so. For every existentially quantified type-variable α, we define a type Ty(α) as the smallest type inductively generated by a number of constructors. A sufficient set of constructors is generated by structural induction over types, as follows:

⟦∃α.T⟧Γ  = ⟦T⟧Γ
⟦∀α.T⟧Γ  = ⟦T⟧Γ
⟦A × B⟧Γ  = ⟦A⟧Γ ∪ ⟦B⟧Γ
⟦A → B⟧Γ  = ⟦∼A⟧Γ ∪ ⟦B⟧(Γ,x:A)
⟦α⟧Γ  = {κ : Γ → α}
⟦∼α⟧Γ  = {}
⟦1⟧Γ  = {}
⟦⊥⟧Γ  = {}

The key idea is the following. During the generation of an environment, we may need to generate a value of type Ty(α) given a context Γ of variables in scope. But we also know that this value can only be consumed later by another part of the environment. Thus, instead of generating all possible programs of many possible types and storing them in Ty(α), we store Γ itself, and let later parts of the environment (those taking Ty(α) as input) deal with the generation of suitable values of concrete (known) types. Any program that we could possibly generate from the values in the environment is thus captured.

Technically, whenever a function type is encountered, we add its domain to the context Γ. When we reach a type variable, we generate a constructor (κ) that stores the whole context. Crucially, to deal with a function type A → B, we dualize the domain A in the recursive case.
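The rules above can be sketched in Haskell, under the simplifying assumption of a first-order type representation with a single type variable α (the names Type, Constructor and gen are mine, invented for this sketch):

```haskell
-- Types with a single type variable α.
data Type = Var            -- α
          | Unit           -- 1
          | Bot            -- ⊥
          | Type :*: Type  -- A × B
          | Type :->: Type -- A → B
          deriving (Eq, Show)

-- A constructor κ : Γ → α is represented by its context Γ.
type Constructor = [Type]

-- gen pos t g computes ⟦t⟧Γ when pos is True, and ⟦∼t⟧Γ when pos is
-- False; ∼ is pushed inward via the duality rules (∼(A→B) = A × ∼B, …).
gen :: Bool -> Type -> [Type] -> [Constructor]
gen True  Var        g = [g]   -- ⟦α⟧Γ = {κ : Γ → α}
gen False Var        _ = []    -- ⟦∼α⟧Γ = {}
gen _     Unit       _ = []
gen _     Bot        _ = []
gen True  (a :*: b)  g = gen True  a g ++ gen True  b g
gen False (a :*: b)  g = gen False a g ++ gen False b (g ++ [a])  -- via A → ∼B
gen True  (a :->: b) g = gen False a g ++ gen True  b (g ++ [a])
gen False (a :->: b) g = gen True  a g ++ gen False b g           -- via A × ∼B

-- The environment type (α → α) × α × 1 from the example below.
main :: IO ()
main = print (gen True ((Var :->: Var) :*: (Var :*: Unit)) [])
```

On the example environment type this yields two constructors, one with context [α] (that is, κ : α → α) and one with the empty context (κ : α).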

Example: consider p : ∀α. (α → α) → α → ⊥. We should generate e : ∃α. (α → α) × α × 1. We obtain two constructors, as follows:

⟦(α → α) × α × 1⟧  = ⟦α → α⟧ ∪ ⟦α⟧ ∪ ⟦1⟧
 = ⟦α → α⟧ ∪ {κ₂ : α} ∪ {}
 = ⟦∼α⟧ ∪ ⟦α⟧(x:α) ∪ {κ₂ : α}
 = {} ∪ ⟦α⟧(x:α) ∪ {κ₂ : α}
 = {κ₁ : α → α} ∪ {κ₂ : α}
 = {κ₁ : α → α, κ₂ : α}

Thus Ty(α) is isomorphic to inductive naturals.
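Concretely, p can then be tested at the single instance α := Ty(α). A Haskell sketch (the program p is a made-up example, and ⊥ is again modelled by the unit type):

```haskell
-- The generated abstract type Ty(α): one constructor per κ computed
-- above. With κ₁ : α → α and κ₂ : α, Ty is isomorphic to the naturals.
data Ty = K1 Ty   -- κ₁ : α → α
        | K2      -- κ₂ : α
        deriving (Eq, Show)

-- A program of type ∀α. (α → α) → α → ⊥; this particular body is a
-- made-up example for illustration.
p :: (a -> a) -> a -> ()
p s z = s (s z) `seq` ()

-- Instantiating α := Ty covers all monomorphic instances in one go:
-- every value the program can produce is a term built from K1 and K2.
main :: IO ()
main = print (p K1 K2 == ())
```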

One can add a couple of remarks.

  1. The rule for  ∼ α does not generate any constructor, because it corresponds to the consumption of α.

  2. The above method works even for arbitrary-rank polymorphism (hence the rules for ∃ and ∀).

    However, for such languages, finding a single value for an arbitrary type A is in general undecidable, so it is not clear if such a generalization is very useful.

Instead of generating constructors of an inductive datatype, we could also generate co-patterns of a co-inductive datatype, dualizing the whole process. The dualization may introduce interesting testing trade-offs. In general, if there are many ways to construct α, but few ways to convert it back to a concrete value, it is best to generate inductive types. On the contrary, if there are few ways to construct α, but many ways to convert it back to a concrete value, a co-inductive type is better.

Related work

In earlier work, Patrik, Koen and I showed how the above works in a special case. While this note generalises the idea, I recommend the earlier paper2 for motivation, intuition and further applications.

Recently, Xia Li-yao has implemented a monomorphizer for testing functions, citing our work as inspiration. Their implementation appears to proceed by structural induction over types, but at the time of writing the relation to the method presented here is unclear.


  1. Testing Polymorphic Properties, Bernardy, Jansson and Claessen.↩︎

  2. Testing Polymorphic Properties, Bernardy, Jansson and Claessen.↩︎