9. Background¶
9.1. Why not Variant?¶
Extensible ADT (EADT) is a solution based on Variant that supports recursive datatypes. Indeed if we try to define a recursive datatype (e.g., a list) by using Variants, we get the following error:
data Cons a l = Cons a l
data Nil = Nil
> type L a = V '[Cons a (L a), Nil]
<interactive>:19:2: error:
Cycle in type synonym declarations:
<interactive>:19:2-34: type L a = V '[Cons a (L a), Nil]
We could introduce ad-hoc datatypes (e.g., newtype L a = L (V '[Cons a (L
a),Nil])
) but this would defeat our purpose because the datatype wouldn’t be
generic anymore. Instead with EADTs we just have to write the following code to
declare a List
EADT:
data ConsF a l = ConsF a l deriving (Functor)
data NilF l = NilF deriving (Functor)
type List a = EADT '[ConsF a, NilF]
9.2. History¶
9.2.1. The expression problem (1998)¶
In 1998, Philip Wadler defined the Expression Problem as follow:
The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety
In Haskell it is straightforward to add new functions over an ADT. Suppose we have the following arithmetic expression ADT:
data Expr = Val Int | Add Expr Expr
We can independently add an evaluator function, potentially in another module:
eval :: Expr -> Int
eval (Val x) = x
eval (Add x y) = eval x + eval y
However if we want to add a new constructor to the ADT (say support for multiplication), we have to modify both the ADT definition and the functions using it:
data Expr = .... | Mul Expr Expr
eval :: Expr -> Int
....
eval (Mul x y) = eval x * eval y
What we want is to be able to add a new independent module containing both the
Mul
constructor and the code to handle it, without modifying the other
modules defining the other constructors and the other code to handle them!
9.2.2. Data types à la carte (2008)¶
Ten years later (in 2008), Wouter Swierstra described a technique to handle this in his well-known Data types à la carte paper. The first idea is to define data constructors independently of the ADT and to use a type parameter to leave open the ADT they are part of.
-- Independent data constructors. Parameter `e` represents the ADT they
-- will be part of. It is required even if it is not used in the right hand
-- side.
data Val e = Val Int deriving (Functor)
data Add e = Add e e deriving (Functor)
Defining a new independent constructor is easy:
data Mul e = Mul e e deriving (Functor)
The second idea is to use a combinator data type :+:
:
data (f :+: g) e = Inl (f e) | Inr (g e)
instance (Functor f, Functor g) => Functor (f :+: g) where ...
It is similar to Either
except that it passes the same additional type
parameter to both f
and g
type constructors. It can be used to compose
independent data constructors without creating a new data type:
type ExprF = Val :+: Add
ExprF
has kind Type -> Type
and its type parameter is used as the e
parameter of the independent data constructors. We can set it to arbitrary types
such as Int
to build valid values:
y = Inr (Add 5 8) :: ExprF Int
However the main use of this parameter should be to indicate the type of the
expression data type we want to build, say Expr
. Hence we would like to
write something like this:
type Expr = ExprF Expr
>error:
Cycle in type synonym declarations:
<interactive>:12:1-22: type Expr = ExprF Expr
Oops, we can’t build this cyclic (infinite) type. This leads us to the third idea: use another data type to handle the recursive nature of the expression type:
newtype Expr = Expr (ExprF Expr)
We can abstract over it to use the same data type for different expression types:
-- `Fix` type as defined in Data.Functor.Foldable for instance
newtype Fix f = Fix (f (Fix f))
type Expr = Fix ExprF
In summary, the approach uses 3 different sorts of data types:
Constructor data types:
Val
,Add
,Mul
…Combinator data type:
:+:
Recursivity handling data type:
Fix
By using these different data types we have untangled the construction of ADTs (algebraic data types) and we can freely add new constructor data types and mix them into different algebraic data types.
Operations on these algebraic data types can be defined independently by using type-classes and recursion schemes.
9.2.3. EADT - Extensible ADT (2018)¶
The EADT approach builds on the Swierstra’s one but it replaces the combinator
data type :+:
with the VariantF
one based on Variant.
Similarly to the :+:
combinator data type, VariantF
passes its e
parameter to all of its “member” types and has an instance of the Functor
class.
newtype VariantF (xs :: [* -> *]) e = VariantF (Variant (ApplyAll e xs))
-- ApplyAll e '[f,g,h] ==> '[f e, g e, h e]
instance Functor (VariantF xs) where ....
Now instead of writing f :+: g :+: h :+: i
to combine constructor data types
to form an ADT we can write VariantF '[f,g,h,i]
. Just like using
Variant
is more efficient – O(1) memory usage and (de)construction – than
using a nest of Either
, using VariantF
is more efficient than using a
nest of :+:
.
Finally an EADT is just Fix (VariantF xs)
except that we use our own
EADT
newtype instead of Fix
in order to define our own additional (and
non-orphan) type-classes. EADT
implements Recursive
and CoRecursive
type-classes so usual Fix
functions should work on EADT
too.
newtype EADT xs = EADT (VariantF xs)
With modern Haskell we can define bidirectional pattern synonyms that make the manipulation of EADT values very similar to the manipulation of usual ADTs.
In summary EADTs provide a nicer interface and a better asymptotic implementation in both memory and runtime execution than Data types à la carte.