This chapter was thoroughly changed Feb 2003.
data T a = MkT !(a,a) !(T a) | Nil f x = case x of MkT p q -> MkT p (q+1) Nil -> NilThe user's source program mentions only the constructors MkT and Nil. However, these constructors actually do something in addition to building a data value. For a start, MkT evaluates its arguments. Secondly, with the flag -funbox-strict-fields GHC will flatten (or unbox) the strict fields. So we may imagine that there's the source constructor MkT and the representation constructor MkT, and things start to get pretty confusing.
GHC now generates three unique Names for each data constructor:
---- OccName ------ String Name space Used for --------------------------------------------------------------------------- The "source data con" MkT DataName The DataCon itself The "worker data con" MkT VarName Its worker Id aka "representation data con" The "wrapper data con" $WMkT VarName Its wrapper Id (optional)Recall that each occurrence name (OccName) is a pair of a string and a name space (see The truth about names), and two OccNames are considered the same only if both components match. That is what distinguishes the name of the name of the DataCon from the name of its worker Id. To keep things unambiguous, in what follows we'll write "MkT{d}" for the source data con, and "MkT{v}" for the worker Id. (Indeed, when you dump stuff with "-ddumpXXX", if you also add "-dppr-debug" you'll get stuff like "Foo {- d rMv -}". The "d" part is the name space; the "rMv" is the unique key.)
Each of these three names gets a distinct unique key in GHC's name cache.
data T a = MkT !(a,a) !Int | Nil f x = case x of Nil -> Nil MkT p q -> MkT p (q+1)When the parser reads it in, it decides which name space each lexeme comes from, thus:
data T a = MkT{d} !(a,a) !Int | Nil{d} f x = case x of Nil{d} -> Nil{d} MkT{d} p q -> MkT{d} p (q+1)Notice that in the Haskell source all data contructors are named via the "source data con" MkT{d}, whether in pattern matching or in expressions.
In the translated source produced by the type checker (-ddump-tc), the program looks like this:
f x = case x of Nil{d} -> Nil{v} MkT{d} p q -> $WMkT p (q+1)Notice that the type checker replaces the occurrence of MkT by the wrapper, but the occurrence of Nil by the worker. Reason: Nil doesn't have a wrapper because there is nothing to do in the wrapper (this is the vastly common case).
Though they are not printed out by "-ddump-tc", behind the scenes, there are also the following: the data type declaration and the wrapper function for MkT.
data T a = MkT{d} a a Int# | Nil{d} $WMkT :: (a,a) -> T a -> T a $WMkT p t = case p of (a,b) -> seq t (MkT{v} a b t)Here, the wrapper $WMkT evaluates and takes apart the argument p, evaluates the argument t, and builds a three-field data value with the worker constructor MkT{v}. (There are more notes below about the unboxing of strict fields.) The worker $WMkT is called an implicit binding, because it's introduced implicitly by the data type declaration (record selectors are also implicit bindings, for example). Implicit bindings are injected into the code just before emitting code or External Core.
After desugaring into Core (-ddump-ds), the definition of f looks like this:
f x = case x of Nil{d} -> Nil{v} MkT{d} a b r -> let { p = (a,b); q = I# r } in $WMkT p (q+1)Notice the way that pattern matching has been desugared to take account of the fact that the "real" data constructor MkT has three fields.
By the time the simplifier has had a go at it, f will be transformed to:
f x = case x of Nil{d} -> Nil{v} MkT{d} a b r -> MkT{v} a b (r +# 1#)Which is highly cool.
map MkT xsthen $WMkT will not be inlined (because it is not applied to anything). That is why we generate real top-level bindings for the wrapper functions, and generate code for them.
MkT{v} = \ p q r -> MkT{v} p q rThis is a real hack. The occurrence on the RHS is saturated, so the code generator (both the one that generates abstract C and the byte-code generator) treats it as a special case and allocates a MkT; it does not make a recursive call! So now there's a top-level curried version of the worker which is available to anyone who wants it.
This strange defintion is not emitted into External Core. Indeed, you might argue that we should instead pass the list of TyCons to the code generator and have it generate magic bindings directly. As it stands, it's a real hack: see the code in CorePrep.mkImplicitBinds.
data T a = MkT a a Int# | Nil{d} $WMkT :: (a,a) -> T a -> T a $WMkT p t = case p of (a,b) -> seq t (MkT a b t) f x = case x of Nil -> Nil MkT a b r -> MkT a b (r +# 1#)Notice that it makes perfect sense as a program all by itself. Constructors look like constructors (albeit not identical to the original Haskell ones).
When reading in External Core, the parser is careful to read it back in just as it was before it was spat out, namely:
data T a = MkT{d} a a Int# | Nil{d} $WMkT :: (a,a) -> T a -> T a $WMkT p t = case p of (a,b) -> seq t (MkT{v} a b t) f x = case x of Nil{d} -> Nil{v} MkT{d} a b r -> MkT{v} a b (r +# 1#)
case e of MkT p t -> ..p..t..GHC will desugar this to the following Core code:
case e of MkT a b t -> let p = (a,b) in ..p..t..The local let-binding reboxes the pair because it may be mentioned in the case alternative. This may well be a bad idea, which is why -funbox-strict-fields is an experimental feature.
It's essential that when importing a type T defined in some external module M, GHC knows what representation was used for that type, and that in turn depends on whether module M was compiled with -funbox-strict-fields. So when writing an interface file, GHC therefore records with each data type whether its strict fields (if any) should be unboxed.
Every data constructor Chas two info tables: