Probably the most important phase in the frontend is the type checker,
which is located at fptools/ghc/compiler/typecheck/
.
GHC type checks programs in their original Haskell form before the
desugarer converts them into Core code. This complicates the type
checker as it has to handle the much more verbose Haskell AST, but it
improves error messages, as those message are based on the same
structure that the user sees.
GHC defines the abstract syntax of Haskell programs in HsSyn
using a structure that abstracts over the concrete representation of
bound occurences of identifiers and patterns. The module TcHsSyn
defines a number of helper function required by the type checker. Note
that the type TcRnTypes
.TcId
used to represent identifiers in some signatures during type checking
is, in fact, nothing but a synonym for a plain
Id
.
It is also noteworthy, that the representations of types changes during
type checking from HsType
to TypeRep.Type
.
The latter is a hybrid type representation that
is used to type Core, but still contains sufficient information to
recover source types. In particular, the type checker maintains and
compares types in their Type
form.
The interface of the type checker (and renamer) to the rest of the compiler is provided
by TcRnDriver
.
Entire modules are processed by calling tcRnModule
and GHCi
uses tcRnStmt
, tcRnExpr
, and
tcRnType
to typecheck statements and expressions, and to
kind check types, respectively. Moreover, tcRnExtCore
is
provided to typecheck external Core code. Moreover,
tcTopSrcDecls
is used by Template Haskell - more
specifically by TcSplice.tc_bracket
- to type check the contents of declaration brackets.
The function tcRnModule
controls the complete static
analysis of a Haskell module. It sets up the combined renamer and type
checker monad, resolves all import statements, initiates the actual
renaming and type checking process, and finally, wraps off by processing
the export list.
The actual type checking and renaming process is initiated via
TcRnDriver.tcRnSrcDecls
, which uses a helper called
tc_rn_src_decls
to implement the iterative renaming and
type checking process required by Template
Haskell. However, before it invokes tc_rn_src_decls
,
it takes care of hi-boot files; afterwards, it simplifies type
constraints and zonking (see below regarding the later).
The function tc_rn_src_decls
partitions static analysis of
a whole module into multiple rounds, where the initial round is followed
by an additional one for each toplevel splice. It collects all
declarations up to the next splice into an HsDecl.HsGroup
to rename and type check that declaration group by calling
TcRnDriver.tcRnGroup
. Afterwards, it executes the
splice (if there are any left) and proceeds to the next group, which
includes the declarations produced by the splice.
The function tcRnGroup
, finally, gets down to invoke the
actual renaming and type checking via
TcRnDriver.rnTopSrcDecls
and
TcRnDriver.tcTopSrcDecls
, respectively. The renamer, apart
from renaming, computes the global type checking environment, of type
TcRnTypes.TcGblEnv
, which is stored in the type checking
monad before type checking commences.
The type checking of a declaration group, performed by
tcTopSrcDecls
starts by processing of the type and class
declarations of the current module, using the function
TcTyClsDecls.tcTyAndClassDecls
. This is followed by a
first round over instance declarations using
TcInstDcls.tcInstDecls1
, which in particular generates all
additional bindings due to the deriving process. Then come foreign
import declarations (TcForeign.tcForeignImports
) and
default declarations (TcDefaults.tcDefaults
).
Now, finally, toplevel value declarations (including derived ones) are
type checked using TcBinds.tcTopBinds
. Afterwards,
TcInstDcls.tcInstDecls2
traverses instances for the second
time. Type checking concludes with processing foreign exports
(TcForeign.tcForeignExports
) and rewrite rules
(TcRules.tcRules
). Finally, the global environment is
extended with the new bindings.
Type and class declarations are type checked in a couple of phases that
contain recursive dependencies - aka knots. The first knot
encompasses almost the whole type checking of these declarations and
forms the main piece of TcTyClsDecls.tcTyAndClassDecls
.
Inside this big knot, the first main operation is kind checking, which
again involves a knot. It is implemented by kcTyClDecls
,
which performs kind checking of potentially recursively-dependent type
and class declarations using kind variables for initially unknown kinds.
During processing the individual declarations some of these variables
will be instantiated depending on the context; the rest gets by default
kind *
(during zonking of the kind signatures).
Type synonyms are treated specially in this process, because they can
have an unboxed type, but they cannot be recursive. Hence, their kinds
are inferred in dependency order. Moreover, in contrast to class
declarations and other type declarations, synonyms are not entered into
the global environment as a global TyThing
.
(TypeRep.TyThing
is a sum type that combines the various
flavours of typish entities, such that they can be stuck into type
environments and similar.)
During type checking type variables are represented by mutable variables
- cf. the variable story. Consequently,
unification can instantiate type variables by updating those mutable
variables. This process of instantiation is (for reasons that elude me)
called zonking
in GHC's sources. The zonking routines for the various forms of Haskell
constructs are responsible for most of the code in the module TcHsSyn
,
whereas the routines that actually operate on mutable types are defined
in TcMType
;
this includes the zonking of type variables and type terms, routines to
create mutable structures and update them as well as routines that check
constraints, such as that type variables in function signatures have not
been instantiated during type checking. The actual type unification
routine is uTys
in the module TcUnify
.
All type variables that may be instantiated (those in signatures
may not), but haven't been instantiated during type checking, are zonked
to ()
, so that after type checking all mutable variables
have been eliminated.
The representation of types is fixed in the module TcRep
and exported as the data type Type
. As explained in TcType
,
GHC supports rank-N types, but, in the type checker, maintains the
restriction that type variables cannot be instantiated to quantified
types (i.e., the type system is predicative). The type checker floats
universal quantifiers outside and maintains types in prenex form.
(However, quantifiers can, of course, not float out of negative
positions.) Overall, we have
sigma -> forall tyvars. phi phi -> theta => rho rho -> sigma -> rho | tau tau -> tyvar | tycon tau_1 .. tau_n | tau_1 tau_2 | tau_1 -> tau_2
where sigma
is in prenex form; i.e., there is never a
forall to the right of an arrow in a phi
type. Moreover, a
type of the form tau
never contains a quantifier (which
includes arguments to type constructors).
Of particular interest are the variants SourceTy
and
NoteTy
of TypeRep
.Type
.
The constructor SourceTy :: SourceType -> Type
represents a
type constraint; that is, a predicate over types represented by a
dictionary. The type checker treats a SourceTy
as opaque,
but during the translation to core it will be expanded into its concrete
representation (i.e., a dictionary type) by the function Type
.sourceTypeRep
.
Note that newtypes are not covered by SourceType
s anymore,
even if some comments in GHC still suggest this. Instead, all newtype
applications are initially represented as a NewTcApp
, until
they are eliminated by calls to Type
.newTypeRep
.
The NoteTy
constructor is used to add non-essential
information to a type term. Such information has the type
TypeRep.TyNote
and is either the set of free type variables
of the annotated expression or the unexpanded version of a type synonym.
Free variables sets are cached as notes to save the overhead of
repeatedly computing the same set for a given term. Unexpanded type
synonyms are useful for generating comprehensible error messages, but
have no influence on the process of type checking.
During type checking, GHC maintains a type environment whose
type definitions are fixed in the module TcRnTypes
with the operations defined in
TcEnv
.
Among other things, the environment contains all imported and local
instances as well as a list of global entities (imported and
local types and classes together with imported identifiers) and
local entities (locally defined identifiers). This environment
is threaded through the type checking monad, whose support functions
including initialisation can be found in the module TcRnMonad
.
Expressions are type checked by TcExpr
.
Usage occurences of identifiers are processed by the function
tcId
whose main purpose is to instantiate
overloaded identifiers. It essentially calls
TcInst.instOverloadedFun
once for each universally
quantified set of type constraints. It should be noted that overloaded
identifiers are replaced by new names that are first defined in the LIE
(Local Instance Environment?) and later promoted into top-level
bindings.
GHC implements overloading using so-called dictionaries. A dictionary is a tuple of functions -- one function for each method in the class of which the dictionary implements an instance. During type checking, GHC replaces each type constraint of a function with one additional argument. At runtime, the extended function gets passed a matching class dictionary by way of these additional arguments. Whenever the function needs to call a method of such a class, it simply extracts it from the dictionary.
This sounds simple enough; however, the actual implementation is a bit
more tricky as it wants to keep track of all the instances at which
overloaded functions are used in a module. This information is useful
to optimise the code. The implementation is the module Inst.lhs
.
The function instOverloadedFun
is invoked for each
overloaded usage occurence of an identifier, where overloaded means that
the type of the idendifier contains a non-trivial type constraint. It
proceeds in two steps: (1) Allocation of a method instance
(newMethodWithGivenTy
) and (2) instantiation of functional
dependencies. The former implies allocating a new unique identifier,
which replaces the original (overloaded) identifier at the currently
type-checked usage occurrence.
The new identifier (after being threaded through the LIE) eventually
will be bound by a top-level binding whose rhs contains a partial
application of the original overloaded identifier. This papp applies
the overloaded function to the dictionaries needed for the current
instance. In GHC lingo, this is called a method. Before
becoming a top-level binding, the method is first represented as a value
of type Inst.Inst
, which makes it easy to fold multiple
instances of the same identifier at the same types into one global
definition. (And probably other things, too, which I haven't
investigated yet.)
Note: As of 13 January 2001 (wrt. to the code in the
CVS HEAD), the above mechanism interferes badly with RULES pragmas
defined over overloaded functions. During instantiation, a new name is
created for an overloaded function partially applied to the dictionaries
needed in a usage position of that function. As the rewrite rule,
however, mentions the original overloaded name, it won't fire anymore
-- unless later phases remove the intermediate definition again. The
latest CVS version of GHC has an option
-fno-method-sharing
, which avoids sharing instantiation
stubs. This is usually/often/sometimes sufficient to make the rules
fire again.
Last modified: Thu May 12 22:52:46 EST 2005