The GHC Commentary - The Real Story about Variables, Ids, TyVars, and the like

Variables

The Var type, defined in basicTypes/Var.lhs, represents variables, both term variables and type variables:
    data Var
      = Var {
	    varName    :: Name,
	    realUnique :: FastInt,
	    varType    :: Type,
	    varDetails :: VarDetails,
	    varInfo    :: IdInfo		
	}

It's often fantastically convenient to have term variables and type variables share a single data type. For example,

  exprFreeVars :: CoreExpr -> VarSet
If there were two types, we'd need to return two sets. Simiarly, big lambdas and little lambdas use the same constructor in Core, which is extremely convenient.

We define a couple of type synonyms:

  type Id    = Var  -- Term variables
  type TyVar = Var  -- Type variables
just to help us document the occasions when we are expecting only term variables, or only type variables.

The VarDetails field

The VarDetails field tells what kind of variable this is:
data VarDetails
  = LocalId 		-- Used for locally-defined Ids (see NOTE below)
	LocalIdDetails

  | GlobalId 		-- Used for imported Ids, dict selectors etc
	GlobalIdDetails

  | TyVar
  | MutTyVar (IORef (Maybe Type)) 	-- Used during unification;
	     TyVarDetails

Type variables (TyVar)

The TyVar case is self-explanatory. The MutTyVar case is used only during type checking. Then a type variable can be unified, using an imperative update, with a type, and that is what the IORef is for. The TcType.TyVarDetails field records the sort of type variable we are dealing with. It is defined as

data TyVarDetails = SigTv | ClsTv | InstTv | VanillaTv
SigTv marks type variables that were introduced when instantiating a type signature prior to matching it against the inferred type of a definition. The variants ClsTv and InstTv mark scoped type variables introduced by class and instance heads, respectively. These first three sorts of type variables are skolem variables (tested by the predicate isSkolemTyVar); i.e., they must not be instantiated. All other type variables are marked as VanillaTv.

For a long time I tried to keep mutable Vars statically type-distinct from immutable Vars, but I've finally given up. It's just too painful. After type checking there are no MutTyVars left, but there's no static check of that fact.

Term variables (Id)

A term variable (of type Id) is represented either by a LocalId or a GlobalId:

A GlobalId is

A LocalId is:

The key thing about LocalIds is that the free-variable finder typically treats them as candidate free variables. That is, it ignores GlobalIds such as imported constants, data contructors, etc.

An important invariant is this: All the bindings in the module being compiled (whether top level or not) are LocalIds until the CoreTidy phase. In the CoreTidy phase, all externally-visible top-level bindings are made into GlobalIds. This is the point when a LocalId becomes "frozen" and becomes a fixed, immutable GlobalId.

(A binding is "externally-visible" if it is exported, or mentioned in the unfolding of an externally-visible Id. An externally-visible Id may not have an unfolding, either because it is too big, or because it is the loop-breaker of a recursive group.)

Global Ids and implicit Ids

GlobalIds are further categorised by their GlobalIdDetails. This type is defined in basicTypes/IdInfo, because it mentions other structured types like DataCon. Unfortunately it is *used* in Var.lhs so there's a hi-boot knot to get it there. Anyway, here's the declaration:
data GlobalIdDetails
  = NotGlobalId			-- Used as a convenient extra return value 
                                -- from globalIdDetails

  | VanillaGlobal		-- Imported from elsewhere

  | PrimOpId PrimOp		-- The Id for a primitive operator
  | FCallId ForeignCall		-- The Id for a foreign call

  -- These next ones are all "implicit Ids"
  | RecordSelId FieldLabel	-- The Id for a record selector
  | DataConId DataCon		-- The Id for a data constructor *worker*
  | DataConWrapId DataCon	-- The Id for a data constructor *wrapper*
				-- [the only reasons we need to know is so that
				--  a) we can  suppress printing a definition in the interface file
				--  b) when typechecking a pattern we can get from the
				--     Id back to the data con]
The GlobalIdDetails allows us to go from the Id for a record selector, say, to its field name; or the Id for a primitive operator to the PrimOp itself.

Certain GlobalIds are called "implicit" Ids. An implicit Id is derived by implication from some other declaration. So a record selector is derived from its data type declaration, for example. An implicit Ids is always a GlobalId. For most of the compilation, the implicit Ids are just that: implicit. If you do -ddump-simpl you won't see their definition. (That's why it's true to say that until CoreTidy all Ids in this compilation unit are LocalIds.) But at CorePrep, a binding is added for each implicit Id defined in this module, so that the code generator will generate code for the (curried) function.

Implicit Ids carry their unfolding inside them, of course, so they may well have been inlined much earlier; but we generate the curried top-level defn just in case its ever needed.

LocalIds

The LocalIdDetails gives more info about a LocalId:
data LocalIdDetails 
  = NotExported	-- Not exported
  | Exported	-- Exported
  | SpecPragma	-- Not exported, but not to be discarded either
		-- It's unclean that this is so deeply built in
From this we can tell whether the LocalId is exported, and that tells us whether we can drop an unused binding as dead code.

The SpecPragma thing is a HACK. Suppose you write a SPECIALIZE pragma:

   foo :: Num a => a -> a
   {-# SPECIALIZE foo :: Int -> Int #-}
   foo = ...
The type checker generates a dummy call to foo at the right types:
   $dummy = foo Int dNumInt
The Id $dummy is marked SpecPragma. Its role is to hang onto that call to foo so that the specialiser can see it, but there are no calls to $dummy. The simplifier is careful not to discard SpecPragma Ids, so that it reaches the specialiser. The specialiser processes the right hand side of a SpecPragma Id to find calls to overloaded functions, and then discards the SpecPragma Id. So SpecPragma behaves a like Exported, at least until the specialiser.

ExternalNames and InternalNames

Notice that whether an Id is a LocalId or GlobalId is not the same as whether the Id has an ExternaName or an InternalName (see "The truth about Names"): Last modified: Fri Sep 12 15:17:18 BST 2003