The GHC Commentary - The Glorious Driver

The Glorious Driver (GD) is the part of GHC that orchestrates the interaction of all the other pieces that make up GHC. It supersedes the Evil Driver (ED), which was a Perl script that served the same purpose and was in use until version 4.08.1 of GHC. Simon Marlow eventually slayed the ED and instated the GD. The GD is usually called the Compilation Manager these days.

The GD was substantially extended for GHCi, the interactive variant of GHC, which has integrated the compiler with a (meta-circular) interpreter since version 5.00. Most of the driver is located in the directory fptools/ghc/compiler/main/.

Command Line Options

GHC's many flavours of command line options make the code that interprets them rather involved. The following provides a brief overview of how these options are processed. Since the addition of the interactive front-end to GHC, there are two kinds of options: static options and dynamic options. The former can only be set when the system is invoked, whereas the latter can be altered in the course of an interactive session. A brief explanation of the difference between these options, and of related matters, is at the start of the module CmdLineOpts. The same module defines the enumeration DynFlag, which contains all dynamic flags. Moreover, there is the labelled record DynFlags, which collects all the flag-related information that the compilation manager passes to the compiler proper, hsc, whenever a compilation is triggered. To find out whether an option is static, use the predicate isStaticHscFlag in the same module.
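The split between the two kinds of options can be pictured with a small model. The following is only a sketch: the constructors, field names, helper functions (isSet, setFlag, unsetFlag, isStaticOption) and the placeholder option names are all hypothetical and much smaller than the real definitions in CmdLineOpts.

```haskell
-- Toy model of the static/dynamic split. All names are hypothetical
-- stand-ins, not the definitions found in CmdLineOpts.

-- Dynamic flags: an enumeration with one constructor per flag.
data DynFlag = DumpSimpl | WarnUnusedBinds | CoreLint
  deriving (Eq, Show)

-- The record handed to the compiler proper for each compilation.
data DynFlags = DynFlags
  { optLevel :: Int        -- requested optimisation level
  , flags    :: [DynFlag]  -- dynamic flags currently switched on
  } deriving (Eq, Show)

defaultDynFlags :: DynFlags
defaultDynFlags = DynFlags { optLevel = 0, flags = [] }

-- Query and update helpers. Updates return a new record, so an
-- interactive session can change dynamic flags between compilations.
isSet :: DynFlag -> DynFlags -> Bool
isSet f dfs = f `elem` flags dfs

setFlag, unsetFlag :: DynFlag -> DynFlags -> DynFlags
setFlag f dfs
  | isSet f dfs = dfs
  | otherwise   = dfs { flags = f : flags dfs }
unsetFlag f dfs = dfs { flags = filter (/= f) (flags dfs) }

-- Static options are fixed at start-up. A predicate over their textual
-- form tells the two kinds apart (the entries here are placeholders).
staticOptionNames :: [String]
staticOptionNames = ["static-option-a", "static-option-b"]

isStaticOption :: String -> Bool
isStaticOption = (`elem` staticOptionNames)
```

Loaded into GHCi, an expression such as isSet CoreLint (setFlag CoreLint defaultDynFlags) shows how a dynamic flag travels inside the record rather than in global state.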

The second module that contains a lot of code related to the management of flags is DriverFlags.hs. In particular, the module contains two association lists that map the textual representation of the various flags to a data structure that tells the driver how to parse the flag (e.g., whether it has any arguments) and provides its internal representation. All static flags are contained in static_flags. A whole range of -f flags can be negated by using the prefix -fno- in place of -f. These flags are contained in the association list fFlags.
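A rough sketch of how such association lists can drive flag parsing follows. It is not a transcription of DriverFlags.hs: the type OptKind, the tables flagTable and negatableFlags, and the functions lookupKind and parseFFlag are hypothetical, the table entries are placeholders, and the negated spelling is assumed to be -fno-<name>.

```haskell
import Data.List (isPrefixOf, stripPrefix)

-- How a flag is parsed (hypothetical, simplified).
data OptKind
  = NoArg    -- the flag stands alone
  | SepArg   -- the argument is the next command-line word
  | Prefix   -- the argument is glued onto the flag
  deriving (Eq, Show)

-- Toy association list: textual flag -> parsing instructions.
flagTable :: [(String, OptKind)]
flagTable =
  [ ("-v", NoArg)
  , ("-o", SepArg)
  , ("-I", Prefix)
  ]

-- Names of -f flags that may also be negated (placeholder entries).
negatableFlags :: [String]
negatableFlags = ["warn-unused-binds", "example-opt"]

-- Classify one command-line word against flagTable.
lookupKind :: String -> Maybe OptKind
lookupKind w =
  case [k | (f, k) <- flagTable, matches k f] of
    (k : _) -> Just k
    []      -> Nothing
  where
    matches Prefix f = f `isPrefixOf` w
    matches _      f = f == w

-- Recognise a negatable -f flag; the Bool says whether it is switched on.
-- The -fno- case is tried first, because "-f" is also a prefix of it.
parseFFlag :: String -> Maybe (String, Bool)
parseFFlag s
  | Just name <- stripPrefix "-fno-" s, name `elem` negatableFlags = Just (name, False)
  | Just name <- stripPrefix "-f" s,    name `elem` negatableFlags = Just (name, True)
  | otherwise = Nothing
```

Keeping the parsing instructions in data rather than in code means that adding a flag is a one-line change to a table, which is presumably why the driver is organised around association lists.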

The driver uses a nasty hack based on IORefs that permits the rest of the compiler to access static flags as CAFs; i.e., there is a family of top-level variable definitions in CmdLineOpts, below the literate section heading Static options, each of which contains the value of one static option. This is essentially realised via global variables (in the sense of C-style, updatable, global variables) defined via an evil pre-processor macro named GLOBAL_VAR, which is defined in a particularly ugly corner of GHC, namely the C header file HsVersions.h.
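The general idiom behind this kind of hack can be sketched without the pre-processor. The example below shows the well-known unsafePerformIO/IORef pattern for a top-level mutable variable; it is not the expansion of GLOBAL_VAR itself, and the names vOptLevel, setOptLevel and optLevel are hypothetical.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A C-style updatable global. NOINLINE stops the compiler from
-- duplicating the newIORef call, which would create several distinct
-- references instead of one shared one.
{-# NOINLINE vOptLevel #-}
vOptLevel :: IORef Int
vOptLevel = unsafePerformIO (newIORef 0)

-- The driver writes the global once, while processing the command line.
setOptLevel :: Int -> IO ()
setOptLevel = writeIORef vOptLevel

-- The rest of the compiler reads the option as if it were an ordinary
-- top-level constant (a CAF). The read happens on first use and the
-- result is then cached for the rest of the run.
{-# NOINLINE optLevel #-}
optLevel :: Int
optLevel = unsafePerformIO (readIORef vOptLevel)
```

The hazard is visible in the last definition: if optLevel is demanded before setOptLevel has run, the default value is cached for good. That is why options handled this way can only be set at start-up, i.e., why they are static.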

What Happens When

Inside the Haskell compiler proper (hsc), a whole series of stages ("passes") are executed in order to transform your Haskell program into C or native code. This process is orchestrated by main/HscMain.hscMain and its relative hscReComp. The latter directly invokes, in order, the parser, the renamer, the typechecker, the desugarer, the simplifier (Core2Core), the CoreTidy pass, the CorePrep pass, conversion to STG (CoreToStg), the interface generator (MkFinalIface), the code generator, and code output. The simplifier is the most complex of these, and is made up of many sub-passes. These are controlled by buildCoreToDo, as described below.
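The order of passes listed above can be written down as data. This is merely a sketch for orientation: the type Pass and the functions pipeline and runPipeline are hypothetical, and the real hscReComp calls each pass directly with pass-specific argument and result types rather than folding over a list.

```haskell
-- The passes in the order given in the text. Deriving Enum and Bounded
-- lets the declaration order double as the execution order.
data Pass
  = Parser
  | Renamer
  | Typechecker
  | Desugarer
  | Simplifier      -- Core2Core, itself made up of many sub-passes
  | CoreTidy
  | CorePrep
  | CoreToStg
  | MkFinalIface
  | CodeGen
  | CodeOutput
  deriving (Eq, Ord, Show, Enum, Bounded)

-- Every pass, first to last.
pipeline :: [Pass]
pipeline = [minBound .. maxBound]

-- Thread a value through all passes in order. The step function stands
-- in for the real work done by each pass.
runPipeline :: (Pass -> a -> a) -> a -> a
runPipeline step start = foldl (flip step) start pipeline
```

For instance, runPipeline (\p trace -> trace ++ [show p]) [] yields the names of the passes in execution order.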

Scheduling Optimisation Phases

GHC has a large variety of optimisations at its disposal, many of which have subtle interdependencies. The overall plan for program optimisation is fixed in DriverState.hs. First of all, there is the variable hsc_minusNoO_flags that determines the -f options that you get without -O (aka optimisation level 0) as well as hsc_minusO_flags and hsc_minusO2_flags for -O and -O2.

However, most of the strategic decisions about optimisations on the intermediate language Core are encoded in the value produced by buildCoreToDo, which is a list with elements of type CoreToDo. Each element of this list specifies one step in the sequence of core optimisations executed by the Mighty Simplifier. The type CoreToDo is defined in CmdLineOpts.lhs. The actual execution of the optimisation plan produced by buildCoreToDo is performed by SimpleCore.doCorePasses. Core optimisation plans consist of a number of simplification phases (currently three for optimisation levels of 1 or higher) with decreasing phase numbers; the last phase has the lowest number, namely 0. Before and after these phases, optimisations such as specialisation, let floating, worker/wrapper, and so on are executed. The phases are sequenced so as to maximise their synergistic effect; however, this is a fairly fragile arrangement.
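The shape of such a plan can be sketched as follows. Everything here is a simplification under stated assumptions: the constructors of this toy CoreToDo, the functions buildPlan, phaseNumbers and runPlan, the particular passes placed before and after the simplifier phases, and the single-phase plan for optimisation level 0 are all hypothetical and do not reproduce the real buildCoreToDo.

```haskell
-- One step of a Core optimisation plan (hypothetical constructors).
data CoreToDo
  = Simplify Int    -- a simplifier phase, tagged with its phase number
  | Specialise
  | FloatOut        -- let floating
  | WorkerWrapper
  deriving (Eq, Show)

-- Build a plan from the optimisation level. With optimisation on there
-- are three simplifier phases, numbered downwards so that the last one
-- is phase 0, with other optimisations before and after them.
buildPlan :: Int -> [CoreToDo]
buildPlan level
  | level < 1 = [Simplify 0]   -- assumption: one phase without -O
  | otherwise =
      [Specialise, FloatOut]
        ++ map Simplify [2, 1, 0]
        ++ [WorkerWrapper]

-- The phase numbers of the simplifier steps, in execution order.
phaseNumbers :: [CoreToDo] -> [Int]
phaseNumbers plan = [n | Simplify n <- plan]

-- Execute a plan step by step, in the spirit of doCorePasses. The step
-- function stands in for the real transformation of the program.
runPlan :: (CoreToDo -> prog -> prog) -> [CoreToDo] -> prog -> prog
runPlan step plan prog = foldl (flip step) prog plan
```

Representing the plan as a plain list makes the ordering explicit and easy to inspect, but nothing in the types records why one step must precede another, which matches the remark that the arrangement is fragile.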

There is a similar construction for optimisations on STG level stored in the variable buildStgToDo :: [StgToDo]. However, this is a lot less complex than the arrangement for Core optimisations.

Linking the RTS and libHSstd

Since the RTS and HSstd refer to each other, there is a Cunning Hack to avoid putting them each on the command-line twice or thrice (aside: try asking for "plaice and chips thrice" in a fish and chip shop; bet you only get two lots). The hack involves adding the symbols that the RTS needs from libHSstd, such as PrelWeak_runFinalizzerBatch_closure and __stginit_Prelude, to the link line with the -u flag. The standard library appears before the RTS on the link line, and these options cause the corresponding symbols to be picked up even though the linker might not yet have seen them being used, as the RTS appears later on the link line. As a result, when the RTS is also scanned, these symbols are already resolved. This avoids the linker having to read the standard library and RTS multiple times.

This does, however, lead to a complication. Normal Haskell programs do not have a main() function, so this is supplied by the RTS (in the file Main.c). It calls startupHaskell, which itself calls __stginit_PrelMain. Since __stginit_PrelMain occurs in the standard library, it is one of the symbols passed to the linker using the -u option. This is fine for standalone Haskell programs, but as soon as the Haskell code is only used as part of a program implemented in a foreign language, the main() function of that foreign language should be used instead of that of the Haskell runtime. In this case, the previously described arrangement unfortunately fails: __stginit_PrelMain had better not be linked in, because it tries to call __stginit_Main, which won't exist. In other words, the RTS's main() refers to __stginit_PrelMain, which in turn refers to __stginit_Main. Although the RTS's main() might not be linked in if the program provides its own, the driver will normally force __stginit_PrelMain to be linked in anyway, using -u, because it's a back-reference from the RTS to HSstd. This case is coped with by the -no-hs-main flag, which suppresses passing the corresponding -u option to the linker, although in some versions of the compiler (e.g., 5.00.2) it didn't work. In addition, the driver generally places the C program providing the main() that we want to use before the RTS on the link line. Therefore, the RTS's main is never used, and without the -u the label __stginit_PrelMain will not be linked.
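The effect of -no-hs-main on the link line can be sketched as a pure function. This is an illustration only, not the driver's code: the function linkLine and its helpers are hypothetical, the library options -lHSstd and -lHSrts are assumed spellings, and only the symbols named in the text are listed.

```haskell
-- Symbols the RTS needs from the standard library (as named in the
-- text; the real list is longer).
rtsBackRefs :: [String]
rtsBackRefs =
  [ "PrelWeak_runFinalizzerBatch_closure"
  , "__stginit_Prelude"
  ]

-- The back-reference that only makes sense when the RTS supplies main().
hsMainSym :: String
hsMainSym = "__stginit_PrelMain"

-- Build the linker arguments. The first parameter says whether
-- -no-hs-main was given; the second lists the user's object files.
-- User objects come first, then the forced symbols, then the standard
-- library, and the RTS last.
linkLine :: Bool -> [String] -> [String]
linkLine noHsMain objects =
  objects ++ concatMap forceSym syms ++ ["-lHSstd", "-lHSrts"]
  where
    syms = rtsBackRefs ++ [hsMainSym | not noHsMain]
    forceSym s = ["-u", s]   -- ask the linker to treat s as undefined
```

With linkLine False ["Main.o"] the symbol __stginit_PrelMain is forced in, as a standalone Haskell program needs. With linkLine True ["cmain.o", "Foo.o"] it is left out, so nothing drags in a reference to the non-existent __stginit_Main.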

Last modified: Tue Feb 19 11:09:00 UTC 2002
