The GHC Commentary - Supporting multi-threaded interoperation

Authors: sof@galois.com, simonmar@microsoft.com
Date: April 2002

This document presents the implementation of an extension to Concurrent Haskell that provides two enhancements:

A Concurrent Haskell thread may call an external (e.g., C) function in a manner that's transparent to the execution/evaluation of other Haskell threads. Section "Calling out" covers this.
OS threads may safely call Haskell functions concurrently. Section "Calling in" covers this.

The problem: foreign calls that block

When a Concurrent Haskell (CH) thread calls a 'foreign import'ed function, the runtime system (RTS) has to handle this in a manner transparent to other CH threads. That is, they shouldn't be blocked from making progress while the CH thread executes the external call. Presently, all threads will block.

Clearly, we have to rely on OS-level threads in order to support this kind of concurrency. The implementation described here defines the (abstract) OS threads interface that the RTS assumes. The implementation currently provides two instances of this interface, one for POSIX threads (pthreads) and one for Win32 threads.

Multi-threading the RTS

A simple and efficient way to implement non-blocking foreign calls is like this:

Invariant: only one OS thread is allowed to execute code inside of the GHC runtime system. [There are alternate designs, but I won't go into details on their pros and cons here.] We'll call the OS thread that is currently running Haskell threads the Current Haskell Worker Thread.
The Current Haskell Worker Thread repeatedly grabs a Haskell thread, executes it until its time-slice expires or it blocks on an MVar, then grabs another, and executes that, and so on.
When the Current Haskell Worker comes to execute a potentially blocking 'foreign import', it leaves the RTS and ceases being the Current Haskell Worker, but before doing so it makes certain that another OS worker thread is available to become the Current Haskell Worker. Consequently, even if the external call blocks, the new Current Haskell Worker continues execution of the other Concurrent Haskell threads. When the external call eventually completes, the Concurrent Haskell thread that made the call is passed the result and made runnable again.
A pool of OS threads is constantly trying to become the Current Haskell Worker. Only one succeeds at any moment. If the pool becomes empty, the RTS creates more workers.
The OS worker threads are regarded as interchangeable. A given Haskell thread may, during its lifetime, be executed entirely by one OS worker thread, or by more than one. There's just no way to tell.
If a foreign program wants to call a Haskell function, there is always a thread switch involved. The foreign program uses thread-safe mechanisms to create a Haskell thread and make it runnable, and the Current Haskell Worker Thread executes it. See Section "Calling in".
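
The invariant above can be illustrated with a small pthreads sketch. It is not the RTS code: the token, the counters and every function name here are hypothetical, chosen only to show a pool of interchangeable OS threads competing for the right to be the single Current Haskell Worker.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for "the right to execute inside the RTS". */
static pthread_mutex_t rts_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  rts_free = PTHREAD_COND_INITIALIZER;
static int token_taken = 0;  /* 1 while some OS thread is the Current Haskell Worker */
static int inside      = 0;  /* number of OS threads currently inside the "RTS" */
static int max_inside  = 0;  /* high-water mark of `inside` */
static int slices_run  = 0;  /* time slices executed; only touched by the token holder */

static void become_current_worker(void) {
    pthread_mutex_lock(&rts_lock);
    while (token_taken)                      /* only one thread may hold the token */
        pthread_cond_wait(&rts_free, &rts_lock);
    token_taken = 1;
    inside++;
    if (inside > max_inside) max_inside = inside;
    pthread_mutex_unlock(&rts_lock);
}

static void stop_being_current_worker(void) {
    pthread_mutex_lock(&rts_lock);
    inside--;
    token_taken = 0;
    pthread_cond_signal(&rts_free);          /* let another pool member take over */
    pthread_mutex_unlock(&rts_lock);
}

static void *worker(void *arg) {
    int slices = *(int *)arg;
    for (int i = 0; i < slices; i++) {
        become_current_worker();
        slices_run++;                        /* "run one Haskell thread for a time slice" */
        stop_being_current_worker();
    }
    return NULL;
}

/* Runs `nworkers` OS threads, each taking `slices` turns as Current Haskell
 * Worker, and returns the largest number of threads ever inside at once. */
int run_worker_pool(int nworkers, int slices) {
    pthread_t tids[MAX_WORKERS];
    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    max_inside = 0;
    slices_run = 0;
    for (int i = 0; i < nworkers; i++)
        pthread_create(&tids[i], NULL, worker, &slices);
    for (int i = 0; i < nworkers; i++)
        pthread_join(tids[i], NULL);
    return max_inside;
}

int total_slices_run(void) { return slices_run; }
```

Calling run_worker_pool(4, 500) should report a maximum occupancy of 1, since a worker only increments the occupancy counter while it holds the token. Which OS thread ran which slice is deliberately not recorded, matching the point that workers are interchangeable.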

The rest of this section describes the mechanics of implementing all this. There are two parts to it: one describes how a native (OS) thread leaves the RTS to service the external call, the other how the same thread handles returning the result of the external call back to the Haskell thread.

Making the external call

Presently, GHC handles 'safe' C calls by effectively emitting the following code sequence:

    ...save thread state...
    t = suspendThread();
    r = foo(arg1,...,argn);
    resumeThread(t);
    ...restore thread state...
    return r;

After the state of a Haskell thread has been squirreled away, Schedule.c:suspendThread() is called. It puts the current thread on a list [Schedule.c:suspended_ccalling_threads] containing the threads that are currently blocked waiting for external calls to complete (this is done for the purposes of finding roots when garbage collecting).

In addition to putting the Haskell thread on suspended_ccalling_threads, suspendThread() now also does the following:

Instructs the Task Manager to make sure that there's another native thread waiting in the wings to take over the execution of Haskell threads. This might entail creating a new worker thread or re-using one that's currently waiting for more work to do. The Task Manager section presents the functionality provided by this subsystem.
Releases its capability to execute within the RTS. By doing so, another worker thread will become unblocked and start executing code within the RTS. See the Capability section for details.
suspendThread() returns a token which is used to identify the Haskell thread that was added to suspended_ccalling_threads. This is done so that once the external call has completed, we know what Haskell thread to pull off the suspended_ccalling_threads list.
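
The token bookkeeping can be sketched as follows. This is an illustration only, not the code in Schedule.c: the Tso struct, the use of an integer id as the token, and the *_sketch function names are assumptions, and the locking and capability hand-over that the real functions perform are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down thread state object. */
typedef struct Tso {
    int         id;    /* used as the token in this sketch (an assumption) */
    struct Tso *link;  /* next thread on the suspended list */
} Tso;

/* Plays the role of suspended_ccalling_threads. */
static Tso *suspended = NULL;

/* Put the calling Haskell thread on the suspended list and hand back
 * a token that identifies it later. */
int suspend_thread_sketch(Tso *tso) {
    tso->link = suspended;
    suspended = tso;
    return tso->id;
}

/* Find the thread that belongs to `token`, unlink it from the suspended
 * list and return it; returns NULL if no such thread is suspended. */
Tso *resume_thread_sketch(int token) {
    for (Tso **prev = &suspended; *prev != NULL; prev = &(*prev)->link) {
        if ((*prev)->id == token) {
            Tso *tso = *prev;
            *prev = tso->link;
            tso->link = NULL;
            return tso;
        }
    }
    return NULL;
}

/* Number of threads currently blocked in external calls. */
int suspended_count(void) {
    int n = 0;
    for (Tso *t = suspended; t != NULL; t = t->link) n++;
    return n;
}
```

Because the list stays reachable for as long as the external call is in progress, a garbage collector walking it would still find the suspended threads as roots, which is the purpose the text gives for keeping the list.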

Upon return from suspendThread(), the OS thread is free of its RTS executing responsibility, and can now invoke the external call. Meanwhile, the other worker thread that has now gained access to the RTS will continue executing Concurrent Haskell code. Concurrent 'stuff' is happening!

Returning the external result

When the native thread eventually returns from the external call, the result needs to be communicated back to the Haskell thread that issued the external call. The following steps take care of this:

The returning OS thread calls Schedule.c:resumeThread(), passing along the token referring to the Haskell thread that made the call we're returning from.
The OS thread then tries to grab hold of a returning worker capability, via Capability.c:grabReturnCapability(). Until granted, the thread blocks waiting for RTS permissions. Clearly we don't want the thread to be blocked longer than it has to, so whenever a thread that is executing within the RTS enters the Scheduler (which is quite often, e.g., when a Haskell thread context switch is made), it checks to see whether it can give up its RTS capability to a returning worker, which is done by calling Capability.c:yieldToReturningWorker().
If a returning worker is waiting (the code in Capability.c keeps a counter of the number of returning workers that are currently blocked waiting), it is woken up and given the RTS execution privileges/capabilities of the worker thread that gave up its own.
The thread that gave up its capability then tries to re-acquire the capability to execute RTS code; this is done by calling Capability.c:waitForWorkCapability().
The returning worker that was woken up will continue execution in resumeThread(), removing its associated Haskell thread from the suspended_ccalling_threads list and starting to evaluate that thread, passing it the result of the external call.
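
The hand-over between a yielding worker and a returning worker can be sketched with one mutex and two condition variables. This is a simplified model with a single capability, not the code in Capability.c: the flag names and the *_sketch functions are hypothetical and only mirror the roles of grabReturnCapability(), yieldToReturningWorker(), waitForWorkCapability() and releaseCapability() described above.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

static pthread_mutex_t cap_mutex      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  returning_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  work_cond      = PTHREAD_COND_INITIALIZER;
static int cap_free          = 0;  /* capability is up for grabs */
static int cap_for_returner  = 0;  /* capability is earmarked for a returning worker */
static int returning_waiting = 0;  /* returning workers currently blocked */

/* Hand the capability back, preferring returning workers. */
void release_capability_sketch(void) {
    pthread_mutex_lock(&cap_mutex);
    if (returning_waiting > 0) {
        cap_for_returner = 1;
        pthread_cond_signal(&returning_cond);
    } else {
        cap_free = 1;
        pthread_cond_signal(&work_cond);
    }
    pthread_mutex_unlock(&cap_mutex);
}

/* Called by an OS thread coming back from an external call. */
void grab_return_capability_sketch(void) {
    pthread_mutex_lock(&cap_mutex);
    returning_waiting++;
    while (!cap_for_returner && !cap_free)
        pthread_cond_wait(&returning_cond, &cap_mutex);
    cap_for_returner = 0;
    cap_free = 0;
    returning_waiting--;
    pthread_mutex_unlock(&cap_mutex);
}

/* Called by the worker holding the capability when it enters the scheduler.
 * Returns 1 if it yielded (and has since re-acquired), 0 if nobody was waiting. */
int yield_to_returning_worker_sketch(void) {
    pthread_mutex_lock(&cap_mutex);
    if (returning_waiting == 0) {
        pthread_mutex_unlock(&cap_mutex);
        return 0;
    }
    cap_for_returner = 1;                   /* give the capability away */
    pthread_cond_signal(&returning_cond);
    while (!cap_free)                       /* then wait to get one back */
        pthread_cond_wait(&work_cond, &cap_mutex);
    cap_free = 0;
    pthread_mutex_unlock(&cap_mutex);
    return 1;
}

static int returner_ran = 0;

static void *returning_worker(void *arg) {
    (void)arg;
    grab_return_capability_sketch();
    returner_ran = 1;                       /* "resume the Haskell thread with the result" */
    release_capability_sketch();
    return NULL;
}

/* The caller plays the worker that holds the capability on entry. */
int demo_yield(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, returning_worker, NULL);
    while (!yield_to_returning_worker_sketch())
        sched_yield();                      /* nobody waiting yet; keep "running Haskell" */
    int ran = returner_ran;
    pthread_join(tid, NULL);
    return ran;
}
```

In demo_yield() the yielding worker only regains the capability after the returning worker has taken it and released it again, so the flag it reads is already set. The separate earmark flag is one possible way to keep the yielder from immediately grabbing back what it just gave up.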

RTS execution

If a worker thread inside the RTS runs out of runnable Haskell threads, it goes to sleep waiting for the external calls to complete. It does this by calling waitForWorkCapability().

The availability of new runnable Haskell threads is signalled in the following cases:

When an external call is set up in suspendThread().
When a new Haskell thread is created (e.g., whenever Concurrent.forkIO is called from within Haskell); this is signalled in Schedule.c:scheduleThread_().
Whenever a Haskell thread is removed from a 'blocking queue' attached to an MVar (only?).
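
A worker that sleeps until work appears follows the usual condition-variable pattern. The sketch below assumes a hypothetical fixed-size run queue of integer thread ids; make_runnable() and wait_for_runnable() are illustrative names, not RTS functions.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_CAP 64                 /* arbitrary bound for this sketch */

static pthread_mutex_t q_mutex    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_nonempty = PTHREAD_COND_INITIALIZER;
static int run_queue[QUEUE_CAP];     /* hypothetical Haskell thread ids */
static int q_len = 0;

/* Signal that a Haskell thread has become runnable. */
void make_runnable(int thread_id) {
    pthread_mutex_lock(&q_mutex);
    assert(q_len < QUEUE_CAP);
    run_queue[q_len++] = thread_id;
    pthread_cond_signal(&q_nonempty);      /* wake one sleeping worker */
    pthread_mutex_unlock(&q_mutex);
}

/* Sleep until some thread is runnable, then take it off the queue. */
int wait_for_runnable(void) {
    pthread_mutex_lock(&q_mutex);
    while (q_len == 0)                     /* loop guards against spurious wake-ups */
        pthread_cond_wait(&q_nonempty, &q_mutex);
    int id = run_queue[--q_len];
    pthread_mutex_unlock(&q_mutex);
    return id;
}

static void *idle_worker(void *arg) {
    int *sum = arg;
    for (int i = 0; i < 3; i++)
        *sum += wait_for_runnable();       /* "run" each thread by adding its id */
    return NULL;
}

/* Starts an idle worker, then makes three threads runnable;
 * returns the sum of the ids the worker picked up. */
int demo_wakeup(void) {
    pthread_t tid;
    int sum = 0;
    pthread_create(&tid, NULL, idle_worker, &sum);
    make_runnable(10);
    make_runnable(20);
    make_runnable(12);
    pthread_join(tid, NULL);
    return sum;
}
```

The predicate is re-checked in a loop after every wake-up, so it does not matter whether the worker goes to sleep before or after the first thread is queued.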

Calling in

Providing robust support for having multiple OS threads calling into Haskell is not as involved as its dual.

The OS thread issues the call to a Haskell function by going via the Rts API (as specified in RtsAPI.h).
Making the function application requires the construction of a closure on the heap. This is done in a thread-safe manner by having the OS thread lock a designated block of memory (the 'Rts API' block, which is part of the GC's root set) for the short period of time it takes to construct the application.
The OS thread then creates a new Haskell thread to execute the function application, which (eventually) boils down to calling Schedule.c:createThread().
Evaluation is kicked off by calling Schedule.c:scheduleExtThread(), which asks the Task Manager to possibly create a new worker (OS) thread to execute the Haskell thread.
After the OS thread has done this, it blocks waiting for the Haskell thread to complete the evaluation of the Haskell function.
The reason why a separate worker thread is made to evaluate the Haskell function and not the OS thread that made the call-in via the Rts API, is that we want that OS thread to return as soon as possible. We wouldn't be able to guarantee that if the OS thread entered the RTS to (initially) just execute its function application, as the Scheduler may side-track it and also ask it to evaluate other Haskell threads.
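
The shape of a call-in, where the calling OS thread hands the work to a separate worker and then blocks until the result is ready, can be sketched like this. The CallIn record, the HsFun type and call_in_sketch() are hypothetical; a plain C function stands in for the Haskell function, and no closure construction or scheduler is modelled.

```c
#include <assert.h>
#include <pthread.h>

typedef int (*HsFun)(int);       /* stand-in for a Haskell function */

/* Hypothetical record shared between the caller and the worker. */
typedef struct {
    HsFun           fun;
    int             arg;
    int             result;
    int             done;        /* set once `result` is valid */
    pthread_mutex_t lock;
    pthread_cond_t  finished;
} CallIn;

/* Worker side: evaluate the application and publish the result. */
static void *worker_runs_call(void *p) {
    CallIn *c = p;
    int r = c->fun(c->arg);
    pthread_mutex_lock(&c->lock);
    c->result = r;
    c->done = 1;
    pthread_cond_signal(&c->finished);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Caller side: start a worker for the application, then block until done. */
int call_in_sketch(HsFun fun, int arg) {
    CallIn c = { .fun = fun, .arg = arg, .result = 0, .done = 0 };
    pthread_t worker;
    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.finished, NULL);

    int rc = pthread_create(&worker, NULL, worker_runs_call, &c);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&c.lock);
    while (!c.done)
        pthread_cond_wait(&c.finished, &c.lock);
    pthread_mutex_unlock(&c.lock);

    pthread_join(worker, NULL);
    pthread_cond_destroy(&c.finished);
    pthread_mutex_destroy(&c.lock);
    return c.result;
}

static int square(int x) { return x * x; }
```

For example, call_in_sketch(square, 7) returns 49. The calling thread never evaluates the function itself, so it cannot be side-tracked into running other work, which is the motivation the last item above gives.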

Note: As of 20020413, the implementation of the RTS API only serializes access to the allocator between multiple OS threads wanting to call into Haskell (via the RTS API). It does not coordinate this access to the allocator with that of the OS worker thread that's currently executing within the RTS. This weakness/bug is scheduled to be tackled as part of an overhaul/reworking of the RTS API itself.

Subsystems introduced/modified

These threading extensions affect the Scheduler portions of the runtime system. To make them more manageable to work with, the changes introduced a couple of new RTS 'sub-systems'. This section presents the functionality and API of these sub-systems.

Capabilities

A Capability represents the token required to execute STG code, and all the state an OS thread/task needs to run Haskell code: its STG registers, a pointer to its TSO, a nursery, etc. During STG execution, a pointer to the capability is kept in a register (BaseReg).

Only an SMP build has multiple capabilities; the threaded RTS and the non-threaded builds have only one global capability, namely MainCapability.

The Capability API is as follows:

/* Capability.h */
extern void initCapabilities(void);

extern void grabReturnCapability(Mutex* pMutex, Capability** pCap);
extern void waitForWorkCapability(Mutex* pMutex, Capability** pCap, rtsBool runnable);
extern void releaseCapability(Capability* cap);

extern void yieldToReturningWorker(Mutex* pMutex, Capability* cap);

extern void grabCapability(Capability** cap);

initCapabilities() initialises the subsystem.
grabReturnCapability() is called by worker threads returning from an external call. It blocks them until they are granted permission to re-enter the RTS.
waitForWorkCapability() is called by worker threads already inside the RTS, but without any work to do. It blocks them waiting for new work to become available.
releaseCapability() hands back a capability. If a 'returning worker' is waiting, it is signalled that a capability has become available. If not, releaseCapability() tries to signal worker threads that are blocked waiting inside waitForWorkCapability() that new work might now be available.
yieldToReturningWorker() is called by the worker thread that's currently inside the Scheduler. It checks whether there are other worker threads waiting to return from making an external call. If so, they're given preference and a capability is transferred between worker threads. One of the waiting 'returning worker' threads is signalled and made runnable, with the other, yielding, worker blocking to re-acquire a capability.

The condition variables used to implement the synchronisation between worker consumers and providers are local to the Capability implementation. See source for details and comments.

The Task Manager

The Task Manager API is responsible for managing the creation of OS worker RTS threads. When a Haskell thread wants to make an external call, the Task Manager is asked to possibly create a new worker thread to take over the RTS-executing capability of the worker thread that's exiting the RTS to execute the external call.

The Capability subsystem keeps track of idle worker threads, so making an informed decision about whether or not to create a new OS worker thread is easy work for the Task Manager. The Task Manager provides the following API:

/* Task.h */
extern void startTaskManager ( nat maxTasks, void (*taskStart)(void) );
extern void stopTaskManager ( void );

extern void startTask ( void (*taskStart)(void) );

startTaskManager() and stopTaskManager() start up and shut down the subsystem. When starting up, you have the option to limit the overall number of worker threads that can be created. If you pass '0', the number of threads that may be created is unbounded (modulo OS thread constraints).
startTask() is called when a worker thread calls suspendThread() to service an external call, asking another worker thread to take over its RTS-executing capability. It is also called when an external OS thread invokes a Haskell function via the Rts API.
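
The worker limit can be illustrated with a minimal sketch. It is not the code in Task.c: the fixed-size table, the *_sketch names and the return values are assumptions made for illustration, and the sketch is not itself thread-safe (all three entry points are assumed to be called from one thread).

```c
#include <assert.h>
#include <pthread.h>

#define MAX_TRACKED 32               /* table size, an assumption of this sketch */

static pthread_t tasks[MAX_TRACKED];
static unsigned  task_count = 0;
static unsigned  max_tasks  = 0;     /* 0 means "no limit" (beyond MAX_TRACKED) */
static void    (*task_start)(void) = 0;

static pthread_mutex_t ran_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned tasks_ran = 0;

/* Example start procedure: just records that a worker ran. */
static void count_task(void) {
    pthread_mutex_lock(&ran_mutex);
    tasks_ran++;
    pthread_mutex_unlock(&ran_mutex);
}

unsigned tasks_ran_so_far(void) {
    pthread_mutex_lock(&ran_mutex);
    unsigned n = tasks_ran;
    pthread_mutex_unlock(&ran_mutex);
    return n;
}

/* pthread entry point that runs the registered start procedure. */
static void *task_entry(void *arg) {
    (void)arg;
    task_start();
    return NULL;
}

/* Remember the limit and the procedure every new worker should run. */
void start_task_manager_sketch(unsigned limit, void (*start)(void)) {
    max_tasks  = limit;
    task_start = start;
    task_count = 0;
}

/* Try to create one more worker; returns 1 on success, 0 if refused. */
int start_task_sketch(void) {
    if (task_count >= MAX_TRACKED) return 0;
    if (max_tasks != 0 && task_count >= max_tasks) return 0;
    if (pthread_create(&tasks[task_count], NULL, task_entry, NULL) != 0) return 0;
    task_count++;
    return 1;
}

/* Wait for all workers to finish; returns how many were joined. */
unsigned stop_task_manager_sketch(void) {
    unsigned n = task_count;
    for (unsigned i = 0; i < n; i++)
        pthread_join(tasks[i], NULL);
    task_count = 0;
    return n;
}
```

With a limit of 2, the third request for a worker is refused; with a limit of 0 only the table size bounds the number of workers. The real Task Manager also consults the Capability subsystem about idle workers before creating a thread, which this sketch does not model.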

Native threads API

To hide OS details, the following API is used by the Task Manager and the scheduler to interact with an OS's threads API:

/* OSThreads.h */
typedef ..OS specific.. Mutex;
extern void initMutex    ( Mutex* pMut );
extern void grabMutex    ( Mutex* pMut );
extern void releaseMutex ( Mutex* pMut );
  
typedef ..OS specific.. Condition;
extern void    initCondition      ( Condition* pCond );
extern void    closeCondition     ( Condition* pCond );
extern rtsBool broadcastCondition ( Condition* pCond );
extern rtsBool signalCondition    ( Condition* pCond );
extern rtsBool waitCondition      ( Condition* pCond,
                                    Mutex* pMut );

extern OSThreadId osThreadId      ( void );
extern void shutdownThread        ( void );
extern void yieldThread           ( void );
extern int  createOSThread        ( OSThreadId* tid,
                                    void (*startProc)(void) );
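
One plausible way to map this interface onto POSIX threads is sketched below. The function names follow the header above, but the bodies are an illustration written for this note, not the GHC sources; the rtsBool typedef, the heap-allocated start argument and the choice to detach new threads are assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

typedef int             rtsBool;     /* assumed: non-zero means success */
typedef pthread_mutex_t Mutex;
typedef pthread_cond_t  Condition;
typedef pthread_t       OSThreadId;

void initMutex   (Mutex *pMut) { pthread_mutex_init(pMut, NULL); }
void grabMutex   (Mutex *pMut) { pthread_mutex_lock(pMut); }
void releaseMutex(Mutex *pMut) { pthread_mutex_unlock(pMut); }

void    initCondition     (Condition *pCond) { pthread_cond_init(pCond, NULL); }
void    closeCondition    (Condition *pCond) { pthread_cond_destroy(pCond); }
rtsBool broadcastCondition(Condition *pCond) { return pthread_cond_broadcast(pCond) == 0; }
rtsBool signalCondition   (Condition *pCond) { return pthread_cond_signal(pCond) == 0; }
rtsBool waitCondition     (Condition *pCond, Mutex *pMut) {
    return pthread_cond_wait(pCond, pMut) == 0;
}

OSThreadId osThreadId    (void) { return pthread_self(); }
void       shutdownThread(void) { pthread_exit(NULL); }
void       yieldThread   (void) { sched_yield(); }

/* pthread_create wants void *(*)(void *), so wrap the void(void) procedure. */
typedef struct { void (*proc)(void); } StartArg;

static void *start_trampoline(void *p) {
    StartArg *a = p;
    void (*proc)(void) = a->proc;
    free(a);
    proc();
    return NULL;
}

/* Returns 0 on success, non-zero on failure. */
int createOSThread(OSThreadId *tid, void (*startProc)(void)) {
    StartArg *a = malloc(sizeof *a);
    if (a == NULL) return -1;
    a->proc = startProc;
    int rc = pthread_create(tid, NULL, start_trampoline, a);
    if (rc != 0) { free(a); return rc; }
    pthread_detach(*tid);            /* the interface has no join operation */
    return 0;
}

/* Small demonstration: a new thread sets a flag and signals the creator. */
static Mutex     demo_mutex;
static Condition demo_cond;
static int       demo_flag = 0;

static void demo_proc(void) {
    grabMutex(&demo_mutex);
    demo_flag = 1;
    signalCondition(&demo_cond);
    releaseMutex(&demo_mutex);
}

int demo_native_threads(void) {
    OSThreadId tid;
    initMutex(&demo_mutex);
    initCondition(&demo_cond);
    demo_flag = 0;
    if (createOSThread(&tid, demo_proc) != 0) return -1;
    grabMutex(&demo_mutex);
    while (!demo_flag)
        waitCondition(&demo_cond, &demo_mutex);
    releaseMutex(&demo_mutex);
    return demo_flag;
}
```

A Win32 instance would presumably provide the same names on top of the corresponding Win32 primitives, which is the reason for keeping the interface this small.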

User-level interface

To signal that you want an external call to be serviced by a separate OS thread, you have to add the attribute threadsafe to a foreign import declaration, i.e.,

foreign import "bigComp" threadsafe largeComputation :: Int -> IO ()

The distinction between 'safe' and thread-safe C calls is made so that we may call external functions that aren't re-entrant but may cause a GC to occur.

The threadsafe attribute subsumes safe.

Building the GHC RTS

The multi-threaded extension isn't currently enabled by default. To have it built, you need to run the fptools configure script with the extra option --enable-threaded-rts turned on, and then proceed to build the compiler as per normal.

Last modified: Wed Apr 10 14:21:57 Pacific Daylight Time 2002