[[project @ 2002-10-15 08:56:50 by simonpj]
simonpj**20021015085655
Some reorganising. Simon and I agreed to leave out most of the (inaccurate) section
on primitive operations, pointing people instead to the Real Truth in primops.txt
and the External Core document.
Also: bugs section added
] {
addfile ./ghc/docs/users_guide/bugs.sgml
hunk ./ghc/docs/users_guide/bugs.sgml 1
+
+ Known bugs and infelicities
+
+
+
+ Haskell 98 vs. Glasgow Haskell: language non-compliance
+
+
+ GHC vs the Haskell 98 language
+ Haskell 98 language vs GHC
+
+ This section lists Glasgow Haskell infelicities in its
+ implementation of Haskell 98. See also the “when things
+ go wrong” section () for information
+ about crashes, space leaks, and other undesirable phenomena.
+
+ The limitations here are listed in Haskell Report order
+ (roughly).
+
+
+ Divergence from Haskell 98
+
+
+
+ Lexical syntax
+
+
+
+ The Haskell report specifies that programs may be
+ written using Unicode. GHC only accepts the ISO-8859-1
+ character set at the moment.
+
+
+
+ Certain lexical rules regarding qualified identifiers
+ are slightly different in GHC compared to the Haskell
+ report. When you have
+ module.reservedop,
+ such as M.\, GHC will interpret it as a
+ single qualified operator rather than the two lexemes
+ M and .\.
+
+
+
+ When is on, GHC
+ reserves several keywords beginning with two underscores,
+ because GHC uses the same lexical analyser for interface
+ files as for source files, and these keywords are used in
+ interface files. Do not use any identifiers beginning with
+ a double underscore in mode.
+
+
+
+
+
+ Context-free syntax
+
+
+
+ GHC doesn't do fixity resolution in expressions during
+ parsing. For example, according to the Haskell report, the
+ following expression is legal Haskell:
+
+ let x = 42 in x == 42 == True
+ and parses as:
+
+ (let x = 42 in x == 42) == True
+
+ because according to the report, the let
+ expression extends as far to the right as
+ possible. Since it can't extend past the second
+ == operator without causing a parse error
+ (== is non-associative), the
+ let-expression must terminate there. GHC
+ simply gobbles up the whole expression, parsing like this:
+
+ (let x = 42 in x == 42 == True)
+
+ The Haskell report is arguably wrong here, but nevertheless
+ it's a difference between GHC & Haskell 98.
+
+
+
+
+
+ Expressions and patterns
+
+
+
+ Very long String constants:
+
+ GHC may fail to compile them. If you add a “string
+ gap” every few thousand characters, then the strings
+ can be as long as you like.
+
+ Bear in mind that string gaps and the
+
+ option don't mix very well (see
+ ).
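A string gap is a backslash, then whitespace (which may include newlines), then another backslash; the lexer discards the whole gap. A minimal sketch of splitting one long literal into chunks this way (the name longMessage is purely illustrative):

```haskell
-- One String literal split across several lines with string gaps.
-- Everything from a backslash to the next backslash (inclusive) is
-- dropped, so this equals the same text written on a single line.
longMessage :: String
longMessage = "The quick brown fox \
              \jumps over \
              \the lazy dog."
```

Each chunk stays short, which is what matters for the length limit described above.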
+
+
+
+
+
+
+
+ Declarations and bindings
+
+ None known.
+
+
+
+
+ Module system and interface files
+
+
+
+
+ Namespace pollution
+
+ Several modules internal to GHC are visible in the
+ standard namespace. All of these modules begin with
+ Prel, so the rule is: don't use any
+ modules beginning with Prel in your
+ program, or you may be comprehensively screwed.
+
+
+
+
+
+
+
+ Numbers, basic types, and built-in classes
+
+
+
+ Multiply-defined array elements—not checked:
+
+ This code fragment should
+ elicit a fatal error, but it does not:
+
+
+import Data.Array
+
+main = print (array (1,1) [(1,2), (1,3)])
+
+
+
+
+
+
+
+
+
+ In Prelude support
+
+
+
+ The Char type
+
+ The Haskell report says that the
+ Char type holds 16 bits. GHC follows
+ the ISO-10646 standard a little more closely:
+ maxBound :: Char in GHC is
+ 0x10FFFF.
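The bound is easy to inspect from a program. A small sketch (names are illustrative) that reads off the code point of the largest Char and checks that it does not fit in 16 bits:

```haskell
-- Code point of the largest Char value.
maxCharCode :: Int
maxCharCode = fromEnum (maxBound :: Char)

-- True when that code point needs more than 16 bits, i.e. it lies
-- outside the range of a 16-bit character type.
exceedsSixteenBits :: Bool
exceedsSixteenBits = maxCharCode > 0xFFFF
```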
+
+
+
+
+ Arbitrary-sized tuples:
+
+ Tuples are currently limited to size 61. However,
+ standard instances for tuples (Eq,
+ Ord, Bounded,
+ Ix, Read, and
+ Show) are available
+ only up to 5-tuples.
+
+ This limitation is easy to work around, so please ask
+ if you get stuck on it.
+
+
+
+
+
+
+
+ GHC's interpretation of undefined behaviour in
+ Haskell 98
+
+ This section documents GHC's take on various issues that are
+ left undefined or implementation-specific in Haskell 98.
+
+
+
+ Sized integral types
+
+
+
+ In GHC the Int type follows the
+ size of an address on the host architecture; in other words
+ it holds 32 bits on a 32-bit machine, and 64 bits on a
+ 64-bit machine.
+
+ Arithmetic on Int is unchecked for
+ overflow, so all operations on Int happen
+ modulo 2^n, where n is the size in bits of
+ the Int type.
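A minimal sketch of both points, assuming a base library recent enough to provide finiteBitSize in Data.Bits (older code used bitSize). The names intBits and wrapsAround are illustrative:

```haskell
import Data.Bits (finiteBitSize)

-- Number of bits in Int on this machine (32 or 64 on common hosts).
intBits :: Int
intBits = finiteBitSize (0 :: Int)

-- Overflow is unchecked: adding 1 to the largest Int wraps around
-- to the smallest Int, as arithmetic modulo 2^n predicts.
wrapsAround :: Bool
wrapsAround = (maxBound :: Int) + 1 == minBound
```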
+
+ The fromInteger function (and hence
+ also fromIntegral) is a special case when
+ converting to Int. The value of
+ fromIntegral x :: Int is given by taking
+ the lower n bits of (abs x),
+ multiplied by the sign of x
+ (in two's complement n-bit
+ arithmetic). This behaviour was chosen so that, for example,
+ writing 0xffffffff :: Int preserves the
+ bit-pattern in the resulting Int.
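The truncation rule can be checked directly. This sketch uses Int32 from Data.Int in place of Int, an assumption made only so that the results do not depend on the host word size (n = 32 here). The value names are illustrative:

```haskell
import Data.Int (Int32)

-- 0xffffffff has all 32 low bits set; at Int32 that pattern is -1.
allOnes :: Int32
allOnes = fromInteger 0xffffffff

-- Bits above the lowest 32 are discarded: 2^32 + 5 becomes 5.
truncated :: Int32
truncated = fromInteger (2 ^ 32 + 5)

-- Negative input: the low 32 bits of (abs x), negated, giving -5.
negTruncated :: Int32
negTruncated = fromInteger (negate (2 ^ 32 + 5))
```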
+
+
+ Negative literals, such as -3, are
+ specified by (a careful reading of) the Haskell Report as
+ meaning Prelude.negate (Prelude.fromInteger 3).
+ So -2147483648 means negate (fromInteger 2147483648).
+ On a 32-bit machine, fromInteger takes the lower 32 bits of the representation,
+ so fromInteger (2147483648::Integer), computed at type Int, is
+ -2147483648::Int. The negate operation then
+ overflows, but it is unchecked, so negate (-2147483648::Int) is just
+ -2147483648. In short, one can write minBound::Int as
+ a literal with the expected meaning (but that is not in general guaranteed).
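The desugaring can be spelled out by hand. The sketch below uses Int32 as a stand-in for a 32-bit Int. This is an assumption so that the example behaves the same on 64-bit hosts. It writes the desugared form explicitly instead of relying on the literal:

```haskell
import Data.Int (Int32)

-- The literal -2147483648 desugars to negate (fromInteger 2147483648).
-- At Int32, fromInteger 2147483648 wraps to -2147483648, and negating
-- that overflows (unchecked) back to -2147483648.
viaDesugaring :: Int32
viaDesugaring = negate (fromInteger 2147483648)

-- The result is exactly the smallest Int32.
matchesMinBound :: Bool
matchesMinBound = viaDesugaring == minBound
```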
+
+
+ The fromIntegral function also
+ preserves bit-patterns when converting between the sized
+ integral types (Int8,
+ Int16, Int32,
+ Int64 and the unsigned
+ Word variants); see the modules
+ Data.Int and Data.Word
+ in the library documentation.
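For example, converting between the 8-bit signed and unsigned types keeps the bits and only changes how they are read. A small sketch with illustrative names:

```haskell
import Data.Int (Int8)
import Data.Word (Word8)

-- -1 :: Int8 is the bit pattern 11111111; read as Word8 it is 255.
asUnsigned :: Word8
asUnsigned = fromIntegral (-1 :: Int8)

-- 200 :: Word8 is 11001000; read as Int8 it is 200 - 256 = -56.
asSigned :: Int8
asSigned = fromIntegral (200 :: Word8)
```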
+
+
+
+
+ Unchecked float arithmetic
+
+ Operations on Float and
+ Double numbers are
+ unchecked for overflow, underflow, and
+ other sad occurrences. (Note, however, that some
+ architectures trap floating-point overflow and
+ loss of precision and report a floating-point exception,
+ probably terminating the program.)
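A sketch of what "unchecked" looks like in practice, assuming IEEE 754 doubles with floating-point traps disabled (the usual situation on common platforms). The names are illustrative:

```haskell
-- Overflow produces Infinity rather than an error.
overflowed :: Double
overflowed = 1.0e308 * 10

-- 0/0 is NaN, which is not even equal to itself.
notANumber :: Double
notANumber = 0 / 0

-- Underflow silently rounds a very small result to zero.
underflowed :: Double
underflowed = 1.0e-308 / 1.0e100
```

The Prelude predicates isInfinite and isNaN are the usual way to detect these results after the fact.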
+
+
+
+
+
+
+
+
+
+
+ Known bugs
+
+GHC has the following known bugs:
+
+
+
+GHC's inliner can be persuaded into non-termination using the standard way to encode
+recursion via a data type:
+
+ data U = MkU (U -> Bool)
+
+ russel :: U -> Bool
+ russel u@(MkU p) = not $ p u
+
+ x :: Bool
+ x = russel (MkU russel)
+
+We have never found another program, other than this contrived one, that makes GHC
+diverge, and fixing the problem would impose an extra overhead on every compilation. So the
+bug remains unfixed. There is more background in
+
+Secrets of the GHC inliner.
+
+
+
+
+
+
+
hunk ./ghc/docs/users_guide/glasgow_exts.sgml 155
-&primitives;
+
+
+ Unboxed types and primitive operations
+
+GHC is built on a raft of primitive data types and operations.
+While you really can use this stuff to write fast code,
+ we generally find it a lot less painful, and more satisfying in the
+ long run, to use higher-level language features and libraries. With
+ any luck, the code you write will be optimised to the efficient
+ unboxed version in any case. And if it isn't, we'd like to know
+ about it.
+
+We do not currently have good, up-to-date documentation about the
+primitives, perhaps because they are mainly intended for internal use.
+There used to be a long section about them here in the User Guide, but it
+became out of date, and wrong information is worse than none.
+
+The Real Truth about what primitive types there are, and what operations
+work over those types, is held in the file
+fptools/ghc/compiler/prelude/primops.txt.
+This file is used directly to generate GHC's primitive-operation definitions, so
+it is always correct! It is also intended for processing into text.
+
+ Indeed,
+the result of such processing is part of the description of the
+ External
+ Core language.
+So that document is a good place to look for a typeset version.
+We would be very happy if someone wanted to volunteer to produce an SGML
+back end to the program that processes primops.txt so that
+we could include the results here in the User Guide.
+
+What follows here is a brief summary of some main points.
+
+
+Unboxed types
+
+
+
+
+
+Most types in GHC are boxed, which means
+that values of that type are represented by a pointer to a heap
+object. The representation of a Haskell Int, for
+example, is a two-word heap object. An unboxed
+type, however, is represented by the value itself; no pointers or heap
+allocation are involved.
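In GHC's libraries the boxed Int is an ordinary data type whose single constructor, I#, wraps an unboxed Int#. In current GHC both are exported from GHC.Exts, and the MagicHash extension is needed to write names ending in #. This sketch, with an illustrative function name, unboxes and reboxes by hand:

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), (*#))

-- Pattern matching on I# extracts the raw machine integer (an Int#);
-- applying I# boxes the result back into a heap-allocated Int.
double :: Int -> Int
double (I# n) = I# (n *# 2#)
```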
+
+
+
+Unboxed types correspond to the “raw machine” types you
+would use in C: Int# (long int),
+Double# (double), Addr#
+(void *), etc. The primitive operations
+(PrimOps) on these types are what you might expect; e.g.,
+(+#) is addition on
+Int#s, and is the machine-addition that we all
+know and love—usually one instruction.
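This sketch shows a loop that works entirely on Int# values using (+#), boxing only the final result. It assumes a current GHC, where comparison primops such as (>#) return an Int# (1# or 0#) that isTrue# turns into a Bool. Older releases returned Bool directly. The name sumTo is illustrative:

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, isTrue#, (+#), (>#))

-- Sum the integers 1..n, keeping the counter and the accumulator
-- unboxed so that the loop itself allocates nothing.
sumTo :: Int -> Int
sumTo (I# n) = I# (go 1# 0#)
  where
    go :: Int# -> Int# -> Int#
    go i acc
      | isTrue# (i ># n) = acc
      | otherwise        = go (i +# 1#) (acc +# i)
```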
+
+
+
+Primitive (unboxed) types cannot be defined in Haskell, and are
+therefore built into the language and compiler. Primitive types are
+always unlifted; that is, a value of a primitive type cannot be
+bottom. We use the convention that primitive types, values, and
+operations have a # suffix.
+
+
+
+Primitive values are often represented by a simple bit-pattern, such
+as Int#, Float#,
+Double#. But this is not necessarily the case:
+a primitive value might be represented by a pointer to a
+heap-allocated object. Examples include
+Array#, the type of primitive arrays. A
+primitive array is heap-allocated because it is too big a value to fit
+in a register, and would be too expensive to copy around; in a sense,
+it is accidental that it is represented by a pointer. If a pointer
+represents a primitive value, then it really does point to that value:
+no unevaluated thunks, no indirections…nothing but the primitive value
+can be at the other end of the pointer.
+
+
+
+There are some restrictions on the use of primitive types, the main
+one being that you can't pass a primitive value to a polymorphic
+function or store one in a polymorphic data type. This rules out
+things like [Int#] (i.e. lists of primitive
+integers). The reason for this restriction is that polymorphic
+arguments and constructor fields are assumed to be pointers: if an
+unboxed integer is stored in one of these, the garbage collector would
+attempt to follow it, leading to unpredictable space leaks. Or a
+seq operation on the polymorphic component may
+attempt to dereference the pointer, with disastrous results. Even
+worse, the unboxed value might be larger than a pointer
+(Double# for instance).
+
+
+
+Nevertheless, a numerically intensive program using unboxed types can
+go a lot faster than its “standard”
+counterpart—we saw a threefold speedup on one example.
+
+
+
+
+
+Unboxed Tuples
+
+
+
+Unboxed tuples aren't really exported by GHC.Exts;
+they're available by default with . An
+unboxed tuple looks like this:
+
+
+
+
+
+(# e_1, ..., e_n #)
+
+
+
+
+
+where e_1..e_n are expressions of any
+type (primitive or non-primitive). The type of an unboxed tuple is
+written the same way, with types in place of expressions:
+(# t_1, ..., t_n #).
+
+
+
+Unboxed tuples are used for functions that need to return multiple
+values, but they avoid the heap allocation normally associated with
+using fully-fledged tuples. When an unboxed tuple is returned, the
+components are put directly into registers or on the stack; the
+unboxed tuple itself does not have a composite representation. Many
+primitive operations return unboxed
+tuples.
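Here is a sketch of a function that returns two results as an unboxed pair, with the caller taking the pair apart immediately in a case expression. It assumes a current GHC, where the syntax is enabled with the UnboxedTuples language pragma. The function names are illustrative:

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- Return quotient and remainder together without allocating a boxed
-- (,) pair; the two components are returned in registers or on the stack.
quotRemU :: Int -> Int -> (# Int, Int #)
quotRemU n d = (# n `quot` d, n `rem` d #)

-- Sum of the decimal digits of a non-negative number, consuming each
-- unboxed result directly with a case expression.
digitSum :: Int -> Int
digitSum 0 = 0
digitSum n = case quotRemU n 10 of
  (# q, r #) -> r + digitSum q
```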
+
+
+
+There are some pretty stringent restrictions on the use of unboxed tuples:
+
+
+
+
+
+
+
+
+ Unboxed tuple types are subject to the same restrictions as
+other unboxed types; i.e. they may not be stored in polymorphic data
+structures or passed to polymorphic functions.
+
+
+
+
+
+
+ Unboxed tuples may only be constructed as the direct result of
+a function, and may only be deconstructed with a case expression.
+For example, the following are valid:
+
+
+
+f x y = (# x+1, y-1 #)
+g x = case f x x of { (# a, b #) -> a + b }
+
+
+
+but the following are invalid:
+
+
+
+f x y = g (# x, y #)
+g (# x, y #) = x + y
+
+
+
+
+
+
+
+
+ No variable can have an unboxed tuple type. This is illegal:
+
+
+
+f :: (# Int, Int #) -> (# Int, Int #)
+f x = x
+
+
+
+because x has an unboxed tuple type.
+
+
+
+
+
+
+
+
+
+Note: we may relax some of these restrictions in the future.
+
+
+
+The IO and ST monads use unboxed
+tuples to avoid unnecessary allocation during sequences of operations.
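In current GHC, IO a is a newtype around a function of type State# RealWorld -> (# State# RealWorld, a #), and its constructor is exported from GHC.IO. This sketch relies on that internal representation, so it may change between releases. It sequences two actions by threading the state token by hand (the name bothResults is illustrative):

```haskell
{-# LANGUAGE UnboxedTuples #-}
import GHC.IO (IO (..))

-- Run two IO actions in order and pair their results. Each step
-- returns an unboxed (# state, result #) pair, so no tuple is heap
-- allocated between the steps; only the final (a, b) is boxed.
bothResults :: IO a -> IO b -> IO (a, b)
bothResults (IO m) (IO k) = IO (\s0 ->
  case m s0 of
    (# s1, a #) -> case k s1 of
      (# s2, b #) -> (# s2, (a, b) #))
```

In ordinary code the same thing is written with do-notation. GHC compiles it to essentially this shape.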
+
+
+
+
+
hunk ./ghc/docs/users_guide/lang.sgml 6
-&vs-hs
hunk ./ghc/docs/users_guide/ug-book.sgml 21
+&bugs;
hunk ./ghc/docs/users_guide/ug-ent.sgml 20
-
+
hunk ./ghc/docs/users_guide/vs_haskell.sgml 1
-
- Haskell 98 vs. Glasgow Haskell: language non-compliance
-
-
- GHC vs the Haskell 98 language
- Haskell 98 language vs GHC
-
- This section lists Glasgow Haskell infelicities in its
- implementation of Haskell 98. See also the “when things
- go wrong” section () for information
- about crashes, space leaks, and other undesirable phenomena.
-
- The limitations here are listed in Haskell Report order
- (roughly).
-
-
- Divergence from Haskell 98
-
-
-
- Lexical syntax
-
-
-
- The Haskell report specifies that programs may be
- written using Unicode. GHC only accepts the ISO-8859-1
- character set at the moment.
-
-
-
- Certain lexical rules regarding qualified identifiers
- are slightly different in GHC compared to the Haskell
- report. When you have
- module.reservedop,
- such as M.\, GHC will interpret it as a
- single qualified operator rather than the two lexemes
- M and .\.
-
-
-
- When is on, GHC
- reserves several keywords beginning with two underscores.
- This is due to the fact that GHC uses the same lexical
- analyser for interface file parsing as it does for source
- file parsing, and these keywords are used in interface
- files. Do not use any identifiers beginning with a double
- underscore in mode.
-
-
-
-
-
- Context-free syntax
-
-
-
- GHC doesn't do fixity resolution in expressions during
- parsing. For example, according to the Haskell report, the
- following expression is legal Haskell:
-
- let x = 42 in x == 42 == True
- and parses as:
-
- (let x = 42 in x == 42) == True
-
- because according to the report, the let
- expression extends as far to the right as
- possible. Since it can't extend past the second
- equals sign without causing a parse error
- (== is non-fix), the
- let-expression must terminate there. GHC
- simply gobbles up the whole expression, parsing like this:
-
- (let x = 42 in x == 42 == True)
-
- The Haskell report is arguably wrong here, but nevertheless
- it's a difference between GHC & Haskell 98.
-
-
-
-
-
- Expressions and patterns
-
-
-
- Very long String constants:
-
- May not go through. If you add a “string
- gap” every few thousand characters, then the strings
- can be as long as you like.
-
- Bear in mind that string gaps and the
-
- option don't mix very well (see
- ).
-
-
-
-
-
-
-
- Declarations and bindings
-
- None known.
-
-
-
-
- Module system and interface files
-
-
-
-
- Namespace pollution
-
- Several modules internal to GHC are visible in the
- standard namespace. All of these modules begin with
- Prel, so the rule is: don't use any
- modules beginning with Prel in your
- program, or you may be comprehensively screwed.
-
-
-
-
-
-
-
- Numbers, basic types, and built-in classes
-
-
-
- Multiply-defined array elements—not checked:
-
- This code fragment should
- elicit a fatal error, but it does not:
-
-
-main = print (array (1,1) [(1,2), (1,3)])
-
-
-
-
-
-
-
-
-
- In Prelude support
-
-
-
- The Char type
- Charsize
- of
-
- The Haskell report says that the
- Char type holds 16 bits. GHC follows
- the ISO-10646 standard a little more closely:
- maxBound :: Char in GHC is
- 0x10FFFF.
-
-
-
-
- Arbitrary-sized tuples:
-
- Tuples are currently limited to size 61. HOWEVER:
- standard instances for tuples (Eq,
- Ord, Bounded,
- IxRead, and
- Show) are available
- only up to 5-tuples.
-
- This limitation is easily subvertible, so please ask
- if you get stuck on it.
-
-
-
-
-
-
-
- GHC's interpretation of undefined behaviour in
- Haskell 98
-
- This section documents GHC's take on various issues that are
- left undefined or implementation specific in Haskell 98.
-
-
-
- Sized integral types
- Intsize of
-
-
-
- In GHC the Int type follows the
- size of an address on the host architecture; in other words
- it holds 32 bits on a 32-bit machine, and 64-bits on a
- 64-bit machine.
-
- Arithmetic on Int is unchecked for
- overflowoverflowInt
- , so all operations on Int happen
- modulo
- 2n
- where n is the size in bits of
- the Int type.
-
- The fromIntegerfromInteger
- function (and hence
- also fromIntegralfromIntegral
- ) is a special case when
- converting to Int. The value of
- fromIntegral x :: Int is given by taking
- the lower n bits of (abs
- x), multiplied by the sign of x
- (in 2's complement n-bit
- arithmetic). This behaviour was chosen so that for example
- writing 0xffffffff :: Int preserves the
- bit-pattern in the resulting Int.
-
-
- Negative literals, such as -3, are
- specified by (a careful reading of) the Haskell Report as
- meaning Prelude.negate (Prelude.fromInteger 3).
- So -2147483648 means negate (fromInteger 2147483648).
- Since fromInteger takes the lower 32 bits of the representation,
- fromInteger (2147483648::Integer), computed at type Int is
- -2147483648::Int. The negate operation then
- overflows, but it is unchecked, so negate (-2147483648::Int) is just
- -2147483648. In short, one can write minBound::Int as
- a literal with the expected meaning (but that is not in general guaranteed.
-
-
- The fromIntegral function also
- preserves bit-patterns when converting between the sized
- integral types (Int8,
- Int16, Int32,
- Int64 and the unsigned
- Word variants), see the modules
- Data.Int and Data.Word
- in the library documentation.
-
-
-
-
- Unchecked float arithmetic
-
- Operations on Float and
- Double numbers are
- unchecked for overflow, underflow, and
- other sad occurrences. (note, however that some
- architectures trap floating-point overflow and
- loss-of-precision and report a floating-point exception,
- probably terminating the
- program)floating-point
- exceptions.
-
-
-
-
-
-
-
-
-
rmfile ./ghc/docs/users_guide/vs_haskell.sgml
}