[[project @ 2002-10-15 08:56:50 by simonpj] simonpj**20021015085655 Some reorganising. Simon and I agreed to leave out most of the (inaccurate) section on primitive operations, pointing people instead to the Real Truth in primops.txt and the External Core document. Also: bugs section added ] { addfile ./ghc/docs/users_guide/bugs.sgml hunk ./ghc/docs/users_guide/bugs.sgml 1 + + Known bugs and infelicities + + + + Haskell 98 vs. Glasgow Haskell: language non-compliance + + + GHC vs the Haskell 98 language + Haskell 98 language vs GHC + + This section lists Glasgow Haskell infelicities in its + implementation of Haskell 98. See also the “when things + go wrong” section () for information + about crashes, space leaks, and other undesirable phenomena. + + The limitations here are listed in Haskell Report order + (roughly). + + + Divergence from Haskell 98 + + + + Lexical syntax + + + + The Haskell report specifies that programs may be + written using Unicode. GHC only accepts the ISO-8859-1 + character set at the moment. + + + + Certain lexical rules regarding qualified identifiers + are slightly different in GHC compared to the Haskell + report. When you have + module.reservedop, + such as M.\, GHC will interpret it as a + single qualified operator rather than the two lexemes + M and .\. + + + + When is on, GHC + reserves several keywords beginning with two underscores. + This is due to the fact that GHC uses the same lexical + analyser for interface file parsing as it does for source + file parsing, and these keywords are used in interface + files. Do not use any identifiers beginning with a double + underscore in mode. + + + + + + Context-free syntax + + + + GHC doesn't do fixity resolution in expressions during + parsing. 
For example, according to the Haskell report, the + following expression is legal Haskell: + + let x = 42 in x == 42 == True + and parses as: + + (let x = 42 in x == 42) == True + + because according to the report, the let + expression extends as far to the right as + possible. Since it can't extend past the second + equals sign without causing a parse error + (== is non-associative), the + let-expression must terminate there. GHC + simply gobbles up the whole expression, parsing like this: + + (let x = 42 in x == 42 == True) + + The Haskell report is arguably wrong here, but nevertheless + it's a difference between GHC and Haskell 98. + + + + + + Expressions and patterns + + + + Very long String constants: + + May not go through. If you add a “string + gap” every few thousand characters, then the strings + can be as long as you like. + + Bear in mind that string gaps and the + + option don't mix very well (see + ). + + + + + + + + Declarations and bindings + + None known. + + + + + Module system and interface files + + + + + Namespace pollution + + Several modules internal to GHC are visible in the + standard namespace. All of these modules begin with + Prel, so the rule is: don't use any + modules beginning with Prel in your + program, or you may be comprehensively screwed. + + + + + + + + Numbers, basic types, and built-in classes + + + + Multiply-defined array elements are not checked: + + This code fragment should + elicit a fatal error, but it does not: + + +main = print (array (1,1) [(1,2), (1,3)]) + + + + + + + + + + In Prelude support + + + + The Char type + + + The Haskell report says that the + Char type holds 16 bits. GHC follows + the ISO-10646 standard a little more closely: + maxBound :: Char in GHC is + 0x10FFFF. + + + + + Arbitrary-sized tuples: + + Tuples are currently limited to size 61. However, + standard instances for tuples (Eq, + Ord, Bounded, + Ix, Read, and + Show) are available + only up to 5-tuples. 
+ + This limitation is easily subvertible, so please ask + if you get stuck on it. + + + + + + + + GHC's interpretation of undefined behaviour in + Haskell 98 + + This section documents GHC's take on various issues that are + left undefined or implementation-specific in Haskell 98. + + + + Sized integral types + + + + + In GHC the Int type follows the + size of an address on the host architecture; in other words + it holds 32 bits on a 32-bit machine, and 64 bits on a + 64-bit machine. + + Arithmetic on Int is unchecked for + overflow, + so all operations on Int happen + modulo + 2^n, + where n is the size in bits of + the Int type. + + The fromInteger + function (and hence + also fromIntegral) + is a special case when + converting to Int. The value of + fromIntegral x :: Int is given by taking + the lower n bits of (abs + x), multiplied by the sign of x + (in 2's complement n-bit + arithmetic). This behaviour was chosen so that, for example, + writing 0xffffffff :: Int preserves the + bit-pattern in the resulting Int. + + + Negative literals, such as -3, are + specified by (a careful reading of) the Haskell Report as + meaning Prelude.negate (Prelude.fromInteger 3). + So -2147483648 means negate (fromInteger 2147483648). + Since fromInteger takes the lower 32 bits of the representation (on a 32-bit machine), + fromInteger (2147483648::Integer), computed at type Int, is + -2147483648::Int. The negate operation then + overflows, but it is unchecked, so negate (-2147483648::Int) is just + -2147483648. In short, one can write minBound::Int as + a literal with the expected meaning (but that is not in general guaranteed). + + + The fromIntegral function also + preserves bit-patterns when converting between the sized + integral types (Int8, + Int16, Int32, + Int64 and the unsigned + Word variants); see the modules + Data.Int and Data.Word + in the library documentation. 
+ + + + + Unchecked float arithmetic + + Operations on Float and + Double numbers are + unchecked for overflow, underflow, and + other sad occurrences. (Note, however, that some + architectures trap floating-point overflow and + loss of precision and report a floating-point exception, + probably terminating the + program.) + + + + + + + + + + + + Known bugs + +GHC has the following known bugs: + + + +GHC's inliner can be persuaded into non-termination using the standard way to encode +recursion via a data type: + + data U = MkU (U -> Bool) + + russel :: U -> Bool + russel u@(MkU p) = not $ p u + + x :: Bool + x = russel (MkU russel) + +We have never found another program, other than this contrived one, that makes GHC +diverge, and fixing the problem would impose an extra overhead on every compilation. So the +bug remains unfixed. There is more background in + +Secrets of the GHC inliner. + + + + + + + hunk ./ghc/docs/users_guide/glasgow_exts.sgml 155 -&primitives; + + + Unboxed types and primitive operations + +GHC is built on a raft of primitive data types and operations. +While you really can use this stuff to write fast code, + we generally find it a lot less painful, and more satisfying in the + long run, to use higher-level language features and libraries. With + any luck, the code you write will be optimised to the efficient + unboxed version in any case. And if it isn't, we'd like to know + about it. + +We do not currently have good, up-to-date documentation about the +primitives, perhaps because they are mainly intended for internal use. +There used to be a long section about them here in the User Guide, but it +became out of date, and wrong information is worse than none. + +The Real Truth about what primitive types there are, and what operations +work over those types, is held in the file +fptools/ghc/compiler/prelude/primops.txt. 
+This file is used directly to generate GHC's primitive-operation definitions, so +it is always correct! It is also intended for processing into text. + + Indeed, +the result of such processing is part of the description of the + External + Core language. +So that document is a good place to look for a typeset version. +We would be very happy if someone wanted to volunteer to produce an SGML +back end to the program that processes primops.txt so that +we could include the results here in the User Guide. + +What follows here is a brief summary of some main points. + + +Unboxed types + + + +Unboxed types (Glasgow extension) + + +Most types in GHC are boxed, which means +that values of that type are represented by a pointer to a heap +object. The representation of a Haskell Int, for +example, is a two-word heap object. An unboxed +type, however, is represented by the value itself; no pointers or heap +allocation are involved. + + + +Unboxed types correspond to the “raw machine” types you +would use in C: Int# (long int), +Double# (double), Addr# +(void *), etc. The primitive operations +(PrimOps) on these types are what you might expect; e.g., +(+#) is addition on +Int#s, and is the machine-addition that we all +know and love, usually one instruction. + + + +Primitive (unboxed) types cannot be defined in Haskell, and are +therefore built into the language and compiler. Primitive types are +always unlifted; that is, a value of a primitive type cannot be +bottom. We use the convention that primitive types, values, and +operations have a # suffix. + + + +Primitive values are often represented by a simple bit-pattern, such +as Int#, Float#, +Double#. But this is not necessarily the case: +a primitive value might be represented by a pointer to a +heap-allocated object. Examples include +Array#, the type of primitive arrays. 
A +primitive array is heap-allocated because it is too big a value to fit +in a register, and would be too expensive to copy around; in a sense, +it is accidental that it is represented by a pointer. If a pointer +represents a primitive value, then it really does point to that value: +no unevaluated thunks, no indirections; nothing other than the +primitive value can be at the other end of the pointer. + + + +There are some restrictions on the use of primitive types, the main +one being that you can't pass a primitive value to a polymorphic +function or store one in a polymorphic data type. This rules out +things like [Int#] (i.e. lists of primitive +integers). The reason for this restriction is that polymorphic +arguments and constructor fields are assumed to be pointers: if an +unboxed integer were stored in one of these, the garbage collector would +attempt to follow it, leading to unpredictable space leaks. Or a +seq operation on the polymorphic component may +attempt to dereference the pointer, with disastrous results. Even +worse, the unboxed value might be larger than a pointer +(Double# for instance). + + + +Nevertheless, a numerically-intensive program using unboxed types can +go a lot faster than its “standard” +counterpart; we saw a threefold speedup on one example. + + + + + +Unboxed Tuples + + + +Unboxed tuples aren't really exported by GHC.Exts; +they're available by default with . An +unboxed tuple looks like this: + + + + + +(# e_1, ..., e_n #) + + + + + +where e_1..e_n are expressions of any +type (primitive or non-primitive). The type of an unboxed tuple looks +the same. + + + +Unboxed tuples are used for functions that need to return multiple +values, but they avoid the heap allocation normally associated with +using fully-fledged tuples. When an unboxed tuple is returned, the +components are put directly into registers or on the stack; the +unboxed tuple itself does not have a composite representation. 
Many +of the primitive operations listed in this section return unboxed +tuples. + + + +There are some pretty stringent restrictions on the use of unboxed tuples: + + + + + + + + + Unboxed tuple types are subject to the same restrictions as +other unboxed types; i.e. they may not be stored in polymorphic data +structures or passed to polymorphic functions. + + + + + + + Unboxed tuples may only be constructed as the direct result of +a function, and may only be deconstructed with a case expression. +For example, the following are valid: + + + +f x y = (# x+1, y-1 #) +g x = case f x x of { (# a, b #) -> a + b } + + + +but the following are invalid: + + + +f x y = g (# x, y #) +g (# x, y #) = x + y + + + + + + + + + No variable can have an unboxed tuple type. This is illegal: + + + +f :: (# Int, Int #) -> (# Int, Int #) +f x = x + + + +because x has an unboxed tuple type. + + + + + + + + + +Note: we may relax some of these restrictions in the future. + + + +The IO and ST monads use unboxed +tuples to avoid unnecessary allocation during sequences of operations. + + + + + hunk ./ghc/docs/users_guide/lang.sgml 6 -&vs-hs hunk ./ghc/docs/users_guide/ug-book.sgml 21 +&bugs; hunk ./ghc/docs/users_guide/ug-ent.sgml 20 - + hunk ./ghc/docs/users_guide/vs_haskell.sgml 1 - - Haskell 98 vs. Glasgow Haskell: language non-compliance - - - GHC vs the Haskell 98 language - Haskell 98 language vs GHC - - This section lists Glasgow Haskell infelicities in its - implementation of Haskell 98. See also the “when things - go wrong” section () for information - about crashes, space leaks, and other undesirable phenomena. - - The limitations here are listed in Haskell Report order - (roughly). - - - Divergence from Haskell 98 - - - - Lexical syntax - - - - The Haskell report specifies that programs may be - written using Unicode. GHC only accepts the ISO-8859-1 - character set at the moment. 
- - - - Certain lexical rules regarding qualified identifiers - are slightly different in GHC compared to the Haskell - report. When you have - module.reservedop, - such as M.\, GHC will interpret it as a - single qualified operator rather than the two lexemes - M and .\. - - - - When is on, GHC - reserves several keywords beginning with two underscores. - This is due to the fact that GHC uses the same lexical - analyser for interface file parsing as it does for source - file parsing, and these keywords are used in interface - files. Do not use any identifiers beginning with a double - underscore in mode. - - - - - - Context-free syntax - - - - GHC doesn't do fixity resolution in expressions during - parsing. For example, according to the Haskell report, the - following expression is legal Haskell: - - let x = 42 in x == 42 == True - and parses as: - - (let x = 42 in x == 42) == True - - because according to the report, the let - expression extends as far to the right as - possible. Since it can't extend past the second - equals sign without causing a parse error - (== is non-fix), the - let-expression must terminate there. GHC - simply gobbles up the whole expression, parsing like this: - - (let x = 42 in x == 42 == True) - - The Haskell report is arguably wrong here, but nevertheless - it's a difference between GHC & Haskell 98. - - - - - - Expressions and patterns - - - - Very long String constants: - - May not go through. If you add a “string - gap” every few thousand characters, then the strings - can be as long as you like. - - Bear in mind that string gaps and the - - option don't mix very well (see - ). - - - - - - - - Declarations and bindings - - None known. - - - - - Module system and interface files - - - - - Namespace pollution - - Several modules internal to GHC are visible in the - standard namespace. 
All of these modules begin with - Prel, so the rule is: don't use any - modules beginning with Prel in your - program, or you may be comprehensively screwed. - - - - - - - - Numbers, basic types, and built-in classes - - - - Multiply-defined array elements—not checked: - - This code fragment should - elicit a fatal error, but it does not: - - -main = print (array (1,1) [(1,2), (1,3)]) - - - - - - - - - - In Prelude support - - - - The Char type - Charsize - of - - The Haskell report says that the - Char type holds 16 bits. GHC follows - the ISO-10646 standard a little more closely: - maxBound :: Char in GHC is - 0x10FFFF. - - - - - Arbitrary-sized tuples: - - Tuples are currently limited to size 61. HOWEVER: - standard instances for tuples (Eq, - Ord, Bounded, - Ix Read, and - Show) are available - only up to 5-tuples. - - This limitation is easily subvertible, so please ask - if you get stuck on it. - - - - - - - - GHC's interpretation of undefined behaviour in - Haskell 98 - - This section documents GHC's take on various issues that are - left undefined or implementation specific in Haskell 98. - - - - Sized integral types - Intsize of - - - - In GHC the Int type follows the - size of an address on the host architecture; in other words - it holds 32 bits on a 32-bit machine, and 64-bits on a - 64-bit machine. - - Arithmetic on Int is unchecked for - overflowoverflowInt - , so all operations on Int happen - modulo - 2n - where n is the size in bits of - the Int type. - - The fromIntegerfromInteger - function (and hence - also fromIntegralfromIntegral - ) is a special case when - converting to Int. The value of - fromIntegral x :: Int is given by taking - the lower n bits of (abs - x), multiplied by the sign of x - (in 2's complement n-bit - arithmetic). This behaviour was chosen so that for example - writing 0xffffffff :: Int preserves the - bit-pattern in the resulting Int. 
- - - Negative literals, such as -3, are - specified by (a careful reading of) the Haskell Report as - meaning Prelude.negate (Prelude.fromInteger 3). - So -2147483648 means negate (fromInteger 2147483648). - Since fromInteger takes the lower 32 bits of the representation, - fromInteger (2147483648::Integer), computed at type Int is - -2147483648::Int. The negate operation then - overflows, but it is unchecked, so negate (-2147483648::Int) is just - -2147483648. In short, one can write minBound::Int as - a literal with the expected meaning (but that is not in general guaranteed. - - - The fromIntegral function also - preserves bit-patterns when converting between the sized - integral types (Int8, - Int16, Int32, - Int64 and the unsigned - Word variants), see the modules - Data.Int and Data.Word - in the library documentation. - - - - - Unchecked float arithmetic - - Operations on Float and - Double numbers are - unchecked for overflow, underflow, and - other sad occurrences. (note, however that some - architectures trap floating-point overflow and - loss-of-precision and report a floating-point exception, - probably terminating the - program)floating-point - exceptions. - - - - - - - - - rmfile ./ghc/docs/users_guide/vs_haskell.sgml }