[Haddock] [haddock] #20: We don't handle non-ASCII characters in doc comments

haddock haddock at projects.haskell.org
Mon Dec 5 09:19:27 GMT 2011


#20: We don't handle non-ASCII characters in doc comments
-------------------+--------------------------------------------------------
Reporter:  waern   |        Owner:     
    Type:  defect  |       Status:  new
Priority:  major   |    Milestone:     
 Version:          |   Resolution:     
Keywords:          |  
-------------------+--------------------------------------------------------

Comment(by simonmar):

 The comments from GHC are lexed again by Haddock using an Alex lexer, and
 I would expect that step to mangle the Unicode.  From `src\Lex.x`:

 {{{
 alexGetByte :: AlexInput -> Maybe (Word8,AlexInput)
 alexGetByte (p,c,[]) = Nothing
 alexGetByte (p,_,(c:s))  = let p' = alexMove p c
                                in p' `seq`  Just (fromIntegral (ord c), (p', c, s))

 -- for compat with Alex 2.x:
 alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
 alexGetChar i = case alexGetByte i of
                   Nothing     -> Nothing
                   Just (b,i') -> Just (chr (fromIntegral b), i')
 }}}

 You can see we apply `ord` in `alexGetByte`, narrow the result to a
 `Word8` with `fromIntegral`, and apply `chr` again in `alexGetChar`, so
 any character above U+00FF should be squashed to its low 8 bits.
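
 As a minimal sketch of that effect (this is not code from Haddock, and
 `roundTrip` is a hypothetical helper name), the same two conversions
 applied to a single `Char` look roughly like this:

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: mimics what one character goes through when it
-- passes through alexGetByte (ord, then narrowing to Word8) and then
-- alexGetChar (widening the byte again and applying chr).
roundTrip :: Char -> Char
roundTrip c = chr (fromIntegral byte)
  where
    -- fromIntegral to Word8 wraps modulo 256, keeping only the low 8 bits.
    byte :: Word8
    byte = fromIntegral (ord c)
```

 Under these definitions `roundTrip 'a'` is `'a'`, while
 `roundTrip '\955'` (λ, U+03BB) comes back as `'\187'` (», U+00BB).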

-- 
Ticket URL: <http://trac.haskell.org/haddock/ticket/20#comment:12>
haddock <http://www.haskell.org/haddock>
Haddock, The Haskell Documentation Tool

