Current ideas

Tue Nov 25 22:48:42 EST 2008

> Hi,
>
> Since LHC's creation, I have been doing a bit of work on it,
> mainly cleaning up and replacing things. It's now much easier to
> build (with cabal), has some cleaned-up code, and is generally
> improved over jhc, I would say (at this point anyway).
>
> Here's the current ChangeLog:
> http://code.haskell.org/lhc/ChangeLog
>
> Before any sort of announcement (and a version bump), here are some
> things that have been on my mind:
>
>  * There could be various improvements in performance and memory usage
>   just about everywhere. I have some initial profiling results from running
>   the compiler over 'hello world' here:
>   http://thoughtpolice.stringsandints.com/code/lhc-tests/hello-world/
>
>   The files are:
>   - lhc.prof & lhc-cc.ps - detailed cost centres - about 40% of
>     *total* time and allocation is spent in E.Binary, in two different
>     get routines for Data.Binary.
>   - lhc-constr.ps & lhc-type.ps - most of the allocation goes to the
>     S data type used in the lambda lifter/CPR pass, as well as lazy
>     tuples.
>
>   So I think if we can target E.Binary usage, we can possibly cut
>   down GC and runtime considerably. We may be holding the GC from

Compiling HelloWorld is currently dominated by the time it takes to
load base-1.0.hl. Reducing the size of base-1.0.hl should benefit
compile times dramatically.

>   Note that these benchmarks were run before I replaced several other
>   parts of the code and removed a few things, notably replacing DrIFT
>   with derive - I noticed GC went up to about 61% from the average
>   58% for DrIFT. I should probably update them.
>
>   (Also, GHC HEAD has had recent improvements to the parallel garbage
>   collector, etc. so I would like to see if running lhc on top of it
>   would reduce garbage time with -N2 and -threaded.)

I tried it on my dual-core AMD. CPU usage did go above 100%, but
it didn't make compiling base-1.0.hl any faster.

>  * I think we should continue with jhc's goal of sticking with the latest
>   GHC. I see potential in using quasi-quoting, perhaps to replace
>   the FlagDump, Name and PrimitiveOperators information that was part
>   of the autoconf build system.
>   We can easily construct a parser with parsec (or even a
>   regex lib; the perl scripts for this are in util/) and go from
>   there; this also replaces another external dependency
>   and makes cabal life better.

I like this idea.

>  * The region inference algorithm is currently buggy, and code leaks
>   pretty badly if it runs for a while. If you look here:
>   http://thoughtpolice.stringsandints.com/code/lhc-tests/bench
>   and build them, loop and startup work fine, but recursive almost
>   immediately starts gobbling up 800 MB+ of memory. This is something to be
>   addressed for sure.

What's the right thing to do here? Do we need a generational GC or
should we piggyback on GHC?

>  * Right now the parser is featureful and works, but we may also
>   be able to swap it out for, say, haskell-src-exts.
>
>   Pros: way more extensions we can support out of the box, many
>         probably pretty easily with some knowledge of E and GRIN.
>   Cons: we lose pragmas entirely, and we must effectively put all
>         extensions on, all the time.
>
>   Losing pragmas is a problem, but it is a TODO on the
>   haskell-src-exts project, and I'm convinced it could be worked in
>   there in the interests of making lhc more robust.
>
>   I am also not convinced that losing the ability to turn language
>   extensions off is a particularly bad thing; with
>   haskell-src-exts we immediately gain support for a large variety of
>   extensions, making lhc compatible with a much larger body of
>   code, although we will still have to implement code for
>   extensions like associated types, GADTs, etc.
>
>   I think contributing back to haskell-src-exts would probably be a
>   good idea.

I like this idea. The parser we have now is slightly buggy.
Losing pragmas seems like a big problem, though. We might be able to
go without INLINE and SPECIALIZE, but RULES is pretty essential.
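
To make the RULES point concrete, here is a minimal sketch of the kind of
rewrite rule we could no longer express if the parser dropped pragmas. The
names mapL and doubleThenInc are made up for illustration and are not taken
from lhc or its base library; the pragma syntax is GHC's.

```haskell
module Main (main) where

-- Hypothetical list map, kept out of line so the rule below can match it.
mapL :: (a -> b) -> [a] -> [b]
mapL _ []       = []
mapL f (x : xs) = f x : mapL f xs
{-# NOINLINE mapL #-}

-- Rewrite rule: fuse two traversals into one. A parser that throws
-- pragmas away silently loses this optimisation.
{-# RULES
"mapL/mapL" forall f g xs. mapL f (mapL g xs) = mapL (f . g) xs
  #-}

-- Illustrative use: the result is the same whether or not the rule fires.
doubleThenInc :: [Int] -> [Int]
doubleThenInc xs = mapL (+ 1) (mapL (* 2) xs)

main :: IO ()
main = print (doubleThenInc [1, 2, 3])  -- prints [3,5,7]
```

In GHC the rule only fires with optimisation on (-O). Without it the program
still computes the same list, just with two traversals instead of one.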

>  * We definitely need a testsuite - I think a good first suite to
>   target and fully compile would be nobench.
>
>  * We should eliminate the last bits of the old build system; I have just
>   eliminated cbits as we don't need it, and we need to replace the
>   Name, Flag and PrimitiveOperator generation routines, because right
>   now they're generically hard-coded in there.
>
>  * There should be something like an LHC commentary. Starting one on
>   http://lhc.seize.it seems reasonable.

Couldn't agree more.

>  * We should find a way to get lhc's base package on hackage so we can do
>   'cabal install lhc; cabal install base --lhc'

We'll have to talk with Duncan about this one.

> These are all of my thoughts; I would like feedback from anybody
> willing to contribute to the project. I realize all these things are
> something of a significant engineering effort, but these are a few
> core things that, if put in place, could greatly help LHC as a compiler
> and Haskell programmers.
>
> Austin

--
Cheers,
 Lemmih