ANNOUNCE: brillig 0.3 - not quite the Brill tagger

Eric Y. Kow eric.kow at gmail.com
Sat Sep 3 09:32:56 BST 2011


Dear Haskell NLPers,

I'd like to announce the availability of brillig (on Hackage), which aims
to be, but falls short of, a Brill tagger implementation.  You can
get it by running

     cabal update
     cabal install brillig

For now you may also want to get the unstable version

    darcs get --lazy http://darcsden.com/kowey/brillig

This is largely a seed-planting exercise: naive implementations of
simple algorithms, so that we have something rather than nothing
(see also fullstop, a sentence segmenter).  Is there a Haskell NLTK?
No, but...

The good news:

  - comes with (hopefully) easy quick start instructions
  - improves accuracy over my unigram baseline by 1 percentage point
    (86.4% to 87.4%)
  - available in library form with a permissive Free software license (BSD3)
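
For readers who have not met the term, a unigram baseline tags each word
with the tag it carried most often in the training data.  The sketch
below is only an illustration of that idea, not brillig's code or API:
the names trainUnigram, tagUnigram and accuracy, and the default-tag
argument for unseen words, are all made up for this example.

```haskell
import qualified Data.Map.Strict as M
import Data.List (maximumBy)
import Data.Ord (comparing)

type Token = String
type Tag   = String

-- Learn, for every word in the training data, its most frequent tag.
-- (Hypothetical helper, not part of brillig.)
trainUnigram :: [(Token, Tag)] -> M.Map Token Tag
trainUnigram pairs = M.map mostFrequent counts
  where
    -- word -> (tag -> number of times seen)
    counts = M.fromListWith (M.unionWith (+))
               [ (w, M.singleton t (1 :: Int)) | (w, t) <- pairs ]
    mostFrequent = fst . maximumBy (comparing snd) . M.toList

-- Tag each word with its most frequent tag; unseen words get 'def'.
tagUnigram :: Tag -> M.Map Token Tag -> [Token] -> [(Token, Tag)]
tagUnigram def model ws =
  [ (w, M.findWithDefault def w model) | w <- ws ]

-- Fraction of positions where the guessed tag matches the gold tag.
accuracy :: [Tag] -> [Tag] -> Double
accuracy gold guess
  | null gold = 0
  | otherwise = fromIntegral correct / fromIntegral (length gold)
  where
    correct = length (filter id (zipWith (==) gold guess))
```

Running accuracy over a held-out slice of a corpus gives the kind of
figure a tagger's improvement is measured against.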

The bad news:

 - only implements templates one tag back.  Needs to be generalised
 - not actually in use for anything.
   I mostly wrote this for the hell of it.
   New maintainer welcome!
 - top accuracy on a random tenth held out from the Brown corpus is
   87.4%... the unigram baseline reported in Jurafsky and Martin's
   textbook is something like 91%.  Oops.

   For what it's worth, my unigram baseline is 86.4% (which is why I'm
   reporting a 1% improvement).   Hopefully, this is due to different
   data sets...
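
To make "templates one tag back" concrete: a rule from such a template
says "change tag A to B when the previous tag is Z".  What follows is a
rough sketch under my own assumptions, not brillig's actual types; Rule,
its field names and applyRule are hypothetical.  It also looks only at
the original tags when checking the context, which is one of a couple of
possible choices.

```haskell
type Tag = String

-- "Change ruleFrom to ruleTo when the previous tag is rulePrev."
-- (Hypothetical type for illustration, not brillig's.)
data Rule = Rule
  { ruleFrom :: Tag
  , ruleTo   :: Tag
  , rulePrev :: Tag
  }

-- Apply one rule across a tag sequence.  Each position is paired with
-- the tag before it (Nothing for the first position), taken from the
-- input sequence rather than from the partly rewritten output.
applyRule :: Rule -> [Tag] -> [Tag]
applyRule r ts = zipWith retag (Nothing : map Just ts) ts
  where
    retag (Just p) c
      | c == ruleFrom r && p == rulePrev r = ruleTo r
    retag _ c = c
```

For example, applyRule (Rule "NN" "VB" "TO") turns a noun reading after
"to" into a verb reading.  Generalising would mean letting the context
look further back or ahead, or at words as well as tags.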

Anyway, it's out there. I hope somebody can run with it.
Have fun!  Go make it better!

Eric

PS. The toves, they are slithy

-- 
Eric Kow <http://erickow.com>