ANNOUNCE: brillig 0.3 - not quite the Brill tagger

Rogan Creswick creswick at gmail.com
Wed Sep 7 17:40:43 BST 2011


2011/9/7 Eric Kow <eric.kow at gmail.com>:
> But hopefully I won't have to, because I was actually just saying
> something incredibly simple and non-technical, that the brillig
> executable could just provide a thin wrapper around different kinds of
> taggers (as alternatives to each other, completely disjoint).
> You know, files go in, tags come out...

Is anyone else interested in supporting the Apache UIMA CAS format(s)?
I'm not a *huge* fan of the gritty system design details in UIMA (it
seems absurdly difficult to actually use an analysis engine / PEAR in
an application), but at least the file format for annotations is
somewhat standardized.

It would also be nice to provide some sort of bridge to another rich
set of NLP libraries while the Haskell infrastructure is getting off
the ground.

(On a tangential note: This thread has been great for bringing some
tagging libraries to my attention... I didn't realize there were so
many options already!)

--Rogan

> but this was before I looked
> at the training file format and understood that this is what sequor
> provides.  Oh well, this probably makes brillig just a bit redundant in
> infrastructure terms. :-)
>
>> For what it's worth, I just trained Sequor (using several spelling
>> features as encoded in the data/mlcomp2.features template) on the
>> initial 90% of the Brown corpus, and tested on the final 10%, and got
>> an accuracy of 96.2%. Training takes several hours, but tagging runs
>> at more than 3000 words/second.
>
> Cool!
>
> PS. can we have a small release with '-rtsopts'?
>
> --
> Eric Kow <http://erickow.com>
>
> _______________________________________________
> NLP mailing list
> NLP at projects.haskell.org
> http://projects.haskell.org/cgi-bin/mailman/listinfo/nlp
>
>


