CLT Toolkit

Daniël de Kok me at danieldk.eu
Sat Nov 27 04:06:36 EST 2010


On Nov 27, 2010, at 7:21 AM, wren ng thornton wrote:
> algorithms. I started the project because the CCG supertaggers available 
> (C&C Tools; OpenCCG) are too integrated in their own projects to 
> facilitate doing my kind of research, and also because the current 
> standard for HMM tagging (TnT) is closed source. So the goal (as a 
> Haskell library) is to make it as openly reusable as possible.

For what it is worth, my Citar (C++) and Jitar (Java) taggers are nearly identical to TnT:

- They use a trigram HMM model.
- Linear interpolation smoothing is used for estimating the probability of trigrams.
- Unknown word probabilities are estimated using suffixes.
- There are some tricks not described in Brants' paper that are necessary to achieve the same performance (e.g. using different estimators for capitalized and uncapitalized unknown words).
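
To make the smoothing point concrete, here is a minimal Python sketch of linear interpolation for tag trigrams. It is not the code from Citar or Jitar: the function names, the "<s>" padding symbol, and the hand-picked weights are assumptions for illustration only. A real tagger would estimate the weights from the training data rather than fix them by hand.

```python
from collections import Counter

START = "<s>"  # hypothetical padding symbol for sentence starts


def count_ngrams(tag_sequences):
    """Collect tag n-gram counts and the counts of their contexts."""
    counts = {name: Counter() for name in ("uni", "bi", "tri", "ctx1", "ctx2")}
    for tags in tag_sequences:
        padded = [START, START] + list(tags)
        for i in range(2, len(padded)):
            t1, t2, t3 = padded[i - 2], padded[i - 1], padded[i]
            counts["uni"][t3] += 1
            counts["bi"][(t2, t3)] += 1
            counts["tri"][(t1, t2, t3)] += 1
            counts["ctx1"][t2] += 1        # t2 seen as a bigram context
            counts["ctx2"][(t1, t2)] += 1  # (t1, t2) seen as a trigram context
    return counts


def ratio(numerator, denominator):
    """Maximum-likelihood estimate; 0.0 when the context was never seen."""
    return numerator / denominator if denominator else 0.0


def interpolated_prob(counts, t1, t2, t3, lambdas=(0.1, 0.3, 0.6)):
    """P(t3 | t1, t2) as a weighted sum of unigram, bigram and trigram
    estimates. The default weights are illustrative and sum to 1."""
    l1, l2, l3 = lambdas
    total = sum(counts["uni"].values())
    p_uni = ratio(counts["uni"][t3], total)
    p_bi = ratio(counts["bi"][(t2, t3)], counts["ctx1"][t2])
    p_tri = ratio(counts["tri"][(t1, t2, t3)], counts["ctx2"][(t1, t2)])
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


if __name__ == "__main__":
    # Toy training data: two tag sequences (tags are made up for the example).
    counts = count_ngrams([["D", "N", "V"], ["D", "N"]])
    print(interpolated_prob(counts, START, "D", "N"))  # seen trigram
    print(interpolated_prob(counts, START, "D", "V"))  # unseen trigram
```

The second call shows why the smoothing matters: the trigram (<s>, D, V) never occurs in the toy data, but the unigram term still gives it a small non-zero probability (about 0.02 with these weights).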

https://github.com/danieldk/citar
https://github.com/danieldk/jitar

Both are available under an open source license.

Take care,
Daniël