[Haskell-cafe] NLP libraries and tools?

Dmitri O.Kondratiev dokondr at gmail.com
Fri Jul 8 11:13:14 BST 2011


Hi Grzegorz,
Thanks a lot for the info!
I am also looking for Haskell implementations of:
1) English sentence detector.
2) English word tokenizer for 140-character Twitter messages.
3) TreebankWordTokenizer, a word tokenizer that operates on sentences and
uses the same conventions as the Penn Treebank Project, similar to what NLTK
has:
http://nltk.googlecode.com/svn/trunk/doc/api/nltk.tokenize.treebank.TreebankWordTokenizer-class.html
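For item 3, here is a minimal Haskell sketch of Treebank-style tokenization, assuming one sentence per input string. It is not a port of NLTK's TreebankWordTokenizer. It implements only a few of the Penn Treebank conventions: splitting off edge punctuation, splitting off the sentence-final period, and splitting off clitics such as n't and 's. It ignores quote rewriting and Twitter-specific tokens such as @mentions, hashtags, and URLs. The names `tokenize`, `splitWord`, and `splitClitic` are hypothetical.

```haskell
import Data.Char (toLower)
import Data.List (isSuffixOf)

-- Punctuation split off the edges of a word (simplified subset;
-- the Penn Treebank rules also rewrite quotes, brackets, etc.).
edgePunct :: String
edgePunct = ",;:?!()\""

-- Clitics split off as separate tokens, e.g. "don't" -> "do" "n't".
clitics :: [String]
clitics = ["n't", "'ll", "'re", "'ve", "'s", "'m", "'d"]

-- Tokenize a single sentence in (simplified) Treebank style.
tokenize :: String -> [String]
tokenize = concatMap splitWord . splitFinalPeriod . words
  where
    -- Only the sentence-final period is separated, so abbreviations
    -- like "Mr." inside the sentence stay intact.
    splitFinalPeriod ws = case reverse ws of
      (w : rest) | length w > 1, last w == '.' -> reverse rest ++ [init w, "."]
      _ -> ws

-- Peel leading and trailing punctuation, then split a clitic off the core.
splitWord :: String -> [String]
splitWord w = map (: []) lead ++ splitClitic core ++ map (: []) trail
  where
    (lead, rest) = span (`elem` edgePunct) w
    (trailRev, coreRev) = span (`elem` edgePunct) (reverse rest)
    core = reverse coreRev
    trail = reverse trailRev

-- Split the first matching clitic suffix (case-insensitive) off a word.
splitClitic :: String -> [String]
splitClitic "" = []
splitClitic core =
  case [c | c <- clitics, c `isSuffixOf` lower, length core > length c] of
    (c : _) -> let n = length core - length c in [take n core, drop n core]
    []      -> [core]
  where
    lower = map toLower core

main :: IO ()
main = print (tokenize "I don't think he's here, really.")
```

The input is expected to be a single sentence, for example from a sentence splitter. That is why only the last period is split off.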

Thanks!
Dmitri

2011/7/8 Grzegorz Chrupała <pitekus at gmail.com>

> Hi Dmitrii,
>
>
> > 1) End of Sentence (EOS) Detection. Break text into a collection of
> > meaningful sentences.
> FullStop http://hackage.haskell.org/package/fullstop splits texts into
> sentences, using some orthographical conventions (used in English and
> hopefully other languages).
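The following minimal Haskell sketch shows the kind of orthographic rule such a splitter can rely on. It is not FullStop's implementation or API. The `splitSentences` function and the short `abbreviations` list are hypothetical. The rule is a simplifying assumption: break after `.`, `?` or `!` when the next word is capitalized and the current word is not a known abbreviation. It misses many cases, such as quotes that follow the full stop.

```haskell
import Data.Char (isUpper)

-- Hypothetical, deliberately tiny abbreviation list.
abbreviations :: [String]
abbreviations = ["Mr.", "Mrs.", "Dr.", "e.g.", "i.e.", "etc."]

-- Split text into sentences using a simple orthographic heuristic.
-- Whitespace is normalized to single spaces in the output.
splitSentences :: String -> [String]
splitSentences = go [] . words
  where
    -- acc holds the words of the current sentence in reverse order.
    go acc [] = [unwords (reverse acc) | not (null acc)]
    go acc (w : ws)
      | endsSentence w ws = unwords (reverse (w : acc)) : go [] ws
      | otherwise         = go (w : acc) ws

    -- A word ends a sentence if it ends in terminal punctuation, is not
    -- a known abbreviation, and the next word (if any) is capitalized.
    endsSentence w rest =
      last w `elem` ".?!"
        && w `notElem` abbreviations
        && case rest of
             ((c : _) : _) -> isUpper c
             _             -> True

main :: IO ()
main = mapM_ putStrLn (splitSentences "Dr. Smith arrived. He sat down! Was it late? no.")
```

The final "late? no." is deliberately left unsplit. It shows how a capitalization rule trades recall for fewer false breaks.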
>
> > 2) Part-of-Speech (POS) Tagging. Assign part-of-speech information to
> > each token.
>
> Morfette http://hackage.haskell.org/package/morfette is a Haskell
> program which does joint POS-tagging and lemmatization for
> morphologically rich languages. There are some pre-trained models
> available at
> https://sites.google.com/site/morfetteweb/download/pretrained-morfette-models
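Morfette learns its models from annotated data. As a much simpler illustration of the same train-then-tag workflow, here is a most-frequent-tag (unigram) baseline in Haskell. This is not Morfette's method or interface. The `train` and `tagWith` functions and the toy training data are hypothetical. A real tagger would also use context and would handle unknown words better than a single default tag.

```haskell
import qualified Data.Map.Strict as M
import Data.List (maximumBy)
import Data.Ord (comparing)

type Tag = String

-- For each word, keep the tag it was seen with most often in training.
train :: [(String, Tag)] -> M.Map String Tag
train tagged = M.map mostFrequent counts
  where
    -- word -> (tag -> count)
    counts :: M.Map String (M.Map Tag Int)
    counts = M.fromListWith (M.unionWith (+)) [(w, M.singleton t 1) | (w, t) <- tagged]
    mostFrequent = fst . maximumBy (comparing snd) . M.toList

-- Tag tokens with the model; words unseen in training get a default tag.
tagWith :: Tag -> M.Map String Tag -> [String] -> [(String, Tag)]
tagWith def model ws = [(w, M.findWithDefault def w model) | w <- ws]

main :: IO ()
main = do
  let model = train [ ("the", "DT"), ("dog", "NN"), ("runs", "VBZ")
                    , ("the", "DT"), ("runs", "NNS"), ("runs", "VBZ") ]
  print (tagWith "NN" model ["the", "cat", "runs"])
```

Lookups are case-sensitive here. Lowercasing words in both `train` and `tagWith` is a common, easy refinement.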
>
> > 3) Chunking. Analyze each tagged token within a sentence and assemble
> > compound tokens that express logical concepts. Define a custom grammar.
> > 4) Extraction. Analyze each chunk and further tag the chunks as named
> > entities, such as people, organizations, locations, etc.
>
> Sequor http://hackage.haskell.org/package/sequor is a Haskell program
> which can be used for chunking or NE detection. You will need to train
> it on a labeled data set.
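A sequence labeller used for chunking or NE detection typically emits one label per token, and the chunks are then read off the labels. The sketch below assumes the common BIO encoding: `B-X` begins a chunk of type X, `I-X` continues it, and `O` is outside any chunk. Whether Sequor's output follows this scheme depends on how the training data is labeled. Treat `bioChunks` and `Chunk` as hypothetical helpers, not as part of Sequor.

```haskell
-- A chunk: its type (e.g. "PER", "LOC", "NP") and the tokens it covers.
data Chunk = Chunk { chunkType :: String, chunkTokens :: [String] }
  deriving (Show, Eq)

-- Group (token, BIO label) pairs into chunks.
-- An "I-X" that does not continue an open X chunk starts a new one.
bioChunks :: [(String, String)] -> [Chunk]
bioChunks = go Nothing
  where
    -- 'open' is the chunk being built, with its tokens in reverse order.
    go open [] = flush open
    go open ((tok, lab) : rest) = case lab of
      'B' : '-' : ty -> flush open ++ go (Just (ty, [tok])) rest
      'I' : '-' : ty
        | Just (ty', toks) <- open, ty' == ty -> go (Just (ty, tok : toks)) rest
        | otherwise -> flush open ++ go (Just (ty, [tok])) rest
      _ -> flush open ++ go Nothing rest

    flush Nothing = []
    flush (Just (ty, toks)) = [Chunk ty (reverse toks)]

main :: IO ()
main = mapM_ print (bioChunks
  [ ("Jane", "B-PER"), ("Smith", "I-PER"), ("visited", "O")
  , ("New", "B-LOC"), ("York", "I-LOC"), (".", "O") ])
```

The lenient handling of a stray `I-X` is a design choice. A stricter decoder could reject such sequences instead.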
>
> Morfette and Sequor are tools based on statistical learning, and don't
> currently expose a library interface.
>
> If you want a rule-based approach, GF
> http://hackage.haskell.org/package/gf might be worth looking into, but
> I don't know much about it.
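By contrast with the statistical tools, a rule-based chunker can be as simple as a hand-written pattern over POS tags. The Haskell sketch below is not GF, whose grammar formalism is far richer. The `npChunks` function and its single rule are hypothetical: an optional determiner, then any adjectives, then at least one noun (Penn Treebank tags `DT`, `JJ`, `NN*`). The sketch only shows what "defining a custom grammar" for chunking can look like at its most basic.

```haskell
-- Penn Treebank noun tags.
isNoun :: (String, String) -> Bool
isNoun (_, t) = t `elem` ["NN", "NNS", "NNP", "NNPS"]

-- Scan tagged tokens left to right, collecting greedy matches of
-- the rule NP -> DT? JJ* NN+.
npChunks :: [(String, String)] -> [[String]]
npChunks [] = []
npChunks toks@(_ : rest) = case matchNP toks of
  Just (np, after) -> map fst np : npChunks after
  Nothing          -> npChunks rest

-- Try to match the rule at the start of the input; on success return
-- the matched tokens and the remaining input.
matchNP :: [(String, String)] -> Maybe ([(String, String)], [(String, String)])
matchNP toks
  | null nouns = Nothing
  | otherwise  = Just (det ++ adjs ++ nouns, r3)
  where
    (det, r1) = case toks of
      (t@(_, "DT") : ts) -> ([t], ts)
      _                  -> ([], toks)
    (adjs, r2)  = span ((== "JJ") . snd) r1
    (nouns, r3) = span isNoun r2

main :: IO ()
main = print (npChunks
  [ ("the", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ")
  , ("over", "IN"), ("lazy", "JJ"), ("dogs", "NNS") ])
```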
>
> Hope this helps,
> --
> G.
>


More information about the NLP mailing list