[Haskell-cafe] NLP libraries and tools?

Grzegorz Chrupała pitekus at gmail.com
Fri Jul 8 09:34:04 BST 2011


Hi Dmitrii,


> 1) End of Sentence (EOS) Detection. Break text into a collection of
> meaningful sentences.
FullStop http://hackage.haskell.org/package/fullstop splits text into
sentences using orthographic conventions (those of English, and
hopefully of other languages too).
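
To give a rough idea of what splitting on orthographic conventions
means, here is a minimal sketch. It is not FullStop's algorithm or API:
the function name splitSentences and the rule it applies (a terminator
followed by whitespace and an uppercase letter, or by the end of the
input) are assumptions made purely for illustration.

```haskell
import Data.Char (isSpace, isUpper)

-- | Naive sentence splitter (illustrative only, not FullStop's method).
-- A sentence ends at '.', '!' or '?' when that character is followed by
-- whitespace and then an uppercase letter, or by the end of the input.
splitSentences :: String -> [String]
splitSentences = go "" . dropWhile isSpace
  where
    -- acc holds the current sentence in reverse order.
    go acc [] = [reverse acc | not (null acc)]
    go acc (c:rest)
      | c `elem` ".!?" && boundary rest =
          reverse (c : acc) : go "" (dropWhile isSpace rest)
      | otherwise = go (c : acc) rest

    boundary rest = case span isSpace rest of
      (_, [])     -> True          -- nothing but whitespace remains
      ([], _)     -> False         -- no space after the mark, e.g. "3.14"
      (_, next:_) -> isUpper next  -- space, then a capitalised word
```

A rule this simple still splits wrongly after abbreviations that
precede a capitalised word ("Dr. Smith"), which is the kind of case a
dedicated package is meant to handle.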

> 2) Part-of-Speech (POS) Tagging. Assign part-of-speech information to each
> token.

Morfette http://hackage.haskell.org/package/morfette is a Haskell
program which does joint POS-tagging and lemmatization for
morphologically rich languages. There are some pre-trained models
available at https://sites.google.com/site/morfetteweb/download/pretrained-morfette-models

> 3) Chunking. Analyze each tagged token within a sentence and assemble
> compound tokens that express logical concepts. Define a custom grammar.
> 4) Extraction. Analyze each chunk and further tag the chunks as named
> entities, such as people, organizations, locations, etc.

Sequor http://hackage.haskell.org/package/sequor is a Haskell program
which can be used for chunking or NE detection. You will need to train
it on a labeled data set.

Morfette and Sequor are tools based on statistical learning, and don't
currently expose a library interface.

If you want a rule-based approach, GF
http://hackage.haskell.org/package/gf might be worth looking into, but
I don't know much about it.

Hope this helps,
--
G.


