[Haskell-cafe] NLP libraries and tools?
Grzegorz Chrupała
pitekus at gmail.com
Fri Jul 8 09:34:04 BST 2011
Hi Dmitrii,
> 1) End of Sentence (EOS) Detection. Break text into a collection of
> meaningful sentences.
FullStop http://hackage.haskell.org/package/fullstop splits texts into
sentences, using some orthographical conventions (used in English and
hopefully other languages).
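To illustrate what splitting on orthographical conventions can mean, here is a
minimal sketch. It is not FullStop's code or API; the function name
splitSentences and the rule it applies are assumptions made for illustration.
The rule: a sentence ends at '.', '!' or '?' when that character is followed by
whitespace and then an upper-case letter, or by the end of the text.

```haskell
import Data.Char (isSpace, isUpper)

-- Hypothetical helper (not part of FullStop): naive sentence splitter.
splitSentences :: String -> [String]
splitSentences = go [] . dropWhile isSpace
  where
    -- acc holds the current sentence in reverse order.
    go acc [] = [reverse acc | not (null acc)]
    go acc (c:rest)
      | c `elem` ".!?" && boundary rest =
          reverse (c : acc) : go [] (dropWhile isSpace rest)
      | otherwise = go (c : acc) rest

    -- A boundary is end of input, or whitespace followed by a capital
    -- (or by nothing at all).
    boundary [] = True
    boundary (x:xs) = isSpace x && startsUpper (dropWhile isSpace xs)

    startsUpper [] = True
    startsUpper (y:_) = isUpper y
```

For example, splitSentences "It works. Does it? Version 1.2 is out." gives
["It works.","Does it?","Version 1.2 is out."]; the period in "1.2" is not
treated as a boundary because no whitespace follows it. A rule this simple
splits wrongly after abbreviations such as "Dr. Smith", which is the kind of
case a dedicated package has to handle.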
> 2) Part-of-Speech (POS) Tagging. Assign part-of-speech information to each
> token.
Morfette http://hackage.haskell.org/package/morfette is a Haskell
program which does joint POS-tagging and lemmatization for
morphologically rich languages. There are some pre-trained models
available at https://sites.google.com/site/morfetteweb/download/pretrained-morfette-models
> 3) Chunking. Analyze each tagged token within a sentence and assemble
> compound tokens that express logical concepts. Define a custom grammar.
> 4) Extraction. Analyze each chunk and further tag the chunks as named
> entities, such as people, organizations, locations, etc.
Sequor http://hackage.haskell.org/package/sequor is a Haskell program
which can be used for chunking or NE detection. You will need to train
it on a labeled data set.
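Labeled data for chunking and NE detection is commonly encoded with per-token
BIO tags (B-X begins a chunk of type X, I-X continues it, O is outside any
chunk). Whether Sequor expects exactly this encoding is an assumption to check
against its documentation. The sketch below only shows how such per-token
labels map back to chunks; the names Chunk and chunks are made up for this
example.

```haskell
-- (chunk type, tokens in the chunk)
type Chunk = (String, [String])

-- Group (token, BIO label) pairs into chunks.
-- Labels are assumed to look like "B-PER", "I-PER" or "O".
chunks :: [(String, String)] -> [Chunk]
chunks [] = []
chunks ((tok, 'B':'-':ty) : rest) =
  -- Collect the following tokens that continue the same chunk type.
  let (inside, rest') = span (\(_, l) -> l == "I-" ++ ty) rest
  in (ty, tok : map fst inside) : chunks rest'
-- "O" labels and "I-" labels without a preceding "B-" are skipped.
chunks (_ : rest) = chunks rest
```

Applied to [("John","B-PER"),("Smith","I-PER"),("visited","O"),("New","B-LOC"),("York","I-LOC")],
this yields [("PER",["John","Smith"]),("LOC",["New","York"])].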
Morfette and Sequor are tools based on statistical learning, and don't
currently expose a library interface.
If you want a rule-based approach, GF
http://hackage.haskell.org/package/gf might be worth looking into, but
I don't know much about it.
Hope this helps,
--
G.