Ann: Chatter-0.5.0.0; IE patterns, chunking, POS tagging.

Rogan Creswick creswick at gmail.com
Tue Oct 21 17:48:34 BST 2014


I'm proud to announce Chatter-0.5.0.0! It's been a while since I actually
made a release announcement, so quite a bit has changed.

Hackage: http://hackage.haskell.org/package/chatter/
Github: https://github.com/creswick/chatter

Chatter is licensed under the BSD3 license and is written in pure Haskell.

Some new things of particular note:

** Phrase chunking

> import NLP.POS
> import NLP.Chunk
> tgr <- defaultTagger
> chk <- defaultChunker
> chunkStr tgr chk "The dog jumped over the cat"
"[NP The/DT dog/NN] [VP jumped/VBD] [NP over/IN the/DT cat/NN]"

** IE Patterns with Parsec

You can write Information Extraction patterns with Parsec that reason about
Chunks, POS tags, or literal token text. For examples, see:
http://hackage.haskell.org/package/chatter/docs/NLP-Extraction-Examples-ParsecExamples.html
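
To give a rough idea of what a Parsec pattern over tagged tokens can look
like, here is a minimal sketch that uses only the plain parsec package. It
does not use chatter's own combinators: the names Tagged, Extractor,
satisfyTok, posTag, nounPhrase and firstNounPhrase are all made up for this
illustration, and tokens are assumed to be simple (text, tag) pairs.

```haskell
import Text.Parsec
import Text.Parsec.Pos (incSourceColumn)

-- Hypothetical token representation: (token text, POS tag).
type Tagged = (String, String)

-- A parser whose input stream is a list of tagged tokens.
type Extractor = Parsec [Tagged] ()

-- | Accept one token that satisfies the predicate.
satisfyTok :: (Tagged -> Bool) -> Extractor Tagged
satisfyTok p = tokenPrim show nextPos test
  where
    -- Count each token as one "column" for error reporting.
    nextPos pos _ _ = incSourceColumn pos 1
    test t = if p t then Just t else Nothing

-- | Accept one token carrying exactly the given POS tag.
posTag :: String -> Extractor Tagged
posTag tag = satisfyTok ((== tag) . snd)

-- | Optional determiner, any number of adjectives, one or more nouns.
-- Returns the adjective and noun text, dropping the determiner.
nounPhrase :: Extractor [String]
nounPhrase = do
  _     <- optionMaybe (posTag "DT")
  adjs  <- many (posTag "JJ")
  nouns <- many1 (posTag "NN")
  return (map fst (adjs ++ nouns))

-- | Skip tokens until the pattern matches somewhere in the sentence.
firstNounPhrase :: [Tagged] -> Either ParseError [String]
firstNounPhrase = parse go "tokens"
  where
    go = try nounPhrase <|> (satisfyTok (const True) *> go)
```

For example, applying firstNounPhrase to
[("jumped","VBD"),("the","DT"),("lazy","JJ"),("dog","NN")] skips the verb and
extracts the words "lazy" and "dog".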

** Domain-specific terminology in Taggers

The POS taggers can be primed with "protected terms": if you have a lexicon
of domain-specific terminology, you can load it into a LiteralTagger that
backs off to another tagger, e.g. one based on Averaged Perceptrons and
trained on a corpus (chatter ships with two of these).
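
The backoff idea can be sketched in a few lines of plain Haskell. This is
not chatter's implementation or API: Tagger, literalTagger, suffixTagger,
withBackoff and tagTokens are hypothetical names, the lexicon only handles
single-token terms, and a crude suffix heuristic stands in for a trained
model such as an averaged perceptron.

```haskell
import Control.Applicative ((<|>))
import Data.Char (toLower)
import Data.List (isSuffixOf)
import Data.Maybe (fromMaybe)

type Token = String
type Tag   = String

-- | A tagger either assigns a tag to a token or declines (Nothing).
newtype Tagger = Tagger { runTagger :: Token -> Maybe Tag }

-- | Tag only the tokens found in a lexicon of protected terms
-- (matched case-insensitively); decline on everything else.
literalTagger :: [(Token, Tag)] -> Tagger
literalTagger lexicon = Tagger (\tok -> lookup (map toLower tok) lowered)
  where
    lowered = [ (map toLower term, tag) | (term, tag) <- lexicon ]

-- | Stand-in for a trained statistical tagger: guesses from suffixes.
suffixTagger :: Tagger
suffixTagger = Tagger guess
  where
    guess tok
      | "ed" `isSuffixOf` tok = Just "VBD"
      | "s"  `isSuffixOf` tok = Just "NNS"
      | otherwise             = Just "NN"

-- | Ask the primary tagger first; use the backoff only if it declines.
withBackoff :: Tagger -> Tagger -> Tagger
withBackoff primary backoff =
  Tagger (\tok -> runTagger primary tok <|> runTagger backoff tok)

-- | Tag every token, marking tokens that no tagger handled as "UNK".
tagTokens :: Tagger -> [Token] -> [(Token, Tag)]
tagTokens tgr = map (\tok -> (tok, fromMaybe "UNK" (runTagger tgr tok)))
```

With a lexicon of [("Haskell", "NNP")], the term "Haskell" keeps its
protected tag, while every other token falls through to the backoff tagger.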

** Rich tag/chunk types

The types of the structured results from POS taggers and Chunkers have been
completely redesigned to form more of a structured tree.  This was necessary
to support IE patterns in a halfway reasonable way, and it also allowed a
set of typeclasses to be designed for Tags and Chunks that abstract away
from the specific tagset and chunk set used in a given system (e.g. Brown
vs. Penn Treebank tagsets).
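
As a rough illustration of how a typeclass can hide the tagset from
downstream code, consider the sketch below. The class PosTag, its methods,
and the PennTag and BrownTag types are hypothetical and cover only a
handful of tags; chatter's actual classes and types may look quite
different.

```haskell
-- | Operations that downstream code needs from any POS tagset.
class PosTag t where
  tagName :: t -> String  -- the tag as written in a corpus
  isNoun  :: t -> Bool

-- | A tiny subset of the Penn Treebank tagset.
data PennTag = PennDT | PennNN | PennNNS | PennVBD
  deriving (Eq, Show)

-- | A tiny subset of the Brown tagset (articles are AT, not DT).
data BrownTag = BrownAT | BrownNN | BrownNNS | BrownVBD
  deriving (Eq, Show)

instance PosTag PennTag where
  tagName PennDT  = "DT"
  tagName PennNN  = "NN"
  tagName PennNNS = "NNS"
  tagName PennVBD = "VBD"
  isNoun t = t `elem` [PennNN, PennNNS]

instance PosTag BrownTag where
  tagName BrownAT  = "AT"
  tagName BrownNN  = "NN"
  tagName BrownNNS = "NNS"
  tagName BrownVBD = "VBD"
  isNoun t = t `elem` [BrownNN, BrownNNS]

-- | Works for any tagset: collect the text of all noun tokens.
nouns :: PosTag t => [(String, t)] -> [String]
nouns tagged = [ word | (word, tag) <- tagged, isNoun tag ]

-- | Works for any tagset: render tokens in word/TAG form.
render :: PosTag t => [(String, t)] -> String
render tagged = unwords [ word ++ "/" ++ tagName tag | (word, tag) <- tagged ]
```

Functions such as nouns and render are written once against the class, so
the same extraction code can run over Penn-tagged or Brown-tagged input.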