NLP: the missing framework

Daniël de Kok me at danieldk.eu
Sat Jan 19 09:57:22 GMT 2013


On Jan 19, 2013, at 9:39 AM, Eric Kow <eric.kow at gmail.com> wrote:
> Just thought you might be interested in Edward Yang's call to arms if you haven't seen it already:
> 
> http://blog.ezyang.com/2013/01/nlp-the-missing-framework/
> 
> How can we push things a little bit more in the right direction in the Haskell NLP world? What is the right direction?

Summarized, I think there are currently two major problems:

* Interoperability between existing components, ranging from different expectations about tokenization to different syntactic annotation schemes.

* High-quality, annotated data is often not available under a permissive license.
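To make the first point concrete, here is a minimal sketch (not from the original message, and not using any particular NLP library) of how two common tokenization conventions disagree on the same sentence. A component trained on one convention will mis-handle input produced by the other:

```python
import re

def whitespace_tokenize(text):
    # Split on whitespace only: clitics and punctuation stay
    # attached to the adjacent word.
    return text.split()

def punct_tokenize(text):
    # Split punctuation off as separate tokens (a rough,
    # Penn-Treebank-like convention).
    return re.findall(r"\w+|[^\w\s]", text)

sentence = "Mr. Smith doesn't agree."
print(whitespace_tokenize(sentence))
# ['Mr.', 'Smith', "doesn't", 'agree.']
print(punct_tokenize(sentence))
# ['Mr', '.', 'Smith', 'doesn', "'", 't', 'agree', '.']
```

Neither output is wrong in itself; the problem is that a tagger expecting one segmentation cannot consume the other without a conversion step.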

Some projects, such as OpenNLP and NLTK, aim to provide exactly what the blog post asks for: ready-to-use, pre-trained NLP components. However, the blog post doesn't really lay out why these frameworks are not acceptable.

The question is whether the world is better off with yet another unfinished framework, this time written in Haskell, rather than with efforts to improve existing frameworks. If somebody started from scratch, I think I would be more interested in a set of interoperable C libraries, since they could be integrated fairly easily into any language.

Another approach would be to combine different components by adopting Apache UIMA everywhere. However, I doubt that such pipelines would be easy to install or maintain.

-- Daniël

