ANNOUNCE: brillig 0.3 - not quite the Brill tagger

Daniël de Kok me at danieldk.eu
Sat Sep 3 10:54:26 BST 2011


Hi Eric,

On Sep 3, 2011, at 10:32 AM, Eric Y. Kow wrote:
> This is largely a seed-planting exercise, naive implementations of
> simple algorithms so that we have something rather than nothing
> (see also fullstop, a sentence segmenter).  Is there a Haskell NLTK?
> No, but…

Thank you!  It's good to see some NLP infrastructure is being written. From a quick glance through the code, I wonder if transformation rules can be made more expressive easily. E.g. it would be nice if a rule has a list of atomic conditions that are checked. Something along these lines (String and less type classes used for brevity):

---
{-# LANGUAGE ExistentialQuantification #-}

import Data.List.Zipper (Zipper(..))

type Tag = String
type Word = String

data Context =
  Context {
    ctxWords :: Zipper Word,
    ctxTags  :: Zipper Tag
  }

class Condition c where
  conditionApplies :: c -> Context -> Bool

data CondBox = forall c . (Condition c, Show c) => MkCond c

instance Show CondBox where
  show (MkCond c) = show c

data Transform =
  Transform {
    conditions :: [CondBox],
    replace    :: Replacement
  } deriving Show

data Replacement =
  Replacement {
    from :: Tag,
    to   :: Tag
  } deriving (Show)
---

Now conditions can be defined briefly:

---
-- The Nth word to the left
data PrevWordCondition =
  PrevWordCondition Word Int
  deriving Show

instance Condition PrevWordCondition where
  conditionApplies (PrevWordCondition word n) (Context (Zip left _) _) =
    (left !! n) == word 

-- The Nth tag to the left
data PrevTagCondition =
  PrevTagCondition Tag Int
  deriving Show

instance Condition PrevTagCondition where
  conditionApplies (PrevTagCondition tag n) (Context _ (Zip left _)) =
    (left !! n) == tag
---

One sample rule:

---
rule1 :: Transform
rule1 =
  Transform
    [ MkCond $ PrevWordCondition "very" 1,
      MkCond $ PrevTagCondition "ADJ" 1]
    (Replacement "N" "ADJ")
---

The obvious next step would be to make generators for conditions, so that you can make rule templates.

Take care,
Daniël


More information about the NLP mailing list