NLP Digest, Vol 12, Issue 5

Dan Maftei ninestraycats at gmail.com
Sun Oct 30 00:04:12 BST 2011


On Tue, Oct 18, 2011 at 12:00 PM, <nlp-request at projects.haskell.org> wrote:

> Today's Topics:
>
>   1. Re: High-precision arithmetic for calculating N-gram
>      probabilities (wren ng thornton)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 17 Oct 2011 15:55:44 -0400
> From: wren ng thornton <wren at freegeek.org>
> Subject: Re: High-precision arithmetic for calculating N-gram
>        probabilities
> To: nlp at projects.haskell.org
>
> On 10/16/11 8:26 AM, Dan Maftei wrote:
> > Further, do we often sum probabilities in NLP?
>
> Depends who "we" are, I suppose. To give an example, consider the
> approach of using HMMs for tagging.
>
> If we're only ever interested in the most likely tag sequence (or its
> probability), then we use the Viterbi algorithm, which works in the
> tropical semiring, i.e., the (max,+) semiring for log-probs, or in its
> exponential image, the (max,*) semiring, for normal probabilities.
>
> However, if we're interested in the probability of all possible tag
> sequences (which is useful for getting the n-best sequences), then we
> use the forward-backward algorithm, which works in the (+,*) semiring
> for normal probabilities and the (logSum,+) semiring for log-probs.
>
> This is pretty common in all domains of statistical NLP: some problems
> use (max,*)/(max,+) whereas other problems use (+,*)/(logSum,+). It just
> depends on whether you want only the best widget or whether you want
> information about all the widgets. So, how often addition is used will
> depend on the specific problems you choose to tackle and your approach
> to tackling them.
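>
> As a toy rendering of the difference, for path scores kept as
> log-probs (a sketch; logSum is the function discussed below):
>
>     -- Viterbi: combine competing path scores with max,
>     -- keeping only the best one.
>     viterbiCombine :: [Double] -> Double
>     viterbiCombine = foldl1 max
>
>     -- Forward: combine competing path scores with logSum,
>     -- keeping the total probability mass.
>     forwardCombine :: [Double] -> Double
>     forwardCombine = foldl1 logSum
>       where logSum a b = a + log (1 + exp (b - a))  -- naive version
>
> In both cases, extending a path by a transition is ordinary (+) on the
> log-probs; only the way competing paths are combined differs.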
>
> > 2 I'm unclear what log (_ + 1) and (exp _) - 1 refer to,
>
> FTFY.
>
> I mean the functions (\x -> log (1+x)) and (\x -> exp x - 1). They're
> inaccurate for very small x because of details about how Float/Double
> are represented. The details aren't too important, but just think about
> what happens when we add a really small number to 1. There are many
> values where 1+x == 1 because we can't represent 1+x with the available
> precision and 1 is the closest thing we can represent.
>
> Suffice it to say that (\x -> log (1+x)) loses everything below about
> x=2**-53 for Double (the 1+x rounds to 1, so the result collapses to
> 0), whereas a denotationally equivalent function (\x -> log1p x) stays
> accurate down to about x=2**-1074. The same goes for the naive
> (\x -> exp x - 1) vs a denotationally equivalent implementation
> (\x -> expm1 x).
>
> N.B., (log1p . expm1) == id, and (expm1 . log1p) == id
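>
> A quick numeric check (a sketch; modern GHC exports log1p and expm1
> from GHC.Float, and older setups can bind them from libm via the FFI):
>
>     import GHC.Float (expm1, log1p)
>
>     x :: Double
>     x = 2 ** (-60)           -- well below the 2**-53 granularity near 1
>
>     main :: IO ()
>     main = do
>       print (log (1 + x))    -- 0.0: the 1+x already rounded to 1
>       print (log1p x)        -- 8.673617379884035e-19: accurate
>       print (exp x - 1)      -- 0.0, for the same reason
>       print (expm1 x)        -- 8.673617379884035e-19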
>

Perfect. That makes sense, and I tested it in GHCi too.

But why is adding 1 to a small number and then taking the log so
common? When we move into log-space, we take logs of small numbers and
then add or multiply them. Where does (\x -> log (1+x)) come into play?


>
> > Same goes for what a and b represent in logSum (== a + log(1 + exp(b-a))).
>
> logSum :: Double -> Double -> Double
> logSum a b = a + log (1 + exp (b-a))
>
> That's the basic idea, anyway. As I said, we actually want to check
> whether a or b is bigger; we'll also want to make sure it does the
> right thing when a or b is infinite; blah blah blah.
>

Well, I see why log1p is useful when both a and b are extremely small.
Wouldn't we do logSum a b = a + log1p (exp (b-a))?
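
Something like this sketch, perhaps, folding in the max and infinity
checks wren mentioned (log1p assumed available, e.g. from GHC.Float in
modern GHC):

    import GHC.Float (log1p)

    logSum :: Double -> Double -> Double
    logSum a b
      | isInfinite hi = hi    -- both log 0, or an infinite input
      | otherwise     = hi + log1p (exp (lo - hi))
      where
        hi = max a b
        lo = min a b          -- if lo is log 0, exp gives 0 and we get hi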


>
> > If I want to multiply probabilities, I use exp $ foldl' (+) 0
> > (map log probs), which is equivalent to foldl' (*) 1 probs. Is there
> > something inaccurate or incorrect about that?
>
> Well, when you do the exp at the end you'll underflow just the same as
> if you did (foldl' (*) 1 probs). In general, once you put things into
> the log-domain you never want to take them back out, because you can
> lose information (underflow) when taking them out.
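>
> For instance (a sketch): a few hundred modest probabilities multiply
> down to something far below Double's smallest positive value, so the
> log-prob is perfectly representable but exp'ing it collapses to zero:
>
>     lp :: Double
>     lp = sum (replicate 400 (log 0.1))   -- about -921, representable
>
>     p :: Double
>     p = exp lp   -- 0.0: the true value, 1e-400, underflows Double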
>
> But the real question is: how do I translate (foldl' (+) 0 probs) into
> the log domain? Getting (log 0) isn't too hard, though there are
> portability issues to beware of, since some Haskell implementations
> will throw an error rather than give you negative infinity back. We
> also have to figure out a function (log (+)), i.e., what normal-domain
> addition becomes when we pass it through the logarithmic homomorphism.
> The answer is logSum.
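>
> Concretely, something like this sketch (with a guarded logSum as
> above; note the unit of the fold is log 0, i.e. negative infinity,
> not 0):
>
>     import Data.List (foldl')
>
>     sumLogProbs :: [Double] -> Double
>     sumLogProbs = foldl' logSum (-1/0)   -- -1/0 is -Infinity = log 0
>
> so that, e.g., sumLogProbs (map log [0.5, 0.5]) comes out as
> 0.0 = log 1 (up to rounding).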
>

What I understand is that (foldl' logSum 0 probs) is a way to add
probabilities in the log domain. Further, on normalized probabilities, it
will result in 1 no matter what order the Doubles are summed.

But I just tested this:

    ghci> foldl' logSum 0 [0.5, 0.5]
    1.45....
    ghci> foldl' logSum 0 [0.1, 0.9]
    1.51....




>
> More generally, the logarithmic and exponential homomorphisms preserve
> the ordering structure, the monoid structures, and the semiring
> structures (and probably other structures too); all according to the
> following definitions:
>
>     -- On types:
>     log NonnegativeReal = Real
>     log ZeroToOneReal = NonpositiveReal
>     -- On values:
>     log 0 = negativeInfinity
>     log 1 = 0
>     log positiveInfinity = positiveInfinity
>     -- On functions:
>     log (*) = (+)
>     log (+) = logSum
>     log max = max
>     log min = min
>
>     -- On types:
>     exp Real = NonnegativeReal
>     exp NonpositiveReal = ZeroToOneReal
>     -- On values:
>     exp negativeInfinity = 0
>     exp 0 = 1
>     exp positiveInfinity = positiveInfinity
>     -- On functions:
>     exp (+) = (*)
>     exp logSum = (+)
>     exp max = max
>     exp min = min
>
> We can extend this homomorphism to types other than the reals. We just
> need the type to support multiple monoids or semirings in a way that
> matches the above. E.g., any type with a semiring will have
> logarithmic/exponential monoid-homomorphisms between the additive
> monoid and the multiplicative monoid of that semiring.
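>
> For instance, the multiplicative half can be rendered in Haskell along
> these lines (a sketch, using today's Semigroup/Monoid classes; the
> newtype name is made up):
>
>     newtype LogProb = LogProb Double deriving Show
>
>     instance Semigroup LogProb where
>       LogProb x <> LogProb y = LogProb (x + y)  -- log (p*q) = log p + log q
>
>     instance Monoid LogProb where
>       mempty = LogProb 0                        -- log 1 = 0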
>
> --
> Live well,
> ~wren
>