High-precision arithmetic for calculating N-gram probabilities

wren ng thornton wren at freegeek.org
Mon Oct 17 20:55:44 BST 2011


On 10/16/11 8:26 AM, Dan Maftei wrote:
> Further, do we often sum probabilities in NLP?

Depends on who "we" are, I suppose. To give an example, consider the 
approach of using HMMs for tagging.

If we're only ever interested in the most likely tag sequence (or the 
probability thereof) then we'll use the Viterbi algorithm, which uses 
the tropical semiring ---i.e., the (max,+) semiring for log-probs; or 
the exponential of the tropical semiring, (max,*), for normal 
probabilities.

However, if we're interested in the total probability of all possible 
tag sequences (which is also useful for getting the n-best sequences) 
then we'll use the forward--backward algorithm, which uses the (+,*) 
semiring for normal probabilities and (logSum,+) for log-probs.

This is pretty common in all domains of statistical NLP: some problems 
use (max,*)/(max,+) whereas other problems use (+,*)/(logSum,+). It just 
depends on whether you want only the best widget or whether you want 
information about all the widgets. So, how often addition is used will 
depend on the specific problems you choose to tackle and your approach 
to tackling them.
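
To make the semiring swap concrete, here is a minimal sketch of my own 
(viterbiSR, forwardSR, and total are hypothetical names; the real 
Viterbi and forward algorithms use distributivity to avoid enumerating 
whole sequences, so this only shows how the operations change):

     import Data.List (foldl')

     -- A semiring packaged as its two operations and their units.
     data Semiring a = Semiring
         { plus  :: a -> a -> a, zero :: a
         , times :: a -> a -> a, one  :: a }

     -- (max,*) for normal probabilities: score of the best sequence.
     viterbiSR :: Semiring Double
     viterbiSR = Semiring max 0 (*) 1

     -- (+,*) for normal probabilities: total probability of all sequences.
     forwardSR :: Semiring Double
     forwardSR = Semiring (+) 0 (*) 1

     -- Semiring-sum over the semiring-products of the candidate sequences.
     total :: Semiring a -> [[a]] -> a
     total sr = foldl' (plus sr) (zero sr) . map (foldl' (times sr) (one sr))

For log-probs the same code works with (max, -1/0, (+), 0) for the 
tropical semiring and (logSum, -1/0, (+), 0) for the forward case, 
using the logSum discussed below.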

> 2 I'm unclear what log (_ + 1) and (exp _) - 1 refer to,

FTFY.

I mean the functions (\x -> log (1+x)) and (\x -> exp x - 1). They're 
inaccurate for very small x because of details about how Float/Double 
are represented. The details aren't too important, but just think about 
what happens when we add a really small number to 1. There are many 
values where 1+x == 1 because we can't represent 1+x with the available 
precision and 1 is the closest thing we can represent.

Suffice it to say that (\x -> log (1+x)) bottoms out around x=2**-53 
for Double ---once x is that small, 1+x rounds to exactly 1 and the 
result is 0--- whereas we could implement a denotationally equivalent 
function (\x -> log1p x) which stays accurate all the way down to 
around x=2**-1074, the smallest positive Double. The same idea goes for 
the naive (\x -> exp x - 1) vs a denotationally equivalent 
implementation (\x -> expm1 x).

N.B., (log1p . expm1) == id, and (expm1 . log1p) == id
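
A quick way to see the difference (assuming a reasonably recent base, 
roughly 4.9 or later, where Numeric exports log1p and expm1; at the 
time of writing you'd have had to bind C's log1p/expm1 via the FFI or 
roll your own):

     import Numeric (expm1, log1p)

     main :: IO ()
     main = do
         let x = 2 ** (-60) :: Double
         print (log (1 + x))  -- 0.0, since 1+x already rounded to 1
         print (log1p x)      -- 8.673617379884035e-19, full precision
         print (exp x - 1)    -- 0.0, since exp x already rounded to 1
         print (expm1 x)      -- 8.673617379884035e-19, full precision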

> Same goes for what a and b represent in logSum (== a + log(1+ exp(b-a)).

logSum :: Double -> Double -> Double
logSum a b = a + log (1 + exp (b-a))

That's the basic idea, anyway. As I said, we actually want to check 
whether a or b is bigger and subtract the larger one, so that the 
argument to exp is nonpositive and can never overflow; we also want to 
make sure it does the right thing when a or b is infinite; etc.
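
Spelled out, a guarded version might look like this (a sketch, using 
log1p as above to keep precision when exp (b-a) is tiny, and still 
punting on the doubly-infinite corner cases):

     import Numeric (log1p)

     logSum :: Double -> Double -> Double
     logSum a b
         | isInfinite a && a < 0 = b  -- log 0 is the unit: skip it
         | isInfinite b && b < 0 = a
         | a >= b    = a + log1p (exp (b - a))  -- argument <= 0, no overflow
         | otherwise = b + log1p (exp (a - b))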

> If I want to multiply
> probabilities, I use exp $ foldl' (+) 0 (map log probs), which is equivalent
> to foldl' (*) 1 probs. Is there something inaccurate or incorrect about
> that?

Well, when you do the exp at the end you'll underflow just the same as 
if you did (foldl' (*) 1 probs). In general, once you put things into 
the log-domain you never want to take them back out, because you can 
lose information (underflow) when taking them out.
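
A quick illustration with made-up numbers: a thousand factors of 0.01 
multiply to 10^-2000, far below the smallest positive Double (about 
5e-324), so both the direct product and the final exp collapse to zero, 
while the log-domain sum is perfectly ordinary:

     import Data.List (foldl')

     probs :: [Double]
     probs = replicate 1000 0.01

     main :: IO ()
     main = do
         print (foldl' (*) 1 probs)                  -- 0.0, underflow
         print (exp (foldl' (+) 0 (map log probs)))  -- 0.0, same underflow
         print (foldl' (+) 0 (map log probs))        -- roughly -4605.17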

But the real question is: how do I translate (foldl' (+) 0 probs) into 
the log domain? Getting (log 0) isn't too hard, though there are 
portability issues to beware of, since some Haskell implementations 
will throw an error rather than give back negative infinity. We also 
have to figure out the function (log (+))--- i.e., what normal-domain 
addition becomes when we pass it through the logarithmic homomorphism. 
The answer is logSum.
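
Putting that together (a sketch; sumLogProbs is my name for it, logSum 
is the guarded version above, and the unit is written as the literal 
-1/0 rather than log 0 to sidestep the portability issue):

     import Data.List (foldl')

     -- (foldl' (+) 0 probs), pushed through the logarithm: the unit 0
     -- becomes log 0 = -Infinity, and (+) becomes logSum.
     sumLogProbs :: [Double] -> Double
     sumLogProbs = foldl' logSum (-1/0)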

More generally, the logarithmic and exponential homomorphisms preserve 
the ordering structure, the monoid structures, and the semiring 
structures (and probably other structures too); all according to the 
following definitions:

     -- On types:
     log NonnegativeReal = Real
     log ZeroToOneReal = NonpositiveReal
     -- On values:
     log 0 = negativeInfinity
     log 1 = 0
     log positiveInfinity = positiveInfinity
     -- On functions:
     log (*) = (+)
     log (+) = logSum
     log max = max
     log min = min

     -- On types:
     exp Real = NonnegativeReal
     exp NonpositiveReal = ZeroToOneReal
     -- On values:
     exp negativeInfinity = 0
     exp 0 = 1
     exp positiveInfinity = positiveInfinity
     -- On functions:
     exp (+) = (*)
     exp logSum = (+)
     exp max = max
     exp min = min

We can extend this homomorphism to types other than the reals. We just 
need the type to support multiple monoids or semirings in a way that 
matches the above. E.g., any type with a semiring will have 
logarithmic/exponential monoid-homomorphisms between the additive 
monoid and the multiplicative monoid of that semiring.
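
For the reals, the monoid half of that can be written directly against 
Data.Monoid (a sketch; expHom and logHom are hypothetical names):

     import Data.Monoid (Product (..), Sum (..))

     -- exp (x + y) == exp x * exp y and exp 0 == 1, so expHom takes the
     -- additive monoid to the multiplicative one homomorphically.
     expHom :: Sum Double -> Product Double
     expHom (Sum x) = Product (exp x)

     -- Dually, log (x * y) == log x + log y and log 1 == 0.
     logHom :: Product Double -> Sum Double
     logHom (Product x) = Sum (log x)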

-- 
Live well,
~wren


