LCS timings
Ian Lynagh
igloo at earth.li
Sun Jan 18 06:53:47 EST 2009
Hi all,
Over xmas I did a quick Myers implementation, and also wrote some
performance testing stuff to compare:
* myers
* hunt (Hunt-Szymanski, as in the lcs package)
* mc (the same as myers, except it only works out the length of the LCS,
not what's in it. myers doesn't find the actual LCS particularly
efficiently)
* darcs
This work isn't finished, but given I don't expect to work on it again
for a while, and seeing as someone was asking about this, here's the
current status and code.
First, the output:
Running timings
myers hunt mc darcs
1 1 1 39 empty
1 1 1 39 singleton
8 719 9 98 All A
9 35 9 117 Both A1...
6544 26 1562 134 All A and all B
6483 31 1638 2859 A1... and B1...
6859 30 1620 3485 A1... and ...A1
8638 1134 2253 327 A,B.. and A,C..
8530 85 2281 6545 A1,B1.. and A1,C1..
The numbers are times in milliseconds (lowest of 3 runs). There's some
noise in some of these numbers (e.g. in another run hunts peaks were 660
and 784), but they're good enough.
The tests are just patterns of 4000 or 8000 lines, e.g. the most
complicated "A1,B1.. and A1,C1.." test is the LCS of
["A1","B1","A2","B2","A3","B3", ..., "B3999","A4000","B4000"] and
["A1","C1","A2","C2","A3","C3", ..., "C3999","A4000","C4000"]
More tests of interesting cases welcome. One I'd really like is the
real-world case that motivated darcs to switch algorithm - I think it
was a configure script. I had a quick look online, but couldn't find it.
The code is:
* All.hs Defines the tests, and runs them all 3 times each way
* DoTiming.hs Times how long the tests take (clock time). Checks
that they give a right answer (doesn't bother in the
case of darcs, as it doesn't return an LCS directly,
and ni the case of mc it only checks that the length
is right)
* TimeMe.hs Actually does the LCS in the appropriate way
* Myers.hs The myers implementation
* HuntSzymanski.hs The hunt implementation
* MyersCheat.hs The mc implementation
And run with:
$GHC --make -O -Wall -Werror TimeMe
$GHC --make -O -Wall -Werror DoTiming
$GHC --make -O -Wall -Werror All
echo Running timings
./All
Some things to note:
* the darcs test actually runs the darcs program on a repo, so it has
higher startup overhead, as evidenced by the "empty" and "singleton"
tests. Plus it was compiled with a different compiler, different flags
etc.
* I believe darcs is hashing all the lines in each file and, if a hash
appears in one file but not the other, removing the lines with that
hash. This turns all of the tests except for "A1... and ...A1" into
comparing two identical sequences, which is trivial for Myers (which
darcs uses). The large runtime in other cases is presumably the time
taken to hash everything, and to find out which hashes are only in one
file.
* The lines are all very short, so some of the benefits of darcs's
comparing hashes, rather than strings, when lcs'ing may be being
hidden.
TODO:
* Make a HashFirst lcs-transformer, so myers, hunt and mc can be tested
with and without darcs's pre-LCS hashing behaviour.
* Write a decent lcs-finder for the Myers implementation.
* Review/profile the hunt and myers code, to see if anything silly is
happening.
Thanks
Ian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: All.hs
Type: text/x-haskell
Size: 5778 bytes
Desc: not available
Url : http://projects.haskell.org/pipermail/camp/attachments/20090118/77c5b0c5/attachment-0006.hs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DoTiming.hs
Type: text/x-haskell
Size: 2200 bytes
Desc: not available
Url : http://projects.haskell.org/pipermail/camp/attachments/20090118/77c5b0c5/attachment-0007.hs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TimeMe.hs
Type: text/x-haskell
Size: 1436 bytes
Desc: not available
Url : http://projects.haskell.org/pipermail/camp/attachments/20090118/77c5b0c5/attachment-0008.hs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Myers.hs
Type: text/x-haskell
Size: 4170 bytes
Desc: not available
Url : http://projects.haskell.org/pipermail/camp/attachments/20090118/77c5b0c5/attachment-0009.hs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HuntSzymanski.hs
Type: text/x-haskell
Size: 8195 bytes
Desc: not available
Url : http://projects.haskell.org/pipermail/camp/attachments/20090118/77c5b0c5/attachment-0010.hs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MyersCheat.hs
Type: text/x-haskell
Size: 4186 bytes
Desc: not available
Url : http://projects.haskell.org/pipermail/camp/attachments/20090118/77c5b0c5/attachment-0011.hs
More information about the Camp
mailing list