Text.Regex.Lazy
Version 0.70 (2006-08-10)
By Chris Kuklewicz (TextRegexLazy (at) personal (dot) mightyreason (dot) com)
Changes from 0.66 to 0.70
- regex-tre added for libtre backend (Text.Regex.TRE), see http://laurikari.net/tre/
- regex-devel added for tests and benchmarks
- Text.Regex.*.Wrap APIs improved: the exported wrap* functions
never call fail or error under normal circumstances, and use Either
types to report errors. Allocation failures are reported with fail.
- Text.Regex.*.(ByteString|String) all should export
compile/execute/regexec functions which report errors using Either.
Changes from 0.55 to 0.66
- I broke this into many packages: regex-base for the interface; regex-pcre, regex-posix, regex-parsec, and regex-dfa for the four backends; and regex-compat to replace Text.Regex(.New)
- The top level Makefile now can drive setup and installation of all the packages at once.
Changes from 0.44 to 0.55
- JRegex has been assimilated: PCRE and PosixRE are here.
The JRegex-style API rocks, see below and Context.hs and Example.hs
- Haddock seems to run via ./setup haddock, but the documentation is very thin
- ./setup test runs TestTestRegexLazy binary if uncommented in cabal file
- default is now to compile with -Wall -Werror -O2
- You may need to set the cabal file's "Extra-Lib-Dirs" to point to pcre.
- You may or may not need a "-lpcre" option to ghc when building
projects that depend on Text.Regex.Lazy now.
Changes from 0.33 to 0.44
- Cabal
- Compile with -Wall -Werror
- Change DFAEngineFPS from Data.FastPackedString to Data.ByteString
See the LICENSE file for details on copyright. See README for building instructions.
The new API is very close to JRegex and supports 5 backends:
- Posix, the standard c regex library
- PCRE, the Perl Compatible Regular Expressions c library
- TRE, the libtre c library (added in 0.70, see http://laurikari.net/tre/)
- Full, the lazy Parsec based library (see old api below)
- DFA, the fast lazy matching library (see old api below)
For all backends, two types can be used both as the source of a
regular expression and as the text to match it against: String and
ByteString. The ByteString library will be in the next GHC and can be
obtained from http://www.cse.unsw.edu.au/~dons/fps.html.
For simplest use of the new API: import Text.Regex.Lazy and one of
import Text.Regex.PCRE((=~),(=~~))
import Text.Regex.Parsec((=~),(=~~))
import Text.Regex.DFA((=~),(=~~))
import Text.Regex.PosixRE((=~),(=~~))
import Text.Regex.TRE((=~),(=~~))
The result types you can demand of (=~) and (=~~) are all
instances defined in Text.Regex.Impl.Context, and they are used
in Example.hs as well.
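As a rough illustration of what "demanding" different things of (=~)
looks like, here is a minimal sketch. It assumes the module names of
the separately released regex-posix package (Text.Regex.Posix); with
this version, substitute one of the imports listed above. The function
names are made up for the example, and only a handful of the available
result types are shown.

```haskell
import Text.Regex.Posix ((=~), (=~~))

-- The same pattern, with the result type choosing what comes back.

-- Bool: did the pattern match anywhere?
hasDigits :: String -> Bool
hasDigits s = s =~ "[0-9]+"

-- String: the text of the first match, or "" if there is none.
firstDigits :: String -> String
firstDigits s = s =~ "[0-9]+"

-- Triple: text before the first match, the match itself, text after.
splitAtDigits :: String -> (String, String, String)
splitAtDigits s = s =~ "[0-9]+"

-- Int: the number of non-overlapping matches.
countDigitRuns :: String -> Int
countDigitRuns s = s =~ "[0-9]+"

-- (=~~) reports a failed match through the monad, here Maybe.
maybeDigits :: String -> Maybe String
maybeDigits s = s =~~ "[0-9]+"
```

Loaded into GHCi, splitAtDigits "abc 123 def 45" should give
("abc ","123"," def 45") and maybeDigits "none" should give Nothing.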
You can redefine (=~) and (=~~) to use different options by using makeRegexOpts:
(=~) :: (RegexMaker Regex CompOption ExecOption source,RegexContext Regex source1 target) => source1 -> source -> target
(=~) x r = let q :: Regex
               q = makeRegexOpts (some compoption) (some execoption) r
           in match q x
(=~~) :: (RegexMaker Regex CompOption ExecOption source,RegexContext Regex source1 target,Monad m) => source1 -> source -> m target
(=~~) x r = let q :: Regex
                q = makeRegexOpts (some compoption) (some execoption) r
            in matchM q x
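For a concrete instance of the template above, here is a sketch of a
case-insensitive (=~) for the Posix backend. It assumes the module and
option names of the separately released regex-base and regex-posix
packages (Text.Regex.Base, Text.Regex.Posix, compExtended,
compIgnoreCase, execBlank); other backends have their own option
values. The helper shouts is only there to exercise the operator.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Bits ((.|.))
import Text.Regex.Base (RegexMaker (makeRegexOpts), RegexContext (match))
import Text.Regex.Posix
  (Regex, CompOption, ExecOption, compExtended, compIgnoreCase, execBlank)

-- Case-insensitive variant of (=~); the options are fixed in one place.
(=~) :: (RegexMaker Regex CompOption ExecOption source,
         RegexContext Regex source1 target)
     => source1 -> source -> target
(=~) x r =
  let q :: Regex
      q = makeRegexOpts (compExtended .|. compIgnoreCase) execBlank r
  in match q x

-- Illustrative use: matches "hello" in any letter case.
shouts :: String -> Bool
shouts s = s =~ "hello"
```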
There is a medium level API with functions compile/execute/regexec in
all the Text.Regex.*.(String|ByteString) modules. These allow for
errors to be reported as Either types when compiling or running.
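A minimal sketch of the medium level API, again assuming the module
layout of the separately released regex-posix package
(Text.Regex.Posix.String), where compile and regexec run in IO and
return Either with an error code and message on the Left. The wrapper
firstMatch is a made-up name for this example.

```haskell
import Text.Regex.Posix.String (compile, regexec, compExtended, execBlank)

-- Compile a pattern and run it once, folding both error paths into a
-- single Either String. On success the inner Maybe holds
-- (before, matched, after, subgroups) for the first match.
firstMatch :: String
           -> String
           -> IO (Either String (Maybe (String, String, String, [String])))
firstMatch pat input = do
  compiled <- compile compExtended execBlank pat
  case compiled of
    Left (_, msg) -> return (Left ("compile failed: " ++ msg))
    Right regex -> do
      result <- regexec regex input
      return (either (\(_, msg) -> Left ("match failed: " ++ msg)) Right result)
```

With this, a malformed pattern shows up as a Left value rather than an
exception, and "no match" stays distinct as Right Nothing.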
The low level APIs are in the Text.Regex.*.Wrap modules. For the
c-library backends these expose most of the c-api in wrap* functions
that make the types more Haskell-like: CString and CStringLen and
newtypes to specify compile and execute options. The actual foreign
calls, i.e. the raw c api, are not exported.
Also, Text.Regex.PCRE.Wrap will let you query whether it was compiled
with UTF8 support: configUTF8 :: Bool. But I do not provide a way
to marshal to or from UTF8. (If you have a UTF8 ByteString then you
would probably be able to make it work, assuming the indices PCRE uses
are in bytes; otherwise look at the wrap* functions, which are a thin
layer over the pcreapi.)
The old Text.Regex API can be replaced. If you need to be drop-in
compatible with Text.Regex then you can
import Text.Regex.New and report any infidelities as bugs.
Some advantages of Text.Regex.Parsec over Text.Regex:
- It does not marshal to and from c-code arrays, so it is much
faster on large input strings.
- It consumes the input String in a mostly lazy manner.
This makes streaming from input to output possible.
- It performs sanity checks so that subRegex
and splitRegex don't loop or go crazy if the pattern
matches an empty string -- it will just return the input.
- If the String regex does not parse then you get a nicer error
message.
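The empty-match sanity check can be sketched as follows. This is not
the library's splitRegex; it is a stand-in using only base, where a
hypothetical Matcher function plays the role of the compiled regex and
returns (before, matched, after) for the first match.

```haskell
-- Stand-in for a compiled regex: first match as (before, matched, after).
type Matcher = String -> Maybe (String, String, String)

-- Split the input at every match. If the matcher ever reports an empty
-- match, stop and return the remaining input as-is instead of looping.
splitWith :: Matcher -> String -> [String]
splitWith m = go
  where
    go s = case m s of
      Nothing                 -> [s]
      Just (_, "", _)         -> [s]          -- empty match: bail out
      Just (before, _, after) -> before : go after

-- Example matcher: behaves like the pattern ",".
commaMatcher :: Matcher
commaMatcher s = case break (== ',') s of
  (before, ',' : after) -> Just (before, ",", after)
  _                     -> Nothing

-- Example matcher: always matches the empty string at the start.
emptyMatcher :: Matcher
emptyMatcher s = Just ("", "", s)
```

Here splitWith commaMatcher "a,b,c" gives ["a","b","c"], while
splitWith emptyMatcher "abc" gives ["abc"], i.e. just the input.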
Internally it uses Parsec to turn the string regex into
a Pattern data type, simplify the Pattern, then
transform the Pattern into a Parsec parser that
accepts matching strings and stores the sub-strings of parenthesized
groups.
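The shape of that pipeline can be sketched in miniature. Everything
below is hypothetical: this Pattern type is far smaller than the real
one, the parsing stage is left out, and the matcher is a plain
list-of-successes function over base rather than a Parsec parser, so
that the example stays self-contained.

```haskell
-- A toy pattern type (not the library's Pattern).
data Pattern
  = PEmpty                 -- matches the empty string
  | PChar Char             -- a literal character
  | PAny                   -- '.'
  | PSeq Pattern Pattern   -- concatenation
  | PAlt Pattern Pattern   -- alternation
  | PStar Pattern          -- zero or more repetitions
  | PGroup Pattern         -- parenthesized group, substring is recorded
  deriving (Eq, Show)

-- Simplify: drop empty pieces, merge equal alternatives, flatten stars.
simplify :: Pattern -> Pattern
simplify (PSeq a b) = case (simplify a, simplify b) of
  (PEmpty, b') -> b'
  (a', PEmpty) -> a'
  (a', b')     -> PSeq a' b'
simplify (PAlt a b) =
  let a' = simplify a
      b' = simplify b
  in if a' == b' then a' else PAlt a' b'
simplify (PStar p) = case simplify p of
  PEmpty   -> PEmpty
  PStar p' -> PStar p'
  p'       -> PStar p'
simplify (PGroup p) = PGroup (simplify p)
simplify p = p

-- Turn a Pattern into a matcher. Each result is
-- (consumed text, recorded groups, remaining input), produced lazily.
-- Groups under a star record every iteration, unlike a real regex.
run :: Pattern -> String -> [(String, [String], String)]
run PEmpty s = [("", [], s)]
run (PChar c) (x : xs) | c == x = [([x], [], xs)]
run (PChar _) _ = []
run PAny (x : xs) = [([x], [], xs)]
run PAny [] = []
run (PSeq a b) s =
  [ (m1 ++ m2, g1 ++ g2, r2) | (m1, g1, r1) <- run a s, (m2, g2, r2) <- run b r1 ]
run (PAlt a b) s = run a s ++ run b s
run (PStar p) s =
  [ (m1 ++ m2, g1 ++ g2, r2)
  | (m1, g1, r1) <- run p s
  , not (null m1)                      -- refuse empty iterations
  , (m2, g2, r2) <- run (PStar p) r1
  ] ++ [("", [], s)]
run (PGroup p) s = [ (m, m : g, r) | (m, g, r) <- run p s ]

-- Match at the start of the input: the matched text and its groups.
matchPrefix :: Pattern -> String -> Maybe (String, [String])
matchPrefix p s = case run (simplify p) s of
  (m, gs, _) : _ -> Just (m, gs)
  []             -> Nothing
```

For example, with the pattern for a(b*)c written out by hand,
matchPrefix (PSeq (PChar 'a') (PSeq (PGroup (PStar (PChar 'b'))) (PChar 'c'))) "abbcd"
gives Just ("abbc",["bb"]).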
All of this was motivated by the inability to use Text.Regex to
complete the regex-dna benchmark on The Computer Language Shootout.
The current entry there, by Don Stewart and Alson Kemp and Chris
Kuklewicz, does not use this Parsec solution, but rather a custom DFA
lexer from the CTK library.