The CorePattern module deconstructs the Pattern tree created by
ReadRegex.parseRegex and returns a simpler Q/P tree with
annotations at each Q node. The TNFA module then converts this tree
into a QNFA finite automaton.
Of particular note, this Pattern to Q/P conversion creates and
assigns all the internal Tags that will be used during the matching
process. It also associates each capture group with the tags that
represent its starting and ending locations and with its
immediate parent group.
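The tag allocation and group bookkeeping described above can be illustrated with a minimal sketch. Everything in it is hypothetical and heavily simplified: the Pattern, Q, P and GroupInfo types are stand-ins rather than the module's real definitions, the function convert is invented for illustration, and only each group's start and stop tags are allocated (Maximize, Minimize and Orbit tags are omitted). It threads the next free Tag through the traversal and records each group's immediate parent, assuming index 0 denotes the whole match.

```haskell
-- Hypothetical, heavily simplified stand-ins; not the module's real types.
type Tag = Int
type GroupIndex = Int

data Pattern
  = PChar Char
  | PConcat [Pattern]
  | POr [Pattern]
  | PStar Pattern
  | PGroup GroupIndex Pattern
  deriving (Show, Eq)

-- A Q node annotates a P node; here the only annotations are the
-- optional tags recorded just before and just after it.
data Q = Q {preTag :: Maybe Tag, postTag :: Maybe Tag, unQ :: P}
  deriving (Show, Eq)

data P = Empty | OneChar Char | Seq Q Q | Or [Q] | Star Q
  deriving (Show, Eq)

-- Per-group bookkeeping: its own tags and its immediate parent group.
data GroupInfo = GroupInfo
  { thisIndex :: GroupIndex
  , parentIndex :: GroupIndex
  , startTag :: Tag
  , stopTag :: Tag
  }
  deriving (Show, Eq)

plain :: P -> Q
plain = Q Nothing Nothing

-- convert parent next pattern returns the Q/P tree, the next unused tag,
-- and the groups found, in left-to-right order.
convert :: GroupIndex -> Tag -> Pattern -> (Q, Tag, [GroupInfo])
convert _ t (PChar c) = (plain (OneChar c), t, [])
convert _ t (PConcat []) = (plain Empty, t, [])
convert parent t (PConcat [p]) = convert parent t p
convert parent t (PConcat (p : ps)) =
  let (q1, t1, g1) = convert parent t p
      (q2, t2, g2) = convert parent t1 (PConcat ps)
   in (plain (Seq q1 q2), t2, g1 ++ g2)
convert parent t (POr ps) =
  let step (accQs, next, accGs) p =
        let (q, next', gs) = convert parent next p
         in (accQs ++ [q], next', accGs ++ gs)
      (qs, tEnd, groups) = foldl step ([], t, []) ps
   in (plain (Or qs), tEnd, groups)
convert parent t (PStar p) =
  let (q, t', gs) = convert parent t p
   in (plain (Star q), t', gs)
convert parent t (PGroup i p) =
  let (start, stop) = (t, t + 1)
      (q, t', gs) = convert i (t + 2) p
      -- Keep "one tag, one owner": if the child already holds tags,
      -- wrap it instead of overwriting them.
      inner
        | preTag q == Nothing && postTag q == Nothing = unQ q
        | otherwise = Seq q (plain Empty)
   in (Q (Just start) (Just stop) inner, t', GroupInfo i parent start stop : gs)
```

For example, converting `(a)((b))` as `PConcat [PGroup 1 (PChar 'a'), PGroup 2 (PGroup 3 (PChar 'b'))]` from tag 0 yields groups 1 and 2 with parent 0, group 3 with parent 2, and 6 tags in total.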
Each Maximize and Minimize tag is held, as either a preTag or a
postTag, by exactly one location in the Q/P tree. Each Orbit tag is
held by exactly one Star node. Tags that stop a Group may
additionally appear in any number of preReset lists.
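The ownership rules above can be checked mechanically. This is a sketch over hypothetical stand-in types, not the module's real Q and P: preTag, postTag and a Star's Orbit tag must each have a single owner, while preReset lists may repeat a stop tag freely and are therefore left out of the check. The names ownedTags and singleOwnership are invented for illustration.

```haskell
import Data.List (nub)
import Data.Maybe (maybeToList)

-- Hypothetical, simplified stand-ins; not the module's real types.
type Tag = Int

data Q = Q
  { preReset :: [Tag] -- stop tags to reset here; may repeat across nodes
  , preTag :: Maybe Tag
  , postTag :: Maybe Tag
  , unQ :: P
  }

-- Star carries its (optional) Orbit tag.
data P = OneChar Char | Seq Q Q | Or [Q] | Star (Maybe Tag) Q

-- Every tag that must have exactly one owner: preTag, postTag and
-- Orbit tags. preReset entries are deliberately not collected.
ownedTags :: Q -> [Tag]
ownedTags q = maybeToList (preTag q) ++ maybeToList (postTag q) ++ inner (unQ q)
  where
    inner (OneChar _) = []
    inner (Seq a b) = ownedTags a ++ ownedTags b
    inner (Or qs) = concatMap ownedTags qs
    inner (Star orbit body) = maybeToList orbit ++ ownedTags body

-- True when no owned tag appears twice anywhere in the tree.
singleOwnership :: Q -> Bool
singleOwnership q = length ts == length (nub ts)
  where
    ts = ownedTags q
```

In this sketch a stop tag may sit in the preReset lists of many nodes without failing the check; only a second preTag, postTag or Orbit holder does.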
The additional nullQ::nullView field of Q records the potentially
complex information about which tests and tags must be used if the
pattern unQ::P matches zero characters. nullView can contain
redundant entries, which cleanNullView eliminates.
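The kind of redundancy removal meant here can be sketched under one stated assumption: that a nullView is an ordered list of alternatives, each pairing a set of zero-width tests with the tags to apply, and that the first alternative whose tests all pass is the one used. Under that assumption, an alternative whose tests include every test of an earlier alternative can never be selected. The function pruneNullView and the string-named tests are hypothetical; this is not the module's cleanNullView.

```haskell
-- Hypothetical, simplified stand-ins; not the module's real types.
type Tag = Int
type Test = String -- names a zero-width test such as "^" or "$"

-- Ordered alternatives: the first whose tests all pass is used.
type NullView = [([Test], [Tag])]

pruneNullView :: NullView -> NullView
pruneNullView [] = []
pruneNullView (first@(tests, _) : rest)
  | null tests = [first] -- no tests: always succeeds, so nothing after it is reachable
  | otherwise = first : pruneNullView (filter (not . shadowed . fst) rest)
  where
    -- A later entry requiring all of `tests` can only pass when `first`
    -- also passes, and `first` has priority, so the later entry is dead.
    shadowed later = all (`elem` later) tests
```

For instance, pruning `[(["^"],[1]), (["^","$"],[2]), (["$"],[3]), ([],[4]), (["x"],[5])]` keeps only the entries for tags 1, 3 and 4.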
The module uses recursive do notation.
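As an illustration of why recursive do notation suits a pass like this (a hypothetical example, not code from the module): with mdo, a node built early can mention a tag that is only allocated later, such as a group's stop tag recorded by its children. The Fresh counter monad, its MonadFix instance, Node and allocGroup are all invented for the sketch; the instances use lazy let-bindings so the later value is never forced too early.

```haskell
{-# LANGUAGE RecursiveDo #-}

import Control.Monad.Fix (MonadFix (..))

-- Hypothetical counter monad that hands out fresh tag numbers.
newtype Fresh a = Fresh {runFresh :: Int -> (a, Int)}

instance Functor Fresh where
  fmap f (Fresh g) = Fresh $ \s -> let (a, s') = g s in (f a, s')

instance Applicative Fresh where
  pure a = Fresh $ \s -> (a, s)
  Fresh f <*> Fresh g = Fresh $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
     in (h a, s2)

instance Monad Fresh where
  Fresh g >>= k = Fresh $ \s -> let (a, s1) = g s in runFresh (k a) s1

-- The lazy let feeds the result back into its own computation.
instance MonadFix Fresh where
  mfix f = Fresh $ \s -> let (a, s') = runFresh (f a) s in (a, s')

fresh :: Fresh Int
fresh = Fresh $ \s -> (s, s + 1)

-- A child node: its own tag and the group stop tag it refers to.
data Node = Node Int Int
  deriving (Show, Eq)

-- Allocate a start tag, then n children, then the stop tag. The children
-- mention stop before it is allocated; mdo ties the knot.
allocGroup :: Int -> Fresh (Int, [Node], Int)
allocGroup n = mdo
  start <- fresh
  kids <- mapM (\_ -> fmap (\t -> Node t stop) fresh) [1 .. n]
  stop <- fresh
  return (start, kids, stop)
```

Here `runFresh (allocGroup 2) 0` evaluates to `((0,[Node 1 3,Node 2 3],3),4)`: both children refer to stop tag 3 even though it is allocated after them.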
2009 XXX TODO: we can avoid needing tags in the part of the pattern
after the last capturing group (when right-associative). This is
flipped for left-associative, where the front of the pattern before
the first capturing group needs no tags. The edge of these regions
is subtle: both cases need a Maximize tag. One ought to be able to
check the Pattern: if the root is PConcat, then a scan from the end
(or, for left-associative, from the start) can find the first element
with an embedded PGroup, and the PGroup-free elements passed over by
the scan can be wrapped in some new PNOTAG semantic indicator.
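The PConcat scan suggested in this TODO might look like the following sketch. It is hypothetical throughout: PNoTag stands in for the proposed PNOTAG marker and is not an existing constructor, the Pattern type is a cut-down stand-in, and the subtle boundary where a Maximize tag is still required is deliberately not handled.

```haskell
-- Hypothetical cut-down Pattern; PNoTag is the proposed marker.
data Pattern
  = PChar Char
  | PConcat [Pattern]
  | POr [Pattern]
  | PStar Pattern
  | PGroup Int Pattern
  | PNoTag Pattern -- "this region needs no tags"
  deriving (Show, Eq)

-- Does a PGroup occur anywhere inside this pattern?
hasGroup :: Pattern -> Bool
hasGroup (PChar _) = False
hasGroup (PGroup _ _) = True
hasGroup (PConcat ps) = any hasGroup ps
hasGroup (POr ps) = any hasGroup ps
hasGroup (PStar p) = hasGroup p
hasGroup (PNoTag p) = hasGroup p

-- Right-associative case: scan from the end and wrap the group-free
-- elements after the last element that embeds a PGroup.
markTaglessSuffix :: Pattern -> Pattern
markTaglessSuffix (PConcat ps) =
  case span (not . hasGroup) (reverse ps) of
    ([], _) -> PConcat ps -- last element has a group: nothing to wrap
    (revSuffix, revPrefix) ->
      PConcat (reverse revPrefix ++ [PNoTag (PConcat (reverse revSuffix))])
markTaglessSuffix p = p

-- Left-associative case: scan from the start instead.
markTaglessPrefix :: Pattern -> Pattern
markTaglessPrefix (PConcat ps) =
  case span (not . hasGroup) ps of
    ([], _) -> PConcat ps
    (prefix, rest) -> PConcat (PNoTag (PConcat prefix) : rest)
markTaglessPrefix p = p
```

For `a(b)cd`, the suffix scan wraps `cd` and the prefix scan wraps `a`; a real implementation would still have to place the boundary Maximize tag the TODO mentions.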