REGI
AN INTERPRETER FOR EXTENDED REGULAR EXPRESSIONS
REGI (pronounced 'reggae') is an attempt to build an interpreter
for extended regular expressions in Prolog. That is, expressions are not
compiled into finite-state machines, but rather interpreted directly in
Prolog. Here's why I think it's worthwhile:
-
While it is certainly true that interpretation can become very inefficient,
it is sometimes nice to work with an interpreter rather than a compiler,
since for small 'toy-examples' for 'explorative' regular expressions
programming the interpreter is (usually) fast enough, and there's no
time to waste on compilation.
-
And anyway, perhaps 'partial evaluation' with respect to a particular goal
will allow us to obtain a significant increase of efficiency? (Has anybody
tried this?)
-
Since the (kernel) of the program is completely declarative, new insights
into a calculus of regular expressions might be gained just by carefully
studying it.
-
On the top of an interpreter it is might be possible to build a 'development
environment' for regular expressions, exploiting standard logic programming
techniques for generating traces, proofs, and so on.
The set of REGI operators is a subset of the operators over regular languages
and relations defined by Karttunen et al., in the context of the the Xerox
Finite-State Tools (XFST). Notation differs slightly from the XFST notation,
however (it is just that Prolog operators are less flexible than one would
have liked...). For more information about XFST, see http://www.xrce.xerox.com/research/mltt/fst/
.
REGI operators
Basic operators over regular languages
| A + B |
Concatenation |
| A | B (or A ; B) |
Union |
| A & B |
Intersection |
| A* |
Kleene star |
| A - B |
Difference |
Extended operators over regular languages
| A+ |
Kleene plus |
| ~A |
Complement |
| A\B |
Ignore |
| opt(A) |
Optionality |
| $A |
Containment |
| A => B .. C |
Restriction |
Basic operators over regular relations
| A x B |
Crossproduct |
| A o B |
Composition |
Extended operators over regular relations
| A -> B |
Replacement |
| A <- B |
Inverse replacement |
| A1 -> B1, A2 -> B2 |
Parallel replacement |
| A -> B .. C |
Markup |
| A -> B # C .. D |
Conditional replacement |
Brackets, (), are used for grouping expressions.
Special symbols
| [] |
The empty-string language |
| ? |
The any symbol |
Debugging operators
| trace(A) |
Writes A and the pair of strings denoted by A to stdout. |
REGI is written in SICStus Prolog, and is probably not very portable. The source
is here.