Regular Expressions
I am starting to do something similar to TextFSM and pyparsing in the Parser tokenizer routines, using more human-readable regular expressions. As far as I can tell, that is the main thing TextFSM and similar tools offer, along with nice pythonification of the data into types like "tables". I came to the conclusion that abstracting the question components into "Values" is not that difficult with str.format().
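import re

# Option-index Values: each matches "a", "a.", "a)" or "(a)", optionally preceded by "*" markers.
a = r'\**\s*(?:a\.?|\(?a\))'
b = r'\**\s*(?:b\.?|\(?b\))'
c = r'\**\s*(?:c\.?|\(?c\))'
d = r'\**\s*(?:d\.?|\(?d\))'
e = r'\**\s*(?:e\.?|\(?e\))'
l = r'.*\s*'  # rest of the current line plus any trailing whitespace
s = r'\s+'    # whitespace after an option index
# Options a to c are required; d and e are optional.
regex = r"(\s*{a}{s}{line}{b}{s}{line}{c}{s}{line}(?:{d}{s}{line})?(?:{e}.*)?)".format(
    a=a, b=b, c=c, d=d, e=e, line=l, s=s
)
p = re.compile(regex, re.IGNORECASE)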
First we set up a "rule" like the one in Parser.chunk(), which looks for sets of options. The Value "{a}", for example, stands for an option index such as a. A. a) etc. The rule breaks up the input on option-sets, currently comprised of at least three and up to five options, and assumes every other token is then the stem content:
r"(\s*{a}{s}{line}{b}{s}{line}{c}{s}{line}(?:{d}{s}{line})?(?:{e}.*)?)"
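To make the "every other token" idea concrete, here is a minimal sketch of how such a rule could be used for chunking. It assumes the tokenizer relies on re.split(), which may not be how Parser.chunk() actually works. The option() helper and the sample text are made up for illustration, the option Value is read as matching "a", "a.", "a)" or "(a)", and the {d} group is treated as optional so that three-option sets match.

```python
import re

def option(letter):
    # Hypothetical helper: Value for one option index, e.g. "a", "a.", "a)" or "(a)".
    return r'\**\s*(?:{0}\.?|\(?{0}\))'.format(letter)

values = {name: option(name) for name in "abcde"}
values["line"] = r'.*\s*'  # rest of the line plus trailing whitespace
values["s"] = r'\s+'       # whitespace after an option index

# Assumption: the "d" group is optional, so sets of three options also match.
rule = r"(\s*{a}{s}{line}{b}{s}{line}{c}{s}{line}(?:{d}{s}{line})?(?:{e}.*)?)"
p = re.compile(rule.format(**values), re.IGNORECASE)

text = (
    "What is 2 + 2?\n"
    "a. 3\nb. 4\nc. 5\n"
    "Pick the largest planet.\n"
    "a) Mars\nb) Venus\nc) Jupiter\nd) Pluto\n"
)

# The outer capturing group makes re.split() keep each option-set as a token,
# so stems and option-sets alternate in the result.
tokens = [t.strip() for t in p.split(text) if t.strip()]
stems, option_sets = tokens[0::2], tokens[1::2]

print(stems)
print(option_sets)
```

Note that the option Value is quite permissive: with re.IGNORECASE, a bare "a" followed by a space inside a stem (as in "What is a prime?") would also be taken as an option index, which is one reason to tighten the Value later.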
Even better is QuestParser._quest(), which includes the test for the stem in the rule, so that each token is itself a full question:
r"{i}{s}{body}{a}{s}{body}{b}{s}{body}{c}{s}{body}(?:{d}{s}{body})?(?:{e}{s}{body})?(?={i}{s})"
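A rough sketch of this rule in action follows. The Values for {i} and {body} are not given above, so the ones used here (a numeric question index like "1." and a lazy "any characters" body) are assumptions, as are the option() helper and the sample text.

```python
import re

def option(letter):
    # Hypothetical helper: Value for one option index, e.g. "a", "a.", "a)" or "(a)".
    return r'\**\s*(?:{0}\.?|\(?{0}\))'.format(letter)

values = {name: option(name) for name in "abcde"}
values.update(
    i=r'\d+\.',        # assumed question index: "1.", "2.", ...
    body=r'[\s\S]+?',  # assumed body: shortest possible run of any characters
    s=r'\s+',          # whitespace
)

rule = (r"{i}{s}{body}{a}{s}{body}{b}{s}{body}{c}{s}{body}"
        r"(?:{d}{s}{body})?(?:{e}{s}{body})?(?={i}{s})")
quest = re.compile(rule.format(**values), re.IGNORECASE)

text = (
    "1. What is 2 + 2?\na. 3\nb. 4\nc. 5\n"
    "2. Pick the largest planet.\na) Mars\nb) Venus\nc) Jupiter\nd) Pluto\n"
    "3. Which gas is noble?\na. Helium\nb. Iron\nc. Salt\n"
)

# Each match is one complete question: index, stem and options.
questions = [q.strip() for q in quest.findall(text)]
for q in questions:
    print(repr(q))
```

With these assumed Values, only the first two questions are returned: the trailing lookahead (?={i}{s}) requires another question index after each match, so the final question in the input is left unmatched unless the input is padded or the lookahead is relaxed.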
{s} is the Value for whitespace, which can also be abstracted out (with a class wrapper that allows for switching whitespace consideration), making the above expression even simpler:
r"{i}{body}{a}{body}{b}{body}{c}{body}(?:{d}{body})?(?:{e}{body})?(?={i})"
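One possible shape for such a wrapper is sketched below. The Rule class, its method names and the `spaced` argument are all hypothetical, as are the Values for {i} and {body}. The idea is only that the whitespace Value gets appended to the index Values at render time instead of being written into the rule.

```python
import re

class Rule:
    """Hypothetical wrapper that renders a rule with or without whitespace Values."""

    def __init__(self, template, values, spaced, whitespace=r"\s+"):
        self.template = template
        self.values = dict(values)
        self.spaced = set(spaced)     # names of Values that are followed by whitespace
        self.whitespace = whitespace

    def pattern(self, consider_whitespace=True):
        rendered = {}
        for name, value in self.values.items():
            if consider_whitespace and name in self.spaced:
                value += self.whitespace
            rendered[name] = value
        return self.template.format(**rendered)

    def compile(self, consider_whitespace=True, flags=re.IGNORECASE):
        return re.compile(self.pattern(consider_whitespace), flags)

def option(letter):
    # Hypothetical helper: Value for one option index, e.g. "a", "a.", "a)" or "(a)".
    return r'\**\s*(?:{0}\.?|\(?{0}\))'.format(letter)

values = {name: option(name) for name in "abcde"}
values.update(i=r'\d+\.', body=r'[\s\S]+?')  # assumed Values for {i} and {body}

short = r"{i}{body}{a}{body}{b}{body}{c}{body}(?:{d}{body})?(?:{e}{body})?(?={i})"
explicit = (r"{i}{s}{body}{a}{s}{body}{b}{s}{body}{c}{s}{body}"
            r"(?:{d}{s}{body})?(?:{e}{s}{body})?(?={i}{s})")

rule = Rule(short, values, spaced="iabcde")
# With whitespace switched on, the short rule renders to the same pattern
# as the rule that spells out {s} everywhere.
same = rule.pattern(True) == explicit.format(s=r"\s+", **values)
print(same)
```

Keeping the switch in pattern() rather than in the template means the rule text stays identical whether or not whitespace is considered.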
And finally, if we wanted to, we could abstract out Values for the stem and the options:
stem = '{i}{body}'
options = '{a}{body}{b}{body}{c}{body}(?:{d}{body})?(?:{e}{body})?'
r"{stem}{options}(?={i})"
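One detail worth noting with nested Values: a single str.format() call does not expand placeholders that sit inside the substituted Values, so {stem} would come out as the literal text '{i}{body}'. A small sketch of one way around this is below; the expand() helper and the Values for {i} and {body} are assumptions made for illustration.

```python
import re

def expand(template, values, max_depth=10):
    # Hypothetical helper: re-apply str.format() until nothing changes,
    # so Values that contain other Values are fully expanded.
    for _ in range(max_depth):
        expanded = template.format(**values)
        if expanded == template:
            return expanded
        template = expanded
    raise ValueError("Values nest too deeply or refer to each other")

def option(letter):
    # Hypothetical helper: Value for one option index, e.g. "a", "a.", "a)" or "(a)".
    return r'\**\s*(?:{0}\.?|\(?{0}\))'.format(letter)

values = {name: option(name) for name in "abcde"}
values.update(
    i=r'\d+\.',        # assumed question index Value
    body=r'[\s\S]+?',  # assumed body Value
    stem='{i}{body}',
    options='{a}{body}{b}{body}{c}{body}(?:{d}{body})?(?:{e}{body})?',
)

nested = expand(r"{stem}{options}(?={i})", values)
flat = r"{i}{body}{a}{body}{b}{body}{c}{body}(?:{d}{body})?(?:{e}{body})?(?={i})".format(**values)

print(nested == flat)
pattern = re.compile(nested, re.IGNORECASE)
```

This only works while the Values contain no literal braces. A Value with a quantifier such as \d{2} would need its braces doubled, or a different templating approach.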
Right now the Value expressions are rather limited, but the idea is to establish the basic concept of the rule first. The actual Value, {a} in our example, can then be tweaked to be as wide or narrow as necessary, including numerical option indexes like 1) 2) 3) etc. The rules themselves basically never change, or, if they do need to change, we create a separate Parser class to handle them.
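As a sketch of what widening a Value could look like, the example below keeps one rule and swaps only the option Values. The option_value() and build() helpers and the sample text are hypothetical, and the {d} group is assumed to be optional.

```python
import re

# The rule stays fixed; only the Values plugged into it change.
RULE = r"(\s*{a}{s}{line}{b}{s}{line}{c}{s}{line}(?:{d}{s}{line})?(?:{e}.*)?)"

def option_value(*labels):
    # Hypothetical helper: one Value accepting any of the given labels,
    # e.g. option_value("a", "1") matches "a.", "a)", "(a)", "1.", "1)", "(1)".
    alt = "|".join(re.escape(label) for label in labels)
    return r"\**\s*(?:(?:{0})\.?|\(?(?:{0})\))".format(alt)

def build(label_sets):
    # label_sets maps each Value name to the labels it should accept.
    values = {name: option_value(*labels) for name, labels in label_sets.items()}
    values.update(line=r".*\s*", s=r"\s+")
    return re.compile(RULE.format(**values), re.IGNORECASE)

narrow = build({name: [name] for name in "abcde"})
wide = build({name: [name, str(n)] for n, name in enumerate("abcde", start=1)})

numbered = "Pick the largest planet.\n1) Mars\n2) Venus\n3) Jupiter\n"

print(narrow.search(numbered))                   # letters only: no option-set found
print(wide.search(numbered).group(1).strip())    # letters or numbers: option-set found
```

The same caution applies as with letters: a numeric Value like this will also accept a bare "1" followed by whitespace, so a stem containing numbers may call for a narrower Value.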