-
Notifications
You must be signed in to change notification settings - Fork 17
"Malformed rule expression" when running extras/terneval.py #2
Comments
Yes. Some rules rely on there being a document creation time, and will fail if it doesn't exist. The warnings that it generates in this case are actually harmless (we'd expect those rules not to execute), but they could be tidied up. |
Alright. I'm not sure what the desired behaviour is here, I guess there are a few options: a. silently substitute another date (e.g. Today - though that will almost certainly be wrong) At the moment, I'm happy with (b), as although it's ugly, it provides extra debugging information - though perhaps this isn't necessary, and (c) is a better option. What do you reckon? |
I like c the most |
Alright, let's do that - would you be in favour of something like a one-line report beginning TERNIP: WARNING: - or something softer? |
There's a ternip.warn function which is what generates the current warning. The ideal solution would actually be to use the logging module and then log at different levels I guess... I might quickly rip out ternip.warn and replace it with logging actually - I never liked the monkeypatching that goes on to get ternip.warn to behave sensibly tbh. |
okay, have switched this to use Python's built in error/warning loggers, etc, so you should be able to do "logger.info('Skipping rule due to not having a DCT')" or something like that, then the user can display it however they want to (the annotate_timex script will print this to stdout). I've not actually tested these changes (don't have NLTK installed here), but will do some at home this evening |
In fact, thinking about this more, wouldn't the best possible behaviour be to not run rules requiring a DCT if no DCT is present? That's quite a separate feature request from this over-verbose error reporting though, so if that is a desired behaviour, it might be best off as a new feature request. |
When running extras/terneval.py, the following output is included:
....
chtb_245.eng.sgm
recognition 0.083
extent 0.0
normalisation 0.0
TERNIP: WARNING: Malformed rule expression
Traceback (most recent call last):
File "../ternip/rule_engine/normalisation_rule.py", line 139, in apply
timex.value = eval(self._value_exp)
File "../ternip/rules/normalisation/gutime-date.ruleblock:134:value", line 1, in
File "../ternip/rule_engine/normalisation_functions/relative_date_functions.py", line 179, in compute_offset_base
ref_m = int(ref_date[4:6])
ValueError: invalid literal for int() with base 10: ''
chtb_183.eng.sgm
recognition 0.267
extent 0.0
normalisation 0.0
....
Does compute_offset_base() need to be called only after validating/sanitising the current reference date?
The text was updated successfully, but these errors were encountered: