Skip to content
This repository has been archived by the owner on May 28, 2024. It is now read-only.

"Malformed rule expression" when running extras/terneval.py #2

Open
leondz opened this issue Jun 9, 2011 · 7 comments
Open

"Malformed rule expression" when running extras/terneval.py #2

leondz opened this issue Jun 9, 2011 · 7 comments

Comments

@leondz
Copy link

leondz commented Jun 9, 2011

When running extras/terneval.py, the following output is included:

....

chtb_245.eng.sgm
recognition 0.083
extent 0.0
normalisation 0.0

TERNIP: WARNING: Malformed rule expression
Traceback (most recent call last):
File "../ternip/rule_engine/normalisation_rule.py", line 139, in apply
timex.value = eval(self._value_exp)
File "../ternip/rules/normalisation/gutime-date.ruleblock:134:value", line 1, in
File "../ternip/rule_engine/normalisation_functions/relative_date_functions.py", line 179, in compute_offset_base
ref_m = int(ref_date[4:6])
ValueError: invalid literal for int() with base 10: ''

chtb_183.eng.sgm
recognition 0.267
extent 0.0
normalisation 0.0

....

Does compute_offset_base() need to be called only after validating/sanitising the current reference date?

@cnorthwood
Copy link
Owner

Yes. Some rules rely on there being a document creation time, and will fail if it doesn't exist. The warnings that it generates in this case are actually harmless (we'd expect those rules not to execute), but they could be tidied up.

@leondz
Copy link
Author

leondz commented Jun 10, 2011

Alright. I'm not sure what the desired behaviour is here, I guess there are a few options:

a. silently substitute another date (e.g. Today - though that will almost certainly be wrong)
b. raise a ValueError at the top of each function if it requires a ref_date but doesn't get one, with a more helpful description
c. define a MissingRefdateError class, and raise this, and perhaps catch it (in apply() in normalisation_rule.py) then generate some cleaner output

At the moment, I'm happy with (b), as although it's ugly, it provides extra debugging information - though perhaps this isn't necessary, and (c) is a better option. What do you reckon?

@cnorthwood
Copy link
Owner

I like c the most

@leondz
Copy link
Author

leondz commented Jun 10, 2011

Alright, let's do that - would you be in favour of something like a one-line report beginning TERNIP: WARNING: - or something softer?

@cnorthwood
Copy link
Owner

There's a ternip.warn function which is what generates the current warning. The ideal solution would actually be to use the logging module and then log at different levels I guess... I might quickly rip out ternip.warn and replace it with logging actually - I never liked the monkeypatching that goes on to get ternip.warn to behave sensibly tbh.

@cnorthwood
Copy link
Owner

okay, have switched this to use Python's built in error/warning loggers, etc, so you should be able to do "logger.info('Skipping rule due to not having a DCT')" or something like that, then the user can display it however they want to (the annotate_timex script will print this to stdout).

I've not actually tested these changes (don't have NLTK installed here), but will do some at home this evening

@leondz
Copy link
Author

leondz commented Jul 27, 2011

In fact, thinking about this more, wouldn't the best possible behaviour be to not run rules requiring a DCT if no DCT is present? That's quite a separate feature request from this over-verbose error reporting though, so if that is a desired behaviour, it might be best off as a new feature request.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants