This repository has been archived by the owner on May 28, 2024. It is now read-only.
GATE's ANNIE tokeniser splits on different boundaries from the NLTK tokeniser that TERNIP uses. This can cause many TERNIP rules not to match. For example, NLTK places a dd/mm/yyyy date such as 31/12/2010 into one token, whereas ANNIE will give us a SpaceToken, followed by tokens of '31', '/', '12', '/', '2010', and another SpaceToken.
This should be fixed in the GATE plugin (the preprocessing/postprocessing JAPE), so that the ANNIE Tokeniser's output can be mapped slightly more closely to the results of the NLTK tokeniser. It may also help to specify (if not already done) the tokenisation scheme that NLTK expects, to help in other situations where the upstream tokeniser is switched out from the default.
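To illustrate the kind of mapping meant here, the following is a minimal Python sketch of the merging logic only. It is not the JAPE grammar itself, and the names `merge_slash_dates` and `DATE_PART` are hypothetical. It assumes the ANNIE output has already been reduced to a list of token strings, drops the SpaceTokens, and rejoins a digits, '/', digits, '/', digits run into a single token as the NLTK-style rules expect.

```python
import re

# Hypothetical helper pattern: a date component is 1 to 4 digits.
DATE_PART = re.compile(r"^\d{1,4}$")


def merge_slash_dates(tokens):
    """Merge runs like ['31', '/', '12', '/', '2010'] into '31/12/2010'.

    `tokens` is a list of strings with whitespace tokens already removed.
    All other tokens are passed through unchanged.
    """
    merged = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 5]
        is_date = (
            len(window) == 5
            and window[1] == "/"
            and window[3] == "/"
            and all(DATE_PART.match(window[k]) for k in (0, 2, 4))
        )
        if is_date:
            # Collapse the five ANNIE-style tokens into one.
            merged.append("".join(window))
            i += 5
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# ANNIE-like output for "on 31/12/2010 we met", spaces included as tokens.
annie_like = ["on", " ", "31", "/", "12", "/", "2010", " ", "we", " ", "met"]
without_spaces = [t for t in annie_like if not t.isspace()]
print(merge_slash_dates(without_spaces))
# prints ['on', '31/12/2010', 'we', 'met']
```

In the plugin, the equivalent step would presumably live in the preprocessing JAPE, with the postprocessing JAPE mapping any resulting annotations back onto the original ANNIE token offsets.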