list dependency for an apparent appositive #536
Comments
The document has a bunch of "NAME, JOB TITLE" combos. I'm not sure if
The relevant part of the document:
and so on. I think
Why is "7." tokenized apart? It doesn't actually contain a period, right? I thought it was just a list marker as a whole.
Punctuation in list item markers is tokenized off in EWT.
Hm, not sure if we have the energy to standardize this, but it does seem jarring to me, since it really doesn't mean anything. In ON they are mostly untokenized, though I see there are quite a few exceptions. GUM-style corpora are 100% untokenized as well.
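To make the two conventions concrete, here is a minimal sketch. The helper name, the regex, and the `split_punct` flag are all hypothetical and not taken from any UD tooling. It only shows the difference between splitting the punctuation off a marker like "7." and keeping the marker as one token.

```python
import re

# Hypothetical pattern for a numeric list marker: digits followed by "." or ")".
LIST_MARKER = re.compile(r"^(\d+)([.)])$")


def tokenize_list_marker(marker, split_punct):
    """Return the tokens for a list marker under one of the two conventions.

    split_punct=True  -> punctuation tokenized off, as described for EWT
    split_punct=False -> marker kept whole, as described for GUM-style corpora
    """
    match = LIST_MARKER.match(marker)
    if match is None or not split_punct:
        # Not a numeric list marker, or the convention keeps it intact.
        return [marker]
    return [match.group(1), match.group(2)]
```

With this sketch, `tokenize_list_marker("7.", True)` gives two tokens and `tokenize_list_marker("7.", False)` gives one.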
Moved the tokenization discussion to a new issue: #543. The question for this issue is whether we need to change
Oh, certainly, wasn't trying to argue about that, I just noticed the LS thing. Thanks for opening the other issue!