Sentence object with pre-tokenized text #1916

Closed
KDercksen opened this issue Oct 20, 2020 · 3 comments
Labels
question Further information is requested

Comments

@KDercksen
Contributor

Dear Flair team,

I was wondering what the best approach is to use Flair with pre-tokenized text. My data is, broadly speaking, already in the form (["Hi", ",", "how", "are", "you", "Mr.", "President", "?"], ["O", "O", "O", "O", "O", "B-PER", "L-PER", "O"]) for entity tagging. As far as I can see, Sentence does not have functionality for this out of the box. Do I just populate the tokens list manually starting from an empty sentence, or how should I go about this?

Thanks for your awesome work!

KDercksen added the question label on Oct 20, 2020
@KDercksen
Contributor Author

I think the most elegant way is the following:

from flair.data import Sentence

text, labels = ["Hi", ...], ["O", ...]

# join tokens with single spaces; use_tokenizer=False then splits on whitespace only
sentence = Sentence(" ".join(text), use_tokenizer=False)

# attach one NER label per token
for token, label in zip(sentence, labels):
    token.add_tag("ner", label, confidence=1.0)

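This should work fine as long as no token in the original list is empty or contains whitespace. Otherwise the whitespace split produces a different number of tokens and the labels no longer line up.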
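The whitespace assumption can be checked before building the sentence. The sketch below is plain Python and does not call Flair. The helper name `survives_whitespace_roundtrip` is made up for illustration, and `str.split()` is only an approximation of Flair's whitespace splitting, so treat it as a sanity check rather than a guarantee.

```python
def survives_whitespace_roundtrip(tokens):
    """Return True if joining on spaces and splitting on whitespace
    gives back exactly the original token list."""
    return " ".join(tokens).split() == list(tokens)


# Token list from the question: no token contains whitespace.
clean = ["Hi", ",", "how", "are", "you", "Mr.", "President", "?"]

# Hypothetical list with a token that contains a space.
messy = ["New York", "is", "big"]

print(survives_whitespace_roundtrip(clean))  # True
print(survives_whitespace_roundtrip(messy))  # False: "New York" becomes two tokens
```

If the check fails, `zip(sentence, labels)` would silently pair labels with the wrong tokens, so it is worth asserting on it up front.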

@AAnirudh07

It still works three years later! Thank you

@helpmefindaname
Collaborator

I just want to point out that Sentence also accepts a list of tokens directly, so Sentence(tokens) should be more elegant than Sentence(" ".join(tokens), use_tokenizer=False).
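A minimal sketch of that suggestion, assuming Flair is installed and that `Sentence` takes a list of strings as described in this comment. The wrapper name `sentence_from_tokens` is hypothetical, and the import sits inside the function only so the helper can be defined on a machine without Flair.

```python
def sentence_from_tokens(tokens):
    """Build a Flair Sentence directly from pre-tokenized text."""
    # Assumption: flair is installed; imported lazily on first call.
    from flair.data import Sentence

    sentence = Sentence(tokens)

    # Guard against any mismatch before labels are zipped onto tokens.
    if len(sentence) != len(tokens):
        raise ValueError(
            f"expected {len(tokens)} tokens, got {len(sentence)}"
        )
    return sentence
```

Calling `sentence_from_tokens(["Hi", ",", "how", "are", "you", "Mr.", "President", "?"])` should then give a sentence whose tokens can be labeled in a `zip(sentence, labels)` loop, as in the earlier comment.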
