Sentence object with pre-tokenized text #1916

KDercksen · 2020-10-20T14:12:12Z

Dear Flair team,

I was wondering what the best approach is to use Flair with pre-tokenized text. My data is, broadly speaking, already in the form (["Hi", ",", "how", "are", "you", "Mr.", "President", "?"], ["O", "O", "O", "O", "O", "B-PER", "L-PER", "O"]) for entity tagging. As far as I can see, Sentence does not have functionality for this out of the box. Do I just populate the tokens list manually starting from an empty sentence, or how should I go about this?
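The tags in the example look like the BILOU scheme (Begin, Inside, Last, Outside, Unit), though that reading is an assumption. Under that assumption, here is a minimal sketch that groups the token-level tags into entity spans. This can help sanity-check the labels before they go into a Sentence. `bilou_to_spans` is a hypothetical helper written for illustration and is not part of Flair.

```python
from typing import List, Optional, Tuple

# (entity_type, start_index, end_index_exclusive)
Span = Tuple[str, int, int]


def bilou_to_spans(tags: List[str]) -> List[Span]:
    """Group BILOU token tags into entity spans (hypothetical helper).

    Assumes well-formed tags: "B-X" ... "L-X" for multi-token entities,
    "U-X" for single-token entities, "O" for tokens outside any entity.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        prefix, _, entity_type = tag.partition("-")
        if prefix == "U":
            spans.append((entity_type, i, i + 1))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "L" and start is not None:
            spans.append((entity_type, start, i + 1))
            start = None
        # "I" tags simply continue the entity opened by the last "B" tag
    return spans


tokens = ["Hi", ",", "how", "are", "you", "Mr.", "President", "?"]
tags = ["O", "O", "O", "O", "O", "B-PER", "L-PER", "O"]
for entity_type, start, end in bilou_to_spans(tags):
    print(entity_type, " ".join(tokens[start:end]))  # PER Mr. President
```

Malformed sequences (for example an "L" tag with no preceding "B") are silently skipped here. A stricter version could raise instead.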

Thanks for your awesome work!


KDercksen · 2020-10-21T07:18:27Z

I think the most elegant way is the following:

from flair.data import Sentence

text, labels = ["Hi", ...], ["O", ...]  # the token and label lists from the question

# join tokens with single spaces; use_tokenizer=False then splits on whitespace again
sentence = Sentence(" ".join(text), use_tokenizer=False)

# add one NER tag per token
for token, label in zip(sentence, labels):
    token.add_tag("ner", label, confidence=1.0)

This should work fine as long as none of the original tokens are empty or contain whitespace.
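The join-then-split trick only works if splitting the joined string gives back exactly the original tokens, and `zip` silently stops at the shorter of the two lists if the token and label counts differ. Here is a minimal sketch of a pre-check for both conditions. `join_pretokenized` is a hypothetical helper, and the check assumes Flair's non-tokenizing mode splits on spaces; it rejects any whitespace to be safe.

```python
from typing import List


def join_pretokenized(tokens: List[str], labels: List[str]) -> str:
    """Return " ".join(tokens) after checking it can be split back safely.

    Hypothetical helper: raises ValueError if the token and label counts
    differ, or if any token is empty or contains whitespace (such a token
    would vanish or be split in two when the string is re-split).
    """
    if len(tokens) != len(labels):
        # zip() would otherwise silently drop the surplus tokens or labels
        raise ValueError(f"{len(tokens)} tokens but {len(labels)} labels")
    for i, token in enumerate(tokens):
        if not token or any(ch.isspace() for ch in token):
            raise ValueError(f"token {i} ({token!r}) is empty or contains whitespace")
    return " ".join(tokens)


tokens = ["Hi", ",", "how", "are", "you", "Mr.", "President", "?"]
labels = ["O", "O", "O", "O", "O", "B-PER", "L-PER", "O"]
text = join_pretokenized(tokens, labels)
print(text)  # Hi , how are you Mr. President ?
```

On Python 3.10 and later, `zip(sentence, labels, strict=True)` is another way to catch a length mismatch, since it raises ValueError instead of truncating.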

AAnirudh07 · 2024-02-07T13:29:02Z

It still works three years later! Thank you

helpmefindaname · 2024-02-08T10:45:03Z

I just want to point out that Sentence also accepts a list of tokens directly, so Sentence(tokens) is more elegant than Sentence(" ".join(tokens), use_tokenizer=False).

KDercksen added the "question" label (Further information is requested) on Oct 20, 2020

KDercksen closed this as completed on Oct 21, 2020
