[Question]: CSVClassificationCorpus and tagger #3392

Open
ch-sander opened this issue Jan 11, 2024 · 1 comment
Labels
question Further information is requested

Comments

@ch-sander

Question

I trained a custom model with:

from flair.datasets import CSVClassificationCorpus
from flair.embeddings import CharacterEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# model_name and num_epochs are defined elsewhere in my script
tag_type = 'label'
column_name_map = {0: 'text', 1: tag_type}
corpus = CSVClassificationCorpus("input/test",
                                 train_file='text.txt',
                                 column_name_map=column_name_map,
                                 skip_header=True,
                                 delimiter=',',
                                 label_type=tag_type)
tag_dictionary = corpus.make_label_dictionary(label_type=tag_type, add_unk=True)
print(tag_dictionary)

char_embeddings = CharacterEmbeddings()
embeddings = StackedEmbeddings([char_embeddings])
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type,
                        use_crf=True)

trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/' + model_name,
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=num_epochs)
model_path = 'models/flair/' + model_name
tagger.save(model_path)

When I try to tag a sentence, I get an empty list ([]):

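from flair.data import Sentence
from flair.models import SequenceTagger

def tag_text_with_ner(model_path, text):
    # Load the saved model and run prediction on a single sentence
    tagger = SequenceTagger.load(model_path)

    sentence = Sentence(text)
    tagger.predict(sentence)

    # Collect (text, tag, score) for every predicted span
    tagged_entities = []
    for entity in sentence.get_spans('ner'):
        tagged_entities.append((entity.text, entity.tag, entity.score))

    return tagged_entities

# text is the input string, defined elsewhere in my script
tagged_entities = tag_text_with_ner(model_path, text)
print(tagged_entities)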

The output is: 2024-01-11 14:34:55,577 SequenceTagger predicts: Dictionary with 15 tags: <unk>, ...
[]

Changing 'ner' to 'label' in get_spans makes no difference. Training with ColumnCorpus worked fine, but my training data is a CSV file, not BIO-formatted.

@ch-sander ch-sander added the question Further information is requested label Jan 11, 2024
@ch-sander
Author

I guess the issue is sequence labeling vs. text classification. Still, I was wondering whether NER training can be done from a CSV file instead of a BIO file.
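
One possible workaround, sketched under assumptions rather than confirmed by the maintainers: convert the CSV into BIO column format once, then train on the result with ColumnCorpus as before. The CSV layout below is hypothetical (one row per entity, with columns text, entity, label), and the helper names csv_rows_to_bio and to_column_format are made up for illustration. Tokenization is a plain whitespace split, so an entity followed directly by punctuation would not be matched.

```python
import csv
import io


def csv_rows_to_bio(rows):
    """Convert (text, entity, label) rows into BIO-tagged token lists.

    Assumed (hypothetical) layout: one row per annotated entity, with the
    sentence in column 0, the entity string in column 1, and its label in
    column 2. Rows that share the same text are merged into one sentence.
    """
    sentences = {}  # text -> list of BIO tags, in first-seen order
    for row in rows:
        if len(row) != 3:
            continue  # skip malformed or empty rows
        text, entity, label = row
        tokens = text.split()
        tags = sentences.setdefault(text, ["O"] * len(tokens))
        ent_tokens = entity.split()
        n = len(ent_tokens)
        if n == 0:
            continue
        # Tag only the first whitespace-token match of the entity.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == ent_tokens:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                break
    return [(text.split(), tags) for text, tags in sentences.items()]


def to_column_format(sentences):
    """Render sentences as 'token tag' lines, separated by blank lines."""
    blocks = [
        "\n".join(f"{tok} {tag}" for tok, tag in zip(tokens, tags))
        for tokens, tags in sentences
    ]
    return "\n\n".join(blocks) + "\n"


# Hypothetical sample data standing in for a real CSV file.
sample_csv = io.StringIO(
    "text,entity,label\n"
    "Angela Merkel visited Paris,Angela Merkel,PER\n"
    "Angela Merkel visited Paris,Paris,LOC\n"
)
reader = csv.reader(sample_csv)
next(reader)  # skip the header row
column_text = to_column_format(csv_rows_to_bio(reader))
print(column_text)
```

Written to a file, this output should be loadable with ColumnCorpus using column_format={0: 'text', 1: 'ner'}, after which the spans would be read back with sentence.get_spans('ner'). If the CSV instead holds one label per whole text, that is text classification, and a sequence tagger has no spans to predict.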
