add reduce transformer vocab plugin #3217

Merged
alanakbik merged 7 commits into master from reintroduce_transformer_smaller_training_vocab
Oct 23, 2023

helpmefindaname
Collaborator

No description provided.

@helpmefindaname helpmefindaname marked this pull request as ready for review April 24, 2023 21:47
@helpmefindaname helpmefindaname force-pushed the reintroduce_transformer_smaller_training_vocab branch 2 times, most recently from 167f0eb to 976c925 Compare April 30, 2023 13:34
@helpmefindaname helpmefindaname force-pushed the reintroduce_transformer_smaller_training_vocab branch from 976c925 to 1737471 Compare July 17, 2023 14:42
@helpmefindaname helpmefindaname force-pushed the reintroduce_transformer_smaller_training_vocab branch from 1737471 to 2ef4187 Compare August 7, 2023 15:37
@helpmefindaname helpmefindaname force-pushed the reintroduce_transformer_smaller_training_vocab branch from 2ef4187 to 8bff328 Compare October 2, 2023 09:33
@helpmefindaname helpmefindaname force-pushed the reintroduce_transformer_smaller_training_vocab branch from 8bff328 to f399f41 Compare October 16, 2023 08:06
@helpmefindaname helpmefindaname force-pushed the reintroduce_transformer_smaller_training_vocab branch from f399f41 to 5c1e3de Compare October 23, 2023 14:43
@alanakbik
Collaborator

Looks great, thanks for adding this @helpmefindaname!

Tested locally and got a 25% increase in training speed for this script with reduce_transformer_vocab=True compared to reduce_transformer_vocab=False:

from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus: Corpus = TREC_6()

# 2. what label do we want to predict?
label_type = "question_class"

# 3. create the label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# 4. initialize transformer document embeddings (many models are available)
document_embeddings = TransformerDocumentEmbeddings("distilbert-base-uncased", fine_tune=True)

# 5. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type=label_type)

# 6. initialize trainer
trainer = ModelTrainer(classifier, corpus)

# 7. run training with fine-tuning
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    reduce_transformer_vocab=True,  # set to False to train with the full vocabulary (the slower baseline)
    learning_rate=5.0e-5,
    mini_batch_size=4,
)
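
For readers wondering where a speedup like this can come from, here is a toy sketch of the general idea behind shrinking a vocabulary for training: keep only the embedding rows for token ids that occur in the corpus and remap the ids accordingly. This is an illustration under my own assumptions, not the code added in this PR. The names `reduce_vocab` and `remap`, the `always_keep` parameter, and the toy data are all hypothetical, and plain Python lists stand in for the real embedding tensor.

```python
from typing import Dict, List, Sequence, Tuple


def reduce_vocab(
    embedding_matrix: Sequence[Sequence[float]],
    corpus_token_ids: Sequence[Sequence[int]],
    always_keep: Sequence[int] = (),
) -> Tuple[List[List[float]], Dict[int, int]]:
    """Keep only the embedding rows whose token ids occur in the corpus.

    Hypothetical helper for illustration; not part of Flair.
    Returns the reduced matrix and a mapping from old ids to new ids.
    """
    # Ids that must survive regardless of the corpus (e.g. special tokens).
    used = set(always_keep)
    for sentence in corpus_token_ids:
        used.update(sentence)

    # Sort so the new ids preserve the relative order of the old ids.
    kept_ids = sorted(used)
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    reduced = [list(embedding_matrix[old]) for old in kept_ids]
    return reduced, old_to_new


def remap(sentence: Sequence[int], old_to_new: Dict[int, int]) -> List[int]:
    """Translate a sentence from old token ids to reduced-vocabulary ids."""
    return [old_to_new[token_id] for token_id in sentence]


# Toy setup (assumed data): 6 tokens with 2-dimensional embeddings.
full_matrix = [[float(i), float(i) * 10.0] for i in range(6)]
corpus = [[0, 3, 5], [3, 3, 0]]

reduced_matrix, old_to_new = reduce_vocab(full_matrix, corpus, always_keep=[0])
remapped_corpus = [remap(sentence, old_to_new) for sentence in corpus]

print(len(full_matrix), "->", len(reduced_matrix), "embedding rows")
```

In this toy case only ids 0, 3 and 5 are used, so half of the rows are dropped while every lookup still returns the same vector as before. A real implementation would also need to restore the full vocabulary before saving the model, which this sketch leaves out.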

@alanakbik alanakbik merged commit ed53c42 into master Oct 23, 2023
1 check passed
@alanakbik alanakbik deleted the reintroduce_transformer_smaller_training_vocab branch October 23, 2023 15:30